L. Mehl, A. Jahedi, J. Schmalfuss, and A. Bruhn, “M-FUSE: Multi-frame Fusion for Scene Flow Estimation,” in
Proc. Winter Conference on Applications of Computer Vision (WACV), Jan. 2023. doi:
10.48550/arXiv.2207.05704.
Abstract
Recently, neural networks for scene flow estimation have shown impressive results on automotive data such as the KITTI benchmark. However, despite using sophisticated rigidity assumptions and parametrizations, such networks are typically limited to two frame pairs, which does not allow them to exploit temporal information. In our paper we address this shortcoming by proposing a novel multi-frame approach that considers an additional preceding stereo pair. To this end, we proceed in two steps: Firstly, building upon the recent RAFT-3D approach, we develop an improved two-frame baseline by incorporating an advanced stereo method. Secondly, and even more importantly, exploiting the specific modeling concepts of RAFT-3D, we propose a U-Net architecture that performs a fusion of forward and backward flow estimates and hence allows temporal information to be integrated on demand. Experiments on the KITTI benchmark not only show that the advantages of the improved baseline and the temporal fusion approach complement each other, they also demonstrate that the computed scene flow is highly accurate. More precisely, our approach ranks second overall and first for the even more challenging foreground objects, in total outperforming the original RAFT-3D method by more than 16%. Code is available at https://github.com/cv-stuttgart/M-FUSE.
J. Schmalfuss, E. Scheurer, H. Zhao, N. Karantzas, A. Bruhn, and D. Labate, “Blind image inpainting with sparse directional filter dictionaries for lightweight CNNs,”
Journal of Mathematical Imaging and Vision (JMIV), vol. 65, pp. 323–339, 2023, doi:
10.1007/s10851-022-01119-6.
Abstract
Blind inpainting algorithms based on deep learning architectures have shown a remarkable performance in recent years, typically outperforming model-based methods both in terms of image quality and run time. However, neural network strategies typically lack a theoretical explanation, which contrasts with the well-understood theory underlying model-based methods. In this work, we leverage the advantages of both approaches by integrating theoretically founded concepts from transform domain methods and sparse approximations into a CNN-based approach for blind image inpainting. To this end, we present a novel strategy to learn convolutional kernels that applies a specifically designed filter dictionary whose elements are linearly combined with trainable weights. Numerical experiments demonstrate the competitiveness of this approach. Our results show not only an improved inpainting quality compared to conventional CNNs but also significantly faster network convergence within a lightweight network design. Our code is available at https://github.com/cv-stuttgart/SDPF_Blind-Inpainting.
T. Krake, A. Bruhn, B. Eberhardt, and D. Weiskopf, “Efficient and Robust Background Modeling with Dynamic Mode Decomposition,”
Journal of Mathematical Imaging and Vision (JMIV), 2022, doi:
10.1007/s10851-022-01068-0.
Abstract
A large number of modern video background modeling algorithms deal with computationally costly minimization problems that often need parameter adjustments. While in most cases spatial and temporal constraints are added artificially to the minimization process, our approach is to exploit Dynamic Mode Decomposition (DMD), a spectral decomposition technique that naturally extracts spatio-temporal patterns from data. Applied to video data, DMD can compute background models. However, the original DMD algorithm for background modeling is neither efficient nor robust. In this paper, we present an equivalent reformulation with constraints leading to a more suitable decomposition into fore- and background. Due to the reformulation, which uses sparse and low-dimensional structures, an efficient and robust algorithm is derived that computes accurate background models. Moreover, we show how our approach can be extended to RGB data, data with periodic parts, and streaming data, enabling versatile use.
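To illustrate the underlying principle, the following sketch shows how a plain (exact) DMD decomposition of a video matrix yields a background model: modes whose temporal frequency is close to zero correspond to the static background. It is a minimal NumPy baseline, not the reformulated algorithm of the paper; the rank truncation and the frequency threshold are illustrative assumptions.

```python
import numpy as np

def dmd_background(frames, rank=10):
    """Minimal exact-DMD background model (sketch, not the paper's reformulation).

    frames: (num_pixels, num_frames) matrix, each column a vectorized grayscale frame.
    Returns a (num_pixels,) background image built from the near-zero-frequency modes.
    """
    X1, X2 = frames[:, :-1], frames[:, 1:]

    # Truncated SVD of the first snapshot matrix.
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank, :]

    # Low-dimensional linear operator and its eigendecomposition.
    A_tilde = U.conj().T @ X2 @ Vh.conj().T / s
    eigvals, W = np.linalg.eig(A_tilde)

    # DMD modes and continuous-time frequencies (frame spacing dt = 1).
    Phi = X2 @ Vh.conj().T / s @ W
    omega = np.log(eigvals)

    # Background = reconstruction from modes whose frequency magnitude is near zero.
    bg = np.abs(omega) < 1e-2
    b = np.linalg.lstsq(Phi, X1[:, 0], rcond=None)[0]   # mode amplitudes from first frame
    background = (Phi[:, bg] * b[bg]).sum(axis=1)
    return np.abs(background)
```

Here frames would hold one flattened grayscale frame per column (frames[:, t] = img_t.ravel()); subtracting the returned background from each frame then yields the foreground.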
M. Philipp, N. Bacher, S. Sauer, F. Mathis-Ullrich, and A. Bruhn, “From Chairs To Brains: Customizing Optical Flow For Surgical Activity Localization,” in
Proceedings of the IEEE International Symposium on Biomedical Imaging (ISBI). IEEE, Mar. 2022, pp. 1–5. doi:
10.1109/ISBI52829.2022.9761704.
Abstract
Recent approaches for surgical activity localization rely on motion features derived from the optical flow (OF). However, although they consider state-of-the-art CNNs when computing the OF, they typically resort to pre-trained implementations which are domain-unaware. We address this problem in two ways: (i) Using the pre-trained OF-CNN of a recent localization approach, we analyze the impact of video properties such as reflections, motion and blur on the quality of the OF from neurosurgical data. (ii) Based on this analysis, we design a specifically tailored synthetic training dataset which allows us to customize the pre-trained OF-CNN for surgical activity localization. Our evaluation clearly shows the benefit of this customization approach. It not only leads to an improved accuracy of the OF itself but, even more importantly, also to an improved performance for the actual localization task.
J. Schmalfuss, L. Mehl, and A. Bruhn, “Attacking Motion Estimation with Adversarial Snow,” in
Proc. ECCV Workshop on Adversarial Robustness in the Real World (AROW), 2022. doi:
10.48550/arXiv.2210.11242.
Abstract
Current adversarial attacks for motion estimation (optical flow) optimize small per-pixel perturbations, which are unlikely to appear in the real world. In contrast, we exploit a real-world weather phenomenon for a novel attack with adversarially optimized snow. At the core of our attack is a differentiable renderer that consistently integrates photorealistic snowflakes with realistic motion into the 3D scene. Through optimization we obtain adversarial snow that significantly impacts the optical flow while being indistinguishable from ordinary snow. Surprisingly, the impact of our novel attack is largest on methods that previously showed a high robustness to small L_p perturbations.
A. Jahedi, L. Mehl, M. Rivinius, and A. Bruhn, “Multi-Scale RAFT: Combining Hierarchical Concepts for Learning-Based Optical Flow Estimation,” in
Proceedings of the IEEE International Conference on Image Processing (ICIP), Oct. 2022, pp. 1236–1240. doi:
10.1109/ICIP46576.2022.9898048.
Abstract
Many classical and learning-based optical flow methods rely on hierarchical concepts to improve both accuracy and robustness. However, one of the currently most successful approaches -- RAFT -- hardly exploits such concepts. In this work, we show that multi-scale ideas are still valuable. More precisely, using RAFT as a baseline, we propose a novel multi-scale neural network that combines several hierarchical concepts within a single estimation framework. These concepts include (i) a partially shared coarse-to-fine architecture, (ii) multi-scale features, (iii) a hierarchical cost volume and (iv) a multi-scale multi-iteration loss. Experiments on MPI Sintel and KITTI clearly demonstrate the benefits of our approach. They show not only substantial improvements compared to RAFT, but also state-of-the-art results -- in particular in non-occluded regions.
J. Schmalfuss, P. Scholze, and A. Bruhn, “A Perturbation-Constrained Adversarial Attack for Evaluating the Robustness of Optical Flow,” in Proceedings of the European Conference on Computer Vision (ECCV), Oct. 2022.
Abstract
Recent optical flow methods are almost exclusively judged in terms of accuracy, while their robustness is often neglected. Although adversarial attacks offer a useful tool to perform such an analysis, current attacks on optical flow methods focus on real-world attacking scenarios rather than a worst case robustness assessment. Hence, in this work, we propose a novel adversarial attack - the Perturbation-Constrained Flow Attack (PCFA) - that emphasizes destructivity over applicability as a real-world attack. PCFA is a global attack that optimizes adversarial perturbations to shift the predicted flow towards a specified target flow, while keeping the L2 norm of the perturbation below a chosen bound. Our experiments demonstrate PCFA's applicability in white- and black-box settings, and show it finds stronger adversarial samples than previous attacks. Based on these strong samples, we provide the first joint ranking of optical flow methods considering both prediction quality and adversarial robustness, which reveals state-of-the-art methods to be particularly vulnerable.
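The core mechanism described in the abstract, i.e. optimizing a global perturbation towards a target flow while keeping its L2 norm below a chosen bound, can be sketched with a generic projected-gradient loop. The snippet below is a simplified PyTorch illustration, not the authors' PCFA implementation; flow_model, the frame tensors and the target flow are placeholders, and the single shared perturbation as well as the norm scaling are assumptions made for brevity.

```python
import torch

def perturbation_constrained_attack(flow_model, img1, img2, target_flow,
                                     eps=0.01, steps=20, lr=1e-3):
    """Generic sketch of a perturbation-constrained attack on optical flow.

    Optimizes a perturbation delta (added to both frames) so that the predicted
    flow moves towards target_flow, while keeping ||delta||_2 <= eps * sqrt(numel).
    flow_model(img1, img2) is assumed to return a dense flow field (placeholder).
    """
    delta = torch.zeros_like(img1, requires_grad=True)
    bound = eps * delta.numel() ** 0.5          # L2 bound scaled to the image size
    opt = torch.optim.Adam([delta], lr=lr)

    for _ in range(steps):
        flow = flow_model(img1 + delta, img2 + delta)
        loss = (flow - target_flow).square().mean()   # push prediction towards target
        opt.zero_grad()
        loss.backward()
        opt.step()

        # Project the perturbation back onto the L2 ball of radius `bound`.
        with torch.no_grad():
            norm = delta.norm(p=2)
            if norm > bound:
                delta.mul_(bound / norm)

    return delta.detach()
```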
L. Mehl, C. Beschle, A. Barth, and A. Bruhn, “An Anisotropic Selection Scheme for Variational Optical Flow Methods with Order-Adaptive Regularisation,” in
Proceedings of the International Conference on Scale Space and Variational Methods in Computer Vision (SSVM). Springer, 2021, pp. 140–152. doi:
10.1007/978-3-030-75549-2_12.
Abstract
Approaches based on order-adaptive regularisation belong to the most accurate variational methods for computing the optical flow. By locally deciding between first- and second-order regularisation, they are applicable to scenes with both fronto-parallel and ego-motion. So far, however, existing order-adaptive methods have a decisive drawback. While the involved first- and second-order smoothness terms already make use of anisotropic concepts, the underlying selection process itself is still isotropic in the sense that it locally chooses the same regularisation order for all directions. In our paper, we address this shortcoming. We propose a generalised order-adaptive approach that selects the local regularisation order for each direction individually. To this end, we split the order-adaptive regularisation across and along the locally dominant direction and perform an energy competition for each direction separately. This in turn offers another advantage. Since the parameters can be chosen differently for both directions, the approach allows for a better adaptation to the underlying scene. Experiments for MPI Sintel and KITTI 2015 demonstrate the usefulness of our approach. They not only show improvements compared to an isotropic selection scheme. They also make explicit that our approach is able to improve the results of state-of-the-art learning-based approaches if applied as a final refinement step – thereby achieving top results in both benchmarks.
H. Men, V. Hosu, H. Lin, A. Bruhn, and D. Saupe, “Subjective annotation for a frame interpolation benchmark using artefact amplification,”
Quality and User Experience, vol. 5, no. 1, 2020, doi:
10.1007/s41233-020-00037-y.
Abstract
Current benchmarks for optical flow algorithms evaluate the estimation either directly by comparing the predicted flow fields with the ground truth or indirectly by using the predicted flow fields for frame interpolation and then comparing the interpolated frames with the actual frames. In the latter case, objective quality measures such as the mean squared error are typically employed. However, it is well known that for image quality assessment, the actual quality experienced by the user cannot be fully deduced from such simple measures. Hence, we conducted a subjective quality assessment crowdsourcing study for the interpolated frames provided by one of the optical flow benchmarks, the Middlebury benchmark. It contains interpolated frames from 155 methods applied to each of 8 contents. For this purpose, we collected forced-choice paired comparisons between interpolated images and corresponding ground truth. To increase the sensitivity of observers when judging minute differences in paired comparisons we introduced a new method to the field of full-reference quality assessment, called artefact amplification. From the crowdsourcing data (3720 comparisons of 20 votes each) we reconstructed absolute quality scale values according to Thurstone’s model. As a result, we obtained a re-ranking of the 155 participating algorithms w.r.t. the visual quality of the interpolated frames. This re-ranking not only shows the necessity of visual quality assessment as another evaluation metric for optical flow and frame interpolation benchmarks, the results also provide the ground truth for designing novel image quality assessment (IQA) methods dedicated to the perceptual quality of interpolated images. As a first step, we proposed such a new full-reference method, called WAE-IQA, which weights the local differences between an interpolated image and its ground truth.
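For readers unfamiliar with Thurstonian reconstruction, the following sketch shows how absolute scale values can be recovered from forced-choice paired-comparison counts using the classical Case V least-squares solution. It is a generic illustration operating on a hypothetical count matrix, not the exact evaluation code of the study (which additionally employs artefact amplification).

```python
import numpy as np
from scipy.stats import norm

def thurstone_case_v(wins):
    """Least-squares Thurstone Case V scale values from a pairwise count matrix.

    wins[i, j] = number of times stimulus i was preferred over stimulus j.
    Returns one scale value per stimulus (higher = better perceived quality).
    """
    total = wins + wins.T
    # Empirical preference probabilities, clipped away from 0/1 to keep z finite.
    p = np.clip(wins / np.maximum(total, 1), 0.025, 0.975)
    z = norm.ppf(p)                  # z[i, j] ~ s_i - s_j under Case V
    np.fill_diagonal(z, 0.0)
    scores = z.mean(axis=1)          # row means = least-squares solution for a complete design
    return scores - scores.mean()    # anchor the scale to zero mean
```

For a complete comparison design, the least-squares solution reduces to the row means of the z-score matrix, which is what the last two lines compute.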
K. Kurzhals
et al., “Visual Analytics and Annotation of Pervasive Eye Tracking Video,” in
Proceedings of the Symposium on Eye Tracking Research & Applications (ETRA). Stuttgart, Germany: ACM, 2020, pp. 16:1-16:9. doi:
10.1145/3379155.3391326.
Abstract
We propose a new technique for visual analytics and annotation of long-term pervasive eye tracking data for which a combined analysis of gaze and egocentric video is necessary. Our approach enables two important tasks for such data for hour-long videos from individual participants: (1) efficient annotation and (2) direct interpretation of the results. Exemplary time spans can be selected by the user and are then used as a query that initiates a fuzzy search of similar time spans based on gaze and video features. In an iterative refinement loop, the query interface then provides suggestions for the importance of individual features to improve the search results. A multi-layered timeline visualization shows an overview of annotated time spans. We demonstrate the efficiency of our approach for analyzing activities in about seven hours of video in a case study and discuss feedback on our approach from novices and experts performing the annotation task.
H. Men, V. Hosu, H. Lin, A. Bruhn, and D. Saupe, “Visual Quality Assessment for Interpolated Slow-Motion Videos Based on a Novel Database,” in
Proceedings of the International Conference on Quality of Multimedia Experience (QoMEX), 2020, pp. 1–6. doi:
10.1109/QoMEX48832.2020.9123096.
Abstract
Professional video editing tools can generate slow-motion video by interpolating frames from video recorded at a standard frame rate. Thereby the perceptual quality of such interpolated slow-motion videos strongly depends on the underlying interpolation techniques. We built a novel benchmark database that is specifically tailored for interpolated slow-motion videos (KoSMo-1k). It consists of 1,350 interpolated video sequences, from 30 different content sources, along with their subjective quality ratings from up to ten subjective comparisons per video pair. Moreover, we evaluated the performance of twelve existing full-reference (FR) image/video quality assessment (I/VQA) methods on the benchmark. In this way, we are able to show that specifically tailored quality assessment methods for interpolated slow-motion videos are needed, since the evaluated methods – despite their good performance on real-time video databases – do not give satisfying results when it comes to frame interpolation.
H. Men, H. Lin, V. Hosu, D. Maurer, A. Bruhn, and D. Saupe, “Visual Quality Assessment for Motion Compensated Frame Interpolation,” in
Proceedings of the International Conference on Quality of Multimedia Experience (QoMEX). IEEE, 2019, pp. 1–6. doi:
10.1109/QoMEX.2019.8743221.
Abstract
Current benchmarks for optical flow algorithms evaluate the estimation quality by comparing their predicted flow field with the ground truth, and additionally may compare interpolated frames, based on these predictions, with the correct frames from the actual image sequences. For the latter comparisons, objective measures such as mean square errors are applied. However, for applications like image interpolation, the expected user's quality of experience cannot be fully deduced from such simple quality measures. Therefore, we conducted a subjective quality assessment study by crowdsourcing for the interpolated images provided in one of the optical flow benchmarks, the Middlebury benchmark. We used paired comparisons with forced choice and reconstructed absolute quality scale values according to Thurstone's model using the classical least squares method. The results give rise to a re-ranking of 141 participating algorithms w.r.t. visual quality of interpolated frames mostly based on optical flow estimation. Our re-ranking result shows the necessity of visual quality assessment as another evaluation metric for optical flow and frame interpolation benchmarks.
D. Maurer and A. Bruhn, “ProFlow: Learning to Predict Optical Flow,” in
Proceedings of the British Machine Vision Conference (BMVC). BMVA Press, 2018, pp. 86:1-86:13. doi:
10.48550/arXiv.1806.00800.
Abstract
Temporal coherence is a valuable source of information in the context of optical flow estimation. However, finding a suitable motion model to leverage this information is a non-trivial task. In this paper we propose an unsupervised online learning approach based on a convolutional neural network (CNN) that estimates such a motion model individually for each frame. By relating forward and backward motion, these learned models not only allow valuable motion information to be inferred from the backward flow, they also help to improve the performance at occlusions, where a reliable prediction is particularly useful. Moreover, our learned models are spatially variant and hence allow non-rigid motion to be estimated by construction. This, in turn, makes it possible to overcome the major limitation of recent rigidity-based approaches that seek to improve the estimation by incorporating additional stereo/SfM constraints. Experiments demonstrate the usefulness of our new approach. They not only show a consistent improvement of up to 27% for all major benchmarks (KITTI 2012, KITTI 2015, MPI Sintel) compared to a baseline without prediction, they also show top results for the MPI Sintel benchmark -- the one of the three benchmarks that contains the largest amount of non-rigid motion.
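The idea of learning a per-frame motion model online, i.e. fitting a small CNN that maps the backward flow to a forward-flow prediction on reliable (non-occluded) pixels and then querying it where the estimate is unreliable, can be sketched as follows. This is a strongly simplified PyTorch illustration; the network size, the loss and the reliability mask are assumptions and do not reproduce the actual ProFlow architecture or training protocol.

```python
import torch
import torch.nn as nn

class FlowPredictor(nn.Module):
    """Tiny per-frame CNN mapping backward flow to a forward-flow prediction (sketch)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1),
        )

    def forward(self, backward_flow):
        return self.net(backward_flow)

def learn_motion_model(backward_flow, forward_flow, reliable_mask, iters=100, lr=1e-3):
    """Online, unsupervised fitting for a single frame: regress the estimated forward
    flow from the backward flow using reliable (e.g. non-occluded) pixels only.

    backward_flow, forward_flow: tensors of shape (1, 2, H, W); reliable_mask: (1, 1, H, W).
    Returns a full forward-flow prediction that can be used at unreliable pixels.
    """
    model = FlowPredictor()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(iters):
        pred = model(backward_flow)
        loss = ((pred - forward_flow).square() * reliable_mask).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model(backward_flow).detach()
```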
D. Maurer, Y. C. Ju, M. Breuß, and A. Bruhn, “Combining Shape from Shading and Stereo: A Joint Variational Method for Estimating Depth, Illumination and Albedo,”
International Journal of Computer Vision, vol. 126, no. 12, 2018, doi:
10.1007/s11263-018-1079-1.
Abstract
Shape from shading (SfS) and stereo are two fundamentally different strategies for image-based 3-D reconstruction. While approaches for SfS infer the depth solely from pixel intensities, methods for stereo are based on a matching process that establishes correspondences across images. This difference in approaching the reconstruction problem yields complementary advantages that are worth combining. So far, however, most “joint” approaches are based on an initial stereo mesh that is subsequently refined using shading information. In this paper we follow a completely different approach. We propose a joint variational method that combines both cues within a single minimisation framework. To this end, we fuse a Lambertian SfS approach with a robust stereo model and supplement the resulting energy functional with a detail-preserving anisotropic second-order smoothness term. Moreover, we extend the resulting model in such a way that it jointly estimates depth, albedo and illumination. This in turn makes the approach applicable to objects with non-uniform albedo as well as to scenes with unknown illumination. Experiments for synthetic and real-world images demonstrate the benefits of our combined approach: They not only show that our method is capable of generating very detailed reconstructions, but also that joint approaches are feasible in practice.
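Schematically, a joint energy of the kind described above couples a Lambertian shading term, a robust stereo matching term and a second-order smoothness term in the unknowns depth z, albedo ρ and illumination l. The LaTeX sketch below shows such a generic functional only for orientation; the concrete parametrization, the anisotropy and the penalizers used in the paper differ.

```latex
E(z,\rho,\mathbf{l}) \;=\; \int_{\Omega}
  \Psi\!\Big( \big(\rho\,\langle \mathbf{n}(z), \mathbf{l}\rangle - f_1\big)^{2} \Big)
  \;+\; \beta\, \Psi\!\Big( \big(f_2(\mathbf{x} + \mathbf{d}(z)) - f_1(\mathbf{x})\big)^{2} \Big)
  \;+\; \alpha\, S\!\big(\nabla^{2} z\big) \, d\mathbf{x}
```

Here f_1 and f_2 are the stereo images, n(z) the depth-induced surface normal, d(z) the depth-induced correspondence, Ψ a robust penalizer and S a (generally anisotropic) second-order smoothness term.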
D. Maurer, N. Marniok, B. Goldluecke, and A. Bruhn, “Structure-from-motion-aware PatchMatch for Adaptive Optical Flow Estimation,” in
Computer Vision – ECCV 2018, Lecture Notes in Computer Science, vol. 11212, V. Ferrari, M. Hebert, C. Sminchisescu, and Y. Weiss, Eds. Springer International Publishing, 2018, pp. 575–592. doi:
10.1007/978-3-030-01237-3_35.
Abstract
Many recent energy-based methods for optical flow estimation rely on a good initialization that is typically provided by some kind of feature matching. So far, however, these initial matching approaches are rather general: They do not incorporate any additional information that could help to improve the accuracy or the robustness of the estimation. In particular, they do not exploit potential cues on the camera poses and the thereby induced rigid motion of the scene. In the present paper, we tackle this problem. To this end, we propose a novel structure-from-motion-aware PatchMatch approach that, in contrast to existing matching techniques, combines two hierarchical feature matching methods: a recent two-frame PatchMatch approach for optical flow estimation (general motion) and a specifically tailored three-frame PatchMatch approach for rigid scene reconstruction (SfM). While the motion PatchMatch serves as baseline with good accuracy, the SfM counterpart takes over at occlusions and other regions with insufficient information. Experiments with our novel SfM-aware PatchMatch approach demonstrate its usefulness. They not only show excellent results for all major benchmarks (KITTI 2012/2015, MPI Sintel), but also improvements of up to 50% compared to a PatchMatch approach without structure information.
D. Maurer, M. Stoll, and A. Bruhn, “Directional Priors for Multi-Frame Optical Flow,” in
Proceedings of the British Machine Vision Conference (BMVC). BMVA Press, 2018, pp. 106:1-106:13. [Online]. Available:
http://bmvc2018.org/contents/papers/0377.pdf
Abstract
Pipeline approaches that interpolate and refine an initial set of point correspondences have recently shown a good performance in the field of optical flow estimation. However, so far, these methods are typically restricted to two frames, which makes exploiting temporal information difficult. In this paper, we show how such pipeline approaches can be extended to the temporal domain and how directional constraints can be incorporated to further improve the estimation. To this end, we not only suggest to exploit temporal information in the prefiltering step, we also propose a trajectorial refinement method that lifts successful concepts of recent variational two-frame methods to the multi-frame domain. Experiments demonstrate the usefulness of our pipeline approach. They not only show good results in general, they also demonstrate the clear benefits of using multiple frames and of imposing directional constraints on the prefiltering step and the refinement.
D. Maurer, M. Stoll, S. Volz, P. Gairing, and A. Bruhn, “A Comparison of Isotropic and Anisotropic Second Order Regularisers for Optical Flow,” in
Scale Space and Variational Methods in Computer Vision. SSVM 2017. Lecture Notes in Computer Science, vol. 10302, F. Lauze, Y. Dong, and A. B. Dahl, Eds. Springer International Publishing, 2017, pp. 537–549. doi:
10.1007/978-3-319-58771-4_43.
D. Maurer, A. Bruhn, and M. Stoll, “Order-adaptive and Illumination-aware Variational Optical Flow Refinement,” in
Proceedings of the British Machine Vision Conference (BMVC). BMVA Press, 2017, pp. 150:1-150:13. doi:
10.5244/C.31.150.
Abstract
Variational approaches form an inherent part of most state-of-the-art pipeline approaches for optical flow computation. As the final step of the pipeline, the aim is to refine an initial flow field, typically obtained by inpainting non-dense matches, in order to provide highly accurate results. In this paper, we take advantage of recent improvements in variational optical flow estimation to construct an advanced variational model for this final refinement step. By combining an illumination-aware data term with an order-adaptive smoothness term, we obtain a highly flexible model that is able to cope well with a broad variety of different scenarios. Moreover, we propose the use of an additional reduced coarse-to-fine scheme instead of an exclusive initialisation scheme, which not only allows us to refine the initialisation but also to correct larger erroneous displacements. Experiments on recent optical flow benchmarks show the advantages of the advanced variational refinement and the reduced coarse-to-fine scheme.
D. Maurer, M. Stoll, and A. Bruhn, “Order-adaptive Regularisation for Variational Optical Flow: Global, Local and in Between,” in
Scale Space and Variational Methods in Computer Vision. SSVM 2017. Lecture Notes in Computer Science, vol. 10302, F. Lauze, Y. Dong, and A. B. Dahl, Eds. Springer International Publishing, 2017, pp. 550–562. doi:
10.1007/978-3-319-58771-4_44.
Abstract
Recent approaches for variational motion estimation typically rely on either first- or second-order regularisation strategies. While first-order strategies are more appropriate for scenes with fronto-parallel motion, second-order constraints are superior when it comes to the estimation of affine flow fields. Since using the wrong regularisation order may lead to a significant deterioration of the results, it is surprising that there has not been much effort in the literature so far to determine this order automatically. In our work, we address the aforementioned problem in two ways. (i) First, we discuss two anisotropic smoothness terms of first and second order, respectively, that share important structural properties and that are thus particularly suited for being combined within an order-adaptive variational framework. (ii) Secondly, based on these two smoothness terms, we develop four different variational methods and with it four different strategies for adaptively selecting the regularisation order: a global and a local strategy based on half-quadratic regularisation, a non-local approach that relies on neighbourhood information, and a region-based method using level sets. Experiments on recent benchmarks show the benefits of each of the strategies. Moreover, they demonstrate that adaptively combining different regularisation orders not only allows us to outperform single-order strategies but also to obtain advantages beyond those of a frame-wise selection.
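As a rough illustration of order-adaptive regularisation, the LaTeX sketch below shows a generic per-pixel combination of a first-order and a second-order smoothness term controlled by a selection variable c(x) in [0,1]. The four strategies of the paper differ in how c is estimated (globally, locally, non-locally or per region), and the actual smoothness terms are anisotropic rather than the simplified isotropic ones shown here.

```latex
R(\mathbf{u})(\mathbf{x}) \;=\;
  c(\mathbf{x})\, \Psi_1\!\big( |\nabla u_1|^{2} + |\nabla u_2|^{2} \big)
  \;+\; \big(1 - c(\mathbf{x})\big)\, \Psi_2\!\big( \|\mathcal{H} u_1\|_F^{2} + \|\mathcal{H} u_2\|_F^{2} \big)
```

Here u = (u_1, u_2) is the flow field, the gradient and the Hessian H realize the first- and second-order terms, and Ψ_1, Ψ_2 are robust penalizers.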
K. Kurzhals, M. Stoll, A. Bruhn, and D. Weiskopf, “FlowBrush: Optical Flow Art,” in
Symposium on Computational Aesthetics, Sketch-Based Interfaces and Modeling, and Non-Photorealistic Animation and Rendering (EXPRESSIVE, co-located with SIGGRAPH), 2017, pp. 1:1-1:9. doi:
10.1145/3092912.3092914.
Abstract
The depiction of motion in static representations has a long tradition in art and science alike. Often, motion is depicted by spatio-temporal summarizations that try to preserve as much information of the original dynamic content as possible. In our approach to depicting motion, we remove the spatial constraints and generate new content steered by the temporal changes in motion. Applying particle steering in combination with the dynamic color palette of the video content, we can create a wide range of different image styles. With recorded videos, or by live interaction with a webcam, one can influence the resulting image. We provide a set of intuitive parameters to affect the style of the result; the final image content depends on the video input. Based on a collection of results gathered from test users, we discuss example styles that can be achieved with FlowBrush. In general, our approach provides an open sandbox for creative people to generate aesthetic images from any video content they apply.
M. Stoll, D. Maurer, and A. Bruhn, “Variational Large Displacement Optical Flow Without Feature Matches,” in
Energy Minimization Methods in Computer Vision and Pattern Recognition. EMMCVPR 2017. Lecture Notes in Computer Science, vol. 10746, M. Pelillo and E. R. Hancock, Eds. Springer International Publishing, 2017, pp. 79–92. doi:
10.1007/978-3-319-78199-0_6.
Abstract
The optical flow within a scene can be an arbitrarily complex composition of motion patterns that typically differ regarding their scale. Hence, using a single algorithm with a single set of parameters is often not sufficient to capture the variety of these motion patterns. In particular, the estimation of large displacements of small objects poses a problem. In order to cope with this problem, many recent methods estimate the optical flow by a fusion of flow candidates obtained either from different algorithms or from the same algorithm using different parameters. This, however, typically results in a pipeline of methods for estimating and fusing the candidate flows, each requiring an individual model with a dedicated solution strategy. In this paper, we investigate what results can be achieved with a pure variational approach based on a standard coarse-to-fine optimization. To this end, we propose a novel variational method for the simultaneous estimation and fusion of flow candidates. By jointly using multiple smoothness weights within a single energy functional, we are able to capture different motion patterns and hence to estimate large displacements even without additional feature matches. In the same functional, an intrinsic model-based fusion allows us to integrate all these candidates into a single flow field, combining sufficiently smooth overall motion with locally large displacements. Experiments on large displacement sequences and the Sintel benchmark demonstrate the feasibility of our approach and show improved results compared to a single-smoothness baseline method.
M. Stoll, D. Maurer, S. Volz, and A. Bruhn, “Illumination-aware Large Displacement Optical Flow,” in
Energy Minimization Methods in Computer Vision and Pattern Recognition. EMMCVPR 2017. Lecture Notes in Computer Science, vol. 10746, M. Pelillo and E. R. Hancock, Eds. Springer International Publishing, 2017, pp. 139–154. doi:
10.1007/978-3-319-78199-0_10.
Abstract
The integration of feature matches for handling large displacements is one of the key concepts of recent variational optical flow methods. In this context, many existing approaches rely on confidence measures to identify locations where a poor initial match can potentially be improved by adaptively integrating flow proposals. One very intuitive confidence measure to identify such locations is the matching cost of the data term. Problems arise, however, in the presence of illumination changes, since brightness constancy does not hold and invariant constancy assumptions typically discard too much information for an identification of poor matches. In this paper, we suggest a pipeline approach that addresses the aforementioned problem in two ways. First, we propose a novel confidence measure based on the illumination-compensated brightness constancy assumption. By estimating illumination changes from a pre-computed flow, this measure allows us to reliably identify poor matches even in the presence of varying illumination. Secondly, in contrast to many existing pipeline approaches, we propose to integrate only feature matches that have been obtained from dense variational methods. This in turn not only provides robust matches due to the inherent regularization, it also demonstrates that in many cases sparse descriptor matches are not needed for large displacement optical flow. Experiments on the Sintel benchmark and on common large displacement sequences demonstrate the benefits of our strategy. They show a clear improvement over the baseline method and a performance comparable to similar methods from the literature that are based on sparse feature matches.