F. Petersen, B. Goldluecke, C. Borgelt, and O. Deussen, “GenDR: A Generalized Differentiable Renderer,” in Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 3992–4001. doi: 10.1109/CVPR52688.2022.00397.
Abstract
In this work, we present and study a generalized family of differentiable renderers. We discuss from scratch which components are necessary for differentiable rendering and formalize the requirements for each component. We instantiate our general differentiable renderer, which generalizes existing differentiable renderers like SoftRas and DIB-R, with an array of different smoothing distributions to cover a large spectrum of reasonable settings. We evaluate an array of differentiable renderer instantiations on the popular ShapeNet 3D reconstruction benchmark and analyze the implications of our results. Surprisingly, the simple uniform distribution yields the best overall results when averaged over 13 classes; in general, however, the optimal choice of distribution heavily depends on the task.
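To make the core idea concrete: such renderers replace the hard inside/outside test of a pixel against a triangle with the CDF of a smoothing distribution applied to the pixel’s signed distance from the triangle boundary. The sketch below is not the authors’ implementation; the function name `soft_coverage`, the temperature `tau`, and the two example distributions (uniform and logistic, the latter roughly SoftRas-style) are illustrative choices only.

```python
# Minimal sketch (not the paper's code): soft pixel coverage from a signed
# distance, so that gradients flow through rasterization.
import torch

def soft_coverage(signed_dist, distribution="uniform", tau=1e-2):
    """Soft coverage in [0, 1]; signed_dist is positive inside the triangle.
    `distribution` selects the smoothing CDF, `tau` is the temperature."""
    t = signed_dist / tau
    if distribution == "uniform":
        # CDF of a uniform distribution on [-1, 1]: a clamped linear ramp.
        return torch.clamp(0.5 * (t + 1.0), 0.0, 1.0)
    elif distribution == "logistic":
        # Logistic CDF, i.e. the sigmoid used by SoftRas-style renderers.
        return torch.sigmoid(t)
    raise ValueError(f"unknown distribution: {distribution}")

# Unlike a hard step function, the soft coverage has useful gradients.
d = torch.tensor([-0.03, 0.0, 0.02], requires_grad=True)
cov = soft_coverage(d, "uniform")
cov.sum().backward()
print(cov.detach(), d.grad)
```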
F. Petersen, B. Goldluecke, O. Deussen, and H. Kuehne, “Style Agnostic 3D Reconstruction via Adversarial Style Transfer,” in 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), IEEE, Jan. 2022, pp. 2273–2282. doi: 10.1109/WACV51458.2022.00233.
Abstract
Reconstructing the 3D geometry of an object from an image is a major challenge in computer vision. Recently introduced differentiable renderers can be leveraged to learn the 3D geometry of objects from 2D images, but those approaches require additional supervision to enable the renderer to produce an output that can be compared to the input image. This can be scene information or constraints such as object silhouettes, uniform backgrounds, material, texture, and lighting. In this paper, we propose an approach that enables differentiable rendering-based learning of 3D objects from images with backgrounds without the need for silhouette supervision. Instead of trying to render an image close to the input, we propose an adversarial style-transfer and domain adaptation pipeline that allows us to translate the input image domain to the rendered image domain. This allows us to directly compare a translated image with the differentiable rendering of a 3D object reconstruction in order to train the 3D object reconstruction network. We show that the approach learns 3D geometry from images with backgrounds and provides better performance than constrained methods for single-view 3D object reconstruction on this task.
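A rough sketch of how such a pipeline can be wired up, with placeholder networks standing in for the paper’s translator and discriminator (module definitions, loss weighting, and all names below are illustrative assumptions, not the published architecture):

```python
# Illustrative sketch only: translator G maps a real photo into the rendered-
# image domain; discriminator D tells translated from rendered images; the
# translated photo is compared directly with the differentiable rendering.
import torch
import torch.nn as nn
import torch.nn.functional as F

G = nn.Sequential(nn.Conv2d(3, 3, kernel_size=3, padding=1))          # translator (stand-in)
D = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1),
                  nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1))
bce = nn.BCEWithLogitsLoss()

def translator_losses(photo, rendering):
    """photo: real input image; rendering: differentiable rendering of the
    predicted 3D shape. Both are (B, 3, H, W) tensors."""
    translated = G(photo)
    # Comparison term: the translated photo should match the rendering, so
    # gradients can reach the 3D reconstruction network through the renderer.
    recon = F.l1_loss(translated, rendering)
    # Adversarial term: translated images should be indistinguishable from
    # rendered images for the discriminator.
    logits = D(translated)
    adv = bce(logits, torch.ones_like(logits))
    return recon + adv

loss = translator_losses(torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64))
loss.backward()
```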
S. Giebenhain and B. Goldlücke, “AIR-Nets: An Attention-Based Framework for Locally Conditioned Implicit Representations,” in 2021 International Conference on 3D Vision (3DV), 2021, pp. 1054–1064. doi: 10.1109/3DV53792.2021.00113.
Abstract
This paper introduces Attentive Implicit Representation Networks (AIR-Nets), a simple but highly effective architecture for 3D reconstruction from point clouds. Since representing 3D shapes in a local and modular fashion increases generalization and reconstruction quality, AIR-Nets encode an input point cloud into a set of local latent vectors anchored in 3D space, which locally describe the object’s geometry, as well as a global latent description, enforcing global consistency. Our model is the first grid-free, encoder-based approach that locally describes an implicit function. The vector attention mechanism from [62] serves as the main point cloud processing module and allows for permutation invariance and translation equivariance. When queried with a 3D coordinate, our decoder gathers information from the global and nearby local latent vectors in order to predict an occupancy value. Experiments on the ShapeNet dataset [7] show that AIR-Nets significantly outperform previous state-of-the-art encoder-based, implicit shape learning methods and especially dominate in the sparse setting. Furthermore, our model generalizes well to the FAUST dataset [1] in a zero-shot setting. Finally, since AIR-Nets use a sparse latent representation and follow a simple operating scheme, the model offers several exciting avenues for future work. Our code is available at https://github.com/SimonGiebenhain/AIR-Nets.
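The decoding step described above can be sketched as a small cross-attention module; the shapes, the number of neighbours `k`, and all layer sizes below are assumptions for illustration, not the released AIR-Nets configuration:

```python
# Minimal sketch: a query point attends to its k nearest local latent vectors
# (anchored in 3D) and to a global latent, then predicts an occupancy logit.
import torch
import torch.nn as nn

class OccupancyDecoder(nn.Module):
    def __init__(self, dim=64, k=8):
        super().__init__()
        self.k = k
        self.embed = nn.Linear(3, dim)                        # embed the query coordinate
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.out = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, query_xyz, anchor_xyz, local_latents, global_latent):
        # query_xyz: (B, 3), anchor_xyz: (B, N, 3),
        # local_latents: (B, N, dim), global_latent: (B, dim)
        dist = torch.cdist(query_xyz.unsqueeze(1), anchor_xyz).squeeze(1)   # (B, N)
        idx = dist.topk(self.k, largest=False).indices                      # (B, k)
        gathered = torch.gather(
            local_latents, 1, idx.unsqueeze(-1).expand(-1, -1, local_latents.size(-1)))
        context = torch.cat([global_latent.unsqueeze(1), gathered], dim=1)  # (B, k+1, dim)
        q = self.embed(query_xyz).unsqueeze(1)                              # (B, 1, dim)
        attended, _ = self.attn(q, context, context)
        return self.out(attended.squeeze(1))                                # occupancy logit

dec = OccupancyDecoder()
occ = dec(torch.rand(2, 3), torch.rand(2, 100, 3), torch.rand(2, 100, 64), torch.rand(2, 64))
print(occ.shape)  # torch.Size([2, 1])
```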
V. Hosu, B. Goldlücke, and D. Saupe, “Effective Aesthetics Prediction with Multi-level Spatially Pooled Features,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 9367–9375. doi: 10.1109/CVPR.2019.00960.
Abstract
We propose an effective deep learning approach to aesthetics quality assessment that relies on a new type of pre-trained features, and apply it to the AVA data set, the currently largest aesthetics database. While previous approaches miss some of the information in the original images, due to taking small crops, down-scaling or warping the originals during training, we propose the first method that efficiently supports full resolution images as an input, and can be trained on variable input sizes. This allows us to significantly improve upon the state of the art, increasing the Spearman rank-order correlation coefficient (SRCC) of ground-truth mean opinion scores (MOS) from the existing best reported of 0.612 to 0.756. To achieve this performance, we extract multi-level spatially pooled (MLSP) features from all convolutional blocks of a pre-trained InceptionResNet-v2 network, and train a custom shallow Convolutional Neural Network (CNN) architecture on these new features.
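The feature-extraction idea can be sketched as follows; note that a torchvision ResNet-50 stands in here for the paper’s InceptionResNet-v2 (which torchvision does not ship), and the choice of hooked layers and head sizes are illustrative assumptions:

```python
# Illustrative sketch of multi-level spatially pooled (MLSP) features: each
# intermediate block's activation map is globally average pooled, so inputs of
# arbitrary resolution yield a fixed-length feature vector, and a small head
# regresses the aesthetic score from the concatenated features.
import torch
import torch.nn as nn
import torchvision

backbone = torchvision.models.resnet50(
    weights=torchvision.models.ResNet50_Weights.DEFAULT).eval()
pooled = []

def hook(module, inputs, output):
    # Global average pooling removes the spatial dimensions.
    pooled.append(output.mean(dim=(2, 3)))

for block in [backbone.layer1, backbone.layer2, backbone.layer3, backbone.layer4]:
    block.register_forward_hook(hook)

head = nn.Sequential(nn.Linear(256 + 512 + 1024 + 2048, 512), nn.ReLU(), nn.Linear(512, 1))

with torch.no_grad():
    pooled.clear()
    backbone(torch.rand(1, 3, 600, 800))       # full-resolution style input
    features = torch.cat(pooled, dim=1)         # (1, 3840)
score = head(features)                          # predicted mean opinion score
```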
D. Maurer, N. Marniok, B. Goldluecke, and A. Bruhn, “Structure-from-motion-aware PatchMatch for Adaptive Optical Flow Estimation,” in Computer Vision – ECCV 2018, Lecture Notes in Computer Science, vol. 11212, V. Ferrari, M. Hebert, C. Sminchisescu, and Y. Weiss, Eds., Springer International Publishing, 2018, pp. 575–592. doi: 10.1007/978-3-030-01237-3_35.
Abstract
Many recent energy-based methods for optical flow estimation rely on a good initialization that is typically provided by some kind of feature matching. So far, however, these initial matching approaches are rather general: They do not incorporate any additional information that could help to improve the accuracy or the robustness of the estimation. In particular, they do not exploit potential cues on the camera poses and the thereby induced rigid motion of the scene. In the present paper, we tackle this problem. To this end, we propose a novel structure-from-motion-aware PatchMatch approach that, in contrast to existing matching techniques, combines two hierarchical feature matching methods: a recent two-frame PatchMatch approach for optical flow estimation (general motion) and a specifically tailored three-frame PatchMatch approach for rigid scene reconstruction (SfM). While the motion PatchMatch serves as baseline with good accuracy, the SfM counterpart takes over at occlusions and other regions with insufficient information. Experiments with our novel SfM-aware PatchMatch approach demonstrate its usefulness. They not only show excellent results for all major benchmarks (KITTI 2012/2015, MPI Sintel), but also improvements up to 50% compared to a PatchMatch approach without structure information.
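Schematically, letting the SfM-based estimate take over where the general matcher is unreliable can be thought of as a per-pixel selection by matching cost; the sketch below is a strong simplification of the paper’s hierarchical PatchMatch formulation, and all names are hypothetical:

```python
# Schematic fusion of two candidate flow fields: a general two-frame PatchMatch
# flow and a rigid SfM-induced flow, keeping per pixel whichever candidate has
# the lower patch matching cost (e.g. at occlusions the rigid flow wins).
import numpy as np

def fuse_flows(flow_motion, cost_motion, flow_sfm, cost_sfm):
    """Flows are (H, W, 2) arrays, costs are (H, W) per-pixel matching costs."""
    use_sfm = cost_sfm < cost_motion                       # boolean selection mask
    fused = np.where(use_sfm[..., None], flow_sfm, flow_motion)
    return fused, use_sfm

H, W = 4, 5
fused, mask = fuse_flows(np.zeros((H, W, 2)), np.full((H, W), 0.8),
                         np.ones((H, W, 2)),  np.full((H, W), 0.3))
print(mask.all(), fused[0, 0])   # True [1. 1.]
```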
N. Marniok and B. Goldluecke, “Real-time Variational Range Image Fusion and Visualization for Large-scale Scenes using GPU Hash Tables,” in Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), 2018, pp. 912–920. doi: 10.1109/WACV.2018.00105.
Abstract
We present a real-time pipeline for large-scale 3D scene reconstruction from a single moving RGB-D camera together with interactive visualization. Our approach combines a time and space efficient data structure capable of representing large scenes, a local variational update algorithm and a visualization system. The environment’s structure is reconstructed by integrating the depth image of each camera view into a sparse volume representation using a truncated signed distance function, which is organized via a hash table. Noise from real-world data is efficiently eliminated by immediately performing local variational refinements on newly integrated data. The whole pipeline is able to perform in real-time on consumer-available hardware and allows for simultaneous inspection of the currently reconstructed scene.
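The integration step can be illustrated with a toy version of hash-based TSDF fusion, where a Python dictionary stands in for the GPU hash table and the per-sample update is the usual weighted running average; the constants and function names below are illustrative, not taken from the paper:

```python
# Conceptual sketch of hash-based TSDF fusion: each depth sample updates the
# truncated signed distance and weight of the voxel it falls into, allocating
# voxel storage lazily via a spatial hash (here: a plain dict).
import numpy as np

VOXEL_SIZE = 0.01      # metres
TRUNCATION = 0.05      # truncation band of the signed distance function
tsdf = {}              # (i, j, k) voxel index -> (signed distance, weight)

def integrate_point(point_cam, depth_along_ray):
    """point_cam: a 3D point sampled on the camera ray; depth_along_ray: the
    measured depth. The signed distance is approximated along the viewing ray."""
    sdf = depth_along_ray - np.linalg.norm(point_cam)
    if sdf < -TRUNCATION:
        return                                              # far behind the surface, skip
    sdf = min(sdf, TRUNCATION) / TRUNCATION                 # truncate and normalise
    key = tuple(np.floor(point_cam / VOXEL_SIZE).astype(int))
    d, w = tsdf.get(key, (0.0, 0.0))
    tsdf[key] = ((d * w + sdf) / (w + 1.0), w + 1.0)        # running weighted average

# Sample a few points along one ray of a depth measurement at 1.0 m.
ray_dir = np.array([0.0, 0.0, 1.0])
for t in np.arange(0.96, 1.04, VOXEL_SIZE):
    integrate_point(t * ray_dir, 1.0)
print(len(tsdf), "voxels allocated")
```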
N. Marniok, O. Johannsen, and B. Goldluecke, “An Efficient Octree Design for Local Variational Range Image Fusion,” in Pattern Recognition – GCPR 2017, Lecture Notes in Computer Science, vol. 10496, V. Roth and T. Vetter, Eds., Springer International Publishing, 2017, pp. 401–412. doi: 10.1007/978-3-319-66709-6_32.
Abstract
We present a reconstruction pipeline for a large-scale 3D environment viewed by a single moving RGB-D camera. Our approach combines advantages of fast and direct, regularization-free depth fusion and accurate, but costly variational schemes. The scene’s depth geometry is extracted from each camera view and efficiently integrated into a large, dense grid as a truncated signed distance function, which is organized in an octree. To account for noisy real-world input data, variational range image integration is performed in local regions of the volume directly on this octree structure. We focus on algorithms which are easily parallelizable on GPUs, allowing the pipeline to be used in real-time scenarios where the user can interactively view the reconstruction and adapt camera motion as required.
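The local variational refinement can be sketched, in simplified form, as a few gradient steps on a TV-L2 style energy over one block of the volume; the energy, step size, and 2D block handling below are assumptions for illustration, not the paper’s exact octree scheme:

```python
# Rough sketch of local variational refinement: noisy TSDF values u0 in a small
# block are smoothed by gradient descent on
#   E(u) = sum |grad u| + (lambda / 2) * sum (u - u0)^2,
# keeping the update local so it parallelises well on the GPU.
import numpy as np

def refine_block(u0, lam=10.0, step=0.01, iters=100, eps=1e-6):
    u = u0.copy()
    for _ in range(iters):
        gx = np.diff(u, axis=0, append=u[-1:, :])           # forward differences
        gy = np.diff(u, axis=1, append=u[:, -1:])
        norm = np.sqrt(gx**2 + gy**2 + eps)
        # Divergence of the normalised gradient approximates the TV subgradient.
        div = (np.diff(gx / norm, axis=0, prepend=np.zeros((1, u.shape[1]))) +
               np.diff(gy / norm, axis=1, prepend=np.zeros((u.shape[0], 1))))
        u -= step * (-div + lam * (u - u0))
    return u

noisy = np.clip(np.linspace(-1, 1, 16)[:, None] + 0.1 * np.random.randn(16, 16), -1, 1)
print(np.abs(np.diff(noisy, axis=0)).mean(),
      np.abs(np.diff(refine_block(noisy), axis=0)).mean())  # variation typically drops
```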
O. Johannsen et al., “A Taxonomy and Evaluation of Dense Light Field Depth Estimation Algorithms,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, IEEE, 2017, pp. 1795–1812. doi: 10.1109/CVPRW.2017.226.
Abstract
This paper presents the results of the depth estimation challenge for dense light fields, which took place at the second workshop on Light Fields for Computer Vision (LF4CV) in conjunction with CVPR 2017. The challenge consisted of submissions to a recent benchmark [7], which allows a thorough performance analysis. While individual results are readily available on the benchmark web page http://www.lightfield-analysis.net, we take this opportunity to give a detailed overview of the current participants. Based on the algorithms submitted to our challenge, we develop a taxonomy of light field disparity estimation algorithms and give a report on the current state-of-the-art. In addition, we include more comparative metrics, and discuss the relative strengths and weaknesses of the algorithms. Thus, we obtain a snapshot of where light field algorithm development stands at the moment and identify aspects with potential for further improvement.
M. Stein et al., “Bring it to the Pitch: Combining Video and Movement Data to Enhance Team Sport Analysis,” IEEE Transactions on Visualization and Computer Graphics, vol. 24, pp. 13–22, 2017. doi: 10.1109/TVCG.2017.2745181.
Abstract
Analysts in professional team sport regularly perform analysis to gain strategic and tactical insights into player and team behavior. Goals of team sport analysis regularly include identification of weaknesses of opposing teams, or assessing performance and improvement potential of a coached team. Current analysis workflows are typically based on the analysis of team videos. Also, analysts can rely on techniques from Information Visualization, to depict e.g., player or ball trajectories. However, video analysis is typically a time-consuming process, where the analyst needs to memorize and annotate scenes. In contrast, visualization typically relies on an abstract data model, often using abstract visual mappings, and is not directly linked to the observed movement context anymore. We propose a visual analytics system that tightly integrates team sport video recordings with abstract visualization of underlying trajectory data. We apply appropriate computer vision techniques to extract trajectory data from video input. Furthermore, we apply advanced trajectory and movement analysis techniques to derive relevant team sport analytic measures for region, event and player analysis in the case of soccer analysis. Our system seamlessly integrates video and visualization modalities, enabling analysts to draw on the advantages of both analysis forms. Several expert studies conducted with team sport analysts indicate the effectiveness of our integrated approach.
J. Iseringhausen et al., “4D Imaging through Spray-on Optics,” ACM Transactions on Graphics, vol. 36, no. 4, Art. no. 4, 2017, doi: 10.1145/3072959.3073589.
Abstract
Light fields are a powerful concept in computational imaging and a mainstay in image-based rendering; however, so far their acquisition required either carefully designed and calibrated optical systems (micro-lens arrays), or multi-camera/multi-shot settings. Here, we show that fully calibrated light field data can be obtained from a single ordinary photograph taken through a partially wetted window. Each drop of water produces a distorted view on the scene, and the challenge of recovering the unknown mapping from pixel coordinates to refracted rays in space is a severely underconstrained problem. The key idea behind our solution is to combine ray tracing and low-level image analysis techniques (extraction of 2D drop contours and locations of scene features seen through drops) with state-of-the-art drop shape simulation and an iterative refinement scheme to enforce photo-consistency across features that are seen in multiple views. This novel approach not only recovers a dense pixel-to-ray mapping, but also the refractive geometry through which the scene is observed, to high accuracy. We therefore anticipate that our inherently self-calibrating scheme might also find applications in other fields, for instance in materials science where the wetting properties of liquids on surfaces are investigated.
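The basic ray-tracing building block, refraction at the air-water interface of a drop, can be written down directly from Snell’s law in vector form; the function below is a generic textbook formulation for illustration, not code from the paper:

```python
# Vector form of Snell's law: bend a viewing ray at a refractive interface,
# e.g. the surface of a water drop on a window.
import numpy as np

def refract(direction, normal, n1=1.0, n2=1.33):
    """Refract a ray `direction` at a surface with unit `normal` pointing
    towards the incoming ray; n1, n2 are the refractive indices (air, water).
    Returns None on total internal reflection."""
    d = direction / np.linalg.norm(direction)
    n = normal / np.linalg.norm(normal)
    eta = n1 / n2
    cos_i = -np.dot(d, n)
    sin2_t = eta**2 * (1.0 - cos_i**2)
    if sin2_t > 1.0:
        return None                                         # total internal reflection
    cos_t = np.sqrt(1.0 - sin2_t)
    return eta * d + (eta * cos_i - cos_t) * n

# A ray hitting a horizontal water surface at 45 degrees is bent towards the normal.
bent = refract(np.array([1.0, 0.0, -1.0]), np.array([0.0, 0.0, 1.0]))
print(bent, np.degrees(np.arcsin(np.linalg.norm(bent[:2]))))   # about 32 degrees
```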
O. Johannsen, A. Sulc, N. Marniok, and B. Goldluecke, “Layered Scene Reconstruction from Multiple Light Field Camera Views,” in Computer Vision – ACCV 2016, Lecture Notes in Computer Science, vol. 10113, S.-H. Lai, V. Lepetit, K. Nishino, and Y. Sato, Eds., Springer International Publishing, 2016, pp. 3–18. doi: 10.1007/978-3-319-54187-7_1.