by Simon Hadfield, Karel Lebeda and Richard Bowden
Abstract:
We present a framework which allows standard stereo reconstruction to be unified with a wide range of classic top-down cues from urban scene understanding. The resulting algorithm is analogous to the human visual system, where conflicting interpretations of the scene due to ambiguous data can be resolved based on a higher-level understanding of urban environments. The cues which are reformulated within the framework include: recognising common arrangements of surface normals and semantic edges (e.g. concave, convex and occlusion boundaries), recognising connected or coplanar structures such as walls, and recognising collinear edges (which are common on repetitive structures such as windows). Recognition of these common configurations has only recently become feasible, thanks to the emergence of large-scale reconstruction datasets. To demonstrate the importance and generality of scene understanding during stereo reconstruction, the proposed approach is integrated with three different state-of-the-art techniques for bottom-up stereo reconstruction. The use of high-level cues is shown to improve performance by up to 15% on the Middlebury 2014 and KITTI datasets. We further evaluate the technique using the recently proposed HCI stereo metrics, finding significant improvements in the quality of depth discontinuities, planar surfaces and thin structures.
Reference:
Stereo reconstruction using top-down cues (Simon Hadfield, Karel Lebeda and Richard Bowden), in Computer Vision and Image Understanding (CVIU), 2016. (Spotlight video)
Bibtex Entry:
@Article{Hadfield16b,
Title = {Stereo reconstruction using top-down cues},
Author = {Simon Hadfield and Karel Lebeda and Richard Bowden},
Journal = {Computer Vision and Image Understanding (CVIU)},
Year = {2016},
Abstract = {We present a framework which allows standard stereo reconstruction to be unified with a wide range of classic top-down cues from urban scene understanding. The resulting algorithm is analogous to the human visual system, where conflicting interpretations of the scene due to ambiguous data can be resolved based on a higher-level understanding of urban environments. The cues which are reformulated within the framework include: recognising common arrangements of surface normals and semantic edges (e.g. concave, convex and occlusion boundaries), recognising connected or coplanar structures such as walls, and recognising collinear edges (which are common on repetitive structures such as windows). Recognition of these common configurations has only recently become feasible, thanks to the emergence of large-scale reconstruction datasets. To demonstrate the importance and generality of scene understanding during stereo reconstruction, the proposed approach is integrated with three different state-of-the-art techniques for bottom-up stereo reconstruction. The use of high-level cues is shown to improve performance by up to 15\% on the Middlebury 2014 and KITTI datasets. We further evaluate the technique using the recently proposed HCI stereo metrics, finding significant improvements in the quality of depth discontinuities, planar surfaces and thin structures.},
Doi = {10.1016/j.cviu.2016.08.001},
Keywords = {Stereo reconstruction, Scene understanding, biologically inspired, high level cues, bottom up, top down},
Timestamp = {2016.08.09},
gsid = {14439992960842131598},
Comment = {<a href="https://youtu.be/tgvHHkxis4A">Spotlight video</a>},
Url = {http://personalpages.surrey.ac.uk/s.hadfield/papers/stereo_CVIU16.pdf}
}