NO.046 Computer Visualization – Concepts and Challenges

Shonan Village Center

March 10 - 13, 2014 (Check-in: March 9, 2014)


  • Arie E. Kaufman
    • Stony Brook University, USA
  • Issei Fujishiro
    • Keio University, Japan


Computer vision is concerned with inferring properties of the world from observations in the form of visual images. Such inverse problems typically take shape as optimization problems that aim to find the best explanation for the complex visual phenomenon that gave rise to a set of noisy and incomplete visual measurements. For computer vision applications to be successful, the underlying optimization problems must be supported by efficient and dependable solution methods.

The proposed meeting focuses on a broad subclass of computer vision problems called geometric vision problems. Roughly, these are problems that exploit fundamental geometrical constraints arising from the image formation process or from physical properties of the scene (e.g., lighting conditions, characteristics of motions) to extract information about the scene (e.g., depth, 3D shape, camera trajectory, object identities) from the given visual data. Example geometric vision problems include structure-from-motion (SfM), simultaneous localization and mapping (SLAM), pose averaging [1], photometric stereo [2], and motion segmentation [3]. Methods for solving geometric vision problems underpin many useful applications, such as 3D reconstruction, robot navigation, object recognition/tracking, and computational photography [4].

Geometric vision is replete with hard optimization problems. By “hard”, we mean that the time needed to solve the optimization problems grows quickly with the size of the input data. Take, for example, the task of robustly estimating the planar perspective transformation (a.k.a. homography) from outlier-contaminated point correspondences between two images. Due to the inherent intractability of robust homography estimation [5], practitioners often rely on simple randomized heuristics to find rough approximate solutions, which neither guarantee optimality nor provide bounds on the approximation error.
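
To make the kind of randomized heuristic mentioned above concrete, here is a minimal sketch of a RANSAC-style loop for homography estimation. The text does not name a specific method, so RANSAC is an assumption chosen for illustration. The function names, the inlier threshold, and the iteration count are likewise hypothetical. The sketch repeatedly fits a homography to four random correspondences and keeps the candidate that agrees with the most points. As the text notes, nothing in it certifies optimality or bounds the error.

```python
import numpy as np


def fit_homography(src, dst):
    """Direct linear transform (DLT) fit from >= 4 point pairs; src, dst are (n, 2)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)


def transfer_error(H, src, dst):
    """Distance between H applied to src and the observed dst, per correspondence."""
    pts = np.hstack([src, np.ones((len(src), 1))]) @ H.T
    with np.errstate(divide="ignore", invalid="ignore"):
        proj = pts[:, :2] / pts[:, 2:3]
    err = np.linalg.norm(proj - dst, axis=1)
    return np.where(np.isfinite(err), err, np.inf)


def ransac_homography(src, dst, threshold=2.0, iterations=500, seed=0):
    """Return (H, inlier_mask); threshold and iterations are illustrative defaults."""
    rng = np.random.default_rng(seed)
    best_H, best_inliers = None, np.zeros(len(src), dtype=bool)
    for _ in range(iterations):
        sample = rng.choice(len(src), size=4, replace=False)
        H = fit_homography(src[sample], dst[sample])
        inliers = transfer_error(H, src, dst) < threshold
        if inliers.sum() > best_inliers.sum():
            best_H, best_inliers = H, inliers
    if best_H is not None and best_inliers.sum() >= 4:
        # Refit on the whole consensus set for a less sample-dependent estimate.
        best_H = fit_homography(src[best_inliers], dst[best_inliers])
    return best_H, best_inliers
```

The result depends on the random seed and on the chosen threshold, and a run that happens never to draw an all-inlier sample returns a poor estimate without any warning. That is the lack of guarantees described above.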

The computational difficulty of geometric vision problems is also often compounded by the extremely large size of the input. Take, for example, the task of bundle adjustment, i.e., calculating 3D points and camera poses that are consistent with a set of images of a scene. In the age of big data, the input image set is often obtained by “scraping” Internet photo collections, or by conducting long-term surveillance of a scene using a robot. Such input sizes easily overwhelm traditional computing architectures, and distributed or parallel versions of bundle adjustment must be used [6].
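
As a rough illustration of what "consistent with a set of images" means, bundle adjustment is commonly posed as minimizing the reprojection error over all camera poses and 3D points. The sketch below only evaluates that cost. It assumes a simple pinhole camera with a single shared focal length and no lens distortion, and all names and the data layout are hypothetical rather than taken from [6].

```python
import numpy as np


def reprojection_residuals(rotations, translations, points, observations, focal=1.0):
    """Stack the 2D residuals for observations given as (camera, point, u, v) tuples."""
    residuals = []
    for cam, pt, u, v in observations:
        p = rotations[cam] @ points[pt] + translations[cam]  # world -> camera frame
        projected = focal * p[:2] / p[2]                      # pinhole projection
        residuals.append(projected - np.array([u, v]))
    return np.concatenate(residuals)


def bundle_adjustment_cost(rotations, translations, points, observations, focal=1.0):
    """Half the sum of squared reprojection errors over all observations."""
    r = reprojection_residuals(rotations, translations, points, observations, focal)
    return 0.5 * float(r @ r)
```

Each residual involves exactly one camera and one 3D point, so the number of terms grows with the number of observations. This sparse structure is what makes it possible to split the problem across machines or processors in the first place.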