Abstract:
In this thesis, we focus on methods for the automatic reconstruction of large 3D scenes
directly from images. In the literature, methods solving this problem are referred to as
multi-view stereo (MVS) algorithms, and they are an attractive alternative to the
acquisition of geometry with laser scanners, as the required equipment - digital cameras - is
inexpensive. As MVS reconstruction is a well-researched topic, current efforts have
shifted towards large-scale reconstruction. City models require millions of images
to capture their geometry. Processing such amounts of data requires a lot of computational
effort, even for current supercomputers. Exploiting parallelization alone is often not
sufficient, as it yields at most a linear speed-up. As described in this thesis, this
effort can be reduced not only by parallelization, but also by a smarter algorithmic
approach.
The need for quality evaluation of MVS algorithms and the large number of different
approaches have led researchers to establish a ranking [SCD+06]. The most promising
approaches date from 2009, and only two new publications were released in 2011,
which indicates a declining interest in improving reconstruction quality, as there is
little improvement left to achieve. It can be clearly seen that the focus of research
in this area has shifted to applying current methods to large data sets.
In this thesis, we present a new approach to the large-scale reconstruction problem.
The general outline of this approach is as follows: First, we gather data as video or
image sequences. We extract image features and build compact descriptors for each
sequence. We calibrate the cameras of each sequence to obtain camera parameters and
sparse 3D point clouds. With our compact descriptors, we compute a similarity graph,
where each node is a sequence and edges join sequences whose scenes have overlapping
geometry. The next step is to compute transformation matrices between the sparse 3D
point clouds obtained during the camera calibration process. We compute transformations
of the sub-models to a global coordinate system. We perform a large-scale bundle
adjustment to refine camera matrices, 3D points, and transformation matrices. For each
image sequence, we compute a dense point cloud with traditional MVS methods. Using
these matrices, we bring the dense sub-models into the global coordinate system to
obtain the final large model.
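To illustrate the final merging step, the following is a minimal sketch (in Python with NumPy, on synthetic data) of how per-sequence dense point clouds could be mapped into the global coordinate system, assuming each transformation is stored as a 4x4 homogeneous matrix; the function names, array shapes, and example data are illustrative assumptions, not the implementation described in the thesis.

    import numpy as np

    def to_global(points_local, T):
        """Map an N x 3 point cloud into the global frame using a 4 x 4
        homogeneous transformation (rotation, translation and, if present,
        a uniform scale)."""
        homogeneous = np.hstack([points_local, np.ones((len(points_local), 1))])
        mapped = homogeneous @ T.T
        return mapped[:, :3] / mapped[:, 3:4]

    # Hypothetical example: two dense sub-models and their (here trivial) transforms.
    sub_models = [np.random.rand(1000, 3), np.random.rand(800, 3)]
    transforms = [np.eye(4), np.eye(4)]
    global_model = np.vstack([to_global(P, T) for P, T in zip(sub_models, transforms)])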
As can be seen, the most time-consuming steps of the algorithm can be performed
in parallel. However, certain steps of our approach do not parallelize in an easy,
natural way: the similarity graph construction and the large-scale bundle adjustment.
Thanks to our compact descriptors and our large-scale bundle adjustment algorithm,
these steps can be performed on a single PC. One of the big advantages of our approach
is the possibility of incremental model construction. The data does not need to be
available at the beginning of the process, and the global model is refined as more
data becomes available.
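To illustrate the incremental aspect, below is a small sketch of how the similarity graph could be extended when a new sequence arrives, assuming the compact descriptors are fixed-length vectors compared with cosine similarity; the descriptor length, threshold value, and function names are hypothetical and only meant to convey the idea that adding a sequence touches just the nodes it overlaps with.

    import numpy as np

    def cosine_similarity(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def add_sequence(graph, descriptors, seq_id, descriptor, threshold=0.8):
        """Insert a new sequence into the similarity graph: connect it to every
        existing sequence whose compact descriptor is similar enough, so only
        the affected sub-models need to be re-aligned."""
        graph[seq_id] = set()
        for other_id, other_desc in descriptors.items():
            if cosine_similarity(descriptor, other_desc) >= threshold:
                graph[seq_id].add(other_id)
                graph[other_id].add(seq_id)
        descriptors[seq_id] = descriptor

    # Hypothetical usage: sequences arrive one by one with 128-dimensional descriptors.
    graph, descriptors = {}, {}
    add_sequence(graph, descriptors, "seq_0", np.random.rand(128))
    add_sequence(graph, descriptors, "seq_1", np.random.rand(128))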