This paper studies a photogrammetric procedure performed by a miniature unmanned rotorcraft, a quadrocopter. As a solution for building up 3D models of building façades, a UAV can be a real asset, as it can reach difficult corners of the building and take repeated captures. The high maneuverability of the quadrocopter permits us to take diagonal pictures containing information about the depth of the building, which is a very critical factor when generating the 3D model. The aim of the paper is to obtain the best quality of captured 2D pictures in order to superpose them to generate the 3D model of the building. Therefore, an essential part of the paper concentrates on capturing-angle techniques. The generated trajectory takes into consideration the deviation factor of the ultrasonic altitude sensor and the GPS module. Tracking the desired coordinates allows us to automate the photogrammetric procedure and use manpower wisely while generating the 3D model.
Therefore, the first and most important part of generating 3D models of building parts containing tilted roof structures is extracting ridge lines. Arefi (2009) proposed an algorithm to extract the ridge location from high-resolution airborne LIDAR data using morphological geodesic reconstruction (Gonzalez and Woods, 2008). Due to the lower quality of a DEM created from Worldview stereo images compared to LIDAR data, a method relying on height data alone does not produce appropriate ridge pixels. In this paper, a method integrating orthorectified image and DEM information is applied for high-quality ridge line extraction (cf. Figure 1). The procedure to extract all the ridge lines corresponding to a building with tilted roofs begins with feature extraction. For this purpose, three feature descriptors are extracted from the DEM and ortho image as follows (cf. Figure 2):
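As an aside on the cited technique, morphological geodesic reconstruction by dilation can be sketched in a few lines of pure NumPy: the seed image is repeatedly dilated (4-connected here) and clipped by the mask until stable, and subtracting the reconstruction of a lowered DEM under the DEM itself exposes regional maxima such as ridge pixels. The function name and the toy DEM are illustrative, not taken from the cited works:

```python
import numpy as np

def geodesic_reconstruction(seed, mask):
    """Morphological reconstruction by dilation (cf. Gonzalez & Woods):
    the seed is repeatedly dilated (4-connected) and clipped by the mask
    until a fixed point is reached."""
    rec = np.minimum(seed, mask)
    while True:
        d = rec.copy()
        # propagate values from the 4-neighbourhood (one dilation step)
        d[1:, :] = np.maximum(d[1:, :], rec[:-1, :])
        d[:-1, :] = np.maximum(d[:-1, :], rec[1:, :])
        d[:, 1:] = np.maximum(d[:, 1:], rec[:, :-1])
        d[:, :-1] = np.maximum(d[:, :-1], rec[:, 1:])
        d = np.minimum(d, mask)          # clip by the mask (geodesic step)
        if np.array_equal(d, rec):
            return rec
        rec = d

# toy DEM with one ridge-like peak; reconstructing (DEM - h) under the DEM
# and subtracting leaves the regional maxima (h-maxima transform, h = 1)
dem = np.array([[1.0, 1, 1], [1, 3, 1], [1, 1, 1]])
peaks = dem - geodesic_reconstruction(dem - 1.0, dem)
```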
In order to reveal the differences in user experience of test participants while watching 3D video sequences compared to 2D video sequences, alternative evaluation schemes have been considered by Seuntiëns. They address other dimensions like naturalness or immersion, or investigate specific factors like depth perception or eye-strain, which provides some insight into isolated factors of the general QoE, but not the overall QoE. In the present study, the subjective preference has been considered as a global measure of QoE. It is believed that when subjects are asked for a preference between two videos, they may consider all factors (picture quality, depth quantity and depth quality, visual discomfort, and probably other factors) in deciding which of the two versions of a sequence they prefer. This way, the entire multidimensionality of 3D QoE is considered. Missing factors of 2D video QoE when evaluating it using ACR were shown by Belmudez, where another multidimensional question was studied. Here, image size and image resolution were compared in terms of quality ratings, one using ACR, one using paired comparison (PC). Results showed that the two test methods do not provide the same results: using ACR, observers give higher QoE ratings for images at their native resolution; using PC, observers prefer larger images obtained after upscaling. The results differ, and show that using the ACR methodology observers judge only image quality, but with paired comparison they extend their rating to other dimensions, including image size. PC, however, has an important drawback: its cost and time consumption. To obtain scale-value quality scores from PC data, two models exist: the Bradley-Terry model and the Thurstone-Mosteller model. Both need a full PC matrix: each condition has to be compared to every other. However, several efficient approaches have been developed in the literature to reduce the number of required comparisons.
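For illustration, Bradley-Terry strengths can be estimated from a full PC win-count matrix with the minorization-maximization iteration of Hunter (2004). The function name and the toy matrix below are our own, a minimal sketch rather than the procedure of any cited study:

```python
import numpy as np

def bradley_terry(wins, iters=200):
    """Estimate Bradley-Terry strengths from a pairwise win-count matrix.

    wins[i, j] = number of times condition i was preferred over j.
    Returns strengths normalized to sum to 1 (MM algorithm, Hunter 2004).
    """
    n = wins.shape[0]
    p = np.full(n, 1.0 / n)
    total = wins + wins.T              # n_ij: comparisons between i and j
    w = wins.sum(axis=1)               # total wins of each condition
    for _ in range(iters):
        # p_i <- W_i / sum_j n_ij / (p_i + p_j); diagonal of `total` is 0
        denom = (total / (p[:, None] + p[None, :])).sum(axis=1)
        p = w / denom
        p /= p.sum()
    return p

# toy example: condition 0 beats both others, condition 1 beats condition 2
wins = np.array([[0, 8, 9],
                 [2, 0, 7],
                 [1, 3, 0]], float)
scores = bradley_terry(wins)
```

The iteration converges for any connected comparison graph; for sparse designs (the "reduced comparisons" approaches mentioned above), the same update applies as long as every condition is linked to every other through some chain of comparisons.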
In , six video sequences were recorded. Each of these videos was captured at six inter-camera distances (10 cm to 60 cm). The 36 video sequences were then compared through paired comparison, and the Bradley-Terry scores of each condition were determined. Results show that the Bradley-Terry scores reveal quality fluctuations due to the different depth and comfort. The relation between inter-camera distance and QoE was found to be highly content dependent. In , 3D was compared to 2D using a PC approach on an auto-stereoscopic display. 3D was produced internally by the display based on a texture and a depth map. The texture was used at four different quality levels (three encodings and a reference). Results show that 3D was rejected in 70% of the cases and, for the lowest quality, rejected at 56%. However, the results may be influenced by the technology used at the time of the experiment and the quality of 3D rendering of the 3D display, as mentioned by the authors.
During the last decades, several approaches for the reconstruction of 3D building models have been developed. Starting in the 1980s with manual and semi-automatic reconstruction methods of 3D building models from aerial images, the degree of automation has increased in recent years so that they became applicable to various areas. Some typical applications and examples are shown in section 1.1. Especially since the 1990s, when airborne light detection and ranging (LiDAR) technology became widely available, approaches for (semi-)automatic building reconstruction of large urban areas turned out to be of particular interest. Only in recent years have some large cities built detailed 3D city models. Although much effort has been put into the development of a fully automatic reconstruction strategy in order to overcome the high costs of semi-automatic reconstructions, no solution proposed so far meets all requirements (e.g., in terms of completeness, correctness, and accuracy). The reasons for this are manifold, as discussed in section 1.2. Some of them are manageable, for example, either by using modern sensors which provide denser and more accurate point clouds than before or by incorporating additional data sources such as high-resolution images. However, there is quite a big demand for 3D building models in areas where such modern sensors or additional data sources are not available. Therefore, in this thesis a new fully automatic reconstruction approach of semantic 3D building models for low- and high-density airborne laser scanning (ALS) data of large urban areas is presented and discussed. Additionally, it is shown how automatically derived building knowledge can be used to enhance existing building reconstruction approaches. The specific research objectives are outlined in section 1.3, which includes an overview of the proposed reconstruction workflows and the contributions of this thesis.
In order to have lean workflows with good performance, some general assumptions on the buildings to be reconstructed are imposed and explained in section 1.4. The introduction ends with an outline of this thesis in section 1.5.
Different from global schemes, the use of local constraints leads to an independent linear sub-problem in every iteration. For each of them an appropriate λ can be determined, e.g., by the L-curve criterion. Many authors using local regularization schemes, such as Marquardt (1963), Loke and Barker (1996b), and Kemna (2000), discuss the use of a decreasing λ, beginning from a large starting value down to a minimum value. Farquharson and Oldenburg (2004), applying a global smoothness-constrained inversion, also use a cooling-type schedule of decreasing λ. However, there are practical reasons, such as the limited accuracy of the forward routines, which can entail interpretation failures. To prevent overshooting in the early iterations, a line search parameter can be applied to ensure convergence. Generally, the use of a larger λ yields similar, but smoothed, structures with less magnitude, which represents an easy-to-control alternative to the line search procedure. However, the resolution analysis shows that the model is predicted by the final λ, which has to be chosen appropriately. The ultimate criterion is whether the target value Φ =
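The cooling-type schedule can be illustrated on a deliberately simple linear problem: the damped normal equations are re-solved while λ shrinks by a constant factor until the data misfit Φ = ||Gm − d||² drops below a target. All names and constants below are illustrative choices, not taken from the cited works:

```python
import numpy as np

def cooled_tikhonov(G, d, lam0=100.0, cool=0.5, phi_target=1e-3, max_iter=30):
    """Cooling-schedule sketch: start with a large regularization strength
    lam0 and multiply by `cool` each iteration until the data misfit
    Phi = ||G m - d||^2 reaches phi_target (or max_iter is exhausted)."""
    lam = lam0
    m = np.zeros(G.shape[1])
    for _ in range(max_iter):
        # damped least squares solution for the current lambda
        m = np.linalg.solve(G.T @ G + lam * np.eye(G.shape[1]), G.T @ d)
        phi = float(((G @ m - d) ** 2).sum())
        if phi < phi_target:
            break
        lam *= cool        # decrease regularization strength ("cooling")
    return m, lam, phi

# toy identity problem: as lambda cools, m approaches d and Phi shrinks
G = np.eye(3)
d = np.array([1.0, 2.0, 3.0])
m, lam, phi = cooled_tikhonov(G, d)
```

In a nonlinear inversion the same schedule would wrap a Gauss-Newton update rather than a one-shot solve, but the control logic is identical.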
Presegmentation typically classifies the LiDAR point cloud into buildings, terrain, and vegetation (including other non-terrain objects and clutter). Presegmentation can be performed in one step or sequentially, by first separating elevated points from those on the ground and then removing vegetation from the remaining data, or vice versa. A popular ground filtering method is to set a height threshold on a Digital Terrain Model (DTM), which can be produced, e.g., by morphological filter operations [Morgan and Tempfli, 2000, Zhang et al., 2006, Ameri and Fritsch, 2000]. Other approaches are to identify planar LiDAR points and to create connected components of the latter, assuming the largest connected component to be ground [Verma et al., 2006]. Connected components can also be used for vegetation filtering, assuming connected components of small size [Verma et al., 2006] or of low planarity [Sampath and Shan, 2010] to be vegetation. The sequential process can be inverted; e.g., Sun and Salvaggio first classify vegetation with a graph-cuts method, and then use Euclidean clustering to identify buildings. A one-step scene classification can be achieved, e.g., by graph-cut optimization. Lafarge and Mallet define expectation values for buildings, vegetation, ground, and clutter by combining different covariance-based measures and height information in an energy optimization term. Dorninger and Pfeifer extract all planar regions of the scene using a region growing segmentation in feature space and group the extracted points into buildings with a mean-shift algorithm. Alternatively, building point clouds can be directly extracted from 2D building footprints, which are available beforehand [Rau and Lin, 2011], or which are provided interactively by user inputs [You et al., 2003].
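The DTM-plus-threshold idea can be sketched as follows: a grayscale morphological opening (erosion then dilation) of a gridded DSM approximates the bare terrain, and cells rising more than a threshold above it are flagged as elevated (candidate buildings/vegetation). The function name, window size, and threshold are our own illustrative choices, not from the cited papers:

```python
import numpy as np

def ground_filter(dsm, win=5, h_thresh=2.0):
    """Morphological ground filter sketch on a gridded DSM.

    A grayscale opening with a (2*win+1)^2 window removes objects smaller
    than the window, yielding an approximate DTM; the height threshold
    h_thresh then separates elevated cells from ground cells.
    """
    rows, cols = dsm.shape
    pad = np.pad(dsm, win, mode='edge')
    eroded = np.empty_like(dsm)
    for i in range(rows):                 # grayscale erosion: local minimum
        for j in range(cols):
            eroded[i, j] = pad[i:i + 2 * win + 1, j:j + 2 * win + 1].min()
    pad = np.pad(eroded, win, mode='edge')
    dtm = np.empty_like(dsm)
    for i in range(rows):                 # dilation completes the opening
        for j in range(cols):
            dtm[i, j] = pad[i:i + 2 * win + 1, j:j + 2 * win + 1].max()
    elevated = (dsm - dtm) > h_thresh
    return dtm, elevated

# flat terrain with one 3x3 "building" of height 5 m
dsm = np.zeros((20, 20))
dsm[8:11, 8:11] = 5.0
dtm, elevated = ground_filter(dsm)
```

Production filters (e.g., progressive morphological filters) vary the window size with slope; the fixed window here is the simplest possible variant.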
Presegmentation typically classifies the point cloud into building points and other points, mainly terrain and vegetation in LiDAR point clouds. If 2D building footprints are available beforehand, building point clouds can be directly extracted [Rau and Lin, 2011]. A popular approach is ground filtering, in which a Digital Terrain Model (DTM) is produced by morphological filter operations [Morgan and Tempfli, 2000] [Zhang et al., 2003] [Pingel et al., 2013], and a height threshold is then set on the DTM. Another approach is to fit planes to point clouds and cluster the points; the largest cluster is assumed to be ground [Verma et al., 2006]. [Lafarge and Mallet, 2012] defined expectation values for buildings, vegetation, ground, and clutter by combining different covariance-based measures and height information via energy optimization. [Dorninger and Pfeifer, 2008] extracted all planar regions of the scene using a region growing method in feature space and grouped the extracted points into buildings with a mean-shift algorithm.
Edge-Based Technique. The edge-detection technique detects edge pixels using Gradient, Laplacian, or Canny filtering and then links those pixels to form contours. Linking of edges, in a predefined neighborhood, depends on two criteria: the first is the magnitude of the gradient and the second is the direction of the gradient vector. Since edges are important features for separating regions in an image, a large variety of edge detection algorithms have been developed for image segmentation in computer vision (Shapiro and Stockman). (Heath et al.) demonstrate an experimental strategy comparing four well-known edge detectors: Canny, Nalwa–Binford, Sarkar–Boyer, and Sobel. (Jiang and Bunke) presented a novel edge detection algorithm for range images based on a scan line approximation technique. LiDAR data are converted into a range image, e.g., a DSM (Digital Surface Model), to make them suitable for image edge-detection methods. The performance of segmentation is largely dependent on the edge detector. However, converting 3D point clouds to 2.5D range images inevitably causes information loss. For airborne LiDAR data, overlapping surfaces such as multi-layer building roofs, bridges, and tree branches on top of roofs cause buildings and bridges to be either under-segmented or wrongly classified. The point clouds obtained by terrestrial LiDAR are usually combined from scans at several different positions; converting such true 3D data into 2.5D would cause a great loss of information (Wang and Shan).
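The two linking criteria named above (gradient magnitude and gradient direction) both fall out of a simple gradient operator applied to the range image. A minimal Sobel-based sketch (function name and threshold are our own; a full Canny pipeline would add smoothing, non-maximum suppression, and hysteresis):

```python
import numpy as np

def sobel_edges(range_img, thresh=1.0):
    """Gradient-based edge detection on a 2.5D range image (e.g. a DSM).

    Returns gradient magnitude, gradient direction (radians), and a
    thresholded edge mask -- the two quantities used for edge linking.
    """
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    pad = np.pad(range_img, 1, mode='edge')
    gx = np.zeros_like(range_img)
    gy = np.zeros_like(range_img)
    for i in range(range_img.shape[0]):
        for j in range(range_img.shape[1]):
            win = pad[i:i + 3, j:j + 3]
            gx[i, j] = (win * kx).sum()    # horizontal height gradient
            gy[i, j] = (win * ky).sum()    # vertical height gradient
    mag = np.hypot(gx, gy)
    direction = np.arctan2(gy, gx)
    return mag, direction, mag > thresh

# vertical height step (e.g. a building wall in a DSM)
img = np.zeros((10, 10))
img[:, 5:] = 10.0
mag, direction, mask = sobel_edges(img)
```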
This paper demonstrates for the first time the potential of explicitly modelling the individual roof surfaces to reconstruct 3-D prismatic building models using spaceborne tomographic synthetic aperture radar (TomoSAR) point clouds. The proposed approach is modular and works as follows: it first extracts the buildings via DSM generation and cutting off the ground terrain. The DSM is smoothed using the BM3D denoising method proposed in (Dabov et al., 2007), and a gradient map of the smoothed DSM is generated based on height jumps. Watershed segmentation is then adopted to oversegment the DSM into different regions. Subsequently, height- and polygon-complexity-constrained merging is employed to refine (i.e., to reduce) the retrieved number of roof segments. A coarse outline of each roof segment is then reconstructed and later refined using a quadtree-based regularization plus zig-zag line simplification scheme. Finally, a height is associated with each refined roof segment to obtain the 3-D prismatic model of the building. The proposed approach is illustrated and validated over a large building (convention center) in the city of Las Vegas using TomoSAR point clouds generated from a stack of 25 images using the Tomo-GENESIS software developed at DLR.
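The height-constrained part of the merging step can be sketched independently of the watershed itself: adjacent segments whose mean heights differ by less than a tolerance are fused, reducing the oversegmentation. The function name and tolerance are our own, and the sketch decides merges on the original (not progressively updated) segment means; the paper's full criterion additionally constrains polygon complexity:

```python
import numpy as np

def merge_by_height(labels, dsm, tol=0.5):
    """Fuse 4-adjacent segments of a label image whose mean DSM heights
    differ by less than tol, using a small union-find structure."""
    ids = np.unique(labels)
    mean_h = {i: dsm[labels == i].mean() for i in ids}
    parent = {i: i for i in ids}

    def find(i):                       # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # scan horizontal and vertical neighbour pairs for merge candidates
    for a, b in [(labels[:, :-1], labels[:, 1:]),
                 (labels[:-1, :], labels[1:, :])]:
        for i, j in zip(a.ravel(), b.ravel()):
            if i != j and abs(mean_h[i] - mean_h[j]) < tol:
                parent[find(i)] = find(j)
    return np.vectorize(lambda i: find(i))(labels)

# three segments: 0 and 1 at similar height merge, 2 (ground) stays apart
labels = np.array([[0, 0, 1, 1], [0, 0, 1, 1], [2, 2, 2, 2]])
dsm = np.array([[5.0, 5.0, 5.2, 5.2], [5.0, 5.0, 5.2, 5.2], [0.0] * 4])
merged = merge_by_height(labels, dsm)
```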
Unfortunately, building models derived from ALS data are restricted by the point spacing of the datasets. Hence, it is difficult to achieve high planimetric accuracy of a reconstructed scene. While ridge lines inherit a good precision from the redundancy in the extracted plane intersection processes, the precision of the outer perimeter of reconstructed roofs suffers from the limited point spacing. An alternative 3D modeling approach is to make use of aerial imagery [10–14]. Compared to laser scanning, optical imagery, with its much higher spatial resolution, allows for a more accurate extraction of building edges. On the other hand, a major shortcoming of image-based modeling relates to its 2D nature, requiring reliable image matching techniques to achieve automation. Moreover, common problems encountered in image processing (such as shadows, occlusions, and poor contrast) can hinder effective reconstruction. For the two aforementioned groups of methods that are based on one type of data, it is hard to obtain both planimetric and height accuracy at the same level. All of these facts are the motivation for the third type of building reconstruction approach, which benefits from the synergetic properties of LiDAR and image data and uses both of these sources. Employing multiple data types enables the combination of modeling cues and covers shortcomings inherited from the acquisition technique. It can be expected that the limitations coming from one sensor (such as data gaps, occlusions, shadows, and resolution issues) will be compensated for by the information provided by the second sensor.
6.3 Model Reconstruction for Augmented Reality
Data from arbitrary sensors can be presented visually in a meaningful way; for example, in an industrial plant, the sensed temperature or flow rate in coolant pipes could be visually represented by color or motion, directly superimposed on a user's view of the plant. Besides visualizing real data, which is otherwise invisible, AR can be used to preview objects which do not exist, for example in architecture or design: virtual furniture or fittings could be rearranged in a walk-through of a real building. The primary technical hurdle for AR is a robust and accurate registration between the real images and the virtual objects. Without accurate alignment, a convincing AR is impossible. The most promising approaches for accurate and robust registration use visual tracking [Klein and Murray, 2007, Davison et al., 2007]. Visual tracking attempts to track the head pose by analyzing features detected in a video stream. Typically, a camera is mounted to the head-mounted display, and a computer calculates this camera's pose in relation to known features seen in the world. Unfortunately, real-time visual tracking is a very difficult problem. Extracting a pose from a video frame requires software to make correspondences between elements in the image and known 3D locations in the world, and establishing these correspondences in live video streams is challenging. The majority of current AR systems operate with prior knowledge of the user's environment. This could be a CAD model or a sparse map of fiducials known to be present in the scene.
In this study, first a DSM is generated from the stereo satellite images for each epoch. Both DSMs contain some inevitable artifacts, especially around the buildings and on the object boundaries. Before being used for building 3D change detection, they should be refined. Figure 1 shows the workflow of the proposed method. For each epoch, a 3D surface model can be effectively generated with the Semi-Global Matching (SGM) algorithm, first proposed by Hirschmüller and improved by  for satellite data. Vegetation extracted by the Normalized Difference Vegetation
Suveg and Vosselman (2004) developed a method for 3D reconstruction of buildings by integrating 3D information from stereo images and a large-scale GIS map. First, the building ground plan is subdivided into primitives. Optimal partitioning schemes are determined by a minimum description length criterion and a search tree. Then the building primitives are verified by a least squares fitting algorithm. The approximate values for the fitting method are obtained from the map and from the 3D information of the stereo images. Kada and McKinley (2009) proposed a method for decomposition of the footprint by cell decomposition. In this method, the buildings are partitioned into non-intersecting cells along their façade polygons using vertical planes. The roof shapes are determined from directions generated from LiDAR points. Then models of building blocks are reconstructed using a library of parameterized standard model shapes. Arefi and Reinartz (2013) proposed a model-driven approach based on the analysis of the 3D points of a DSM from satellite images in a 2D projection plane. In this method, parametric models are generated through single ridgeline reconstruction and subsequent merging of all ridgelines for a building. The edge information is extracted from the orthorectified image.
The approach starts with small point clouds containing one building at a time, extracted from the laser scanner data set by applying a pre-segmentation scheme. The laser scanner point cloud of each building is analysed separately. A 2.5D Delaunay triangle mesh structure (TIN) is calculated from the laser scanner point cloud. For each triangle, the orientation parameters in space (orientation, slope, and perpendicular distance to the barycentre of the laser scanner point cloud) are determined and mapped into a parameter space. As buildings are composed of planar features (primitives), triangles representing these features should group in parameter space. A cluster analysis technique is utilised to find and outline these groups/clusters. The clusters found in parameter space represent planar objects in object space. Grouping adjacent triangles in object space – which represent points in parameter space – enables the interpolation of planes from the ALS points that form the triangles. For each cluster's point group, a plane is interpolated in object space. All planes derived from the data set are intersected with their appropriate neighbours. From this, a roof topology is established, which describes the shape of the roof. This ensures that each plane has knowledge of its direct adjacent neighbours. Walls are added to the intersected roof planes, and the virtual 3D building model is presented in a file written in VRML (Virtual Reality Modeling Language).
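The mapping into parameter space and the subsequent clustering can be sketched as follows. Each triangle is reduced to its plane parameters (unit normal plus perpendicular distance), and triangles whose parameters agree within a tolerance are grouped greedily; the function names, the 4-vector parameterization, and the tolerance are our own simplifications of the cluster analysis described above:

```python
import numpy as np

def triangle_params(tri):
    """Plane parameters of one TIN triangle: unit normal (flipped to point
    upward for consistency) and perpendicular distance of its plane from
    the origin (a stand-in for the barycentre used in the text)."""
    a, b, c = tri
    n = np.cross(b - a, c - a)
    n = n / np.linalg.norm(n)
    if n[2] < 0:
        n = -n
    return np.array([n[0], n[1], n[2], np.dot(n, a)])

def cluster_triangles(tris, tol=0.05):
    """Greedy clustering in parameter space: triangles whose plane
    parameters agree within tol are assumed to lie on one roof plane."""
    clusters = []          # list of (representative params, member indices)
    for idx, tri in enumerate(tris):
        p = triangle_params(np.asarray(tri, float))
        for rep, members in clusters:
            if np.abs(rep - p).max() < tol:
                members.append(idx)
                break
        else:
            clusters.append((p, [idx]))
    return clusters

# two coplanar horizontal triangles (z = 1) and one tilted triangle
tris = [[(0, 0, 1), (1, 0, 1), (0, 1, 1)],
        [(1, 1, 1), (2, 1, 1), (1, 2, 1)],
        [(0, 0, 0), (1, 0, 1), (0, 1, 1)]]
clusters = cluster_triangles(tris)
```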
In this paper, a method for building detection in aerial images based on variational inference of logistic regression is proposed. It consists of three steps. In the first step, in order to characterize the appearance of buildings in aerial images, an effective bag-of-words (BoW) method is applied for feature extraction. In the second step, a logistic regression classifier is learned using these local features. The logistic regression can be trained using different methods. In this paper we adopt a fully Bayesian treatment for learning the classifier, which has a number of obvious advantages over other learning methods. Due to the presence of a hyperprior in the probabilistic model of logistic regression, approximate inference methods have to be applied for prediction. In order to speed up the inference, a variational inference method based on mean field theory is applied instead of stochastic approximation such as Markov Chain Monte Carlo. After the prediction, a probabilistic map is obtained. In the third step, a fully connected conditional random field model is formulated, and the probabilistic map is used as the data term in the model. A mean field inference is utilized in order to obtain a binary building mask. A benchmark data set consisting of aerial images and digital surface models (DSM), released by ISPRS for 2D semantic labeling, is used for performance evaluation. The results demonstrate the effectiveness of the proposed method.
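To make the classifier step concrete, here is the non-Bayesian skeleton of the same model: plain maximum-likelihood logistic regression on BoW feature vectors, trained by gradient ascent. The paper instead places a prior (with a hyperprior) on the weights and infers them variationally; the sigmoid outputs below play the role of the probabilistic map fed to the CRF. Function names and toy data are ours:

```python
import numpy as np

def train_logistic(X, y, lr=0.1, iters=500):
    """Maximum-likelihood logistic regression (gradient ascent).

    X: (n_samples, n_features) BoW histograms; y: 0/1 building labels.
    The cited method replaces this point estimate with a variational
    posterior over w -- this is only the underlying classifier.
    """
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted building probability
        w += lr * X.T @ (y - p) / len(y)   # gradient of the log-likelihood
    return w

def predict(X, w):
    """Per-patch building probability (the CRF data term)."""
    return 1.0 / (1.0 + np.exp(-X @ w))

# toy separable BoW features: feature 0 indicates "building", feature 1 not
X = np.array([[2.0, 0.0], [1.5, 0.2], [0.0, 2.0], [0.3, 1.8]])
y = np.array([1.0, 1.0, 0.0, 0.0])
w = train_logistic(X, y)
p = predict(X, w)
```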
This paper presents a novel workflow for data-driven building reconstruction from Light Detection and Ranging (LiDAR) point clouds. The method comprises building extraction, a detailed roof segmentation using region growing with adaptive thresholds, segment boundary creation, and a structural 3D building reconstruction approach using adaptive 2.5D Dual Contouring. First, a 2D grid is overlain on the segmented point cloud. Second, in each grid cell, 3D vertices of the building model are estimated from the corresponding LiDAR points. Then, the number of 3D vertices is reduced in a quad-tree collapsing procedure, and the remaining vertices are connected according to their adjacency in the grid. Roof segments are represented by a Triangular Irregular Network (TIN) and are connected to each other by common vertices or - at height discrepancies - by vertical walls. The resulting 3D building models show a very high accuracy and level of detail, including roof superstructures such as dormers. The workflow is tested and evaluated on two data sets, using the evaluation method and test data of the “ISPRS Test Project on Urban Classification and 3D Building Reconstruction” (Rottensteiner et al., 2012). Results show that the proposed method is comparable with state-of-the-art approaches, and outperforms them regarding undersegmentation and completeness of the scene reconstruction.
Once the façade model parameters are estimated, the final step is to describe the overall shape of the building footprint by further identifying adjacent façade pairs and determining the intersections of the façade surfaces. The adjacency of façades is usually described by an adjacency matrix that is built up via connectivity analysis. Identified adjacent façade segments are then used to determine the vertex points (i.e., façade intersection lines in 3D). They are found by computing the intersection points between any adjacent façade pair. Since polynomial models are used for façade parameter estimation, the problem of finding vertex points boils down to finding the intersection point between the two polynomials corresponding to the two adjacent façades. The computed vertex points and the estimated model parameters are then used to finally reconstruct the 3D model of the building's façades.
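The polynomial intersection reduces to a root-finding problem: the vertex lies where the difference of the two façade polynomials vanishes. A minimal sketch (function name and the root-selection heuristic are ours; in practice one would pick the real root inside the overlap region of the two segments):

```python
import numpy as np

def facade_vertex(poly_a, poly_b):
    """Vertex point between two adjacent façades modelled as polynomials
    y = f(x), with coefficients in NumPy's highest-power-first order.

    The intersection abscissa is a real root of poly_a - poly_b; here we
    simply take the real root nearest the origin as an illustration.
    """
    diff = np.polysub(poly_a, poly_b)
    roots = np.roots(diff)
    real = roots[np.isreal(roots)].real
    if real.size == 0:
        raise ValueError("adjacent facades do not intersect")
    x = real[np.argmin(np.abs(real))]
    return x, np.polyval(poly_a, x)

# two first-order facades y = x and y = -x + 2 meet at the corner (1, 1)
x, y = facade_vertex([1.0, 0.0], [-1.0, 2.0])
```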
The procedure continues by projecting the localized points onto a 2D plane perpendicular to the ridge direction (cf. Figure 7 (b)). In this step, the overall aim is to reduce the 3D data by one dimension and continue the further processing in 2D space. In this 2D space, shown in Figure 7 (b), a 2D model containing four lines (green lines) that are supported by a maximum number of points (blue) related to the front and back side of the building part is extracted. Accordingly, two vertical lines relating to the walls and two inclined lines relating to the roof faces are defined (cf. Figure 7 (b)). The quality of the 2D model in this step depends on the existence of a sufficient number of height points relating to each side of the wall. It is common in complex buildings that the number of supporting height points is not sufficient to extract the corresponding vertical line. To cope with this problem, a vertical line located symmetrically to the side with the maximum number of supporting points is defined. Hence, the algorithm in this step only extracts side walls having equal distances to the ridge position (symmetry constraint).
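The dimension-reducing projection can be sketched as a change of basis: build an orthonormal frame from the ridge direction (one axis horizontal across the ridge, one vertical within the section plane) and drop the coordinate along the ridge. The function name is ours, and the sketch assumes a non-vertical ridge, which holds for roof ridge lines:

```python
import numpy as np

def project_to_ridge_plane(points, ridge_dir, origin):
    """Project 3D points into the 2D section plane perpendicular to the
    ridge: u points horizontally across the ridge, v completes the frame
    (vertical for a horizontal ridge); the along-ridge coordinate is dropped.
    """
    d = np.asarray(ridge_dir, float)
    d /= np.linalg.norm(d)
    up = np.array([0.0, 0.0, 1.0])
    u = np.cross(up, d)               # horizontal axis across the ridge
    u /= np.linalg.norm(u)            # (assumes the ridge is not vertical)
    v = np.cross(d, u)                # in-plane "up" axis
    rel = np.asarray(points, float) - origin
    return np.stack([rel @ u, rel @ v], axis=1)

# ridge along the x-axis: the projection keeps (y, z) of each point
pts = np.array([[5.0, 1.0, 2.0], [3.0, -1.0, 0.5]])
uv = project_to_ridge_plane(pts, [1, 0, 0], np.zeros(3))
```

The four green model lines (two vertical walls, two inclined roof lines) would then be fitted to these 2D coordinates.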
A distinction is often made between rigid 2D/3D registration, where the aim is to align known 3D representations with 2D images in terms of translation and rotation (and sometimes scale) only, and 2D/3D reconstruction via non-rigid registration, where the shape and/or volumetric properties of the 3D representations are unknown in addition to their rigid transformations in the X-ray setup. Rigid registration methods have, for instance, been proposed to align given 3D magnetic resonance imaging (MRI) or computed tomography (CT) datasets with 2D X-ray fluoroscopy images for intraoperative guidance (Livyatan et al., 2003; George et al., 2011). The task of matching CTs rigidly to X-rays has further been well studied for image-guided interventions (Markelj et al., 2012). Here, methods often follow an intensity-based approach by projecting digitally reconstructed radiographs (DRRs) from CTs that emulate the X-ray attenuation under the same view as the given 2D radiographs. To evaluate the fit of rigidly transformed CTs, the DRRs are then compared to the given X-ray images using pixel-wise similarity measures such as the sum of squared differences, mutual information, and cross correlation (Russakoff et al., 2003a; Wu et al., 2009).
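Two of the similarity measures named above are one-liners; a minimal sketch (function names are ours). Note that normalized cross-correlation is invariant to affine intensity changes between DRR and X-ray, which is why it is often preferred over SSD when the two modalities differ in brightness and contrast:

```python
import numpy as np

def ncc(drr, xray):
    """Normalized cross-correlation between a projected DRR and the target
    X-ray image; 1.0 indicates a perfect match up to affine intensity."""
    a = drr - drr.mean()
    b = xray - xray.mean()
    return float((a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum()))

def ssd(drr, xray):
    """Sum of squared pixel differences; lower is better, 0 is identity."""
    return float(((drr - xray) ** 2).sum())

rng = np.random.default_rng(0)
drr = rng.random((8, 8))
```

In an intensity-based registration loop, one of these measures is evaluated for each candidate rigid transform and maximized (NCC) or minimized (SSD) by the optimizer.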
The third model includes shading techniques. Shadows are essential for realistic 3D images: a shadow is a region of relative darkness within an illuminated region, caused by an object totally or partially occluding the light. Without shadows, scenes often feel unnatural and flat; in addition, the relative depths of objects in the scene are hard to judge. Shadow algorithms are not predefined in OpenGL. Generally, the algorithms are divided into three groups: shadow mapping, shadow volume, and shadow projection algorithms (Woo et al., 1990). The shadow volume algorithm is a geometry-based shadow algorithm which requires connectivity information of the polygonal meshes in the scene to efficiently compute the silhouette of each shadow-casting object. It is a per-pixel algorithm, which performs an in-shadow test for each rendered fragment. This operation can be accelerated using graphics hardware. Generally, the scene complexity has a direct influence on the performance of this algorithm. Shadow mapping is an intrinsic image-space algorithm, which means that no knowledge of the scene's geometry is required to carry out the necessary computations. As it uses discrete sampling, it must deal with various aliasing artifacts, the major drawback of the technique.
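The core of shadow mapping is a single depth comparison per fragment, sketched here outside any graphics API (function name and bias value are ours): the scene is first rendered from the light to fill a depth buffer (the shadow map); a fragment is then in shadow if its depth, measured from the light and looked up at the same light-space texel, exceeds the stored value. The small bias suppresses self-shadowing ("shadow acne") caused by the discrete sampling mentioned above:

```python
import numpy as np

def in_shadow(frag_light_depth, frag_light_uv, shadow_map, bias=1e-3):
    """Shadow-map test for one fragment.

    frag_light_depth: fragment depth as seen from the light.
    frag_light_uv:    (u, v) texel of the fragment in the shadow map.
    Returns True if some occluder is closer to the light than the fragment.
    """
    u, v = frag_light_uv
    return frag_light_depth - bias > shadow_map[v, u]

# a uniform shadow map storing depth 5.0 (nearest occluder per texel)
shadow_map = np.full((4, 4), 5.0)
lit = not in_shadow(3.0, (1, 1), shadow_map)       # fragment in front: lit
shadowed = in_shadow(7.0, (1, 1), shadow_map)      # fragment behind: shadowed
```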