
4. Materials and methods

4.8. Stem detection in 2D data structure

4.8.1. Detecting stems as point clusters

Stem slice sections can be distinguished from point groups of low vegetation because stem surface points are arranged in groups of characteristic shape and size. These differences in size and shape are captured by two concepts, implemented as two routines. The first routine partitions the point cloud into clusters with an iterative method. The second routine classifies each cluster as either a stem slice section or low vegetation. This second step includes a proposed geometric circle fit based on an iterative least-squares adjustment that, in contrast to the classic Gauss-Newton scheme, requires less accurate initial values for the parameters to be estimated.

4.8.1.1. Partitioning the point cloud

Clustering is a data-mining tool in statistics that assigns data to separate groups, i.e. clusters, according to a defined similarity criterion (Fogaras and Lukács 2005). For point coordinates, the clusters should be delineated so that the sum of squared radial distances of the points from their cluster centres is minimal while, at the same time, the sum of distances between the cluster centres is maximal. The method was adapted under the assumption that each stem cross-section is represented by one cluster, so the clusters describe the locations of the stems. The clustering is expected to assign coherent stem surface points to the same cluster. Various clustering methods are applied in digital image processing for unsupervised image classification and segmentation (Czimber, 1997). The algorithm presented in this study is a partitioning technique, because every point is assigned to exactly one cluster and each cluster contains at least one point.

Clusters are described by their centre and radius. The centre is the centroid of the member points; the radius is the Euclidean distance between the centre and the farthest member point. The distance between two clusters is the Euclidean distance between their centres. Since clusters are the geometric representations of the stem cross-sections, the radius of the largest cluster (R) should equal the radius of the largest stem. The algorithm therefore needs a rough estimate of the radius of the largest stem expected in the study area, which limits the maximum cluster size (Figure 4-13). The algorithm cannot separate multiple stems within the same cluster. It follows that (1) two trees can be separated only if the distance between their centres exceeds R, otherwise they are assigned to the same cluster, and (2) the minimum distance between cluster centres is bounded from below by R, so the maximum number of clusters is also bounded. As a result, the number of clusters does not need to be given as input.
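The cluster geometry described above takes only a few lines to express. The sketch below is a minimal Python illustration, assuming each point is an (x, y) tuple in metres. The function names `cluster_centre`, `cluster_radius` and `cluster_distance` are hypothetical and are not taken from the original implementation.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def cluster_centre(points: List[Point]) -> Point:
    """Cluster centre: the centroid of the member points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def cluster_radius(points: List[Point]) -> float:
    """Cluster radius: distance from the centre to the farthest member point."""
    cx, cy = cluster_centre(points)
    return max(math.hypot(x - cx, y - cy) for x, y in points)


def cluster_distance(a: List[Point], b: List[Point]) -> float:
    """Distance between two clusters: Euclidean distance between their centres."""
    ax, ay = cluster_centre(a)
    bx, by = cluster_centre(b)
    return math.hypot(ax - bx, ay - by)
```

For four points at unit distance around the origin, the centre is (0, 0) and the radius is 1.0.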

Figure 4-13. Stem point measurements within the maximum cluster radius are assumed to belong to the same tree. The vertical thickness of the subset is 1 meter. (Data from sample plot H1)

The partitioning algorithm is iterative and starts with zero clusters. An iteration cycle consists of the following steps. The point measurements are taken sequentially from a list, and for each point the cluster centres within a distance R are queried. If at least one centre is found, the point is assigned to the closest cluster; otherwise, a new cluster is created with its centre at the location of the point. At the end of the pass, every point is assigned to a cluster and the list is empty. The cluster centres are then recalculated as the centroids of their member points. If any pair of recalculated centres is closer than R, the two clusters are merged and the new centre is the centroid of the merged member points. Finally, the points are taken out of the clusters and reloaded into the sequential list.
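One iteration cycle of the partitioning can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the original implementation. Points are (x, y) tuples, and the centres of the previous cycle are passed in. Clusters left empty after reassignment are dropped. Merging is repeated one pair at a time until no two centres are closer than R. A brute-force nearest-centre search stands in for the quad-tree. The name `partition_cycle` is hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _centroid(pts: List[Point]) -> Point:
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)


def partition_cycle(points: List[Point], centres: List[Point], R: float) -> List[Point]:
    """One iteration cycle; returns the recalculated (and merged) cluster centres."""
    centres = list(centres)
    members: List[List[Point]] = [[] for _ in centres]
    # Take the points sequentially; join the closest centre within R or open a new cluster.
    for p in points:
        best, best_d = -1, R
        for k, c in enumerate(centres):
            d = math.hypot(p[0] - c[0], p[1] - c[1])
            if d <= best_d:
                best, best_d = k, d
        if best < 0:
            centres.append(p)
            members.append([p])
        else:
            members[best].append(p)
    # Drop clusters that received no points in this pass.
    groups = [m for m in members if m]
    # Merge pairs whose recalculated centres are closer than R, one pair at a time.
    merged = True
    while merged:
        merged = False
        cents = [_centroid(g) for g in groups]
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if math.hypot(cents[i][0] - cents[j][0], cents[i][1] - cents[j][1]) < R:
                    groups[i] = groups[i] + groups[j]
                    del groups[j]
                    merged = True
                    break
            if merged:
                break
    # New centres: centroids of the (possibly merged) member points.
    return [_centroid(g) for g in groups]
```

To reproduce the iteration, call `partition_cycle` repeatedly and feed back the returned centres. The sketch leaves the stopping rule on the average centre shift to the caller.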

The cycle is repeated until the average change of the cluster centres falls below the threshold of 1.0 mm. In the last step, the points within a distance R of each cluster centre are queried, and the unique label of the corresponding cluster is assigned to each member point. The clusters are considered stem candidates, although some of them consist of point measurements from low vegetation and branches. The clustering algorithm was implemented with a quad-tree (Czimber, 1997) as spatial index to speed up the search for the closest cluster centres.
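The final labelling step and the role of the spatial index can be illustrated with a uniform grid instead of a quad-tree. This substitution is an assumption made for brevity. With a cell size equal to R, every centre within distance R of a point lies in the point's cell or one of its eight neighbours, so only those cells are searched. Points that are not within R of any centre receive the label -1, which is also an assumption. The functions `build_grid` and `label_points` are hypothetical names.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def build_grid(centres: List[Point], cell: float) -> Dict[Tuple[int, int], List[int]]:
    """Bucket centre indices into square cells of side `cell` (stand-in for a quad-tree)."""
    grid: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for k, (x, y) in enumerate(centres):
        grid[(math.floor(x / cell), math.floor(y / cell))].append(k)
    return grid


def label_points(points: List[Point], centres: List[Point], R: float) -> List[int]:
    """Label each point with its closest centre within R, or -1 if no centre is that close."""
    grid = build_grid(centres, R)  # cell side R: the 3x3 neighbourhood covers the query disc
    labels = []
    for x, y in points:
        i, j = math.floor(x / R), math.floor(y / R)
        best, best_d = -1, R
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                for k in grid.get((i + di, j + dj), []):
                    d = math.hypot(x - centres[k][0], y - centres[k][1])
                    if d <= best_d:
                        best, best_d = k, d
        labels.append(best)
    return labels
```

With the H1 setting, R would be 0.5 m, so each query inspects nine cells of 0.5 m side.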

4.8.1.2. Classification of clusters

The stem clusters are filtered by analysing their point pattern. Stem point measurements lie on approximately cylindrical stem surfaces, which produce circles of approximately equal radius in horizontal cross-sections. To check whether a cluster is cylindrical, horizontal circles are fitted to the points at the bottom, middle and top height of each cluster (Figure 4-14).
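Cutting the horizontal sections from a cluster can be sketched as below. The sketch assumes (x, y, z) tuples with heights in metres and slices centred on each requested level. The text does not state whether the sections are centred on the levels or start at them, so this is an assumption. `horizontal_sections` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Point3 = Tuple[float, float, float]


def horizontal_sections(
    cluster: List[Point3], levels: List[float], thickness: float
) -> Dict[float, List[Tuple[float, float]]]:
    """Return the (x, y) points of a thin slice centred on each height level."""
    half = thickness / 2.0
    return {
        z0: [(x, y) for x, y, z in cluster if abs(z - z0) <= half]
        for z0 in levels
    }
```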

If a section contains more than three data points, the circle parameters are calculated through a geometric fit that minimises the sum of squared Euclidean distances of the data points to the circular arc. This approach is referred to as least-squares adjustment or minimization in the L2-norm. In addition, if the measurement errors are normally distributed, the least-squares solution corresponds to the maximum likelihood estimate (Závoti, 2001). All data points have equal weight. The parameters to be estimated are the centre position (x_c, y_c) and the radius R, which correspond to the tree centre position and the stem radius (Figure 4-15). The goodness of fit is indicated by the root-mean-square error (RMSE) of the residual distances, computed with n - 3 degrees of freedom for the three estimated parameters. A major difficulty of geometric circle fitting is that the minimization leads to non-linear equations with no explicit solution.

Such problems are therefore usually solved by linearization and iterative numerical schemes such as the general Gauss-Newton method. The Gauss-Newton method needs 'sufficiently accurate' initial values; otherwise the routine may converge to a local minimum (Henrici, 1985). Accurate initial values cannot be guaranteed in our case, as the pattern of stem surface points is often asymmetric and covers only a partial sector of the circle. The proposed routine requires no linearization of the equations and is therefore less sensitive to the initial values. It refines the radius and the centre position in separate steps and yields a convergent solution, but it is computationally more demanding than the classic Gauss-Newton method. In practice, however, running the routine on a few thousand points has no noticeable effect on the computation time.
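The RMSE used as goodness-of-fit measure can be computed from the radial residuals r_i - R. The sketch below assumes that the degrees of freedom account for the three estimated parameters, so the sum of squared residuals is divided by n - 3. `circle_rmse` is a hypothetical name.

```python
import math
from typing import List, Tuple


def circle_rmse(points: List[Tuple[float, float]], xc: float, yc: float, R: float) -> float:
    """RMSE of the radial residuals r_i - R, using n - 3 degrees of freedom."""
    n = len(points)
    if n <= 3:
        raise ValueError("need more than three points for a redundant fit")
    ss = sum((math.hypot(x - xc, y - yc) - R) ** 2 for x, y in points)
    return math.sqrt(ss / (n - 3))
```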

A point pattern in a height section is accepted as circular if the RMSE of the circle fit is below a given tolerance; otherwise the fit is considered imprecise. A cluster is classified as a stem if at least two of its cross-sections are circular and the standard deviation of the circle radii is below a predefined threshold.

The clustering-based stem detection method was validated on sample plot H1 with the following user-defined parameters. The subset of points used as input for the clustering was taken from the elevation interval of 1.0 m to 2.0 m. The maximum cluster radius (i.e. the largest expected tree radius) was 0.5 m. Clusters with fewer than 50 points were omitted. Horizontal sections with a thickness of 10 cm were cut from the point clusters at the levels of 1.0, 1.5 and 2.0 m. A section was considered circular if the RMSE of the circle fit was below 3.0 cm. A cluster was accepted as a stem if at least two circle fits were acceptable and the maximum absolute difference between their radii was below 5 cm.
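The decision rule with the parameters reported for sample plot H1 can be written as a small function. This is an illustrative sketch. It follows the H1 wording (maximum absolute radius difference) rather than the standard deviation criterion of the general description, and it compares only the radii of the sections judged circular. The function `is_stem` and the constant names are hypothetical.

```python
from typing import List, Optional, Tuple

# Parameters reported for sample plot H1 (metres); names are illustrative
MIN_POINTS = 50          # clusters with fewer points are omitted
MAX_RMSE = 0.03          # a section is circular if the fit RMSE is below 3.0 cm
MAX_RADIUS_DIFF = 0.05   # radii of the circular sections must differ by less than 5 cm
MIN_CIRCULAR = 2         # at least two sections must be circular


def is_stem(n_points: int, fits: List[Optional[Tuple[float, float]]]) -> bool:
    """Decide whether a cluster is a stem.

    `fits` holds one (rmse, radius) pair per horizontal section, or None when
    no circle could be fitted to that section.
    """
    if n_points < MIN_POINTS:
        return False
    radii = []
    for fit in fits:
        if fit is not None and fit[0] < MAX_RMSE:
            radii.append(fit[1])
    if len(radii) < MIN_CIRCULAR:
        return False
    return max(radii) - min(radii) < MAX_RADIUS_DIFF
```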

[Figure 4-14 panels: Clustering; Checking circularity]

Figure 4-14. Point slice used for the clustering and its sub-sections for checking the circularity of the clusters.

[Figure 4-15 label: circle centre C (x_c, y_c)]

Figure 4-15. Notations for the geometric circle fit.

4.8.1.3. Routine for geometric circle fitting

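Let $n$ ($n \ge 3$) denote the number of input points, $R$ ($R > 0$) the circle radius, $\mathbf{c} = (x_c, y_c)$ the vector pointing to the circle centre, and $\mathbf{p}_i = (x_i, y_i)$ ($i = 1 \ldots n$) the vector of the i-th data point. The objective function is:

$$F(\mathbf{c},R)=\sum_{i=1}^{n}\left(\lVert\mathbf{p}_i-\mathbf{c}\rVert-R\right)^2\rightarrow\mathrm{MIN}$$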

The initial values of the circle centre coordinates $(x_c, y_c)$ can be obtained as the average of the point coordinates, i.e. the centroid of the data points. Introducing the notation $r_i$ for the Euclidean distance between the i-th data point and the circle centre yields the following formula:

$$r_i=\lVert\mathbf{p}_i-\mathbf{c}\rVert=\sqrt{(x_i-x_c)^2+(y_i-y_c)^2}$$

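In the first step, $R$ is treated as the only variable while the centre is held fixed, resulting in the following objective function:

$$F(R)=\sum_{i=1}^{n}\left(r_i-R\right)^2\rightarrow\mathrm{MIN}$$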

This function has its extremum where its first derivative is zero:

$$\frac{\mathrm{d}F}{\mathrm{d}R}=-2\sum_{i=1}^{n}\left(r_i-R\right)=0$$

As the objective function (4-6) is a sum of squared expressions, the extremum exists and it is the minimum.

Hence $R$ is obtained as the average of the distances $r_i$:

$$R=\frac{1}{n}\sum_{i=1}^{n}r_i$$

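In the second step, the radius is fixed at the value $\hat{R}$ obtained in the first step, and the centre coordinates are refined. Let $(\hat{x}_c, \hat{y}_c)$ denote the previous centre estimate and $\hat{r}_i$ the distance of the i-th data point from it. The circle point corresponding to the i-th data point is taken in the direction of that point as seen from the previous centre, so the objective function in the second step is:

$$F(x_c,y_c)=\sum_{i=1}^{n}\left(\sqrt{\left(x_i-x_c-\hat{R}\,\frac{x_i-\hat{x}_c}{\hat{r}_i}\right)^2+\left(y_i-y_c-\hat{R}\,\frac{y_i-\hat{y}_c}{\hat{r}_i}\right)^2}\right)^2\rightarrow\mathrm{MIN}$$

As the expression under the square root is a sum of two non-negative terms, squaring the root simply removes it:

$$F(x_c,y_c)=\sum_{i=1}^{n}\left[\left(x_i-x_c-\hat{R}\,\frac{x_i-\hat{x}_c}{\hat{r}_i}\right)^2+\left(y_i-y_c-\hat{R}\,\frac{y_i-\hat{y}_c}{\hat{r}_i}\right)^2\right]\rightarrow\mathrm{MIN}$$

The variables $x_c$ and $y_c$ appear in separate terms, so the function can be rewritten as two independent minimizations:

$$\sum_{i=1}^{n}\left(x_i-x_c-\hat{R}\,\frac{x_i-\hat{x}_c}{\hat{r}_i}\right)^2\rightarrow\mathrm{MIN},\qquad\sum_{i=1}^{n}\left(y_i-y_c-\hat{R}\,\frac{y_i-\hat{y}_c}{\hat{r}_i}\right)^2\rightarrow\mathrm{MIN}$$

Setting the derivatives with respect to $x_c$ and $y_c$ to zero and solving the equations yields the refined centre coordinates:

$$x_c=\frac{1}{n}\sum_{i=1}^{n}\left(x_i-\hat{R}\,\frac{x_i-\hat{x}_c}{\hat{r}_i}\right),\qquad y_c=\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{R}\,\frac{y_i-\hat{y}_c}{\hat{r}_i}\right)$$
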
Steps 1 and 2 are iterated until none of the parameters changes by more than 0.1 mm.
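The two alternating steps can be combined into a short routine. The Python sketch below illustrates the procedure derived above and is not the original code. Step 1 sets R to the mean distance from the current centre. Step 2 updates the centre by averaging x_i - R(x_i - x_c)/r_i, and likewise for y. Iteration stops when no parameter changes by more than `tol`, which defaults to 0.1 mm expressed in metres. `fit_circle` and `max_iter` are hypothetical names, and `max_iter` is an added safeguard.

```python
import math
from typing import List, Tuple


def fit_circle(
    points: List[Tuple[float, float]], tol: float = 1e-4, max_iter: int = 10000
) -> Tuple[float, float, float]:
    """Geometric circle fit by alternating a radius step and a centre step.

    Returns (x_c, y_c, R); if max_iter is reached, the last estimate is returned.
    """
    n = len(points)
    if n < 3:
        raise ValueError("at least three points are required")
    # Initial centre: centroid of the data points
    xc = sum(x for x, _ in points) / n
    yc = sum(y for _, y in points) / n
    R = 0.0
    for _ in range(max_iter):
        r = [math.hypot(x - xc, y - yc) for x, y in points]
        if min(r) == 0.0:
            raise ValueError("a data point coincides with the current centre")
        # Step 1: radius as the average distance, centre held fixed
        R_new = sum(r) / n
        # Step 2: refined centre, radius held fixed, directions from the previous centre
        xc_new = sum(x - R_new * (x - xc) / ri for (x, _), ri in zip(points, r)) / n
        yc_new = sum(y - R_new * (y - yc) / ri for (_, y), ri in zip(points, r)) / n
        change = max(abs(R_new - R), abs(xc_new - xc), abs(yc_new - yc))
        xc, yc, R = xc_new, yc_new, R_new
        if change <= tol:
            break
    return xc, yc, R
```

For evenly spaced points on a full circle, the centroid already coincides with the centre, so the loop stops after two passes. On a partial arc, this fixed-point iteration converges more slowly, so `max_iter` should be generous.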

Source: Locating and parameter retrieval of individual trees from terrestrial laser scanner data (pages 53-57)