
Table 3.9: Comparison of methods for selecting the rank of the within-instance scatter matrix. The number of dimensions (#dim) is also reported as a percent of the maximum.

                         Synthetic         Shape Recogn.     Image Classif.
Results (%)            awi   ac  #dim    awi   ac  #dim    awi   ac  #dim
breakpoint              94   97   13      59   70   1.3     48   75   1.2
information retained    93   99   29      67   80   1       52   63  10
class-number based      86   98   26      65   73   1.4     54   64   5
iterative              100  100   21      71   81   1.1     54   74   1.8
no rank adjustment      89   78   49      46   31  26        7   73  99

dimension reduction. Among the rank adjustment methods, iterative trial and error clearly outperforms all other methods on accuracy. It is worth noting, however, that the number of dimensions retained by the other methods is usually similar. This means that it may be a viable strategy to use one of these methods to find an initial value, and then check the surrounding few values for the optimum. The full results are found in Figures A.20-A.33 in the Appendix.

Table 3.10: The 95% credible intervals of the Bayesian t-tests.

                       Baseline: no rank adjustment    Baseline: iterative
Metric                   ac          #dim               ac          #dim
breakpoint            [1.1, 15]    [81, 36]         [5, 1.9]     [2.5, 1.5]
information retained  [2.0, 14]    [75, 43]         [6.8, 1.8]   [3.7, 8.0]
class-number based    [2.3, 15]    [77, 38]         [5.2, 1.9]   [0.79, 6.9]
iterative             [5.0, 19]    [82, 40]         N/A          N/A

(though still likely positive). The SCDA and SSCDA methods are also the highest-performing on the 3D shape-graph databases, making them viable choices for the shape recognition system.

Notably, the proposed structured composite discriminant analysis methods are not limited in their application. As mentioned before, structured composite classes occur in numerous perception problems, such as object classification and detection. The SSCDA method is arguably viable as an image feature descriptor, although, unlike SURF, it is not necessarily invariant to standard image transformations, such as rotation or scaling.

In general, the SSCDA method can be used to perform discriminant analysis for component-based data structures, as long as the individual components can be described using the same set of variables. Furthermore, all proposed discriminant analysis methods assume that the classes have a Mixture of Gaussians distribution, which is an important limitation of these algorithms. If this assumption does not hold, the results are likely to be suboptimal.

With all things considered, the second thesis is stated as follows:

Thesis 2

I created a novel method to perform dimension reduction for classes composed of sets or graphs of vectors (structured composite classes). I showed on multiple datasets that the proposed method provides a viable solution not only for separating different classes, but also for simultaneously separating individual components within a single instance. I demonstrated via statistical testing that the new method provides descriptors that allow for higher classification accuracy compared to previous methods for structured composite classes. [6, 7, 8]

4 Arrangement in Scenes

In previous chapters, a method for classifying individual parts of larger structures was introduced. The method proposed in Chapter 2 was shown to work reasonably well for shape graphs. Since it includes the local neighborhood of each part in the embedding, it can take the larger context of the entire scene into account, and could therefore be used to determine the final scene arrangement.

This naive arrangement technique comes with significant limitations nonetheless.

First, the optimal labels for the individual parts are interdependent: since most real-world objects are spatially constrained, it can be argued that nearby parts are more likely to belong to the same object, and therefore have the same label.

Consequently, choosing the final label of a given part of the scene influences the optimal label for other nearby parts.

Moreover, the application might pose other constraints and requirements that the final pairing has to satisfy. In the Tangible Mixed Reality setting, the presence of some objects might be required, or the total number of objects placed in the scene might be limited. To address these issues, a global optimization mechanism is proposed that is capable of reliably finding a near-optimal arrangement.

In the first section an overview of global optimization methods is provided. In the second section the problem of optimal scene arrangement is examined, and the cost function and constraints of the problem are established. The third section details the global optimization methods used to solve the problem. To increase the likelihood of finding the optimal arrangement, novel operators are proposed and integrated into the optimization methods. The fourth section presents the method and the results of evaluation, while also demonstrating the efficiency of the optimization methods. The tests performed prove that the proposed operators provide significant improvement.

4.1 Global Optimization

Optimization problems are frequent in learning, computer vision, and scene understanding [111]. In many cases, the optimization problem is relatively simple (for instance, if the cost function and constraints are convex), and may be solved using standard gradient-based or second-order methods. In constrained cases, methods for solving linear or quadratic programs may be used [112].

Márton Szemenyei 64/130 ARRANGEMENT IN SCENES

Notably, the base problem is rather similar to the assignment problem, which can be solved in polynomial time using the Hungarian algorithm [113]. In our case, however, it is not necessary to assign at least one part of the scene to every class, since not all virtual object classes might be present. Moreover, the problem posed has other restrictions and criteria, which are elaborated in Section 4.2.
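The objective that the Hungarian algorithm minimizes can be illustrated with a minimal sketch. The brute-force enumeration below is not the Hungarian algorithm itself (which solves the same problem in polynomial time), and the cost matrix values are purely hypothetical, but it makes the part-to-class assignment objective concrete:

```python
import itertools

def best_assignment(cost):
    """Exhaustively search all one-to-one part-to-class assignments.

    cost[i][j] is the (hypothetical) cost of assigning part i to class j.
    Checks all n! permutations, so it only illustrates the objective that
    the Hungarian algorithm optimizes in polynomial time.
    """
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in itertools.permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return best_perm, best_cost

# Illustrative 3x3 cost matrix: 3 scene parts, 3 object classes.
costs = [[4, 1, 3],
         [2, 0, 5],
         [3, 2, 2]]
perm, total = best_assignment(costs)  # perm[i] is the class of part i
```

Relaxing the one-to-one requirement (so that some classes receive no part at all) is what moves the scene arrangement problem away from this classical formulation.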

To solve more complex optimization problems, one needs to rely on heuristic methods. These algorithms are capable of finding global optima even in hard problems; however, it is difficult (and in most cases impossible) to guarantee their convergence.

Moreover, according to the No Free Lunch Theorem (NFLT) [114], no optimization method can outperform brute-force search when averaged over all possible optimization problems.

Although gradient-based or second-order methods are notorious for getting stuck in local minima, they can be extended for global optimization. Momentum-based extensions are frequently used in the field of deep learning, even though, surprisingly, plain stochastic gradient descent works reasonably well in this setting. In other cases heuristic methods, such as random-restart (shotgun) hill climbing, or the Q-Gradient extensions [115] may be used.

These algorithms, however, rely on the gradient of the cost function to some extent. If the objective function or its gradient cannot be computed, these methods are unusable. Moreover, there may be constraints on some variables that make the problem NP-hard, as is the case with integer or binary linear programming.

The Simulated Annealing (SA) [116] algorithm offers solutions to both of these problems. In essence SA is a hill-climbing method, except that instead of using the gradient, SA uses a neighborhood definition to find new solutions to evaluate. This way SA is usable for problems where the gradient cannot be computed, as long as a neighborhood criterion is defined and neighbors can be found efficiently.

Moreover, the SA algorithm is willing to take steps that result in a worse-than-current solution with a certain probability, meaning that the method is able to escape local minima. The probability of accepting a worse solution is controlled by a variable usually referred to as the temperature T. To ensure high exploration in the early phases and convergence in the final stages of the algorithm's run, the temperature is initially set to a high value and decreased at every step.
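The acceptance rule and cooling schedule described above can be sketched as follows. This is a minimal generic SA loop, not the exact variant used later in this chapter; the multimodal objective, neighbor function, and parameter values are illustrative assumptions:

```python
import math
import random

def simulated_annealing(f, start, neighbor, t0=10.0, cooling=0.95,
                        steps=200, seed=1):
    """Minimize f without gradients, using only a neighbor-generating function.

    A worse candidate (delta > 0) is accepted with probability exp(-delta / T),
    and the temperature T is decreased geometrically at every step.
    """
    rng = random.Random(seed)
    x, t = start, t0
    best, best_val = start, f(start)
    for _ in range(steps):
        cand = neighbor(x, rng)
        delta = f(cand) - f(x)
        # Always accept improvements; accept worse moves with prob. exp(-delta/T).
        if delta < 0 or rng.random() < math.exp(-delta / t):
            x = cand
        if f(x) < best_val:
            best, best_val = x, f(x)
        t *= cooling  # high exploration early, convergence late
    return best

# Toy multimodal objective with many local minima (illustrative only).
obj = lambda x: x * x / 10.0 + 5.0 * math.cos(x)
result = simulated_annealing(obj, start=12.0,
                             neighbor=lambda x, rng: x + rng.uniform(-1.0, 1.0))
```

Because the best solution seen so far is tracked separately, the returned value is never worse than the starting point, even if the random walk wanders off late in the run.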

In cases when it is difficult to define a neighborhood or to generate random neighbors, one may rely on simply evaluating random solution candidates and selecting the best.

This brute-force method is rather costly, however, and may be improved greatly by selecting the candidates to evaluate wisely. Bayesian Optimization [75] achieves this by treating the objective function as a random function with some (assumed) prior distribution. Random samples are then treated as observations, and used to construct a posterior distribution of the objective function. The posterior is then used to determine the next sample to evaluate.

4.1.1 Biologically Inspired Optimization

An especially relevant class of global optimization algorithms is inspired by mechanisms present in nature. These methods are usually similar to Bayesian Optimization in the sense that they rely on trial-and-error evaluation of candidate solutions, while adopting a well-chosen scheme for generating new candidates. Algorithms based on bee or ant colonies [117] and Particle Swarm Optimization [118] are primary examples of this.

One of the most popular biologically inspired approaches is the family of genetic or evolutionary algorithms [119]. The idea behind these methods is to implement a scheme similar to biological evolution: simulating subsequent generations of solution candidates (individuals), using fitness-based selection for offspring generation, and random mutation to create a (hopefully) better population.

One great advantage of genetic algorithms is that by implementing problem-specific crossover and mutation operators, it is possible to apply these algorithms efficiently to almost any kind of problem. Weise [120] details the most commonly used operators for genetic methods. There are also numerous fitness scaling and selection schemes, which influence convergence greatly [120].

Moreover, a great number of further heuristics exist, aiming to increase the efficiency of genetic algorithms. One such idea is to use multiple populations instead of just one, and to implement some form of migration [121] between them. This enables the algorithm to explore the parameter space more easily, since individuals within a population share similar genes and therefore tend to converge [122].

Genetic algorithms are not guaranteed to improve the best individual in every situation, which might result in the optimal solution disappearing from the population. In order to avoid this, it is possible to introduce elitism into the selection strategy [123]. This means that the best few individuals always survive, and become part of the next generation unchanged.
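A minimal generational GA with elitism can be sketched as below. This is a generic illustration, not the operators used for scene arrangement later; the OneMax fitness, tournament selection, and all parameter values are assumptions chosen for clarity. Note how copying the elite individuals unchanged guarantees that the best fitness never decreases between generations:

```python
import random

def genetic_algorithm(fitness, n_bits=12, pop_size=20, generations=40,
                      mutation_rate=0.05, elite=2, seed=3):
    """Minimal generational GA with tournament selection and elitism.

    Returns the best individual of the final population and the history
    of the best fitness per generation.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def tournament(k=3):
        # Pick k random individuals, keep the fittest.
        return max(rng.sample(pop, k), key=fitness)

    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        nxt = [ind[:] for ind in pop[:elite]]  # elitism: best survive unchanged
        while len(nxt) < pop_size:
            a, b = tournament(), tournament()
            cut = rng.randrange(1, n_bits)           # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with a small per-bit probability.
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness), history

# OneMax: maximize the number of 1 bits (a standard GA toy problem).
best, hist = genetic_algorithm(sum)
```

With `elite = 0` the monotonicity property is lost: a lucky best individual can be destroyed by crossover or mutation, which is exactly the failure mode elitism prevents.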
