
classes, and β is the hyperparameter controlling the relative importance of this part of the cost function. With all extensions covered, we present the complete cost function:

C_{total} = C_{ctx} + C_{comp} + C_{cont} - \sum_{i=1}^{N_c} r_i N_i \qquad (4.4)

where r_i is the reward for the presence of the i-th class, and N_i is the number of node clusters that have the label i.

If the classification is reasonably accurate, the initialization is close to the optimal solution, making it likely for the algorithm to converge to the optimum.

For the greedy and the SA algorithms, a neighbor is defined as a solution differing in the label of a single node, which gives a cheap way of generating all neighbors of a given solution. It is important to note, however, that not all neighbors of a feasible solution are feasible, since they are not guaranteed to contain an instance of every required class. This is not a problem, since the set of feasible solutions is not guaranteed to be convex or even connected, meaning the algorithm may have to step through infeasible solutions to explore the space properly.
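As an illustration, the single-label neighborhood can be enumerated cheaply. The sketch below assumes solutions are tuples of integer labels; the representation and function names are our own, not the thesis's.

```python
def neighbors(labels, num_classes):
    """Yield every solution differing from `labels` in exactly one node's label."""
    for node, old in enumerate(labels):
        for new in range(num_classes):
            if new != old:
                # Replace a single label, leaving all other nodes untouched.
                yield labels[:node] + (new,) + labels[node + 1:]
```

A scene with m nodes and c classes thus has m(c - 1) neighbors, so evaluating the full neighborhood stays cheap for greedy descent and SA.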

One significant drawback of both methods is that they depend heavily on the initial point, since their ability to explore is somewhat limited. If the classification method has low accuracy, then the initialization may be too far from the true solution for these methods to converge reliably. It may make sense to use alternatives that achieve better exploration at the cost of more function evaluations.

For this reason, a simple single-population genetic algorithm with an elitist strategy is used for scene optimization. The algorithm employs random initial population generation and a random n-point crossover. The mutation operator assigns a random label to a single random node, making it equivalent to generating a single random neighbor of the original individual under the neighborhood definition above. For selection, stochastic uniform sampling and rank-based fitness scaling are used.

A major drawback of all three optimization methods is that they cannot naturally handle constrained optimization problems, so the cost function has to be modified. This is achieved by explicitly checking whether a solution satisfies the constraints and adding a large constant penalty to the objective function if it does not.
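A minimal sketch of this penalty scheme, assuming a normalized cost callable and a set of required class labels (the constant's value and all names are illustrative):

```python
LARGE_PENALTY = 10.0  # safe choice: each normalized cost term is bounded

def penalized_cost(labels, cost, required_classes):
    """Add a constant penalty when a solution misses any required class."""
    value = cost(labels)
    if not set(required_classes) <= set(labels):
        value += LARGE_PENALTY  # infeasible: push it above any feasible cost
    return value
```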

Note that all parts of the cost function are bounded due to normalization, so an appropriate penalty value can be chosen.

It is worth noting that genetic algorithms also depend on the initialization, mutation, and crossover operators to converge well. This opens the opportunity to vastly improve the efficiency of the optimization scheme by creating new operators that improve the method's ability to explore the problem space efficiently, while wasting as little time as possible on infeasible solutions.

4.3.1 Class Score Optimal Initialization

The first step of initialization is to determine the population size. Since the number of parameters, and consequently the dimensionality of the search space, depends on the size of the scene, it is reasonable to set the size of the initial population based on the size of the scene. In the case of shape graphs, setting the population size to 10 times the number of nodes worked relatively well.

The custom initialization proposed here has three major requirements to satisfy:

First, the initial population should be relatively close to the global optimum to ensure quick convergence. Second, the initial population should not contain any infeasible solutions. Third, the initial population should be relatively diverse, otherwise the algorithm might converge too quickly to a local minimum without properly exploring the problem space first.

Keeping these requirements in mind, the proposed Class Score Optimal Initialization (CSOI) method begins by adding a single individual to the population: the individual that maximizes the classification scores. Since this individual might not be feasible, as one or more required classes may be missing, a quick neighborhood search is performed to find a nearby feasible solution with the smallest drop in the objective function.

Then, the mutation operator is used to generate the rest of the initial population.

Since the mutation operator is not guaranteed to produce solutions that satisfy the class requirements, the same technique is applied on the mutated individuals.

To ensure the diversity of the initial population, identical individuals are checked for and replaced with newly mutated ones. Moreover, to further increase diversity, a certain percentage of the initial population is generated by applying multiple mutations in succession.
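The CSOI procedure can be sketched as follows. The repair loop stands in for the thesis's neighborhood search on infeasible individuals; the function names, the score layout, and the greedy repair strategy are our assumptions, and the instance is assumed to have at least as many nodes as required classes.

```python
import random

def repair(labels, required, cost):
    """Greedily relabel nodes until every required class is present,
    taking the cheapest single-label change at each step (sketch)."""
    while set(required) - set(labels):
        missing = set(required) - set(labels)
        candidates = [labels[:n] + (c,) + labels[n + 1:]
                      for n in range(len(labels)) for c in missing]
        labels = min(candidates, key=cost)
    return labels

def csoi_population(scores, required, cost, mutate, pop_size):
    """CSOI sketch: seed with the classification-score-optimal individual,
    then fill the population with repaired mutants, skipping duplicates."""
    seed = tuple(max(range(len(s)), key=s.__getitem__) for s in scores)
    population = {repair(seed, required, cost)}
    while len(population) < pop_size:
        parent = random.choice(sorted(population))
        population.add(repair(mutate(parent), required, cost))  # set drops duplicates
    return list(population)
```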

4.3.2 Random Drag and Shuffle Mutation

The point of the mutation operator is to allow the genetic algorithm to escape local minima by randomly finding better regions of the parameter space. The standard mutation operator for binary/nominal integer genomes is the random flip operator, which randomly changes the label of a single node. In this case, however, the random flip operator is very likely to create a significantly worse candidate solution because of the compactness element in the cost function. This means that the mutated solution is not particularly likely to survive many generations and allow the algorithm to explore other regions of the parameter space.

To solve this, we introduce the concept of random drag: the node whose label is changed may "drag" other nearby nodes with it with a certain probability, meaning these other nodes are assigned the same new label. The drag probability influences the trade-off between node mutation and cluster mutation.

Márton Szemenyei 72/130 ARRANGEMENT IN SCENES

Note that node mutation is still vital to allow the genetic algorithm to correct mistakes in the classification or to separate close objects. By allowing both kinds of mutation to occur, the parameter space can be explored more efficiently.

A further idea is to allow the mutation operator to permute the labels themselves with a given probability. This allows the algorithm to consider other combinations of classes, which would require several rounds of subsequent mutations otherwise. The Random Drag and Shuffle Mutation (RDSM) operator is illustrated in Figure 4.1.

Figure 4.1: The mutation operator: (a) the original setup, (b) the traditional flip mutation, (c) flip mutation using drag, and (d) label permutation.
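The RDSM operator can be sketched as below. The fixed drag radius, Euclidean node positions, uniform label permutation, and default probabilities are all our assumptions; the thesis does not prescribe these details.

```python
import random

def rdsm(labels, positions, num_classes, p_drag=0.3, p_shuffle=0.1, radius=1.0):
    """Random Drag and Shuffle Mutation (sketch).

    With probability p_shuffle, permute the labels themselves; otherwise
    flip one random node and drag each node within `radius` of it to the
    same new label with probability p_drag."""
    if random.random() < p_shuffle:
        perm = list(range(num_classes))
        random.shuffle(perm)
        return tuple(perm[l] for l in labels)
    labels = list(labels)
    node = random.randrange(len(labels))
    new = random.choice([c for c in range(num_classes) if c != labels[node]])
    labels[node] = new
    for other, pos in enumerate(positions):
        near = sum((a - b) ** 2 for a, b in zip(pos, positions[node])) <= radius ** 2
        if other != node and near and random.random() < p_drag:
            labels[other] = new  # dragged: inherits the mutated node's label
    return tuple(labels)
```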

A further advantage of introducing this operator is that by using it in the CSOI initialization scheme, the diversity of the initial population can be increased even further, especially if the probabilities of the random drag and shuffle are set higher than usual.

4.3.3 Clustered N-Point Crossover

The standard crossover operator for binary/nominal integer genomes is the N-point intersection. This operator randomly places N points in the genome, effectively dividing it into N + 1 parts. Then, the offspring inherits these sections from the two parents in an alternating way. The problem with this operator is similar to the case of the random flip mutation, namely, that defining the crossover on the level of nodes may lead to the creation of a high number of inferior offspring.

This problem can be solved by applying an idea similar to the random drag: the intersection operator should be defined on node clusters instead of the nodes themselves. This means that all nodes are assigned to a cluster based on their proximity, using an adaptive distance threshold to divide clusters. Then, the clusters are ordered randomly and divided into N intervals. The labels are then inherited from the parents alternatingly.

Note that in this case the cluster-based crossover occurs 100% of the time, instead of randomly switching between node-based and cluster-based inheritance. Our reasoning is that the RDSM mutation is already capable of introducing single-point mutations, so adding this capability twice would be superfluous.

Arguably, the mutation operator is, conceptually speaking, the right place to perform small-scale, single-point modifications on the genome. The Clustered N-Point Crossover (CNPC) operator is illustrated in Figure 4.2.

Figure 4.2: The crossover operator: (a) and (b) are the parents, (c) is the offspring.
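A sketch of CNPC under our own simplifications: a fixed distance threshold stands in for the thesis's adaptive one, and the greedy single-linkage clustering pass is an assumption. All names are illustrative.

```python
import random

def proximity_clusters(positions, threshold):
    """Greedy single-linkage grouping of node indices by proximity
    (a stand-in for the thesis's adaptive-threshold clustering)."""
    clusters = []
    for i, p in enumerate(positions):
        for cluster in clusters:
            if any(sum((a - b) ** 2 for a, b in zip(p, positions[j])) <= threshold ** 2
                   for j in cluster):
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def cnpc(parent_a, parent_b, positions, n_points=2, threshold=1.0):
    """Clustered N-Point Crossover (sketch): shuffle the clusters, cut the
    sequence into n_points + 1 runs, inherit runs from alternating parents."""
    clusters = proximity_clusters(positions, threshold)
    random.shuffle(clusters)
    n_cuts = min(n_points, len(clusters) - 1)
    cuts = sorted(random.sample(range(1, len(clusters)), n_cuts)) if n_cuts > 0 else []
    child, source, start = list(parent_a), 0, 0
    for end in cuts + [len(clusters)]:
        parent = (parent_a, parent_b)[source % 2]  # alternate the source parent
        for cluster in clusters[start:end]:
            for node in cluster:
                child[node] = parent[node]
        source, start = source + 1, end
    return tuple(child)
```

Because whole clusters are inherited together, nearby nodes always come from the same parent, which is exactly what keeps the compactness term from punishing most offspring.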


4.3.4 Context Reward Optimization

In Section 4.2.1, the idea of using context information in the cost function was introduced. This allows the optimization algorithm to prefer certain virtual objects being close to or distant from each other. This is especially useful for countering the compactness criterion for certain categories only, for reasons explained in previous sections.

However, the earlier discussion did not clarify how the context rewards (σij from Eq. 4.3) are determined for all possible class combinations. The simplest solution is to let the researcher set them by trial and error. This, however, is not only time-consuming, it is also likely to underperform automated optimization.

For this reason, a simple optimization procedure is applied to determine the context reward parameters. To achieve this, the genetic algorithm is run using the operators introduced in the previous subsections, and the n best unique solutions from the last generation are selected. Using these, we propose the following cost function:

C = \sum_{i=1}^{n} \left( c_i - c_{true} \right) \qquad (4.5)

where c_i is the cost of the i-th best solution, while c_true is the cost of the ground truth solution. Using this cost function, the stochastic gradient descent (SGD) algorithm is used to find the optimal values of the context rewards, using the derivative computed from the cost function (Eq. 4.3):

\frac{\partial C}{\partial \sigma_{i,j}} = \beta \sum_{k=1}^{n} \left( d_{k,ij} - d_{true,ij} \right) \qquad (4.6)

where d_{k,ij} and d_{true,ij} are the minimum distances between nodes belonging to classes i and j in the k-th best candidate and the ground truth, respectively.

There is one important modification to the algorithm described above: after every few epochs of SGD, the genetic optimization algorithm is run again to find the best n candidates before continuing with the SGD optimization. This technique prevents the algorithm from returning context rewards that make the true solution better than the original best, only for a new, incorrect solution to become the optimum.
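A single SGD step on the context rewards, following the Eq. 4.6 gradient, can be sketched as below. The dictionary-based storage of σ and of the pairwise minimum distances, along with the learning rate, are assumptions for illustration.

```python
def sgd_step(sigma, d_best, d_true, beta, lr=0.01):
    """One gradient-descent step on the context rewards (Eq. 4.6 sketch).

    sigma[(i, j)]    : context reward for the class pair (i, j)
    d_best[k][(i, j)]: minimum i-j node distance in the k-th best candidate
    d_true[(i, j)]   : the same distance in the ground truth solution
    """
    updated = {}
    for pair, value in sigma.items():
        # Eq. 4.6: gradient is beta times the summed distance differences.
        grad = beta * sum(d_k[pair] - d_true[pair] for d_k in d_best)
        updated[pair] = value - lr * grad
    return updated
```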