Discussion - 3D Shape Recognition Methods for Tangible User Interfaces

[Figure 4.8, plot not reproduced. Panels: data with posterior predictive distribution (SceneClassGlobValRef and NNScene[, 1], N = 22); mean difference µdiff (median 0.30, 95% HDI 0.19 to 0.42, 100% of the posterior above 0); standard deviation of the difference σdiff (median 0.25, 95% HDI 0.18 to 0.35); effect size (µdiff − 0)/σdiff (median 1.2, 95% HDI 0.65 to 1.8, 100% of the posterior above 0).]

Figure 4.8: Bayesian t-test comparing the raw and explicit embedding on the scene versions datasets.

Metric                  e_c              e_cost
Context              No     Yes       No     Yes
Synthetic Images    71.5    82.1     92.7    66.4
Real Images         82.7    91.3     47.9    27.9

Table 4.3: Results before and after the context optimization

Note that running Bayesian t-tests on these results is not advisable because of the low number of independent datasets; we therefore make no statements about the effect of context optimization, except that it is viable on these two datasets.

[Figure 4.9, plots not reproduced. Top panels (GAOpt[, 1] and GAOpt[, 4], N = 24): mean difference µdiff (median 0.93, 95% HDI 0.91 to 0.95, 100% of the posterior above 0); standard deviation of the difference σdiff (median 0.038, 95% HDI 0.021 to 0.060); effect size (µdiff − 0)/σdiff (median 25, 95% HDI 13 to 39, 100% of the posterior above 0). Bottom panels (GAClass[, 1] and GAClass[, 4], N = 24): mean difference µdiff (median −0.63, 95% HDI −0.66 to −0.59, 100% of the posterior below 0); standard deviation of the difference σdiff (median 0.083, 95% HDI 0.054 to 0.12); effect size (µdiff − 0)/σdiff (median −7.6, 95% HDI −11 to −4.9, 100% of the posterior below 0).]

Figure 4.9: Bayesian t-test between vanilla GA and using all custom operators for e_opt (top) and e_c (bottom).

the performance of this method, three novel operators were proposed for initialization, mutation and crossover. To assess the proposed objective function and genetic operators, several Bayesian t-tests were performed, which demonstrated the feasibility and efficiency of these methods. The proposed localization method was also compared to graph-attention networks, demonstrating the superiority of our algorithm in terms of classification accuracy.

Exploiting the spatial compactness of physical objects was a key element of both the objective function and the genetic operators, which means that these methods are likely to perform worse if this assumption does not hold. While this is undeniably a limitation on paper, in practice the spatial compactness of logical entities is a relatively universal principle. This is not to say that exceptions do not exist, rather that in our view they are rare.

As emphasized in previous discussion sections, the proposed methods are not limited to localization and pairing in shape-graphs, but can be applied to various image-based localization tasks, since the aforementioned assumption usually holds for images, and more generally wherever physical objects are concerned.

With these considerations, the third thesis statement is given as follows:

Thesis 3

I developed a comprehensive, efficient search method for the globally optimal arrangement of classes in scenes composed of objects by exploiting geometric properties of the problem. I formulated the problem so that the optimization method could consider design concerns relevant to Tangible Mixed Reality Systems. I verified via statistical testing that the new algorithm surpasses previous graph-based object detection methods on numerous scene graph datasets.

(a) I developed novel genetic operators that provide a statistically significant increase in the probability of finding the global optimum in optimization problems where compactness of objects can be assumed.

[2, 3, 1, 9]

Márton Szemenyei 86/130 ARRANGEMENT IN SCENES

5 Conclusion

In this thesis our stated goal was to propose a scheme composed of novel methods that can efficiently search for a globally optimal arrangement of virtual objects in a complex 3D scene. The algorithm’s object matching is based on 3D shape similarity, while it is augmented by several further optional criteria that may be of use to the designer of the Adaptive Mixed Reality (AMR) environment. These criteria were designed to aid the automatic placement of virtual objects so that the final arrangement would be valid and intuitive.

The basic principle of the AMR system is to break down the 3D scene into primitive shapes using a variant of the RANSAC algorithm. From these primitive shapes, a graph is constructed, with the shapes themselves serving as nodes, while the edges describe the geometric relations between the nodes. First, a node-by-node classification is performed, which aims to sort these building blocks into virtual object categories. Then, based on these classification scores, a global optimization method determines the final arrangement of the virtual objects.
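The graph construction step described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Primitive` type, the use of centroid distance as the only edge feature, and the `max_dist` threshold are hypothetical and are not taken from the thesis, which does not specify the edge features in this section.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Primitive:
    kind: str          # hypothetical shape label, e.g. "plane" or "cylinder"
    centroid: tuple    # (x, y, z) position of the fitted shape


def build_scene_graph(primitives, max_dist):
    """Return {(i, j): distance} for each pair of primitives within max_dist.

    The primitives are the graph nodes; an edge stores one geometric
    relation (here only the centroid distance, as an assumed example).
    """
    edges = {}
    for (i, a), (j, b) in combinations(enumerate(primitives), 2):
        d = math.dist(a.centroid, b.centroid)
        if d <= max_dist:
            edges[(i, j)] = d
    return edges


# Three made-up primitives: two close together, one far away.
scene = [
    Primitive("plane", (0.0, 0.0, 0.0)),
    Primitive("cylinder", (0.0, 0.3, 0.0)),
    Primitive("sphere", (2.0, 0.0, 0.0)),
]
graph = build_scene_graph(scene, max_dist=0.5)
```

In this toy scene only the plane and the cylinder end up connected, since the sphere lies farther than `max_dist` from both.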

The thesis introduced three novel solutions to enhance the quality of the final arrangement. The first proposed method is for embedding the local context of graph nodes into a vector space, thus increasing the accuracy of the node-by-node classification. The second method aims to perform discriminant analysis on structured composite classes, which helps retain dimensions useful not only for classification but for pose estimation as well. The final method attempts to find the globally optimal arrangement of labels using part-based classification results as initialization.

The first proposed method is a novel graph node embedding framework, allowing for robust classification of graph nodes in situations where the local context of the node also contains features necessary for determining the correct class. Our work introduced two solutions for this problem: an explicit embedding framework based on the spectral decomposition of graphs, and an extension of the random walk kernel.

The proposed framework is generally applicable: neither the connectivity of the graph nor the type of features is restricted, as long as kernel functions can be defined between the nodes and edges.

The proposed methods were evaluated on several datasets, and the paired samples Bayesian t-test was used to compare the methods against the baseline (classifying nodes using only their own features) and each other. The tests show that both proposed methods outperformed the baseline with a probability close to 100%, while the random walk extension is similarly likely to outperform the explicit embedding.

However, the explicit embedding is considerably less likely to overfit. The probability of a positive effect size is 98.2% in this case.
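To make the quantities reported by such a paired comparison concrete (posterior of the mean difference, its 95% HDI, the probability of a positive difference, and the effect size µdiff/σdiff), the following is a minimal sketch using only the Python standard library. It is a simplification and not the test used in the thesis: it assumes a normal likelihood with the noninformative prior p(µ, σ²) ∝ 1/σ², and the `diffs` values are made up for illustration.

```python
import math
import random
import statistics


def paired_posterior(diffs, n_samples=20000, seed=0):
    """Draw posterior samples of (mu, sigma) for paired differences.

    Assumed model: normal likelihood, prior p(mu, sigma^2) ∝ 1/sigma^2.
    """
    rng = random.Random(seed)
    n = len(diffs)
    mean = statistics.fmean(diffs)
    var = statistics.variance(diffs)  # sample variance, n - 1 denominator
    samples = []
    for _ in range(n_samples):
        # sigma^2 | data ~ (n - 1) s^2 / chi2_{n-1}, chi2_k = Gamma(k/2, scale=2)
        chi2 = rng.gammavariate((n - 1) / 2, 2.0)
        sigma2 = (n - 1) * var / chi2
        # mu | sigma^2, data ~ Normal(mean, sigma^2 / n)
        mu = rng.gauss(mean, math.sqrt(sigma2 / n))
        samples.append((mu, math.sqrt(sigma2)))
    return samples


def hdi(values, mass=0.95):
    """Narrowest interval containing `mass` of the sampled values."""
    xs = sorted(values)
    k = math.ceil(mass * len(xs))
    start = min(range(len(xs) - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


# Hypothetical per-dataset accuracy differences (method A minus method B).
diffs = [0.31, 0.12, 0.45, 0.28, 0.05, 0.38, 0.22, 0.41, 0.17, 0.33]
post = paired_posterior(diffs)
mus = [m for m, _ in post]
effect = [m / s for m, s in post]          # effect size (mu - 0) / sigma
p_positive = sum(m > 0 for m in mus) / len(mus)
mu_low, mu_high = hdi(mus)
```

Here `p_positive` plays the role of the "probability of outperforming" quoted in the text, and `hdi(effect)` would give the interval for the effect size.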

The second proposed algorithm was a discriminant analysis method aiming to find the dimensions that separate Structured Composite Classes where elements of the objects are drawn from Gaussian distributions. The goal of the method is to retain dimensions necessary for separating the classes, as well as the different nodes within a single object instance. The proposed method solves this by introducing an extra term into the cost function of Subclass Discriminant Analysis (SDA). Furthermore, a novel method for selecting the number of clusters needed by subclass methods was introduced.

The methods were evaluated using two criteria, one measuring the separation between classes, the other measuring it within object instances. The evaluation showed that the two main proposed methods (SCDA and SSCDA) credibly outperformed previous methods, including LDA and SDA, on both metrics. Moreover, the comparison of these two algorithms shows SSCDA clearly outperforming SCDA on at least one of the metrics. Finally, the methods for selecting the number of subclasses performed well, managing to find solutions close to the optimum at considerably lower computation cost.

The third proposed method aims to find the globally optimal arrangement of virtual object categories in a 3D scene. First, an appropriate cost function was constructed, including criteria enforcing the compactness of objects, as well as user-settable criteria, such as required object categories and context rewards. Then, a genetic algorithm and simulated annealing were applied to the problem. To further increase the effectiveness and efficiency of the genetic algorithm, custom initialization, mutation and crossover operators were proposed. Finally, an automated way of determining the context reward coefficients was introduced.
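The general idea of a compactness-aware genetic search can be sketched as follows. This is a toy illustration under stated assumptions, not the cost function or operators of the thesis: the `cost` terms, the weight `lam`, the nearest-neighbor `mutate` operator, the one-point `crossover` and the truncation selection are all hypothetical choices, and nodes are reduced to 2D points with made-up classifier scores.

```python
import math
import random


def cost(labels, scores, pos, lam=1.0):
    """Negative classification score plus a compactness penalty (lower is better)."""
    fit = -sum(scores[i][c] for i, c in enumerate(labels))
    spread = 0.0
    for c in set(labels):
        members = [pos[i] for i, lab in enumerate(labels) if lab == c]
        cx = sum(p[0] for p in members) / len(members)
        cy = sum(p[1] for p in members) / len(members)
        spread += sum(math.dist(p, (cx, cy)) for p in members)
    return fit + lam * spread


def mutate(labels, pos, rng):
    """Assumed operator: a random node copies the label of its nearest neighbor."""
    child = list(labels)
    i = rng.randrange(len(child))
    j = min((k for k in range(len(child)) if k != i),
            key=lambda k: math.dist(pos[i], pos[k]))
    child[i] = child[j]
    return child


def crossover(a, b, rng):
    """Plain one-point crossover."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def optimize(scores, pos, generations=50, pop_size=20, seed=0):
    rng = random.Random(seed)
    n_classes = len(scores[0])
    # Initialize with the node-wise argmax labelling plus random labellings.
    argmax = [max(range(n_classes), key=lambda c: s[c]) for s in scores]
    pop = [argmax] + [[rng.randrange(n_classes) for _ in scores]
                      for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda ind: cost(ind, scores, pos))
        parents = pop[:pop_size // 2]  # elitist truncation selection
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), pos, rng)
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    return min(pop, key=lambda ind: cost(ind, scores, pos))


# Two spatial clusters; node 2 is scored slightly in favor of the wrong class.
pos = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
scores = [[0.9, 0.1], [0.8, 0.2], [0.45, 0.55],
          [0.1, 0.9], [0.2, 0.8], [0.3, 0.7]]
node_wise = [max(range(2), key=lambda c: s[c]) for s in scores]
best = optimize(scores, pos)
```

Because the best individuals always survive, the returned labelling can never cost more than the node-wise labelling it was initialized with; the compactness term is what allows the search to override the weak score of the stray node.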

After evaluating the methods on the datasets, the results clearly show the global optimization scheme outperforming the individual classification scheme. Moreover, the genetic algorithm with custom operators was shown to be far superior to other methods, with each proposed operator making a credibly positive contribution to the final accuracy. Finally, it was shown that the embedding method credibly increases the accuracy of the final arrangement.

We have evaluated the proposed set of algorithms on numerous different datasets, and our results show low error rates achieved at reasonable speed. Since the initial setup of the scene is performed only once at the beginning, a few seconds of processing time is well within the acceptable range. Our experiments also showed that the proposed methods provided an increase in accuracy compared to previous or standard solutions. Thus, the proposed algorithms were shown to be efficient for the problem described above; furthermore, they were shown not to be limited to a single application.

In conclusion, our methods provide a feasible scheme for automatically pairing virtual objects to real placeholders in an Adaptive Mixed Reality environment. It is worth noting that while the error rate of our method is relatively low (0.62% on the real image-based dataset), it still produces erroneous assignments at times. These, however, present only a mild inconvenience to the user, as the occasional error can easily be corrected manually. Without automatic pairing, manual assignment would have to be performed for all virtual objects. For reproducibility, the implementation of these methods, the tests, and the datasets used for evaluation are available online at Szemenyei [37].
