

4.8.4 Comparison of Global and PGlobal Implementations

The aim of this comparison is to reveal the differences between the Global and PGlobal implementations regarding the number of function evaluations. We applied the same configuration for both optimizers; we ran PGlobal on a single thread, the same way as we ran Global. To have sufficient data, we studied 3 different local search algorithms and 63 test functions. We ran every (global optimizer, local optimizer, test function) configuration 100 times to determine the number of function evaluations necessary to find the global optimum. From each set of 100 runs we dropped the unsuccessful results, that is, those using more than 10^5 function evaluations, and calculated the average of the remaining values. We call the ratio of successful runs among all runs the robustness of the optimization. We only studied the configurations that had 100% robustness for both global optimizers.
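This protocol can be summarized in a short sketch. The following Java fragment is illustrative only; runOnce() is a hypothetical stand-in for a single (global optimizer, local optimizer, test function) run that reports its number of function evaluations:

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative sketch of the evaluation protocol; runOnce() is a
    // hypothetical stand-in for one (global optimizer, local optimizer,
    // test function) run reporting its number of function evaluations.
    public class EvaluationProtocol {
        static final int RUNS = 100;
        static final long EVAL_LIMIT = 100_000; // 10^5 evaluation budget

        public static void main(String[] args) {
            List<Long> successful = new ArrayList<>();
            for (int i = 0; i < RUNS; i++) {
                long evals = runOnce();
                if (evals <= EVAL_LIMIT) { // run counted as successful
                    successful.add(evals);
                }
            }
            double robustness = successful.size() / (double) RUNS;
            double avgNfev = successful.stream()
                    .mapToLong(Long::longValue).average().orElse(Double.NaN);
            // Only configurations with robustness == 1.0 for both global
            // optimizers enter the comparison.
            System.out.printf("robustness = %.2f, average NFEV = %.1f%n",
                    robustness, avgNfev);
        }

        static long runOnce() { // hypothetical placeholder
            return 0;
        }
    }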

The following is the configuration of the optimizer algorithm parameters for this test:


    </NewSampleSize>
    <SampleReducingFactor type="double">
        0.03999
    </SampleReducingFactor>
    <LocalOptimizer class="optimizer.local.parallel.<local_optimizer>">
        <MaxFunctionEvaluations type="long">
            100000
        </MaxFunctionEvaluations>
        <RelativeConvergence type="double">
            0.00000001
        </RelativeConvergence>
        <LineSearchFunction class="optimizer.line.parallel.LineSearchImpl">
        </LineSearchFunction>
    </LocalOptimizer>
    <Clusterizer class="<clusterizer>">
        <Alpha type="double">
            0.01
        </Alpha>
    </Clusterizer>
    </Global>
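For illustration, such a configuration file can be inspected with the standard Java DOM API. The following is a minimal sketch only; it assumes the template placeholders (<local_optimizer>, <clusterizer>) have been substituted so that the file, here called config.xml (an assumed name), is well-formed XML. GlobalJ's own configuration loader may work differently:

    import java.io.File;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;

    // Sketch: read one parameter from an assumed config.xml file.
    public class ConfigPeek {
        public static void main(String[] args) throws Exception {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new File("config.xml"));
            String alpha = doc.getElementsByTagName("Alpha")
                    .item(0).getTextContent().trim();
            System.out.println("Alpha = " + Double.parseDouble(alpha));
        }
    }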

Fig. 4.8 Distribution of relative differences between Global and PGlobal

Figure 4.8 shows that in 80% of the compared configurations, the relative difference was lower than 7%. The correlation between the two data vectors is 99.87%. The differences in the results are caused by several factors. The results were produced using random numbers, which alone can cause an error of a few percent. Moreover, the two optimizer processes generate the data in a different manner, which can also cause uncertainties. These differences are hugely amplified by local optimization.

If every local search converged to the global optimum, the number of function evaluations would be dominated by a single local search. In the case of multiple local searches, the exact number is highly uncertain. With random starting points and non-optimized step lengths, the local search converges to a nearly random local optimum. The proportion of function evaluations then approximates the number of local searches; hence the number of function evaluations is unstable in these cases. We observed that on functions that have many local optima added as noise to a slowly varying function, the differences stay in the common range, in contrast to the high differences observed on "flat" and noisy functions. We suspect that this low-noise behavior is caused by the implicit averaging of the gradients along the local search.

[Histogram of relative differences: relative differences (%) on the horizontal axis, in bins of width 2 from −59 to 29; frequencies (0–25) on the vertical axis]

Fig. 4.9 Relative difference in the number of function evaluations of PGlobal compared to Global

In Figure 4.9, most of the relative error results are in the [−7, 3] range. This shows a slight tendency for PGlobal to use fewer function evaluations on average. Most of the extreme values also favor PGlobal.
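For reference, the histogram in Figure 4.9 counts the relative differences (in %) into bins of width 2 ranging from −59 to 29. A minimal sketch of this binning, using placeholder data since the full measurement vector is not reproduced here:

    public class DiffHistogram {
        public static void main(String[] args) {
            // Placeholder data; the real input is the vector of measured
            // relative differences (%) between PGlobal and Global.
            double[] diffs = { -6.7, -3.7, 2.1, -43.4 };
            int lo = -60, width = 2;
            int[] counts = new int[45]; // bins [-60,-58), ..., [28,30)
            for (double d : diffs) {
                int bin = (int) Math.floor((d - lo) / (double) width);
                if (bin >= 0 && bin < counts.length) counts[bin]++;
            }
            for (int b = 0; b < counts.length; b++) {
                // Label each bin by its center: -59, -57, ..., 29.
                System.out.printf("%d%%: %d%n", lo + width * b + 1, counts[b]);
            }
        }
    }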

Finally, we tested the three local search methods to see whether their parallel use on a single core has different efficiency characteristics. The computational test results are summarized in Tables 4.4, 4.5, and 4.6 for the local search algorithms NUnirandi, Unirandi, and Rosenbrock, respectively. The last column of these tables gives the relative difference between the numbers of function evaluations needed by the two compared implementations.

For all three local search techniques, we can draw the main conclusion that the original method and the serialized one do not differ much. In the majority of cases, the relative difference is negligible, below a few percent. The average relative differences are also close to zero, i.e., the underlying algorithm variants are basically well balanced. The relative differences of the summed numbers of function evaluations needed by the three local search methods were −1.08%, −3.20%, and 0.00%, respectively.
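The last column of the tables is taken, as far as the printed values allow us to verify, as the difference of the serialized and the Global evaluation counts relative to the Global count, so it is negative exactly when the parallel version needs fewer evaluations. A minimal sketch under this assumption:

    public class RelativeDifference {
        // Relative difference (%) of the serialized PGlobal NFEV with
        // respect to the Global NFEV; negative favors the parallel version.
        static double relDiffPercent(double serializedNfev, double globalNfev) {
            return 100.0 * (serializedNfev - globalNfev) / globalNfev;
        }

        public static void main(String[] args) {
            // Example row (Discus-5, Table 4.4): 4943.0 vs. 5297.2
            System.out.printf("%.2f%%%n",
                    relDiffPercent(4943.0, 5297.2)); // prints -6.69%
        }
    }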

Table 4.4 NUnirandiCLS results: what is the difference between the Serialized Global and Global in terms of the number of function evaluations? The difference is negative when the parallel version is the better one

Function          Serialized NFEV   Global NFEV   Difference
Beale                   672.9            792.6      −15.12%
Booth                   605.3            628.0       −3.61%
Branin                  528.0            559.7       −5.65%
Cigar-5                1547.1           1518.1        1.91%
Colville               2103.4           2080.7        1.09%
Discus-40            27,131.8         27,586.0       −1.65%
Discus-rot-40        26,087.3         26,927.7       −3.12%
Discus-5               4943.0           5297.2       −6.69%
Discus-rot-5           4758.9           4940.1       −3.67%
Easom                  1795.6           1708.8        5.08%
Ellipsoid-5            4476.1           4567.3       −2.00%
Griewank-20            8661.7           8005.2        8.20%
Hartman-3               605.7            644.6       −6.03%
Matyas                  625.4            647.2       −3.37%
Rosenbrock-5           4331.5           3664.0       18.22%
Shubert                 545.9            963.9      −43.37%
Six hump                502.9            524.9       −4.20%
Sphere-40              4169.8           4178.8       −0.22%
Sphere-5                827.2            853.1       −3.03%
Sum Squares-40       12,370.8         12,495.6       −1.00%
Sum Squares-5           881.0            915.9       −3.81%
Sum Squares-60       21,952.9         21,996.8       −0.20%
Trid                   3200.8           3095.2        3.41%
Zakharov-40          17,958.9         18,334.0       −2.05%
Zakharov-5              984.7           1009.5       −2.46%
Average                6090.7           6157.4       −2.74%

Table 4.5 UnirandiCLS results: what is the difference between the Serialized Global and Global in terms of the number of function evaluations? The difference is negative when the parallel version is the better one

Function          Serialized NFEV   Global NFEV   Difference
Beale                   762.3           1060.1      −28.10%
Booth                   576.6            600.8       −4.02%
Branin                  516.1            540.7       −4.55%
Discus-rot-40        33,824.2         35,055.3       −3.51%
Discus-5             18,605.9         19,343.3       −3.81%
Discus-rot-5         14,561.8         15,513.8       −6.14%
Goldstein Price         502.4            584.2      −13.99%
Griewank-20            9847.8         10,185.8       −3.32%
Matyas                  615.2            646.1       −4.78%
Shubert                 517.0            895.1      −42.24%
Six hump                480.6            501.0       −4.06%
Sphere-40              4083.1           4118.7       −0.86%
Sphere-5                781.9            794.3       −1.56%
Sum Squares-40       24,478.5         24,272.1        0.85%
Sum Squares-5           856.2            867.9       −1.35%
Zakharov-40          20,431.5         20,811.9       −1.83%
Zakharov-5              953.4            983.9       −3.10%
Average                7787.9           8045.6       −7.43%

Table 4.6 RosenbrockCLS results: what is the difference between the Serialized Global and Global in terms of the number of function evaluations? The difference is negative when the parallel version is the better one

Function          Serialized NFEV   Global NFEV   Difference
Beale                   709.6            831.9      −14.70%
Booth                   593.8            616.5       −3.68%
Branin                  599.9            620.4       −3.30%
Six hump                546.5            563.5       −3.02%
Cigar-5                1536.9           1600.0       −3.94%
Cigar-rot-5            3438.6           3551.2       −3.17%
Colville               2221.4           2307.7       −3.74%
Discus-40            30,721.0         31,059.7       −1.09%
Discus-rot-40        30,685.2         30,960.3       −0.89%
Discus-5               2946.2           3113.2       −5.36%
Discus-rot-5           2924.3           3085.3       −5.22%
Discus-rot-60        47,740.6         48,086.6       −0.72%
Easom                  4664.2         11,178.8      −58.28%
Ellipsoid-5            4493.3           4509.8       −0.37%
Goldstein Price         569.2            693.0      −17.87%
Griewank-20          12,647.5         12,222.8        3.47%
Hartman-3               975.2           1040.7       −6.29%
Hartman-6              3047.7           2493.5       22.23%
Matyas                  628.5            651.8       −3.58%
Powell-24            42,488.6         43,425.8       −2.16%
Powell-4               1950.9           2006.7       −2.78%
Rosenbrock-5           4204.7           3527.7       19.19%
Shekel-5               4790.0           3775.0       26.89%
Shubert                 553.5           1153.3      −52.01%
Sphere-40              7788.6           7839.7       −0.65%
Sphere-5                905.0            924.2       −2.08%
Sum Squares-40       30,688.5         30,867.8       −0.58%
Sum Squares-5           970.1           1005.3       −3.50%
Sum Squares-60       71,891.9         72,063.5       −0.24%
Trid                   3919.0           3925.0       −0.15%
Zakharov-40          34,177.5         35,605.0       −4.01%
Zakharov-5             1123.7           1178.1       −4.62%
Zakharov-60          80,742.9         82,393.3       −2.00%
Average              13,269.2         13,269.0       −4.19%

4.9 Conclusions

This chapter provided the considerations along which we have designed and implemented the parallel version of the GlobalJ algorithm. Our main aim was to produce a code that is capable of utilizing the widely available computer architectures that support efficient parallelization. Careful testing confirmed our expectations and proved that the parallel implementation, PGlobal, can utilize multiple-core computer architectures. For easy-to-solve problems with low computational cost, PGlobal may show weaker efficiency. However, for computationally expensive objective functions and for difficult-to-solve problems, the parallel version of Global can achieve a nearly linear speedup ratio, i.e., the total solution time can more or less be divided by the number of available CPU cores. We have also checked the costs of parallelization the other way around: according to our computational tests, the parallel implementations of the local search algorithms mostly needed somewhat fewer function evaluations than their serial counterparts when run on a single core.

Chapter 5

Example

5.1 Environment

Before we can start working with the GLOBAL optimizer package, we must set up a proper environment. The package uses the Java 8 virtual machine. To use the package with compiled objective functions, the Java 8 Runtime Environment (JRE) is sufficient. However, in the common case the objective function is not compiled, which implies the need for the Java 8 Development Kit (JDK). Both systems can be downloaded from https://java.com/en/.
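The installation can be verified with a short, illustrative check (not part of the GLOBAL package): compile it with javac VersionCheck.java and run it with java VersionCheck.

    public class VersionCheck {
        public static void main(String[] args) {
            // Java 8 reports the specification version "1.8".
            String version = System.getProperty("java.specification.version");
            System.out.println("Java specification version: " + version);
        }
    }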