
(1)

On Benchmarking Frequent Itemset Mining Algorithms

Balázs Rácz, Ferenc Bodon, Lars Schmidt-Thieme

Budapest University of Technology and Economics

Computer and Automation Research Institute of the Hungarian Academy of Sciences

Computer-Based New Media Group, Institute for Computer Science

(2)

History

Over 100 papers on Frequent Itemset Mining

Many of them claim to be the ‘best’

Based on benchmarks run against some publicly available implementation on some datasets

FIMI'03 and FIMI'04 workshops: extensive benchmarks with many implementations and datasets

They have served as a guideline ever since

How ‘fair’ was the benchmark, and what did it measure?

(3)

On FIMI contests

Problem 1: We are interested in the quality of algorithms, but we can only measure implementations.

No good theoretical data model yet for analytical comparison

We’ll see later: a good hardware model would also be needed

Problem 2: If we gave our algorithms and ideas to a very talented and experienced low-level programmer, they could completely redraw the current FIMI rankings.

A FIMI contest is all about the ‘constant factor’

(4)

On FIMI contests (2)

Problem 3: Seemingly unimportant implementation details can hide all algorithmic features when benchmarking.

These details are often unnoticed even by the author and almost never published.

(5)

On FIMI contests (3)

Problem 4: FIM implementations are complete ‘suites’ of a basic algorithm and several algorithmic/implementational optimizations. Comparing such complete ‘suites’ tells us what is fast, but does not tell us why.

Recommendation:

Modular programming

Benchmarks on the individual features

(6)

On FIMI contests (4)

Problem 5: The run time of all ‘dense’ mining tasks is dominated by I/O.

Problem 6: On ‘dense’ datasets, FIMI benchmarks measure the submitters’ ability to code a fast integer-to-string conversion function.

Recommendation:

Have as much identical code as possible → a library of FIM functions

(7)

On FIMI contests (5)

Problem 7: Run time differences are small

Problem 8: Run time varies from run to run

The very same executable on the very same input

Bug or feature of modern hardware?

What to measure?

Recommendation: ‘winner takes all’ evaluation of a mining task is unfair

(8)

On FIMI contests (6)

Problem 9: Traditional run-time (and memory) benchmarks do not tell us whether an implementation is better than another in algorithmic aspects or in implementational (hardware-friendliness) aspects.

Problem 10: Traditional benchmarks do not show whether the conclusions would still hold on a slightly different hardware architecture (e.g. AMD vs. Intel).

Recommendation: extend benchmarks

(9)

Library and pluggability

Code reuse, pluggable components, data structures

Object oriented design

Do not sacrifice efficiency

No virtual method calls allowed in the core

Then how?

C++ templates

Allow pluggability with inlining (see the sketch below)

Plugging requires source code change, but several versions can coexist

Sometimes tricky to code with templates
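A minimal sketch of this template-based pluggability, using illustrative names (SimpleCounter and process_transaction are not from the original library): the component is passed as a template argument, so its calls can be inlined instead of going through virtual dispatch.

#include <cstdio>
#include <vector>

// A pluggable component: per-item occurrence counting. A different policy
// class with the same interface could be swapped in by changing only the
// template argument at the call site.
class SimpleCounter {
public:
    explicit SimpleCounter(std::size_t n) : counts_(n, 0) {}
    void add(unsigned item) { ++counts_[item]; }
    unsigned long long total() const {
        unsigned long long s = 0;
        for (unsigned long long c : counts_) s += c;
        return s;
    }
private:
    std::vector<unsigned long long> counts_;
};

// The core is parameterized by the counter policy: no virtual method calls,
// so counter.add() can be fully inlined by the compiler.
template <class Counter>
void process_transaction(const std::vector<unsigned>& transaction,
                         Counter& counter) {
    for (unsigned item : transaction) counter.add(item);
}

int main() {
    SimpleCounter counter(100);
    std::vector<unsigned> t = {1, 5, 7, 42};
    process_transaction(t, counter);  // component plugged at compile time
    std::printf("items counted: %llu\n", counter.total());
}

Swapping the component requires a source-level change of the template argument, but several instantiations can coexist in one binary, which is exactly the trade-off listed above.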

(10)

I/O efficiency

Variations of the output routine (see the sketch after this list):

normal-simple: renders each itemset and each item separately to text

normal-cache: caches the string representation of item identifiers

df-buffered: (depth-first) reuses the string representation of the last line and appends the last item

df-cache: like df-buffered, but also caches the string representation of item identifiers
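A hedged sketch of the idea behind the cached and depth-first buffered variants, with illustrative names (ItemStringCache and DfWriter are not the authors' code): item identifiers are rendered to text once and cached, and the depth-first writer keeps the previous line as a prefix, appending or truncating one item at a time.

#include <cstdio>
#include <string>
#include <vector>

// Cache of the textual form of each item identifier (the idea behind the
// *-cache variants): the integer-to-string conversion is done once per item,
// not once per occurrence in the output.
class ItemStringCache {
public:
    explicit ItemStringCache(unsigned max_item) {
        cache_.reserve(max_item + 1);
        for (unsigned i = 0; i <= max_item; ++i)
            cache_.push_back(std::to_string(i));
    }
    const std::string& operator[](unsigned item) const { return cache_[item]; }
private:
    std::vector<std::string> cache_;
};

// Depth-first buffered output (the idea behind df-buffered/df-cache): the
// current line is kept as a string; extending the itemset appends one item,
// backtracking truncates it.
class DfWriter {
public:
    DfWriter(std::FILE* out, const ItemStringCache& cache)
        : out_(out), cache_(cache) {}
    void push(unsigned item) {
        lengths_.push_back(line_.size());
        line_ += cache_[item];
        line_ += ' ';
    }
    void pop() { line_.resize(lengths_.back()); lengths_.pop_back(); }
    void write(unsigned long long support) const {
        std::fprintf(out_, "%s(%llu)\n", line_.c_str(), support);
    }
private:
    std::FILE* out_;
    const ItemStringCache& cache_;
    std::string line_;
    std::vector<std::size_t> lengths_;
};

int main() {
    ItemStringCache cache(100);
    DfWriter writer(stdout, cache);
    writer.push(3); writer.write(10);                 // prints "3 (10)"
    writer.push(7); writer.write(6);                  // prints "3 7 (6)"
    writer.pop();   writer.push(9); writer.write(4);  // prints "3 9 (4)"
}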

(11)

[Chart: decoder-test; time in seconds (log scale) for the output variants df-buffered, df-cache, normal-cache and normal-simple]

(12)

Benchmarking: desiderata

1. The benchmark should be stable and reproducible. Ideally it should have no variation, certainly not on the same hardware.

2. The benchmark numbers should reflect the actual performance. The benchmark should be a fairly accurate model of actual hardware.

3. The benchmark should be hardware-independent, in the sense that it should be stable against slight variations of the underlying hardware architecture, like changing the processor manufacturer or model.

(13)

Benchmarking: reality

Different implementations stress different aspects of the hardware

Migrating to other hardware:

May be better in one aspect, worse in another one

Ranking cannot be migrated between HW

Complex benchmark results are necessary

Win due to algorithmic or HW-friendliness reason?

Performance is not as simple as ‘run time in seconds’

(14)

Benchmark platform

Virtual machine

How to define?

How to code the implementations?

Cost function?

Instrumentation (simulation of actual CPU)

Slow (100-fold slower than plain run time)

Accuracy?

Cost function?

(15)

Benchmark platform (2)

Run-time measurement

Performance counters

Present in all modern processors (since the i586)

Count performance-related events in real time (see the sketch after this list)

PerfCtr kernel patch under Linux, vendor-specific software under Windows

Problem: measured numbers reflect the actual execution, thus are subject to variation
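As a hedged illustration only (the original benchmarks used the PerfCtr kernel patch, not this interface), here is a minimal sketch that counts retired instructions for a code region via the perf_event_open system call of current Linux kernels; repeating the run on the same input typically shows the variation mentioned above.

#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Open a hardware counter for the calling thread (user-space events only).
static int open_counter(std::uint64_t config) {
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = config;
    attr.disabled = 1;
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;
    return static_cast<int>(syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0));
}

int main() {
    int fd = open_counter(PERF_COUNT_HW_INSTRUCTIONS);
    if (fd < 0) { std::perror("perf_event_open"); return 1; }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

    volatile long sink = 0;                    // workload under measurement
    for (long i = 0; i < 10000000; ++i) sink += i;

    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    std::uint64_t count = 0;
    if (read(fd, &count, sizeof(count)) != static_cast<ssize_t>(sizeof(count)))
        return 1;
    std::printf("retired instructions: %llu\n",
                static_cast<unsigned long long>(count));
    close(fd);
}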

(16)

[Chart: BMS-POS.dat; time in seconds (log scale) for apriori-noprune, eclat-cover, eclat-diffset, nonordfp-classic-td, nonordfp-dense and nonordfp-sparse]

(17)

[Stacked bar chart: GClockticks (0 to 60), all uops on BMS-POS at support 1000; legend: 3 uops/tick, 2 uops/tick, 1 uop/tick, stall, bogus uops, nbogus uops, prefetch pending, r/w pending]

Three sets of bars per implementation:

Wide, centered: the total size shows the total clockticks used, i.e. run time; purple shows the time of stalls (CPU waiting for something)

Narrow, centered: brown shows the number of micro-ops (uops) executed, which is stable; cyan shows uops wasted due to branch mispredictions

Narrow, right: light brown shows ticks of memory reads/writes (mostly waiting); black shows read-ahead (prefetch)

(18)

[Chart repeated without annotations: GClockticks, all uops on BMS-POS at support 1000]

(19)

Conclusion

We cannot measure algorithms, only implementations

Modular implementations with pluggable features

Shared code for the common functionality (like I/O)

FIMI library with C++ templates

Benchmark: run time varies, depends on hardware used

Complex benchmarks needed

Conclusions on algorithmic aspects or hardware friendliness?

(20)

Thank you for your attention

Big question: how does the choice of compiler influence the performance and the ranking?
