Department of Computer Science, University of Waikato, New Zealand
Eibe Frank
WEKA: A Machine Learning Toolkit
The Explorer
• Classification and Regression
• Clustering
• Association Rules
• Attribute Selection
• Data Visualization
The Experimenter
The Knowledge Flow GUI
Conclusions
Machine Learning with WEKA
03/25/23 University of Waikato 2
WEKA: the bird
Copyright: Martin Kramer (mkramer@wxs.nl)
WEKA: the software
Machine learning/data mining software written in Java (distributed under the GNU General Public License)
Used for research, education, and applications
Complements “Data Mining” by Witten & Frank
Main features:
Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
Graphical user interfaces (incl. data visualization)
Environment for comparing learning algorithms
WEKA: versions
There are several versions of WEKA:
WEKA 3.0: “book version” compatible with description in data mining book
WEKA 3.2: “GUI version” adds graphical user interfaces (book version is command-line only)
WEKA 3.3: “development version” with lots of improvements
This talk is based on the latest snapshot of WEKA 3.3 (soon to be WEKA 3.4)
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
WEKA only deals with “flat” files
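To make the flat-file format concrete, here is a minimal Python sketch of a reader for files like the one above. It is not WEKA's own loader: the function name `parse_arff` is hypothetical, and the sketch assumes only `numeric` and nominal `{...}` attributes, comma-separated rows without quoting, and `?` for missing values.

```python
def parse_arff(text):
    """Parse a simplified ARFF string into (relation, attributes, rows)."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):    # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = []
            for (name, kind), v in zip(attributes, values):
                if v == "?":
                    row.append(None)             # missing value
                elif kind == "numeric":
                    row.append(float(v))
                else:
                    row.append(v)                # nominal value kept as a string
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                # Nominal attribute: store the list of allowed values.
                kind = [s.strip() for s in spec.strip("{}").split(",")]
            else:
                kind = spec.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


SAMPLE = """@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute cholesterol numeric
@attribute class { present, not_present}
@data
63,male,233,not_present
38,female,?,not_present
"""

relation, attributes, rows = parse_arff(SAMPLE)
```

Each row becomes a plain list with one entry per attribute, which is what "flat" means here: one table, no nested or relational structure.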
Explorer: pre-processing the data
Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
Data can also be read from a URL or from an SQL database (using JDBC)
Pre-processing tools in WEKA are called “filters”
WEKA contains filters for:
Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
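As an illustration of what two such filters do, the following Python sketch normalizes a numeric attribute to [0, 1] and discretizes it into equal-width bins. The function names are hypothetical and this is not WEKA's filter code; the three cholesterol values are taken from the ARFF example earlier in the talk.

```python
def normalize(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant attribute: map everything to 0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize_equal_width(values, bins):
    """Map each numeric value to a bin index 0..bins-1 of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), bins - 1) for v in values]


cholesterol = [233, 286, 229]
scaled = normalize(cholesterol)
binned = discretize_equal_width(cholesterol, 3)
```

Equal-width binning is only one possible discretization strategy; it is used here because it needs no class information.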
Explorer: building “classifiers”
Classifiers in WEKA are models for predicting nominal or numeric quantities
Implemented learning schemes include:
Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
“Meta”-classifiers include:
Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …
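To show what one of the listed scheme families looks like in code, here is a minimal Python sketch of an instance-based (k-nearest-neighbour) classifier. It is an illustration only, not WEKA's implementation: `knn_predict` is a hypothetical name, features are used unscaled, and the last two training rows are made up for the example (the first three reuse age and cholesterol from the ARFF slide).

```python
from collections import Counter
import math


def knn_predict(train, query, k=3):
    """Predict the majority class among the k training instances nearest to query.

    train is a list of (feature_vector, label) pairs with numeric features.
    """
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# (age, cholesterol) -> class; the last two rows are hypothetical.
train = [
    ((63, 233), "not_present"),
    ((67, 286), "present"),
    ((67, 229), "present"),
    ((41, 204), "not_present"),
    ((45, 210), "not_present"),
]

older_patient = knn_predict(train, (66, 280), k=3)
younger_patient = knn_predict(train, (43, 205), k=3)
```

In practice the attributes would usually be normalized first, since cholesterol otherwise dominates the Euclidean distance.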
Explorer: clustering data
WEKA contains “clusterers” for finding groups of similar instances in a dataset
Implemented schemes are:
k-Means, EM, Cobweb, X-means, FarthestFirst
Clusters can be visualized and compared to “true” clusters (if given)
Evaluation based on log-likelihood if clustering scheme produces a probability distribution
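The simplest of the listed schemes, k-Means, can be sketched in a few lines of Python. This is a plain Lloyd-style iteration under stated assumptions, not WEKA's clusterer: the centroids are initialized naively from the first k points, and the six 2-d points are made up.

```python
import math


def k_means(points, k, iterations=100):
    """Alternate assignment and centroid update; returns (centroids, assignments)."""
    centroids = [tuple(p) for p in points[:k]]    # naive init: first k points
    assignments = None
    for _ in range(iterations):
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:         # no change: converged
            break
        assignments = new_assignments
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:                            # keep old centroid if cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 1.5), (9.5, 8.0)]
centroids, assignments = k_means(points, k=2)
```

If "true" cluster labels are available, the returned assignments can be compared against them directly.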
Explorer: finding associations
WEKA contains an implementation of the Apriori algorithm for learning association rules
Works only with discrete data
Can identify statistical dependencies between groups of attributes:
milk, butter ⇒ bread, eggs (with confidence 0.9 and support 2000)
Apriori can compute all rules that have a given minimum support and exceed a given confidence
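Support and confidence for a single rule can be computed as in the Python sketch below. It only evaluates one given rule and does not perform Apriori's candidate generation; the helper names and the five transactions are hypothetical. Support is counted as an absolute number of transactions, matching the "support 2000" style used above.

```python
def support_count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t))


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = support_count(transactions, set(antecedent) | set(consequent))
    base = support_count(transactions, antecedent)
    return both / base if base else 0.0


transactions = [
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread"},
    {"milk", "eggs"},
    {"bread", "eggs"},
]

rule_support = support_count(transactions, {"milk", "butter", "bread", "eggs"})
rule_confidence = confidence(transactions, {"milk", "butter"}, {"bread", "eggs"})
```

A rule is reported only if its support reaches the minimum support and its confidence exceeds the chosen threshold.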
Explorer: attribute selection
Panel that can be used to investigate which (subsets of) attributes are the most predictive ones
Attribute selection methods contain two parts:
A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
An evaluation method: correlation-based, wrapper, information gain, chi-squared, …
Very flexible: WEKA allows (almost) arbitrary combinations of these two
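One such combination, a ranking search with an information-gain evaluator, can be sketched in Python as follows. This is an illustration for nominal attributes only, not WEKA's code; the function names and the six rows are made up.

```python
from collections import Counter
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr_index, class_index=-1):
    """Class entropy minus the expected entropy after splitting on one attribute."""
    labels = [r[class_index] for r in rows]
    remainder = 0.0
    for value in set(r[attr_index] for r in rows):
        subset = [r[class_index] for r in rows if r[attr_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


def rank_attributes(rows, names):
    """Evaluate each attribute on its own and sort by decreasing gain."""
    gains = [(information_gain(rows, i), name) for i, name in enumerate(names)]
    return sorted(gains, reverse=True)


# Columns: sex, exercise_induced_angina, class (hypothetical rows).
rows = [
    ("male", "yes", "present"),
    ("male", "yes", "present"),
    ("female", "no", "not_present"),
    ("male", "no", "not_present"),
    ("female", "yes", "present"),
    ("female", "no", "not_present"),
]

ranked = rank_attributes(rows, ["sex", "exercise_induced_angina"])
```

Swapping in a different evaluator only requires replacing `information_gain`; the ranking step stays the same, which mirrors the search/evaluator split described above.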
Explorer: data visualization
Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem
WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)
To do: rotating 3-d visualizations (Xgobi-style)
Color-coded class values
“Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
“Zoom-in” function
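The idea behind jitter can be sketched in Python: a small random offset is added to each plotted coordinate so that points sharing the same nominal value no longer overlap exactly. The function name, the noise range, and the fixed seed are assumptions for illustration, not WEKA's implementation.

```python
import random


def jitter(values, amount, seed=0):
    """Add uniform noise in [-amount, amount] to each plotted coordinate."""
    rng = random.Random(seed)          # fixed seed so the plot is reproducible
    return [v + rng.uniform(-amount, amount) for v in values]


codes = [0, 0, 0, 1, 1]                # numeric codes of a nominal attribute
plotted = jitter(codes, amount=0.2)
```

Only the displayed positions change; the underlying data values stay untouched.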
Performing experiments
Experimenter makes it easy to compare the performance of different learning schemes
For classification and regression problems
Results can be written into file or database
Evaluation options: cross-validation, learning curve, hold-out
Can also iterate over different parameter settings
Significance-testing built in!
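The two ingredients named above, cross-validation and significance testing, can be sketched in Python as follows. The sketch builds k-fold index splits and computes a plain paired t statistic over per-fold scores; the function names and the accuracy figures are hypothetical, and the test WEKA's Experimenter actually applies may differ from this one.

```python
import math
import statistics


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k (train, test) pairs."""
    folds = [list(range(i, n, k)) for i in range(k)]
    return [
        ([j for f in folds[:i] + folds[i + 1:] for j in f], folds[i])
        for i in range(k)
    ]


def paired_t_statistic(scores_a, scores_b):
    """t statistic of the per-fold differences between two schemes.

    Assumes the differences are not all identical (non-zero standard deviation).
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)       # sample standard deviation
    return mean / (sd / math.sqrt(len(diffs)))


splits = k_fold_indices(10, 5)

# Hypothetical per-fold accuracies of two learning schemes on the same folds.
accuracy_a = [0.80, 0.82, 0.78, 0.81, 0.79]
accuracy_b = [0.75, 0.80, 0.74, 0.78, 0.76]
t = paired_t_statistic(accuracy_a, accuracy_b)
```

The statistic is compared against a t distribution with (number of folds minus 1) degrees of freedom to decide whether the difference is significant.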
The Knowledge Flow GUI
New graphical user interface for WEKA
Java-Beans-based interface for setting up and running machine learning experiments
Data sources, classifiers, etc. are beans and can be connected graphically
Data “flows” through components: e.g., “data source” -> “filter” -> “classifier” -> “evaluator”
Layouts can be saved and loaded again later
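The flow idea can be mimicked in Python by chaining generators, one per component. This is only an analogy to the bean-based GUI, not its API: all four component functions are hypothetical, the "classifier" is a fixed threshold rather than a learned model, and the cholesterol values come from the ARFF example earlier in the talk.

```python
def data_source():
    """Source: yields (cholesterol, class) pairs; None marks a missing value."""
    yield from [
        (233, "not_present"),
        (286, "present"),
        (229, "present"),
        (None, "not_present"),
    ]


def drop_missing(stream):
    """Filter: pass on only instances without missing values."""
    return (inst for inst in stream if inst[0] is not None)


def threshold_classifier(stream, threshold=250):
    """Classifier stand-in: yields (predicted, actual) for each instance."""
    for value, actual in stream:
        yield ("present" if value > threshold else "not_present", actual)


def evaluator(stream):
    """Evaluator: fraction of predictions that match the actual class."""
    results = list(stream)
    return sum(p == a for p, a in results) / len(results)


# "data source" -> "filter" -> "classifier" -> "evaluator"
accuracy = evaluator(threshold_classifier(drop_missing(data_source())))
```

Each stage consumes the output of the previous one, so components can be rearranged or replaced without touching the rest of the chain.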