Machine Learning with WEKA

Department of Computer Science, University of Waikato, New Zealand

Eibe Frank

WEKA: A Machine Learning Toolkit

The Explorer

Classification and Regression

Clustering

Association Rules

Attribute Selection

Data Visualization

The Experimenter

The Knowledge Flow GUI

Conclusions

03/25/23 University of Waikato 2

WEKA: the bird

Copyright: Martin Kramer (mkramer@wxs.nl)

WEKA: the software

• Machine learning/data mining software written in Java (distributed under the GNU General Public License)
• Used for research, education, and applications
• Complements “Data Mining” by Witten & Frank
• Main features:
  - Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
  - Graphical user interfaces (incl. data visualization)
  - Environment for comparing learning algorithms

WEKA: versions

• There are several versions of WEKA:
  - WEKA 3.0: “book version”, compatible with the description in the data mining book
  - WEKA 3.2: “GUI version”, adds graphical user interfaces (the book version is command-line only)
  - WEKA 3.3: “development version” with lots of improvements
• This talk is based on the latest snapshot of WEKA 3.3 (soon to be WEKA 3.4)

@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...

WEKA only deals with “flat” files
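
To make the flat-file format concrete, here is a minimal Python sketch of how such an ARFF file could be read. It is not WEKA's own loader: the function name `parse_arff` and the choice to represent a missing value (`?`) as `None` are assumptions made for illustration, and only numeric and nominal attributes are handled.

```python
def parse_arff(text):
    """Parse a small subset of ARFF: numeric and nominal attributes only."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):      # skip blanks and comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = {}
            for (name, kind), value in zip(attributes, values):
                if value == "?":                  # "?" marks a missing value
                    row[name] = None
                elif kind == "numeric":
                    row[name] = float(value)
                else:
                    row[name] = value             # nominal: keep the label
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):              # nominal: list of labels
                kind = [v.strip() for v in spec.strip("{} ").split(",")]
            elif spec.lower() in ("numeric", "real", "integer"):
                kind = "numeric"
            else:
                raise ValueError("unsupported attribute type: " + spec)
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


EXAMPLE = """\
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""

relation, attributes, rows = parse_arff(EXAMPLE)
print(relation, len(attributes), len(rows))
print(rows[3])
```

Each data row becomes one dictionary with one entry per attribute, which is what "flat" means here: a single table with no nested or relational structure.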

Explorer: pre-processing the data

• Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
• Data can also be read from a URL or from an SQL database (using JDBC)
• Pre-processing tools in WEKA are called “filters”
• WEKA contains filters for:
  - Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
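
As a rough illustration of what two of these filters do, the Python sketch below implements min-max normalization and equal-width discretization. The function names are hypothetical and this is not WEKA code; WEKA's own filters may differ in defaults and edge-case handling.

```python
def normalize(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant attribute: map everything to 0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, bins):
    """Equal-width binning: replace each value by a bin index 0..bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / bins
    # The maximum would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), bins - 1) for v in values]


cholesterol = [233, 286, 229]         # cholesterol values from the ARFF example
print(normalize(cholesterol))
print(discretize(cholesterol, bins=3))
```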

Explorer: building “classifiers”

• Classifiers in WEKA are models for predicting nominal or numeric quantities
• Implemented learning schemes include:
  - Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
• “Meta”-classifiers include:
  - Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …
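
To give a feel for one of the listed families, the following Python sketch shows an instance-based (k-nearest-neighbour) classifier predicting a nominal class. The function name `knn_predict` and the toy data are assumptions for illustration only; this is not how WEKA implements its instance-based learners.

```python
from collections import Counter
import math


def knn_predict(train, query, k=3):
    """Predict the class of `query` by majority vote among its k nearest
    training instances (Euclidean distance on numeric attributes)."""
    by_distance = sorted(train, key=lambda inst: math.dist(inst[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Toy data, (age, cholesterol) -> class. Illustrative values only.
train = [
    ((63, 233), "not_present"),
    ((38, 210), "not_present"),
    ((41, 204), "not_present"),
    ((67, 286), "present"),
    ((67, 229), "present"),
    ((62, 268), "present"),
]
print(knn_predict(train, (65, 280)))
print(knn_predict(train, (40, 205), k=1))
```

An instance-based learner stores the training data and defers all work to prediction time, which is why there is no separate training step in the sketch.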

Explorer: clustering data

• WEKA contains “clusterers” for finding groups of similar instances in a dataset
• Implemented schemes are:
  - k-Means, EM, Cobweb, X-means, FarthestFirst
• Clusters can be visualized and compared to “true” clusters (if given)
• Evaluation based on log-likelihood if clustering scheme produces a probability distribution
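
The idea behind the first scheme in the list, k-means, can be sketched in a few lines of Python. This is a plain textbook version with caller-supplied starting centroids, written for illustration; the function name and data are assumptions, and it makes no claim about WEKA's implementation details.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-d points, starting from the given centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                new_centroids.append(old)   # empty cluster keeps its centroid
        if new_centroids == centroids:      # nothing moved: converged
            break
        centroids = new_centroids
    return centroids, clusters


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, [(0, 0), (10, 10)])
print(centroids)
print(clusters)
```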

Explorer: finding associations

• WEKA contains an implementation of the Apriori algorithm for learning association rules
• Works only with discrete data
• Can identify statistical dependencies between groups of attributes:
  - milk, butter ⇒ bread, eggs (with confidence 0.9 and support 2000)
• Apriori can compute all rules that have a given minimum support and exceed a given confidence
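
The two quantities attached to a rule can be computed directly from a list of transactions. The Python sketch below shows only that calculation, not the Apriori search itself; the helper names and the market baskets are made up for illustration. Support is taken as a count of transactions, matching the "support 2000" style used above.

```python
def support(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t))


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing the antecedent that also contain
    the consequent: support(A and C) / support(A)."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical market baskets, made up for illustration.
baskets = [
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread"},
    {"milk", "eggs"},
    {"bread"},
]
rule = ({"milk", "butter"}, {"bread", "eggs"})
rule_support = support(baskets, rule[0] | rule[1])
rule_confidence = confidence(baskets, *rule)
print(rule_support, rule_confidence)
```

Here three baskets contain milk and butter, and two of those also contain bread and eggs, so the rule has support 2 and confidence 2/3.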

Explorer: attribute selection

• Panel that can be used to investigate which (subsets of) attributes are the most predictive ones
• Attribute selection methods contain two parts:
  - A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
  - An evaluation method: correlation-based, wrapper, information gain, chi-squared, …
• Very flexible: WEKA allows (almost) arbitrary combinations of these two
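
One such combination, the "ranking" search paired with the information gain evaluator, can be sketched in Python as follows. The function names and the four toy instances are assumptions for illustration; the attribute names are borrowed from the ARFF example but the values are made up.

```python
from collections import Counter
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in class entropy obtained by splitting on `attribute`."""
    before = entropy([r[target] for r in rows])
    after = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        after += len(subset) / len(rows) * entropy(subset)
    return before - after


def rank_attributes(rows, target):
    """Ranking search: score every attribute on its own, best first."""
    names = [a for a in rows[0] if a != target]
    return sorted(names, key=lambda a: information_gain(rows, a, target),
                  reverse=True)


# Made-up nominal instances for illustration.
rows = [
    {"sex": "male", "exercise_induced_angina": "yes", "class": "present"},
    {"sex": "male", "exercise_induced_angina": "yes", "class": "present"},
    {"sex": "female", "exercise_induced_angina": "no", "class": "not_present"},
    {"sex": "male", "exercise_induced_angina": "no", "class": "not_present"},
]
ranking = rank_attributes(rows, "class")
print(ranking)
```

Keeping the evaluator (`information_gain`) separate from the search (`rank_attributes`) mirrors the two-part design described above: either half could be swapped out without touching the other.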

Explorer: data visualization

• Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem
• WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)
• To do: rotating 3-d visualizations (Xgobi-style)
• Color-coded class values
• “Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
• “Zoom-in” function
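
Why jitter helps with nominal attributes can be seen in a small Python sketch: instances that share the same nominal values are drawn on top of each other, and a little random noise separates them. The function name, the noise range and the fixed seed are assumptions for illustration, not WEKA's actual jitter settings.

```python
import random


def jitter(points, amount, seed=0):
    """Shift every 2-d point by uniform noise in [-amount, amount] per axis."""
    rng = random.Random(seed)         # fixed seed keeps the plot reproducible
    return [(x + rng.uniform(-amount, amount),
             y + rng.uniform(-amount, amount)) for x, y in points]


# Three instances with identical nominal values, coded as indices (1, 0).
points = [(1, 0), (1, 0), (1, 0)]
jittered = jitter(points, amount=0.2)

print(len(set(points)))     # distinct plot positions before jitter
print(len(set(jittered)))   # distinct plot positions after jitter
```

Without jitter the three instances occupy a single plot position; with it, each "hidden" point gets its own position close to the original one.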

Performing experiments

• Experimenter makes it easy to compare the performance of different learning schemes
• For classification and regression problems
• Results can be written into file or database
• Evaluation options: cross-validation, learning curve, hold-out
• Can also iterate over different parameter settings
• Significance-testing built in!
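
The kind of comparison the Experimenter automates can be sketched in Python: run two learning schemes through the same cross-validation folds, then test whether the per-fold difference is significant. Everything here is an illustrative assumption: the two toy learners, the data, and the use of a plain paired t statistic, which may not be the exact test WEKA applies.

```python
import math
from collections import Counter


def cross_validate(learn, data, folds):
    """Accuracy on each fold of a k-fold cross-validation.
    `learn(train)` must return a function mapping an instance to a label."""
    accuracies = []
    for i in range(folds):
        test = data[i::folds]                          # every folds-th instance
        train = [d for j, d in enumerate(data) if j % folds != i]
        predict = learn(train)
        correct = sum(predict(x) == y for x, y in test)
        accuracies.append(correct / len(test))
    return accuracies


def paired_t(a, b):
    """t statistic of the paired per-fold differences a[i] - b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if variance == 0:
        return 0.0 if mean == 0 else math.copysign(math.inf, mean)
    return mean / math.sqrt(variance / n)


def majority_learner(train):
    """Baseline: always predict the most frequent class in the training data."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def nearest_neighbour_learner(train):
    """1-nearest-neighbour on a single numeric attribute."""
    def predict(x):
        return min(train, key=lambda inst: abs(inst[0][0] - x[0]))[1]
    return predict


# Made-up data: one numeric attribute, class "low" below 5 and "high" otherwise.
data = [((x,), "low" if x < 5 else "high") for x in range(10)]

baseline = cross_validate(majority_learner, data, folds=5)
nn = cross_validate(nearest_neighbour_learner, data, folds=5)
t = paired_t(nn, baseline)
print(baseline, nn, t)
```

The t statistic would still have to be compared against a t distribution with `folds - 1` degrees of freedom to obtain a significance level; that step is left out of the sketch.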

The Knowledge Flow GUI

• New graphical user interface for WEKA
• JavaBeans-based interface for setting up and running machine learning experiments
• Data sources, classifiers, etc. are beans and can be connected graphically
• Data “flows” through components: e.g., “data source” -> “filter” -> “classifier” -> “evaluator”
• Layouts can be saved and loaded again later
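
The flow of data through connected components can be imitated with Python generators, each one consuming the output of the previous stage. This is only an analogy for the layout idea: the component names, the data and the fixed threshold rule standing in for a learned classifier are all assumptions, and none of this is the JavaBeans API.

```python
def data_source():
    """Emit (value, label) instances; the data is made up for illustration."""
    yield from [(2.0, "low"), (4.0, "low"), (6.0, "high"), (8.0, "high")]


def normalize_filter(instances):
    """Rescale the numeric attribute to [0, 1] before passing it on."""
    instances = list(instances)
    lo = min(v for v, _ in instances)
    hi = max(v for v, _ in instances)
    for value, label in instances:
        yield (value - lo) / (hi - lo), label


def threshold_classifier(instances, threshold=0.5):
    """A fixed rule standing in for a learned model: emit (predicted, actual)."""
    for value, label in instances:
        yield ("high" if value >= threshold else "low"), label


def evaluator(predictions):
    """Consume the stream and report the fraction of correct predictions."""
    results = [predicted == actual for predicted, actual in predictions]
    return sum(results) / len(results)


# Connect the components: data source -> filter -> classifier -> evaluator.
accuracy = evaluator(threshold_classifier(normalize_filter(data_source())))
print(accuracy)
```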

Conclusion: try it yourself!

• WEKA is available at http://www.cs.waikato.ac.nz/ml/weka
• The site also has a list of projects based on WEKA
• WEKA contributors: Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer, Brent Martin, Peter Flach, Eibe Frank, Gabi Schmidberger, Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall, Remco Bouckaert, Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang
