Learning in Logic
Tamás Horváth
University of Bonn &
Fraunhofer IAIS, Sankt Augustin, Germany
tamas.horvath@iais.fraunhofer.de
PhD Course, Szeged, 2012 - © T.Horváth 2
Learning in Logic - Inductive Generalization of Clauses
Learning in Logic
inductive generalization of first-order clauses
1. preliminaries
2. generalization of words (terms and literals)
3. generalization of clauses
First-Order Logic – Syntax (Alphabet)
First-Order Logic – Syntax (Terms)
First-Order Logic – Syntax (Formulas)
First-Order Logic – Syntax (First-Order Language)
First-Order Logic – Semantics (Pre-Interpretation)
First-Order Logic – Semantics (Term Assignment)
First-Order Logic – Semantics (Interpretation, Truth Value)
First-Order Logic – Semantics (Model, Implication)
Clauses
Substitutions
Learning in Logic
inductive generalization of first-order clauses
1. preliminaries
2. generalization of words (terms and literals)
3. generalization of clauses
Generalization of Words: Notions
Inductive Generalization of Words
The Algorithm
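A minimal sketch of computing the least general generalization of two words by anti-unification, in the spirit of Plotkin's algorithm. The representation is an assumption made for illustration: a compound term or literal is a tuple `(symbol, arg1, ..., argn)`, a constant or variable is a string, and the fresh variable names `V0`, `V1`, ... are hypothetical. The sketch does not check that two literals are compatible (same predicate symbol and sign).

```python
def lgg_terms(s, t, table=None):
    """Anti-unify two words; `table` maps each pair of differing subterms to one variable."""
    if table is None:
        table = {}
    if s == t:
        return s  # identical subterms generalize to themselves
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        # same symbol and arity: generalize argument by argument
        args = tuple(lgg_terms(a, b, table) for a, b in zip(s[1:], t[1:]))
        return (s[0],) + args
    # differing subterms: reuse the variable already introduced for this pair, if any
    if (s, t) not in table:
        table[(s, t)] = f"V{len(table)}"
    return table[(s, t)]


w1 = ("p", ("f", "a"), "a")
w2 = ("p", ("f", "b"), "b")
print(lgg_terms(w1, w2))  # ('p', ('f', 'V0'), 'V0')
```

Reusing one variable per pair of subterms is what keeps the result least general: both occurrences of the pair (a, b) are replaced by the same variable.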
Example
Example (cont'd)
Proof of Plotkin's Theorem
Proof of Plotkin's Theorem – Definitions and Notions
Main Steps of the Proof of Plotkin's Theorem
Lemma 2
Lemma 3
Proof of Lemma 3 (cont'd)
Lemma 4
Lemma 5
Proof of Lemma 5 (cont'd)
Learning in Logic
inductive generalization of first-order clauses
1. generalization of words (terms and literals)
2. generalization of clauses
3. example
Generalization of Clauses
Subsumption vs. Implication
Least General Generalization of Clauses
Proof of (ii): The Algorithm
Example
Lemma A
Proof of (ii) on Slide 44
Proof of (ii) on Slide 46
Reduced Clauses
Clause Reduction
Clause Reduction Algorithm
Example
Proof of the Theorem on Slide 53
Lemma B
Proof of (ii) in the Theorem on Slide 53
Proof of (ii) in the Theorem on Slide 53 (cont'd)
Putting All Together
Learning in Logic
notions and notations
inductive generalization of first-order clauses
1. generalization of words (terms and literals)
2. generalization of clauses
3. example
Application Example: Relation Extraction from Texts
Problem: Automatic extraction of semantic relations between entities from natural language texts.
Example:
Training Data:
Fraunhofer was a German optician.
Schrödinger was an Austrian-Irish physicist.
Planck was a German physicist.
Heisenberg was a celebrated German physicist and Nobel laureate.
Learning: definition of the unknown target predicate Is_Physicist(X, Y)
Prediction: Einstein was a German theoretical physicist.
Data Preprocessing
sentences → dependency trees
labeled rooted directed trees representing grammatical dependencies among the words in a sentence
capture a low-level syntactic structure of sentences
bijective map between words in a sentence and nodes in the tree
generated by the Stanford Parser
nodes defining the same entity are merged into a single node
- e.g., Ludwig Wittgenstein → Ludwig_Wittgenstein
Fraunhofer was a German optician.
Heisenberg was a celebrated German physicist and Nobel laureate.
Brecht was a German poet, playwright, and theatre director.
unknown target relation: Is_Physicist (unary)
POS: { Is_Physicist(Fraunhofer), Is_Physicist(Heisenberg) }
NEG: { Is_Physicist(Brecht) }
Heisenberg was a celebrated German physicist and Nobel laureate.
Fraunhofer was a German optician.
POS: { Is_Physicist(Fraunhofer), Is_Physicist(Heisenberg) }
We want to generalize these two structures!
Consider them as ground clauses and use Plotkin's LGG algorithm!
Dependency Trees as Relational Structures
labeled trees are considered as relational structures
- unique constant for each vertex
- unary and binary predicates only
- ground
training examples: m-tuples of vertices of the dependency trees
- P: m-ary target relation to be learned
POS: set of instances (m-tuples) of the target relation P
NEG: set of non-instances (m-tuples) of the target relation P
- ground atoms of the target predicate P
Fraunhofer was a German optician.
[Figure annotation: comes from semantic hierarchy (e.g., WordNet)]
Heisenberg was a celebrated German physicist and Nobel laureate.
Example – LGG of the Two Ground Clauses
the two structures correspond to labeled directed trees
LGG: direct product of labeled trees
introduce a product vertex for each pair $(u,v)$ of constants
- each such product vertex corresponds to a new variable $x_{(u,v)}$ in the LGG
add an edge from $(u_1,v_1)$ to $(u_2,v_2)$ in the product iff there is an edge from $u_1$ to $u_2$ and an edge from $v_1$ to $v_2$, both labeled by the same binary predicate $R$
- add the literal $\neg R(x_{(u_1,v_1)}, x_{(u_2,v_2)})$ to the LGG
"color" the product vertex $(u,v)$ by the unary predicate $Q$ iff $u$ and $v$ are both colored by $Q$
- add the literal $\neg Q(x_{(u,v)})$ to the LGG
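The product construction above can be sketched as follows. This is a minimal illustration under stated assumptions: each ground structure is given as a set of unary facts `(Q, u)` and a set of binary facts `(R, u1, u2)`; the function name, the variable naming scheme `x_(u,v)` and the toy predicates `mod`, `german`, `physicist`, `celebrated` are hypothetical. Only the body literals are produced (the negation sign is left implicit), and the head literal of the target predicate is not handled.

```python
from itertools import product


def lgg_ground_structures(unary1, binary1, unary2, binary2):
    """Body literals of the LGG of two ground structures via their direct product."""
    def var(u, v):
        # one variable per product vertex (u, v)
        return f"x_({u},{v})"

    literals = set()
    # color the product vertex (u, v) by Q iff u and v are both colored by Q
    for (q1, u), (q2, v) in product(unary1, unary2):
        if q1 == q2:
            literals.add((q1, var(u, v)))
    # add an R-edge between product vertices iff both structures have that R-edge
    for (r1, u1, u2), (r2, v1, v2) in product(binary1, binary2):
        if r1 == r2:
            literals.add((r1, var(u1, v1), var(u2, v2)))
    return literals


unary1 = {("physicist", "a1"), ("german", "a2")}
binary1 = {("mod", "a1", "a2")}
unary2 = {("physicist", "b1"), ("german", "b2"), ("celebrated", "b3")}
binary2 = {("mod", "b1", "b2"), ("mod", "b1", "b3")}
for literal in sorted(lgg_ground_structures(unary1, binary1, unary2, binary2)):
    print(literal)
```

The result has one literal per compatible pair of facts, so its size is bounded by the product of the sizes of the two inputs. The sketch returns it unreduced.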
[Figure, repeated over four build steps: the dependency trees of the two sentences (nodes Fraunhofer, was, optician, a, German and Heisenberg, was, physicist, a, celebrated, German, and, laureate, Nobel) and their product with variables X0 to X8 and the shared labels was, physicist, a, German]
Heisenberg was a celebrated German physicist and Nobel laureate.
Fraunhofer was a German optician.
target relation: Is_Physicist (unary)
POS: { Is_Physicist(Fraunhofer), Is_Physicist(Heisenberg) }
the pattern representing the concept generated by POS
[Figure: the pattern with nodes x, y, z and label German]
Example – The Reduced Pattern as First-Order Clause
Example (cont'd)
the pattern representing the concept generated by {Is_Physicist(Fraunhofer), Is_Physicist(Heisenberg)}
Einstein was a German theoretical physicist.
Is_Physicist(Einstein)
[Figure: the pattern with nodes x, y, z and label German]
Example (cont'd)
the pattern representing the concept generated by {Is_Physicist(Fraunhofer), Is_Physicist(Heisenberg)}
Brecht was a German poet, playwright, and theatre director.
NOT Is_Physicist(Brecht)
[Figure: the pattern with nodes X, Y, Z and label German]
Summary
LGG is a natural notion for the generalization of first-order clauses with respect to subsumption
computing the LGG of clauses is reduced to computing the LGG of words
a reduced non-empty LGG, if it exists, is unique up to variable renaming
problems with the LGG
- the size of the reduced LGG can grow exponentially with the number of clauses
- as subsumption is NP-complete, deciding whether the LGG generalizes (i.e., subsumes) a clause is NP-complete
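To make the cost of the subsumption test concrete, here is a minimal backtracking sketch of deciding whether a clause C subsumes a ground clause D, that is, whether some substitution θ satisfies Cθ ⊆ D. The representation is an assumption for illustration: clauses are lists of function-free literals `(predicate, arg1, ..., argn)`, variables are strings starting with an uppercase letter, and constants start with a lowercase letter. In the worst case the search tries exponentially many bindings, in line with the NP-completeness noted above.

```python
def is_var(term):
    """Assumed convention: variables start with an uppercase letter."""
    return isinstance(term, str) and term[:1].isupper()


def subsumes(c, d, theta=None):
    """Return True iff some substitution maps every literal of c onto a literal of d."""
    theta = {} if theta is None else theta
    if not c:
        return True  # all literals of c have been mapped into d
    first, rest = c[0], c[1:]
    for lit in d:
        if lit[0] != first[0] or len(lit) != len(first):
            continue  # different predicate symbol or arity
        new_theta = dict(theta)
        consistent = True
        for a, b in zip(first[1:], lit[1:]):
            if is_var(a):
                # bind the variable, or check it against its earlier binding
                if new_theta.setdefault(a, b) != b:
                    consistent = False
                    break
            elif a != b:
                consistent = False
                break
        if consistent and subsumes(rest, d, new_theta):
            return True
    return False


d = [("p", "a", "b"), ("p", "b", "c")]
print(subsumes([("p", "X", "Y"), ("p", "Y", "Z")], d))  # True
print(subsumes([("p", "X", "Y"), ("p", "Y", "X")], d))  # False
```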
Outline
complexity of learning function-free definite Horn clauses
bottom-up induction of clauses
the relative least general generalization (RLGG)
a generic algorithm
on the length of the reduced RLGG
top-down induction of clauses
the FOIL algorithm
Horn Clauses
Finding a Consistent Clause
Learning Function-Free Definite Horn Clauses
complexity of learning function-free definite Horn clauses
bottom-up induction of clauses
the relative least general generalization (RLGG)
a generic algorithm
on the length of the reduced RLGG
top-down induction of clauses
the FOIL algorithm
Learning Non-Recursive Definite Horn Clauses
Example
Finding Consistent Clauses wrt. Background Knowledge
Relative Least General Generalization
Problem Reformulation: Notions
Reformulation of the Problem on Slide 87
Example
Bottom-Up Induction of First-Order Clauses
Problem 3: The Length of the Reduced RLGG
Summary
consistent hypothesis finding problem: computationally intractable
bottom-up induction: using the relative LGG, iteratively generalizes the current clause as long as it remains consistent with the negative examples
- system based on this approach: Golem [Muggleton and Feng, 1993]
- problems with this approach:
(1) the size of the reduced LGG can grow exponentially with the number of positive examples
(2) as subsumption is NP-complete, deciding whether the LGG implies an example wrt. the background knowledge is NP-complete
Outline
complexity of learning function-free definite Horn clauses
bottom-up induction of clauses
the relative least general generalization (RLGG)
a generic algorithm
on the length of the reduced RLGG
top-down induction of clauses
the FOIL algorithm
FOIL: First-Order Inductive Learner
Quinlan (1990-1993)
combines the divide-and-conquer method designed for propositional TDIDT (top-down induction of decision trees) systems with the covering method developed for disjunctive logical expressions
- information-based heuristics in the divide-and-conquer method
hypothesis space is searched top-down in a heuristic fashion, looking for maximally general rules consistent with the negative examples
usually fast running times, no parameters, easy to use
may miss good solutions
implementations:
FOIL 6 (Quinlan; publicly available), mFoil (Dzeroski), Grendel (Cohen)
1. The Divide-and-Conquer Method
Hunt et al. (1966), Quinlan (1979, 1986), Breiman et al. (1984), Cestnik et al. (1987)
the method below yields a decision tree
1. if all training objects belong to a single class, the tree is a leaf labelled with that class
2. otherwise
1. select a test based on one attribute
2. divide the training set into subsets, each corresponding to one of the possible (mutually exclusive) outcomes of the test, and
3. apply the same procedure to each subset
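The recursive procedure above can be sketched as follows. This is a minimal illustration, not the method of any particular TDIDT system: the test selection simply takes the next attribute in the given order (an assumption standing in for an information-based heuristic), and a majority-class leaf is used when no attributes remain (also an assumption). The function name and the toy data are hypothetical.

```python
from collections import Counter


def build_tree(examples, attributes):
    """examples: list of (features_dict, class_label); returns a leaf label or (attribute, branches)."""
    labels = [label for _, label in examples]
    if len(set(labels)) == 1:
        return labels[0]  # step 1: all objects belong to a single class
    if not attributes:
        # assumption: fall back to the majority class when no test is left
        return Counter(labels).most_common(1)[0][0]
    attr, remaining = attributes[0], attributes[1:]  # step 2.1: placeholder test selection
    subsets = {}
    for features, label in examples:
        # step 2.2: one subset per (mutually exclusive) outcome of the test
        subsets.setdefault(features[attr], []).append((features, label))
    # step 2.3: apply the same procedure to each subset
    return (attr, {value: build_tree(subset, remaining) for value, subset in subsets.items()})


examples = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "no"}, "stay"),
]
print(build_tree(examples, ["outlook", "windy"]))
```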
2. The Covering Method
Michalski (1989), Michalski et al. (1986)
target class is represented by a disjunctive logical expression
1. find a conjunction of conditions that is satisfied by some objects in the target class, but no objects from another class
2. append this conjunction as one disjunct of the logical expression being developed
3. remove all objects that satisfy this conjunction and, if there are still some remaining objects of the target class, repeat this procedure
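A minimal sketch of the covering loop described above, under stated assumptions: objects are dictionaries of attribute values, a condition is an `(attribute, value)` pair, and each conjunction is grown greedily from the attribute values of a seed object of the target class. The seed-based search is one simple way to carry out step 1 and is not taken from the slides. The function names are hypothetical.

```python
def covers(conjunction, obj):
    """True iff the object satisfies every (attribute, value) condition."""
    return all(obj.get(attr) == value for attr, value in conjunction)


def find_conjunction(positives, negatives):
    """Step 1: build a conjunction from a seed positive that excludes all negatives."""
    seed = positives[0]
    conjunction, covered_neg = [], list(negatives)
    for cond in sorted(seed.items()):
        if not covered_neg:
            break
        still_covered = [n for n in covered_neg if covers([cond], n)]
        if len(still_covered) < len(covered_neg):
            conjunction.append(cond)  # keep only conditions that exclude some negative
            covered_neg = still_covered
    return conjunction if not covered_neg else None


def covering(positives, negatives):
    """Return a disjunction (list) of conjunctions covering all positives and no negatives."""
    disjuncts, remaining = [], list(positives)
    while remaining:
        conjunction = find_conjunction(remaining, negatives)
        if conjunction is None:
            raise ValueError("seed object cannot be separated from a negative object")
        disjuncts.append(conjunction)  # step 2: append as one disjunct
        # step 3: remove the covered objects and repeat
        remaining = [p for p in remaining if not covers(conjunction, p)]
    return disjuncts


pos = [{"shape": "round", "color": "red"}, {"shape": "square", "color": "red"}]
neg = [{"shape": "round", "color": "blue"}]
print(covering(pos, neg))  # [[('color', 'red')]]
```

Because every conjunction is built from the seed's own attribute values, it always covers the seed, so each iteration removes at least one object and the loop terminates.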
FOIL: The Outer Loop
FOIL: How to Perform Step 3? (The Inner Loop of FOIL)
Inner Loop of FOIL: Which Literals Are Considered in Step 5?
Inner Loop of FOIL: Which Literal Is Selected in Steps 5-6?
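FOIL's literal selection is usually described in terms of an information-based gain measure. The sketch below assumes the commonly cited formulation, gain(L) = t · (log2(p1/(p1+n1)) − log2(p0/(p0+n0))), where p0 and n0 count the positive and negative bindings covered before adding the literal L, p1 and n1 count them afterwards, and t is the number of positive bindings of the old clause that are still covered. The function name and the convention of returning 0.0 when no positive bindings are covered are assumptions made here.

```python
import math


def foil_gain(p0, n0, p1, n1, t):
    """Information gain of adding a literal, following the commonly cited FOIL formula."""
    if p0 == 0 or p1 == 0:
        return 0.0  # assumption: no positive bindings covered means no gain
    info_before = -math.log2(p0 / (p0 + n0))  # bits needed to signal a positive before
    info_after = -math.log2(p1 / (p1 + n1))   # bits needed after adding the literal
    return t * (info_before - info_after)


# 4 positive and 4 negative bindings before; the literal keeps 2 positives and no negatives
print(foil_gain(p0=4, n0=4, p1=2, n1=0, t=2))  # 2.0
```

Among the candidate literals, the one with the largest gain would be selected under this heuristic.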
FOIL: Example
FOIL: Next Iteration of the Inner Loop
Summary
Bottom-up induction: using the relative LGG, iteratively generalizes the current clause as long as it remains consistent with the negative examples
- system based on this approach: Golem [Muggleton and Feng, 1993]
- problems with this approach:
(1) the size of the reduced LGG can grow exponentially with the number of positive examples
(2) as subsumption is NP-complete, deciding whether the LGG implies an example wrt. the background knowledge is NP-complete
Top-down induction: iteratively specializes the current clause by extending it with a literal until it is consistent with the negative examples
- system based on this approach: FOIL [Quinlan, 1990] and its variants
- problems with this approach: same as (2) above
- the approach can be extended to learning recursive Horn clauses, as well as to allowing negated literals in the clause body
Appendix: Proof of Kietz's Theorem
Reduction Lemma
Reduction Lemma (cont'd)
Reduction Lemma: Example
Proof of the Reduction Lemma: "IF" Direction
Proof of the Reduction Lemma: "ONLY IF" Direction