SlideShare a Scribd company logo
1 of 17
Download to read offline
HORN CONCERTO
Efficient Rule Mining on RDF Data
Tommaso Soru

AKSW Colloquium, 03.04.2017
RDF Rule Mining
• RDFS/OWL rules are given as schema.
• Schema-free datasets might have an implicit schema.
• Why “mining”? Because rules are not visible in data.
2
Motivation/1
Link Prediction Problem.



Given a union of graphs G = G1 ∪ … ∪ GN,

find new edges among vertices s and t in G.
3
Motivation/2
Markov Logic Networks.



Given a collection of first-order
statements (evidence) and a set
of weighted first-order rules,
build an undirected weighted
graph where nodes are
statements and edges indicate
dependency.
4
Motivation/3
RDFS/OWL
Interpretation
Rule Mining
Weight
Learning
Grounding Inference
Input
Dataset(s)
Predicted
TriplesMANDOLIN’s pipeline
Markov Logic Networks
5
Rule Mining & Weight Learning
Given a directed labelled graph, find rules

and weights associated with them.
w rule
w1 p(x, y) ← q(x, y)
w2 p(x, y) ← q(y, x)
w3 p(x, y) ← q(x, z) ^ r(y, z)
w4 p(x, y) ← q(x, z) ^ r(z, y)
w5 p(x, y) ← q(z, x) ^ r(y, z)
w6 p(x, y) ← q(z, x) ^ r(z, y)
Horn Clauses
6
Horn Clauses
A Horn clause is a clause (a disjunction of literals patterns)

with at most one positive, i.e. unnegated, literal pattern.
p ∨ ¬q1 ∨ ¬q2 ∨ ... ∨ ¬qn
p ← q1 ∧ q2 ∧ ... ∧ qn
head body
p(x,y)
7
Confidence score (weight)
The confidence score of a rule is defined as the rate of

the occurrences of head and body together

over

the occurrences of the body.
p ← q1 ∧ q2 ∧ ... ∧ qn
head body
8
The HORN CONCERTO approach
P(A | B) =
P(A∩ B)
P(B)
Bayes’ Theorem
p ← q1 ∧ q2 ∧ ... ∧ qn
Horn Clause
Event-based confidence score
P(p |
!
q) =
P(p ∩
!
q)
P(
!
q)
≈
p ∧
!
q{ }
!
q{ }
9
Rules with
p ∧
!
q{ }
!
q{ }
SELECT ?p (COUNT(*) AS ?c) WHERE {
?x ?p ?y .
?x <target_q> ?y .
FILTER(?p != <target_q> )
} GROUP BY ?p
SELECT ?q (COUNT(*) AS ?c) WHERE {
[] ?q []
} GROUP BY ?q
!
q = 1
p(x, y) ← q(x, y)
10
Rules with
p ∧
!
q{ }
!
q{ }
SELECT ?q ?r (COUNT(*) AS ?c) WHERE {
?x ?q ?z .
?z ?r ?y .
?x <target_p> ?y
} GROUP BY ?q ?r
SELECT (COUNT(*) AS ?c) WHERE {
?x <target_q> ?z .
?z <target_r> ?y
}
!
q = 2
p(x, y) ← q(x, z) ∧ r(z, y)
11
Optimizations
• Select only top N properties.
• Order by descending score and prune when it’s lower
than a threshold T.
• Cache scores in-memory, as there might exist p1, p2
such that: pi(x,y) ← q(?,?), r(?,?).
• Parallelize algorithm by rule type.
12
Evaluation Setup
• 8 CPUs, 32 GB RAM, Ubuntu 16.04
• Scalability study
DBpedia Person (7 million triples)
DBpedia 2016-04 (397 million triples)
FUTURE
• Rule effectiveness for link prediction
FB15k (592 thousand triples)
WN18 (151 thousand triples)
• Rule quality (human judgment?)
13
Preliminary results – DBpedia Person
Runtime (s) # Rules Used RAM (GB)
AMIE+ > 10 days 6,337 4
Ontological PF > 3 hours > 1,000 4
HORN CONCERTO

single-thread
1.7 hours 3,125
client: 0.2

server: 1.0
14
Preliminary results – DBpedia 2016-04
Runtime (s) # Rules Used RAM (GB)
HORN CONCERTO

single-thread
11 hours 887
client: 0.2

server: N/A
15
Discussion
• AMIE+
Cons: indexes the graph in-memory.
• Ontological PF
Pros: Very fast
Cons: Relies on schema data (types domain, range)
• Horn Concerto
Pros: Works with SPARQL endpoint, fast also single-threaded,

may be able to overperform Ontological PF in RDF datasets with
available schema.
16
Thank you.

More Related Content

What's hot

D. Vulcanov - On Cosmologies with non-Minimally Coupled Scalar Field and the ...
D. Vulcanov - On Cosmologies with non-Minimally Coupled Scalar Field and the ...D. Vulcanov - On Cosmologies with non-Minimally Coupled Scalar Field and the ...
D. Vulcanov - On Cosmologies with non-Minimally Coupled Scalar Field and the ...SEENET-MTP
 
2021 preTEST5A Final Review Packet!
2021 preTEST5A Final Review Packet!2021 preTEST5A Final Review Packet!
2021 preTEST5A Final Review Packet!A Jorge Garcia
 
A Note on Latent LSTM Allocation
A Note on Latent LSTM AllocationA Note on Latent LSTM Allocation
A Note on Latent LSTM AllocationTomonari Masada
 
Lec 5 asymptotic notations and recurrences
Lec 5 asymptotic notations and recurrencesLec 5 asymptotic notations and recurrences
Lec 5 asymptotic notations and recurrencesAnkita Karia
 
Asymptotic notations
Asymptotic notationsAsymptotic notations
Asymptotic notationsNikhil Sharma
 
It elective-4-buendia lagua
It elective-4-buendia laguaIt elective-4-buendia lagua
It elective-4-buendia laguaJohn Mark Lagua
 
Data Algorithms And Analysis
Data Algorithms And AnalysisData Algorithms And Analysis
Data Algorithms And Analysisgarishma bhatia
 
2.4 mst prim &kruskal demo
2.4 mst  prim &kruskal demo2.4 mst  prim &kruskal demo
2.4 mst prim &kruskal demoKrish_ver2
 
minimum spanning trees Algorithm
minimum spanning trees Algorithm minimum spanning trees Algorithm
minimum spanning trees Algorithm sachin varun
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)ijceronline
 
Towards Seed-Free Music Playlist Generation: Enhancing Collaborative Filterin...
Towards Seed-Free Music Playlist Generation: Enhancing Collaborative Filterin...Towards Seed-Free Music Playlist Generation: Enhancing Collaborative Filterin...
Towards Seed-Free Music Playlist Generation: Enhancing Collaborative Filterin...Jaehun Kim
 

What's hot (20)

Gsas intro rvd (1)
Gsas intro rvd (1)Gsas intro rvd (1)
Gsas intro rvd (1)
 
D. Vulcanov - On Cosmologies with non-Minimally Coupled Scalar Field and the ...
D. Vulcanov - On Cosmologies with non-Minimally Coupled Scalar Field and the ...D. Vulcanov - On Cosmologies with non-Minimally Coupled Scalar Field and the ...
D. Vulcanov - On Cosmologies with non-Minimally Coupled Scalar Field and the ...
 
2021 preTEST5A Final Review Packet!
2021 preTEST5A Final Review Packet!2021 preTEST5A Final Review Packet!
2021 preTEST5A Final Review Packet!
 
A Note on TopicRNN
A Note on TopicRNNA Note on TopicRNN
A Note on TopicRNN
 
Sortsearch
SortsearchSortsearch
Sortsearch
 
Linear sorting
Linear sortingLinear sorting
Linear sorting
 
A Note on Latent LSTM Allocation
A Note on Latent LSTM AllocationA Note on Latent LSTM Allocation
A Note on Latent LSTM Allocation
 
Lec 5 asymptotic notations and recurrences
Lec 5 asymptotic notations and recurrencesLec 5 asymptotic notations and recurrences
Lec 5 asymptotic notations and recurrences
 
Splay Tree
Splay TreeSplay Tree
Splay Tree
 
19 primkruskal
19 primkruskal19 primkruskal
19 primkruskal
 
Asymptotic notations
Asymptotic notationsAsymptotic notations
Asymptotic notations
 
It elective-4-buendia lagua
It elective-4-buendia laguaIt elective-4-buendia lagua
It elective-4-buendia lagua
 
Data Algorithms And Analysis
Data Algorithms And AnalysisData Algorithms And Analysis
Data Algorithms And Analysis
 
2.4 mst prim &kruskal demo
2.4 mst  prim &kruskal demo2.4 mst  prim &kruskal demo
2.4 mst prim &kruskal demo
 
minimum spanning trees Algorithm
minimum spanning trees Algorithm minimum spanning trees Algorithm
minimum spanning trees Algorithm
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
Lec29
Lec29Lec29
Lec29
 
Merge sort-algorithm for computer science engineering students
Merge sort-algorithm for computer science engineering studentsMerge sort-algorithm for computer science engineering students
Merge sort-algorithm for computer science engineering students
 
Towards Seed-Free Music Playlist Generation: Enhancing Collaborative Filterin...
Towards Seed-Free Music Playlist Generation: Enhancing Collaborative Filterin...Towards Seed-Free Music Playlist Generation: Enhancing Collaborative Filterin...
Towards Seed-Free Music Playlist Generation: Enhancing Collaborative Filterin...
 
Topological sorting
Topological sortingTopological sorting
Topological sorting
 

Similar to Horn Concerto – AKSW Colloquium

Text Mining Applied to SQL Queries: a Case Study for SDSS SkyServer
Text Mining Applied to SQL Queries: a Case Study for SDSS SkyServerText Mining Applied to SQL Queries: a Case Study for SDSS SkyServer
Text Mining Applied to SQL Queries: a Case Study for SDSS SkyServerVitor Hirota Makiyama
 
Metody logiczne w analizie danych
Metody logiczne w analizie danych Metody logiczne w analizie danych
Metody logiczne w analizie danych Data Science Warsaw
 
MLHEP 2015: Introductory Lecture #1
MLHEP 2015: Introductory Lecture #1MLHEP 2015: Introductory Lecture #1
MLHEP 2015: Introductory Lecture #1arogozhnikov
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Julian Hyde
 
Greenplum 6 Changes
Greenplum 6 ChangesGreenplum 6 Changes
Greenplum 6 ChangesVMware Tanzu
 
learned optimizer.pptx
learned optimizer.pptxlearned optimizer.pptx
learned optimizer.pptxQingsong Guo
 
Formal Semantics of SQL and Cypher
Formal Semantics of SQL and CypherFormal Semantics of SQL and Cypher
Formal Semantics of SQL and CypheropenCypher
 
Row Pattern Matching in Oracle Database 12c
Row Pattern Matching in Oracle Database 12cRow Pattern Matching in Oracle Database 12c
Row Pattern Matching in Oracle Database 12cStew Ashton
 
MongoDB's index and query optimize
MongoDB's index and query optimizeMongoDB's index and query optimize
MongoDB's index and query optimizemysqlops
 
Indexing and Query Optimizer (Aaron Staple)
Indexing and Query Optimizer (Aaron Staple)Indexing and Query Optimizer (Aaron Staple)
Indexing and Query Optimizer (Aaron Staple)MongoSF
 
Matrix Factorization In Recommender Systems
Matrix Factorization In Recommender SystemsMatrix Factorization In Recommender Systems
Matrix Factorization In Recommender SystemsYONG ZHENG
 
Asymptotic Notation and Data Structures
Asymptotic Notation and Data StructuresAsymptotic Notation and Data Structures
Asymptotic Notation and Data StructuresAmrinder Arora
 
Accumulo Summit 2015: Rya: Optimizations to Support Real Time Graph Queries o...
Accumulo Summit 2015: Rya: Optimizations to Support Real Time Graph Queries o...Accumulo Summit 2015: Rya: Optimizations to Support Real Time Graph Queries o...
Accumulo Summit 2015: Rya: Optimizations to Support Real Time Graph Queries o...Accumulo Summit
 

Similar to Horn Concerto – AKSW Colloquium (20)

Query compiler
Query compilerQuery compiler
Query compiler
 
Text Mining Applied to SQL Queries: a Case Study for SDSS SkyServer
Text Mining Applied to SQL Queries: a Case Study for SDSS SkyServerText Mining Applied to SQL Queries: a Case Study for SDSS SkyServer
Text Mining Applied to SQL Queries: a Case Study for SDSS SkyServer
 
Data structure-question-bank
Data structure-question-bankData structure-question-bank
Data structure-question-bank
 
MUMS: Transition & SPUQ Workshop - Gradient-Free Construction of Active Subsp...
MUMS: Transition & SPUQ Workshop - Gradient-Free Construction of Active Subsp...MUMS: Transition & SPUQ Workshop - Gradient-Free Construction of Active Subsp...
MUMS: Transition & SPUQ Workshop - Gradient-Free Construction of Active Subsp...
 
Compilation
CompilationCompilation
Compilation
 
Metody logiczne w analizie danych
Metody logiczne w analizie danych Metody logiczne w analizie danych
Metody logiczne w analizie danych
 
MLHEP 2015: Introductory Lecture #1
MLHEP 2015: Introductory Lecture #1MLHEP 2015: Introductory Lecture #1
MLHEP 2015: Introductory Lecture #1
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!
 
R Language Introduction
R Language IntroductionR Language Introduction
R Language Introduction
 
Encoding survey
Encoding surveyEncoding survey
Encoding survey
 
Greenplum 6 Changes
Greenplum 6 ChangesGreenplum 6 Changes
Greenplum 6 Changes
 
learned optimizer.pptx
learned optimizer.pptxlearned optimizer.pptx
learned optimizer.pptx
 
Formal Semantics of SQL and Cypher
Formal Semantics of SQL and CypherFormal Semantics of SQL and Cypher
Formal Semantics of SQL and Cypher
 
Row Pattern Matching in Oracle Database 12c
Row Pattern Matching in Oracle Database 12cRow Pattern Matching in Oracle Database 12c
Row Pattern Matching in Oracle Database 12c
 
Masters Thesis Defense Presentation
Masters Thesis Defense PresentationMasters Thesis Defense Presentation
Masters Thesis Defense Presentation
 
MongoDB's index and query optimize
MongoDB's index and query optimizeMongoDB's index and query optimize
MongoDB's index and query optimize
 
Indexing and Query Optimizer (Aaron Staple)
Indexing and Query Optimizer (Aaron Staple)Indexing and Query Optimizer (Aaron Staple)
Indexing and Query Optimizer (Aaron Staple)
 
Matrix Factorization In Recommender Systems
Matrix Factorization In Recommender SystemsMatrix Factorization In Recommender Systems
Matrix Factorization In Recommender Systems
 
Asymptotic Notation and Data Structures
Asymptotic Notation and Data StructuresAsymptotic Notation and Data Structures
Asymptotic Notation and Data Structures
 
Accumulo Summit 2015: Rya: Optimizations to Support Real Time Graph Queries o...
Accumulo Summit 2015: Rya: Optimizations to Support Real Time Graph Queries o...Accumulo Summit 2015: Rya: Optimizations to Support Real Time Graph Queries o...
Accumulo Summit 2015: Rya: Optimizations to Support Real Time Graph Queries o...
 

Recently uploaded

Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhYasamin16
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxUnduhUnggah1
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一F La
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
Business Analytics using Microsoft Excel
Business Analytics using Microsoft ExcelBusiness Analytics using Microsoft Excel
Business Analytics using Microsoft Excelysmaelreyes
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGILLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGIThomas Poetter
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxdolaknnilon
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 

Recently uploaded (20)

Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Business Analytics using Microsoft Excel
Business Analytics using Microsoft ExcelBusiness Analytics using Microsoft Excel
Business Analytics using Microsoft Excel
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGILLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptx
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 

Horn Concerto – AKSW Colloquium

  • 1. HORN CONCERTO Efficient Rule Mining on RDF Data Tommaso Soru AKSW Colloquium, 03.04.2017
  • 2. RDF Rule Mining • RDFS/OWL rules are given as schema. • Schema-free datasets might have an implicit schema. • Why “mining”? Because rules are not visible in data. 2
  • 3. Motivation/1 Link Prediction Problem.
 
 Given a union of graphs G = G1 ∪ … ∪ GN,
 find new edges among vertices s and t in G. 3
  • 4. Motivation/2 Markov Logic Networks.
 
 Given a collection of first-order statements (evidence) and a set of weighted first-order rules, build an undirected weighted graph where nodes are statements and edges indicate dependency. 4
  • 6. Rule Mining & Weight Learning Given a directed labelled graph, find rules
 and weights associated with them. w rule w1 p(x, y) ← q(x, y) w2 p(x, y) ← q(y, x) w3 p(x, y) ← q(x, z) ^ r(y, z) w4 p(x, y) ← q(x, z) ^ r(z, y) w5 p(x, y) ← q(z, x) ^ r(y, z) w6 p(x, y) ← q(z, x) ^ r(z, y) Horn Clauses 6
  • 7. Horn Clauses A Horn clause is a clause (a disjunction of literals patterns)
 with at most one positive, i.e. unnegated, literal pattern. p ∨ ¬q1 ∨ ¬q2 ∨ ... ∨ ¬qn p ← q1 ∧ q2 ∧ ... ∧ qn head body p(x,y) 7
  • 8. Confidence score (weight) The confidence score of a rule is defined as the rate of
 the occurrences of head and body together
 over
 the occurrences of the body. p ← q1 ∧ q2 ∧ ... ∧ qn head body 8
  • 9. The HORN CONCERTO approach P(A | B) = P(A∩ B) P(B) Bayes’ Theorem p ← q1 ∧ q2 ∧ ... ∧ qn Horn Clause Event-based confidence score P(p | ! q) = P(p ∩ ! q) P( ! q) ≈ p ∧ ! q{ } ! q{ } 9
  • 10. Rules with p ∧ ! q{ } ! q{ } SELECT ?p (COUNT(*) AS ?c) WHERE { ?x ?p ?y . ?x <target_q> ?y . FILTER(?p != <target_q> ) } GROUP BY ?p SELECT ?q (COUNT(*) AS ?c) WHERE { [] ?q [] } GROUP BY ?q ! q = 1 p(x, y) ← q(x, y) 10
  • 11. Rules with p ∧ ! q{ } ! q{ } SELECT ?q ?r (COUNT(*) AS ?c) WHERE { ?x ?q ?z . ?z ?r ?y . ?x <target_p> ?y } GROUP BY ?q ?r SELECT (COUNT(*) AS ?c) WHERE { ?x <target_q> ?z . ?z <target_r> ?y } ! q = 2 p(x, y) ← q(x, z) ∧ r(z, y) 11
  • 12. Optimizations • Select only top N properties. • Order by descending score and prune when it’s lower than a threshold T. • Cache scores in-memory, as there might exist p1, p2 such that: pi(x,y) ← q(?,?), r(?,?). • Parallelize algorithm by rule type. 12
  • 13. Evaluation Setup • 8 CPUs, 32 GB RAM, Ubuntu 16.04 • Scalability study DBpedia Person (7 million triples) DBpedia 2016-04 (397 million triples) FUTURE • Rule effectiveness for link prediction FB15k (592 thousand triples) WN18 (151 thousand triples) • Rule quality (human judgment?) 13
  • 14. Preliminary results – DBpedia Person Runtime (s) # Rules Used RAM (GB) AMIE+ > 10 days 6,337 4 Ontological PF > 3 hours > 1,000 4 HORN CONCERTO single-thread 1.7 hours 3,125 client: 0.2 server: 1.0 14
  • 15. Preliminary results – DBpedia 2016-04 Runtime (s) # Rules Used RAM (GB) HORN CONCERTO single-thread 11 hours 887 client: 0.2 server: N/A 15
  • 16. Discussion • AMIE+ Cons: indexes the graph in-memory. • Ontological PF Pros: Very fast Cons: Relies on schema data (types domain, range) • Horn Concerto Pros: Works with SPARQL endpoint, fast also single-threaded,
 may be able to overperform Ontological PF in RDF datasets with available schema. 16