Knowledge Graph Construction
1. Previous Exercises
1. Express the following statements in description logic:
- Every teacher must teach someone.
  - Correct answer: Teacher ⊑ ∃Teach.Human
- Every finger is a body part and is part of a hand.
  - Finger ⊑ BodyPart ⊓ ∃Part_of.Hand
- Zhang is a teacher of SEU.
  - Teacher(Zhang, SEU)
2. Give a model of the following ontology:
- PhDstudent ⊔ Undergraduatestudent ⊑ Student
- PhDstudent(John)
- Undergraduatestudent(Jack)
- Sister(Lisa, Jack)
- Employee(Lisa)
Correct answer:
- A model: Δ = {jo, l, ja}
- I(John) = jo, I(Lisa) = l, I(Jack) = ja
- I(PhDstudent) = {jo}
- I(Employee) = {l}
- I(Undergraduatestudent) = {ja}
- I(Student) = {ja, jo}
- I(Sister) = {(l, ja)}
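To double-check the answer, the minimal Python sketch below encodes the interpretation as plain sets (domain elements as strings, concepts as sets, roles as sets of pairs) and tests every axiom and assertion against it. The encoding is an illustration chosen for this note, not the API of any DL reasoner.

```python
# Interpretation from the answer above, encoded as plain Python sets.
domain = {"jo", "l", "ja"}
individuals = {"John": "jo", "Lisa": "l", "Jack": "ja"}
concepts = {
    "PhDstudent": {"jo"},
    "Undergraduatestudent": {"ja"},
    "Student": {"ja", "jo"},
    "Employee": {"l"},
}
roles = {"Sister": {("l", "ja")}}

concept_assertions = [("PhDstudent", "John"), ("Undergraduatestudent", "Jack"), ("Employee", "Lisa")]
role_assertions = [("Sister", "Lisa", "Jack")]


def is_model():
    # Individuals and concept extensions must lie inside the domain.
    if not set(individuals.values()) <= domain:
        return False
    if any(not ext <= domain for ext in concepts.values()):
        return False
    # TBox: PhDstudent ⊔ Undergraduatestudent ⊑ Student, i.e. the union is a subset of Student.
    union = concepts["PhDstudent"] | concepts["Undergraduatestudent"]
    if not union <= concepts["Student"]:
        return False
    # ABox: C(a) holds iff I(a) ∈ I(C).
    if any(individuals[a] not in concepts[c] for c, a in concept_assertions):
        return False
    # ABox: R(a, b) holds iff (I(a), I(b)) ∈ I(R).
    return all((individuals[a], individuals[b]) in roles[r] for r, a, b in role_assertions)


print(is_model())  # True
```

Removing "ja" from I(Student) makes the TBox check fail, which is why Jack must be interpreted as a student.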
3. Write, in description logic, the axioms inferred by classification using forward reasoning on the following axioms:
- Endocarditis ⊑ Heart_Disease
- Myocardial_Infarction ⊑ Heart_Disease
- Heart_Disease ⊑ Disease
- Enterococcal_Endocarditis ⊑ Endocarditis
Correct answer:
- Endocarditis ⊑ Disease
- Myocardial_Infarction ⊑ Disease
- Enterococcal_Endocarditis ⊑ Heart_Disease
- Enterococcal_Endocarditis ⊑ Disease
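Because all four axioms are subsumptions between named concepts, classification here reduces to computing the transitive closure of ⊑. A minimal forward-chaining sketch, assuming only atomic subsumptions (no conjunctions, existentials, or equivalences), applies the rule "A ⊑ B and B ⊑ C give A ⊑ C" until nothing new is derived:

```python
def classify(axioms):
    """Forward-chain told subsumptions (A, B) meaning A ⊑ B to a fixpoint.

    Returns only the inferred axioms, i.e. those that were not told.
    """
    closure = set(axioms)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # From A ⊑ B and B ⊑ D, infer A ⊑ D.
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure - set(axioms)


told = [
    ("Endocarditis", "Heart_Disease"),
    ("Myocardial_Infarction", "Heart_Disease"),
    ("Heart_Disease", "Disease"),
    ("Enterococcal_Endocarditis", "Endocarditis"),
]
for sub, sup in sorted(classify(told)):
    print(f"{sub} ⊑ {sup}")
```

The four printed axioms are exactly the four in the answer above.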
2. Knowledge Graph Construction
- Knowledge Graph Construction: Extracting knowledge from heterogeneous data sources to form a knowledge graph.
- Wrappers: automatically crawl and extract semi-structured data.
2.1 Knowledge Graph Construction from Structured Data
- Basics of Relational Databases
- RDB2RDF: Direct Mapping & R2RML
- Triple Extraction from Relational Web Tables
3. Basics of Relational Databases
3.1 Structuring data
We all structure the information we work with:
- So we can find what we need, when we need it
- To facilitate evaluation, comparison, and analysis
The structure you select influences
- The kinds of information you collect
- How it is possible to interrogate (query) your data
- The extent to which you can take advantage of your computer’s data-handling abilities
- How easy it is to share data with others
Options for structuring & analyzing data
- A table of bibliographic data (not a table in a relational database)
- Such a table stores the same data repeatedly (redundantly)
3.2 An alternative approach
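- First create a separate author table and link it to the main table, so that author information is represented on its own
- Then split off another table that holds the publication information
- Restrict the Type field to a fixed set of values, so that each value can be located reliably in the other table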
- To solve the above problems, we can design a relational database to store data
3.3 Relational databases
- Database terms:
- A database is a collection of data
- Data is organized into one or more tables
- Each row is a record
- Each column is a field
Deciding on Fields
- Principles for deciding on fields:
- Think of all the facts that will be collected (consider every case)
- Plenty of fields (consider every column you may need)
- Consult widely (reach a consensus)
- Small, "atomic" facts: keep fields fine-grained so the information is as clear as possible, e.g., a student-name field rather than a catch-all student-information field
- Fields are difficult to add later
Set data types
3.4 Example:
An example of designing a relational database:
- Study of 18th century book trade
- What things are we interested in?
- Publications
- Publishers
- People
And what information might we want to know about each of these things?
- Names
- Dates
- Places
Design three tables at first:
Joins between tables: primary key (a unique identifier)
- Each table needs a primary key, i.e., an ID field, which must be defined explicitly
- Choose (at least) one field that contains only unique values
- Commonly an auto-incrementing whole (integer) number
Joins between tables: relate two tables by primary keys and foreign keys
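- One publisher may have many publications, i.e., a one-to-many relationship; the foreign key goes on the "many" side, so each Publication row stores its publisher's primary key
Several people may write the same book, and one person may write many books, so Person and Publication have a many-to-many relationship. A many-to-many relationship cannot be represented by a foreign key in either table, so a new table is created that records every (author, publication) pair.
This table then expresses the relationship between Person and Publication indirectly.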
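The Publisher/Publication/Person design can be sketched with Python's built-in sqlite3 module. The table and column names (Publisher, Publication, Person, Author, publisher_id, and so on) and the sample rows are hypothetical. The one-to-many link is a foreign key in Publication, and the many-to-many link goes through the Author junction table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on
conn.executescript("""
CREATE TABLE Publisher (
    id   INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL
);
CREATE TABLE Publication (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    title        TEXT NOT NULL,
    -- one-to-many: the foreign key lives on the "many" side
    publisher_id INTEGER REFERENCES Publisher(id)
);
CREATE TABLE Person (
    id   INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL
);
-- Junction table: one row per (person, publication) authorship pair
CREATE TABLE Author (
    person_id      INTEGER NOT NULL REFERENCES Person(id),
    publication_id INTEGER NOT NULL REFERENCES Publication(id),
    PRIMARY KEY (person_id, publication_id)
);
""")

# Hypothetical sample rows
conn.execute("INSERT INTO Publisher VALUES (1, 'Publisher A')")
conn.executemany("INSERT INTO Publication VALUES (?, ?, ?)",
                 [(1, "Book 1", 1), (2, "Book 2", 1)])
conn.executemany("INSERT INTO Person VALUES (?, ?)",
                 [(1, "Person X"), (2, "Person Y")])
conn.executemany("INSERT INTO Author VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1)])  # X wrote both books; Y co-wrote Book 1


def authors_of(title):
    """Follow Publication -> Author -> Person through the junction table."""
    rows = conn.execute("""
        SELECT Person.name
        FROM Person
        JOIN Author      ON Author.person_id = Person.id
        JOIN Publication ON Publication.id = Author.publication_id
        WHERE Publication.title = ?
        ORDER BY Person.name""", (title,))
    return [name for (name,) in rows]


print(authors_of("Book 1"))  # ['Person X', 'Person Y']
```

With foreign keys enabled, inserting an Author row that points to a nonexistent person raises sqlite3.IntegrityError, which is how the database enforces consistency across the tables.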
3.5 Database design: workflow
- Choose fields
- Are they atomic?
- Are there plenty?
- Give each field a data type
- Are they consistent?
- Arrange the fields into tables
- Do all the fields in the same table describe the same item?
- Set primary key fields
- A different primary key for each table?
- Is this a field with no duplicate values?
- Draw relationships between tables
- Which field relates each pair of tables?
- Mark 1-to-many, many-to-many, and 1-to-1 relationships
- Review, reflect, challenge
- Talk through the design with someone else
3.6 Once you’ve created your database
- Ask questions by constructing queries
- Find the records that meet certain criteria
- Search, sort, count, and filter data
- Perform basic mathematical and statistical operations
- Export data for other types of analysis
Query example 1:
```sql
select id, cityname, country, population, longtitude, latitude from City
```
Query Results
Query example 2:
```sql
select id, cityname, country, population, longtitude, latitude from City where cityname='Tirane'
```
Query Results
- Results may resemble another table or spreadsheet
- But the contents are customized to your requirements
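The two queries can be tried against an in-memory SQLite table. The City schema below is an assumption reconstructed from the column list in the queries, and the rows are hypothetical; numeric values are left NULL rather than invented. The second query binds 'Tirane' as a parameter, the idiomatic way to pass values from Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names, including the spelling "longtitude", mirror the queries above.
conn.execute("""
    CREATE TABLE City (
        id INTEGER PRIMARY KEY, cityname TEXT, country TEXT,
        population INTEGER, longtitude REAL, latitude REAL
    )""")
# Hypothetical rows; numeric columns are left NULL rather than invented.
conn.executemany("INSERT INTO City VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "Tirane", "Albania", None, None, None),
    (2, "Example City", "Example Country", None, None, None),
])

COLUMNS = "id, cityname, country, population, longtitude, latitude"

# Query example 1: every row, selected columns
all_rows = conn.execute(f"SELECT {COLUMNS} FROM City").fetchall()

# Query example 2: filter with a bound parameter instead of a quoted literal
tirane = conn.execute(f"SELECT {COLUMNS} FROM City WHERE cityname = ?",
                      ("Tirane",)).fetchall()
print(tirane)  # [(1, 'Tirane', 'Albania', None, None, None)]
```

Each result is a list of tuples, i.e. another small table tailored to the question asked.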
3.7 When to use a relational database
Your data can be organized in tabular form
- e.g., information about things that share common properties (each property organized in one column/field)
You are interested in multiple types of entities
- And the relationships between them
- Entities may be concrete or more abstract
You want to identify instances of things that meet certain criteria (query)
You want to be able to present one dataset in multiple different ways
Query results can be exported and used elsewhere
3.8 Benefits of relational databases
More accurate representation of complex data
- And helps avoid duplication of information
Permits flexible querying
- Wider range of questions possible than with a spreadsheet (multiple tables)
- Useful if you are unsure which questions you will want to ask
Suitable for collaborative use
- Multiple people can access and use the same database
- Can encourage (or enforce) consistency in data entry
Technology has been around for several decades
- Widely supported and well understood
4. RDB2RDF: Direct Mapping
4.1 What is RDB2RDF?
4.2 Two W3C RDB2RDF Standards:
- Direct Mapping
- R2RML
Tools:
- Free: D2R, Virtuoso, Morph, r2rml4net, db2triples, ultrawrap, Quest;
- Commercial: Virtuoso, ultrawrap, Oracle SW.
4.3 W3C RDB2RDF Standards
- Standards to map relational data to RDF
- A Direct Mapping of Relational Data to RDF
- Default automatic mapping of relational data to RDF
- R2RML: RDB to RDF Mapping Language
- Customizable language to map relational data to RDF
4.4 Create URIs following some simple rules:
- Map
- table to class (corresponds to rdf:type in Turtle)
- column to property (attribute → predicate)
- row to resource (one record)
- cell to literal value (a literal value in Turtle)
- in addition, cell to URI
- if there is a foreign key constraint
- We need IRIs for identifying
- the resource class corresponding to a table
- the resources represented by the table rows
- the properties of the resources corresponding to table cells
- the references due to foreign keys
Base IRI
- For the whole graph/dataset, e.g. @base <http://foo.example/DB/> .
- Table name $\rightarrow$ class IRI, e.g. People $\rightarrow$ <People>, which denotes a class
- Row with PK $\rightarrow$ resource IRI built from the PK, of the form <People/ID=…>; this denotes an instance of the class
- Table column $\rightarrow$ property IRI, of the form <People#column-name>
- Table cells: what if NULL? No triple is generated; the cell is simply omitted.
- Foreign key reference $\rightarrow$ additional property, of the form <People#ref-column-name>

Provide a base IRI http://foo.example/DB/ !
```turtle
@base <http://foo.example/DB/> .
```
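The rules above can be sketched in Python for tables with a single-column primary key. The People and Addresses tables, their columns, and the sample rows are hypothetical. Following the W3C Direct Mapping, a foreign-key column yields its ordinary literal triple plus an extra ref- triple pointing at the referenced row. Unlike the standard, this sketch does not percent-encode names and writes every literal as a plain string rather than a typed literal.

```python
BASE = "http://foo.example/DB/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def row_iri(table, pk, value):
    # Row IRI: base + table + "/" + pkColumn=value
    return f"<{BASE}{table}/{pk}={value}>"


def direct_map(table, pk, rows, fks=None):
    """Map the rows (dicts) of one table to (subject, predicate, object) triples.

    fks maps a foreign-key column to (referenced table, referenced PK column).
    Simplifications: single-column keys, no percent-encoding, untyped literals.
    """
    fks = fks or {}
    triples = []
    for row in rows:
        subject = row_iri(table, pk, row[pk])
        # table -> class
        triples.append((subject, RDF_TYPE, f"<{BASE}{table}>"))
        # column -> property, cell -> literal; NULL cells produce no triple
        for column, value in row.items():
            if value is not None:
                triples.append((subject, f"<{BASE}{table}#{column}>", f'"{value}"'))
        # foreign key -> extra property whose object is the referenced row's IRI
        for column, (ref_table, ref_pk) in fks.items():
            if row.get(column) is not None:
                triples.append((subject, f"<{BASE}{table}#ref-{column}>",
                                row_iri(ref_table, ref_pk, row[column])))
    return triples


# Hypothetical rows: Sue has no address, so her row gets no addr or ref-addr triple.
people = [{"ID": 7, "fname": "Bob", "addr": 18},
          {"ID": 8, "fname": "Sue", "addr": None}]
triples = direct_map("People", "ID", people, {"addr": ("Addresses", "ID")})
for s, p, o in triples:
    print(s, p, o, ".")
```

Printing the triples with a trailing "." gives N-Triples-style lines; with the @base above, the same IRIs could be written in their relative forms such as <People/ID=7>.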
4.5 Exercise
Please use direct mapping to map the following two relational tables to RDF triples, with the base IRI http://foo.example/DB/ and the prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
```turtle
@base <http://foo.example/DB/> .
```
5. RDB2RDF: R2RML
- R2RML is a language for specifying mappings from relational to RDF data.
5.1 DV
A view can be understood as a virtual table over the physical tables; it occupies no physical storage of its own.
A mapping takes as input a logical table, i.e.,
- a database table,
- a database view (a virtual table collecting data from relational tables), or
- an SQL query (called an "R2RML view" because it is like an SQL view but does not modify the database)
Example: database view
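A view stores only its defining query, so it always reflects the current contents of the underlying tables. The SQLite sketch below illustrates this with hypothetical EMP and DEPT tables and a hypothetical EMP_DEPT view; such a view, or the SELECT itself used as an R2RML view, could serve as the logical table of a mapping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DEPT (DEPTNO INTEGER PRIMARY KEY, DNAME TEXT);
CREATE TABLE EMP  (EMPNO INTEGER PRIMARY KEY, ENAME TEXT,
                   DEPTNO INTEGER REFERENCES DEPT(DEPTNO));
-- The view stores only this query, not a copy of the rows.
CREATE VIEW EMP_DEPT AS
    SELECT EMP.EMPNO, EMP.ENAME, DEPT.DNAME
    FROM EMP JOIN DEPT ON EMP.DEPTNO = DEPT.DEPTNO;
INSERT INTO DEPT VALUES (10, 'DEPT_A');
INSERT INTO EMP  VALUES (1, 'EMP_A', 10);
""")


def view_rows():
    return conn.execute("SELECT EMPNO, ENAME, DNAME FROM EMP_DEPT ORDER BY EMPNO").fetchall()


print(view_rows())  # [(1, 'EMP_A', 'DEPT_A')]
# A new base-table row appears in the view with no extra work,
# because the view's query is re-evaluated every time it is read.
conn.execute("INSERT INTO EMP VALUES (2, 'EMP_B', 10)")
print(view_rows())  # [(1, 'EMP_A', 'DEPT_A'), (2, 'EMP_B', 'DEPT_A')]
```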
5.2 A triples map
5.2.1 Def
- A logical table is mapped to a set of triples by a rule called triples map.
5.2.2 A triples map has three parts:
- the input logical table
- a subject map
- several predicate-object maps (combining predicate and object maps).
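To make the three parts concrete, the Python sketch below models a triples map as plain data and applies it row by row. It is not an R2RML processor: the logical table is a list of dicts, the subject map is an IRI template with {COLUMN} placeholders (mirroring rr:template) plus a class (mirroring rr:class), and each predicate-object map pairs a predicate with a column (mirroring rr:column). The template http://data.example.com/employee/{EMPNO}, the ex:name predicate, and the sample row are assumptions; real R2RML also IRI-encodes template values, which is skipped here.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TriplesMap:
    logical_table: list       # rows as dicts: stands in for a table, view, or SQL query result
    subject_template: str     # plays the role of rr:template in the subject map
    subject_class: str        # plays the role of rr:class in the subject map
    predicate_object_maps: list = field(default_factory=list)  # (predicate, column) pairs


def expand(template, row):
    """Fill {COLUMN} placeholders from a row; None if any referenced value is NULL."""
    columns = re.findall(r"\{(\w+)\}", template)
    if any(row.get(c) is None for c in columns):
        return None
    return re.sub(r"\{(\w+)\}", lambda m: str(row[m.group(1)]), template)


def run(tm):
    triples = []
    for row in tm.logical_table:
        iri = expand(tm.subject_template, row)
        if iri is None:
            continue  # no subject, so this row yields no triples
        subject = f"<{iri}>"
        triples.append((subject, "rdf:type", tm.subject_class))
        for predicate, column in tm.predicate_object_maps:
            if row.get(column) is not None:  # NULL values generate no triple
                triples.append((subject, predicate, f'"{row[column]}"'))
    return triples


emp_map = TriplesMap(
    logical_table=[{"EMPNO": 7369, "ENAME": "EMP_A"}],  # hypothetical row
    subject_template="http://data.example.com/employee/{EMPNO}",
    subject_class="ex:Employee",
    predicate_object_maps=[("ex:name", "ENAME")],
)
for triple in run(emp_map):
    print(*triple, ".")
```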
5.3 Example:
```
Example:
```
Explanation:
```turtle
#What is being mapped
```
```turtle
# Equivalent to <Subject URI> rdf:type <Class URI>
```
```turtle
rr:predicateObjectMap [
```
5.4 R2RML Examples
5.4.1
- DB
- Set of RDF triples
```turtle
<http://data.example.com/employee/7369> rdf:type ex:Employee.
```
- R2RML
```turtle
@prefix rr: <http://www.w3.org/ns/r2rml#>.
```
5.4.2 View Definition
```turtle
<#DeptTableView> rr:sqlQuery """
```
5.4.3 Mapping to a View Definition
```turtle
<#TriplesMap2>
```
5.4.4 Linking Two Logical Tables
```turtle
@prefix rr: <http://www.w3.org/ns/r2rml#>.
```
- Additional predicate object map for <#TriplesMap1>
- The object map retrieves the subject from the parent triples map by joining along a foreign key relationship
- It joins
- the current row of the logical table
- with the row of the logical table of <#TriplesMap2> that satisfies the join condition, i.e., the parent row whose join column value equals that of the current row
- Note:
- child = referencing map
- parent = referenced map
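The join performed by a referencing object map (rr:parentTriplesMap together with rr:joinCondition, rr:child, and rr:parent) can be sketched as a nested loop. The function, the EMP/DEPT rows, the department IRI template, and the ex:department predicate are hypothetical; a NULL child value never matches, as with SQL equality.

```python
def join_triples(child_rows, child_subject, parent_rows, parent_subject,
                 predicate, child_col, parent_col):
    """Sketch of a referencing object map with a single join condition.

    child_subject / parent_subject turn a row into its subject IRI, playing the
    role of the subject maps of the child and parent triples maps. Every parent
    row whose parent_col equals the child's child_col yields one triple.
    """
    triples = []
    for c in child_rows:
        for p in parent_rows:
            if c[child_col] is not None and c[child_col] == p[parent_col]:
                triples.append((child_subject(c), predicate, parent_subject(p)))
    return triples


emp_rows = [{"EMPNO": 7369, "DEPTNO": 10}]    # child logical table (hypothetical)
dept_rows = [{"DEPTNO": 10}, {"DEPTNO": 20}]  # parent logical table (hypothetical)

result = join_triples(
    emp_rows, lambda r: f"<http://data.example.com/employee/{r['EMPNO']}>",
    dept_rows, lambda r: f"<http://data.example.com/department/{r['DEPTNO']}>",
    predicate="ex:department", child_col="DEPTNO", parent_col="DEPTNO",
)
print(result)
```

Department 20 produces nothing because no employee row references it: the triples are driven by the child (referencing) map.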
5.5 Exercise
Please write the R2RML triples map to map the following relational database to RDF triples, with the prefix rr: <http://www.w3.org/ns/r2rml#> and the prefix ex: <http://example.com/ns#> (for classes and properties).
RDF Triples
```turtle
<http://data.example.com/student/001> ex:name "Zhang".
<http://data.example.com/student/002> ex:name "Wang".
```
```turtle
@prefix rr: <http://www.w3.org/ns/r2rml#> .
```
6. Summary: RDB2RDF
- RDB2RDF is to map the content of Relational Databases to RDF.
- Two W3C RDB2RDF standards: Direct Mapping and R2RML
- The direct mapping defines a simple and intuitive transformation from RDB to RDF.
- R2RML is a language for expressing customized mappings (using external ontology vocabularies) from RDB to RDF.