Smurf
Knowledge Graph Construction

1. Previous Exercises

1. Express each statement in description logic.

  (a) Every teacher must teach someone.
  • Correct Answer: Teacher ⊑ ∃Teach.Human

  (b) Every finger is a body part and is part of a hand.
  • Correct Answer: Finger ⊑ BodyPart ⊓ ∃Part_of.Hand

  (c) Zhang is a teacher of SEU.
  • Correct Answer: Teacher(Zhang, SEU)

2. Give a model of the following ontology:

  • PhDstudent ⊔ Undergraduatestudent ⊑ Student,

  • PhDstudent(John),

  • Undergraduatestudent(Jack),

  • Sister(Lisa, Jack),

  • Employee(Lisa)

Correct Answer:

  • A model: Δ = {jo, l, ja}
  • I(John) = jo, I(Lisa) = l, I(Jack) = ja
  • I(PhDstudent) = {jo},
  • I(Employee) = {l},
  • I(Undergraduatestudent) = {ja},
  • I(Student) = {ja, jo}
  • I(Sister) = {(l, ja)}
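
The answer can be checked mechanically: an interpretation is a model when every axiom and assertion holds in it. The following is a minimal sketch of that check for this one ontology; the dictionary layout and the function name is_model are illustrative assumptions, not part of the exercise.

```python
# Interpretation from the answer: a domain plus an extension for every name.
domain = {"jo", "l", "ja"}
individuals = {"John": "jo", "Lisa": "l", "Jack": "ja"}
concepts = {
    "PhDstudent": {"jo"},
    "Undergraduatestudent": {"ja"},
    "Employee": {"l"},
    "Student": {"ja", "jo"},
}
roles = {"Sister": {("l", "ja")}}


def is_model() -> bool:
    """Return True if the interpretation satisfies the TBox axiom and all assertions."""
    # TBox: PhDstudent ⊔ Undergraduatestudent ⊑ Student (the union must be a subset).
    union = concepts["PhDstudent"] | concepts["Undergraduatestudent"]
    tbox_ok = union <= concepts["Student"]
    # ABox concept assertions C(a): the element denoting a must be in the extension of C.
    assertions = [("PhDstudent", "John"), ("Undergraduatestudent", "Jack"), ("Employee", "Lisa")]
    concepts_ok = all(individuals[a] in concepts[c] for c, a in assertions)
    # ABox role assertion Sister(Lisa, Jack).
    roles_ok = (individuals["Lisa"], individuals["Jack"]) in roles["Sister"]
    # Every concept extension must stay inside the domain.
    in_domain = all(ext <= domain for ext in concepts.values())
    return tbox_ok and concepts_ok and roles_ok and in_domain


print(is_model())
```

Removing jo from the extension of Student would break the TBox axiom, and the check would then fail.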

3. Write the axioms inferred, in description logic, after classifying the following axioms by forward reasoning:

  • Endocarditis ⊑ Heart_Disease

  • Myocardial_Infarction ⊑ Heart_Disease

  • Heart_Disease ⊑ Disease

  • Enterococcal_Endocarditis ⊑ Endocarditis

Correct Answer:

  • Endocarditis ⊑ Disease
  • Myocardial_Infarction ⊑ Disease
  • Enterococcal_Endocarditis ⊑ Heart_Disease
  • Enterococcal_Endocarditis ⊑ Disease
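
For axioms that only relate named concepts, classification amounts to taking the transitive closure of ⊑. The sketch below assumes exactly that restricted case (no ∃, ⊓ or ⊔ on either side); the function name classify is hypothetical, and a real reasoner handles far more than this.

```python
def classify(told):
    """Return the subsumptions that follow by transitivity but are not stated."""
    closure = set(told)
    changed = True
    while changed:  # forward reasoning: repeat until no new axiom can be derived
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))  # from a ⊑ b and b ⊑ d derive a ⊑ d
                    changed = True
    return closure - set(told)


told = [
    ("Endocarditis", "Heart_Disease"),
    ("Myocardial_Infarction", "Heart_Disease"),
    ("Heart_Disease", "Disease"),
    ("Enterococcal_Endocarditis", "Endocarditis"),
]
for sub, sup in sorted(classify(told)):
    print(f"{sub} ⊑ {sup}")
```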

2. Knowledge Graph Construction

  • Knowledge Graph Construction: Extracting knowledge from heterogeneous data sources to form a knowledge graph.

  • Wrapper: a program that automatically crawls and extracts data from semi-structured sources

2.1 Knowledge Graph Construction from Structured Data

  • Basics of Relational Database
  • RDB2RDF: Direct Mapping & R2RML
  • Triple Extraction from Relational Web Tables

3. Basics of Relational Databases

3.1 Structuring data

  • We all structure the information we work with:

    • So we can find what we need, when we need it
    • To facilitate evaluation, comparison, and analysis
  • The structure you select influences

    • The kinds of information you collect
    • How you can interrogate (query) your data
    • The extent to which you can take advantage of your computer’s data-handling abilities
    • How easy it is to share data with others
  • Options for structuring & analyzing data

[Figure: options for structuring and analyzing data]

  • A table of bibliographic data (not a table in a relational database)
    • A flat table like this stores the same data repeatedly

[Figure: the table of bibliographic data]

3.2 An alternative approach

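  • First create a separate author table and link to it through a mapping, so that the author information is represented on its own

  • Then split the table again, moving the publication information into its own table

  • Restrict Type to a fixed set of values, so that each value can be matched reliably against another table

  • To solve the duplication problem above, we can design a relational database to store the data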

3.3 Relational database

  • Database terms:
    • A database is a collection of data
    • Data is organized into one or more tables
    • Each row is a record
    • Each column is a field

  • Deciding on fields

    • Principles for choosing fields:
      • Think of all the facts that will be collected: consider every case
      • Plenty of fields: consider every column you might need
      • Consult widely: reach a consensus
      • Small, “atomic” facts: keep the granularity fine so that each field is unambiguous, e.g. "student name" rather than "student information"
      • Fields are difficult to add later
  • Set data types

3.4 Example:

  • An example of designing a relational database:

    • Study of the 18th-century book trade
    • What things are we interested in?
      • Publications
      • Publishers
      • People
  • And what information might we want to know about each of these things?

    • Names
    • Dates
    • Places
  • Design three tables at first:

[Figure: the three initial tables]

  • Joins between tables: primary key (a unique identifier)

    • Each table needs a primary key, i.e. an ID that is defined explicitly
    • Choose (at least) one field that contains only unique values
    • Commonly an auto-incrementing whole (integer) number
  • Joins between tables: relate two tables by primary keys and foreign keys

    • One publisher may correspond to many publications, which is a one-to-many relationship; a foreign key records which table refers to which

  • Several people may write the same book, and one person may write many books, so Person and Publication are many-to-many. A many-to-many relationship cannot be represented by a single foreign key, so a new table is created that holds one record for every Author and Publication pairing

  • The relationship between Person and Publication is then expressed implicitly through that table
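
The key and join ideas above can be sketched with Python's built-in sqlite3 module. The table names, column names and rows below are illustrative assumptions (the figures with the actual tables are not reproduced here); the point is the shape: a primary key per table, a foreign key for the one-to-many link, and a junction table for the many-to-many link.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Publisher (
    id   INTEGER PRIMARY KEY,                        -- auto-assigned unique integer
    name TEXT NOT NULL
);
CREATE TABLE Publication (
    id           INTEGER PRIMARY KEY,
    title        TEXT NOT NULL,
    publisher_id INTEGER REFERENCES Publisher(id)    -- one publisher, many publications
);
CREATE TABLE Person (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE Authorship (                            -- junction table: many-to-many
    person_id      INTEGER REFERENCES Person(id),
    publication_id INTEGER REFERENCES Publication(id),
    PRIMARY KEY (person_id, publication_id)
);
""")

# Placeholder rows, not data from the lecture.
conn.execute("INSERT INTO Publisher VALUES (1, 'Publisher A')")
conn.execute("INSERT INTO Publication VALUES (1, 'Book One', 1)")
conn.executemany("INSERT INTO Person VALUES (?, ?)", [(1, "Author X"), (2, "Author Y")])
conn.executemany("INSERT INTO Authorship VALUES (?, ?)", [(1, 1), (2, 1)])

# Person and Publication are related only through the Authorship table.
authors = conn.execute("""
    SELECT Person.name
    FROM Person
    JOIN Authorship ON Authorship.person_id = Person.id
    JOIN Publication ON Publication.id = Authorship.publication_id
    WHERE Publication.title = 'Book One'
    ORDER BY Person.name
""").fetchall()
print(authors)
```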

3.5 Database design: workflow

  • Choose fields
    • Are they atomic?
    • Are there plenty?
  • Give each field a data type
    • Are they consistent?
  • Arrange the fields into tables
    • Do all the fields in the same table describe the same item?
  • Set primary key fields
    • A different primary key for each table?
    • Is this a field with no duplicate values?
  • Draw relationships between tables

    • Which field relates each pair of tables?
    • Mark 1-to-many, many-to-many, and 1-to-1 relationships
  • Review, reflect, challenge
    • Talk through the design with someone else

3.6 Once you’ve created your database

  • Ask questions by constructing queries
    • Find the records that meet certain criteria
    • Search, sort, count, and filter data
    • Perform basic mathematical and statistical operations
  • Export data for other types of analysis

Query example 1:

  • select id, cityname, country, population, longtitude, latitude from City

  • Query results: every row of City, restricted to the listed columns

Query example 2:

  • select id, cityname, country, population, longtitude, latitude from City where cityname='Tirane'

  • Query results: only the rows whose cityname is Tirane

  • Results may resemble another table or spreadsheet
  • But the contents are customized to your requirements
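
The two queries can be tried out with sqlite3. This is a sketch under stated assumptions: the City table is rebuilt here with the column names used in the queries (including their spelling), and every row holds placeholder values rather than the data shown in the original query results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE City (
        id INTEGER PRIMARY KEY, cityname TEXT, country TEXT,
        population INTEGER, longtitude REAL, latitude REAL
    )
""")
conn.executemany(
    "INSERT INTO City VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Tirane", "Albania", 1000, 19.8, 41.3),   # placeholder values
        (2, "Shkoder", "Albania", 500, 19.5, 42.1),
        (3, "Graz", "Austria", 800, 15.4, 47.1),
    ],
)

columns = "id, cityname, country, population, longtitude, latitude"
# Query example 1: all rows.
everything = conn.execute(f"SELECT {columns} FROM City").fetchall()
# Query example 2: filter on cityname (passed as a parameter, not pasted into the SQL).
tirane = conn.execute(
    f"SELECT {columns} FROM City WHERE cityname = ?", ("Tirane",)
).fetchall()
# Counting and sorting, as mentioned in the list of query capabilities.
per_country = conn.execute(
    "SELECT country, COUNT(*) FROM City GROUP BY country ORDER BY country"
).fetchall()
print(tirane)
print(per_country)
```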

3.7 When to use a relational database

  • Your data can be organized in tabular form

    • e.g., information about things that share common properties (each property organized in one column, i.e. one field)
  • You are interested in multiple types of entity

    • And the relationships between them
    • Entities may be concrete or more abstract
  • You want to identify instances of things that meet certain criteria (query)

  • You want to be able to present one dataset in multiple different ways
    • Query results can be exported and used elsewhere

3.8 Benefits of relational databases

  • More accurate representation of complex data

    • And helps avoid duplication of information
  • Permits flexible querying

    • Wider range of questions possible than with a spreadsheet (multiple tables)
    • Useful if you are unsure which questions you will want to ask
  • Suitable for collaborative use

    • Multiple people can access and use the same database
    • Can encourage (or enforce) consistency in data entry
  • Technology has been around for several decades

    • Widely supported and well understood

4. RDB2RDF: Direct Mapping

4.1 What is RDB2RDF?

  • RDB2RDF maps the content of relational databases to RDF.

4.2 Two W3C RDB2RDF Standards:

  • Direct Mapping
  • R2RML

Tools:

  • Free: D2R, Virtuoso, Morph, r2rml4net, db2triples, ultrawrap, Quest;
  • Commercial: Virtuoso, ultrawrap, Oracle SW.

4.3 W3C RDB2RDF Standards

  • Standards to map relational data to RDF
  • A Direct Mapping of Relational Data to RDF
    • Default automatic mapping of relational data to RDF
  • R2RML: RDB to RDF Mapping Language
    • Customizable language to map relational data to RDF


4.4 Create URIs following some simple rules:

  • Map
    • table to class (rdf:type in Turtle)
    • column to property (the attribute becomes the predicate)
    • row to resource (one record)
    • cell to literal value (a literal in Turtle)
    • in addition cell to URI
      • if there is a foreign key constraint
  • We need IRIs for identifying
    • the resource class corresponding to a table
    • the resources represented by the table rows
    • the properties of the resources corresponding to table cells
    • the references due to foreign keys
  • Base IRI

    • for the whole graph/dataset,
    • Table name → Class name,
      • e.g., People → <People>, which denotes the class
    • Row with PK → Resource with PK,
      • e.g., <People/ID=7>, which denotes one instance (one record)
    • Table column → Property,
      • e.g., <People#fname>
    • Table cells: what if NULL? The triple is simply omitted
    • Foreign key reference → additional property, e.g., <People#ref-addr>
  • Provide a base IRI, e.g. http://foo.example/DB/

@base <http://foo.example/DB/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<People/ID=7> rdf:type <People> . # class of this record
<People/ID=7> <People#ID> "7" . # a concrete attribute value
<People/ID=7> <People#fname> "Bob" .
<People/ID=7> <People#addr> "18" .
<People/ID=7> <People#ref-addr> <Addresses/ID=18> . # the foreign key links the records of the two tables

<People/ID=8> rdf:type <People> .
<People/ID=8> <People#ID> "8" .
<People/ID=8> <People#fname> "Sue" .

<Addresses/ID=18> rdf:type <Addresses> .
<Addresses/ID=18> <Addresses#ID> "18" .
<Addresses/ID=18> <Addresses#city> "Cambridge" .
<Addresses/ID=18> <Addresses#state> "MA" .
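
The rules of 4.4 are mechanical enough to sketch in a few lines of Python. The version below is a simplified illustration, not an implementation of the W3C recommendation: the function name direct_mapping is hypothetical, the schema is read through SQLite's PRAGMA statements, every value is written as a plain string as in the listing above, and only single-column foreign keys and tables that have a primary key are handled.

```python
import sqlite3


def direct_mapping(conn):
    """Yield Turtle-style triples (relative to the base IRI) for every table."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        names = [c[1] for c in cols]
        pk = [c[1] for c in cols if c[5] > 0]            # primary-key column(s)
        # Map each referencing column to (referenced table, referenced column).
        fks = {f[3]: (f[2], f[4])
               for f in conn.execute(f'PRAGMA foreign_key_list("{table}")')}
        for row in conn.execute(f'SELECT * FROM "{table}"'):
            rec = dict(zip(names, row))
            subject = f"<{table}/" + ";".join(f"{k}={rec[k]}" for k in pk) + ">"
            yield f"{subject} rdf:type <{table}> ."       # table -> class
            for col, val in rec.items():
                if val is None:                           # NULL cell: no triple
                    continue
                yield f'{subject} <{table}#{col}> "{val}" .'   # cell -> literal
                if col in fks:                            # foreign key -> reference
                    ref_table, ref_col = fks[col]
                    yield f"{subject} <{table}#ref-{col}> <{ref_table}/{ref_col}={val}> ."


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Addresses (ID INTEGER PRIMARY KEY, city TEXT, state TEXT);
CREATE TABLE People (ID INTEGER PRIMARY KEY, fname TEXT,
                     addr INTEGER REFERENCES Addresses(ID));
INSERT INTO Addresses VALUES (18, 'Cambridge', 'MA');
INSERT INTO People VALUES (7, 'Bob', 18), (8, 'Sue', NULL);
""")
triples = list(direct_mapping(conn))
print("\n".join(triples))
```

Sue has no address, so no addr or ref-addr triple is produced for <People/ID=8>, which matches the listing above.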

4.5 Exercise

Please use direct mapping to map the following two relational tables to RDF triples, with the base IRI http://foo.example/DB/ and the prefix rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#.

[Figure: the two relational tables, Student and Major]

@base <http://foo.example/DB/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.

<Student/ID=001> rdf:type <Student> .
<Student/ID=001> <Student#ID> "001" .
<Student/ID=001> <Student#sname> "Zhang" .
<Student/ID=001> <Student#major> "101" .
<Student/ID=001> <Student#ref-major> <Major/ID=101> .

<Major/ID=101> rdf:type <Major> .
<Major/ID=101> <Major#ID> "101" .
<Major/ID=101> <Major#mname> "CS" .
<Major/ID=101> <Major#address> "CS_Building" .

5. RDB2RDF: R2RML


  • R2RML is a language for specifying mappings from relational to RDF data.

5.1 Database view (DV)

  • A database view can be thought of as a virtual table defined over physical tables; it has no physical storage of its own

  • A mapping takes as input a logical table, i.e.,

    • a database table
    • a database view (a virtual table collecting data from relational tables), or
    • an SQL query (called an “R2RML view” because it is like an SQL view but does not modify the database)

Example: database view

[Figure: example of a database view]

5.2 A triples map

5.2.1 Definition

  • A logical table is mapped to a set of triples by a rule called a triples map.

5.2.2 A triples map has three parts:

  • the input logical table
  • a subject map
  • several predicate-object maps (combining predicate and object maps).

5.3 Example:

@prefix rr: <http://www.w3.org/ns/r2rml#> .
<TriplesMap1>
a rr:TriplesMap; # explicit type declaration; the note adds it when <TriplesMap1> is written without '#'

rr:logicalTable [rr:tableName "Person"];

rr:subjectMap [
rr:template "http://www.ex.com/Person/ID={ID}";
# either "ID={ID}" or plain "{ID}" works; it depends on how you define the template
rr:class <http://www.ex.com/Person>; # the class is given as a constant IRI
];

rr:predicateObjectMap [
rr:predicate <http://www.ex.com/Person#NAME>; # the predicate is given as a constant IRI
rr:objectMap [rr:column "NAME"]; # the object value is read from the database column
].

Breakdown

# What is being mapped
rr:logicalTable [rr:tableName "Person"]; # the logical table this triples map reads from

# Equivalent to <Subject URI> rdf:type <Class URI>
rr:subjectMap [
rr:template "http://www.ex.com/Person/ID={ID}"; # customized subject URI
rr:class <http://www.ex.com/Person>; # customized class
];

rr:predicateObjectMap [
rr:predicate <http://www.ex.com/Person#NAME>; # predicate URI
rr:objectMap [rr:column "NAME"]; # object literal
].
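
What a triples map does to each row can be mimicked in plain Python. The sketch below is an illustration only: apply_triples_map is a hypothetical helper, the two Person rows are invented, and details that a real R2RML processor handles (such as IRI-safe encoding of template values and datatypes of literals) are left out.

```python
import re


def apply_triples_map(rows, template, rdf_class, predicate, column):
    """Apply one subject map and one predicate-object map to every row."""
    triples = []
    for row in rows:
        # Replace each {COLUMN} placeholder in the template with the row's value.
        subject = re.sub(r"\{(\w+)\}", lambda m: str(row[m.group(1)]), template)
        triples.append((subject, "rdf:type", rdf_class))       # from rr:class
        if row[column] is not None:                            # NULL yields no triple
            triples.append((subject, predicate, row[column]))  # from rr:column
    return triples


person_rows = [{"ID": 1, "NAME": "Alice"}, {"ID": 2, "NAME": None}]  # hypothetical rows
triples = apply_triples_map(
    person_rows,
    template="http://www.ex.com/Person/ID={ID}",
    rdf_class="http://www.ex.com/Person",
    predicate="http://www.ex.com/Person#NAME",
    column="NAME",
)
for t in triples:
    print(t)
```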

5.4 R2RML Examples

5.4.1

  • DB

[Figure: the EMP and DEPT tables]

  • Set of RDF triples
<http://data.example.com/employee/7369> rdf:type ex:Employee.
<http://data.example.com/employee/7369> ex:name "SMITH".
<http://data.example.com/employee/7369> ex:department <http://data.example.com/department/10>.

<http://data.example.com/department/10> rdf:type ex:Department.
<http://data.example.com/department/10> ex:name "APPSERVER".
<http://data.example.com/department/10> ex:location "NEW YORK".
<http://data.example.com/department/10> ex:staff 1.
  • R2RML
@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix ex: <http://example.com/ns#>.
<#TriplesMap1>
rr:logicalTable [ rr:tableName "EMP" ];

rr:subjectMap [
rr:template "http://data.example.com/employee/{EMPNO}";
rr:class ex:Employee;
];

rr:predicateObjectMap [
rr:predicate ex:name;
rr:objectMap [ rr:column "ENAME" ];
].


5.4.2 View Definition

# STAFF counts, for each department, the EMP rows that share its DEPTNO
<#DeptTableView> rr:sqlQuery """
SELECT DEPTNO,
DNAME,
LOC,
(SELECT COUNT(*) FROM EMP WHERE EMP.DEPTNO=DEPT.DEPTNO) AS STAFF
FROM DEPT;
""".

5.4.3 Mapping to a View Definition

<#TriplesMap2> 
rr:logicalTable <#DeptTableView>;

rr:subjectMap [
rr:template "http://data.example.com/department/{DEPTNO}";
rr:class ex:Department;
];

rr:predicateObjectMap [
rr:predicate ex:name; # corresponds to the RDF property
rr:objectMap [ rr:column "DNAME" ];
];

rr:predicateObjectMap [
rr:predicate ex:location;
rr:objectMap [ rr:column "LOC" ];
];

rr:predicateObjectMap [
rr:predicate ex:staff;
rr:objectMap [ rr:column "STAFF" ];
].
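
To see what the view-based mapping produces, the SQL query of the R2RML view can be run with sqlite3 and its rows turned into triples by hand. This is a sketch under assumptions: the two tables contain only the single employee and single department that appear in the RDF triples of 5.4.1, the column list of EMP is reduced to what the mappings use, and the loop is a hand-written stand-in for <#TriplesMap2>, not an R2RML processor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DEPT (DEPTNO INTEGER PRIMARY KEY, DNAME TEXT, LOC TEXT);
CREATE TABLE EMP (EMPNO INTEGER PRIMARY KEY, ENAME TEXT,
                  DEPTNO INTEGER REFERENCES DEPT(DEPTNO));
INSERT INTO DEPT VALUES (10, 'APPSERVER', 'NEW YORK');
INSERT INTO EMP VALUES (7369, 'SMITH', 10);
""")
conn.row_factory = sqlite3.Row  # allow access to columns by name

# The query behind the view: STAFF counts the employees of each department.
view_rows = conn.execute("""
    SELECT DEPTNO, DNAME, LOC,
           (SELECT COUNT(*) FROM EMP WHERE EMP.DEPTNO = DEPT.DEPTNO) AS STAFF
    FROM DEPT
""").fetchall()

# One subject per row, one triple per mapped column.
column_to_predicate = {"DNAME": "ex:name", "LOC": "ex:location", "STAFF": "ex:staff"}
triples = []
for row in view_rows:
    subject = f"http://data.example.com/department/{row['DEPTNO']}"
    triples.append((subject, "rdf:type", "ex:Department"))
    for column, predicate in column_to_predicate.items():
        if row[column] is not None:
            triples.append((subject, predicate, row[column]))

for t in triples:
    print(t)
```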


5.4.4 Linking Two Logical Tables

@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix ex: <http://example.com/ns#>.
<#TriplesMap1>
rr:predicateObjectMap [
rr:predicate ex:department; # matches the predicate in the RDF output
rr:objectMap [ # object map whose values come from the second logical table
rr:parentTriplesMap <#TriplesMap2>; # the subjects of <#TriplesMap2> become the objects
rr:joinCondition [ # the two columns must hold equal values
rr:child "DEPTNO"; # column of the child (referencing) table, here that of <#TriplesMap1>
rr:parent "DEPTNO"; # column of the parent (referenced) table, here that of <#TriplesMap2>
];
];
].

  • Additional predicate object map for <#TriplesMap1>
  • Object map retrieves subject from parent triples map by joining along a foreign key relationship
  • It joins
    • the current row of the logical table
    • with the row of the logical table of <#TriplesMap2> that satisfies the join condition, so each row of <#TriplesMap1> is paired with its matching department
  • Note:
    • child = referencing map
    • parent = referenced map
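
The join behind a referencing object map can be illustrated with two lists of rows. In this sketch the helper name referencing_triples is hypothetical, the rows are the employee and department from 5.4.1 plus one invented employee without a department, and the subject templates are copied from the two triples maps.

```python
def referencing_triples(child_rows, parent_rows, predicate,
                        child_template, parent_template, child_col, parent_col):
    """Pair each child row with the parent rows that satisfy the join condition."""
    triples = []
    for child in child_rows:
        if child[child_col] is None:          # NULL never matches, so no triple
            continue
        for parent in parent_rows:
            if child[child_col] == parent[parent_col]:
                triples.append((
                    child_template.format(**child),    # subject from the child map
                    predicate,
                    parent_template.format(**parent),  # object = subject of the parent map
                ))
    return triples


emp = [
    {"EMPNO": 7369, "ENAME": "SMITH", "DEPTNO": 10},
    {"EMPNO": 9999, "ENAME": "NOBODY", "DEPTNO": None},  # invented row, no department
]
dept = [{"DEPTNO": 10, "DNAME": "APPSERVER", "LOC": "NEW YORK"}]

triples = referencing_triples(
    emp, dept, "ex:department",
    child_template="http://data.example.com/employee/{EMPNO}",
    parent_template="http://data.example.com/department/{DEPTNO}",
    child_col="DEPTNO", parent_col="DEPTNO",
)
print(triples)
```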

5.5 Exercise

Please write the R2RML triples map that maps the following relational database to RDF triples, with prefix rr: http://www.w3.org/ns/r2rml# and prefix ex: http://example.com/ns# (for classes and properties).

[Figure: the relational table to be mapped, with columns ID and Name]

RDF Triples

<http://data.example.com/student/001> ex:name "Zhang".
<http://data.example.com/student/002> ex:name "Wang".

@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.com/ns#> .
<#TriplesMap1>
rr:logicalTable [rr:tableName "RDB"];
rr:subjectMap [
rr:template "http://data.example.com/student/{ID}";
rr:class ex:Student;
];
rr:predicateObjectMap [
rr:predicate ex:name;
rr:objectMap [rr:column "Name"];
].

6. Summary: RDB2RDF

  • RDB2RDF maps the content of relational databases to RDF.
  • Two W3C RDB2RDF standards: Direct Mapping and R2RML
  • The direct mapping defines a simple and intuitive transformation from RDB to RDF.
  • R2RML is a language for expressing customized mappings (using external ontology vocabularies) from RDB to RDF.
Author: Smurf
Link: http://example.com/2021/08/15/knowledge%20engineering/7.%20Knowledge%20Graph%20Construction/
License: this article is licensed under CC BY-NC-SA 3.0 CN