
1. XML

1.1 Def:

  • A markup language for documents containing structured information.

    In short: a markup language used for data exchange.

1.2 Comparison:

1.2.1 XML:

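  • Extensible set of tags: authors can define their own tags

  • Content oriented: data is kept separate from presentation

  • Standard data infrastructure: errors are not tolerated
  • Allows multiple output forms

1.2.2 HTML:

  • Fixed set of tags: authors cannot define their own tags

  • Presentation oriented: data and formatting are intermixed
  • No data validation capabilities: pages with errors are still displayed
  • Single presentation: one output form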

1.3 XML Syntax

  • empty elements can be abbreviated: e.g. <email></email> can be written as <email/>
  • the outermost element is called the root element (there is exactly one)

Example:

<?xml version="1.0" encoding="GB2312" ?> <!-- version and encoding -->
<author> <!-- start tag -->
  <firstName>Guilin</firstName>
  <lastName>Qi</lastName>
  <email>gqi@seu.edu.cn</email> <!-- child element -->
  This is some text inside an XML element. <!-- text -->
</author> <!-- end tag -->
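
As a rough illustration of the structure above, the following Python sketch parses the same author element with the standard-library xml.etree.ElementTree module. The variable names are illustrative, and the XML declaration is left out because the text is passed in as an ordinary Python string.

```python
import xml.etree.ElementTree as ET

doc = """<author>
  <firstName>Guilin</firstName>
  <lastName>Qi</lastName>
  <email>gqi@seu.edu.cn</email>
  This is some text inside an XML element.
</author>"""

root = ET.fromstring(doc)                     # the single root element
child_tags = [child.tag for child in root]    # child elements in document order
first_name = root.find("firstName").text      # text content of one child

# Text that follows the <email> end tag is stored as that child's "tail".
trailing_text = root.find("email").tail.strip()

print(root.tag, child_tags, first_name, trailing_text)
```

The mixed text after the last child is not lost; ElementTree attaches it to the preceding child element rather than to the root.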

1.4 XML Attributes:

1.4.1 EP1:

<City ZIP="210000">Nanjing</City>

1.4.2 EP2:

<author>
  <firstName>Guilin</firstName>
  <lastName>Qi</lastName>
  <email>gqi@seu.edu.cn</email>
  This is some text inside an XML element.
</author>

is equivalent to

<author email="gqi@seu.edu.cn">
  <firstName>Guilin</firstName>
  <lastName>Qi</lastName>
  This is some text inside an XML element.
</author>
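
A small Python sketch of how the two forms look to a program, using the standard-library xml.etree.ElementTree module. The variable names are illustrative, and the mixed text is left out to keep the example short.

```python
import xml.etree.ElementTree as ET

as_child = ET.fromstring(
    "<author><firstName>Guilin</firstName><lastName>Qi</lastName>"
    "<email>gqi@seu.edu.cn</email></author>"
)
as_attribute = ET.fromstring(
    '<author email="gqi@seu.edu.cn"><firstName>Guilin</firstName>'
    "<lastName>Qi</lastName></author>"
)

email_from_child = as_child.find("email").text      # child element lookup
email_from_attribute = as_attribute.get("email")    # attribute lookup

print(email_from_child == email_from_attribute)
```

Both forms carry the same information, but a parser exposes them through different calls, so the two documents are equivalent for a reader rather than identical for a program.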

1.5 Rules:

Authoring guidelines:

  1. All elements must have an end tag.
  2. All elements must be cleanly nested (overlapping elements are not allowed).
  3. All attribute values must be enclosed in quotation marks.
  4. Each document must have a unique first element, the root node.
  5. Element and attribute names are case-sensitive.
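
The guidelines above can be checked mechanically. The sketch below is one possible way to do it in Python: it hands small sample strings (made up for this illustration) to the standard-library parser and records whether each one is accepted as well-formed.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# One hypothetical sample per guideline, plus a correct document.
samples = {
    "ok": "<a><b>text</b></a>",
    "missing end tag": "<a><b>text</a>",
    "overlapping": "<a><b><c></b></c></a>",
    "unquoted attribute": "<a id=1/>",
    "two roots": "<a/><b/>",
    "case mismatch": "<title>x</Title>",
}

results = {name: is_well_formed(text) for name, text in samples.items()}
for name, ok in results.items():
    print(f"{name}: {ok}")
```

Only the first sample is accepted; each of the others breaks exactly one guideline.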

Exercise:

<book>
  <title>Knowledge Graph</Title> <!-- error: the end tag does not match the start tag -->
  <author>
    <firstName>Guilin</firstName>
    <lastName>Qi</lastName>
    <email>gqi@seu.edu.cn</email>
    This is some text inside an XML element.
  </author>
  <author>
    <firstName>Tianxing<lastName>
    </firstName>Wu</lastName> <!-- error: the elements overlap -->
    <email>tianxingwu@seu.edu.cn</email>
  </author> <!-- error: the closing </book> tag is missing -->

1.6 Embedding HTML in XML

<文章>
<段落><![CDATA[
<html> <head><title></title></head>
<body>
<h1>东南大学</h1>
</body>
</html>]]>
</段落>
</文章>
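
To see what the CDATA section does, here is a hedged Python sketch using the standard-library parser. The English element names article and paragraph and the shortened HTML are stand-ins chosen for this illustration.

```python
import xml.etree.ElementTree as ET

doc = """<article>
<paragraph><![CDATA[
<html><body><h1>Title</h1></body></html>]]>
</paragraph>
</article>"""

root = ET.fromstring(doc)
paragraph = root.find("paragraph")

child_count = len(paragraph)            # markup inside CDATA creates no child elements
embedded_html = paragraph.text.strip()  # it arrives as plain character data instead

print(child_count)
print(embedded_html)
```

The HTML tags inside the CDATA section are never parsed as XML, which is why they can be embedded without escaping.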

1.7 XML Namespaces:

  • Introduced to resolve the ambiguity that arises when different vocabularies use the same element or attribute names
<h:table xmlns:h="http://www.w3.org/TR/html4/">
  <h:tr>
    <h:td>Apples</h:td>
    <h:td>Bananas</h:td>
  </h:tr>
</h:table>
  • Defining the default namespace:
<table xmlns="http://www.w3.org/TR/html4/">
  <tr>
    <td>Apples</td>
    <td>Bananas</td>
  </tr>
</table>
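
The following Python sketch suggests why the prefixed and the default-namespace forms mean the same thing: the standard-library parser expands both to the same namespace-qualified name. The shortened tables and variable names are illustrative.

```python
import xml.etree.ElementTree as ET

HTML_NS = "http://www.w3.org/TR/html4/"

prefixed = ET.fromstring(
    f'<h:table xmlns:h="{HTML_NS}"><h:tr><h:td>Apples</h:td></h:tr></h:table>'
)
default = ET.fromstring(
    f'<table xmlns="{HTML_NS}"><tr><td>Apples</td></tr></table>'
)

# ElementTree writes expanded names as "{namespace-uri}local-name".
print(prefixed.tag)
print(default.tag)

# The prefix used in a query is local to the query; only the URI matters.
cells = default.findall(".//h:td", {"h": HTML_NS})
print([cell.text for cell in cells])
```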

1.8 URI format:

1.9 XML Schema:

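  • Because XML is so flexible, a schema is needed to fix a shared structure so that data can be exchanged reliably

  • Sample code:

<?xml version="1.1" encoding="utf-16"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <!-- declares a schema, i.e. a format definition -->
  <!-- simplified: a validating schema would wrap the attributes in xsd:complexType -->
  <xsd:element name="author" type="xsd:string"
               minOccurs="1" maxOccurs="unbounded"> <!-- declares the element -->
    <xsd:attribute name="email" type="xsd:string" use="required"/> <!-- its attributes -->
    <xsd:attribute name="homepage" type="xsd:anyURI" use="optional"/>
  </xsd:element>
</xsd:schema>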

2. RDF

2.1 Def:

  • Annotates web resources with metadata for machine-readable data exchange.

  • The data model of Semantic Technologies and of the Semantic Web

2.2 URI

  • To avoid ambiguous names, RDF also uses URIs to identify resources

2.3 QName

2.3.1 Def: used in RDF as shorthand for long URIs (IRIs)

  • Example:

  • A resource can be written either as a QName or as a full URI
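
A minimal sketch of how a QName maps back to a full IRI, assuming a small hand-written prefix table. The function name expand and the table PREFIXES are made up for this illustration; the rdf and xsd namespace IRIs are the standard W3C ones.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "db": "http://dbpedia.org/resource/",
}

def expand(qname, prefixes=PREFIXES):
    """Replace the prefix of a QName with its namespace IRI."""
    prefix, _, local = qname.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix: {prefix}")
    return prefixes[prefix] + local

print(expand("rdf:type"))
print(expand("db:Boston"))
```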

2.4 RDF Triple (Statement):

  • Note that the subject (S), the predicate (P), and the object (O) can each be a resource

2.4.1 Resources

Def: things identified by IRIs, which play a role similar to namespaces

2.4.2 Literals

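Def: a data value, written outside angle brackets

  • data values;
  • encoded as strings;
  • interpreted by datatypes;
  • literals without a datatype are treated as strings and are called plain literals;
    • A plain literal may have a language tag;
    • Datatypes are not defined by RDF, but usually come from XML Schema.
  • In short: literals are either typed or untyped. The datatype of a typed literal usually comes from a namespace, and an untyped literal is called a plain literal. A plain literal may carry a language tag that states which language it is written in, or it may carry none, but these two kinds of plain literal are different from each other.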
# Typed literals:
"Beantown"^^xsd:string
"The Bay State"^^xsd:string
# Plain literals and literals with language tags:
"France" "France"@en "France"@fr
"法国"@zh "Frankreich"@de
# Equalities for literals:
"001"^^xsd:integer = "1"^^xsd:integer
"123.0"^^xsd:decimal = "00123"^^xsd:integer (based on datatype hierarchy)
# The last two literals denote the same value, because xsd:integer is derived from xsd:decimal

  • Is the plain literal "德国" equal to "德国"@zh?
    • Answer: no. A plain literal without a language tag and one with a language tag are different literals.
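
The comparisons above can be modelled with a small Python sketch. It is a simplification that only covers the cases discussed here: the class Literal, the function same_value, and the set NUMERIC_TYPES are names invented for this illustration, not part of any RDF library.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

NUMERIC_TYPES = {"xsd:integer", "xsd:decimal"}

@dataclass(frozen=True)
class Literal:
    lexical: str                    # the string form, e.g. "001"
    datatype: Optional[str] = None  # e.g. "xsd:integer"; None for plain literals
    lang: Optional[str] = None      # e.g. "zh"; only used with plain literals

def same_value(a: Literal, b: Literal) -> bool:
    """Compare by value for numeric datatypes, otherwise field by field."""
    if a.datatype in NUMERIC_TYPES and b.datatype in NUMERIC_TYPES:
        return Decimal(a.lexical) == Decimal(b.lexical)
    return a == b

checks = [
    same_value(Literal("001", "xsd:integer"), Literal("1", "xsd:integer")),
    same_value(Literal("123.0", "xsd:decimal"), Literal("00123", "xsd:integer")),
    same_value(Literal("德国"), Literal("德国", lang="zh")),
]
print(checks)
```

The first two comparisons succeed because the lexical forms denote the same number; the third fails because the language tag is part of the literal.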

2.4.3 Blank node

Def: an unnamed resource or complex node (later); put simply, an empty node in the graph, standing at a position whose identity is left unspecified
  • Representation of blank nodes is syntax-dependent:
    underscore + colon + ID (Turtle syntax): _:xyz, _:bn

2.5 RDF Syntax

2.5.1 Turtle

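  • list as S P O triples (easy to read): subject, predicate, and object are written in order
  • IRIs are enclosed in angle brackets <>; these are the resources
  • triples end with a full stop .
  • whitespace between terms is not significant

  • Written with full IRIs: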
<http://dbpedia.org/resource/Massachusetts> <http://example.org/terms/capital> <http://dbpedia.org/resource/Boston> .
<http://dbpedia.org/resource/Massachusetts> <http://example.org/terms/nickname> "The Bay State" .
<http://dbpedia.org/resource/Boston> <http://example.org/terms/inState> <http://dbpedia.org/resource/Massachusetts> .
<http://dbpedia.org/resource/Boston> <http://example.org/terms/nickname> "Beantown" .
<http://dbpedia.org/resource/Boston> <http://example.org/terms/population> "642109"^^<http://www.w3.org/2001/XMLSchema#integer> .
  • Written with QNames:
@prefix db: <http://dbpedia.org/resource/> .
@prefix dbo: <http://example.org/terms/> . # QNames are declared up front
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

db:Massachusetts dbo:capital db:Boston .
db:Massachusetts dbo:nickname "The Bay State" .
db:Boston dbo:inState db:Massachusetts .
db:Boston dbo:nickname "Beantown" .
db:Boston dbo:population "642109"^^xsd:integer .
  • Abbreviation rules for the QName notation:

    1. Grouping of triples with the same subject using a semicolon ';'

    2. Grouping of triples with the same subject and predicate using a comma ','

@prefix db: <http://dbpedia.org/resource/> .
@prefix dbo: <http://example.org/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

db:Massachusetts dbo:capital db:Boston ;
    dbo:nickname "The Bay State" .

db:Boston dbo:inState db:Massachusetts ;
    dbo:nickname "Beantown" ;
    dbo:population "642109"^^xsd:integer .
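
The two grouping rules can be sketched as a tiny serializer in Python. This is an illustration only, not a full Turtle writer: the function to_turtle is a made-up name, terms are passed in as ready-made strings, and the second nickname is a hypothetical value added so that the comma rule has something to group.

```python
from collections import defaultdict

def to_turtle(triples):
    """Group (s, p, o) strings by subject (';') and by predicate (',')."""
    grouped = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        grouped[s][p].append(o)
    blocks = []
    for s, predicates in grouped.items():
        parts = [f"{p} {', '.join(objects)}" for p, objects in predicates.items()]
        blocks.append(f"{s} " + " ;\n    ".join(parts) + " .")
    return "\n\n".join(blocks)

triples = [
    ("db:Boston", "dbo:inState", "db:Massachusetts"),
    ("db:Boston", "dbo:nickname", '"Beantown"'),
    ("db:Boston", "dbo:nickname", '"The Hub"'),  # hypothetical second value
]
print(to_turtle(triples))
```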

2.5.2 RDF/XML:

Def: RDF was originally designed on the basis of XML (the data exchange format of the Web)

  • a lot of tools and libraries support XML

  • Namespaces are used for disambiguating tags;

  • Tags belonging to the RDF language come with a fixed namespace, usually abbreviated "rdf".

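  • The structure can be read as follows: first declare that the block is RDF, together with the namespaces it uses;

  • then give the description itself: subject, predicate, object

    • an rdf:Description element contains the description of a resource, which is identified by rdf:about
    • ex:publishedBy also shows that resources are commonly used as predicates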
    <rdf:Description rdf:about="http://semantic-web-book.org/uri"> <!-- subject -->
      <ex:title>Foundations of Semantic Web Technologies</ex:title>
      <!-- predicate with a literal object -->
      <ex:publishedBy>
        <!-- a second predicate of the same subject -->
        <rdf:Description rdf:about="http://crcpress.com/uri">
          <!-- object of the outer triple and subject of the inner one (nested structure) -->
          <ex:name>CRC Press</ex:name>
          <!-- literal object -->
        </rdf:Description>
      </ex:publishedBy>
    </rdf:Description>
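
To make the subject, predicate, object reading concrete, the sketch below walks a complete version of the fragment and lists its triples. It is not an RDF/XML parser: it only handles nested rdf:Description elements with literal text, the function triples_of is a made-up name, and the ex namespace IRI http://example.org/ is an assumption, since the notes do not show the namespace declarations.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The rdf:RDF wrapper and the ex namespace IRI are assumptions for this sketch.
doc = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:ex="http://example.org/">
  <rdf:Description rdf:about="http://semantic-web-book.org/uri">
    <ex:title>Foundations of Semantic Web Technologies</ex:title>
    <ex:publishedBy>
      <rdf:Description rdf:about="http://crcpress.com/uri">
        <ex:name>CRC Press</ex:name>
      </rdf:Description>
    </ex:publishedBy>
  </rdf:Description>
</rdf:RDF>"""

def triples_of(description):
    """Yield (subject, predicate, object) for one rdf:Description element."""
    subject = description.get(f"{{{RDF}}}about")
    for prop in description:
        # "{namespace}local" becomes the full predicate IRI.
        predicate = prop.tag.replace("{", "").replace("}", "")
        nested = prop.find(f"{{{RDF}}}Description")
        if nested is not None:
            # The nested resource is the object here and a subject below.
            yield (subject, predicate, nested.get(f"{{{RDF}}}about"))
            yield from triples_of(nested)
        else:
            yield (subject, predicate, prop.text.strip())

root = ET.fromstring(doc)
triples = [t for d in root.findall(f"{{{RDF}}}Description") for t in triples_of(d)]
for triple in triples:
    print(triple)
```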

2.6 Representing N-ary relations in RDF

  • Use an intermediate node

  • Use a blank node

<rdf:Description rdf:about="http://example.org/Chutney">
  <ex:hasIngredient rdf:nodeID="id1"/> <!-- the object is not named; the nodeID attribute gives the blank node a local label -->
</rdf:Description>

<rdf:Description rdf:nodeID="id1">
  <!-- the same nodeID as above, now used as the subject -->
  <ex:ingredient rdf:resource="http://example.org/greenMango"/>
  <ex:amount>1lb</ex:amount>
</rdf:Description>
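
The same blank-node pattern can be written as plain triples. The Python sketch below stores them as tuples of strings and follows the blank node to recover the ingredient together with its amount; the helper names objects and ingredients_with_amounts are invented for this illustration.

```python
triples = {
    ("ex:Chutney", "ex:hasIngredient", "_:id1"),
    ("_:id1", "ex:ingredient", "ex:greenMango"),
    ("_:id1", "ex:amount", '"1lb"'),
}

def objects(subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}

def ingredients_with_amounts(dish):
    """Follow each blank node to pair an ingredient with its amount."""
    result = []
    for node in objects(dish, "ex:hasIngredient"):
        for ingredient in objects(node, "ex:ingredient"):
            for amount in objects(node, "ex:amount"):
                result.append((ingredient, amount))
    return sorted(result)

print(ingredients_with_amounts("ex:Chutney"))
```

The blank node ties the ingredient and the amount together, which a single triple between the dish and the ingredient could not do.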

2.7 RDF vs XML

  • IRIs solve the problem of term meaning: they remove naming clashes.
  • The triple-based data model describes relations or properties among terms: RDF captures the relationships between data items.

Triples are simple and easy to use, but they cannot cover all kinds of knowledge!

2.8 Exercise

@prefix sw: <http://www.semanticweb.org/ontology-9/> .
sw:John sw:is_a sw:professors ;
    sw:has_id sw:987654321 ;
    sw:has_name "John Doe" .

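3. RDFS

3.1 Def

  • Provides a vocabulary for RDF data and helps define an RDF schema
    • allows for specifying schema knowledge, e.g.
      • Mothers are female
      • Only persons write books
    • is a part of the W3C Recommendation.
  • RDFS defines abstract, class-level terms for RDF so that RDF is used in a consistent way
  • Why not use XML Schema?
    • XML Schema carries no semantics
    • the things it refers to cannot reach beyond the document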

3.2 RDFS: Class and Instance

  • Given a triple:

    • ex:SemanticWeb rdf:type ex:Textbook .

    • Instance and class names cannot be distinguished syntactically with IRIs

      RDF alone cannot state explicitly that a resource is a class

  • RDFS helps explicitly state that a resource denotes a class:

    • rdfs:Class is the "class of all classes".

3.3 RDFS: Class Hierarchy (Taxonomy)

3.3.1 rdfs:subClassOf is reflexive:

  • ex:Textbook rdfs:subClassOf ex:Textbook .
  • ex:Book rdfs:subClassOf ex:Book .

3.3.2 rdfs:subClassOf can derive class equivalence:

  • if two classes are each declared rdfs:subClassOf the other, they have the same instances

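3.4 RDFS abbreviated syntax

  • Roughly: an rdfs:Class element can stand in for the combination of rdf:Description and rdf:type, because the element name rdfs:Class already states that the resource being described is a class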

3.5 RDFS: Property and Property Hierarchy

  • Supports simple inference

3.6 RDFS: Property Restrictions

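  • A predicate has a domain and a range: rdfs:domain restricts the class of its subjects, and rdfs:range restricts the class of its objects

  • Several range declarations on one predicate are read as an intersection; a union has to be modelled with a common superclass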

3.7 RDFS: Reification

  • A blank node is used to represent a complex relation, such as a statement about another statement

  • Represent the following sentence graphically by means of the blank node:
    Wikipedia said that Tolkien wrote Lord of the Rings.

3.8 Example: Reasoning with RDFS

  • Given:
ex:happilyMarriedWith rdfs:subPropertyOf ex:isMarriedTo .
ex:isMarriedTo rdfs:domain ex:Person .
ex:isMarriedTo rdfs:range ex:Person .
ex:pascal ex:happilyMarriedWith ex:lisa .
  • We can infer:
ex:pascal ex:isMarriedTo ex:lisa .
ex:pascal rdf:type ex:Person .
ex:lisa rdf:type ex:Person .
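
The inference above can be reproduced with a small forward-chaining sketch in Python. It covers only four RDFS entailment rules (rdfs2 for domain, rdfs3 for range, rdfs7 for subproperties, rdfs9 for subclasses), stores triples as tuples of strings, and uses the made-up function name rdfs_closure; it is an illustration under those assumptions, not a complete RDFS reasoner.

```python
def rdfs_closure(triples):
    """Apply a few RDFS rules repeatedly until no new triple appears."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                if p2 == "rdfs:subPropertyOf" and s2 == p:
                    new.add((s, o2, o))              # rdfs7: inherit via superproperty
                if p2 == "rdfs:domain" and s2 == p:
                    new.add((s, "rdf:type", o2))     # rdfs2: type the subject
                if p2 == "rdfs:range" and s2 == p:
                    new.add((o, "rdf:type", o2))     # rdfs3: type the object
                if p == "rdf:type" and p2 == "rdfs:subClassOf" and s2 == o:
                    new.add((s, "rdf:type", o2))     # rdfs9: inherit via superclass
        if new <= facts:
            return facts
        facts |= new

given = {
    ("ex:happilyMarriedWith", "rdfs:subPropertyOf", "ex:isMarriedTo"),
    ("ex:isMarriedTo", "rdfs:domain", "ex:Person"),
    ("ex:isMarriedTo", "rdfs:range", "ex:Person"),
    ("ex:pascal", "ex:happilyMarriedWith", "ex:lisa"),
}
inferred = rdfs_closure(given) - given
for triple in sorted(inferred):
    print(triple)
```

The loop stops once a full pass adds nothing new, so chains of rules (subproperty first, then domain and range) are handled without ordering them by hand.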

Exercise:

What can be inferred from the following triples using RDFS semantics?

ex:Postgraduate_student rdfs:subClassOf ex:Student .
ex:Professor rdfs:subClassOf ex:Academic_staff .
ex:Supervise rdfs:domain ex:Professor .
ex:Supervise rdfs:range ex:Postgraduate_student .
ex:John ex:Supervise ex:Mary .

Answer:

ex:John rdf:type ex:Professor .
ex:Mary rdf:type ex:Postgraduate_student .

ex:John rdf:type ex:Academic_staff .
ex:Mary rdf:type ex:Student .

Author: Smurf
Link: http://example.com/2021/08/15/knowledge%20engineering/2.%20XML%20Ref%20and%20Refs/
License: this article is licensed under CC BY-NC-SA 3.0 CN