Mining Knowledge Graphs from Text [Note1]

Part 1 Knowledge Graph Primer

What is a Knowledge Graph?

Knowledge Graph = Entities + Relationships + Attributes
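
As a rough illustration of the formula above, the three ingredients can be held in a very small in-memory structure. This is only a sketch: the class names (Entity, KnowledgeGraph), the method names, and the sample facts are assumptions made for the example and do not come from the tutorial.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    attributes: dict = field(default_factory=dict)  # attribute name -> value


@dataclass
class KnowledgeGraph:
    entities: dict = field(default_factory=dict)       # entity name -> Entity
    relationships: list = field(default_factory=list)  # (head, relation, tail)

    def add_entity(self, name, **attributes):
        self.entities[name] = Entity(name, dict(attributes))

    def relate(self, head, relation, tail):
        # Only allow edges between entities that are already in the graph.
        if head not in self.entities or tail not in self.entities:
            raise KeyError("both endpoints must be known entities")
        self.relationships.append((head, relation, tail))

    def neighbors(self, name):
        # Outgoing edges of one entity as (relation, tail) pairs.
        return [(r, t) for h, r, t in self.relationships if h == name]


# Illustrative data only.
kg = KnowledgeGraph()
kg.add_entity("Ada Lovelace", born=1815)
kg.add_entity("London", country="United Kingdom")
kg.relate("Ada Lovelace", "born_in", "London")
print(kg.neighbors("Ada Lovelace"))
```

Entities carry their attributes as a dictionary, while relationships are kept as separate edges, which mirrors the split in the formula.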

Popular Knowledge Graphs (General): Google Knowledge Graph, Microsoft Satori Knowledge Graph
Domain-Specific Knowledge Graphs: Microsoft Academic Graph, LinkedIn Economic Graph, Common Sense Knowledge Graph

Why Are Knowledge Graphs Important?

For Humans:

Help organize the world’s information
Combat Information Overload
Easier for Exploration via Clear Structure
Tool for Supporting Business Decisions

For AIs:

Key ingredient for many AI tasks
Bridge from data to human semantics
Use decades of work on graph analysis

Applications:

QA/Agents
Decision Support
Fueling Discovery

Where Do Knowledge Graphs Come From?

Structured Text: Wikipedia infoboxes, tables, databases, social networks
Unstructured Text: WWW, news, social media, reference articles
Images
Video: YouTube, video feeds

Knowledge Representation Choices

1) Most knowledge graph implementations use RDF (Resource Description Framework) triples

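RDF is a framework for handling metadata, that is, data that describes other data (information about information).
e.g.: the content of a book is the book's data; the author's name, the publisher, and the address are the book's metadata.
The basic building block of RDF is a statement: a triple made of a resource, a property (attribute) of that resource, and the value of that property (that is, subject, predicate/relation, object). It expresses a data model.
Every described resource has a Uniform Resource Identifier (URI). A URI can be a URL or any other symbol that uniquely identifies an object, such as a phone number, an International Standard Book Number (ISBN), or geographic coordinates.
Properties are also identified by URIs, which prevents the confusion that synonyms would cause.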
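
The book example can be written down as triples in a few lines of Python. This is a minimal sketch under stated assumptions: the ISBN, the author name, and the publisher name are placeholders invented for the example, the Triple type and to_ntriples helper are hypothetical names, and the output only follows the general shape of an N-Triples line (no escaping of special characters in literals is attempted). The property URIs come from the Dublin Core element set.

```python
from typing import NamedTuple

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace
BOOK = "urn:isbn:0000000000"             # placeholder ISBN used as the resource URI


class Triple(NamedTuple):
    subject: str              # URI of the resource being described
    predicate: str            # URI of the property
    obj: str                  # property value: a URI or a plain literal
    obj_is_uri: bool = False


def to_ntriples(t):
    # URIs go in angle brackets, literals in double quotes.
    # Assumes literals contain no quotes or backslashes (no escaping here).
    obj = f"<{t.obj}>" if t.obj_is_uri else f'"{t.obj}"'
    return f"<{t.subject}> <{t.predicate}> {obj} ."


# Metadata about the book: who wrote it and who published it (made-up values).
statements = [
    Triple(BOOK, DC + "creator", "Jane Doe"),
    Triple(BOOK, DC + "publisher", "Example Press"),
]

for s in statements:
    print(to_ntriples(s))
```

Because both the resource and the properties are URIs, two sources that describe the same book with the same vocabulary produce statements that can be merged without guessing whether "author" and "creator" mean the same thing.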

2) ABox (assertions) versus TBox (terminology)

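The TBox holds assertions about concepts (the terminology); the ABox holds assertions about individuals.
The TBox declares inclusion (subsumption) relations among concepts and among roles, while the ABox is a set of instance assertions about individuals. An ABox assertion either states that an individual is an instance of a concept or states a binary relation between two individuals.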
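
The split between the two boxes can be sketched with plain Python containers. Everything here is illustrative: the concepts (Professor, Student, Person, Agent), the individuals (alice, bob), and the helper names are assumptions, and the TBox is restricted to single-parent concept inclusion, which is far simpler than a real description logic.

```python
# TBox: terminology. Here only concept inclusion, sub-concept -> super-concept.
tbox = {
    "Professor": "Person",
    "Student": "Person",
    "Person": "Agent",
}

# ABox: assertions about individuals.
concept_assertions = {("alice", "Professor"), ("bob", "Student")}  # instance-of
role_assertions = {("alice", "advises", "bob")}                    # binary relations


def superconcepts(concept):
    """Return the concept followed by every concept that includes it."""
    chain = [concept]
    # Follow inclusions upward; stop on a repeat so a cyclic TBox cannot loop forever.
    while chain[-1] in tbox and tbox[chain[-1]] not in chain:
        chain.append(tbox[chain[-1]])
    return chain


def is_instance(individual, concept):
    """Check membership using the ABox assertions plus the TBox inclusions."""
    return any(
        concept in superconcepts(asserted)
        for ind, asserted in concept_assertions
        if ind == individual
    )


print(is_instance("alice", "Agent"))
print(is_instance("bob", "Professor"))
```

The ABox never says that alice is an Agent; that conclusion only follows once the TBox inclusions are applied to the asserted fact.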

3) Common ontological primitives

rdfs:domain, rdfs:range, rdf:type, rdfs:subClassOf, rdfs:subPropertyOf, …
owl:inverseOf, owl:TransitiveProperty, owl:FunctionalProperty, …

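RDF is domain-independent; RDFS (RDF Schema) is used to define the terms and concepts of a particular application domain.
However, both RDF and RDFS can express only binary predicates (predicates that connect two objects) and have limited expressive power, which is not enough to support complex applications on the web. The W3C therefore developed the Web Ontology Language (OWL). OWL extends RDF and shares its syntactic structure, and it can define relationships between vocabulary terms, between classes, between properties, and so on.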
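
To make a few of the primitives listed above concrete, here is a small forward-chaining sketch over string triples. It is an assumption-laden illustration, not a reasoner: the ex: names and the sample facts are invented, the infer function is a hypothetical helper, only rdfs:subClassOf, rdfs:domain, rdfs:range, and one direction of owl:inverseOf are handled, and the quadratic loop is only suitable for toy inputs.

```python
def infer(triples):
    """Apply a handful of RDFS/OWL-style rules until no new triple appears."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                # rdfs:subClassOf is transitive.
                if p == "rdfs:subClassOf" and p2 == "rdfs:subClassOf" and o == s2:
                    new.add((s, "rdfs:subClassOf", o2))
                # An instance of a class is an instance of its superclasses.
                if p == "rdf:type" and p2 == "rdfs:subClassOf" and o == s2:
                    new.add((s, "rdf:type", o2))
                # rdfs:domain types the subject of a property.
                if p2 == "rdfs:domain" and p == s2:
                    new.add((s, "rdf:type", o2))
                # rdfs:range types the object of a property.
                if p2 == "rdfs:range" and p == s2:
                    new.add((o, "rdf:type", o2))
                # owl:inverseOf (declared direction only): swap subject and object.
                if p2 == "owl:inverseOf" and p == s2:
                    new.add((o, o2, s))
        if new <= facts:
            return facts
        facts |= new


# Invented schema and data for the example.
triples = [
    ("ex:Professor", "rdfs:subClassOf", "ex:Person"),
    ("ex:advises", "rdfs:domain", "ex:Professor"),
    ("ex:advises", "rdfs:range", "ex:Student"),
    ("ex:advises", "owl:inverseOf", "ex:advisedBy"),
    ("ex:alice", "ex:advises", "ex:bob"),
]

inferred = infer(triples)
for t in sorted(inferred - set(triples)):
    print(t)
```

From the single data triple, the schema triples let the loop derive that ex:alice is a professor and a person, that ex:bob is a student, and that ex:bob is advised by ex:alice.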

4) Semantic Web
Standards for defining and exchanging knowledge.
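Annotated data provide a critical resource for automation
Major weakness: annotate everything?
Annotated data can supply key resources for automated processing, but this is also where the approach is weak: given the huge amount of non-standard semantic expression, does all of the data really have to be annotated?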

5) Information Extraction from Text (will be illustrated in Part 2)
Answer to the knowledge acquisition bottleneck
Many challenges:
chunking, polysemy/word sense disambiguation, entity coreference, relation extraction

Ref: 【ReadingNotes】知识图谱导学 Knowledge Graph Tutorial - Part 1