Part 1 Knowledge Graph Primer
- What is a Knowledge Graph?
Knowledge Graph = Entities + Relationships + Attributes
Popular Knowledge Graphs (General): Google Knowledge Graph, Microsoft Satori Knowledge Graph
Domain-Specific Knowledge Graphs: Microsoft Academic Graph, LinkedIn Economic Graph, Common Sense Knowledge Graph
- Why Are Knowledge Graphs Important?
For Humans:
- Help organize the world's information
- Combat Information Overload
- Easier for Exploration via Clear Structure
- Tool for Supporting Business Decisions
For AIs:
- Key ingredient for many AI tasks
- Bridge from data to human semantics
- Use decades of work on graph analysis
Applications:
- QA/Agents
- Decision Support
- Fueling Discovery
- Where Do Knowledge Graphs Come From?
- Structured Text: Wikipedia infoboxes, tables, databases, social networks
- Unstructured Text: WWW, news, social media, reference articles
- Images
- Video: YouTube, video feeds
- Knowledge Representation Choices
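1) Most knowledge graph implementations use RDF (Resource Description Framework) triples
RDF is a framework for describing metadata. Metadata is data that describes data, or information about information.
e.g.: the content of a book is the book's data; the author's name, the publisher, and the address are the book's metadata.
The basic RDF construct is a statement: a triple of a resource, a property (attribute) of that resource, and the property's value (i.e., subject, predicate/relation, object). RDF is a data model.
Every described resource has a Uniform Resource Identifier (URI). A URI can be a URL or any other symbol that uniquely identifies an object, such as a telephone number, an International Standard Book Number (ISBN), or geographic coordinates.
Properties are also identified by URIs, which prevents the confusion caused by synonyms.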
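The subject-predicate-object model above can be sketched in plain Python. This is only a minimal illustration, not an RDF library: the `Triple` type, the `objects` helper, the `http://example.org/` namespace, and the ISBN value are all hypothetical names chosen for the example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace and identifiers, used only for illustration.
EX = "http://example.org/"
BOOK = "urn:isbn:0000000000"  # placeholder ISBN, not a real book

triples = [
    # Resources and properties are both named by URIs.
    Triple(BOOK, EX + "author", EX + "person/alice"),
    Triple(BOOK, EX + "publisher", EX + "org/example-press"),
    # The object may also be a plain literal value.
    Triple(EX + "person/alice", EX + "name", "Alice"),
]


def objects(store, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [t.obj for t in store
            if t.subject == subject and t.predicate == predicate]


authors = objects(triples, BOOK, EX + "author")
```

Because the author is itself a resource with a URI, a second lookup on `authors[0]` with the `name` property retrieves the literal, which is how triples chain into a graph.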
2) ABox (assertions) versus TBox (terminology)
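The TBox holds assertions about concepts (terminology); the ABox holds assertions about individuals.
The TBox declares inclusion relationships among concepts and among roles. The ABox is the set of instance assertions about individuals: statements that an individual is an instance of some concept, and binary relations between individuals.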
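The TBox/ABox split can be illustrated with a small sketch, assuming concepts and individuals are plain strings. The names `tbox`, `abox_types`, `abox_roles`, `superclasses`, and `is_instance`, as well as the example concepts, are hypothetical and not taken from any particular reasoner.

```python
# TBox: concept inclusion, stored as (sub-concept, super-concept) pairs.
tbox = {("Professor", "Person"), ("Person", "Agent")}

# ABox: assertions about individuals.
abox_types = {("alice", "Professor")}                     # individual is an instance of a concept
abox_roles = {("alice", "worksAt", "example_university")}  # binary relation between individuals


def superclasses(concept, tbox):
    """Collect every concept that includes `concept`, directly or indirectly."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for sub, sup in tbox:
            if sub == current and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def is_instance(individual, concept, tbox, abox_types):
    """True if the ABox asserts the membership, or the TBox lets us derive it."""
    for ind, asserted in abox_types:
        if ind != individual:
            continue
        if asserted == concept or concept in superclasses(asserted, tbox):
            return True
    return False
```

Here the ABox only says that `alice` is a `Professor`; the fact that she is also an `Agent` follows from the terminology in the TBox.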
3) Common ontological primitives
- rdfs:domain, rdfs:range, rdf:type, rdfs:subClassOf, rdfs:subPropertyOf, …
- owl:inverseOf, owl:TransitiveProperty, owl:FunctionalProperty, …
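RDF is domain-independent; RDFS (RDF Schema) is used to define the terms and concepts of a particular application domain.
However, RDF and RDFS can only express binary predicates (predicates that connect two objects) and simple class and property hierarchies, which is not expressive enough for complex applications on the web. W3C therefore developed the Web Ontology Language (OWL). OWL extends RDF, shares its syntactic structure, and can define relationships between vocabulary terms, between classes, between properties, and so on.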
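To make the primitives listed above concrete, here is a rough forward-chaining sketch of what rdfs:domain, rdfs:range, owl:inverseOf, and owl:TransitiveProperty let a system infer. It is a simplified illustration in plain Python, not a standards-compliant reasoner; the `schema` layout, the `infer` function, and all `ex:` names are assumptions made for the example.

```python
RDF_TYPE = "rdf:type"

# Hypothetical schema for illustration only.
schema = {
    "domain": {"ex:teaches": "ex:Teacher"},   # rdfs:domain
    "range": {"ex:teaches": "ex:Course"},     # rdfs:range
    # owl:inverseOf holds in both directions, so both entries are listed.
    "inverse": {"ex:teaches": "ex:taughtBy", "ex:taughtBy": "ex:teaches"},
    "transitive": {"ex:partOf"},              # owl:TransitiveProperty
}


def infer(facts, schema):
    """Apply the four rules repeatedly until no new triple appears."""
    inferred = set(facts)
    while True:
        new = set()
        for s, p, o in inferred:
            if p in schema["domain"]:
                new.add((s, RDF_TYPE, schema["domain"][p]))
            if p in schema["range"]:
                new.add((o, RDF_TYPE, schema["range"][p]))
            if p in schema["inverse"]:
                new.add((o, schema["inverse"][p], s))
            if p in schema["transitive"]:
                for s2, p2, o2 in inferred:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= inferred:  # fixpoint reached
            return inferred
        inferred |= new


facts = {
    ("ex:alice", "ex:teaches", "ex:kg101"),
    ("ex:ch1", "ex:partOf", "ex:unit1"),
    ("ex:unit1", "ex:partOf", "ex:kg101"),
}
closure = infer(facts, schema)
```

From three stated facts the sketch derives that `ex:alice` is an `ex:Teacher`, that `ex:kg101` is an `ex:Course` taught by `ex:alice`, and that `ex:ch1` is part of `ex:kg101`.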
4) Semantic Web
Standards for defining and exchanging knowledge.
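Annotated data provides a critical resource for automation.
Major weakness: annotate everything?
Annotated data can supply key resources for automated processing, but this is also where the approach is weak: given the huge volume of non-standard semantic expressions, does all of the data really have to be annotated?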
5) Information Extraction from Text (will be illustrated in Part 2)
Answer to the knowledge acquisition bottleneck.
Many challenges:
chunking, polysemy/word sense disambiguation, entity coreference, relation extraction