“Uncertain Journeys”. Photo by designmilk on Flickr. CC BY-SA 2.0
I’ve worked in the technology industry for 20 years and much has changed in that time. But one thing that hasn’t is the firehose of unfamiliar terminology used, abused, and generally misused, usually in a range of different contexts. These days, it’s easier than ever to look up the definition of new jargon, but, every so often I hit a phrase that leaves me in a loop of Google searches and open browser tabs. Some phrases seem to defy clear description: I’ve already covered “ontology” in a previous article.
Here, I aim to simplify another such example. You too may have heard the phrase “knowledge graph” and turned to Google to find out what it means. Did you ever work it out? If not, read on. All will be explained…
A superficial search will deliver information about a single product: the Google Knowledge Graph, which enhances the results of a Google search. It’s the box on the right hand side of the results page (the top of a search on mobile devices).
The output of Google’s Knowledge Graph following a search for Bertrand Russell
In their 2012 introduction to the Knowledge Graph, Google announced
“… we’ve been working on an intelligent model — in geek-speak, a ‘graph’ — that understands real-world entities and their relationships to one another: things, not strings.”
There is no official information about exactly how Google’s Knowledge Graph works, but it draws upon public sources such as Wikipedia, and also amasses data on what people search for on the web. By the end of 2016 it apparently contained 70 billion connected facts.
Back then to the search for the generic definition of a knowledge graph, rather than for information about a Google product of the same name. Wikipedia doesn’t have a page on that subject, but instead links to:
So, a knowledge graph is something about linked data and semantic technology…some kind of graph-based knowledge representation? But it isn’t clear what it is or what I can do with it.
Fortunately, others have recently noticed the shortage of a clear definition too. A recent paper noted that there are several descriptions, and helpfully tabulated them:
From “Towards a Definition of Knowledge Graphs”.
OK. Quite a range — from reasonably clear to completely opaque. In the same article, after outlining the various definitions above, the authors then go on to propose the following definition of a knowledge graph, highlighting reasoning as a key characteristic:
A knowledge graph acquires and integrates information into an ontology and applies a reasoner to derive new knowledge.
Is it possible to make that definition clearer? Let’s look at the common factors seen in the table of different definitions above, to see if that leads to a deeper understanding of a “knowledge graph”.
This is the main difference between the terms knowledge graph and knowledge base. The terms are used interchangeably, but they are not necessarily synonymous. While every knowledge graph is a knowledge base, or uses a knowledge base, the key is in the word “graph”.
A knowledge graph is organised as a graph, which is not always true of knowledge bases. The primary benefits of a graph are that connections (relationships) in the data are first-class citizens, you can easily connect new data items as they are injected into the data pool, and, finally, you can easily traverse links to discover how remote parts of a domain relate to each other (there’s a huge value in linking information). A graph is one of the most flexible formal data structures, so you can easily map other data formats to graphs using generic tools and pipelines.
Terminology alert! By “semantic”, I mean that the meaning of the data is encoded alongside the data in the graph, in the form of the ontology. A knowledge graph is self-descriptive, or, simply put, it provides a single place to find the data and understand what it’s all about.
There is an additional benefit in that you can submit queries in a style that is much closer to a natural language, using a familiar domain vocabulary. That is, the meaning of the data is typically expressed in terms of entity and relation names that are familiar to those interested in the given domain. This enables smarter search, more efficient discovery, and narrows the communication gap between data providers and consumers.
The underlying basis of a knowledge graph is the ontology, which specifies the semantics of the data. An ontology is typically based on logical formalisms which support some form of inference: allowing implicit information to be derived from explicitly asserted data. Some of the information inferred can be otherwise hard to discover.
Knowledge graphs being actual graphs, in the proper mathematical sense, allow for the application of various graph-computing techniques and algorithms (for example, shortest path computations, or network analysis), which add additional intelligence over the stored data.
They are easy to extend over time as the schema is not strict or prohibitive in the same way as it is for SQL, which takes us to the final point…
As I’ve already described, knowledge graphs have a flexible structure: the ontology can be extended and revised as new data arrives. This makes it convenient to store and manage data in a knowledge graph if you have use cases where regular updates and data growth are important, particularly when data is arriving from diverse, heterogenous sources. A knowledge graph can support a continuously running data pipeline that keeps adding new knowledge to the graph, refining it as new information arrives.
Knowledge graphs are also able to capture diverse meta-data annotations such as provenance or versioning information, which make them ideal for working with a dynamic dataset. There is an increasing need to account for the provenance of data and include it so that the knowledge can be assessed by its consumers in terms of credibility and trustworthiness. A knowledge graph can answer what it knows, and also how and why it knows it.
Some of the potential applications include: semantic search, automated fraud detection, intelligent chatbots, advanced drug discovery, dynamic risk analysis, content-based recommendation engines and knowledge management systems. If, like me, your preferred way of learning is to dive in and get your hands dirty trying things out, there is a list of potential projects to use the knowledge graph platform of GRAKN.AI.
GRAKN.AI is a database in the form of a knowledge graph that uses machine reasoning to simplify data processing challenges for AI applications. It stores data flexibly and in a way that allows machines to understand the meaning of information in the complete context of their relationships. Check out GRAKN.AI for more information.
OK, so I have hacked a path through the terminology. I think the key points to take away about a knowledge graph are these:
Duh — it’s a graph. Relationships are first-class citizens and the links between data add huge value, as well as flexibility.
It’s semantic or self-descriptive and has a natural language-like representation, making it easy to query and explore.
It’s smart. Inference can uncover hidden knowledge that is too complex for human cognition, and being a graph means you can apply various graph-computing techniques and algorithms.
It’s alive. You can use a knowledge graph to store information in a form that is easy to reuse.
Let me know how you think I did in clearing up the confusion…and what I should take on next!
Many thanks to my colleagues at GRAKN.AI for help in writing this article, particularly Szymon Klarman for helping me find a path through the concepts and difficult words.