Using Knowledge Graphs to Find Information in the Age of Big Data

Dr. Corina Dima

Researcher

Institute for Parallel and Distributed Systems (IPVS), University of Stuttgart (Consortium partner in the Service-Meister project)

“Knowledge is intrinsic to the design of AI systems” (Pan, 2017). A company that wants to gain advantages from the use of AI systems must provide them with the knowledge they need in order to learn. Knowledge is typically extracted from data, and data can come in multiple forms, ranging from unstructured data – like the pages of a product manual or the description of a service request – to fully structured data – like the entries in a database. Recently, knowledge graphs have received a lot of attention as a method for organizing data using a graph-structured data model. They have also proven to be a successful vehicle for integrating knowledge into AI systems.

What are Knowledge Graphs?

According to the definition proposed by Hogan et al. (2021), knowledge graphs are “graphs of data intended to accumulate and convey knowledge of the real world, whose nodes represent entities of interest and whose edges represent potentially different relations between these entities”. The basic building blocks of knowledge graphs are thus entities (e.g. a company, a product, an employee, a location, a customer or an order placed by the customer) and relations (e.g. a company produces a product, an employee is employed by a company, an employee works at a location, a customer orders a product).

Each entity can be further specified through attributes – e.g. a product has a name, a serial number, a production date and possibly an expiry date. Every entity has a unique identifier within the knowledge graph. Entities and relations typically have types that are defined using OWL (the W3C Web Ontology Language) and are described using the RDF (Resource Description Framework) data model.
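The building blocks described above can be illustrated with a minimal sketch in plain Python. It is not an RDF or OWL implementation; the class `KnowledgeGraph`, the identifiers such as `company:1` and all attribute values are assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    # Entity identifier -> dictionary of attributes (including the entity type)
    attributes: dict = field(default_factory=dict)
    # Set of (subject identifier, relation name, object identifier) triples
    relations: set = field(default_factory=set)

    def add_entity(self, entity_id, entity_type, **attrs):
        # The identifier is the dictionary key, so it is unique by construction
        self.attributes[entity_id] = {"type": entity_type, **attrs}

    def add_relation(self, subject, relation, obj):
        # A relation may only connect entities that are already in the graph
        if subject not in self.attributes or obj not in self.attributes:
            raise KeyError("both entities must be added before relating them")
        self.relations.add((subject, relation, obj))

    def objects(self, subject, relation):
        # All entities reachable from `subject` over one `relation` edge
        return sorted(o for s, r, o in self.relations
                      if s == subject and r == relation)


# Hypothetical example data
kg = KnowledgeGraph()
kg.add_entity("company:1", "Company", name="WerkTeile GmbH")
kg.add_entity("product:7", "Product", name="Spare part A",
              serial_number="SN-0007", production_date="2020-11-03")
kg.add_relation("company:1", "produces", "product:7")

print(kg.objects("company:1", "produces"))
```

In a production setting, the same information would more likely be stored as RDF triples with types taken from an OWL ontology; the sketch only shows how entities, attributes and relations fit together.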

Why use Knowledge Graphs?

The use of knowledge graphs has provided multiple companies with a competitive edge. Examples are Google’s Knowledge Graph, Amazon’s Product Knowledge Graph, Facebook’s Graph API, Microsoft’s Satori and the LinkedIn Knowledge Graph, to name only a few (see Noy et al., 2019 and Hogan et al., 2021 for further examples).

There are also general-purpose knowledge graphs built in an open, collaborative fashion, like Wikidata, DBpedia and Freebase, as well as knowledge graphs built to cater to the needs of a specific domain, as is the case with UMLS (Unified Medical Language System) and its components in the biomedical domain.

Using knowledge graphs to model the data ecosystem of a company makes it possible to integrate data from different internal sources – e.g. the production, research, service and human resources departments – with sources that are external to the company, like customer information and information about suppliers.

Another possibility is to model domain data that is relevant for the company’s products – as is the case with Google’s Knowledge Graph and Microsoft’s Satori, which model general interest knowledge for the search engines provided by the companies.

In many cases, new ideas will build upon existing, open-access knowledge graphs and enrich them with further, domain-specific knowledge – both in industry and in academia.

Two Main Challenges for Integrating Knowledge Graphs into a Company’s Workflow

1. Integrating the various data sources into a coherent knowledge repository.
Aside from the initial process of importing existing data into a knowledge graph, which can entail a substantial modelling effort, it is just as important to ensure that the information is continuously updated and maintained. Both aspects are now becoming easier to address with the advent of dedicated, AI-powered solutions that allow the direct integration of data from existing relational databases (Stoica et al., 2020) and the automatic extraction of structured data from unstructured text (Martinez-Rodriguez et al., 2021).
2. Making the knowledge graph information easily accessible.
This involves minimizing the amount of technical knowledge that users need before they can work productively with the system in their daily activities. Recent methods for question answering over knowledge graphs (see Lan et al., 2021 for a survey) offer the possibility of querying for information in natural language. They are typically equipped to answer both simple information requests, which require the retrieval of a single piece of information from the knowledge graph (e.g., Who is our point of contact with the company WerkTeile?), and complex information requests, which require stitching together several pieces of information to arrive at a precise answer (e.g., Which clients received spare parts produced by the company WerkTeile in November 2020?).
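To make the first challenge more concrete, the following sketch shows one simple way in which rows of a relational table could be turned into knowledge graph triples. It is a strongly simplified illustration and not the method of Stoica et al. (2020); the table `supplier`, its columns and the function `rows_to_triples` are assumptions, and foreign keys, data types and updates are ignored.

```python
import sqlite3


def rows_to_triples(conn, table, key_column):
    """Turn every row of `table` into (subject, predicate, object) triples."""
    # The table name is interpolated directly; it is assumed to be trusted
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        # The primary key becomes part of the entity's unique identifier
        subject = f"{table}/{record[key_column]}"
        triples.append((subject, "type", table))
        for column, value in record.items():
            # NULL values carry no information and produce no triple
            if column != key_column and value is not None:
                triples.append((subject, column, value))
    return triples


# Hypothetical in-memory database standing in for a company data source
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE supplier (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO supplier VALUES (1, 'WerkTeile GmbH', NULL)")

triples = rows_to_triples(conn, "supplier", "id")
for triple in triples:
    print(triple)
```

Each row becomes an entity, each column becomes an attribute or relation name, and each cell becomes a value. Keeping the graph up to date would additionally require re-running or incrementally applying such a mapping whenever the source data changes.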

Question Answering Over Knowledge Graphs

Question answering over knowledge graphs (KGQA) aims to answer questions expressed in natural language using the information contained in a knowledge graph. For example, suppose there is a knowledge graph containing the information needed to answer the sample questions above.

The goal of the question answering system is to learn what information from the knowledge graph is requested by the question. In the case of the simple question above, this means realizing that we are looking for an entity X that is in the relation contact_person to the entity WerkTeile. For more complex questions, the system needs to identify the full chain of relations connecting the entities mentioned in the question to the answer entity. A key challenge is bridging the lexical gap between the way that an entity or relation is expressed in the question and its knowledge graph representation. For example, WerkTeile might be registered as WerkTeile GmbH in the knowledge graph, and the phrases “point of contact”, “contact” and “contact person” are all reasonable ways of expressing the relation contact_person from the knowledge graph.
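The following sketch illustrates these steps on a toy graph: linking the mention WerkTeile to the entity WerkTeile GmbH, mapping a phrase to the relation contact_person, answering the simple question, and following a chain of relations for a complex one. The triples, the person and client names, the relations `produces` and `delivered_to`, the direction of `contact_person` and the hand-written alias table are all assumptions for illustration. A real system would learn these mappings rather than look them up, and the date restriction of the complex sample question is left out.

```python
# Hypothetical toy graph as (subject, relation, object) triples
TRIPLES = [
    ("Anna Schmidt", "contact_person", "WerkTeile GmbH"),
    ("WerkTeile GmbH", "produces", "Spare part A"),
    ("Spare part A", "delivered_to", "Kunde Nord AG"),
    ("Spare part B", "delivered_to", "Kunde Süd AG"),
]

# Hand-written stand-in for a learned mapping from phrases to relations
RELATION_ALIASES = {
    "point of contact": "contact_person",
    "contact": "contact_person",
    "contact person": "contact_person",
}

LEGAL_SUFFIXES = {"gmbh", "ag"}


def normalize(name):
    # Lowercase the name and drop legal-form suffixes such as "GmbH"
    words = [w for w in name.lower().split() if w not in LEGAL_SUFFIXES]
    return " ".join(words)


def link_entity(mention):
    # Return the graph entity whose normalized name equals the mention,
    # or None if there is no match or the match is ambiguous
    entities = {e for s, _, o in TRIPLES for e in (s, o)}
    matches = [e for e in entities if normalize(e) == normalize(mention)]
    return matches[0] if len(matches) == 1 else None


def subjects(relation, obj):
    # Simple question: all X such that (X, relation, obj) is in the graph
    return sorted(s for s, r, o in TRIPLES if r == relation and o == obj)


def follow(start, relations):
    # Complex question: follow a chain of relations starting at `start`
    frontier = {start}
    for relation in relations:
        frontier = {o for s, r, o in TRIPLES
                    if r == relation and s in frontier}
    return sorted(frontier)


company = link_entity("WerkTeile")
simple_answer = subjects(RELATION_ALIASES["point of contact"], company)
complex_answer = follow(company, ["produces", "delivered_to"])
print(company, simple_answer, complex_answer)
```

The lookup tables make the lexical gap visible: without `normalize` and `RELATION_ALIASES`, neither the entity nor the relation from the question would match anything in the graph.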

To be effective, the system must thus learn how to correctly map natural language questions to the knowledge graph information. This is usually done by training machine learning algorithms to perform the mapping, which requires a significant number of sample questions together with their correct answers in the knowledge graph. In many cases, the datasets for training such systems can be constructed semi-automatically using templates over the knowledge graph (see the datasets from Lan et al., 2021 for examples).
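As a rough illustration of template-based dataset construction, the sketch below fills question templates with entities from a toy graph and records the correct answer for each generated question. The triples, the templates and the function `generate_examples` are assumptions made up for this example and are not taken from the datasets surveyed by Lan et al. (2021).

```python
# Hypothetical toy graph as (subject, relation, object) triples
TRIPLES = [
    ("Anna Schmidt", "contact_person", "WerkTeile GmbH"),
    ("WerkTeile GmbH", "produces", "Spare part A"),
]

# Hypothetical templates: relation -> (question patterns, side holding the answer)
TEMPLATES = {
    "contact_person": (
        ["Who is our point of contact with {o}?",
         "Who is the contact person for {o}?"],
        "subject",
    ),
    "produces": (["Which company produces {o}?"], "subject"),
}


def generate_examples(triples, templates):
    """Create one training example per matching triple and question pattern."""
    examples = []
    for s, r, o in triples:
        if r not in templates:
            continue  # no template available for this relation
        patterns, answer_side = templates[r]
        for pattern in patterns:
            examples.append({
                "question": pattern.format(s=s, o=o),
                "relation": r,
                "answer": s if answer_side == "subject" else o,
            })
    return examples


examples = generate_examples(TRIPLES, TEMPLATES)
for example in examples:
    print(example)
```

Several patterns per relation give the learning algorithm different surface forms for the same relation. The process is semi-automatic because the templates still have to be written and checked by hand.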

Question answering systems over knowledge graphs offer the promise of a single point of access to the knowledge of a company, without requiring any technical background knowledge. However, an important prerequisite for deploying such systems in the current industry setting is constructing the underlying knowledge graphs, ideally using open, standardized schemes for naming the entities and relations from the industrial domain.

Having seamless access to different stores of data from a company has the potential to optimize existing processing models and to open avenues for new ones. Knowledge graphs are a powerful method for organizing data, and question answering models over knowledge graphs offer an intelligent way of finding answers to questions using this knowledge.

Dr. Corina Dima is a post-doctoral researcher at the Analytic Computing department, Institute for Parallel and Distributed Systems (IPVS), University of Stuttgart, a consortium partner in the Service-Meister project.


Please note: The opinions expressed in Industry Insights published by dotmagazine are the author’s own and do not reflect the view of the publisher, eco – Association of the Internet Industry.