Building Knowledge Graphs from Messy Enterprise Data

Schröder, Markus

doi:10.26204/KLUEDO/7255

Undocumented enterprise data can easily pile up in companies in the form of datasets and personal information. In the absence of a data management strategy, such data becomes messy and may not be fit for its intended use. Since documentation is often unavailable, only a limited number of domain experts are aware of its contents. As a result, it becomes increasingly difficult for companies to use such data to its full potential. To provide a solution, this PhD thesis investigates the construction of enterprise and personal knowledge graphs by enriching messy data with meaning using semantic technologies. Because they organize real-world entities and their interrelations in a graph, knowledge graphs serve as a semantic bridge between domain conceptualization and raw data. Spreadsheets are a prominent example of such enterprise data, since they are widely used by knowledge workers in the industrial sector. Two distinct approaches are investigated to construct knowledge graphs from them: a global extraction & annotation method and a local mapping technique. The latter is further complemented with a predictor of mapping rules on messy data. Different human-in-the-loop strategies are considered for involving experts, depending on their user group. Since non-technical users usually lack an understanding of semantic technologies, they need appropriate tools to be able to give feedback. For developers, approaches are proposed to close the technology gap between industry and Semantic Web related concepts. Semantic Web practitioners participate through ontology modeling and linked data applications. Enterprise and personal data is typically confidential, which is why it cannot be shared with a research community to discuss its challenges. However, publicly available datasets are necessary for evaluation and reproducibility. The thesis therefore proposes ways to generate synthetic datasets that are as authentic as possible.
In addition, a crawler of personal data on desktops is implemented for internal evaluations. Further contributions related to this thesis span diverse domains. One concerns the motivation to support users in their daily work using personal knowledge assistants. Others concern the agricultural field and the data science domain, which also benefit from knowledge graph approaches. In conclusion, this PhD thesis contributes to the construction of knowledge graphs from particularly messy enterprise data, with users from different groups taking part in this process in various ways.

Author:	Markus Schröder
URN:	urn:nbn:de:hbz:386-kluedo-72551
DOI:	https://doi.org/10.26204/KLUEDO/7255
Advisor:	Andreas Dengel
Document Type:	Doctoral Thesis
Language of publication:	English
Date of Publication (online):	2023/04/25
Year of first Publication:	2022
Publishing Institution:	Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau
Granting Institution:	Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau
Acceptance Date of the Thesis:	2022/12/06
Date of the Publication (Server):	2023/04/26
Page Number:	XV, 138
Faculties / Organisational entities:	Kaiserslautern - Department of Computer Science
CCS-Classification (computer science):	I. Computing Methodologies / I.2 ARTIFICIAL INTELLIGENCE / I.2.4 Knowledge Representation Formalisms and Methods (F.4.1) / Semantic networks
DDC-Classification:	0 General works, computer science, information science / 004 Computer science
Licence:	Creative Commons 4.0 - Attribution, NonCommercial, NoDerivatives (CC BY-NC-ND 4.0)
