Enterprise Knowledge Management: Introduction To Ontology Engineering For Machine Learning Use Cases
In part 1, I opened the door with the basic concepts that explain what an ontology is and how it supports the transition from enterprise data management to enterprise knowledge management. This article covers the ontology engineering process and my approach to developing ontologies incrementally. It also surveys the tools landscape.
I intended to write a post about standard ontology engineering but realized how few of those processes apply to data engineers and scientists. We need an applied process, and learning best practices for a different field doesn’t make sense. I will explain the ontology development process using a case study from a project I worked on.
As I explain the process, two main points will emerge. First, some ontologies have causal structures. I call ontologies cheat codes for machine learning because, once discovered, those causal structures can be used to build highly reliable models and explain the inferences they serve. Second, some ontologies change over time. In data science, we call this drift. This article covers a mostly static ontology. The next article in the series will introduce dynamic ontologies with a different use case.
The payoff for all this is a more mature data engineering framework that manages knowledge instead of data. Knowledge engineering and management have direct ties to ROI. Managing aggregated knowledge graphs also costs less than managing sprawling data sets.
The first section is a giant asterisk about defining knowledge in the scope of data engineering or enterprise data management vs. defining knowledge more rigorously. Applied data science gets away with practices that will not fly anywhere else. It’s critical to call these out so we move forward with our assumptions laid out.
The next sections introduce ontology engineering basics with a supplier discovery case study:
Knowledge Organization Strategies
Building Knowledge Representations For Enterprise Data
Data Mapping and Integration (AKA Data Harmonization)
Causal Structures
I cover tools and standards immediately after. This section has links to more detailed information and tutorials. I intend to provide resources to frame where these tools belong in the ontology engineering process and why they are necessary.
The last section introduces dynamic ontologies and sets up the next article in the series, which explains dynamic ontologies in greater depth and introduces LLMs into the development process.
Defining Knowledge – The Elephant I Want You To Ignore
Nothing that comes next is as simple as I will present it, and I would be negligent to ignore the vast body of work these concepts rest on. Ontologies walk the line between science and a deeper exploration of knowledge and the nature of reality. Epistemology is a rabbit hole where Descartes and Plato live. And I will step over it like most data scientists do.
I begin and end this article with epistemology. I’m about to define a process for managing knowledge without defining what knowledge is in the first place. Pay no attention to that elephant staring us down while I continue. At the end of the article, I will return to epistemology and confront two problems we must engineer solutions for: the dynamic nature of some ontologies and the reconciliation of conflicting ontologies.
For this article, knowledge is constrained by business context, which is the domain expertise required to operate a business. Think in terms of workflows and the expertise required to complete those workflows and deliver high-quality work products. Workflows have two sides: the tasks and the decisions made as part of those tasks.
Mostly static ontologies have few decisions. When decision-making enters a workflow, the rate of change, or dynamism, increases dramatically. This use case has very low dynamism, so I can focus on the fundamentals of ontology engineering. At the same time, you’ll get a sense of how valuable small, static ontologies are.
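To make the two-sided view of a workflow concrete, here is a minimal Python sketch. It is an illustration under my own assumptions, not a standard or a tool's API: the class names, the example task and decision names, and the `decision_share` measure are all hypothetical. The share of steps that are decisions is only a rough stand-in for how dynamic a workflow might be.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    name: str
    expertise: str  # domain knowledge needed to complete the task


@dataclass
class Decision:
    name: str
    options: List[str]  # choices a person makes as part of a task


@dataclass
class Workflow:
    name: str
    tasks: List[Task] = field(default_factory=list)
    decisions: List[Decision] = field(default_factory=list)

    def decision_share(self) -> float:
        """Fraction of steps that are decisions (hypothetical proxy for dynamism)."""
        total = len(self.tasks) + len(self.decisions)
        return len(self.decisions) / total if total else 0.0


# Hypothetical example loosely themed on supplier discovery; names are invented.
supplier_discovery = Workflow(
    name="supplier discovery",
    tasks=[
        Task("collect supplier records", "procurement data sources"),
        Task("classify supplier capabilities", "product category knowledge"),
        Task("match suppliers to requirements", "sourcing requirements"),
    ],
    decisions=[
        Decision("shortlist supplier", ["accept", "reject"]),
    ],
)

print(supplier_discovery.decision_share())
```

A workflow with many tasks and few decisions scores low on this measure, which is the kind of mostly static case this article focuses on.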