Ontology-Based Automated Feature Selection: Why And What It Looks Like

May 14, 2022


Ontologies are the high-level structure around knowledge graphs. Any dataset can be represented as a relational or a graph data structure. In the relational paradigm, keys indicate that a relationship exists but carry no context about it. It’s up to Data Curators to add that context by linking metadata to each data point. In a very large database, that quickly becomes unmanageable.

Why? The number of relationships that must be preserved becomes massive, and because of how a relational architecture works, it takes a performance hit every time a new relationship is added. Accessing structure requires an ever-growing number of joins, and the overhead climbs steeply with each one.
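To make the join problem concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `entity` and `link` tables, their rows, and the `names_at_distance` helper are all hypothetical, made up for illustration. Notice that the `link` table only says two rows are related, not how, and that every extra hop along a chain of relationships adds two more joins to the query.

```python
import sqlite3

# Hypothetical toy schema: entities plus a key-only link table.
# The link rows say "these two are related" but carry no context.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE link (src INTEGER REFERENCES entity(id),
                       dst INTEGER REFERENCES entity(id));
    INSERT INTO entity VALUES
        (1, 'customer'), (2, 'order'), (3, 'product'), (4, 'supplier');
    INSERT INTO link VALUES (1, 2), (2, 3), (3, 4);
""")


def names_at_distance(start_name, hops):
    """Follow `hops` links from the starting entity.

    Returns the names reached and the number of JOIN clauses the
    generated query needed (two per hop in this schema).
    """
    sql = f"SELECT e{hops}.name FROM entity e0 "
    for i in range(1, hops + 1):
        sql += (
            f"JOIN link l{i} ON l{i}.src = e{i - 1}.id "
            f"JOIN entity e{i} ON e{i}.id = l{i}.dst "
        )
    sql += "WHERE e0.name = ?"
    join_count = sql.count("JOIN")
    rows = [row[0] for row in conn.execute(sql, (start_name,))]
    return rows, join_count


print(names_at_distance("customer", 1))
print(names_at_distance("customer", 3))
```

This is a toy, not a benchmark. It only shows how the query itself grows as the relationship chain gets longer.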

Most companies just throw compute at the problem, but there is a better structure to represent the relationships. Full disclosure, there are performance issues to be resolved here as well. It’s better, not perfect.

In a graph data structure, the edges can hold context. Edges can be traversed with graph search algorithms, which are a more efficient mechanism for accessing data and leveraging relationships. It’s a better paradigm for machine learning because it takes a systems approach to data. Models are representations of systems, and as their accuracy increases, they become simulations. Graph data structures are the best match for machine learning use cases, and that’s where ontologies add value.
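As a rough illustration of edges holding context, here is a minimal sketch in plain Python. The node names, the `relation` labels, and the `find_path` helper are assumptions made up for this example, not part of any particular graph database. Each edge carries a small dictionary of context, and a breadth-first search walks the edges and returns the relationships it crossed.

```python
from collections import deque

# Hypothetical toy graph: node -> list of (neighbor, edge context).
# Unlike a bare foreign key, each edge says how the two nodes relate.
graph = {
    "customer": [("order", {"relation": "placed"})],
    "order": [("product", {"relation": "contains"})],
    "product": [("supplier", {"relation": "supplied_by"})],
    "supplier": [],
}


def find_path(graph, start, goal):
    """Breadth-first search over the edges.

    Returns the (source, relation, target) triples along a shortest
    path from start to goal, or None if goal is unreachable.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for neighbor, context in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                step = (node, context["relation"], neighbor)
                queue.append((neighbor, path + [step]))
    return None


for step in find_path(graph, "customer", "supplier"):
    print(step)
```

The traversal returns the path together with the meaning of each hop, which is the context a key-only relational link leaves out.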


How Do We Know What Data To Use In Model Training?
