The blueprint for modern data engineering is Joe Reis and Matt Housley’s book Fundamentals of Data Engineering. One of their core tenets is building a forward-looking data architecture. It’s not enough to build for today’s needs. Decisions made today must take advantage of the technology waves to come and result in infrastructure that supports changing business needs.
The authors’ primary focus was architecture. I’m explaining a different level of architecture, one that manages knowledge rather than the storage and movement of data. Most of this framework is architecture- and tool-agnostic. There are parallels to mesh, fabric, graph DB, and other concepts, but this isn’t a pitch for any architecture, stack, or platform.
I’m diving into the implementation side of the series. This article explains the framework for analytics- and data science-centric data management. Let’s begin by defining a top-level vocabulary for the contents of a data catalog. I’ll use that as a starting point to explain how to transition from data catalogs and dictionaries to the new framework.