Sorry, it was too easy not to do. This image is the blueprint for a modern model training pipeline and corresponding workflow.
You might remember LinkedIn's feature store, Feathr, which supports an ontology that lies on top of the data and maps concepts. That is the domain knowledge side of the inputs. Before the business has a formal ontology, this is expert knowledge. LinkedIn custom-built infrastructure to support this type of pipeline, so they are likely moving in a similar direction.
I wrapped up my last post by listing a few companies that are all moving in similar directions. LinkedIn follows the theme of hard problems at scale. This is the break from data science that is building a new field around applied research.
I also talked about starting with expert systems in my last post. Expert systems capture some expert knowledge in programming logic and generate data from the software's usage and production operations. At this phase, expert knowledge is aggregated into datasets and validated based on outcome quality.
Some expert knowledge is refuted during validation. Analytics then improve the system logic based on that newly discovered knowledge.
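To make the validation step concrete, here is a minimal sketch of checking expert rules against logged outcomes. Everything in it is an assumption for illustration: the rule names, the record fields, and the 0.5 success threshold are hypothetical and do not come from any specific product or pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """One piece of expert knowledge captured as programming logic."""
    name: str
    applies: Callable[[dict], bool]


# Hypothetical expert rules; names and thresholds are illustrative only.
RULES = [
    Rule("high_value_customer", lambda r: r["annual_spend"] > 10_000),
    Rule("recent_complaint", lambda r: r["complaints_90d"] > 0),
]


def log_usage(records):
    """Record which rules fired for each record, alongside the observed outcome."""
    return [
        {
            "fired": [rule.name for rule in RULES if rule.applies(rec)],
            "outcome": rec["outcome"],
        }
        for rec in records
    ]


def validate_rules(log, min_success_rate=0.5):
    """Score each rule by outcome quality; rules below the threshold count as refuted."""
    counts = defaultdict(lambda: [0, 0])  # rule name -> [successes, times fired]
    for entry in log:
        for name in entry["fired"]:
            counts[name][0] += int(entry["outcome"])
            counts[name][1] += 1
    return {
        name: {
            "success_rate": successes / fired,
            "supported": successes / fired >= min_success_rate,
        }
        for name, (successes, fired) in counts.items()
    }


# Made-up usage data standing in for what the software generates in production.
records = [
    {"annual_spend": 12_000, "complaints_90d": 0, "outcome": True},
    {"annual_spend": 15_000, "complaints_90d": 1, "outcome": True},
    {"annual_spend": 500, "complaints_90d": 2, "outcome": False},
    {"annual_spend": 20_000, "complaints_90d": 0, "outcome": True},
    {"annual_spend": 800, "complaints_90d": 1, "outcome": False},
]

report = validate_rules(log_usage(records))
for name, result in report.items():
    print(name, result)
```

In this toy data, one rule holds up against outcomes and the other does not. The refuted rule is the kind of finding that would feed back into the system logic.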
At the same time, data is gathered about functionality that is too complex to be supported by traditional programming logic. Simple models can be trained to handle these functional areas and integrated into the expert system.
The features gathered by the expert system are continually expanded through experimentation to support more comprehensive continuous improvement cycles. This phase grows datasets beyond expert knowledge, and ontologies are required to represent this new knowledge.
The path from an early-stage project to the point where a business has both input data and domain knowledge goes through those phases.
In the last post, I detailed the most significant impediment to moving businesses past the analytics or descriptive modeling phase. Those models seem to work, and MLOps does an excellent job covering for them when they fail. I talked about the peril that creates for businesses that remain in the legacy data science paradigm instead of maturing to adopt applied research.
Analytical models have significant business value, but they cannot be overextended to solve hard problems or simple problems at scale. Most businesses do not see that cliff until they have hit the ground and are staring up at it.