The 3 Data Science Lifecycles and Workflows
The Data, Research, and Model Development Lifecycles. This is a living document that I will update regularly as the field moves forward.
This is a complete workflow description of the 3 main Data Science Lifecycles. Every business will have a customized implementation based on its needs, capabilities, and maturity. The business’s Data Science Value Stream will dictate which parts of the workflow need to be implemented.
Job descriptions should be created based on these workflows. At the highest level, the Data Lifecycle is managed by Data Engineers and Data Librarians. The Research Lifecycle is managed by Researchers and Applied ML Researchers. The Model Development Lifecycle is managed by ML Engineers and MLOps Engineers.
Individual elements of these workflows eventually build out their own roles. In early stages, workflows can be managed completely by the basic roles. Quality and Testing, Product and Requirements, Research Oversight and Review, Platform Architecture, and other roles must be added when workflow steps require specialized capabilities or when the level of effort becomes unsustainable for a cross-functional role.
Typically, the business implements the Data Lifecycle, then the Model Development Lifecycle, and finally the Research Lifecycle as Machine Learning Maturity increases.
Each lifecycle's workflow has basic and advanced elements. The advanced elements are filled in as business needs advance.
Each step adds complexity and cost, and both must be justified by returns. Bottom line: don’t implement a complex workflow unless it provides obvious value.