Machine Learning Data Ops: An Emerging Need To Automate Data Activities To Support Machine Learning Researchers
Beyond Data Engineering and Data Ops is a new business function required to support Applied ML Research. MLDOps (please come up with a better name) supports the Data Librarian and Applied Researcher. The purpose is to provide new frameworks that support best in class data management.
Data has a lifecycle that begins even before it is gathered. The platform required to manage data across the full lifecycle needs a specialist role to architect, build, buy, and support.
I am building out my first enterprise ML research platform for a client. I have built these in pieces over the last 4 years, but this is the first time I have been able to architect the complete platform. I am learning a lot about the gaps in both roles and platforms.
The MLDOps role manages these high level platform components. There are overlaps between traditional Data Engineering and Data Ops, but you’ll also see the new components.
Keep in mind this is a forward looking platform and business case. I don’t expect enterprise research platforms to spring up next year. I think in the next 2-3 years, the industry will begin to talk about the need, and you’ll see companies provide solutions. I see Amazon, Google, and Microsoft working to incorporate an increasing number of research supporting components into their cloud ML platforms.