The Value Of Experience With Maintaining And Improving Models For Junior Data Scientists
You'll learn a lot about machine learning by building models, but you'll learn more by maintaining them for 6+ months. This should be part of every curriculum, but it's usually ignored, so here's a thread to help you fill in the gap yourself.
1. Build something you'll frequently use yourself. Fitness and productivity are great use cases to think about here. Find a data science problem that you have and implement a solution. Commit to using it for over 6 months.
These projects are excellent because:
You're the data source
You understand the problem like a user does
Your usage drives continuous improvement and new features
You'll learn from 3 perspectives:
Data Scientist, as you build
MLOps, as you deploy and support
User, as the target audience
You'll realize that value is in the eye of the user, not either of the other roles.
2. Build something your friends, colleagues, or family will use. Listen for problems they complain about and find one that is a data science problem, meaning it can't be solved using traditional software.
Monitor usage. You'll quickly see that users tell you the product's terrific but often abandon it in silence. That will force you to reengage users and see how hard it is to get them back once they've left.
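One way to catch silent abandonment is to log each use and flag anyone who has gone quiet. This is only a minimal sketch: the event format of (user, date) pairs, the function name find_lapsed_users, and the 14-day threshold are all assumptions made for illustration, not part of any particular tool.

```python
from datetime import date, timedelta


def find_lapsed_users(events, today, max_idle_days=14):
    """Return users whose latest event is more than max_idle_days before today.

    events: iterable of (user, day) pairs, where day is a datetime.date.
    The 14-day default is an arbitrary, illustrative threshold.
    """
    last_seen = {}
    for user, day in events:
        # Keep only the most recent date seen for each user.
        if user not in last_seen or day > last_seen[user]:
            last_seen[user] = day
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(user for user, day in last_seen.items() if day < cutoff)


# Hypothetical usage log for three users.
events = [
    ("ana", date(2024, 1, 2)),
    ("ana", date(2024, 3, 1)),
    ("ben", date(2024, 1, 5)),
    ("cy", date(2024, 2, 25)),
]
lapsed = find_lapsed_users(events, today=date(2024, 3, 5))
print(lapsed)  # users to reach out to
```

Running a check like this weekly gives you a list of people to reengage before they've been gone too long.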
Getting it right the first time is more important to long-term success than most data scientists realize. Yes, the model will improve over time, but few users are willing to deal with imperfect products for very long.
This group of users will be far more forgiving than most, so it'll be easier to get them back the second time. You'll learn to start with users first, then build something with them at the center instead of the model or technology at the center.
3. Build something you see a community or business need for. These needs are difficult to find, but it's possible if you're working or have a good professional network.
The first thing you'll learn is how to listen and solicit user needs.
You're thinking, 'That's a product management thing.' Traditionally, yes, but knowing how to do this will make you stand out from the crowd. Identifying opportunities to apply data science and create value for users will get you promoted quickly.
Either open source the project or, if you're feeling ambitious, release it for sale. Open source projects will force you to continuously improve model functionality and reliability.
Releasing it for sale introduces you to marketing and selling. That's important if you're thinking about opening a business of your own down the road. It's not worth the effort unless you want to do a trial run while learning in the process.
In all 3 cases, you'll learn about reliability. It starts in the data curation phase. Most projects don't need massive datasets. Small, targeted, high-quality datasets are more efficient.
Having users makes you see that train, test, validate, deploy isn't a viable workflow on its own. Models built that way need constant maintenance, and you don't have time to tweak them by hand every time something comes up.
You'll move towards experimentation and more rigorous model validation methods (beyond statistical accuracy). Users will help you realize why we use data science best practices: they reduce the level of effort model maintenance requires.
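One simple example of validating beyond a single accuracy number is to break results out by user-facing slice, so a model that looks fine overall can't hide a segment where it fails. The sketch below is an illustration only: the slice names, the record format of (slice, true label, predicted label), and the function accuracy_by_slice are assumptions, and the data is made up.

```python
from collections import defaultdict


def accuracy_by_slice(records):
    """Return a dict mapping each slice name to its accuracy.

    records: iterable of (slice_name, y_true, y_pred) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for slice_name, y_true, y_pred in records:
        total[slice_name] += 1
        correct[slice_name] += int(y_true == y_pred)
    return {name: correct[name] / total[name] for name in total}


# Made-up predictions, tagged by when the user made the request.
records = [
    ("weekday", 1, 1), ("weekday", 0, 0), ("weekday", 1, 1), ("weekday", 0, 0),
    ("weekend", 1, 0), ("weekend", 0, 1), ("weekend", 1, 1), ("weekend", 0, 0),
]
overall = sum(int(t == p) for _, t, p in records) / len(records)
per_slice = accuracy_by_slice(records)
print(overall, per_slice)
```

Here the overall number looks acceptable while the weekend slice is no better than a coin flip, which is exactly the kind of gap a weekend user would notice first.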
Maintenance shows you the weaknesses of each approach and model architecture. Most curricula only teach what models do well, and that's a massive gap.
You'll be user-centric and more disciplined. It's worth the effort to maintain models.
I’m curious - how do your engagements go with clients when it comes to deploying models? Do you stay “on the hook” for a period of time after models are deployed as part of a project?