My Data Science Interview Process
You can use this for more roles than Data Science, but I will be focused on our field for this post. This process is specific to individual contributors. My leadership interview process is entirely different. If there's enough interest, I will write up a post to cover interviewing leaders.
It isn't easy to find quality research on the hiring process. Few studies track long-term employee outcomes and have access to data across multiple companies. There's no evidence to support the effectiveness of any specific hiring process.
There is a lot of evidence against standard hiring practices. Over the last decade, several disciplines have examined the relationship between hiring methods and outcomes. Evidence suggests our methods assess talent with about the same accuracy as guessing.
After studying the data for a few years, I walked away from standard practices and built a process that works for me. That's what I'm going to cover, with my own experience as the only evidence of its effectiveness. I want that to be clear going in.
There is evidence to support some of the tactics, but that should not be confused with evidence to support the whole process.
Hiring Starts with the Job Description
In my Business Strategy For Data Scientists course, I spend a full video explaining how to build job descriptions strategically. We must start with business needs instead of a generic definition of each role.
I begin with the Data Science value stream, which explains how the team creates value for the business and allows me to rank the highest value-generating activities. These become my core capabilities.
Research artifacts (novel models and datasets) have the highest business value. The resulting highest-value core capabilities are experimental design, experimental execution, and data curation.
Model development and data engineering are on the next tier of value-generating activities associated with research artifacts. I continue until I have defined the capabilities required to execute the workflow. Then I specify the technology stack and required domain knowledge.
I connect workflows to projects as a sanity check. I make sure I have all the capabilities listed to complete those projects and haven't included unnecessary capabilities.
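The sanity check above amounts to comparing two sets: the capabilities I have listed and the capabilities the projects need. Here is a minimal Python sketch of that comparison. The capability names come from this post, but the project names, their requirements, and the check_coverage helper are hypothetical and only there for illustration.

```python
# Capabilities listed in the job descriptions (names taken from this post).
role_capabilities = {
    "experimental design",
    "experimental execution",
    "data curation",
    "model development",
    "data engineering",
}

# Hypothetical projects and the capabilities each one needs (invented for illustration).
project_needs = {
    "project_a": {"experimental design", "experimental execution", "model development"},
    "project_b": {"data curation", "data engineering"},
}


def check_coverage(capabilities, projects):
    """Return (missing, unused) capability sets for a group of projects."""
    # Everything any project requires.
    needed = set().union(*projects.values())
    # Required by a project but not listed in any role.
    missing = needed - capabilities
    # Listed in a role but not required by any project.
    unused = capabilities - needed
    return missing, unused


missing, unused = check_coverage(role_capabilities, project_needs)
```

If missing is not empty, a job description is short a capability. If unused is not empty, a capability may not belong in the description at all.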
Next, I break down the workflow into three lifecycles: data, research, and model development. You could make a case for a fourth, MLOps. I create roles for each lifecycle and build job descriptions from there.
The Initial Screening Process
I rely heavily on my professional network to create a talent pipeline. I also build internal training programs and structured career paths to develop a pipeline within the business. Those are the lowest-effort, most reliable methods to source talent. Use external blind hiring as a last resort.
The better my job descriptions are, the more targeted the candidates applying will be. I read resumes for outcomes first. The candidates who talk about specific business impacts go into a short pile.
I read deeper on those resumes for a blog, GitHub, YouTube channel, publications, patents, and conference speaking. I skip past their resume to those sources. I get a better sense of their work from one of those sources than from a resume.
Not everyone has those, so I cannot rely entirely on them. In resumes without external links, I read for project details. I prioritize resumes with overlaps between prior work and upcoming projects. The most significant overlaps go into the short pile.
I set up a phone screen with the top 5 candidates.
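The screening order described above can be roughed out in Python. This is only a sketch under my own assumptions: I don't use a scoring formula, so the Resume fields, the screening_key ordering (business impact first, then overlap with upcoming projects), and the sample candidates are all hypothetical. Reviewing a blog or GitHub is a judgment call and is not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class Resume:
    # All fields are hypothetical labels a reviewer would fill in by hand.
    name: str
    mentions_business_impact: bool
    project_topics: set = field(default_factory=set)


def screening_key(resume, upcoming_topics):
    """Sort key: outcomes first, then overlap between prior work and upcoming projects."""
    overlap = len(resume.project_topics & upcoming_topics)
    return (resume.mentions_business_impact, overlap)


def short_list(resumes, upcoming_topics, limit=5):
    """Return up to `limit` resumes, highest screening key first."""
    ranked = sorted(
        resumes,
        key=lambda r: screening_key(r, upcoming_topics),
        reverse=True,
    )
    return ranked[:limit]


# Hypothetical candidates and upcoming project topics, for illustration only.
upcoming = {"forecasting", "pricing"}
candidates = [
    Resume("A", False, {"forecasting"}),
    Resume("B", True, set()),
    Resume("C", True, {"forecasting", "pricing"}),
]
phone_screens = short_list(candidates, upcoming, limit=2)
```

The tuple key keeps the two criteria separate instead of blending them into one weighted score, which is closer to how I actually read: outcomes first, overlaps second.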