Image by Author
If you know how to create a machine learning decision tree, congratulations, you have the same level of code expertise as ChatGPT and the thousands of other data scientists competing for the job you want.
One fascinating trend among hiring managers lately is that raw coding ability just doesn’t cut it anymore. To get hired, you need to go a step above knowing languages, frameworks, and how to search on StackOverflow. You need far more conceptual understanding, and a grasp of today’s data science landscape – including things you think only the CEO of a company should be worried about, like data governance and ethics.
There are many technical and non-technical data science skills that you should know, but if you’re having a hard time getting hired, these less common data science skills might be the ticket to getting your foot in the employment door.
Previously, data scientists worked in isolation, in dark underground basements producing models. The models would create predictions or insights; those would be passed on to C-suite execs who would act on them with no understanding of the model that had produced these predictions. (I’m exaggerating a little, but not by that much.)
Today, leadership takes a much more active role in understanding the products of data scientists. That means that you, as a data scientist, need to be able to explain why models do what they do, how they work, and why they came up with that particular prediction.
While you could show your boss the actual code running your model, it’s much more useful (read: employable) to be able to show them how your model works through visualization. For example, imagine you’ve developed an ML model that predicts customer churn for a telecom company. Instead of a screenshot of your lines of code, you could use a flowchart or decision tree diagram to visually explain how the model segments customers and identifies those at risk of churning. This makes the model’s logic transparent and easier to grasp.
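To make that concrete, here is a minimal sketch of turning a churn decision tree into plain-English rules you could paste into a slide or redraw as a flowchart. The tree itself, its questions, and its thresholds are hypothetical and hand-written for illustration; in a real project they would come from your fitted model.

```python
# A hypothetical, hand-written churn tree (questions and thresholds are made up).
churn_tree = {
    "question": "Contract is month-to-month?",
    "yes": {
        "question": "More than 3 support calls in the last 90 days?",
        "yes": {"outcome": "High churn risk"},
        "no": {"outcome": "Medium churn risk"},
    },
    "no": {"outcome": "Low churn risk"},
}


def describe(node, depth=0):
    """Return the whole tree as indented plain-English lines."""
    indent = "  " * depth
    if "outcome" in node:
        return [f"{indent}=> {node['outcome']}"]
    lines = [f"{indent}{node['question']}"]
    for answer in ("yes", "no"):
        lines.append(f"{indent}- {answer.capitalize()}:")
        lines.extend(describe(node[answer], depth + 1))
    return lines


def explain(node, answers):
    """Follow one customer's yes/no answers and return the path taken."""
    path = []
    while "outcome" not in node:
        answer = answers[node["question"]]
        path.append(f"{node['question']} {answer}")
        node = node[answer]
    path.append(node["outcome"])
    return path


# Print the tree in a form a non-technical reader can follow.
print("\n".join(describe(churn_tree)))
```

The `explain` helper is the part I would lean on in a meeting: it answers "why was this particular customer flagged?" one question at a time. If your tree was trained with scikit-learn, its `export_text` and `plot_tree` functions can produce a similar view directly from the fitted model.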
Knowing how to illustrate code is a rare skill, but certainly one worth developing. There aren’t any courses yet, but I recommend you try a free tool like Miro to create a flowchart documenting your decision tree. Better yet, try to explain your code to a non-data scientist friend or family member. The more lay, the better.
Image by Author
Many data scientists tend to focus more on model algorithms than on the nuances of the input data. Feature engineering is the process of selecting, modifying, and creating features (input variables) to improve the performance of machine learning models.
For example, if you’re working on a predictive model for real estate prices, you might start with basic features like square footage, number of bedrooms, and location. However, through feature engineering, you could create more nuanced features. You might calculate the distance to the nearest public transport station or create a feature that represents the age of the property. You could even combine existing features to create new ones, such as a “location desirability score” based on crime rates, school ratings, and proximity to amenities.
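Here is a rough sketch of what those three engineered features could look like in code. The field names, the station coordinates, and especially the weights in the desirability score are my own assumptions for illustration; a real score would need to be designed and validated with domain experts.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def desirability_score(crime_index, school_rating, amenity_score):
    """Combine three inputs into one 0-1 score.

    Assumptions: crime_index and amenity_score are already scaled to 0-1,
    school_rating is on a 0-10 scale, and the weights are arbitrary.
    """
    return 0.4 * (1 - crime_index) + 0.4 * (school_rating / 10) + 0.2 * amenity_score


def engineer_features(listing, stations, current_year):
    """Return the raw listing plus three engineered features."""
    nearest_km = min(
        haversine_km(listing["lat"], listing["lon"], s_lat, s_lon)
        for s_lat, s_lon in stations
    )
    return {
        **listing,
        "property_age": current_year - listing["year_built"],
        "km_to_nearest_station": nearest_km,
        "desirability": desirability_score(
            listing["crime_index"], listing["school_rating"], listing["amenity_score"]
        ),
    }


# Made-up listing and station coordinates, purely for demonstration.
listing = {
    "sqft": 1400, "bedrooms": 3, "lat": 0.0, "lon": 0.0, "year_built": 1994,
    "crime_index": 0.2, "school_rating": 8, "amenity_score": 0.5,
}
stations = [(0.0, 1.0), (0.0, 0.5)]
features = engineer_features(listing, stations, current_year=2024)
print(features)
```

Notice that none of the new columns require more data collection. They come from looking at what you already have and asking what a buyer would actually care about.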
It’s a rare skill because it requires not just technical know-how, but also deep domain knowledge and creativity. You need to really get your data and the problem at hand, and then creatively transform the data to make it more useful for modeling.
Feature engineering is often covered as part of broader machine learning courses on platforms like Coursera, edX, or Udacity. But I find the best way to learn is through hands-on experience. Work on real-world data and experiment with different feature engineering strategies.
Here is a hypothetical question: imagine you’re a data scientist at a healthcare company. You’ve been tasked with developing a predictive model to identify patients at risk of a certain disease. What is likely to be your biggest challenge?
If you answered, “grappling with ETL pipelines,” you’re wrong. Your biggest challenge is likely to be making sure your model is not only effective but also compliant, ethical, and sustainable. That includes ensuring that any data you collect for the model complies with regulations like HIPAA and GDPR, depending on your location. You need to know when it’s even legal to use that data, how you need to anonymize it, what consent you require from patients, and how to get that consent.
And you need to be able to document data sources, transformations, and model decisions so that a non-expert would be able to audit the model. This traceability is vital not just for regulatory compliance but also for future model audits and improvements.
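As a small illustration of what that traceability can look like, here is a sketch that filters out records without consent, swaps patient identifiers for keyed hashes, and writes down each step in an audit log. Everything here is hypothetical (the field names, the records, the key handling), and a keyed hash is pseudonymization, not full anonymization. On its own it does not make a dataset HIPAA or GDPR compliant.

```python
import hashlib
import hmac
import json


def pseudonymize(patient_id, secret_key):
    """Replace an identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def lineage_entry(step, source, description):
    """One human-readable record of what was done to the data."""
    return {"step": step, "source": source, "description": description}


# Assumption: a real key would be loaded from a secrets manager, never hard-coded.
SECRET_KEY = b"replace-with-a-managed-secret"

# Made-up records for demonstration only.
records = [
    {"patient_id": "P-001", "age": 54, "has_consent": True},
    {"patient_id": "P-002", "age": 61, "has_consent": False},
]

audit_log = []

# Step 1: keep only patients who gave consent, and log how many were dropped.
consented = [r for r in records if r["has_consent"]]
audit_log.append(lineage_entry(
    "filter_consent", "raw_records",
    f"kept {len(consented)} of {len(records)} records with documented consent",
))

# Step 2: drop the direct identifier and keep a keyed hash in its place.
prepared = [
    {"patient_key": pseudonymize(r["patient_id"], SECRET_KEY), "age": r["age"]}
    for r in consented
]
audit_log.append(lineage_entry(
    "pseudonymize_ids", "filter_consent",
    "replaced patient_id with HMAC-SHA256 keyed hash",
))

print(json.dumps(audit_log, indent=2))
```

The log is deliberately boring. That is the point: someone who has never seen your code should be able to read it top to bottom and know where the training data came from.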
Where to learn data governance: It’s dense, but one great resource is the Global Data Management Community.
Image from dataedo
“I know data science basically can know statistics, create models, find trends, but if you asked me, I couldn’t think of any real ethical dilemmas, I think data science just spills out the real facts,” said Reddit user Carlos_tec17, wrongly.
Beyond legal compliance, there’s an ethical aspect to consider. You need to ensure that any model you create doesn’t inadvertently introduce biases that could lead to unequal treatment of certain groups.
I love the example of Amazon’s old recruitment model to illustrate why ethics matter. If you’re not familiar with it, Amazon data scientists tried to speed up their hiring workflow by creating a model that could pick out potential hires based on resumes. The problem was that they trained the model on their existing base of resumes, which was very male-dominated. Their new model was biased towards male hires. That is extremely unethical.
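One simple check that can surface this kind of problem is comparing how often a model selects candidates from each group. The sketch below uses made-up decisions (this is not Amazon's data or method) and computes the ratio of the lowest selection rate to the highest. A common rule of thumb from US employment guidelines, the "four-fifths rule," treats a ratio below 0.8 as a warning sign worth investigating.

```python
from collections import defaultdict


def selection_rates(decisions):
    """decisions: iterable of (group, was_selected) pairs -> {group: share selected}."""
    totals = defaultdict(int)
    selected = defaultdict(int)
    for group, was_selected in decisions:
        totals[group] += 1
        selected[group] += int(was_selected)
    return {group: selected[group] / totals[group] for group in totals}


def disparate_impact_ratio(rates):
    """Lowest selection rate divided by the highest (1.0 means parity)."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was selected in any group, so rates are equal
    return min(rates.values()) / highest


# Made-up model outputs: 10 applicants per group.
decisions = (
    [("A", True)] * 6 + [("A", False)] * 4
    + [("B", True)] * 3 + [("B", False)] * 7
)

rates = selection_rates(decisions)
ratio = disparate_impact_ratio(rates)
flagged = ratio < 0.8  # four-fifths rule of thumb

print(rates, round(ratio, 2), "needs review" if flagged else "ok")
```

A passing ratio does not prove a model is fair, and a failing one does not prove it is biased. It is a cheap first look that tells you where to dig.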
We are so far past the “move fast and break things” stage of data science. Now, as a data scientist, you need to know that your decisions will have a real impact on people. Ignorance is no longer an excuse; you need to be fully aware of all the possible ramifications your model could have, and why it makes the decisions it makes.
UMichigan has a helpful course on “data science ethics.” I also liked this book to illustrate why and how ethics crop up in even “number-based” science like data science.
One secret life hack is that the better you know how to market, the easier you’ll find it to get a job. And by “market,” I mean “know how to make things sexy.” With the ability to market, you’ll be better at making a resume that sells your skills. You’ll be better at charming an interviewer. And in data science specifically, you’ll be better at explaining why your model – and the results of your model – matter.
Remember, it doesn’t matter how good your model is if you can’t convince anyone else it’s necessary. For example, imagine you’ve developed a model that can predict equipment failures in a manufacturing plant. In theory, your model could save the company millions in unplanned downtime. But if you can’t communicate that fact to the C-suite, your model will languish unused on your computer.
With marketing skills, you can demonstrate your own value and the need for your model with a compelling presentation that highlights the financial benefits, the potential for increased productivity, and the long-term advantages of adopting your model.
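The financial part of that pitch is usually simple arithmetic, and it helps to have it written down. Here is a back-of-the-envelope sketch; every number in it is invented for illustration, and the share of failures the model would actually prevent is the assumption you will need to defend.

```python
def expected_annual_savings(failures_per_year, hours_down_per_failure,
                            cost_per_hour, share_prevented):
    """Estimate yearly savings from preventing a share of unplanned downtime."""
    baseline_cost = failures_per_year * hours_down_per_failure * cost_per_hour
    return baseline_cost * share_prevented


# Hypothetical plant: all four inputs are made up.
savings = expected_annual_savings(
    failures_per_year=12,
    hours_down_per_failure=8,
    cost_per_hour=25_000,
    share_prevented=0.4,
)
print(f"Estimated annual savings: ${savings:,.0f}")
```

Leading with one number like this, and being upfront about the assumptions behind it, tends to land better with executives than an accuracy metric.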
This is a very rare skill in the data science world because most data scientists are numbers people at heart. Most would-be data scientists really believe that simply doing your best and keeping your head down is a winning employment strategy. Unfortunately, computers are not the ones hiring you – people are. Being able to market yourself, your skills, and your products is a real advantage in today’s job market.
To learn how to market, I recommend a few beginner, free courses like “Marketing in a Digital World,” offered by Coursera. I especially liked the section on “Offering product ideas that stick in a digital world.” There aren’t any data science-specific marketing courses out there, but I liked this blog post that walks through how to market yourself as a data scientist.
It’s tough out there. Although the Bureau of Labor Statistics projects growth in data scientist employment, many entry-level data science aspirants are finding it hard to land a job, as these Reddit posts illustrate. There’s competition from ChatGPT, and the layoff vultures are circling.
To compete and stand out in the job market, you have to go beyond just technical chops. Data governance, ethics, model viz, feature engineering, and marketing skills make you a more thoughtful, robust, and intriguing candidate for hiring managers.
Nate Rosidi is a data scientist who also works in product strategy. He’s an adjunct professor teaching analytics and the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Connect with him on Twitter: StrataScratch or LinkedIn.