If you’re preparing for data science interviews, you know how overwhelming it can be to go through all the available resources online. One can easily get lost in the details. That’s why I’m excited to introduce you to a hidden gem of a resource: “The Data Science Interview Book” by Dip Ranjan Chatterjee.
This freely available web-based book covers all the essential topics you need to know for data science interviews, from statistics and model building to algorithms, neural networks, and business intelligence. But what makes it different from other resources is its focus on providing only the relevant information to get you ready for the interview. This makes it the perfect resource for busy data scientists who need to brush up on a wide range of concepts quickly. Here are a few things that I believe make this book unique:
- Real-world interview questions: This book includes real-world interview questions from companies like Google, DoorDash, and Airbnb, along with detailed solutions and case studies.
- Updated content: The book is continually updated with new sections, questions, and richer content.
- Cheatsheets and references: The book includes cheatsheets that serve as quick reference guides for various topics, as well as additional references for those who want to study a topic in more depth.
Don’t panic if you encounter a section followed by a ⚠️ symbol. This simply indicates that those sections are still being worked on and are subject to change. Here are the major sections covered in this book:
1. Statistics
This section covers the fundamentals of statistics, which are essential for data analysis and model building. Topics include probability basics, probability distributions, central limit theorem, Bayesian vs. frequentist reasoning, hypothesis testing, and A/B testing.
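To give a flavor of the hypothesis testing and A/B testing topics, here is a minimal sketch of a two-sided two-proportion z-test using only the Python standard library. It is not taken from the book; the function name `two_proportion_z_test` and the conversion counts are made up for illustration, and the normal approximation it relies on assumes reasonably large samples.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 200 of 2000 users convert on A, 250 of 2000 on B.
z, p_value = two_proportion_z_test(200, 2000, 250, 2000)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

In an interview, the follow-up is usually about interpretation: a small p-value suggests the difference is unlikely under the null hypothesis, but it says nothing on its own about whether the lift is large enough to matter for the business.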
2. Model Building
This section guides you through the process of building a successful model, from data gathering to model selection. It also covers the data preprocessing techniques essential for any data scientist, including feature scaling, handling outliers, dealing with missing values, and encoding categorical variables. A subsection on hyperparameter optimization introduces some popular open-source tools for it.
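Three of the preprocessing steps mentioned above (handling missing values, feature scaling, and encoding categorical variables) can be sketched in plain Python as follows. This is an illustrative sketch rather than the book's code: the helper names and the toy data are assumptions, and in practice you would more likely reach for a library such as pandas or scikit-learn.

```python
from statistics import mean


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(labels):
    """Encode labels as 0/1 indicator rows (columns sorted by category name)."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


# Toy data, made up for illustration.
ages = [20, None, 40, 30]
cities = ["Paris", "Lima", "Paris", "Oslo"]

ages_filled = impute_mean(ages)           # None becomes the mean of 20, 40, 30
ages_scaled = min_max_scale(ages_filled)  # smallest value maps to 0, largest to 1
cities_encoded = one_hot(cities)          # columns: Lima, Oslo, Paris
```

Mean imputation and min-max scaling are only one choice each; median imputation or standardization may be preferable when the column contains outliers.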
3. Algorithms
Algorithms are fundamental to data science, and understanding them is crucial for acing a data science interview. This section covers various machine learning algorithms and offers practical advice on how to choose the right algorithm for your use case. It starts with the basics of the bias-variance tradeoff and generative vs. discriminative models. Then, it proceeds to regression, classification, clustering, decision trees, random forests, ensemble learning, and boosting. The section also discusses time series analysis and anomaly detection. Finally, it concludes with a comprehensive table on Big O analysis, which covers the time and space complexities of different machine learning algorithms.
4. Python
Python is a versatile language used in data science for various tasks. This section has the following sub-sections:
- Theoretical: It covers some fundamental concepts in Python such as mesh grid, statistical methods, range vs xrange, switch case, and lambda functions.
- Basics: It covers common programming techniques that you must be familiar with to solve Python questions during an interview, such as working with lists, tuples, and dictionaries, and controlling program flow with loops and conditionals.
- Coding Algorithms from Scratch: Often, companies ask candidates to code algorithms from scratch during a coding demo round. The general steps for coding an algorithm from scratch are discussed here.
- Questions: It covers some sample questions related to statistics, data manipulation, and NLP.
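As a small illustration of two of the theoretical topics listed above, lambda functions and the "switch case" pattern, here is a hedged sketch. Python had no switch statement before the `match` statement arrived in Python 3.10, so a dictionary of handlers is a common substitute. The `operations` table and the `calculate` function are hypothetical names chosen for this example, not something quoted from the book.

```python
# Dictionary dispatch: each key maps to a small lambda, standing in for
# the branches of a switch statement.
operations = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


def calculate(op, a, b):
    """Look up the handler for `op`; raise ValueError for unknown operations."""
    try:
        handler = operations[op]
    except KeyError:
        raise ValueError(f"unsupported operation: {op!r}") from None
    return handler(a, b)


# Lambdas are also commonly passed as sort keys.
pairs = [("b", 3), ("a", 1), ("c", 2)]
by_count = sorted(pairs, key=lambda pair: pair[1])

print(calculate("mul", 6, 7))
print(by_count)
```

The dictionary approach keeps the branches in data rather than in a chain of `if`/`elif` statements, which makes it easy to add a new operation without touching `calculate`.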
5. SQL
In data science interviews, SQL queries are often used to evaluate a candidate’s ability to work with data and solve complex problems. This section covers the basics of SQL, including joins, temp tables vs table variables vs CTE, window functions, time functions, stored procedures, indexing, and performance tuning. The Temp Table vs Table Variable vs CTE section explains the differences between these three temporary data structures and when to use each one. You will also learn how to create and use stored procedures. The Performance Tuning section covers various tips to optimize your SQL queries. Overall, it will provide you with a solid foundation in SQL.
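A typical interview-style query combines two of the topics above: a CTE and a window function. The sketch below runs such a query through Python's built-in `sqlite3` module so that it stays self-contained. The `orders` table and its rows are invented for illustration, and the example assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added. Temp tables and table variables are vendor-specific features and are not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("alice", "2023-01-05", 50.0),
        ("alice", "2023-02-10", 70.0),
        ("bob", "2023-01-20", 20.0),
        ("bob", "2023-03-02", 90.0),
        ("bob", "2023-03-15", 40.0),
    ],
)

# Find each customer's most recent order.
query = """
WITH ranked AS (              -- CTE: a named result set scoped to this query
    SELECT
        customer,
        order_date,
        amount,
        ROW_NUMBER() OVER (   -- window function: number orders per customer
            PARTITION BY customer
            ORDER BY order_date DESC
        ) AS rn
    FROM orders
)
SELECT customer, order_date, amount
FROM ranked
WHERE rn = 1
ORDER BY customer
"""

latest_orders = conn.execute(query).fetchall()
conn.close()
```

The CTE is needed because a window function's result cannot be filtered in the `WHERE` clause of the same `SELECT`; wrapping it in `ranked` makes `rn` available to the outer query.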
6. Analytical Thinking
While the book includes several ongoing sections like Excel, Neural Networks, NLP, Machine Learning Frameworks, Business Intelligence, etc., I’d like to highlight this one specifically. I think it is unique because it covers business scenarios and behavioral management-related questions, which are becoming increasingly important in data science interviews. Companies are not just looking for technical expertise, but also for candidates who can think strategically and communicate effectively.
For example, here is a question that Salesforce asked in one of their interviews:
“As a data scientist at Salesforce, you are speaking with a Product Manager who wants to understand the user base of Salesforce. What would be your approach?”
By going over these scenario-based questions, you will be well-prepared for your interviews.
7. Cheatsheets
Instead of spending hours searching for cheatsheets online, you can find quick and comprehensive guides for topics such as NumPy, Pandas, SQL, statistics, RegEx, Git, Power BI, Python basics, Keras, and R basics all in one place. These guides are perfect for a quick refresher before an interview or for reference during a coding challenge.
I completely understand the importance of having a reliable and comprehensive resource to prepare for interviews, and I believe that this book fits the bill. I am sure it will help you succeed. I wish you all the best for your data science preparation journey! In case of any questions, please feel free to reach out to me.
Kanwal Mehreen is an aspiring software developer with a keen interest in data science and applications of AI in medicine. Kanwal was selected as the Google Generation Scholar 2022 for the APAC region. Kanwal loves to share technical knowledge by writing articles on trending topics, and is passionate about improving the representation of women in the tech industry.