160+ Data Science Interview Questions by @alexeygrigorev

March 1st 2020

A typical interview process for a data science position includes multiple rounds. Often, one of these rounds covers theoretical concepts, where the goal is to determine whether the candidate knows the fundamentals of machine learning.

In this post, I summarize all my interviewing experience (from both interviewing and being interviewed) in a list of 160+ theoretical data science questions.

This includes the following topics:

- Linear regression
- Validation
- Classification and logistic regression
- Regularization
- Decision trees
- Random forest
- Gradient boosting trees
- Neural networks
- Text classification
- Clustering
- Ranking: search and recommendation
- Time series

The number of questions in this post might seem overwhelming, and it indeed is. Keep in mind that the interview flow is based on what the company needs and what you have worked with, so if you haven't worked with time series or computer vision models, you shouldn't get questions about them.

Important: don't feel discouraged if you don't know the answers to some of the interview questions. This is absolutely fine.

Finally, to make it simpler, I grouped the questions into three categories, based on difficulty:

- 👶 easy
- ⭐️ medium
- 🚀 expert

That's, of course, subjective, and it's based only on my personal opinion.

Let's start!

Supervised machine learning

- What is supervised machine learning? 👶

Linear regression

- What is regression? Which models can you use to solve a regression problem? 👶
- What is linear regression? When do we use it? 👶
- What's the normal distribution? Why do we care about it? 👶
- How do we check if a variable follows the normal distribution? ⭐️
- What if we want to build a model for predicting prices? Are prices distributed normally? Do we need to do any pre-processing for prices? ⭐️
- Which methods for solving linear regression do you know? ⭐️
- What is gradient descent? How does it work? ⭐️
- What is the normal equation? ⭐️
- What is SGD (stochastic gradient descent)? How does it differ from the usual gradient descent? ⭐️
- Which metrics for evaluating regression models do you know? 👶
- What are MSE and RMSE? 👶

Validation

- What is overfitting? 👶
- How do you validate your models? 👶
- Why do we need to split our data into three parts: train, validation, and test? 👶
- Can you explain how cross-validation works? 👶
- What is K-fold cross-validation? 👶
- How do we choose K in K-fold cross-validation? What's your favorite K? 👶
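
To make the K-fold question a bit more concrete, here is a minimal sketch of how the folds can be formed. The helper name `k_fold_indices` is hypothetical (it is not from any library), and it assumes contiguous folds without shuffling; in practice you would likely shuffle first or use a library implementation.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for K-fold cross-validation.

    Hypothetical helper for illustration: contiguous folds, no shuffling.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]                   # held-out fold
        train_idx = indices[:start] + indices[start + size:]    # everything else
        yield train_idx, val_idx
        start += size


# Each sample lands in the validation fold exactly once.
for train_idx, val_idx in k_fold_indices(n_samples=10, k=3):
    print("train:", train_idx, "validation:", val_idx)
```

The model is trained K times, each time on the train indices and scored on the validation indices, and the K scores are then averaged.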

Classification

- What is classification? Which models would you use to solve a classification problem? 👶
- What is logistic regression? When do we need to use it? 👶
- Is logistic regression a linear model? Why? 👶
- What is sigmoid? What does it do? 👶
- How do we evaluate classification models? 👶
- What is accuracy? 👶
- Is accuracy always a good metric? 👶
- What is the confusion table? What are the cells in this table? 👶
- What are precision, recall, and F1-score? 👶
- What is the precision-recall trade-off? ⭐️
- What is the ROC curve? When to use it? ⭐️
- What is AUC (AU ROC)? When to use it? ⭐️
- How to interpret the AU ROC score? ⭐️
- What is the PR (precision-recall) curve? ⭐️
- What is the area under the PR curve? Is it a useful metric? ⭐️
- In which cases is AU PR better than AU ROC? ⭐️
- What do we do with categorical variables? ⭐️
- Why do we need one-hot encoding? ⭐️
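
For the confusion table and precision/recall/F1 questions, a small sketch may help when preparing an answer. The function names below are hypothetical, labels are assumed to be binary (1 = positive, 0 = negative), and returning 0.0 when a metric is undefined is just one possible convention.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 from the confusion table cells."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    # Assumption: undefined ratios (zero denominator) are reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(confusion_counts(y_true, y_pred))     # (tp, fp, fn, tn)
print(precision_recall_f1(y_true, y_pred))  # (precision, recall, f1)
```

Here accuracy is 6/8 = 0.75, while precision and recall are both 2/3, which hints at why accuracy alone can be misleading when the classes are imbalanced.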

Regularization

- What happens to our linear regression model if we have three columns in our data (x, y, and z) and z is the sum of x and y? ⭐️
- What happens to our linear regression model if the column z in the data is the sum of columns x and y plus some random noise? ⭐️
- What is regularization? Why do we need it? 👶
- Which regularization techniques do you know? ⭐️
- What kind of regularization techniques are applicable to linear models? ⭐️
- What does L2 regularization look like in a linear model? ⭐️
- How do we select the right regularization parameters? 👶
- What's the effect of L2 regularization on the weights of a linear model? ⭐️
- What does L1 regularization look like in a linear model? ⭐️
- What's the difference between L2 and L1 regularization? ⭐️
- Can we have both L1 and L2 regularization components in a linear model? ⭐️
- What's the interpretation of the bias term in linear models? ⭐️
- How do we interpret weights in linear models? ⭐️
- If a weight for one variable is higher than for another, can we say that this variable is more important? ⭐️
- When do we need to perform feature normalization for linear models? When is it okay not to do it? ⭐️

Feature selection

- What is feature selection? Why do we need it? 👶
- Is feature selection important for linear models? ⭐️
- Which feature selection techniques do you know? ⭐️
- Can we use L1 regularization for feature selection? ⭐️
- Can we use L2 regularization for feature selection? ⭐️

Decision trees

- What are decision trees? 👶
- How do we train decision trees? ⭐️
- What are the main parameters of the decision tree model? 👶
- How do we handle categorical variables in decision trees? ⭐️
- What are the benefits of a single decision tree compared to more complex models? ⭐️
- How can we know which features are more important for the decision tree model? ⭐️

Random forest

- What is random forest? 👶
- Why do we need randomization in random forest? ⭐️
- What are the main parameters of the random forest model? ⭐️
- How do we select the depth of the trees in random forest? ⭐️
- How do we know how many trees we need in random forest? ⭐️
- Is it easy to parallelize training of a random forest model? How can we do it? ⭐️
- What are the potential problems with many large trees? ⭐️
- What if, instead of finding the best split, we randomly select a few splits and pick the best of them? Will it work? 🚀
- What happens when we have correlated features in our data? ⭐️

Gradient boosting

- What are gradient boosting trees? ⭐️
- What's the difference between random forest and gradient boosting? ⭐️
- Is it possible to parallelize training of a gradient boosting model? How to do it? ⭐️
- What are the possible options for feature importance in gradient boosting trees? ⭐️
- Are there any differences between continuous and discrete variables when it comes to feature importance of gradient boosting models? 🚀
- What are the main parameters in the gradient boosting model? ⭐️
- How do you approach tuning parameters in XGBoost or LightGBM? 🚀
- How do you select the number of trees in the gradient boosting model? ⭐️

Parameter tuning

- Which parameter tuning strategies (in general) do you know? ⭐️
- What's the difference between the grid search and random search parameter tuning strategies? When would you use one or the other? ⭐️

Neural networks

- What kinds of problems can neural nets solve? 👶
- How does a usual fully-connected feed-forward neural network work? ⭐️
- Why do we need activation functions? 👶
- What are the problems with sigmoid as an activation function? ⭐️
- What is ReLU? How is it better than sigmoid or tanh? ⭐️
- How can we initialize the weights of a neural network? ⭐️
- What if we set all the weights of a neural network to 0? ⭐️
- What regularization techniques for neural nets do you know? ⭐️
- What is dropout? Why is it useful? How does it work? ⭐️

Optimization in neural networks

- What is backpropagation? How does it work? Why do we need it? ⭐️
- Which optimization techniques for training neural nets do you know? ⭐️
- How do we use SGD (stochastic gradient descent) for training a neural net? ⭐️
- What's the learning rate? 👶
- What happens when the learning rate is too large? Too small? 👶
- How to set the learning rate? ⭐️
- What is Adam? What's the main difference between Adam and SGD? ⭐️
- When would you use Adam and when SGD? ⭐️
- Do we want a constant learning rate, or is it better to change it throughout training? ⭐️
- How do we decide when to stop training a neural net? 👶
- What is model checkpointing? ⭐️
- Can you tell us how you approach the model training process? ⭐️

Neural networks for computer vision

- How can we use neural nets for computer vision? ⭐️
- What's a convolutional layer? ⭐️
- Why do we actually need convolutions? Can't we use fully-connected layers for that? ⭐️
- What's pooling in CNN? Why do we need it? ⭐️
- How does max pooling work? Are there other pooling techniques? ⭐️
- Are CNNs resistant to rotations? What happens to the predictions of a CNN if an image is rotated? 🚀
- What are augmentations? Why do we need them? 👶
- What kind of augmentations do you know? 👶
- How to choose which augmentations to use? ⭐️
- What kinds of CNN architectures for classification do you know? 🚀
- What is transfer learning? How does it work? ⭐️
- What is object detection? Do you know any architectures for that? 🚀
- What is object segmentation? Do you know any architectures for that? 🚀

Text classification

- How can we use machine learning for text classification? ⭐️
- What is bag of words? How can we use it for text classification? ⭐️
- What are the advantages and disadvantages of bag of words? ⭐️
- What are N-grams? How can we use them? ⭐️
- How large should N be for our bag of words when using N-grams? ⭐️
- What is TF-IDF? How is it useful for text classification? ⭐️
- Which model would you use for text classification with bag of words features? ⭐️
- Would you prefer a gradient boosting trees model or logistic regression when doing text classification with bag of words? ⭐️
- What are word embeddings? Why are they useful? Do you know Word2Vec? ⭐️
- Do you know any other ways to get word embeddings? 🚀
- If you have a sentence with multiple words, you may need to combine multiple word embeddings into one. How would you do it? ⭐️
- Would you prefer a gradient boosting trees model or logistic regression when doing text classification with embeddings? ⭐️
- How can you use neural nets for text classification? 🚀
- How can we use CNN for text classification? 🚀

Clustering

- What is unsupervised learning? 👶
- What is clustering? When do we need it? 👶
- Do you know how K-means works? ⭐️
- How to select K for K-means? ⭐️
- Which other clustering algorithms do you know? ⭐️
- Do you know how DBSCAN works? ⭐️
- When would you choose K-means and when DBSCAN? ⭐️

Dimensionality reduction

- What is the curse of dimensionality? Why do we care about it? ⭐️
- Do you know any dimensionality reduction techniques? ⭐️
- What's singular value decomposition? How is it typically used for machine learning? ⭐️

Ranking and search

- What is the ranking problem? Which models can you use to solve it? ⭐️
- What are good unsupervised baselines for text information retrieval? ⭐️
- How would you evaluate your ranking algorithms? Which offline metrics would you use? ⭐️
- What are precision and recall at k? ⭐️
- What is mean average precision at k? ⭐️
- How can we use machine learning for search? ⭐️
- How can we get training data for our ranking algorithms? ⭐️
- Can we formulate the search problem as a classification problem? How? ⭐️
- How can we use click data as the training data for ranking algorithms? 🚀
- Do you know how to use gradient boosting trees for ranking? 🚀
- How do you do an online evaluation of a new ranking algorithm? ⭐️
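
For the precision and recall at k questions, a tiny sketch can serve as a reference point. The function names and document ids are made up for illustration, and dividing precision by k (rather than by the number of results actually returned) is only one common convention.

```python
def hits_at_k(ranked_ids, relevant_ids, k):
    """Count how many of the top-k ranked items are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for item in ranked_ids[:k] if item in relevant_ids)


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k positions occupied by relevant items."""
    return hits_at_k(ranked_ids, relevant_ids, k) / k


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant items that appear in the top k."""
    if not relevant_ids:
        return 0.0  # assumption: report 0.0 when nothing is relevant
    return hits_at_k(ranked_ids, relevant_ids, k) / len(relevant_ids)


ranked = ["d3", "d1", "d7", "d2", "d5"]   # hypothetical search results, best first
relevant = {"d1", "d2", "d9"}             # hypothetical ground-truth relevant docs
for k in (3, 5):
    print(k, precision_at_k(ranked, relevant, k), recall_at_k(ranked, relevant, k))
```

With these made-up results, one of the top 3 documents is relevant, so both precision and recall at 3 are 1/3; at k = 5 precision is 2/5 and recall is 2/3.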

Recommender systems

- What is a recommender system? 👶
- What are good baselines when building a recommender system? ⭐️
- What is collaborative filtering? ⭐️
- How can we incorporate implicit feedback (clicks, etc.) into our recommender systems? ⭐️
- What is the cold start problem? ⭐️
- What are possible approaches to solving the cold start problem? ⭐️🚀

Time series

- What is a time series? 👶
- How is time series different from the usual regression problem? 👶
- Which models do you know for solving time series problems? ⭐️
- If there's a trend in our series, how can we remove it? And why would we want to do it? ⭐️
- You have a series with only one variable "y" measured at time t. How do you predict "y" at time t+1? Which approaches would you use? ⭐️
- You have a series with a variable "y" and a set of features. How do you predict "y" at t+1? Which approaches would you use? ⭐️
- What are the problems with using trees for solving time series problems? ⭐️

That was a long list! I hope you found it useful. Good luck with your interviews!

The post is based on this thread on Twitter. Do you know the answers? Consider contributing to this GitHub repository!
