Saturday, January 7, 2017

20 Basic Interview Questions in Machine Learning and Data Mining

I am sure most people would agree that data science is one of the most sought-after fields in the job market today, and demand is only going to grow with time. Over the last few years I have interviewed dozens of candidates for data science jobs at various levels - undergraduate, graduate, post-graduate, both full time and part time - and I am often surprised to see how even the most qualified candidates fail to answer very basic questions in Machine Learning and Data Mining. So I decided to put together a list of basic interview questions that might be of help to some of you. In my experience, more than 95% of candidates failed to answer three out of the five questions they were asked from the following list - questions I consider ML 101 material.

  1. What is the difference between inductive learning and transductive learning?
  2. What is the difference between discriminative and generative models?
  3. What is overfitting? How can you avoid it?
  4. What is regularization? Why do we need it?
  5. What is cross validation? Why do we need it?
  6. What is the difference between Logistic Regression and Support Vector Machines?
  7. What are the convergence properties of the K-means clustering algorithm? Is it guaranteed to converge at all?
  8. What is the difference between Naive Bayes and Logistic Regression?
  9. What is the difference between L2 and L1 regularization?
  10. Neural networks are not something we discovered in this millennium; they have been around for decades. Why have they suddenly become so popular now?
  11. What is the VC dimension? Why do we need it?
  12. Is Logistic Regression a linear classifier? It is built on the sigmoid/softmax function, which does not look linear.
  13. Have you heard of RKHS? Explain it. What does each letter, i.e. R, K, H, S, stand for?
  14. Have you heard of the Representer Theorem? Explain it.
  15. Explain the difference between Bagging and Boosting.
  16. Generalization error consists of two parts, i.e. bias and variance. What is the effect of Bagging and Boosting on bias and variance?
  17. What is class imbalance? How can you take care of it?
  18. What would be the right metric for an imbalanced classification problem?
  19. How is the depth of a decision tree related to bias and variance?
  20. High dimensionality: is it good or bad, and why?
  21. More to come...
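To give a taste of the kind of hands-on understanding these questions probe, here is a minimal sketch of k-fold cross validation (question 5) in plain Python. The `train_and_score` argument is a hypothetical caller-supplied function - it fits a model on the training split and returns a score on the held-out split - not part of any particular library.

```python
import random

def k_fold_cross_validation(data, k, train_and_score):
    """Estimate generalization performance by averaging k held-out scores.

    `train_and_score(train, test)` is assumed to fit a model on `train`
    and return its score on `test`.
    """
    data = list(data)
    random.Random(0).shuffle(data)          # fixed seed for repeatability
    folds = [data[i::k] for i in range(k)]  # k roughly equal folds
    scores = []
    for i in range(k):
        test = folds[i]                     # fold i is held out
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        scores.append(train_and_score(train, test))
    return sum(scores) / k
```

The point a candidate should make: every example is used for evaluation exactly once, so the averaged score is a less optimistic estimate of generalization error than a single train/test split.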
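Similarly, for question 9, one concrete way to see the practical difference between L2 and L1 regularization is to compare a single optimization step on one weight under each penalty. The function names below are my own illustration, not a standard API: the L2 penalty shrinks weights proportionally toward zero, while the L1 proximal (soft-thresholding) step sets small weights exactly to zero, which is why L1 yields sparse models.

```python
def l2_update(w, grad, lr, lam):
    # L2 penalty adds lam*w to the gradient: the weight shrinks toward
    # zero proportionally, but rarely lands exactly on zero.
    return w - lr * (grad + lam * w)

def l1_prox(w, lr, lam):
    # Proximal (soft-thresholding) step for the L1 penalty: any weight
    # within lr*lam of zero is clipped to exactly zero -> sparsity.
    t = lr * lam
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0
```

For example, with step size 0.1 and penalty strength 1.0, a small weight of 0.05 is driven to exactly 0.0 by the L1 step, while the L2 step only shrinks it to 0.045.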


Answers to come in a follow-up post, depending on interest.

