Part 4 of the series where I interview my heroes.

This is a very special edition of the series. Today, I'm talking to the twice Grandmaster: the Kaggle Discussion Grandmaster (ranked #1), Competition Grandmaster (ranked #27) and also Kernels Master, Dr. Jean-Francois Puget (Kaggle: CPMP).

About the Series:

I have very recently started making some progress with my Self-Taught Machine Learning Journey. But to be honest, it wouldn't be possible at all without the amazing community online and the great people that have helped me.

In this series of blog posts, I talk with people that have really inspired me.

The motivation behind doing this is that you might see some patterns, and hopefully you'd be able to learn from the amazing people that I have had the chance of learning from.

**Sanyam Bhutani:** Hello Grandmaster, thank you for taking the time to do this.

**Dr. Jean-Francois Puget:** Thank you for inviting me.

**Sanyam Bhutani:** Could you tell the readers about your Kaggle journey? How did you get started and get addicted to this "legal drug"?

**Dr. Jean-Francois Puget:** I started my professional life with a Ph.D. in machine learning, just to say how important this field is for me. Then I moved to a startup called ILOG to work in a different area. Fast forward, ILOG got acquired by IBM, and I moved back to machine learning. ML had evolved a lot since my Ph.D., and I realised I needed a refresher. I took some online courses like Andrew Ng's Stanford ML course on Coursera and read quite a bit, but this was not enough. I needed to get up to date on state-of-the-art ML practice, and Kaggle looked like the right place. After having watched Kaggle for a while, I decided to put my toes in the water and started competing in May 2016. I got hooked immediately!

**Sanyam Bhutani:** You're the technical leader for IBM Machine Learning and Optimization offerings. How do you find time for Kaggle? Are the Kaggle competitions related to the projects at IBM?

**Dr. Jean-Francois Puget:** I don't think you can build tools if you don't know how they are used. That's why it is important to practice machine learning and data science if you work on tools for machine learning practitioners. Therefore, participating in Kaggle competitions is really useful for my job. That said, I also spend quite a lot of my personal time on Kaggle competitions, during evenings, weekends, or vacations.

**Sanyam Bhutani:** When you started on Kaggle in 2016, you were already an expert in the ML field. Did Kaggle live up to its promise of being challenging?

**Dr. Jean-Francois Puget:** Kaggle proved to be way more competitive than I would have imagined. People who don't enter Kaggle competitions have no idea how elaborate and advanced winning solutions are.

**Sanyam Bhutani:** You've had amazing results on Kaggle over the past few years. What was your favourite challenge?

For the readers, here is a little snap from Dr. Puget's Kaggle profile.

**Dr. Jean-Francois Puget:** Of course I love the ones where I fared well. The one I am most proud of is the recent TalkingData competition, where I finished 6th alone, ahead of many grandmaster teams. Only 3 people finished with a gold medal in that competition. But the one I enjoyed the most is the 2Sigma New York Apartment Rental competition, because it had a mix of natural language data and structured data, with lots of room for feature engineering. I only worked 12 days on it, but these were very intense. I recommend this competition to people who want to exercise their feature engineering skills.

**Sanyam Bhutani:** What kind of challenges do you look for today? How do you decide if a competition is worth your time?

**Dr. Jean-Francois Puget:** I am now selecting competitions primarily to learn about a domain I don't master. For instance, I am entering the TGS Salt detection competition to learn more about image segmentation.

**Sanyam Bhutani:** How do you tackle a new competition? What are your go-to techniques?

**Dr. Jean-Francois Puget:** The first step is to quickly get a baseline and a submission. This is to clear any bug or any basic misunderstanding.

The second and most important step is to establish a reliable local validation setting. The goal is to be able to evaluate whether a model is better than another model using the training data, instead of relying on the public leaderboard score. If you manage to get this, then you can perform as many experiments as you want, and you are not bound by the limit of 5 submissions a day. Submissions are only used to check that your local validation is reliable, i.e. that when your local score improves, your LB score also improves. The basic tool for that is cross validation. Mastering cross validation, and how to define folds, is a key skill. Make sure you understand when you can use a random fold split, and when you must use some stratification or some time-based fold definitions.

The third step is to really understand the metric, and how to approximate it via loss functions. Sometimes it is easy, for instance using mean squared error (MSE) if the metric is root mean squared error (RMSE). Sometimes it is tricky because the metric is not differentiable, for instance when the metric is ROC AUC for binary classification, or intersection over union for image segmentation.

Then comes feature engineering, NN architecture choice, hyperparameter optimization, etc. Jumping to this before the preparation steps above is a waste of time.

**Sanyam Bhutani:** For the readers and noobs like me who want to become better Kagglers, what would be your best advice?

**Dr. Jean-Francois Puget:** Read the write-ups of top teams after each competition ends. It is better if you entered the competition, as you will get more from the write-up, but reading all write-ups is useful IMHO. That's when you learn the most on Kaggle.

Also, get a decent computing resource, either on-premise machines or cloud services.
Kaggle competitions require more and more computing resources as time goes by. This is a general trend in the IT industry anyway.

Last, try hard on your own before looking at the shared material. Reusing shared material, especially kernels, is fine if you don't use them as a black box. Play with what you want to reuse, and modify it to leverage your local validation setting. For instance, lots of shared kernels have no cross validation; you should add it in that case.

**Sanyam Bhutani:** For the readers who want to take up machine learning as a career path, do you feel a good Kaggle profile and (Kaggle) experience is helpful?

**Dr. Jean-Francois Puget:** Certainly useful. I am getting interesting job offers now that I am a grandmaster. Kaggle is very visible. But one should not overestimate it either. The skills you learn at Kaggle are very useful, but other skills are also required in the real world. The ability to deal with business stakeholders and the ability to gather relevant data are key skills that are not tested on Kaggle. Indeed, Kaggle competitions come with a well-defined business problem and with relevant data.

**Sanyam Bhutani:** Given the explosive growth rate of ML, how do you stay up to date with the state-of-the-art techniques?

**Dr. Jean-Francois Puget:** I practice, on Kaggle and also on IBM-related ML projects. I also read scientific papers. A good place to get alerts on interesting material is the KaggleNoobs Slack team.

**Sanyam Bhutani:** What progress are you really excited about in machine learning?

**Dr. Jean-Francois Puget:** I think that gradient boosted machines (GBMs) like XGBoost or LightGBM are the most important development of the last 5 years. Most people would say deep learning. I agree that deep learning is extremely important and exciting, but GBMs are more actionable in the industry right now, as they can be used to replace most predictive models built so far.

**Sanyam Bhutani:** Do you feel ML as a field will live up to the hype?

**Dr. Jean-Francois Puget:** The hype has moved away from ML to DL in recent years, and now to AI. Sure, the three domains are closely related, but ML is no longer what makes headlines. That said, I don't think ML/DL/AI will live up to the current hype. While there are true advances, they are oversold and overgeneralized. There will be a dramatic backlash against AI and deep learning soon, with yet another AI winter after that.

**Sanyam Bhutani:** Before we conclude, any tips for the beginners who feel overwhelmed to start competing?

**Dr. Jean-Francois Puget:** Don't be shy, just try it. Use a pseudonym instead of your real name, so that you have no fear of damaging your reputation. But be prepared to have your ego hurt, because Kaggle is very competitive ;) And use any setback to learn how to do better next time.

**Sanyam Bhutani:** Thank you so much for doing this interview.

Kaggle Noobs is the best community for Kaggle, where you can find Dr. Puget and other Kaggle Grandmasters, Masters and Experts, and it's a community where even noobs like me are welcome. Come join if you're interested in ML/DL/Kaggle.

If you found this interesting and would like to be a part of My Learning Path, you can find me on Twitter here.
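
For readers who want to try the local validation advice from the interview, here is a minimal sketch of the three fold definitions Dr. Puget mentions: a random split, a stratified split, and a time-based split. It uses only the Python standard library. The function names (`random_folds`, `stratified_folds`, `time_based_splits`) and the toy data are my own assumptions for illustration and do not come from the interview; in practice, scikit-learn's `KFold`, `StratifiedKFold` and `TimeSeriesSplit` cover the same ground.

```python
import random
from collections import defaultdict


def random_folds(n_rows, n_folds, seed=0):
    """Shuffle row indices and deal them into n_folds validation folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [sorted(indices[i::n_folds]) for i in range(n_folds)]


def stratified_folds(labels, n_folds, seed=0):
    """Deal the rows of each class separately so every fold keeps the class mix."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        rows = by_class[label]
        rng.shuffle(rows)
        for position, idx in enumerate(rows):
            folds[position % n_folds].append(idx)
    return [sorted(fold) for fold in folds]


def time_based_splits(timestamps, n_folds):
    """Yield (train, valid) index lists where validation rows are always later in time.

    The training window grows with each split. Rows left over when the data
    does not divide evenly into n_folds + 1 blocks are not used.
    """
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    block = len(order) // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train = order[: k * block]
        valid = order[k * block : (k + 1) * block]
        yield train, valid


# Toy imbalanced target (hypothetical data): 90 negatives and 10 positives.
labels = [0] * 90 + [1] * 10
for fold in stratified_folds(labels, n_folds=5):
    positives = sum(labels[i] for i in fold)
    print(len(fold), positives)  # each fold: 20 rows, 2 positives
```

A random split is usually fine when rows are independent and classes are balanced. With a rare positive class, the stratified version keeps each fold representative, and when the test set lies in the future of the training set, the time-based version avoids validating on data that precedes the training rows.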

Interview with Twice Kaggle Grandmaster: Dr. Jean-Francois Puget (CPMP)
