Part 19 of the series where I interview my heroes.

Index to “Interviews with ML Heroes”

This is a special first in the interview series. Today I get to interview a great Kaggler from my homeland (India).

I’m honored to be talking to a Kernels (Ranked #1, Kaggle: @sudalairajkumar) and Competitions GrandMaster (Ranked #140) and Discussions Expert (Ranked #53): Sudalai Rajkumar.

Sudalai Rajkumar completed his executive course in Business Analytics and Intelligence at the Indian Institute of Management Bangalore, and he holds a BE from PSG College of Technology. He is currently working as a Data Scientist at H2O.ai. Before H2O.ai, he held key positions at various other companies: Lead Data Scientist at Freshworks and Tiger Analytics, and lead of R&D at Global Analytics.

About the Series:

I have very recently started making some progress with my Self-Taught Machine Learning Journey. But to be honest, it wouldn’t be possible at all without the amazing community online and the great people that have helped me.

In this series of blog posts, I talk with people who have really inspired me and whom I look up to as my role models.

The motivation behind doing this is that you might see some patterns, and hopefully you’d be able to learn from the amazing people that I have had the chance of learning from.

**Sanyam Bhutani:** Hello Grandmaster, thank you for taking the time to do this.

**Sudalai Rajkumar:** Hello Sanyam, the pleasure is mine too.

**Sanyam Bhutani:** Currently, you are crowned the king of Kaggle kernels with Rank #1, and you’re a Competitions GrandMaster as well as a Discussions Expert. Can you tell us how you got interested in Machine Learning and in Kaggle?

**Sudalai Rajkumar:** I have had an interest in finding patterns right from my childhood, which eventually led me to take up a job in the analytics field over the core engineering field. So I started taking up MOOC courses to gain knowledge in machine learning.
I was able to get a theoretical understanding from all these courses, but I was not sure how to use what I had learned. So I was looking for an opportunity to try it out. That is when I got introduced to Kaggle to get some hands-on experience.

**Sanyam Bhutani:** You’re currently working as a Data Scientist at H2O.ai and have been working in the Data Science space during the past few years. Where does Kaggle come into the picture? Is it related to your other projects?

**Sudalai Rajkumar:** Yes, it started as a way to learn new concepts in the field. I started to work on Kaggle problems after my office hours.

**Sanyam Bhutani:** H2O.ai is working on many exciting projects. Could you tell us more about your role at H2O.ai?

**Sudalai Rajkumar:** Yes, H2O is working on multiple exciting projects and there are several wonderful people in the company. Currently, I am working on the Natural Language Processing side of Driverless AI. Driverless AI is an automated machine learning platform and you can read more about it here.

**Sanyam Bhutani:** You’ve had many amazing finishes in competitions. Could you tell us what your favorite challenge was?

**Sudalai Rajkumar:** It was the Rainfall Prediction Competition on Kaggle. I got an awesome chance to team up with Marios and we finished second on that one. It was my first gold medal on Kaggle and I learned a lot of new concepts working with him. It also gave me a lot of confidence that I can do well in the competitions.

**Sanyam Bhutani:** You’ve had great results, both in solo finishes and team finishes. For a noob Kaggler, what tips do you have on whether or not to form a team?

**Sudalai Rajkumar:** Teaming up in competitions is definitely a great way to exchange ideas and learn new concepts. My tip would be to not team up with someone who is far ahead on the competition leaderboard or someone who is far below.
In the former case, that person would have already done most of the things and we won’t get to learn too much; in the latter case, it might be hard for the other person to catch up with us. So it is better to team up with someone in the same rank range on the leaderboard to have a better learning experience. Also, it is good to team up with a person who has ideas/models different from what we have.

**Sanyam Bhutani:** What kind of challenges do you look for today? How do you decide to enter a new competition?

**Sudalai Rajkumar:** Honestly, I am not doing many challenges these days. I have been trying some image competitions of late to learn more about them, since I do not have much experience in this field.

**Sanyam Bhutani:** What are your first steps and go-to techniques when starting out on a new competition?

**Sudalai Rajkumar:** The first step would be to do an exploratory data analysis and understand the data. Then I will try to create a good validation methodology. The next step would be to create a baseline model using the given features (mostly LightGBM for structured data and deep learning models for unstructured data), make a submission, and make sure that the pipeline and the cross-validation are working fine.

**Sanyam Bhutani:** Currently, you’re the King of Kernels, being Ranked #1. Can you give us an insight into what efforts go into your kernels? What’s your workflow like when writing kernels?

**Sudalai Rajkumar:** Most of the kernels I wrote are exploratory in nature. So now I have a code base for different types of plots, which helps me write those kernels faster. Once the dataset is released, I generally try to look at the data and see if there are any interesting patterns in it. So most of my effort goes into finding interesting signals in the data and looking for the best plots to represent them. I also constantly look at other people’s kernels to learn new ideas to represent the data, new tools to plot the data and so on.
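
As a rough illustration of the first steps described above (set up a validation scheme, build a baseline, confirm the pipeline runs end to end), here is a minimal Python sketch. It is not Sudalai's actual code: the function names `k_fold_indices` and `cross_validate_baseline` are hypothetical, the data is a toy list, and a predict-the-mean baseline stands in for a real model such as LightGBM so that the example needs only the standard library.

```python
import random
from statistics import mean


def k_fold_indices(n_rows, n_folds=5, seed=42):
    """Shuffle row indices once and deal them into n_folds validation folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Every index lands in exactly one fold.
    return [indices[i::n_folds] for i in range(n_folds)]


def cross_validate_baseline(targets, n_folds=5, seed=42):
    """Score a predict-the-training-mean baseline with k-fold cross-validation.

    The mean predictor is only a placeholder (an assumption of this sketch);
    in a real competition a model such as LightGBM would be fitted here.
    Returns the mean absolute error of each fold.
    """
    scores = []
    for valid_idx in k_fold_indices(len(targets), n_folds, seed):
        valid_set = set(valid_idx)
        # "Fit" on the training rows only, never on the validation rows.
        train_targets = [t for i, t in enumerate(targets) if i not in valid_set]
        prediction = mean(train_targets)
        fold_mae = mean(abs(targets[i] - prediction) for i in valid_idx)
        scores.append(fold_mae)
    return scores


if __name__ == "__main__":
    rng = random.Random(0)
    toy_targets = [rng.gauss(10.0, 2.0) for _ in range(100)]  # made-up data
    fold_scores = cross_validate_baseline(toy_targets)
    print("MAE per fold:", [round(s, 3) for s in fold_scores])
    print("Mean CV MAE:", round(mean(fold_scores), 3))
```

Any later model has to beat the baseline's cross-validation score, and the scores across folds should be reasonably stable before the validation scheme is trusted.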
**Sanyam Bhutani:** What suggestions do you have for beginners who want to write great kernels?

**Sudalai Rajkumar:** Kindly read multiple good kernels and try to understand them in detail. Learn how they create insights from the data, the plots they have used to portray the data, and the inferences that they have come up with. It is also a good idea to take up a new concept (like a new algorithm or a novel technique) and educate people about it. I personally do not like kernels that just blend the output of two or three other kernels and get a high score.

**Sanyam Bhutani:** For the readers and noobs like me who want to become better Kagglers, what would be your best advice?

**Sudalai Rajkumar:** Some very valuable points are:

- Create a generic code base which will be helpful in the long term.
- Learn to look at the data and to do feature engineering.
- Look at the forums/discussion channel for more ideas and better understanding.
- Kaggle kernels are immensely helpful, so make use of them whenever possible.
- Iterate on ideas quickly: fail fast and learn fast.
- Only 1 out of 10 ideas works in general, so do not give up.
- Use a reasonable system, and use the cloud if necessary.
- Choose the right competition.
- Put your heart into it.

**Sanyam Bhutani:** The general opinion is that Machine Learning opportunities in India are currently very sparse for a fresher’s position. What advice would you give to junior data scientists who want to take up a job in the field?

**Sudalai Rajkumar:** Apart from theoretical knowledge, companies have also started looking at other related activities like GitHub projects, hackathon performances, open source contributions, blogs, meetups, internships and so on. So it is better to build a machine learning portfolio to showcase our potential and grab good opportunities. Also, this is a fast-changing field, so it is necessary to keep ourselves updated with the latest happenings.
**Sanyam Bhutani:** Given the explosive growth rate of ML, how do you stay updated with the recent developments?

**Sudalai Rajkumar:** Most of the people in the ML research community are active on Twitter and share any prominent developments in this field. I mostly follow them to learn about the latest happenings and keep myself updated. There are quite a few good people on LinkedIn as well who share such things.

**Sanyam Bhutani:** What developments in the field do you find to be the most exciting?

**Sudalai Rajkumar:** Since I am currently working on the NLP side of things, the developments in transfer learning models for natural language tasks over this last year are very exciting for me. So hopefully we will be able to accomplish more applications on the language side in the upcoming days.

**Sanyam Bhutani:** What are your thoughts about Machine Learning as a field? Do you think it’s overhyped?

**Sudalai Rajkumar:** There might be a bit of overhype about ML as a field due to its sudden surge and so on. But I think it is going to stay here for a long time and change the way things are getting done.

**Sanyam Bhutani:** Before we conclude, any tips for the beginners who aspire to be like you someday but feel completely overwhelmed to even start competing?

**Sudalai Rajkumar:** It is always good to get into the water to learn swimming ;) So do not worry about anything else and start getting your hands dirty with data. It is the best way to learn things. All the very best!

**Sanyam Bhutani:** Thank you so much for doing this interview.

If you found this interesting and would like to be a part of My Learning Path, you can find me on Twitter here.

If you’re interested in reading about Deep Learning and Computer Vision news, you can check out my newsletter here.


Interview with Twice Kaggle GrandMaster and Data Scientist at H2O.ai: Sudalai Rajkumar
