Part 5 of the series where I interview my heroes.
This is another very special edition of the series.
Today, I’m honored to be talking to Dr. Marios Michailidis:
Kaggle Competitions Grandmaster (Rank #3) and Discussions Master (Rank #5).
About the Series:
I have very recently started making some progress on my self-taught machine learning journey. To be honest, it wouldn’t have been possible at all without the amazing online community and the great people who have helped me. In this series of blog posts, I talk with people who have really inspired me.
The motivation behind doing this is that you might spot some patterns and, hopefully, learn from the amazing people I have had the chance of learning from.
Sanyam Bhutani: Hello Grandmaster, Thank you for taking the time to do this.
Dr. Marios Michailidis: No problem. Thank you for considering me for this one :)
Sanyam Bhutani: You’re in the Top 3 of the Competitions leaderboard today.
Could you tell the readers about your Kaggle journey? What got you interested in competing on Kaggle, and what was your path to becoming #1 in 2016?
Dr. Marios Michailidis: I went to study at Southampton University (to do a Master’s in Risk Management). Near the end of the last semester, I started attending some entrepreneurship talks to get ideas about what to do next. One talk was by an entrepreneur who described his journey of building a successful business out of horse-racing predictions using logistic regression. I found it quite impressive: he started with pretty much nothing and ended up making a good living out of it. I felt I wanted to have the same “power” to predict the future, so I joined the field of predictive modelling. Later on, I joined Kaggle too.
Sanyam Bhutani: You’re a competitive data scientist and also an open-source contributor to projects such as StackNet and KazAnova.
Where does Kaggle fit into the picture? Is it related to your other projects and your research work?
Dr. Marios Michailidis: Kaggle helps me in various ways:
- Learn new skills, new tools, what’s hot.
- Solve a variety of problems.
- Become part of a (very generous) and open community.
- Collaborate with other experienced people in the field.
- Test/prove my ideas. Benchmark myself and how well I can do in a variety of problems.
- Receive recognition, promote my research.
- Do my job better. The company I work for, H2O.ai, is a leader in the development of software for data science and predictive analytics. Kaggle is a great environment to test our products and ensure they fare well against some of the top data scientists in the field.
- Generally become better in my craft.
Sanyam Bhutani: You’ve had many amazing finishes across competitions.
Could you tell us about your favourite challenge?
Dr. Marios Michailidis: The Acquire Valued Shoppers Challenge. It was the first competition I won, and I loved working with my then-teammate Gert. Plus, the concept of the competition was very close to what I was working on at that point (recommenders) and to my Ph.D.
Sanyam Bhutani: What kind of challenges do you look for today? How do you decide to enter a new competition?
Dr. Marios Michailidis: I enter all! I think the question here should be which challenges I choose NOT to enter! The only challenges I will not enter are those that are very demanding resource-wise (e.g. some of the computer vision ones with close to a TB of data). Other than that I am up for any challenge!
Sanyam Bhutani: What are your first steps and go-to techniques when starting out on a new competition?
Dr. Marios Michailidis:
- Understand the problem and the metric we are tested on: this is key.
- Create a reliable cross-validation process that best resembles the leaderboard, or the test set in general, as this allows me to explore many different algorithms and approaches and know the impact they could yield.
- Understand the importance of different algorithmic families, to see when and where to maximize the intensity (is it a linear or non-linear type of problem?).
- Try many different approaches/techniques on the given problem and attack it from all possible angles: algorithm selection, hyperparameter optimization, feature engineering, missing-value treatment. I treat all these elements as hyperparameters of the final solution.
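The workflow above can be sketched as a minimal cross-validation loop. This is only an illustrative assumption of the idea, not his actual pipeline: the fold scheme, the function names, and the toy majority-class baseline are all hypothetical.

```python
import random

def kfold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k shuffled folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(fit, X, y, k=5):
    """Mean accuracy of `fit` over k folds.
    `fit(train_X, train_y)` must return a predict(x) function."""
    folds = kfold_indices(len(X), k)
    scores = []
    for i, test in enumerate(folds):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        predict = fit([X[j] for j in train], [y[j] for j in train])
        hits = sum(predict(X[j]) == y[j] for j in test)
        scores.append(hits / len(test))
    return sum(scores) / k

# Hypothetical baseline "model": always predict the majority class.
def majority_baseline(train_X, train_y):
    most = max(set(train_y), key=train_y.count)
    return lambda x: most
```

With a reliable loop like this in place, swapping in different algorithms, feature sets, or imputation strategies becomes just another hyperparameter search, which is exactly the spirit of treating everything as a hyperparameter of the final solution.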
Sanyam Bhutani: For the readers and noobs like me who want to become better kagglers, what would be your best advice?
Dr. Marios Michailidis:
- Dedicate time to learning about data science and actually doing “kaggling”. You will eventually see results, sooner or later. You don’t need a Ph.D. to do this.
- Don’t be intimidated by others’ scores. We all started from the bottom (and occasionally we still find ourselves near there on different problems)!
- Pick up some online Coursera classes to enhance your skills.
- Also, have a read of this: https://www.linkedin.com/pulse/how-start-data-science-marios-michailidis
Sanyam Bhutani: Given the explosive growth rate of ML, how do you stay updated with the recent developments?
Dr. Marios Michailidis:
- Reading relevant blogs/data science pages
- Attending conferences
- Through networking
Sanyam Bhutani: What are your thoughts about machine learning as a field? Do you think it’s overhyped?
Dr. Marios Michailidis: A bit, in the sense that everyone thinks they should somehow apply/use machine learning without knowing exactly what it does or how it can help them (or even whether they really need it: recently I saw a toaster that was using machine learning, god knows for what reason!). However, I believe this field will flourish even more in the following years.
Sanyam Bhutani: Before we conclude, any tips for the beginners who aspire to be like you someday but feel completely overwhelmed to even start competing?
Dr. Marios Michailidis: On top of what I said before about becoming better on Kaggle,
let me repeat what I said in a previous interview about what wins Kaggle competitions:
- Understand the problem well.
- Discipline: have a thorough, documented approach that you follow strictly, one that defines the whole modelling process/framework, from how you cross-validate and select models to how you avoid overfitting (which requires a lot of… discipline).
- Allow room to try problem-specific things or new approaches within that framework. For example, unless you use deep learning for image classification, you are not likely to go very far.
- The hours you put in.
- Have access to the right tools. Good hardware.
- Make key partnerships. Look for people that are likely to have taken a very diverse approach from you.
- Ensembling (as in combining many different algorithms/approaches to get a better score).
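Ensembling, in its simplest form, is just averaging the predictions of several models. This tiny sketch (the names and numbers are purely illustrative) shows the idea:

```python
def average_ensemble(prediction_lists, weights=None):
    """Blend several models' predicted probabilities by (weighted) averaging,
    the most basic form of ensembling."""
    m = len(prediction_lists)
    weights = weights or [1.0 / m] * m
    n = len(prediction_lists[0])
    return [sum(w * preds[i] for w, preds in zip(weights, prediction_lists))
            for i in range(n)]

# Two hypothetical models' probability outputs for the same two samples:
blended = average_ensemble([[0.9, 0.2], [0.7, 0.4]])  # ≈ [0.8, 0.3]
```

Because different models make different mistakes, the averaged prediction is often more robust than any single model, which is why combining diverse approaches (and diverse teammates) works so well.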
Sanyam Bhutani: Thank you so much for doing this interview.
KaggleNoobs is the best community for Kaggle, where you can find Dr. Michailidis and other Kaggle Grandmasters, Masters, and Experts, and it’s a community where even noobs like me are welcome.
Come join if you’re interested in ML/DL/Kaggle.
If you found this interesting and would like to be a part of My Learning Path, you can find me on Twitter here.