Some time ago we sat down with Vladimir Rybakov, Head of Data Science at WaveAccess and a master communicator of business to ML teams and ML problems to business people.
We talked about:
Check out the full video of our conversation:
What follows is not a 1-1 transcript but rather a cleaned-up, structured, and rephrased version of it.
You can watch the video to get the raw content.
WaveAccess is quite a big company. We have over 400 people right now, working on custom software development solutions for our clients.
If we’re talking specifically about our data science team, right now it’s about 10 people, but if you count other people supporting data science and machine learning efforts, it’s probably around 30 to 40.
In terms of the projects we do, I cannot name a single subject area that we focus on because we do all sorts of things: text mining, computer vision, forecasting, and plain data analysis. Basically everything.
I think the most dominant direction that we see right now is applying machine learning and data science to CRM systems. We get a lot of inquiries about applying machine learning to customer data. If you think about it, those can be huge datasets that companies need to process manually; that work is prone to human error and can be automated.
Another big direction is text mining. It’s actually closely related to those CRM systems. We also get a lot of computer vision projects.
Some of the largest ones are actually related to vision. Recently we’ve been working on one where we analyse whether people grow crops where they are supposed to, based on satellite images.
As you know, Russia is quite a big country, and if you were to go there by car and check it manually, it would take a lot of time and you would probably not find everything.
One thing to note is that we actually had to develop this model from scratch. From my experience, a lot of what is open-sourced, I’d say 60-70%, is actually not very useful. The results shown in the GitHub README are cherry-picked and make a good impression, but when you try to use the code they are not that great.
I’ve been a Head of Data Science for the last 2 years and it took me around 5 years to get to this level.
Quite a few people just randomly ask me: “How did this happen? How were you able to become a Head of Data Science so fast?”
But okay, we’ll get to that. When it comes to my background, I am a mathematician. I started studying physics but decided to change my major and went into applied mathematics.
I think in my fourth year I learnt about artificial intelligence and got really interested in it. I started working closely with my professor, we got a couple of grants, and I was fortunate enough to spend my master’s degree in the field of data science.
After that I worked for a year as a software developer but I didn’t like it too much. We were working on older systems that were very robust but there was not a lot of room for changing things. I didn’t see myself growing there at all and I decided to change that. I think you start to see a pattern here.
So I decided to try the life of a freelancer, data science freelancer. As I was working alone, it was quite hard because I didn’t actually have anyone to get advice from.
It was 6 years ago, and I remember that at that time there were not a lot of resources that I could use to learn.
Anyhow, it was a funny project where I had to classify a person’s personality based on their face and ear shape. Long story short, we didn’t click well with the client and I decided to look for another job.
I was already 25 years old and still quite a junior developer. I searched for a month or so and managed to get two offers. I remember one was from Electrolux and the other from WaveAccess.
Initially, I was going to accept the Electrolux offer, but then I thought that I never really liked working at huge corporations with a lot of bureaucracy and stuff like that. I thought that if I went there I would just be another small contributor. WaveAccess was just a little over 100 people then, and I figured it would be a way better fit.
So I began working as an Algorithmist; I know it sounds pretty cool. I would develop algorithms in Java that would process graph data.
After some time I decided to go to my CTO and told him: “OK, so I know data science. Are there any projects that I could help you with?” His answer was “Not really”.
After some time (his name is Alexander, by the way) he came back and told me: “This is your chance. There is this small project for a large company and we need to create a POC. Can you deliver it in a week?”
I delivered, and that’s how my data science career started. I was the first data scientist at WaveAccess, working hard on many projects and learning a ton.
After some time, when we got this huge project where we needed to predict the routes of ships, I got to hire the first person for my team.
It was both exciting and tiring. I still remember this one customer demo that I put together. It was a coding marathon when I stayed at the office for 40 hours straight, without sleep but I got it to work and delivered it on time.
We kept growing the team and it’s 10 people now, so as you can guess, things are much easier.
I’d say a big takeaway from this is:
I did not realize how much additional administration work needs to be done in terms of:
So a lot of my time goes to meetings where:
I also spend a lot of time with the marketing and sales teams, because they need to know which projects we have finished lately so they can use that in their activities.
I’d say if you’re not a person who likes a lot of attention, Head of Data Science may not be for you. There will be a lot of talking to the clients, to the team, interviewing people etc. You have to feel comfortable in those situations.
I definitely think communication is one of them, but in my opinion the most important skill a leader can have is the ability to always have a backup plan.
If something goes wrong, you are the one who needs to show people where to go and tell them how we can fix it. Oftentimes someone on your team will have the solution, but if nobody else has it, it is on you to propose something.
So I think a good leader needs to be a great problem solver. You need to have this ability to solve problems that you don’t even fully understand yourself.
In data science, there are a lot of subject fields and you cannot possibly know all of them. And there are days when a client comes in and asks you to come up with a solution before you have a chance to think about it thoroughly.
When that happens, when a new request comes in on a subject that is completely unknown to me, I just try to boil it down to the basics and drop all the specifics.
First, you need to understand the core of the problem and bring it down to concepts that you do understand. After that, you can come up with a solution based on your experience from previous projects.
Then I try to ask as many questions as I can to understand the problem from their perspective.
Finally, I try to propose a general solution that is based on some simple building blocks, without going into details too much.
Most of the time clients want to feel that you are in control and that you know everything, which, of course, is not always true, but that’s the reality.
With this general solution comes a very rough time and cost estimate with, I’d say, a 30% error.
When clients are happy to proceed, I really start to ask a lot of questions. I try to dig into the details as deep as I can.
From my experience, there is no way to understand the project before you actually start working on it. But once you do, there is no excuse: you have to understand the problem fully.
I also try to understand their side, understand what they don’t understand, and put a lot of effort into communicating with them. I think it is absolutely crucial that both clients and consultants speak the same language.
This is actually something that we are pursuing right now. We are running workshops with a lot of consultant-client sessions. I think there are a lot of people on the buying side of the data science market who do not understand what can be done or what is required to build things, and those consulting sessions help business folks see the process from the data science perspective.
Once you have all the people speaking the same language it is way easier to deliver good projects.
Another skill that is very important is taking responsibility. If you don’t take responsibility, you will not grow, it’s that simple.
I think the best thing anyone can do to grow their role is to go to their boss and ask: “What would I need to do to get a raise?” Ask: “How can I take more responsibility on the projects I am involved with?”
Figure out how you can bring more value to the company.
Instead of solving a task and being done with it, think about how your solution can bring more value. Is there a way to improve the process that needs this task? Can I improve that? Those types of questions.
Another thing is that you need a good boss that will feel ok with you taking more and more responsibility and ownership. I was lucky to have one like that.
Speaking about luck there is this saying in Russian:
“Only those who work hard get lucky”
So yeah I think there are no guarantees but you can put yourself in a position where things can happen for you.
One more thing that I feel is really important is staying up to date with the latest ideas and results. Things become outdated so fast it is crazy. I think keeping up with all that is a big part of the job description.
I think the most difficult skills to learn were things related to scaling the organization from 2 or 3 people to a bigger unit.
I struggled with introducing bureaucracy, administration and structure into our department just because it’s against my nature. At some point, I understood that it was necessary.
Another thing is that as you grow in your organization you manage more and develop less.
I struggled with that because I wanted to develop and there were so many other things that I needed to get done.
But once I understood that I was no longer a developer but a manager, it got easier.
At some point it’s just not cost-effective to develop yourself and you have to accept that. There are so many other, more important things that you should do.
I think I truly understood that as I was taking my postgraduate studies in management. I wanted to understand how business works. I highly recommend that to anyone who wants to grow their data science career.
I think that in 5 or 10 years those data scientists who devote more time to understanding the business value of what they are creating will stay relevant regardless of what happens with AutoML and things like that.
Right now those tools are not working very well, but they will become better. So you need to contribute more than just fitting models.
I feel that understanding the other side of the business equation is crucial.
Taking this course helped me better communicate and my project management skills got way better. Everything goes much smoother now.
Even if it is quite boring to fill out those plans or build a Gantt chart, it pays off in the long run.
I follow a bunch of smart people on Twitter!
You can learn a lot from them. Whenever a new exciting paper comes up I will know about it from their tweets.
I also read Medium a lot. I think it is probably the best resource you can find on the internet in terms of good and not purely academic articles.
There are some YouTube channels, like Two Minute Papers, that I really like. Just recently there was a video about an OpenAI paper where they talked about this concept of model surgery. So they developed this approach where you could look inside the model as it was training.
I found it fascinating.
There’s already a lot of content here, but there’s one more thing I want to talk about.
I do a lot of interviews with new candidates and I see a lot of people who take a bunch of courses online and they suddenly believe they are data scientists.
I really think people should get their hands dirty and practice. Do your own projects or even go on Kaggle and try the things you’ve learnt.
Speaking of Kaggle, it’s a great place. I am not talking about competitions and prizes but rather about learning through discussions and kernels where people show their work. You can learn a lot from this.
There is one more thing that I think is really important. It’s not technical but no less valuable.
I strongly believe that you should be a good person. Good to your team and to people around you. You know, the structure of our department is quite horizontal. Of course, the final decision is on me, but I do not put myself above others.
You should set the example, both in terms of knowledge and in terms of communication. You are setting the standards and creating the culture of your workplace. Never forget that.
We have this rule in our office.
If somebody has a problem and after 30 minutes he still doesn’t know how to solve it, he needs to go and ask others for help.
It’s not that he has to solve the problem in 30 minutes, but if no solution appears after looking for materials online and thinking about it, he should ask for help. Anything beyond those 30 minutes is a waste of time.
I want my team to feel comfortable asking others for help. I ask people to explain papers and ideas to me all the time. There is no way I can know all the tricks from all the latest papers. But it is likely that someone on my team understands it.
That said, you have to remember that there is a fine balance here. If you ask for help all the time, you will not grow as a person but if you try to solve everything by yourself you will spend too much time on it.
You know, I saw this meme recently that said: “Data science is pain, and if somebody tells you differently, they are selling you stuff.”
No, but seriously, when all is said and done, I think the most important part, regardless of whether you’re a junior specialist or a head of data science, is to remember to have fun and enjoy your work.
I strongly oppose getting into this field just because it is a lucrative domain. Do what you enjoy and success will follow.
This article was originally written by Jakub Czakon and posted on the Neptune blog. You can find more in-depth articles for machine learning practitioners there.