Part 11 of where I interview my heroes. The series During the past few interviews, I’ve had the chance of interacting with , , , and and an . Kaggle Grandmasters Technical Leaders Practitioners Two Distinguished Researchers OpenAI Fellow Today, I’m honored to be talking to one of the greatest contributors of the complete community: The creator of Keras, author of one of the best Deep Learning books: François Chollet. François is currently working as an AI Researcher at Google, he’s also at the core of Keras Devolopment. About the Series: I have very recently started making some progress with my . But to be honest, it wouldn’t be possible at all without the amazing community online and the great people that have helped me. Self-Taught Machine Learning Journey In this Series of Blog Posts, I talk with People that have really inspired me and whom I look up to as my role-models. The motivation behind doing this is, you might see some patterns and hopefully you’d be able to learn from the amazing people that I have had the chance of learning from. **Sanyam Bhutani:**​ Hello François, Thank you so much for taking the time to do this interview. It’s an absolute honor to be talking with you. Thanks for having me, it’s my pleasure to answer your questions. François Chollet: **Sanyam Bhutani:**​ Today, you’re working at Google AI Research and you’ve created one of the most popular libraries in Deep Learning. Could you tell us how you got started? What got you interested in Deep Learning? I’ve been into AI in general for a long time, but I got interested in the specific problem of learning stacked, modular, hierarchical representations for visual perception in the late 2000s as a student. Back then, Jeff Hawkins had written a pretty thought-provoking book (On Intelligence) about, among other things, hierarchical information processing in the brain, and some folks at MIT had been working on hierarchical models for visual perception inspired by the human visual cortex, the HMAX family of models. François Chollet: These ideas struck me as profound and correct, so I started to work on my own algorithms. I wasn’t using neural networks back then, I started with stacked feature learners based on matrix factorization. I wanted to learn not just hierarchies of visual features, but hierarchies of visual-temporal features, incorporating correlations over time, not just space. I had some initial practical successes in mid-2012, while I was doing research at the University of Tokyo — I applied my setup to unsupervised few-shot recognition of hand gestures and got some really nice results. A few months later, Hinton’s lab won the ImageNet competition (which my lab at the University of Tokyo had also entered) with a deep convnet trained on GPUs, so I got intrigued by that approach. This is something that people have lost sight of nowadays, but end-to-end differentiable models trained with backpropagation are just one solution to the problem of learning modular-hierarchical representations for perception, and there are alternative avenues that haven’t been explored very much. And that problem itself is just one of the many subproblems that compose the field of AI. Could you share what was the original motivation behind creating Keras? Did you expect it to grow so big? Sanyam Bhutani: No, I definitely didn’t expect it to grow so big. I kind of expected it to make a bit of a splash in the small community of people using deep at the time, in March 2015 (a few thousands of people), but back then no one really expected deep learning would gather such interest over the following few years. François Chollet: learning I started Keras for my own use, pretty much. I was doing natural language processing research at the time, in my free time, and I was looking for a good to work with RNNs. The LSTM algorithm was still largely off the radar back then — now it’s everywhere — but as neural networks were gaining prominence in the machine learning world, a number of people started investigating the use of LSTM for natural language processing. There was no reusable open-source implementation of LSTM at the time, as far as I can tell. So I made one, using Theano, which I had been using for the previous year. And then I made more layers. And that turned into a framework. I named it Keras, I open-sourced it, and it grew from there. tool What made it different at the time was this: it was pretty accessible and easy to use compared to other options, it supported RNNs and convnets (that was a first, I believe), and it made you define models via Python code rather than via configuration files (which had been the most popular approach previously, in particular, that was the approach of Caffe and PyLearn2). both What are your thoughts about Keras being made the default API in TensorFlow 2.0. Why do you feel it is necessary? Sanyam Bhutani: TensorFlow is an extremely powerful framework, but it has long suffered from usability issues, in particular a sprawling and sometimes confusing API. TensorFlow 2 fixes these issues in a big way. Two things are at the center of this effort: eager execution, and the Keras API. Eager execution brings an imperative coding style to TensorFlow, making it more intuitive and easier to debug. The Keras API consolidates usage patterns into one coherent spectrum of really productive and enjoyable workflows, suited to a variety of user profiles, from research to applications development to deployment. I’m really excited about what we’re about to release. But you’ll see for yourself soon! François Chollet: Outside of TF and Keras, what other frameworks do you think look promising? Sanyam Bhutani: I think MXNet looks very promising, together with its high-level API, Gluon, which is very inspired by Keras and Chainer. MXNet leverages a lot of the same ideas as TensorFlow 2 — a blend of eager and symbolic execution. Together with TensorFlow, it’s one of the few frameworks that are actually production-grade and scalable. And MXNet has a lot of engineering firepower behind it — there’s a large team at Amazon working on it. It’s a serious project with some very good ideas and solid execution. François Chollet: For the readers and the beginners who are interested in working on Deep Learning with the dream of working at Google someday, what would be your best advice? Sanyam Bhutani: I don’t think you should tie your dreams to external markers of status, like working for a specific big-name company, or making a specific amount of money, or attaining a specific job title. Figure out what you value in life, and then stay true to your values. You’ll never have to regret a single decision. François Chollet: Could you tell us what does a day in your life looks like? Sanyam Bhutani: It’s not very glamorous, mostly reviewing a lot of code, talking to people, writing design docs, shipping code. I still write quite significant amounts of code. François Chollet: It’s a common belief that you need major resources to produce significant results in Deep Learning. Sanyam Bhutani: Do you think a person who does not have the resources that someone at Google might have access to, could produce significant contributions to the field? There are some category of problems that require industry-scale training resources, for sure. But there is no shortage of problems where a single GPU is all you need to make significant progress. The main thing that’s holding back AI research right now is not a lack of hardware, it’s a lack of diversity of thought. If you have limited resources, don’t spend your time worrying about GPUs, rather, spend it worrying whether you’re working on the right problem and asking the right questions. François Chollet: You’ve been an advocate of Ethics in “AI”. Could you share a few areas that we must pay attention to when building “AI Products”? Sanyam Bhutani: Other people have talked about the ethical issues of machine learning far better than I could. Kate Crawford, for instance, or Meredith Whittaker. I think everyone interested in this should check out their work. François Chollet: Do you feel Machine Learning has been overhyped? Sanyam Bhutani: Definitely, to some extent. I think that machine learning is, in a way, simultaneously overhyped and underrated. On one hand, people tend to vastly overestimate the intelligence and the generalization power of current machine learning systems, perceiving machine learning as a kind of magic wand that you can wave at arbitrary problems to make them disappear. This is, of course, largely false, there is very little actual intelligence in our algorithms, and their scope of application is extremely narrow. But at the same time, most people still underestimate how much can be achieved with the relatively crude systems we have today, if we apply them systematically to every problem they can potentially solve. Machine learning is, in a way, the steam power of our era: a pretty basic mechanism that nonetheless has the potential to profoundly change the world when used at scale. François Chollet: Do you feel a Ph.D. or Masters level of expertise is necessary or one can contribute to the field of Deep Learning without being an “expert”? Sanyam Bhutani: Plenty of significant contributors in the field of deep learning today don’t have a Ph.D. To contribute meaningfully to a field, whether with systems development or with novel research, you absolutely do need to have a certain level of expertise. But you can gain expertise without going through a Ph.D. program, obviously, and having a Ph.D. is not actually a guarantee that you’ve developed meaningful expertise in anything — in theory it should, but as far as I can tell, reality doesn’t align very well with that theory. In fact, unless you aim at becoming an academic, I don’t think getting a Ph.D. is the best path to gaining expertise. The best path is a path that gets you to grow fast, open-endedly. And you will learn the fastest by working on a large variety of projects in situations of teamwork and close mentorship from experts. In practice, the typical Ph.D. program looks nothing like that. François Chollet: Before we conclude, any advice for the beginners who feel overwhelmed to even get started with Deep Learning? Sanyam Bhutani: In 10 years you’ll be able to buy a textbook that will neatly sum up every AI advance that has happened from 2010 to 2020. The flood of content being published today may look important, but most of it is noise. Focus on the big questions. François Chollet: Thank you so much for doing this interview. Sanyam Bhutani: If you found this interesting and would like to be a part of My Learning Path , you can find me on Twitter here . If you’re interested in reading about Deep Learning and Computer Vision news, you can checkout my newsletter here .

Amazon

Google

Twitter

How not to do Fast.ai (or any ML MOOC)

Interview with Deep Learning Researcher and Leader of OpenMined: Andrew Trask

Connect with me on Twitter

Nominated for 2022 - HackerNoon Contributor of the Year - Deep Learning

Nominated for 2022 - HackerNoon Contributor of the Year - Machine Learning

Nominated for 2022 - HackerNoon Contributor of the Year - Computer Science

Too Long; Didn't Read

Interview with The Creator of Keras, AI Researcher: François Chollet

Interview with The Creator of Keras, AI Researcher: François Chollet

About Author

Comments

TOPICS

THIS ARTICLE WAS FEATURED IN

Related Stories

A Full Time ML Role, 1 Million Blog Views, 10k Podcast Downloads: A Community Taught ML Engineer

The Noonification: Use This 7-Step McKinsey Framework to Solve Any Problem (1/10/2023)

The Noonification: A Taxonomy of Inclusiveness (1/11/2024)

The Noonification: What is the InfiniteNature-Zero AI Model? (11/19/2022)

10 Ways AI Has Changed Our Lives

100 Days of AI, Day 8: Experimenting With Microsoft's Semantic Kernel Using GPT-4

A Full Time ML Role, 1 Million Blog Views, 10k Podcast Downloads: A Community Taught ML Engineer

The Noonification: Use This 7-Step McKinsey Framework to Solve Any Problem (1/10/2023)

The Noonification: A Taxonomy of Inclusiveness (1/11/2024)

The Noonification: What is the InfiniteNature-Zero AI Model? (11/19/2022)

10 Ways AI Has Changed Our Lives

100 Days of AI, Day 8: Experimenting With Microsoft's Semantic Kernel Using GPT-4

Light-Mode

Classic

Newspaper

Minty

Dark-Mode

Neon Noir

Minty

HN StartUps