Soon we won’t program computers. We’ll train them like dogs.
That is the title of a recent Wired article. Its argument is that, as machines get smarter thanks to AI and machine learning, they will come to understand our language and ways of communicating, rather than the other way around.
As big a claim as that may seem, recent progress in AI and ML suggests this could indeed be the future: trained on vast amounts of data, computers can understand “human friendly” data such as pictures, voice and language, and produce meaningful results.
An excellent and user-friendly example is the recent Teachable Machine experiment by the Google Creative Lab: a browser program where anyone can teach their computer to recognize objects and gestures without writing a single line of code, simply by showing it examples and pressing a button. This playful example, I believe, is a preview of our future interaction with machines.
What does this mean for robotics? Robots are becoming more affordable, smaller and more user-friendly every year: from giant industrial robots we have moved to small robotic arms that nearly anyone interested in robotics can buy. But even if the hardware is becoming more affordable, the learning curve often isn’t. Think of all the creatives and artisans who would love to use a robot as an assistant in their work, but are scared off by the amount of code, software engineering and math behind it. What if, instead of programming the robot, you could just teach it what to do, the way you would teach a person? This is a fundamental change of paradigm: from programmable robots to teachable robots.
Teaching a robot to fist bump, made during The Big Hack 4.0.
Along with a team of developers, I recently worked on these topics, connecting ideas from AI and robotics to create robots that can learn from humans. Recent state-of-the-art deep learning architectures make it possible to perform accurate pose estimation from raw images. We used a pretrained model from this GitHub repo to extract the body pose from the webcam feed.
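To give a rough idea of that pipeline, here is a minimal sketch of webcam pose estimation. It is not the model from the repo we used: it swaps in MoveNet from TensorFlow Hub as a stand-in, and the `estimate_pose` helper is just an assumed name for illustration.

```python
import cv2
import tensorflow as tf
import tensorflow_hub as hub

# Stand-in pose model: MoveNet (single person, 17 keypoints) from TensorFlow Hub.
model = hub.load("https://tfhub.dev/google/movenet/singlepose/lightning/4")
movenet = model.signatures["serving_default"]

def estimate_pose(frame_rgb):
    """Return (17, 3) keypoints as (y, x, confidence), normalized to [0, 1]."""
    inp = tf.image.resize_with_pad(tf.expand_dims(frame_rgb, axis=0), 192, 192)
    outputs = movenet(tf.cast(inp, tf.int32))
    return outputs["output_0"].numpy()[0, 0]

cap = cv2.VideoCapture(0)  # default webcam
ok, frame_bgr = cap.read()
if ok:
    keypoints = estimate_pose(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    print(keypoints)
cap.release()
```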
This information can then be processed to estimate the angles of the joints, so that the robot can mimic the movement, or simply follow the user’s hand with its own end-effector. Depth is still missing from a single webcam view, but I believe newer models can obtain sufficiently accurate depth estimates from monocular images, as shown recently at conferences like CVPR, enabling full 3D motion reconstruction.
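As an illustration of how keypoints become joint angles, here is a minimal sketch that computes the angle at a joint from three 2D keypoints (for example shoulder, elbow and wrist for the elbow angle). The function name and the assumption of image-plane coordinates are mine, not the team’s code.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle (radians) at keypoint b, formed by the segments b->a and b->c.

    a, b, c are 2D keypoints, e.g. shoulder, elbow, wrist for the elbow angle.
    """
    v1 = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    v2 = np.asarray(c, dtype=float) - np.asarray(b, dtype=float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

# Example: an elbow bent at 90 degrees (keypoints in normalized image coordinates).
print(np.degrees(joint_angle((0.35, 0.40), (0.35, 0.55), (0.50, 0.55))))
```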
Teaching the robot the gestures that trigger the actions, simply by showing examples. Inspired by the Teachable Machine by Google.
After teaching the robot these movements, they can be executed simply with a gesture or a voice command. You raise your hand, and the robot grabs an object. You speak a command, and the robot moves that object. It is as simple as that. You can teach those commands as well, just by showing them to the robot. This part was inspired by the Teachable Machine experiment, which we re-implemented in TensorFlow and Keras.
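For a flavour of what a Teachable-Machine-style re-implementation can look like in Keras, here is a minimal sketch: a frozen MobileNetV2 turns each frame into an embedding, and a tiny classifier head is trained on the handful of examples the user shows for each gesture. The backbone choice and the helper names are my assumptions for illustration, not the exact code we wrote.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input

# Frozen feature extractor: one 1280-d embedding per 224x224 RGB frame.
backbone = MobileNetV2(include_top=False, pooling="avg", input_shape=(224, 224, 3))
backbone.trainable = False

def embed(frames):
    """frames: (N, 224, 224, 3) uint8 array -> (N, 1280) float embeddings."""
    return backbone.predict(preprocess_input(frames.astype("float32")), verbose=0)

def train_gesture_classifier(example_frames, labels, num_classes):
    """Train a small softmax head on the user-shown examples, one class per gesture."""
    head = models.Sequential([
        layers.Input(shape=(1280,)),
        layers.Dense(num_classes, activation="softmax"),
    ])
    head.compile(optimizer="adam",
                 loss="sparse_categorical_crossentropy",
                 metrics=["accuracy"])
    head.fit(embed(example_frames), np.asarray(labels), epochs=20, verbose=0)
    return head
```

The same few-shot recipe carries over to short voice commands if the image backbone is swapped for an audio feature extractor.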
This opens up a lot of possibilities for artisans, creatives and makers, people who think and work with their hands. They could simply teach a robot what to do, without writing any code, and interact with it through gestures and voice commands.
It would also help students of all ages learn more about robotics and AI, by giving them an intuitive and simple way to interact with a robot and control it. Stimulating curiosity is the key to opening their minds to the wonders of science and tech.
This is the future of software and human-computer interaction. We strongly believe in these advancements as a way to give everyone access to robotics and AI. These are game-changing technologies, and we aim for a future where robots can cooperate with and help humans and, more broadly, humanity.