#3 Research Paper Explained

Google X is famous for its cutting-edge technology and projects, including the Self-Driving Car, Project Loon (Internet balloons), Project Ara, and the list goes on. But a lot of research goes on behind the scenes, which yields some interesting research papers that give us access to and insight into these fun experiments, encouraging us to replicate the experiments ourselves and build further to push the boundaries.

(Image: Google Project Ara)

The Robots Project by Google X has published a paper, "QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation", that tries to master the simple task of picking and grasping differently shaped objects, aiming to replicate some common human activities.

(Image: look at robot no. 6 learning stuff)

And the success rate is fascinating. This experiment uses 7 robotic arms that ran for 800 hours over the course of 4 months to grasp objects placed in front of them. Each uses an RGB camera with a resolution of 472x472. The closed-loop vision-based control system is based on a general formulation of robotic manipulation as a Markov Decision Process (MDP).

(Image: same objects with different colours)

To be efficient, off-policy reinforcement learning is used, which has the ability to learn from data collected hours, days or even weeks ago. The QT-Opt algorithm is designed by combining two methods:

1. Large-scale distributed optimization (using multiple robots to train the model faster, making it a large-scale distributed system)
2. Deep Q-learning algorithm (an RL technique used to learn a policy, which tells an agent which action to take under which circumstances)

What is QT-Opt?

QT-Opt is a combination of large-scale distributed optimization and the Q-learning algorithm, resulting in a distributed Q-learning algorithm that supports continuous action spaces, making it well-suited to robotics problems.
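To make the "continuous action spaces" point a bit more concrete, here is a minimal sketch in Python. Classic Q-learning picks the best action by checking every possible action, which is impossible when the action is a continuous gripper motion. The paper describes using the cross-entropy method (CEM) to search for a high-scoring action instead. Everything else below is an assumption for illustration only: the toy `q_value` function stands in for the real image-based Q-network, and the names `cem_argmax` and `td_target`, the 2-D action, and the sample counts are hypothetical.

```python
import random
import statistics


def q_value(state, action):
    """Toy stand-in for a learned Q-network (hypothetical).

    Scores an action higher the closer it is to the target point in `state`.
    """
    return -sum((a - t) ** 2 for a, t in zip(action, state))


def cem_argmax(state, dim=2, iterations=3, samples=64, elites=6, seed=0):
    """Approximate the action that maximizes q_value(state, action) with CEM."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [1.0] * dim
    for _ in range(iterations):
        # Sample candidate actions from the current Gaussian.
        candidates = [
            [rng.gauss(m, s) for m, s in zip(mean, std)]
            for _ in range(samples)
        ]
        # Keep the highest-scoring candidates (the "elites").
        candidates.sort(key=lambda a: q_value(state, a), reverse=True)
        best = candidates[:elites]
        # Refit the Gaussian to the elites and repeat.
        mean = [statistics.fmean([a[i] for a in best]) for i in range(dim)]
        std = [statistics.pstdev([a[i] for a in best]) + 1e-6 for i in range(dim)]
    return mean


def td_target(reward, next_state, done, gamma=0.9):
    """One-step Q-learning target: r + gamma * max over a' of Q(s', a')."""
    if done:
        return reward
    best_action = cem_argmax(next_state)
    return reward + gamma * q_value(next_state, best_action)


if __name__ == "__main__":
    state = (0.5, -0.3)
    print("approximate best action:", cem_argmax(state))
    print("TD target:", td_target(1.0, state, done=False))
```

Because the target only needs a stored transition (reward, next state, done flag), the same update can be computed from data collected long ago, which is what makes the off-policy setup described above possible.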
Distributed Q-learning algorithm

To keep the robots from going crazy in their initial attempts, the model is first trained with offline data, which doesn't require real robots and improves scalability. In this case, the policy takes an image and returns the sequence of how the arm and gripper should move in 3D space.

RESULTS

The results give an unbelievable 96% grasp success rate.

The model learned many new behaviours that are sophisticated and borderline human:

1. When blocks are too close to each other and there is no space for the gripper, the policy separates a block from the rest before picking it up.
2. Swatting objects out of the gripper was not part of the dataset, but the policy automatically repositions the gripper for another attempt.

I encourage you to read the research paper for more insight.

Follow me on Medium and Twitter for more #ResearchPaperExplained notifications. If you have any query about the paper or want me to explain your favourite paper, comment below.