How can Uber deliver food and always arrive on time or a few minutes before? How do they match riders to drivers so that you can always find an Uber? All that while also managing all the drivers?! Well, we will answer exactly that in the video...

References
►Read the full article: https://www.louisbouchard.ai/uber-deepeta/
►Uber blog post: https://eng.uber.com/deepeta-how-uber-predicts-arrival-times/
►What are transformers: https://youtu.be/sMCHC7XFynM
►Linear Transformers: https://arxiv.org/pdf/2006.16236.pdf
►My Newsletter (A new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/

Video transcript

How can Uber deliver food and always arrive on time or a few minutes before? How do they match riders to drivers so that you can always find an Uber, all that while also managing all the drivers? We will answer these questions in this video with their arrival time prediction algorithm, DeepETA.

DeepETA is Uber's most advanced algorithm for estimating arrival times using deep learning. Used both for Uber and Uber Eats, DeepETA can magically organize everything in the background so that riders, drivers, and food are fluently going from point A to point B as efficiently as possible. Many different algorithms exist to estimate travel on such road networks, but I don't think any are as optimized as Uber's.

Previous arrival time prediction tools, including Uber's, were built with what we call shortest-path algorithms, which are not well suited for real-world predictions since they do not consider real-time signals. For several years, Uber used XGBoost, a well-known gradient-boosted decision tree machine learning library. XGBoost is extremely powerful and used in many applications, but was limited in Uber's case: the more it grew, the more latency it had. They wanted something faster, more accurate, and more
general, to be used for drivers, riders, and food delivery: all orthogonal challenges that are complex to solve, even for machine learning or AI.

Here comes DeepETA, a deep learning model that improves upon XGBoost for all of those. Oh, and I almost forgot: here's the sponsor of this video, myself! Please take a minute to subscribe if you like the content and leave a like. I'd also love to read your thoughts in the comments, or join the Discord community, Learn AI Together, to chat with us. Let's get back to the video.

DeepETA is really powerful and efficient because it doesn't simply take data and generate a prediction. There's a whole preprocessing system to make this data more digestible for the model. This makes it much easier for the model, as it can directly focus on optimized data with much less noise and far smaller inputs, a first step in optimizing for latency issues. This preprocessing module starts by taking map data and real-time traffic measurements to produce an initial estimated time of arrival for any new customer request.

Then the model takes in these transformed features with the spatial origin and destination, and the time of the request as a temporal feature. But it doesn't stop here. It also takes more information about real-time activities like traffic, weather, or even the nature of the request, like a delivery or a rideshare pickup. All this extra information is necessary to improve on the shortest-path algorithms we mentioned, which are highly efficient but far from intelligent or real-world proof.

And what kind of architecture does this model use? You guessed it: a transformer. Are you surprised? Because I'm definitely not. And this directly answers the first challenge, which was to make the model more accurate. I've already covered
transformers numerous times on my channel, so I won't go into how they work in this video, but I still wanted to highlight a few specific features for this one in particular.

First, you must be thinking: but transformers are huge and slow models, so how can this have lower latency than XGBoost? Well, you would be right. They tried it, and it was too slow, so they had to make some changes. The change with the biggest impact was to use a linear transformer, which scales with the dimension of the input instead of the input's length. This means that if the input is long, standard transformers will be very slow, and this is often the case for them with as much information as routing data. Instead, a linear transformer scales with the dimension, something they can control and that is much smaller.

Another great improvement in speed is the discretization of inputs, meaning that they take continuous values and make them much easier to compute by clustering similar values together. Discretization is regularly used in prediction to speed up computation, as the speed it gives clearly outweighs the error that duplicate values may bring.

Now there is one challenge left to cover, and by far the most interesting: how they made it more general. Here is the complete DeepETA model to answer this question. There is the earlier quantization of the data, which is then embedded and sent to the linear transformer we just discussed. Then we have the fully connected layer to make our predictions, and a final step to make our model general: the bias adjustment decoder. It takes in the predictions and the type features we mentioned at the beginning of the video, containing the reason the customer made a request to Uber, to adjust the prediction to a more appropriate value for the task. They periodically retrain and deploy
their model using their own platform called Michelangelo, which I'd love to cover next if you're interested. If so, please let me know in the comments.

And voilà! This is what Uber currently uses in their system to deliver and give rides to everyone as efficiently as possible.

Of course, this was only an overview, and they used more techniques to improve the architecture, which you can find in their great blog post linked below if you're curious. I also just wanted to note that this was just an overview of their arrival time prediction algorithm, and I am in no way affiliated with Uber.

I hope you enjoyed this week's video covering a model applied to the real world instead of a new research paper, and if so, please feel free to suggest any interesting applications or tools to cover next. I'd love to read your ideas. Thank you for watching, and I will see you next week with another amazing paper!
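
A note on the linear transformer mentioned in the transcript: the trick behind linear attention can be shown on a toy example. The sketch below is not Uber's code. It assumes the elu(x) + 1 feature map from the Linear Transformers paper linked in the references, and every function name and input value is made up for illustration. It computes the same kernel attention twice: once by building all pairwise query-key similarities, and once by first summarizing keys and values in a small matrix whose size depends only on the feature dimension.

```python
import math


def phi(vec):
    """Feature map elu(x) + 1, applied element-wise; keeps every entry positive."""
    return [x + 1.0 if x > 0 else math.exp(x) for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def quadratic_attention(Q, K, V):
    """Kernel attention the slow way: N similarities for each of N queries."""
    out = []
    for q in Q:
        fq = phi(q)
        weights = [dot(fq, phi(k)) for k in K]
        total = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, V)) / total
                    for c in range(len(V[0]))])
    return out


def linear_attention(Q, K, V):
    """Same result, but keys and values are summarized once in a d x d_v matrix."""
    d, d_v = len(K[0]), len(V[0])
    kv = [[0.0] * d_v for _ in range(d)]  # sum over j of phi(k_j) v_j^T
    z = [0.0] * d                         # sum over j of phi(k_j)
    for k, v in zip(K, V):
        fk = phi(k)
        for r in range(d):
            z[r] += fk[r]
            for c in range(d_v):
                kv[r][c] += fk[r] * v[c]
    out = []
    for q in Q:
        fq = phi(q)
        norm = dot(fq, z)
        out.append([sum(fq[r] * kv[r][c] for r in range(d)) / norm
                    for c in range(d_v)])
    return out


# Toy input (hypothetical values): 4 positions, feature dimension 2.
Q = [[0.1, -0.3], [0.5, 0.2], [-0.7, 0.4], [0.0, 1.0]]
K = [[0.2, 0.1], [-0.4, 0.6], [0.3, -0.2], [0.9, 0.0]]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 0.5]]

slow = quadratic_attention(Q, K, V)
fast = linear_attention(Q, K, V)
max_diff = max(abs(a - b) for rs, rf in zip(slow, fast) for a, b in zip(rs, rf))
print("largest difference between the two versions:", max_diff)
```

The first version does work proportional to the square of the sequence length, while the second does work proportional to the length times the square of the feature dimension. Note that this kernel attention replaces the usual softmax attention; it does not reproduce its outputs exactly.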
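
A note on the discretization step from the transcript: here is a minimal sketch of turning a continuous feature into bucket indices and looking up one vector per bucket. The equal-frequency (quantile) bucketing scheme, the distance values, the bucket count, and the random "embedding table" are all assumptions chosen for illustration, not details taken from the video.

```python
import bisect
import random


def quantile_edges(values, n_buckets):
    """Return n_buckets - 1 cut points so each bucket holds roughly equal counts."""
    ordered = sorted(values)
    return [ordered[len(ordered) * i // n_buckets] for i in range(1, n_buckets)]


def bucketize(value, edges):
    """Map a continuous value to a bucket index between 0 and len(edges)."""
    return bisect.bisect_right(edges, value)


# Hypothetical continuous feature: trip distances in kilometres.
distances_km = [0.4, 0.9, 1.2, 1.5, 2.0, 2.8, 3.5, 5.0, 7.5, 12.0, 18.0, 30.0]
edges = quantile_edges(distances_km, n_buckets=4)

# One small vector per bucket; random numbers stand in for trained weights.
rng = random.Random(0)
embedding_table = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]

bucket_ids = [bucketize(d, edges) for d in distances_km]
embedded = [embedding_table[b] for b in bucket_ids]

print("edges:", edges)
print("bucket ids:", bucket_ids)
```

Nearby values such as 2.0 km and 2.8 km land in the same bucket and therefore share one embedding vector, which is the "clustering similar values together" idea: a little precision is traded for a cheap table lookup.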
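
A note on the bias adjustment decoder from the transcript: the idea of adjusting a shared prediction according to the request type can be sketched very simply. This is an assumption-heavy illustration, not the actual decoder. It assumes the adjustment is a single additive offset per request type, and the type names and offset values are invented; in a trained model they would be learned.

```python
# Hypothetical per-request-type offsets in seconds; a real model would
# learn these during training rather than have them written by hand.
TYPE_BIAS_SECONDS = {
    "ride_pickup": -20.0,
    "ride_dropoff": 15.0,
    "food_delivery": 90.0,
}


def adjust_prediction(raw_eta_seconds, request_type):
    """Shift the shared model's output by a type-specific bias, clamped at zero."""
    bias = TYPE_BIAS_SECONDS.get(request_type, 0.0)  # unknown types get no shift
    return max(0.0, raw_eta_seconds + bias)


raw = 600.0  # the same raw output from the shared layers for every request type
for request_type in TYPE_BIAS_SECONDS:
    print(request_type, adjust_prediction(raw, request_type))
```

The point of the design is that one shared model handles every kind of request, and only this small final step depends on whether the request is a ride or a delivery.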