Last year we saw NeRF, NeRV, and other networks able to create 3D models and small scenes from images using artificial intelligence. Now, we are taking a small step further and generating a bit more complex models: whole cities. Yes, you've heard that right, this week's paper is about generating city-scale 3D scenes with high-quality details at any scale. It works from satellite view to ground level with a single model. How amazing is that?! We went from one object that looked okay to a whole city in a year! What's next!? I can't even imagine. The model is called CityNeRF and grows from NeRF, which I previously covered on my channel. NeRF is one of the first models using radiance fields and machine learning to construct 3D models out of images. But NeRF is not that efficient and works at a single scale. Here, CityNeRF is applied to satellite and ground-level images at the same time to produce 3D models at various scales for any viewpoint. In simple words, they bring NeRF to city scale. But how? Watch the video to learn more!

Video Transcript

Last year we first saw NeRF, then NeRV and other networks able to create 3D models and small scenes from images using artificial intelligence. Now we are taking a small step and generating a bit more complex models: whole cities. Yes, you've heard that right, this week's paper is about generating city-scale 3D scenes with high-quality details at any scale. It works from satellite view to ground level with a single model. How amazing is that? We went from one object that looked okay to a whole city in a year. What's next? I can't even imagine. But I can easily imagine what should be next for you: your next step as an AI professional or student should be to do like me and try the sponsor of today's episode, Weights & Biases. If you run a lot of experiments, such as playing with GANs or any models like this one, you should be using Weights & Biases. It made my life so much easier, you have no idea, and it takes not even five minutes to set up. Simply install and import it into your code, add a line to initialize and another to say which metric to track, and voilà, you will have all of your future experiments in a project where you can see all of the input hyperparameters, output metrics, and any insights that you and your team have, and easily compare all of them to find out what worked best. You can help out the channel and give it a try with the first link below. It's completely free for personal use, and I promise it will be set up in under five minutes; the sketch right below gives an idea of how little code it takes.
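As a rough illustration of that setup (not from the video: the project name, config values, and the decaying dummy loss are all made-up placeholders):

```python
import math
import wandb  # pip install wandb

# Hypothetical project name and config; substitute your own.
wandb.init(project="citynerf-experiments", config={"lr": 5e-4})

for step in range(100):
    # Stand-in for a real training step: a loss that decays over time.
    loss = math.exp(-step / 25)
    wandb.log({"loss": loss})  # each logged metric appears in the dashboard

wandb.finish()
```

Every run logged this way lands in the same project, which is what makes the side-by-side comparison of hyperparameters and metrics possible.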
Back to the paper. The model is called CityNeRF, and it grows from NeRF, which I previously covered on my channel. NeRF is one of the first models using radiance fields and machine learning to construct 3D models out of images. But NeRF is not that efficient and works at a single scale. Here, CityNeRF is applied to satellite and ground-level images at the same time to produce 3D models at various scales for any viewpoint. In simple words, they bring NeRF to city scale. But how? I won't be covering how NeRF works since I've already done this in a video you can see in the top right corner of your screen right now, if you haven't heard of this model yet. Instead, I'll mainly cover the differences and what CityNeRF brings to the initial NeRF approach to make it multi-scale.

Here, instead of having different pictures a few centimeters apart, they have pictures from thousands of kilometers apart, ranging from satellite views to pictures taken on the road. As you can see, NeRF alone fails to use such drastically different pictures to reconstruct the scenes. In short, using the weights of a multi-layer perceptron, a basic neural network, NeRF will process all images, knowing their viewpoints and positions in advance. NeRF finds each pixel's colors and density using a ray from the camera, so it knows the camera's orientation and can understand depth and the corresponding colors using all the rays together. Then, this process is optimized for the convergence of the neural network using a loss function that gets us closer to the ground truth while training, which is the real 3D model we are aiming to achieve (there is a rough sketch of this per-ray rendering loop after this section).

As you can see here, the problem is that the quality of the rendered scene is averaged at the most represented distances, which makes specific viewpoints look blurry, especially because we typically have access to much more satellite imagery than close views. We can try to fix this by training the algorithm on the different scales independently, but as they explain, it causes significant discrepancies between successive scales, so you will not be able to zoom in and have a fluid, nice-looking 3D scene at all times.

Instead, they train their model in a progressive manner, meaning that they train it in multiple steps, where each new step starts from the learned parameters of the previous step. These steps correspond to specific resolutions based on the camera's distance from the object of interest, here denoted by L. So each step has its own pre-processed pack of images to be trained on, further improved by the following steps. Starting from far satellite images and moving to more and more zoomed-in images, the model can add details and build on a better foundation over time. As shown here, they start by training the model on L1, their farthest view, and end up with the ground-level images, always adding to the network and fine-tuning the model from the parameters learned at the previous scale. So this single variable L controls the level of detail, and the rest of the model stays the same for each stage, as opposed to having a pyramid-like architecture with one branch per scale, as we typically see. The rest of the model is basically an improved and adapted version of NeRF for this task. You can learn more about all the details of the implementation and the differences with NeRF in their great paper, linked in the description below, and the code will be available soon for you to try if interested. In the meantime, the two sketches below illustrate the per-ray rendering loop and the progressive schedule.
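First, a toy illustration of the per-ray rendering idea recapped above: a small MLP maps a 3D point and a view direction to a color and a density, and each pixel's color is accumulated along its camera ray. This is a minimal sketch under my own assumptions (random untrained weights, arbitrary sample counts and near/far bounds), not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "radiance field": random MLP weights standing in for a trained network.
W1 = rng.normal(size=(6, 32)) * 0.5
W2 = rng.normal(size=(32, 4)) * 0.5

def field(point, view_dir):
    """Map a 3D point and a 3D view direction to an RGB color and a density."""
    h = np.tanh(np.concatenate([point, view_dir]) @ W1)
    out = h @ W2
    rgb = 1.0 / (1.0 + np.exp(-out[:3]))  # colors squashed into [0, 1]
    sigma = np.log1p(np.exp(out[3]))      # non-negative density (softplus)
    return rgb, sigma

def render_ray(origin, direction, n_samples=64, near=0.1, far=4.0):
    """Accumulate color along one camera ray (classic volume rendering)."""
    ts = np.linspace(near, far, n_samples)
    delta = ts[1] - ts[0]
    color, transmittance = np.zeros(3), 1.0
    for t in ts:
        rgb, sigma = field(origin + t * direction, direction)
        alpha = 1.0 - np.exp(-sigma * delta)   # opacity of this segment
        color += transmittance * alpha * rgb   # segment's color contribution
        transmittance *= 1.0 - alpha           # light surviving past it
    return color

# One pixel = one ray shot from the camera origin.
print(render_ray(np.zeros(3), np.array([0.0, 0.0, 1.0])))
```

During training, pixels rendered this way are compared against the real image pixels by the reconstruction loss; that is the "getting closer to the ground truth" step mentioned above.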
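Second, a minimal sketch of the progressive, coarse-to-fine schedule, where each stage L warm-starts from the parameters learned at the previous, coarser stage. Every name here (train_stage, the image packs, the dummy updates) is a hypothetical placeholder; the authors' code was not released at the time of writing.

```python
# Illustrative sketch only: every name below is a placeholder, not the
# authors' code. Packs are ordered from farthest (satellite) to closest.

def train_stage(params, image_pack, steps=3):
    """Stand-in for one training stage; pretend to take gradient steps."""
    for _ in range(steps):
        params = [p * 0.99 for p in params]  # dummy parameter update
    return params

def train_progressively(image_packs):
    params = [1.0, 1.0]  # dummy initial parameters
    for L, pack in enumerate(image_packs, start=1):
        # Warm start: stage L continues from the parameters learned at
        # stage L-1 instead of training from scratch on its own scale.
        params = train_stage(params, pack)
        print(f"stage L{L} done ({len(pack)} images), params={params}")
    return params

packs = [
    ["sat_1.png", "sat_2.png"],   # L1: farthest, satellite views
    ["aerial_1.png"],             # L2: intermediate altitude
    ["street_1.png"],             # L3: closest, ground level
]
train_progressively(packs)
```

The warm start is what gives continuity across scales: each stage refines what the previous one learned instead of training from scratch, which avoids the discrepancies that independent per-scale training causes.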
And voilà! This is how they enable NeRF to be applied to city-scale scenes, with amazing results. It has incredible industrial potential, and I hope to see more work in this field soon. Thank you for watching, and if you are not subscribed, please consider clicking on the little red button; it's free and you will learn a lot, I promise. I will be sharing a couple of special videos for the end of the year, so stay tuned!

References

►Read the full article: https://www.louisbouchard.ai/citynerf/
►Xiangli, Y., Xu, L., Pan, X., Zhao, N., Rao, A., Theobalt, C., Dai, B. and Lin, D., 2021. CityNeRF: Building NeRF at City Scale. https://arxiv.org/pdf/2112.05504.pdf
►Project link: https://city-super.github.io/citynerf/
►Code (coming soon): https://city-super.github.io/citynerf/
►My Newsletter (A new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/