How to Write a Python Script to Download Reddit Videos

by @h3avren

Grab the separate video (without audio) and audio links from a Reddit post, then use ffmpeg to merge the audio into the video so you end up with a complete Reddit video, audio included.

Ajay Singh Rana

Dreaming of Python... Under a sky in India...



Hello friends! Let’s do something new today (well it’s not new to be honest).

Where did it all start?

I was trying to download videos from Reddit and found that the available browser extensions were either paid or broken. Thankfully, there were some websites that worked just fine. I used those for quite a while, until opening a website every time I felt the need to download something started to feel tedious. So I thought: why not write a Python script that could simply be passed the link and download the file for me?

Before we Start…

Reddit stores videos in a way that makes them harder to download (but we will anyway). The video without audio lives at one URL and the audio at another, and the Reddit player loads and plays both of these simultaneously. So, we will download both files and stitch them together with ffmpeg.


The audio URL is obtained by replacing the quality suffix in the video URL with ‘audio’. For example, if the video URL were https://v.redd.it/abc123/DASH_720.mp4 (the id abc123 here is made up), the audio URL would be https://v.redd.it/abc123/DASH_audio.mp4.

Let’s act now…

Libraries and tools that we need

  • subprocess : for system calls to run ffmpeg (standard library, no install needed)
  • sys : to work with command-line arguments (standard library)
  • json : to parse JSON data (standard library)
  • bs4 (aka BeautifulSoup) : for web scraping (install with pip install beautifulsoup4)
  • lxml : the HTML parser we hand to BeautifulSoup (pip install lxml)
  • requests : for making HTTP requests (pip install requests)
  • ffmpeg : to work with media files; it is a system tool, not a Python package, and must be installed on the machine
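Since ffmpeg is a system tool rather than a Python package, it's worth checking up front that it is actually on the PATH. A minimal sketch using only the standard library (the helper name have_ffmpeg is my own):

```python
import shutil

def have_ffmpeg() -> bool:
    """Return True if the ffmpeg executable is found on the system PATH."""
    return shutil.which('ffmpeg') is not None

if not have_ffmpeg():
    print('ffmpeg not found - install it before running this script')
```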


Kicking off…

# imports

import subprocess
import json
from bs4 import BeautifulSoup
import requests
import sys

# getting a response using the URL

url = sys.argv[1]    # gets the URL passed on the command line
headers = {'User-Agent': 'Mozilla/5.0'}    # a browser-like User-Agent keeps Reddit from rejecting the request
response = requests.get(url, headers=headers)


# finding the post id for the Reddit post

post_id = url[url.find('comments/') + 9:]        # everything after 'comments/'
post_id = f"t3_{post_id[:post_id.find('/')]}"    # cut at the next slash and add Reddit's 't3_' prefix
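To see what the two slicing lines above actually produce, here is the same logic wrapped in a function and applied to a made-up post URL (the id abc123 is hypothetical); ‘t3_’ is Reddit’s fullname prefix for link posts:

```python
def extract_post_id(url: str) -> str:
    """Slice the post id out of a Reddit comments URL and prefix it with 't3_'."""
    post_id = url[url.find('comments/') + 9:]      # text after 'comments/'
    return f"t3_{post_id[:post_id.find('/')]}"     # cut at the next slash

# hypothetical URL just for illustration
print(extract_post_id('https://www.reddit.com/r/videos/comments/abc123/some_title/'))
# -> t3_abc123
```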

What does the response have..?

The response is the whole HTML file for that particular Reddit page. But since Reddit is a dynamic website, most of the HTML we see is generated with JavaScript, and so are the media files. Therefore, to find the links to the media files we’ll have to find the script tag that holds the data.

I googled it and found that a JSON version of any Reddit post can be obtained simply by appending .json to the post’s URL, and the video URLs can easily be grabbed from there. But I decided to dig into the original HTML instead and look for the script tag with the data. And I found it: a script tag with its id attribute set to ‘data’. Let’s extract that using BeautifulSoup.
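The .json shortcut mentioned above can be sketched as follows. The key path (children → data → secure_media → reddit_video → fallback_url) reflects what the public endpoint returned at the time of writing, so treat it as an assumption:

```python
def to_json_url(post_url: str) -> str:
    """Turn a Reddit post URL into its .json API counterpart."""
    return post_url.rstrip('/') + '.json'

def fallback_video_url(post_json: list) -> str:
    """Pull the (audio-less) video URL out of the parsed .json response.
    The key path is an assumption based on the endpoint's observed layout."""
    post = post_json[0]['data']['children'][0]['data']
    return post['secure_media']['reddit_video']['fallback_url']

# usage sketch (network call not shown):
# data = requests.get(to_json_url(url), headers=headers).json()
# video_url = fallback_video_url(data)
```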


# processing the response to find the data

if response.status_code == 200:    # checking if the server responded with OK
    soup = BeautifulSoup(response.text, 'lxml')
    # I looked up the original code of the Reddit page
    # to find where all the data was, and it was in a script tag
    # with the id set to 'data'
    required_js = soup.find('script', id='data')

    json_data = json.loads(required_js.text.replace('window.___r = ', '')[:-1])
    # 'window.___r = ' and the semicolon at the end of the text were removed
    # to get the data as JSON
    title = json_data['posts']['models'][post_id]['title']
    title = title.replace(' ', '_')
    dash_url = json_data['posts']['models'][post_id]['media']['dashUrl']
    height = json_data['posts']['models'][post_id]['media']['height']
    dash_url = dash_url[:dash_url.find('DASH') + 4]
    # the dash URL is the main URL we need to search for
    # height is used to pick the best quality of video available
    video_url = f'{dash_url}_{height}.mp4'    # this URL will be used to download the video
    audio_url = f'{dash_url}_audio.mp4'    # this URL will be used to download the audio part
else:
    sys.exit('Could not fetch the Reddit page...!')    # the code below needs the URLs, so bail out here
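To make the URL surgery above concrete, here is the same truncate-and-append logic as a small function, run on a hypothetical dashUrl (the id abc123 and the DASHPlaylist.mpd suffix are assumptions about what the field looks like):

```python
def build_media_urls(dash_url: str, height: int) -> tuple:
    """Truncate the dash URL right after 'DASH', then derive the video
    and audio download URLs from it."""
    dash_url = dash_url[:dash_url.find('DASH') + 4]   # keep everything up to and including 'DASH'
    return f'{dash_url}_{height}.mp4', f'{dash_url}_audio.mp4'

video_url, audio_url = build_media_urls('https://v.redd.it/abc123/DASHPlaylist.mpd', 720)
print(video_url)   # -> https://v.redd.it/abc123/DASH_720.mp4
print(audio_url)   # -> https://v.redd.it/abc123/DASH_audio.mp4
```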


# downloading the video and audio files

with open(f'{title}_video.mp4', 'wb') as file:
    print('Downloading Video...', end='', flush=True)
    response = requests.get(video_url, headers=headers)
    if response.status_code == 200:
        file.write(response.content)
        print('\rVideo Downloaded...!')
    else:
        print('\rVideo Download Failed..!')

with open(f'{title}_audio.mp3', 'wb') as file:    # the stream is really AAC in an MP4 container; the .mp3 name is just a label ffmpeg can read
    print('Downloading Audio...', end='', flush=True)
    response = requests.get(audio_url, headers=headers)
    if response.status_code == 200:
        file.write(response.content)
        print('\rAudio Downloaded...!')
    else:
        print('\rAudio Download Failed..!')


# using ffmpeg to stitch the video and audio into one

subprocess.call(['ffmpeg', '-i', f'{title}_video.mp4', '-i', f'{title}_audio.mp3',
                 '-map', '0:v', '-map', '1:a', '-c', 'copy', f'{title}.mp4'])
# '-c copy' copies both streams without re-encoding
subprocess.call(['rm', f'{title}_video.mp4', f'{title}_audio.mp3'])    # 'rm' is Unix-only; os.remove() would be portable
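If you prefer to keep the long ffmpeg invocation readable, the argument list can be built by a small helper first (the function name stitch_args is my own); with ‘-c copy’ both streams are copied as-is, without re-encoding:

```python
import subprocess

def stitch_args(video: str, audio: str, output: str) -> list:
    """Build the ffmpeg argument list that muxes a video file and an audio
    file into one output, copying both streams without re-encoding."""
    return ['ffmpeg', '-i', video, '-i', audio,
            '-map', '0:v', '-map', '1:a', '-c', 'copy', output]

# usage sketch:
# subprocess.call(stitch_args(f'{title}_video.mp4', f'{title}_audio.mp3', f'{title}.mp4'))
```

On Windows, swapping the rm call for os.remove() keeps the cleanup step portable as well.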


Finally!

We have our video downloaded successfully, and the intermediate files are cleaned up too. I didn’t explain everything in detail, but I hope this article gets you interested in learning web scraping. I’d also recommend learning the ffmpeg tool. Wish you a happy coding journey! 🙂
