How to Fetch Statistics From YouTube API Using Python by @physboy


In this article, we’ll learn how to use the YouTube API to get the full list of playlists of any channel (not only your own) and all videos from a playlist. Note that the API does not provide exact statistics: even when you refresh a channel page in the browser, the displayed values can change noticeably. The channel ID is the part of the channel URL after “channel/”, for example: https://www.youtube.com/channel/UCYvmuw-JtVrTZQ-7Y4kd63Q

Nikita Vasilev

Data engineer, python tutor



In this article, you’ll learn how to:

  • get an API key for the YouTube API
  • get the full list of playlists of any channel (not only your own)
  • get all videos from a playlist


To get an API key:

  1. Make sure you’re logged into Google.
  2. Navigate to https://cloud.google.com/
  3. Choose “Console” in the profile header.
  4. To create a new project, click “New Project”. Otherwise, skip to step 7.
  5. Enter your project name and click the “Create” button.
  6. Choose your new project.
  7. Click “APIs & Services”.
  8. Click “ENABLE APIS AND SERVICES”.
  9. Click “YouTube Data API”.
  10. Click “Enable”.
  11. Click “Credentials” in the left column, then click “CREATE CREDENTIALS” in the top row.
  12. Click “API key” to reveal your API key.
  13. Your API key will appear (please do not share your key).


Now that we have the “API key”, we need to get the “channel_id”. To get the channel_id, open any video from the channel, then click the channel name, which is to the right of the avatar, just above the number of subscribers.


Highlight and copy everything in the URL after “channel/” as follows:  https://www.youtube.com/channel/UCYvmuw-JtVrTZQ-7Y4kd63Q
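If you would rather not copy the ID by hand, the same step can be sketched in code. This is a minimal sketch that assumes the URL has the “/channel/<id>” form shown above; the helper name extract_channel_id is hypothetical, and other URL forms (for example custom handles) are not covered.

```python
from urllib.parse import urlparse


def extract_channel_id(url: str) -> str:
    """Return the path segment that follows 'channel' in a YouTube channel URL."""
    # Split the URL path into its non-empty segments, e.g. ['channel', 'UC...']
    segments = [s for s in urlparse(url).path.split("/") if s]
    if "channel" not in segments:
        raise ValueError(f"no 'channel/' segment in URL: {url}")
    idx = segments.index("channel")
    if idx + 1 >= len(segments):
        raise ValueError(f"nothing follows 'channel/' in URL: {url}")
    return segments[idx + 1]


channel_id = extract_channel_id(
    "https://www.youtube.com/channel/UCYvmuw-JtVrTZQ-7Y4kd63Q"
)
print(channel_id)
```

Parsing the path instead of slicing the string means that a trailing “/videos” or a query string does not end up in the ID.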


Let’s import libraries:


import pandas as pd
import requests
import json


Now, let’s get the channel statistics.


api_key = 'YOUR_API_KEY'
channel_id = 'UCYvmuw-JtVrTZQ-7Y4kd63Q'


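def get_channel_stat() -> dict:
    """Return the statistics block of the channel identified by channel_id."""
    url = f'https://www.googleapis.com/youtube/v3/channels?part=statistics&id={channel_id}&key={api_key}'

    r = requests.get(url)
    r.raise_for_status()  # fail early on HTTP errors such as an invalid key
    data = r.json()
    return data['items'][0]['statistics']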


stats = get_channel_stat()
pd.DataFrame([stats])
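The counters in the statistics block arrive as strings, and a response may contain an error object or no items at all. Below is a minimal sketch of a more defensive parser. The function name parse_channel_stats and the sample payload (with made-up numbers) are hypothetical; only the 'items', 'statistics' and 'error' keys come from the API response format.

```python
def parse_channel_stats(payload: dict) -> dict:
    """Extract the first channel's statistics and cast numeric strings to int."""
    # An error response carries an 'error' object instead of 'items'
    if "error" in payload:
        raise RuntimeError(f"API error: {payload['error'].get('message')}")
    items = payload.get("items") or []
    if not items:
        raise LookupError("no channel found for the given id")
    stats = items[0]["statistics"]
    # Counters come back as strings such as "12345"; booleans are left untouched
    return {
        key: int(value) if isinstance(value, str) and value.isdigit() else value
        for key, value in stats.items()
    }


# Hypothetical payload with made-up numbers, shaped like a channels response
sample_payload = {
    "items": [
        {
            "statistics": {
                "viewCount": "12345",
                "subscriberCount": "678",
                "hiddenSubscriberCount": False,
                "videoCount": "90",
            }
        }
    ]
}

print(parse_channel_stats(sample_payload))
```

With integer values, the resulting DataFrame columns can be summed or compared without an extra conversion step.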



Although we now have the channel statistics, it is important to note that these values are approximations: the YouTube API does not return exact figures. For example, the API documentation notes that the subscriber count is rounded down to three significant figures.


Even when you refresh a channel page on YouTube in the browser, the values can change noticeably: the number of subscribers, the number of views, and even the number of videos may differ after a refresh.


To get detailed statistics (e.g., the number of views for each video), we first need to decide how to get the full list of video IDs (“video_id”). The best way I have found so far to get a stable set of videos on each request is to go through “playlists”.


Collecting a list of videos is also possible via a GET request to https://www.googleapis.com/youtube/v3/videos (to read more, see https://developers.google.com/youtube/v3/docs/videos/list).


However, this didn't work for me for several reasons, one of which is that the query returned a different number of IDs each time.


Let's take a look at how to get a list of playlists first.


(To read more click the following: https://developers.google.com/youtube/v3/docs/playlists/list)

In these requests, we will use pagination. Every response that contains a “nextPageToken” value in its JSON has more content on the next page, so we keep requesting pages until “nextPageToken” is no longer returned.


The part=snippet,id parameter specifies which parts of the JSON will be returned.
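One way to assemble such a query string without hand-written “&” separators is sketched below, using only the standard library. The helper name build_playlists_url and the placeholder key are hypothetical; the parameter names (key, channelId, part, maxResults, pageToken) are the ones used in this article.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode

BASE_URL = "https://www.googleapis.com/youtube/v3/playlists"


def build_playlists_url(
    api_key: str,
    channel_id: str,
    part: Sequence[str] = ("snippet", "id"),
    max_results: Optional[int] = None,
    page_token: Optional[str] = None,
) -> str:
    """Build the playlists URL; optional parameters are added only when set."""
    params = {"key": api_key, "channelId": channel_id, "part": ",".join(part)}
    if max_results is not None:
        params["maxResults"] = max_results
    if page_token is not None:
        params["pageToken"] = page_token
    # safe="," keeps the comma in "snippet,id" readable instead of "%2C"
    return f"{BASE_URL}?{urlencode(params, safe=',')}"


example_url = build_playlists_url(
    "YOUR_API_KEY", "UCYvmuw-JtVrTZQ-7Y4kd63Q", max_results=50
)
print(example_url)
```

Letting urlencode join the parameters avoids typos such as a doubled “&” and escapes any characters that are not URL-safe.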


def get_pl_data_per_page(npt: str=None, limit: int=None) -> dict:
    # Request one page of the channel's playlists
    url = f"https://www.googleapis.com/youtube/v3/playlists?key={api_key}&channelId={channel_id}&part=snippet,id&order=date"
    if limit is not None:
        url = f"{url}&maxResults={limit}"
    if npt is not None:
        url = f"{url}&pageToken={npt}"

    r = requests.get(url)
    return r.json()


def get_playlists_data(limit: int=50):
    res = []
    nextPageToken = None
    while True:
        # Pass the page size along with the token of the page to fetch
        data = get_pl_data_per_page(nextPageToken, limit)
        nextPageToken = data.get("nextPageToken", None)
        res.append(data)

        # No token in the response means this was the last page
        if nextPageToken is None: break
    return res


data = get_playlists_data()
df = pd.DataFrame(data)
p_ids = set(df.explode('items')["items"].apply(pd.Series)["id"].to_list())
p_ids
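The “request until nextPageToken disappears” pattern can also be written once and reused for other endpoints. The sketch below is one possible shape, not the only one: iter_pages, fake_fetch and FAKE_PAGES are hypothetical names, and the fake pages stand in for real API responses so that the example runs offline.

```python
from typing import Callable, Iterator, Optional


def iter_pages(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield pages until a response no longer carries 'nextPageToken'."""
    token = None
    while True:
        page = fetch_page(token)
        yield page
        token = page.get("nextPageToken")
        if token is None:
            break


# Offline stand-in for the API: two pages keyed by the token that requests them
FAKE_PAGES = {
    None: {"items": [{"id": "PL_a"}, {"id": "PL_b"}], "nextPageToken": "t1"},
    "t1": {"items": [{"id": "PL_c"}]},
}


def fake_fetch(token: Optional[str]) -> dict:
    return FAKE_PAGES[token]


playlist_ids = [
    item["id"]
    for page in iter_pages(fake_fetch)
    for item in page.get("items", [])
]
print(playlist_ids)
```

With the function from this article, the fetcher could be passed as iter_pages(lambda token: get_pl_data_per_page(token, 50)).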



Now that we have the list of playlists, let's get all the videos from a playlist.


limit = 50
plid = 'PLz68cPKdtIOzuoKHzMcQoO7WaQHYm5J3Y'
# Adjacent f-strings are joined without the stray spaces a backslash continuation would add
url = (f'https://www.googleapis.com/youtube/v3/playlistItems?part=snippet&maxResults={limit}'
       f'&playlistId={plid}&key={api_key}')
json_url = requests.get(url)
data = json.loads(json_url.text)


To build the DataFrame, walk through the JSON as follows:


to_append = []
for item in data['items']:
    playlistId = item['snippet']['playlistId']
    videoOwnerChannelTitle = item['snippet']['videoOwnerChannelTitle']
    position = item['snippet']['position']
    videoId = item['snippet']['resourceId']['videoId']
    to_append.append([playlistId, videoOwnerChannelTitle, position, videoId])

cols = ['playlist_id', 'videoOwnerChannelTitle', 'position', 'videoId']
pd.DataFrame(to_append, columns=cols)
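The request above returns at most one page of items, so a longer playlist needs the same pagination as before. The sketch below flattens any number of pages into rows. It assumes that some items (for example unavailable videos) may lack “videoOwnerChannelTitle”, which is why that field is read with .get(); the function name playlist_items_to_rows and the sample pages are hypothetical.

```python
from typing import Iterable, List


def playlist_items_to_rows(pages: Iterable[dict]) -> List[dict]:
    """Flatten playlistItems pages into one row per video."""
    rows = []
    for page in pages:
        for item in page.get("items", []):
            snippet = item["snippet"]
            rows.append(
                {
                    "playlist_id": snippet["playlistId"],
                    # Assumption: this key can be absent, so fall back to None
                    "videoOwnerChannelTitle": snippet.get("videoOwnerChannelTitle"),
                    "position": snippet["position"],
                    "videoId": snippet["resourceId"]["videoId"],
                }
            )
    return rows


# Hypothetical pages shaped like playlistItems responses (IDs are made up)
sample_pages = [
    {
        "items": [
            {
                "snippet": {
                    "playlistId": "PL_demo",
                    "videoOwnerChannelTitle": "Demo channel",
                    "position": 0,
                    "resourceId": {"videoId": "vid_0"},
                }
            }
        ],
        "nextPageToken": "t1",
    },
    {
        "items": [
            {
                "snippet": {
                    "playlistId": "PL_demo",
                    "position": 1,
                    "resourceId": {"videoId": "vid_1"},
                }
            }
        ]
    },
]

rows = playlist_items_to_rows(sample_pages)
print(rows)
```

Because each row is a dict, pd.DataFrame(rows) produces the same columns as the table above without a separate cols list.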



In the next article, we'll look at how to get statistics for each video in a channel.
