OpenAI GPT: How to Create a YouTube Summary

by Taras Drapalyuk, April 7th, 2023

Recently, while watching an interview on YouTube, I came up with the idea of speeding up the process by reading the transcript instead. Taking that idea a step further, I started looking for a way to get a summary of the video. After some research, I built a simple app that creates a YouTube video summary using an OpenAI GPT model. In this article, I'll show you how to build an app like this yourself.

Requirements

Before diving into the code, make sure that you have the following:

  1. Python 3 installed on your machine

  2. OpenAI API key (you can get a free trial at https://platform.openai.com/)

  3. The following Python packages installed:

    1. youtube-transcript-api
    2. openai
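
Both packages can be installed with pip, for example:

pip install youtube-transcript-api openai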

Let’s Write Some Code

1. Extracting the video ID

First, let's extract the video ID from the provided YouTube link:

import re


def extract_youtube_video_id(url: str) -> str | None:
    """
    Extract the video ID from the URL
    https://www.youtube.com/watch?v=XXX -> XXX
    https://youtu.be/XXX -> XXX
    """
    found = re.search(r"(?:youtu\.be\/|watch\?v=)([\w-]+)", url)
    if found:
        return found.group(1)
    return None
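
Here is a quick sanity check of the function against both supported URL formats (the video ID below is just an example):

# Both URL formats should yield the same video ID
assert extract_youtube_video_id("https://www.youtube.com/watch?v=D1R-jKKp3NA") == "D1R-jKKp3NA"
assert extract_youtube_video_id("https://youtu.be/D1R-jKKp3NA") == "D1R-jKKp3NA"

# Non-YouTube links return None
assert extract_youtube_video_id("https://example.com/some-page") is None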

2. Getting a video transcript

Next, we'll write a function to fetch a video's transcript using the youtube-transcript-api library:

from youtube_transcript_api import YouTubeTranscriptApi, TranscriptsDisabled


def get_video_transcript(video_id: str) -> str | None:
    """
    Fetch the transcript of the provided YouTube video
    """
    try:
        transcript = YouTubeTranscriptApi.get_transcript(video_id)
    except TranscriptsDisabled:
        # The video doesn't have a transcript
        return None

    text = " ".join([line["text"] for line in transcript])
    return text
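
The library raises other exceptions as well. If you also want to handle videos that have no transcript in the language you ask for, you could broaden the except clause. Here is a minimal sketch, relying on the NoTranscriptFound exception and the languages parameter exposed by youtube-transcript-api (the function name is just for illustration):

from youtube_transcript_api import (
    YouTubeTranscriptApi,
    TranscriptsDisabled,
    NoTranscriptFound,
)


def get_english_video_transcript(video_id: str) -> str | None:
    """
    Fetch the English transcript of the provided YouTube video,
    returning None if transcripts are disabled or unavailable.
    """
    try:
        transcript = YouTubeTranscriptApi.get_transcript(video_id, languages=("en",))
    except (TranscriptsDisabled, NoTranscriptFound):
        return None

    return " ".join(line["text"] for line in transcript)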

3. Getting a summary with GPT-3.5

Now that we have the video transcript, we can use OpenAI's GPT-3.5 model to generate a summary:

import os

import openai


def generate_summary(text: str) -> str:
    """
    Generate a summary of the provided text using OpenAI API
    """
    # Initialize the OpenAI API client
    openai.api_key = os.environ["OPENAI_API_KEY"]

    # Use GPT to generate a summary
    instructions = "Please summarize the provided text"
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": instructions},
            {"role": "user", "content": text}
        ],
        temperature=0.2,
        n=1,
        max_tokens=200,
        presence_penalty=0,
        frequency_penalty=0.1,
    )

    # Return the generated summary
    return response.choices[0].message.content.strip()


Let me go through each parameter:

  • model: The AI model to be used is gpt-3.5-turbo instead of text-davinci-003 because it performs similarly but is 10 times cheaper. When GPT-4 is publicly released, you can easily replace the model with a better one.
  • messages: This is an array, where each item (message) has a role (either system, user, or assistant) and content (the text message itself). The first message with the role system should contain instructions for the AI to guide it.
  • temperature: A number between 0 and 2 that controls sampling randomness. Higher values (0.8 and above) make the output more random, while lower values (around 0.2) make it more focused and predictable.
  • n: Number of chat choices to generate for each input. In this case, we need only one.
  • max_tokens: The maximum number of tokens to generate in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length, so very long transcripts may need to be shortened first (see the sketch below). Let's use 200 tokens for a more concise summary.
  • presence_penalty: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
  • frequency_penalty: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.


To lightly reduce the number of repetitive samples, use penalty coefficients between 0.1 and 1. If you want to strongly reduce repetition, you can increase the coefficients up to 2, but be aware that this can worsen the samples. Negative values can be used to increase the chance of repetition. You can find more details in the documentation.
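
As noted above for max_tokens, a long transcript can exceed the model's context window (about 4,096 tokens for gpt-3.5-turbo at the time of writing). Here is a minimal sketch of a guard for this, assuming a rough average of four characters per token; the helper name and character limit are my own choices, not part of the OpenAI API:

def truncate_text(text: str, max_chars: int = 12_000) -> str:
    """
    Roughly cap the text length so the prompt fits within the model's
    context window (assumes ~4 characters per token on average).
    """
    return text[:max_chars]

A more robust approach would be to split the transcript into chunks, summarize each chunk, and then summarize the combined result, but simple truncation is enough for a first version.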


Putting It All Together

Finally, let's create a function that ties all three steps together and run it:

def summarize_youtube_video(video_url: str) -> str:
    """
    Summarize the provided YouTube video
    """
    # Extract the video ID from the URL
    video_id = extract_youtube_video_id(video_url)

    # If the URL is not a valid YouTube link, return an error message
    if not video_id:
        return f"Could not extract a video ID from this URL: {video_url}"

    # Fetch the video transcript
    transcript = get_video_transcript(video_id)

    # If no transcript is found, return an error message
    if not transcript:
        return f"No English transcript found " \
               f"for this video: {video_url}"

    # Generate the summary
    summary = generate_summary(transcript)

    # Return the summary
    return summary


if __name__ == '__main__':
    url = "https://www.youtube.com/watch?v=D1R-jKKp3NA"
    print(summarize_youtube_video(url))

Here is an example: a summary of the Steve Jobs Stanford Commencement Speech 2005 video, generated by GPT-3.5:

Steve Jobs gave a commencement speech at a university where he shared three stories from his life. The first story was about dropping out of college and how it led him to take a calligraphy class, which later influenced the design of the Macintosh computer. The second story was about getting fired from Apple, which allowed him to start over and create successful companies like Pixar. The third story was about his experience with cancer and how it taught him to live each day as if it were his last. He encouraged the graduates to find what they love, not settle, and to have the courage to follow their hearts and intuition. He ended his speech with the message "stay hungry, stay foolish."


Isn't it awesome? 😎

Conclusion

Using OpenAI GPT can be a great way to generate summaries of YouTube videos. However, this is just one example: the same approach works for any source of information that can be represented as text. It saves time and helps you quickly get the key points. Imagine how helpful it could be with articles or conference calls!

  1. All mentioned code in one place
  2. OpenAI Documentation
  3. OpenAI Cookbook