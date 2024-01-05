I write a newsletter called Above Average, where I talk about the second-order insights behind everything that is happening in big tech. If you are in tech and don’t want to be average, subscribe to it.

A lot of people want podcasts transcribed so they can read them instead of listening to them. We can go one level further and also extract insights from a podcast using the OpenAI API. Here’s my tweet exchange, which provoked this experiment.

So here is what we are going to do, and it is slightly different from what I suggested in the tweet. The goal is to pick a YouTube video, get a transcription of it, and then use prompt engineering to extract insights, ideas, book quotes, summaries, etc.

To summarize, we achieve our goal in three steps:
Step 1: Select a YouTube podcast video
Step 2: Transcribe the video
Step 3: Get insights from the transcription

Step 1: Select a YouTube podcast video
A recent podcast conversation that broke YouTube was Jeff Bezos on Lex Fridman's podcast, so for this exercise I will pick this video.

Step 2: Transcribe the video
I used LangChain along with OpenAI’s audio-to-text model, Whisper, to transcribe the YouTube video. As usual, you will need your OpenAI secret key to use the following script.
```python
import os

import openai
from dotenv import load_dotenv, find_dotenv

# YouTube audio loader + Whisper transcription via LangChain
from langchain_community.document_loaders.blob_loaders.youtube_audio import YoutubeAudioLoader
from langchain_community.document_loaders.generic import GenericLoader
from langchain_community.document_loaders.parsers import OpenAIWhisperParser  # , OpenAIWhisperParserLocal

_ = load_dotenv(find_dotenv())  # read local .env file
openai.api_key = os.environ["OPENAI_API_KEY"]

url = "https://www.youtube.com/watch?v=DcWqzZ3I2cY&ab_channel=LexFridman"
save_dir = "outputs/youtube/"

# Download the video's audio into save_dir, then transcribe it with Whisper
loader = GenericLoader(
    YoutubeAudioLoader([url], save_dir),
    OpenAIWhisperParser(),
)
docs = loader.load()
print(docs[0].page_content[0:500])  # preview the first 500 characters

# Save the transcript; "w" overwrites output from an earlier run instead of appending to it
file_path = "audio-transcript.txt"
try:
    with open(file_path, "w", encoding="utf-8") as file:
        for doc in docs:  # one document per transcribed audio chunk
            file.write(doc.page_content + "\n")
    print(f"Transcript saved to {file_path}")
except OSError as e:
    print(f"Could not write {file_path}: {e}")
```

You might see the following error while running this script, and I pasted the solution that worked for me on Windows:

```
ERROR: Postprocessing: ffprobe and ffmpeg not found. Please install or provide the path using --ffmpeg-location
```

The error comes from yt-dlp, which YoutubeAudioLoader uses to download the audio: it needs FFmpeg (both the ffmpeg and ffprobe binaries) installed and on your PATH.

Running this script generates the transcript and stores it in the text file audio-transcript.txt.

Step 3: Extract insights from the conversation
To extract insights, I am using the OpenAI API, and here is the script. The code loads the transcript text and sends it along with a prompt designed to extract insights, people, and books. To get more interesting things out of this conversation, you can come up with a richer prompt.
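Whatever prompt you use, the whole transcript goes into a single request, and the transcript of a multi-hour conversation can easily exceed the model's context window or your account's tokens-per-minute (TPM) limit. Before running the extraction script, it helps to estimate the transcript's size and, if needed, split it into pieces. Below is a minimal sketch that assumes the common rule of thumb of roughly four characters per token for English text (the `tiktoken` package gives exact counts); `estimate_tokens` and `split_transcript` are hypothetical helper names, not part of any library.

```python
import math

# Assumption: ~4 characters per token for English text. This is only an estimate;
# use the tiktoken package if you need exact token counts.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Roughly estimate how many tokens `text` will use."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def split_transcript(text: str, max_tokens: int) -> list[str]:
    """Split `text` on whitespace into chunks of at most ~max_tokens tokens each.

    Whitespace (including line breaks) is collapsed to single spaces.
    A single word longer than the budget still ends up alone in its own chunk.
    """
    max_chars = max_tokens * CHARS_PER_TOKEN
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0  # length of " ".join(current)
    for word in text.split():
        extra = len(word) + (1 if current else 0)  # +1 for the joining space
        if current and current_len + extra > max_chars:
            # Adding this word would exceed the budget: close the current chunk
            chunks.append(" ".join(current))
            current, current_len = [word], len(word)
        else:
            current.append(word)
            current_len += extra
    if current:
        chunks.append(" ".join(current))
    return chunks
```

For example, `split_transcript(transcript_text, 6000)` would return pieces that can each be sent in a separate request. The 6,000-token budget is an arbitrary example; choose one that leaves room for the prompt and the reply within your model's context window and TPM limit.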
Note that the file name differs from the one in Step 2: my completion request to the OpenAI API exceeded my tokens-per-minute (TPM) limit, so I cut the transcript down to a shorter length and saved it as a copy.

```python
import os
import sys

import openai
from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv())  # read local .env file
client = openai.OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Shortened copy of the transcript, cut down to stay under the TPM limit
file_path = "audio-transcript-copy.txt"
try:
    with open(file_path, "r", encoding="utf-8") as file:
        long_text = file.read()
    print(f"{file_path} is read")
except FileNotFoundError:
    sys.exit(f"Error: Input file '{file_path}' not found.")
except Exception as e:
    sys.exit(f"An error occurred: {e}")

prompt2 = f"""
You will be provided with text delimited by triple quotes.
The given text is a podcast transcript.

Provide the host and guest names.
Summarize the transcript into 10 points.

If any people are referred to in the transcript, extract them and list each one along with some information about them in the following format:
1. Person 1's Name: Person 1's profession, what he or she is known for, or the context in which he or she was referred to.
2. Person 2's Name: Person 2's profession, what he or she is known for, or the context in which he or she was referred to.
...
N. Person N's Name: Person N's profession, what he or she is known for, or the context in which he or she was referred to.
If the transcript doesn't contain references to any people, then simply write \"No people referred to in the conversation.\"

Extract the books mentioned and list them in the following format:
1. Book 1's Title: Context in which the book was referred to.
2. Book 2's Title: Context in which the book was referred to.
...
N. Book N's Title: Context in which the book was referred to.
If the transcript doesn't contain references to any books, then simply write \"No books referred to in the conversation.\"

If you find any inspirational quotes, compile them into a list.

\"\"\"{long_text}\"\"\"
"""

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt2}],
    temperature=0.7,
    # max_tokens=64,
    # top_p=1,
)

print(response.choices[0].message.content)
```

Here is the output I got:

Result: Lex Fridman & Jeff Bezos Podcast Summary

One could provide a service to podcasters that generates smart transcripts with insights. This would be a B2B play.

AI PRODUCT IDEA ALERT 1: Instead of a service for podcast creators, it could target consumers (B2C) who listen to podcasts, would rather read through them, and want to build their own library of insights.

AI PRODUCT IDEA ALERT 2: Expect existing podcast hosting companies like Spotify to adopt both of these ideas and launch them as new features. If I were a product manager at any such company, I would be pitching them by now.

That's it for day 5 of 100 Days of AI.
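If you would rather not cut the transcript down to fit under the TPM limit, one common alternative is a map-reduce pass: ask the model for notes on each chunk of the transcript separately, then ask it to merge those notes into the final summary. Below is a minimal sketch, assuming the `openai` v1 client used in Step 3 and a transcript already split into a list of strings; `ask` and `map_reduce_insights` are hypothetical helper names, and the prompts are only examples.

```python
import time
from typing import Sequence


def ask(client, prompt: str, model: str = "gpt-4") -> str:
    """Send a single user message and return the model's reply text (hypothetical helper)."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    return response.choices[0].message.content


def map_reduce_insights(client, chunks: Sequence[str], model: str = "gpt-4",
                        pause_s: float = 0.0) -> str:
    """Map: take notes on each chunk. Reduce: merge all notes into one summary.

    The prompts below are illustrative examples, not a tested recipe.
    """
    notes = []
    for i, chunk in enumerate(chunks, start=1):
        map_prompt = (
            f"The text delimited by triple quotes is part {i} of {len(chunks)} "
            "of a podcast transcript.\n"
            "List the speakers' names if stated, its key points, the people and books "
            "it mentions, and any inspirational quotes.\n"
            f'"""{chunk}"""'
        )
        notes.append(ask(client, map_prompt, model))
        if pause_s and i < len(chunks):
            time.sleep(pause_s)  # spread requests out to stay under the TPM limit

    reduce_prompt = (
        "Below are notes taken from consecutive parts of one podcast transcript.\n"
        "Provide the host and guest names, summarize the whole conversation in 10 points, "
        "and give deduplicated lists of the people, books, and quotes.\n\n"
        + "\n\n".join(notes)
    )
    return ask(client, reduce_prompt, model)
```

With `client` from the Step 3 script and a list of chunks, `print(map_reduce_insights(client, chunks, pause_s=60))` would run it. The 60-second pause is a placeholder to tune against your own rate limit, and each chunk still has to fit within the model's context window.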