Using OpenAI's Whisper and GPT-3 API to Build and Deploy a Transcriber App – Part 2

Written by juanfrank77 | Published 2023/09/25
Tech Story Tags: programming | openai-whisper | gpt-3-api | streamlit | transcription-app | gpt-3.5-turbo-api | python-script | blogging-fellowship

TLDR: The article outlines the development of a transcriber app using OpenAI's Whisper and the GPT-3.5 Turbo API. This is part 2, which covers taking the working code from part 1 and deploying it to the cloud.

Hello and welcome back.

This is part 2 of the tutorial for building a transcriber app using OpenAI’s Whisper and the GPT-3 API.

In part 1, we started the journey by looking up what elements are necessary for the transcriber app and how to get them.

We chose podcast audio for the transcription and covered the two ways to develop the app: locally or in a Colab environment.

We saw how to use OpenAI Whisper and the GPT-3 API to transcribe a podcast episode, and how to summarize an episode with working Python code that we can run locally or upload to a cloud service.

In this second part, we’ll pick up from where we left off and go through the remaining steps on the road ahead.

Alright, let’s go.

Act 3: The Deployment (Whisper transcriber)

In this final act, we want to take the Python function that creates the transcription and host it online.

This is so that others can interact with the transcriber app and see the results.

You could keep your working prototype running locally, but then you wouldn't be able to show it to others, and no one would know that it works or how it works.

That’s why we’ll see how to upload it to the cloud so that we can use it from anywhere, even if our main computer goes offline.

The standard process here is to take the function and upload it to the cloud so that it can accept network requests (like from the front end of an app).

For this, there are several options.

You can use any of the traditional cloud providers like AWS or GCP. Alternatively, there are services that make cloud deployments easier, like Modal or Begin.

Here’s where it can get tricky because of the many alternatives.

And for the next step, which is building and deploying a frontend, there are even more alternatives.

But fear not, tech adventurer, we’ll bring this journey to a conclusion with the help of a service called Streamlit.

Streamlit is an open-source framework that lets you build web apps from Python scripts, and it's mostly used for apps that involve data visualization and manipulation.

If you’re developing locally, you can use it to build graphics and charts from data that you have on your PC.

It’s as simple to use as running the following commands in a terminal.

pip install streamlit
streamlit hello

That’s it.
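If you're curious what one of those scripts looks like, here's a minimal sketch (the file name is my own choice, and it assumes pandas and numpy are installed) that renders an interactive chart from random data:

# minimal_chart.py - a tiny Streamlit demo
import numpy as np
import pandas as pd
import streamlit as st

st.title("Hello Streamlit")
# 20 rows of random numbers, just to have something to plot
data = pd.DataFrame(np.random.randn(20, 2), columns=["a", "b"])
st.line_chart(data)

Run it with "streamlit run minimal_chart.py" and the app opens in your browser.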

But that’s not the reason I’m telling you about it.

What’s interesting about Streamlit for our use case is that it has a feature called “Streamlit Cloud” which is used for… you guessed it… uploading scripts to the cloud.

One of the cool things about it is that you can use it to turn your Python code into a fully functioning app without having to worry about:

  • Encapsulating the code in a Docker container.
  • Uploading it to a traditional cloud provider.
  • Provisioning a virtual machine.
  • Updating package dependencies.
  • Installing the necessary software on the VM.
  • Creating an instance for the backend and for the frontend.
  • Managing the cloud infrastructure.
  • Going through logs to figure out what went wrong.

And many more…


We’re not trying to get fancy here.

We just want to get this transcriber function running online so others can interact with it and see the results.

Instead of turning it into a case of “it works on my machine”.

And for that reason, we are simplifying the process as much as possible.

Here’s how to get that Python script uploaded to the Streamlit Cloud.

There are 2 ways you can go about that.

  1. Create a repository on GitHub with the code and its dependencies, then create a Streamlit account and deploy an app from that repo.

  2. Create the Streamlit account by connecting it with GitHub, then add the code using GitHub Codespaces (an online dev environment).

Choose the one you think is more convenient. And if you need help creating the account, you can check out the docs.

Now, here are the files we need to have in the repository for the app to work.
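For reference, the repository ends up with just three files (the script name is my own choice; the two text file names are the ones Streamlit Cloud looks for):

audio-transcriber.py    # the Streamlit app script
requirements.txt        # Python dependencies
packages.txt            # system packages (ffmpeg)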

First, create a simple text file called “requirements.txt” which will have these 2 entries.

streamlit
openai-whisper

Next, another text file called “packages.txt”. This one is for ‘ffmpeg’, a system dependency that Whisper needs.

ffmpeg

And lastly, the actual code.


You can call this file whatever you want; I called it “audio-transcriber.py”.

import os
import tempfile

import streamlit as st
import whisper

st.title("Podcast Transcriber")

st.sidebar.header("Transcription section")

audio_file = st.file_uploader("Upload audio", type=["wav", "mp3", "m4a"])

process_button = st.sidebar.button("Transcribe Episode")
st.sidebar.markdown("**Note**: Processing can take up to 5 mins, please be patient.")

model = whisper.load_model("base")
st.text("Whisper Model Loaded")

if process_button:
    if audio_file is not None:
        st.sidebar.info("Transcribing audio...")
        # The upload only exists in memory, so write it to a temporary
        # file on disk that Whisper (via ffmpeg) can actually read.
        suffix = os.path.splitext(audio_file.name)[1]
        with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
            tmp.write(audio_file.getbuffer())
        result = model.transcribe(tmp.name)
        os.remove(tmp.name)

        st.sidebar.success("Transcription Complete")
        st.markdown(result['text'])
    else:
        st.sidebar.error("Please upload an episode.")

With that code, we now have a simple web app up and running.

Let’s look at the code more closely.

After the imports, we use the Streamlit syntax to create a title and header for the web app.

The next line is an important one:

audio_file = st.file_uploader("Upload audio", type=["wav", "mp3", "m4a"])

With this line, we create a file uploader component that is going to receive the audio files for transcription.

And if you’re thinking, “an entire file uploader with only one line of code?”

Yeah, building that with HTML, CSS, and JavaScript is a whole tutorial in itself.

That’s the power of Streamlit.
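One detail worth knowing for later: what the uploader returns is a Streamlit UploadedFile object (or None until the user picks a file), and it lives in memory rather than on disk. A quick sketch of what it exposes:

if audio_file is not None:
    st.write(audio_file.name)  # original filename, e.g. "episode.mp3"
    st.write(audio_file.size)  # size in bytes
    audio_bytes = audio_file.getvalue()  # the raw bytes of the upload

That in-memory detail matters in a moment, when we hand the audio over to Whisper.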

Now, let’s keep going.

The following 2 lines of code create a button that will trigger the transcription process, followed by a note asking the user to be patient.

After that is a familiar piece of code.

model = whisper.load_model("base")

And then another message letting the user know the model is now loaded and ready to use.
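One thing to keep in mind: Streamlit reruns the whole script on every interaction, so whisper.load_model runs again on every button click. If that ever becomes a bottleneck, an optional improvement (not required for this tutorial) is to cache the model:

# Optional: cache the model so script reruns reuse the loaded instance
@st.cache_resource
def get_model():
    return whisper.load_model("base")

model = get_model()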

If we keep going, we’ll find some conditional logic.

What that does is check whether “process_button” was clicked. And if it was, it then checks whether a file has been uploaded.

If the check passes, we now have the main part of the code.

st.sidebar.info("Transcribing audio...")
# Write the in-memory upload to a temporary file Whisper can read
suffix = os.path.splitext(audio_file.name)[1]
with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
    tmp.write(audio_file.getbuffer())
result = model.transcribe(tmp.name)
os.remove(tmp.name)

st.sidebar.success("Transcription Complete")
st.markdown(result['text'])

What that does is print a message that the transcription is in progress, then write the uploaded audio to a temporary file on disk, since Whisper needs a real file path rather than the in-memory upload. The transcription then takes place, and finally, we get a completion message with the results of the transcription.

The rest of the code only runs if the user clicked the transcribe button without uploading a file.

And there you have it.

Both the backend and frontend of the transcriber app have been created thanks to Streamlit.

It wasn’t that difficult, was it?


As I said before, this is the easiest and most straightforward route to get it working online in a way that we can share with friends and family.

But if you do share it, make sure to tell them not to use it with lengthy audio files. They might keep waiting forever or break the app.

Conclusion

Now that you have your own transcriber app working online, you can keep improving it if you want.

A good next step would be to swap in a Whisper replacement so that longer audio files can be transcribed without taking too long. There are a couple of options out there, like the “Faster Whisper” project or “Whisper JAX”.
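As a rough sketch of what that swap might look like with the faster-whisper package (you'd add it to requirements.txt; the class and arguments below come from that project's API, so double-check its docs):

from faster_whisper import WhisperModel

# int8 quantization trades a little accuracy for a big speedup on CPU
model = WhisperModel("base", compute_type="int8")
segments, info = model.transcribe("episode.mp3")
text = " ".join(segment.text for segment in segments)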

You can also import the ‘openai’ module and save your API key as part of the “secrets” of the application. And with that, you can make calls to the GPT-3.5 API for summarization, highlight extraction, and more.
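Here's a hedged sketch of that, assuming the key is stored in Streamlit's secrets under a name like OPENAI_API_KEY and using the openai library's ChatCompletion interface (the pre-1.0 style of the library that was current when this was written):

import openai
import streamlit as st

# Read the key from the app's "Secrets" settings in Streamlit Cloud
openai.api_key = st.secrets["OPENAI_API_KEY"]

def summarize(transcript):
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Summarize this podcast transcript."},
            {"role": "user", "content": transcript},
        ],
    )
    return response["choices"][0]["message"]["content"]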

This might be the end of our journey together but it can be the beginning of you using all these tools to create your own AI-powered apps like this podcast transcriber.

That’s it for now. Thanks for coming along for the ride.

Don’t forget to subscribe to me on HackerNoon so that you don’t miss the upcoming articles.




Written by juanfrank77 | Former web developer turned online writer. Making difficult subjects understandable through stories & gifs.
Published by HackerNoon on 2023/09/25