How to Transcribe Your Video Conferences With Automated Speech Recognition


by Griffin, June 15th, 2022

Too Long; Didn't Read

In this blog, we demonstrate how to take recordings from your Dolby.io Communications conferences and use Deepgram to transcribe them to text. We will be using the Dolby.io Communications REST APIs in tandem with Deepgram’s Pre-recorded Audio API in Python to turn a conference recording into a text transcript. This increases accessibility for those who are hearing impaired, speak English as a second language, or simply prefer to read on their own time.

In this digital age where virtual conferences are a dime a dozen, we see a large number of them recorded for posterity. There are many uses for these recordings, including sharing with people who were unable to attend live, distributing for use as training, and keeping backups for future reference. One aspect of these recordings that is taken for granted, however, is accessibility. In this blog, we will demonstrate how to take recordings from your Dolby.io Communications conferences and use Deepgram to transcribe them to text.

Having text copies of your conference recordings is a good way to offer alternative ways to digest the information. Some people read faster than they listen to spoken words. Some people might not speak the first language used in the conference and are more comfortable reading it. Others might be hearing impaired and prefer to read for the most comfort. Whatever the reason, we want to make it simple to automate the transcription process. Here, we will be using the Dolby.io Communications REST APIs in tandem with Deepgram’s Pre-recorded Audio API in Python as one example of this workflow; ideally, it could be recreated with the tools of your choosing.

Installing Libraries

Before we begin coding, we need to ensure we have all the proper libraries for calling these APIs. We can do this with a simple pip command (use the appropriate pip command for your operating system):

pip3 install deepgram-sdk dolbyio-rest-apis

This will install both the Dolby.io and Deepgram SDKs. Python’s native asyncio library, which we will use to call the async requests the two SDKs expose, ships with Python 3 and needs no separate install.

It is also a good idea to sign up for free Dolby.io and Deepgram accounts if you haven’t already, to get your API credentials.

Obtaining an API Token

In order to use the Dolby.io Communications REST APIs, we need to first generate a temporary access token. This helps prevent your permanent account credentials from being accidentally leaked, as the token expires automatically.

To learn more about this, read the documentation.

In this case, we want to fill in the consumer key and secret with the credentials from our Dolby.io Communications APIs (not Media APIs). We then call the get_api_access_token endpoint within a function so we can generate a fresh token every time we make a call. This is not the most secure way to handle authentication, but it ensures we don’t run into expired credentials down the road. To learn more, see the security best practices guide.

from dolbyio_rest_apis.communications import authentication
import asyncio

# Input your Dolby.io Communications credentials here
CONSUMER_KEY = "YOUR_CONSUMER_KEY"
CONSUMER_SECRET = "YOUR_CONSUMER_SECRET"

# Create a function that will generate a new API access token when needed
async def gen_token():
    response = await authentication.get_api_access_token(CONSUMER_KEY, CONSUMER_SECRET)
    return response['access_token']

print(f"Access Token: {await gen_token()}")

Getting the Conference ID

Now that we can call the APIs, we first want to get the internal conference ID of the recording we want to transcribe. We can do this by simply calling the get_conferences endpoint with our token.

from dolbyio_rest_apis.communications.monitor import conferences
response = await conferences.get_conferences(await gen_token())
# Save the most recent conference. Change '-1' to whichever conference you want.
confId = response['conferences'][-1]['confId']

Note that in this code sample, we index the response with ['conferences'][-1]['confId']. This pulls only the most recent conference in the list, as indicated by the -1 index. If you are automating this to run on every newly generated conference, this likely will not be an issue. However, if you are looking to transcribe a specific conference, we suggest using the optional parameters of the get_conferences endpoint to narrow down the desired conference ID.
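If you prefer to pick out a specific conference from the list rather than taking the last entry, a small helper can search the response by alias. This is a minimal sketch: the 'confId' and 'alias' keys match the monitor API response used above, but the sample data here is purely illustrative.

```python
# Hypothetical helper: find a conference by its alias in the list
# returned by get_conferences, instead of taking the last entry.
def find_conference_id(conferences, alias):
    """Return the confId of the first conference matching alias, or None."""
    for conf in conferences:
        if conf.get('alias') == alias:
            return conf['confId']
    return None

# Example with a mocked response body (illustrative values):
sample = [
    {'confId': 'abc123', 'alias': 'weekly-standup'},
    {'confId': 'def456', 'alias': 'all-hands'},
]
print(find_conference_id(sample, 'all-hands'))  # def456
```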

Obtaining the Recording

With the conference ID in hand, we can now call an endpoint to generate a URL containing the audio file of our conference. For this code sample, we are using a Dolby Voice conference, so we will use the Get the Dolby Voice audio recording endpoint. If you know you are not using Dolby Voice, you can use this endpoint instead. We are only obtaining the audio track of the conference, rather than both audio and video, for maximum file compatibility with the transcription software. Note that the URL produced is temporary and will expire after some time.

from dolbyio_rest_apis.communications.monitor import recordings
# Save only the mp3 file and return as a URL.
# If your conference does not use Dolby Voice, use 'download_mp3_recording' instead.
response = await recordings.get_dolby_voice_recordings(await gen_token(), confId)
recording_url = response['url']

To help illustrate, here is an example conference recording made for transcription generated from the above code: Link

Transcribing it with Deepgram

While Deepgram does work with local files, the presigned recording URL saves us many steps, avoiding the hassle of downloading the file and re-uploading it to a secure server. With the URL, we can skip those steps and insert it directly into the code below, adapted from Deepgram's Python Getting Started Guide. The code provided only uses the Punctuation feature, but it could easily be expanded with an assortment of the many features Deepgram provides.

from deepgram import Deepgram
import asyncio
import sys

# Your Deepgram API Key
DEEPGRAM_API_KEY = 'YOUR_DEEPGRAM_API_KEY'

# Location of the file you want to transcribe. Should include filename and extension.
FILE = recording_url

async def main():
  # Initialize the Deepgram SDK
  deepgram = Deepgram(DEEPGRAM_API_KEY)

  # file is remote
  # Set the source
  source = {
    'url': FILE
  }

  # Send the audio to Deepgram and get the response
  response = await asyncio.create_task(
    deepgram.transcription.prerecorded(
      source,
      {
        'punctuate': True
      }
    )
  )

  # Write only the transcript to the console
  print(response['results']['channels'][0]['alternatives'][0]['transcript'])

try:
  await main()
  # If not running in a Jupyter notebook, run main with this line instead:
  # asyncio.run(main())
except Exception as e:
  exception_type, exception_object, exception_traceback = sys.exc_info()
  line_number = exception_traceback.tb_lineno
  print(f'line {line_number}: {exception_type} - {e}')

The Deepgram response provides many data points related to our speech, but to pull only the transcript of the file, we index ['results']['channels'][0]['alternatives'][0]['transcript']. Feel free to modify the response handling to generate whatever is most relevant to your needs. For the sample provided above, the result of the transcription is as follows:

Following text is a transcription of the s en of the parchment declaration of independence. The document on display in the rot the national archives Museum. The spelling and punctuation reflects the originals.
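To make the indexing above concrete, here is a minimal sketch of pulling the transcript out of a Deepgram-style response dictionary. The nested structure mirrors the real response shape; the values are illustrative stand-ins, not actual API output.

```python
# Mocked, simplified response in the shape Deepgram returns.
sample_response = {
    'results': {
        'channels': [{
            'alternatives': [{
                'transcript': 'Hello world.',
                'confidence': 0.98
            }]
        }]
    }
}

# Same indexing path used in the article to reach the plain transcript.
transcript = sample_response['results']['channels'][0]['alternatives'][0]['transcript']
print(transcript)  # Hello world.
```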

Next Steps

This is a very basic foray into how to get started with transcribing your conference recordings. We heavily suggest you invest some time into expanding this to fit your specific use case and maximize the benefit you get from these tools.

As mentioned before, we suggest taking a look at what Deepgram has to offer in terms of additional features you could add on to the transcription process. For example:

  • Diarization can help differentiate who is saying what when there are multiple people in a conference.
  • Named Entity Recognition and/or Keywords to help increase accuracy by providing prior information about things like names and proper nouns.
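As a sketch of how these features slot into the existing code, the options dictionary passed to the prerecorded call could be expanded like this. The 'diarize' and 'keywords' flags correspond to the features above, but double-check the current parameter names in Deepgram's documentation before relying on them.

```python
# Expanded options dictionary for the Deepgram prerecorded call.
# This would replace the bare {'punctuate': True} used earlier, e.g.:
#   deepgram.transcription.prerecorded(source, options)
options = {
    'punctuate': True,                  # add punctuation and capitalization
    'diarize': True,                    # label which speaker said what
    'keywords': ['Deepgram', 'Dolby'],  # boost recognition of these terms
}
print(sorted(options))
```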

The transcription of the example recording was not perfect. There are many possible reasons for this, including imperfect recording environments, confusing speech patterns, and compression. To give the transcription algorithms a better chance, one option is to use the Media Enhance API to clean up the audio before sending it for transcription.

If you want to automatically generate a transcription after every recording is over, we can take advantage of webhooks to remove the manual intervention for you. In fact, the Recording.Audio.Available event provides the recording URL within the event body itself, reducing the number of steps needed to obtain it.
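A webhook receiver would only need to pull the recording URL out of the event payload and hand it to the transcription step. This is a hypothetical sketch: the 'url' field name in the event body is an assumption for illustration, so check the Recording.Audio.Available event documentation for the actual payload shape.

```python
import json

def extract_recording_url(event_body):
    """Pull the recording URL out of a webhook event payload.

    The 'url' key is an assumed field name; verify it against the
    Recording.Audio.Available event schema in the Dolby.io docs.
    """
    event = json.loads(event_body)
    return event.get('url')

# Example with a mocked event body:
url = extract_recording_url('{"url": "https://example.com/recording.mp3"}')
print(url)  # https://example.com/recording.mp3
```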

One final idea: if for whatever reason you only have the video file available, you can use the Media Transcode API to convert it into a format accepted by the transcription service.

You can find the source code file stored in a Jupyter notebook at this GitHub repository. Good luck coding!

Also published here.