Are you looking to create engaging faceless short videos for platforms like YouTube or TikTok but want to avoid the hassle of complex video editing? This article will walk you through how to automate the entire process using OpenAI, ElevenLabs, and MoviePy.

By the end of this tutorial, you'll know how to automatically generate visuals and voiceovers for short videos based on any script. Whether you’re creating educational content, storytelling, or meme videos, this workflow will save you tons of time.

## Prerequisites

Before getting started, you’ll need:

- API keys for both OpenAI (for generating visuals) and ElevenLabs (for voiceovers).
- Basic Python knowledge.
- MoviePy and the other required Python libraries installed (`moviepy`, `openai`, `elevenlabs`, etc.).

## Step 1: Setting Up API Keys

```python
import openai
from elevenlabs import ElevenLabs

# Set up your OpenAI and ElevenLabs API keys
openai.api_key = "your_openai_api_key"
elevenlabs_client = ElevenLabs(api_key="your_elevenlabs_api_key")
```

Start by getting API keys from OpenAI and ElevenLabs, then replace the placeholders in the code with your actual keys.

## Step 2: Preparing the Script

Your video starts with a story or script. You can replace the `story_script` variable with the text you want to turn into a video. Here’s an example script about Dogecoin:

```python
story_script = """
Dogecoin began as a joke in 2013, inspired by the popular 'Doge' meme featuring a Shiba Inu dog.
It unexpectedly gained a massive following thanks to its community's charitable initiatives,
eventually evolving into a legitimate cryptocurrency with support from Elon Musk.
"""
```

The script will be split into sentences to match each visual and audio segment.
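For example, a minimal sentence splitter could look like the sketch below. This regex-based approach is an assumption for illustration; the actual project may use a proper sentence tokenizer instead.

```python
import re

def split_into_sentences(text):
    # Naive split on sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if s]

sentences = split_into_sentences(story_script)
```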
## Step 3: Generating Images with OpenAI’s DALL-E

For each sentence, we generate a corresponding image using OpenAI’s DALL-E model.

```python
import base64

def generate_image_from_text(sentence, context, idx):
    # Ask DALL-E 3 for a text-free vertical image that illustrates the sentence.
    prompt = f"Generate an image without any text that describes: {sentence}. Context: {context}"
    response = openai.images.generate(
        model="dall-e-3",
        prompt=prompt,
        size="1024x1792",
        response_format="b64_json"
    )
    # Decode the base64 payload and save it (the images/ directory must already exist).
    image_filename = f"images/image_{idx}.jpg"
    with open(image_filename, "wb") as f:
        f.write(base64.b64decode(response.data[0].b64_json))
    return image_filename
```

This function sends each sentence to DALL-E and saves the generated image. Passing the full script as context helps keep the visuals consistent with the video's theme.

## Step 4: Generating Voiceovers with ElevenLabs

Once we have the visuals, we need voiceovers. ElevenLabs converts each sentence into speech.

```python
from elevenlabs import VoiceSettings

def generate_audio_from_text(sentence, idx):
    # Convert one sentence to speech; the response is streamed back in chunks.
    audio = elevenlabs_client.text_to_speech.convert(
        voice_id="pqHfZKP75CvOlQylNhV4",
        model_id="eleven_multilingual_v2",
        text=sentence,
        voice_settings=VoiceSettings(stability=0.2, similarity_boost=0.8)
    )
    # Write the streamed chunks to an MP3 file (the audio/ directory must already exist).
    audio_filename = f"audio/audio_{idx}.mp3"
    with open(audio_filename, "wb") as f:
        for chunk in audio:
            f.write(chunk)
    return audio_filename
```

This function generates an audio file for each sentence. You can adjust the voice settings to customize the narration style.

## Step 5: Combining Audio and Video

Next, we pair each image with its corresponding voiceover using MoviePy:

```python
from moviepy.editor import ImageClip, AudioFileClip

# Load the narration and show the matching image for exactly as long as it plays.
audio_clip = AudioFileClip(audio_path)
image_clip = ImageClip(image_path, duration=audio_clip.duration)
image_clip = image_clip.set_audio(audio_clip)
video_clips.append(image_clip.set_fps(30))
```

Each image is displayed for the duration of its audio clip, ensuring synchronization.
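Putting Steps 3 through 5 together, the per-sentence loop might look roughly like this. This is a sketch that assumes the helper functions defined above and the `sentences` list from Step 2:

```python
from moviepy.editor import ImageClip, AudioFileClip

video_clips = []
context = story_script  # use the full script as shared context for image prompts

for idx, sentence in enumerate(sentences):
    image_path = generate_image_from_text(sentence, context, idx)
    audio_path = generate_audio_from_text(sentence, idx)

    # Show each image for exactly as long as its narration lasts.
    audio_clip = AudioFileClip(audio_path)
    image_clip = ImageClip(image_path, duration=audio_clip.duration).set_audio(audio_clip)
    video_clips.append(image_clip.set_fps(30))
```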
## Step 6: Applying Video Effects

To make the video more dynamic, we apply zoom and fade effects to each image. For example, the `apply_zoom_in_center` effect slowly zooms into the center of the image:

```python
def apply_zoom_in_center(image_clip, duration):
    # Scale the clip up by 4% per second, zooming toward the center.
    return image_clip.resize(lambda t: 1 + 0.04 * t)
```

Other effects include zooming in from the upper part of the frame or zooming out. These effects are applied randomly to each clip to keep the video visually engaging.
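The random selection itself can be as simple as picking one effect function per clip. The sketch below assumes the other effect helpers share the same signature as `apply_zoom_in_center`; the `apply_zoom_out` helper here is illustrative, not taken from the repository:

```python
import random

def apply_zoom_out(image_clip, duration):
    # Start slightly zoomed in and ease back toward the original framing.
    return image_clip.resize(lambda t: max(1.0, 1.2 - 0.04 * t))

effects = [apply_zoom_in_center, apply_zoom_out]

# Rebuild the clip list with a randomly chosen effect applied to each clip.
video_clips = [random.choice(effects)(clip, clip.duration) for clip in video_clips]
```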
## Step 7: Final Video Assembly

We combine all the video clips into one seamless video and write it to disk:

```python
from moviepy.editor import concatenate_videoclips

final_video = concatenate_videoclips(video_clips, method="compose")
final_video.write_videofile(output_video_path, codec="libx264", audio_codec="aac", fps=30)
```

## Step 8: Adding Captions

Captions improve video accessibility and engagement. We use Captacity to automatically add captions based on the audio.

```python
import captacity

captacity.add_captions(
    video_file=output_video_path,
    output_file="captioned_video.mp4",
    font_size=130,
    font_color="yellow",
    stroke_width=3
)
```

## Step 9: Adding Background Music

To finish the video, background music is added. The music volume is reduced so that it doesn't overpower the narration.

```python
from moviepy.editor import AudioFileClip, CompositeAudioClip

background_music = AudioFileClip(music_filename).subclip(0, final_video.duration).volumex(0.2)
narration_audio = final_video.audio.volumex(1.5)
combined_audio = CompositeAudioClip([narration_audio, background_music])
final_video = final_video.set_audio(combined_audio)  # set_audio returns a new clip
```
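One caveat: MoviePy clips are immutable, so `set_audio` returns a new clip rather than modifying `final_video` in place, and the mixed audio only reaches disk once the clip is exported again. A minimal final export (the output filename below is an assumption) might look like:

```python
# Re-export the clip so the narration + background music mix is saved to disk.
final_video.write_videofile(
    "final_video_with_music.mp4",  # assumed output path
    codec="libx264",
    audio_codec="aac",
    fps=30,
)
```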
## Conclusion

The GitHub repository for this project is available here.

By using OpenAI and ElevenLabs, we’ve automated the creation of faceless videos from text. You can now quickly generate YouTube Shorts or TikToks without needing a camera or microphone.

This automated process has allowed us to create a Faceless Shorts Video service on our Robopost software, offering content creators a seamless way to produce high-quality videos. Whether you are creating educational videos, short stories, or even meme-style content, this service handles everything from visuals to voiceovers with minimal effort.

Now you can focus on creativity and storytelling while Robopost handles the heavy lifting of video production. Happy creating!