I Built a Poor Man’s Pixar Using Python, Gemini, and My Sense of Humor

Written by damianwgriggs | Published 2026/01/09
Tech Story Tags: machine-learning | stickverse | python-based-animation | animation-pipeline | opencv | karaoke-cli | genesis-art-engine-prompt | lip-sync-algorithm

TL;DR: The Stickverse is a scrappy, Python-based animation pipeline that turns a raw text story into a fully voiced, lip-synced video. It uses Google's Gemini to parse the script, a CLI tool to record your own bad voice acting, and OpenCV to draw the frames.

Animation is too hard.

To make a 30-second clip, you usually need to learn Blender, hire voice actors, set keyframes by hand, and render for twelve hours. I wanted to make memes, shitposts, and storytelling clips, and I wanted them done in the time it takes to drink a coffee.

So, I built the Stickverse.

It is a scrappy, Python-based animation pipeline that turns a raw text story into a fully voiced, lip-synced (well, "lip-flapped") video. It uses Google's Gemini to parse the script, a CLI tool to record your own bad voice acting, and OpenCV to draw the frames.

Here is how I built it, and how you can use it to create your own episodes of the Stickverse.

The Architecture

The pipeline consists of three distinct Python scripts:

  1. The Parser (parser.py): Uses Gemini to turn a text block into a storyboard (JSON).
  2. The Director (director.py): A CLI tool that prompts you to record lines one by one.
  3. The Animator (animator.py): The engine that stitches audio and vector graphics into an MP4.

Let's break down the code.

Phase 1: The Screenwriter (Powered by Gemini)

The hardest part of procedural animation is structure. If I write a story, I don't want to manually tag every single line with { "character": "Steve" }.

I delegated this to the Gemini API. I feed it raw text, and via a system prompt, it returns clean JSON.

Here is the secret sauce in parser.py. The system prompt enforces strict rules so the animator doesn't crash later:

system_prompt = """
You are a Screenplay parsing engine. Convert the following raw text story into a structured JSON animation script.

RULES:
1. Identify every character speaking.
2. Identify scene changes (backgrounds).
3. Output JSON ONLY. No markdown, no explanations.

JSON STRUCTURE:
[
  { "type": "background", "description": "desert" },
  { "type": "speak", "character": "Walter", "text": "Jesse, we have to cook." }
]
"""

If I feed it a text file where I wrote "Walter yells at Jesse about API keys," Gemini breaks it down into the exact scenes and lines I need to record.

Phase 2: The Director (The Karaoke CLI)

Once we have a script_plan.json, we need audio.

I didn't want to use AI Text-to-Speech (TTS) because AI voices have no soul. They can't capture the panic of a man realizing he missed a deployment deadline.

I wrote director.py using the sounddevice library. It reads the JSON, clears the terminal, and acts like a teleprompter.

# snippet from director.py
print(f"🗣️  CHARACTER: {char.upper()}")
print(f"📝 LINE: \"{text}\"")

# It gives you 15 seconds to get into character,
# then records automatically; hit Ctrl+C to end the take

It saves every line as a separate .wav file (line_0_Walter.wav) and updates the JSON to link the text to the audio.
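
The recording step itself is only a few lines with sounddevice. A minimal sketch, assuming mono 16-bit audio (the constants and function name are my placeholders):

# A hedged sketch of the recording step in director.py
import time
import sounddevice as sd
from scipy.io import wavfile

SAMPLE_RATE = 44100    # assumption: CD-quality mono
MAX_SECONDS = 30       # assumption: generous ceiling per take

def record_line(index, character):
    time.sleep(15)     # the get-into-character countdown
    audio = sd.rec(int(MAX_SECONDS * SAMPLE_RATE),
                   samplerate=SAMPLE_RATE, channels=1, dtype="int16")
    try:
        sd.wait()      # roll until the buffer fills...
    except KeyboardInterrupt:
        sd.stop()      # ...or until you Ctrl+C to end the take
    path = f"line_{index}_{character}.wav"
    wavfile.write(path, SAMPLE_RATE, audio)
    return path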

Phase 3: The Animator (Math, Not Magic)

This is where the "Stickverse" comes alive. I didn't use a game engine. I used OpenCV (to draw lines) and MoviePy (to stitch frames).
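
The stitching half is short. A minimal sketch, assuming MoviePy 1.x's moviepy.editor API (the function and its defaults are mine, not the repo's exact code):

# A hedged sketch of the frame-stitching step
import cv2
from moviepy.editor import ImageSequenceClip, AudioFileClip

def render_scene(frames_bgr, wav_path, out_path="scene.mp4", fps=24):
    # OpenCV draws in BGR; MoviePy expects RGB, so convert each frame
    frames_rgb = [cv2.cvtColor(f, cv2.COLOR_BGR2RGB) for f in frames_bgr]
    clip = ImageSequenceClip(frames_rgb, fps=fps)
    clip = clip.set_audio(AudioFileClip(wav_path))
    clip.write_videofile(out_path, fps=fps)

The drawing half is where the fun math lives.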

The "Lip-Sync" Algorithm

I couldn't be bothered to train a neural network for lip-syncing. Instead, I used scipy.io.wavfile to analyze the volume of the audio file (a sketch follows the list below).

  • Loud noise = Mouth Open Big.
  • Quiet noise = Mouth Open Small.
  • Silence = Mouth Closed.
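
In practice, that mapping can be a simple per-frame RMS, normalized against the loudest moment in the clip. A minimal sketch, assuming 16-bit mono WAVs (the function name and fps default are mine):

# A hedged sketch of the volume-to-mouth mapping
import numpy as np
from scipy.io import wavfile

def mouth_levels(wav_path, fps=24):
    rate, samples = wavfile.read(wav_path)
    samples = samples.astype(np.float32)
    chunk = rate // fps                        # audio samples per video frame
    levels = []
    for i in range(0, len(samples), chunk):
        rms = np.sqrt(np.mean(samples[i:i + chunk] ** 2))
        levels.append(rms)
    peak = max(levels) or 1.0                  # don't divide by silence
    return [lvl / peak for lvl in levels]      # 0.0 = closed, 1.0 = wide open

Each level then drives how far the jaw drops in the drawing code:
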
# snippet from animator.py
def draw_frame(t, character_speaking, mouth_open_amount):
    # ... drawing the body ...
    
    gap = int(mouth_open_amount * 20) # Calculate jaw drop
    
    # Draw top of head moving up and down
    top_center = (x, 300 - gap)
    cv2.ellipse(img, top_center, (50, 50), 0, 180, 360, color, -1)

The result is a flappy-headed aesthetic that looks like South Park meets a terminal window.

The Demo: "Breaking Stories"

https://youtu.be/vwO11W2zwRw

To test the system, I fed it a script about Walter and Jesse arguing about HackerNoon editorial standards.

The Prompt (that I totally followed to the letter):

"The scene takes place in a desert. Walter says: Jesse, we have to cook. These editors want 2 keys of articles by the end of the week."

The Result:

  • Gemini identified the "Desert" background keyword and changed the sky color to Cyan and the ground to Orange.
  • It identified Walter and Jesse.
  • The Animator rendered them in Blue and Yellow (Hazmat suit style).

Join the Stickverse

The code is open source. It is messy, it is funny, and it works.

The beauty of the Stickverse is that it is strictly code-based.

  • Want a top hat? Add cv2.rectangle to the head logic (see the sketch after this list).
  • Want explosions? Add a random jitter function to the camera frame.
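
The top hat really is that small. A hypothetical version, meant to be called from inside draw_frame and reusing its x and gap values (the coordinates are my guesses):

# A hypothetical top hat for the head logic
import cv2

def draw_top_hat(img, x, gap):
    brim_y = 300 - gap - 50   # top of the skull, per the ellipse math above
    cv2.rectangle(img, (x - 70, brim_y), (x + 70, brim_y - 8), (0, 0, 0), -1)      # brim
    cv2.rectangle(img, (x - 40, brim_y - 8), (x + 40, brim_y - 70), (0, 0, 0), -1) # crown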

You can find the repo here: https://github.com/damianwgriggs/The-Stickverse

Clone it, modify the draw_frame function, and create your own episode. We have to cook.

Genesis Art Engine Prompt for featured photo: A blazing hot desert with a radiating sun. Meant to fit the “Breaking Stories” theme.


Written by damianwgriggs | Adaptive Systems Architect. Author. Legally Blind. Building Quantum Oracles & AI Memory Systems. 35+ Repos. Open Sourcing.
Published by HackerNoon on 2026/01/09