How to Build an LLM-Powered CLI Tool in Python

Written by ksurya220 | Published 2025/12/08

TL;DR: This tutorial shows how to build a real-time, AI-powered command-line tool in Python using OpenAI's Realtime API. You'll create llm-explain, a CLI utility that explains any shell command by streaming LLM responses directly into your terminal. The guide covers setting up a WebSocket client, handling streaming output, and extending the tool with optional "AI agent" capabilities like tool-calling and safe shell execution. By the end, you'll have a reusable framework for building your own AI-native CLI assistants.

Why AI Belongs in the Terminal

Developers spend a huge chunk of their time in the terminal: running commands, reading logs, debugging scripts, working with git, managing servers, and automating tasks.


But the terminal is also unforgiving:

  • You must know the right flags
  • You must remember syntax
  • You need context for errors
  • Debugging often involves trial-and-error
  • Scripts quickly become unmanageable


Since LLMs excel at explanation, transformation, and reasoning, the CLI is a perfect environment for AI augmentation.


Imagine tools that can:

  • Explain complex commands and pipelines
  • Suggest safer alternatives
  • Read and summarize logs
  • Generate bash scripts on the fly
  • Fix broken git commands
  • Walk you through debugging steps
  • Serve as an “AI man page”


In other words, AI can make the terminal friendlier, smarter, and a lot more powerful.

How to Bring AI-Native Interactions Directly Into Your Terminal

The developer terminal hasn’t changed much in decades. It’s still a fast, scriptable, text-based interface designed for humans who know exactly what they’re doing. But what if your terminal could help you? What if the CLI itself could explain unfamiliar commands, auto-correct mistakes, generate scripts, reason about logs, or even execute actions with intelligence?


In this tutorial, we’ll build an LLM-powered CLI assistant using Python, the Realtime API, and a lightweight terminal UI. Our sample tool, called llm-explain, lets you type any shell command and get a real-time explanation streamed directly in your terminal. The experience feels like ChatGPT running natively inside your CLI.


This article covers:

  • How the OpenAI Realtime API works
  • Why it’s ideal for CLI tooling
  • Step-by-step implementation
  • Complete working Python example
  • Optional tool-calling (agents that can take actions)
  • Ideas for more advanced tools


What Is the OpenAI Realtime API?

The Realtime API is a WebSocket-based interface that provides:


a) Low-latency token-by-token streaming: Great for CLI output where you want text to appear naturally.


b) Event-driven communication: You can send and receive events such as:

  • input_text
  • response.output_text.delta
  • response.completed
  • response.tool_call

This enables multi-turn conversations and dynamic behaviors.


c) Built for interactive apps: Unlike the classic REST API, the Realtime API is optimized for interactive use cases: IDE assistants, terminals, real-time agents, live coding, and voice interfaces.


d) Optional "tool calling": Tools let you define functions the model can request, enabling command execution, file manipulation, queries, retrieval, or anything else your Python program can do.


This is extremely powerful and makes the model feel alive.
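Concretely, a single explain request boils down to a handful of JSON events exchanged over the WebSocket. The event names below follow the Realtime WebSocket protocol as documented for the beta API; they may differ in newer versions, so treat this as a sketch rather than a definitive reference:

```python
import json

# Client -> server: add the user's message to the conversation.
user_message = {
    "type": "conversation.item.create",
    "item": {
        "type": "message",
        "role": "user",
        "content": [{"type": "input_text", "text": "Explain: ls -la"}],
    },
}

# Client -> server: ask the model to generate a response.
request_response = {"type": "response.create"}

# Server -> client: text arrives as a stream of delta events,
# followed by a completion event.
example_delta = {"type": "response.output_text.delta", "delta": "This command "}
example_done = {"type": "response.completed"}

# Everything on the wire is JSON-encoded before sending.
wire_payload = json.dumps(user_message)
```

Once you can speak this event vocabulary, everything else is plumbing.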

Project Overview: Building llm-explain

Our example tool mimics a smart, AI-powered version of man pages.


You run:

python explain.py "tar -xzf backup.tar.gz -C /tmp"


And the system streams back:

This command extracts (-x) a gzip-compressed archive (-z)
from backup.tar.gz into the /tmp directory (-C /tmp). The -f 
flag specifies the archive file.

All streamed live, token by token.


The project is tiny but demonstrates the full power of the Realtime API.

Project Structure

llm-explain/
 ├── client.py
 ├── explain.py
 └── README.md


Two Python files do all the work:

  • client.py: a small wrapper for connecting to the Realtime WebSocket
  • explain.py: our command line interface
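Before writing any code, install the two third-party dependencies the tool relies on (package names assumed from PyPI; pin versions as needed for your environment):

```shell
pip install websockets rich
```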

Step 1: Implement the Realtime Client

Create client.py:

# client.py
import json

import websockets  # pip install websockets

# Replace the model query parameter with a Realtime model
# available to your account.
REALTIME_URL = "wss://api.openai.com/v1/realtime?model=gpt-4.1-realtime"

class RealtimeClient:
    def __init__(self, api_key):
        self.api_key = api_key
        self.ws = None

    async def connect(self):
        # websockets >= 14 uses `additional_headers`;
        # older versions call this keyword `extra_headers`.
        self.ws = await websockets.connect(
            REALTIME_URL,
            additional_headers={
                "Authorization": f"Bearer {self.api_key}",
                "OpenAI-Beta": "realtime=v1",
            },
        )

    async def send_event(self, event):
        await self.ws.send(json.dumps(event))

    async def listen(self):
        # Yield each server event as a decoded dict, as it arrives.
        async for msg in self.ws:
            yield json.loads(msg)

    async def close(self):
        if self.ws is not None:
            await self.ws.close()


This class:

  • Establishes a WebSocket connection
  • Sends events to the model
  • Returns events as they’re streamed


This is the entire “real-time engine” powering the CLI.

Step 2: Create the CLI Tool

Now, create explain.py:

# explain.py
import argparse
import asyncio
import os

from rich.console import Console  # pip install rich

from client import RealtimeClient

console = Console()

async def explain_command(command):
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("Set OPENAI_API_KEY environment variable.")

    client = RealtimeClient(api_key)
    await client.connect()

    # Add the user's prompt to the conversation, then ask the model
    # to respond. (Event names follow the Realtime WebSocket protocol;
    # check the current API reference if they have changed.)
    await client.send_event({
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "user",
            "content": [{
                "type": "input_text",
                "text": f"Explain what this command does:\n\n{command}",
            }],
        },
    })
    await client.send_event({"type": "response.create"})

    console.print(f"[bold green]🔍 Explaining:[/bold green] {command}\n")

    # Stream output in real time
    async for event in client.listen():
        if event["type"] == "response.output_text.delta":
            console.print(event["delta"], end="")
        elif event["type"] in ("response.completed", "response.done"):
            break

    console.print()  # final newline after the streamed text
    await client.close()

def main():
    parser = argparse.ArgumentParser(description="Explain any CLI command using LLMs.")
    parser.add_argument("cmd", type=str, help="Command to explain")
    args = parser.parse_args()

    asyncio.run(explain_command(args.cmd))

if __name__ == "__main__":
    main()


This script:

  • Reads the command passed via CLI
  • Sends the message through the Realtime API
  • Displays the model’s response as a live stream


This gives developers an AI-native terminal experience.

Step 3: Run the Tool

Set your OpenAI key:

export OPENAI_API_KEY="your_key"


Explain any command:

python explain.py "git rev-list --count HEAD"


Example output (streamed):

🔍 Explaining: git rev-list --count HEAD
This command counts how many commits exist in the current branch up to HEAD. The --count flag returns the numeric total instead of listing individual revisions.


The result is fast, fluid, and extremely helpful when you’re unsure what a command does.
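If you want the full explanation as a string (for piping or logging) rather than just live terminal output, you can fold the delta events into a transcript as they arrive. A small sketch, assuming the delta/completed event shapes used in explain.py:

```python
def accumulate_deltas(events):
    """Collect streamed text deltas into one final string.

    `events` is any iterable of decoded Realtime events;
    collection stops at the first completion event.
    """
    parts = []
    for event in events:
        if event.get("type") == "response.output_text.delta":
            parts.append(event.get("delta", ""))
        elif event.get("type") == "response.completed":
            break
    return "".join(parts)

# Example with canned events:
sample = [
    {"type": "response.output_text.delta", "delta": "Counts commits "},
    {"type": "response.output_text.delta", "delta": "up to HEAD."},
    {"type": "response.completed"},
]
transcript = accumulate_deltas(sample)  # "Counts commits up to HEAD."
```

The same helper works for any streamed response, not just explanations.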

Step 4: Optional — Add Tool Calling (AI That Executes Commands)

You can expose functions that the model can call.


Define a tool:

# Function tools describe their arguments with a JSON Schema.
tools = [
    {
        "type": "function",
        "name": "run_shell",
        "description": "Execute a shell command",
        "parameters": {
            "type": "object",
            "properties": {
                "cmd": {"type": "string"},
            },
            "required": ["cmd"],
        },
    }
]


Listen for tool calls:

# Requires `import subprocess` at the top of explain.py. The exact
# tool-call event shape depends on the API version you target.
elif event["type"] == "response.tool_call":
    if event["name"] == "run_shell":
        output = subprocess.getoutput(event["args"]["cmd"])
        await client.send_event({
            "type": "tool_output",
            "content": output
        })


Important: Only allow safe, sandboxed execution, especially on multi-user systems.
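A minimal first line of defense is to check the requested command against an allowlist before executing anything. This is an illustrative sketch (the command set and checks are assumptions you should tailor), not a substitute for a real sandbox:

```python
import shlex

# Programs the assistant is allowed to run; everything else is refused.
# Purely illustrative; choose your own set.
ALLOWED_COMMANDS = {"ls", "cat", "git", "grep", "head", "tail"}

def is_safe_command(cmd: str) -> bool:
    """Return True only if the command's program is allowlisted
    and the string contains no shell control operators."""
    if any(ch in cmd for ch in (";", "&", "|", ">", "<", "`", "$")):
        return False
    try:
        tokens = shlex.split(cmd)
    except ValueError:  # unbalanced quotes, etc.
        return False
    return bool(tokens) and tokens[0] in ALLOWED_COMMANDS
```

Call this before subprocess.getoutput and send an error message back to the model when the check fails, so it can try a safer command instead.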


But once sandboxed, this unlocks:

  • llm-git that automatically fixes your errors
  • llm-logs that identifies failure patterns
  • llm-devops that applies infrastructure changes
  • llm-shell where the model becomes your command runner


This is where things get insanely powerful. With this pattern, developers can build a whole ecosystem of AI CLI assistants. Here are real projects you can build:


1) AI Man Page 2.0

Ask questions like: llm-help "What is the difference between grep -r and grep -R?"


2) Git Doctor

Automatically fix common git issues: llm-git "help me resolve this merge conflict"


3) AI Log Debugger

Paste logs to get root cause analysis.

Conclusion

The CLI has always been one of the most powerful environments for developers but also one of the least accessible. With the OpenAI Realtime API, it’s now possible to bring AI directly into that workflow in a natural, real-time, low-latency way.



Written by ksurya220 | A seasoned software engineer with over 15 years of experience developing scalable web & enterprise applications
Published by HackerNoon on 2025/12/08