Building AxonerAI: A Rust Framework for Agentic Systems

Written by mnjkshrm_86h0lvqo | Published 2025/11/26
Tech Story Tags: rust | ai-agent | llms | ai-agent-tooling | ai | machine-learning | concurrency | agentic-ai-systems

TL;DR: AxonerAI is a fast, Rust-based agentic framework built for standalone binaries (~4.0MB), embedded systems, and high-concurrency production workloads. It delivers ~10MB RAM per agent, sub-200ms concurrent tool execution, and thousands of parallel sessions without the Python GIL. The architecture uses trait-based provider abstraction (Anthropic, OpenAI, Groq), async tools, and pluggable session management. It is not a replacement for Python; it complements Python for edge deployment, embedded systems, CLI tools, and production scale where compile-time safety and true parallelism matter.

I've worked with Python frameworks like LangChain/LangGraph, CrewAI, and StrandsSDK, and they have defined how we build AI agents. They excel at rapid prototyping, offer extensive integrations and production-ready agentic workflows, and leverage Python's rich ML ecosystem. But here's the gap: what if you need to embed an agent in a binary, guarantee memory and type safety, or run thousands of concurrent agent sessions without the Global Interpreter Lock? (Yes, Pydantic helps with type safety, but that's runtime validation, not compile-time guarantees.) I genuinely believe Python frameworks are phenomenal for most use cases, and for many of them irreplaceable.

But certain deployment scenarios demand different trade-offs: edge deployment, data-intensive tooling, or high-concurrency environments. Let's talk about the OGs of performance—C and C++. These languages have been running the show for decades. Want proof? TensorFlow's computational engine is C++; OpenCV's entire codebase is C++. The list goes on. When you need blazing speed and metal-close control, you write it in C or C++. Period. But here's the catch: with great power comes great responsibility, and in this case, that means manual memory management, potential security vulnerabilities, and bugs that only show up at 3 AM in production.

Enter Rust: started in 2006 by Graydon Hoare and later sponsored by Mozilla, it was designed specifically to solve C++'s memory safety problems. Rust keeps the performance and ditches the memory nightmares. And that's exactly the foundation AxonerAI is built on.

Design Goals

AxonerAI isn't trying to replace Python frameworks. It's addressing specific use cases:

  • Standalone Distribution: Ship a single binary with zero dependencies. No Python runtime, no virtual environments, no pip install. Download and run.
  • Embedded Agents: Compile agents directly into applications. No subprocess management, no FFI overhead, no runtime dependencies.
  • High-Concurrency Workloads: Run thousands of agent sessions in parallel without GIL contention. Each session on its own tokio task with proper resource isolation.
  • Developer Tooling: Build CLI tools and terminal interfaces where startup time matters and users shouldn't need Python installed.

The architecture needed to support these goals while maintaining the ergonomics developers expect from modern agent frameworks.

Python itself relies on Rust-backed libraries. A few examples:

Pydantic V2 significantly leverages Rust to enhance its performance and core functionality. The underlying validation and parsing engine, known as pydantic-core, is written in Rust.

Polars, a DataFrame library written in Rust, is emerging as a strong alternative to Pandas for data manipulation in Python, particularly for performance-critical applications and larger datasets.

Architecture Decisions

Provider Abstraction Layer

The framework needed to support multiple LLM providers without coupling to any single API. The trait-based design achieves this:

#[async_trait]
pub trait Provider: Send + Sync {
    async fn generate(
        &self,
        messages: &[Message],
        system_prompt: Option<&str>,
    ) -> Result<String>;
}

Each provider (Anthropic, OpenAI, Groq) implements this trait with its specific API details. The agent works with the abstraction, not the implementation. Switching providers becomes a configuration change:

// Development: Groq (generous free tier)
let provider = GroqProvider::new(api_key);

// Production: Anthropic (superior reasoning)
let provider = AnthropicProvider::new(api_key);

The abstraction adds essentially no runtime overhead: when the concrete provider type is known at compile time, the compiler specializes each implementation into its own code path.
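
To make the extension point concrete, here is a minimal sketch of a custom provider. Only the Provider trait, Message, and Result come from the framework; the struct name, import path, and stubbed response are assumptions for illustration, not part of the crate.

use async_trait::async_trait;
// Assumed import path; `Provider`, `Message`, and `Result` are the framework
// types shown above.
use axonerai::{Message, Provider, Result};

/// Hypothetical provider backed by a locally hosted model.
pub struct LocalProvider {
    endpoint: String,
}

impl LocalProvider {
    pub fn new(endpoint: impl Into<String>) -> Self {
        Self { endpoint: endpoint.into() }
    }
}

#[async_trait]
impl Provider for LocalProvider {
    async fn generate(
        &self,
        messages: &[Message],
        system_prompt: Option<&str>,
    ) -> Result<String> {
        // A real implementation would call `self.endpoint` here; this stub
        // just echoes the conversation size to keep the sketch self-contained.
        let _ = system_prompt;
        Ok(format!(
            "(reply from {} to {} messages)",
            self.endpoint,
            messages.len()
        ))
    }
}

Because the agent only ever sees the trait, a provider like this drops in exactly the same way as the built-in ones.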

Tool System Architecture

Tools needed extensibility without reflection or runtime type checking. The solution: a trait system with explicit registration:

#[async_trait]
pub trait Tool: Send + Sync {
    fn name(&self) -> &str;
    fn description(&self) -> &str;
    async fn execute(&self, args: &str) -> Result<String>;
}

pub struct ToolRegistry {
    tools: HashMap<String, Arc<dyn Tool>>,
}

Tools are compiled into the binary and registered explicitly. The agent queries the registry, understands capabilities through descriptions, and invokes tools with proper error boundaries. I implemented three core tools to prove the concept:

  • Calculator: Handles basic math operations (add, subtract, multiply, divide)
  • WebSearch: Google Search API integration for real-time information retrieval (this can be done for free)
  • WebScraper: Extracts and cleans content from the search results' URLs for agent context

Registering them takes a few lines:

let calculator = Calculator::new();
let web_search = WebSearch::new(brave_api_key);
let scraper = WebScraper::new();

let mut registry = ToolRegistry::new();
registry.register(Arc::new(calculator));
registry.register(Arc::new(web_search));
registry.register(Arc::new(scraper));

Each tool runs in its own async context. Network calls don't block computation. The agent can invoke multiple tools concurrently when needed.
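
For the same reason, writing a new tool is a small amount of code. Here is a minimal sketch; only the Tool trait and Result come from the framework, while the struct name, import path, and word-count logic are illustrative assumptions.

use async_trait::async_trait;
// Assumed import path; `Tool` and `Result` are the framework types shown above.
use axonerai::{Result, Tool};

/// Hypothetical tool that counts whitespace-separated words in its input.
pub struct WordCount;

#[async_trait]
impl Tool for WordCount {
    fn name(&self) -> &str {
        "word_count"
    }

    fn description(&self) -> &str {
        "Counts the number of words in the provided text."
    }

    async fn execute(&self, args: &str) -> Result<String> {
        // The agent passes arguments as a string; treat the whole payload as text.
        Ok(args.split_whitespace().count().to_string())
    }
}

Registering it is one more line: registry.register(Arc::new(WordCount));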

Session Management Design

Production agents require persistent context. The SessionManager abstraction handles storage:

#[async_trait]
pub trait SessionManager: Send + Sync {
    async fn load_session(&self, session_id: &str) -> Result<Vec<Message>>;
    async fn save_message(&self, session_id: &str, message: Message) -> Result<()>;
}

The current implementation uses file-based storage with UUIDs. Each session maintains full conversation history with JSON serialization. But the trait design means swapping in Redis, PostgreSQL, or DynamoDB is straightforward: just implement the trait.

Sessions persist across process restarts. An agent can pick up exactly where it left off, with complete context preserved.
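
As a sketch of how small such a swap is, here is a hypothetical in-memory backend. It assumes Message implements Clone and uses an assumed import path; none of the names below are part of the crate.

use std::collections::HashMap;
use std::sync::Arc;

use async_trait::async_trait;
use tokio::sync::RwLock;
// Assumed import path; `SessionManager`, `Message`, and `Result` are the
// framework types shown above. `Message: Clone` is an assumption.
use axonerai::{Message, Result, SessionManager};

/// Hypothetical in-memory session store, useful for tests or ephemeral agents.
#[derive(Default)]
pub struct InMemorySessions {
    sessions: Arc<RwLock<HashMap<String, Vec<Message>>>>,
}

#[async_trait]
impl SessionManager for InMemorySessions {
    async fn load_session(&self, session_id: &str) -> Result<Vec<Message>> {
        let sessions = self.sessions.read().await;
        Ok(sessions.get(session_id).cloned().unwrap_or_default())
    }

    async fn save_message(&self, session_id: &str, message: Message) -> Result<()> {
        let mut sessions = self.sessions.write().await;
        sessions.entry(session_id.to_string()).or_default().push(message);
        Ok(())
    }
}

A Redis or PostgreSQL backend follows the same shape; only the two method bodies change.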

Performance Characteristics

Where AxonerAI shows advantages:

  • Binary Size: ~4.0MB stripped release binary. Entire framework, all dependencies, ready to run. Compare this to distributing Python applications with PyInstaller (50MB+) or requiring users to manage virtual environments.
  • Startup Time: Cold start in ~50ms. Hot start (cached binary) under 10ms. Critical for CLI tools where users expect instant response.
  • Memory Footprint: Base agent with tool registry consumes ~10MB RAM. Session history scales linearly with conversation length. A thousand concurrent agents fit comfortably in 16GB RAM with proper session management.
  • Concurrent Sessions: Tokio's async runtime handles thousands of concurrent agent sessions. No GIL contention. Each session gets its own task with proper CPU scheduling (sketched below).

These numbers matter for specific deployment scenarios: edge devices with limited resources, CLI tools users expect to feel instant, serverless environments with strict memory limits, or high-throughput agent orchestration.
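
To make the concurrency point concrete, here is a minimal sketch of the one-task-per-session pattern using plain tokio. The handle_session function is a placeholder for the agent loop; only the spawning pattern is the point.

use tokio::task::JoinSet;

// Placeholder for loading a session, running the agent loop, and persisting
// messages; the body is elided because it depends on the embedding application.
async fn handle_session(session_id: String) {
    let _ = session_id;
}

#[tokio::main]
async fn main() {
    let mut sessions = JoinSet::new();

    // One tokio task per session; the runtime multiplexes them across worker
    // threads with no GIL in the way.
    for i in 0..1_000 {
        sessions.spawn(handle_session(format!("session-{i}")));
    }

    // Wait for all sessions to finish, surfacing any panicked task.
    while let Some(result) = sessions.join_next().await {
        result.expect("session task panicked");
    }
}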

Real-World Applications

Here's where AxonerAI actually makes sense:

  • Hybrid Agent Architectures: This is the pattern I'm most excited about. Your main orchestration layer stays in Python, because let's be honest, Python's ecosystem for ML is unbeatable. But the data-intensive tools your agent calls? Those can be Rust. Think about an agent that needs to process logs, parse massive CSVs, scrape dozens of websites concurrently, or query databases at scale. Build those tools in Rust and expose them to the Python agent that orchestrates them (see the sketch after this list). Best of both worlds.
  • Embedded Systems: Agents running on hardware where you can't install Python. IoT devices, embedded controllers, edge gateways. The 4MB binary and 10MB RAM footprint make this feasible. Compile your agent directly into the device, no runtime dependencies, no OS-level Python installation required.
  • Data Processing Pipelines: Agents sometimes need to ingest, transform and analyze large datasets before reasoning over them. The combination of Rust's performance and zero-cost async I/O means your agent can handle data-intensive preprocessing without blocking on I/O operations.
  • Developer Tooling: CLI tools, terminal-based assistants, code analysis tools. Users shouldn't need to pip install anything. They download a binary, run it, and it works. Period. The instant startup time means these tools feel native, not like they're spinning up a Python interpreter every time.
  • Production Traffic at Scale: When you're serving real users—thousands of concurrent sessions, each maintaining context, each potentially invoking multiple tools—the GIL becomes a bottleneck. AxonerAI's async runtime doesn't have this problem. Each agent session runs on its own tokio task with true parallelism. This isn't theoretical—this is what happens when your agent platform goes from proof-of-concept to handling production load.
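
Here is one rough sketch of the hybrid pattern from the first bullet, exposing a Rust function to Python with PyO3 (assuming PyO3 0.21+ and a maturin build; the module name, function, and logic are made up for illustration):

use pyo3::prelude::*;

/// Count lines in a potentially huge log file; the kind of data-heavy helper
/// a Python agent can call as a tool.
#[pyfunction]
fn count_lines(path: String) -> PyResult<usize> {
    let text = std::fs::read_to_string(&path)?;
    Ok(text.lines().count())
}

/// The compiled extension module the Python side imports as `rust_tools`.
#[pymodule]
fn rust_tools(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(count_lines, m)?)?;
    Ok(())
}

On the Python side, the orchestrating agent simply imports rust_tools and registers count_lines as another tool; the heavy lifting stays in Rust.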

The common thread? Scenarios where Python's strengths (ecosystem, rapid development) don't outweigh Rust’s advantages (performance, memory safety, concurrency, distribution).

Distribution Model

AxonerAI is available through multiple channels:

Crates.io: The primary distribution channel for Rust developers. Published at crates.io/crates/axonerai.

As a Library:

[dependencies]
axonerai = "0.1"

Developers integrate agent capabilities directly into their applications. No separate runtime, no subprocess management, no FFI overhead. It's just another Rust dependency that compiles into your binary.
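
Roughly, embedding looks like the snippet below, assembled from the pieces shown earlier. The import path is an assumption, and the final wiring into the agent loop is left as a comment because the crate's actual entry point (constructor and run method names) lives in its examples and may differ:

use std::sync::Arc;

// Assumed import path; the individual types appear in the sections above.
use axonerai::{AnthropicProvider, Calculator, ToolRegistry};

#[tokio::main]
async fn main() {
    // Provider and tool registry, exactly as in the architecture sections.
    let api_key = std::env::var("ANTHROPIC_API_KEY").expect("API key not set");
    let provider = AnthropicProvider::new(api_key);

    let mut registry = ToolRegistry::new();
    registry.register(Arc::new(Calculator::new()));

    // Placeholder: hand the provider and registry to the agent loop.
    // let agent = Agent::new(provider, registry);
    // let reply = agent.run("session-1", "What is 17 * 42?").await;

    let _ = provider; // keep the sketch warning-free
}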

As a Binary:

cargo install axonerai

For end users with Rust installed, this gives you a CLI tool. Single command installation, works across platforms, automatic version management through cargo.

GitHub: Source code, documentation, and platform-specific binaries at github.com/Manojython/axonerai. For users who don't have Rust installed, pre-built binaries for macOS, Linux, and Windows are available in releases. Download, run, done.

When Python Still Makes Sense

Let's be clear: most ML workflows should stay in Python. The ecosystem is unmatched—transformers, PyTorch, NumPy, pandas, scikit-learn. If you're doing model training, experimentation, deploying real-time models or rapid prototyping, Python is the right choice. Period.

AxonerAI isn't trying to replace Python for these use cases. It's targeting the gaps: standalone distribution, embedded deployment, edge AI and resource-constrained environments. Think of it as complementary, not competitive.

Open Source and Contributions

AxonerAI is MIT licensed and available on GitHub. The architecture is built for extensibility:

  • New LLM providers: Implement the Provider trait
  • Custom tools: Implement the Tool trait
  • Storage backends: Implement the SessionManager trait

The crate includes examples for common patterns: basic usage, custom tools, multi-session management, and provider switching. Want to add a new provider? Check the existing implementations—they're clean and straightforward.

Conclusion

Building AxonerAI showed me there's real room in the AI agent ecosystem for alternatives to Python—not because Python frameworks are bad, but because different deployment scenarios have fundamentally different requirements.

When you need standalone binaries, sub-second startup, thousands of concurrent sessions, or agents embedded directly in applications, Rust's properties are a game-changer. AxonerAI delivers on these requirements while maintaining the developer ergonomics we expect from modern frameworks. As the AI agent space matures and deployment contexts diversify, having tools optimized for specific use cases strengthens the ecosystem for everyone.

Try AxonerAI: cargo install axonerai

Written by mnjkshrm_86h0lvqo | Enjoy dabbling in LLMs
Published by HackerNoon on 2025/11/26