The Raspberry Pi has always been a go-to for developers, and lately I’ve been testing exactly how far I can push my 4GB Pi 4. Instead of just running standard benchmarks, I wanted to build an actual application to get a real feel for how the hardware and models perform when handling data in real time.
I’m a big fan of technical blogs, but I’ve found that most recommendations just chase the latest trends rather than focusing on the specific topics I actually care about. To solve that, I built a Curator App. It uses Playwright to browse a handful of whitelisted sites, validates the articles, and then uses a local LLM to check if the content actually matches my taste. While my original goal was just to build a solid pipeline to test the Pi’s limits, it ended up turning into a tool that I now use every single day.
Hosting LLMs on Raspberry Pi
There are multiple ways to do this on the Pi; I went with Ollama. Installing the Ollama binary on the Pi takes a single command.
Installing Ollama
curl -fsSL https://ollama.com/install.sh | sh
Once I had it installed, all I had to do was pull some models. I started with functionGemma, a 270M-parameter version of Gemma that supports tool calling. It is lightweight and a good starting point for experimentation.
Testing Inference
You can run the following command and actually chat with the model in the terminal.
ollama run functiongemma
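Since the rest of the project runs on Node.js, it also helps to call the model programmatically. Ollama exposes a local HTTP API (port 11434 by default), and a minimal generation call looks roughly like this; the prompt here is just an example:

```ts
// Minimal call to Ollama's local HTTP API (it listens on port 11434 by default).
// Assumes Node 18+, which ships with a global fetch.
async function generate(prompt: string): Promise<string> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "functiongemma",
      prompt,
      stream: false, // return one JSON object instead of a token stream
    }),
  });
  const data = (await res.json()) as { response: string };
  return data.response;
}

generate("In one sentence, why do small models suit a Raspberry Pi?").then(console.log);
```

Setting stream to false keeps the example simple; on slow hardware, streaming the tokens back makes the wait feel shorter.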
Basic App to use the models
The logic behind the app was straightforward: grab stories from the web, categorize them, store them in a database, and send me a notification. Since the functiongemma model doesn’t have a built-in web search like the big players (Gemini or Claude), I set up Playwright with Node.js to scrape a specific list of whitelisted sites. Once the script gathered the links and metadata, the LLM would judge which ones were worth my time, sending a shortlist of the top five directly to my phone via Pushover.
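To give a rough idea of that pipeline, here is a trimmed-down sketch of the scraping and notification steps. The site list, the link filter, and the PUSHOVER_* environment variables are placeholders rather than my exact setup, and the LLM ranking step is left out:

```ts
import { chromium } from "playwright";

// Placeholder whitelist; the real list is a handful of blogs I actually follow.
const SITES = ["https://example-blog-one.com", "https://example-blog-two.com"];

// Collect candidate links and titles from each whitelisted site.
async function collectLinks(): Promise<{ title: string; href: string }[]> {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  const found: { title: string; href: string }[] = [];

  for (const site of SITES) {
    await page.goto(site, { waitUntil: "domcontentloaded" });
    const links = await page.$$eval("a[href]", (els) =>
      els.map((a) => ({
        title: (a.textContent ?? "").trim(),
        href: (a as HTMLAnchorElement).href,
      }))
    );
    // Crude filter: drop nav links, keep anchors that look like headlines.
    found.push(...links.filter((l) => l.title.length > 20));
  }

  await browser.close();
  return found;
}

// Send the shortlist to my phone through Pushover's message API.
async function notify(message: string): Promise<void> {
  await fetch("https://api.pushover.net/1/messages.json", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      token: process.env.PUSHOVER_TOKEN, // app token, assumed env var
      user: process.env.PUSHOVER_USER,   // user key, assumed env var
      message,
    }),
  });
}
```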
Technically, it worked, but it wasn’t a great experience yet. I ran into a few frustrating roadblocks:
- Manual Effort: I had to manually SSH into the Pi every time I wanted to trigger a run.
- Poor Filtering: The model was being a bit too picky and ignoring some really solid articles.
- Zero Visibility: I had no way to look back at the history of articles the script had already visited.
- No Feedback: There was no way for me to “coach” the LLM or correct it when it made a bad call.
MVP version
To solve those early issues, I rebuilt the CLI as a proper web server using SQLite for storage. I also switched to the Qwen2.5:0.5b model, which has been a much better fit for this project. For the UI, I kept things lean with a plain HTML dashboard. I didn’t want to bloat the app with a massive node_modules folder by using React. I also built in a feedback loop so I can “teach” the model my preferences; the LLM uses my ratings and topic scores to make better decisions on the next batch of articles.
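As a rough illustration of the feedback loop, here is a simplified version of the storage and of the preference summary that gets prepended to the ranking prompt. The schema and the better-sqlite3 dependency are illustrative; the real tables track more fields:

```ts
import Database from "better-sqlite3";

const db = new Database("curator.db");

// Simplified schema: one table for articles plus the rating I give each one.
db.exec(`
  CREATE TABLE IF NOT EXISTS articles (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    url TEXT UNIQUE NOT NULL,
    topic TEXT,
    rating INTEGER -- feedback from the dashboard, e.g. 1 = liked, -1 = skip
  );
`);

// Turn past ratings into a short preference summary for the next LLM run.
function preferenceContext(): string {
  const rows = db
    .prepare(
      `SELECT topic, AVG(rating) AS score, COUNT(*) AS n
       FROM articles
       WHERE rating IS NOT NULL
       GROUP BY topic
       ORDER BY score DESC`
    )
    .all() as { topic: string; score: number; n: number }[];

  return rows
    .map((r) => `${r.topic}: average rating ${r.score.toFixed(2)} across ${r.n} articles`)
    .join("\n");
}

// The summary string gets prepended to the ranking prompt, so the model sees
// which topics I have upvoted before it judges the new batch.
```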
For automation, I’m using PM2 to manage the server and ngrok to give the Pi a persistent dev domain I can access from anywhere. I even added a cron job so the discovery process kicks off automatically twice a day.
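One straightforward way to wire this up (a sketch of the idea, not necessarily my exact code) is to expose a single trigger route on the server, so the cron job can simply curl it and a manual trigger from the dashboard can reuse the same code path. The /discover route, port 3000, and the runDiscovery stub are all placeholders:

```ts
import http from "node:http";

// Stand-in for the real scrape -> rank -> notify pipeline.
async function runDiscovery(): Promise<void> {
  console.log("running discovery...");
}

// The twice-daily cron job (e.g. a curl in crontab) and a manual trigger
// from the dashboard can both hit this one route.
http
  .createServer((req, res) => {
    if (req.method === "POST" && req.url === "/discover") {
      // Fire and forget so the HTTP call returns immediately.
      runDiscovery().catch(console.error);
      res.writeHead(202);
      res.end("discovery started");
      return;
    }
    res.writeHead(404);
    res.end();
  })
  .listen(3000);
```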
Now, the moment the Pi boots up:
- ngrok starts and links the server to my domain.
- Ollama initializes to handle the model inference.
- PM2 handles the build and ensures the server is running.
The result is a 24/7 curation service that sends me five targeted stories every day. I can jump into the dashboard at any time to trigger a manual fetch, give feedback, or even generate a quick TLDR summary of an article. It’s all running smoothly on just 4GB of RAM with zero hosting or API costs.
Pushing it further
Next, I tried replacing Qwen2.5:0.5b with Qwen3:1.7b, but the hardware couldn’t deliver enough performance to keep the experience snappy. For inference with smaller token counts, though, it worked well.
Final Thoughts
Building this project taught me that you don’t need a massive server or expensive cloud APIs to run a practical, daily AI agent. My 4GB Pi 4 has transformed from a simple hobbyist board into a 24/7 personal assistant that filters the noise of the internet for me, all without costing a cent in hosting or inference fees.
While larger models like the 1.7b version started to push the hardware’s limits, finding the “sweet spot” with lightweight models like Qwen2.5:0.5b keeps the experience snappy and reliable for a single-user setup. There is something incredibly satisfying about having a completely private, self-hosted system that lives on my desk and actually gets smarter the more I interact with it. It’s proof that with the right pipeline, even “modest” hardware can deliver a high-end AI experience.
