I Built a Tool for Mobile and Computer Operator Using Local and Remote LLMs

Written by mkagenius | Published 2025/02/04
Tech Story Tags: open-source | openai | automation | android | ollama | openai-operator | open-source-tools | remote-llm-use-cases

TL;DR: The open-sourced tool can be found [here](https://github.com/BandarLabs/clickclickclick). If you are a developer, drop a star!

It was the dawn of the AI age when, all of a sudden, new capabilities emerged from computer code, the same 0s and 1s that were once used to create drawings with Turtle, if you are old enough to have used it. Now, we can control phones just by talking to them!

Click3

Claude Computer Use and OpenAI Operator

Both are expensive, restrictive, and not privacy-friendly, so I decided to build on local models while still letting people use well-known closed-source models if they prefer them.

Demos

https://youtu.be/zrCH_oKoZXI?embedable=true

In the video above, I asked the tool to search for bus stops at a certain location, and it found them. You can run the tool from the command line or through the web interface built with Gradio.

https://youtu.be/Iej0rw7NS-w?embedable=true

Above, I asked the tool to start a 3+2 chess game on Lichess. It opened the Lichess app and then clicked on the 3+2 game.

Architecture

The architecture is divided into three main modules: Planner, Finder, and Executor.

Planner: Creates a plan of action

Finder: Finds UI bounds of elements

Executor: Scrolls, clicks, navigates, etc.
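
To make the division of labor concrete, here is a minimal sketch of how three such modules could fit together. It is an illustration only, not the tool's actual code: the class and method names (`Step`, `plan`, `find`, `execute`, `run`) are hypothetical, the Planner returns a fixed plan instead of calling an LLM, the Finder looks up bounds in a dictionary instead of querying a vision model, and the Executor only records what it would do instead of driving a device.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (left, top, right, bottom) in screen pixels; an assumed convention.
Bounds = Tuple[int, int, int, int]


@dataclass
class Step:
    action: str  # e.g. "click" or "scroll"
    target: str  # natural-language description of a UI element


class Planner:
    """Creates a plan of action. A real Planner would ask an LLM."""

    def plan(self, task: str) -> List[Step]:
        # Hypothetical fixed plan, standing in for a model response.
        return [Step("click", "search box"), Step("click", "search button")]


class Finder:
    """Finds UI bounds of elements. A real Finder would query a vision model."""

    def __init__(self, known_elements: Dict[str, Bounds]):
        self.known_elements = known_elements

    def find(self, target: str) -> Bounds:
        return self.known_elements[target]


@dataclass
class Executor:
    """Performs actions. Here it only logs the action and the tap point."""

    log: List[Tuple[str, Tuple[int, int]]] = field(default_factory=list)

    def execute(self, step: Step, bounds: Bounds) -> None:
        left, top, right, bottom = bounds
        center = ((left + right) // 2, (top + bottom) // 2)
        self.log.append((step.action, center))


def run(task: str, planner: Planner, finder: Finder, executor: Executor):
    """Plan the task, locate each target, then act on it."""
    for step in planner.plan(task):
        bounds = finder.find(step.target)
        executor.execute(step, bounds)
    return executor.log


if __name__ == "__main__":
    finder = Finder({"search box": (0, 0, 100, 40),
                     "search button": (100, 0, 140, 40)})
    print(run("find bus stops", Planner(), finder, Executor()))
```

Keeping the modules behind small interfaces like these is what would let a local model back one module while a closed-source model backs another.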

Either the Planner or the Finder can use a local model (Molmo via mlx-vlm) or a closed-source model. So far, the recommendations are as follows:

Use Cases

You can use this to create walkthrough overlays over any app.

You could also automate things like filtering your matches on Tinder by having the tool auto-swipe based on a feature you tell it to look for.

Conclusion

For now, only the Finder uses structured output. Soon, the Planner will also be driven by an open-source model; I am waiting for one of the open-source models to implement function/tool calling.
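
As an illustration of what structured output can mean for a Finder, the sketch below assumes the model is asked to reply with a JSON object describing a bounding box. The field names (`left`, `top`, `right`, `bottom`) and the helper `parse_bounds` are hypothetical and are not taken from the tool itself; the point is only that a fixed schema lets the reply be validated before anything is clicked.

```python
import json
from typing import Tuple


def parse_bounds(raw: str) -> Tuple[int, int, int, int]:
    """Parse and validate a JSON bounding box returned by a model.

    Assumes a reply shaped like:
    {"left": 10, "top": 20, "right": 110, "bottom": 60}
    """
    data = json.loads(raw)
    # A missing key raises KeyError; a non-numeric value raises ValueError.
    left, top, right, bottom = (
        int(data[key]) for key in ("left", "top", "right", "bottom")
    )
    if left >= right or top >= bottom:
        raise ValueError("bounding box has no area")
    return left, top, right, bottom


if __name__ == "__main__":
    reply = '{"left": 10, "top": 20, "right": 110, "bottom": 60}'
    print(parse_bounds(reply))
```

Rejecting malformed or empty boxes up front means a downstream click is only attempted on coordinates that at least make geometric sense.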

Tool - https://github.com/BandarLabs/clickclickclick


Published by HackerNoon on 2025/02/04