A tiny LLM living on my Android phone. Requests are routed through a backend proxy to a Cloudflare Tunnel, which reaches Termux + llama.cpp on the device.
Technical details
- Model: `llama-3.2-3b-instruct.q4km.gguf`
- Runtime: llama.cpp / llama-server
- Device: Android phone running Termux
- Tunnel: Cloudflare Tunnel
- API: OpenAI-compatible `/v1/chat/completions`
- API key never touches the browser; requests are proxied through a Cloudflare Pages Function.
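The phone-side half of this stack can be sketched in a couple of Termux commands. This is a sketch, not the exact setup: the model path and port are assumptions, and it uses a Cloudflare quick tunnel (a named tunnel works similarly).

```shell
# In Termux: serve the model locally with llama-server (from llama.cpp).
# Model filename matches the one listed above; port 8080 is an assumption.
./llama-server -m llama-3.2-3b-instruct.q4km.gguf --host 127.0.0.1 --port 8080

# In a second Termux session: expose the local server via Cloudflare Tunnel.
# cloudflared prints a public https URL that forwards to the phone.
cloudflared tunnel --url http://127.0.0.1:8080
```

The backend proxy then forwards browser requests to the tunnel URL, keeping the key out of client-side code.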
Runs on a real Android phone, so replies may be slower than cloud AI.
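Because the endpoint is OpenAI-compatible, any OpenAI-style client can talk to it. A minimal curl sketch, where `https://example.pages.dev` is a placeholder for the real proxy URL and the model name is illustrative:

```shell
# POST an OpenAI-format chat request through the Pages Function proxy.
# The Function attaches the API key server-side, so no key appears here.
curl https://example.pages.dev/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.2-3b-instruct",
    "messages": [{"role": "user", "content": "Hello from the browser!"}]
  }'
```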