Building a local-first AI agent that genuinely never phones home
Most "local AI" Mac apps cheat. They run a model locally for the chat surface but ship telemetry, account systems, and crash reporters that quietly stream usage data home. Aguacatech is built on the constraint that this is unacceptable. Here is what that looks like in practice, and the four architectural choices that made it possible.
Constraint #1: no cloud, ever, even for "convenience"
The first decision was that the app would not ship a single network client of its own. No analytics SDK. No Sentry. No "anonymized usage statistics." No remote feature flags. No update-check pings. Even the license key never round-trips to a server: activation is verified locally, from the format of the key itself.
This was a hard rule, not a goal, and hard rules make later architecture decisions easier. When I needed embeddings for the local memory store, "let's use OpenAI's embeddings API" was off the table by definition, so I went looking for Apple's bundled NLEmbedding instead. When I wanted speech recognition for Voice mode, "let's bundle whisper.cpp in the app" was on the table (it runs locally), but "let's call Apple's SFSpeechRecognizer" was simpler and, with on-device recognition required, also runs locally, so it won.
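Running on-device is not SFSpeechRecognizer's default. Unless the request opts out, recognition may be sent to Apple's servers. Below is a minimal sketch of a request that cannot fall back to the network. The function name and the en-US default are illustrative. supportsOnDeviceRecognition and requiresOnDeviceRecognition are the Speech framework's own properties (macOS 10.15+).

```swift
import Speech

/// Hypothetical helper: builds a recognizer + request pair that may not use
/// Apple's servers. Returns nil when this Mac has no on-device model for the locale.
func makeOnDeviceRecognition(
    locale: Locale = Locale(identifier: "en-US")
) -> (recognizer: SFSpeechRecognizer, request: SFSpeechAudioBufferRecognitionRequest)? {
    guard let recognizer = SFSpeechRecognizer(locale: locale),
          recognizer.supportsOnDeviceRecognition else {
        return nil  // refuse rather than silently using server-side recognition
    }
    let request = SFSpeechAudioBufferRecognitionRequest()
    request.requiresOnDeviceRecognition = true  // audio must never leave the machine
    request.shouldReportPartialResults = true   // stream words while the user speaks
    return (recognizer, request)
}
```

In use, the request is fed audio buffers with append(_:), for example from an AVAudioEngine input tap. It is then started with recognizer.recognitionTask(with:resultHandler:), after the user has granted permission through SFSpeechRecognizer.requestAuthorization(_:).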
Constraint #2: talk to LM Studio over its OpenAI shim
I did not want to embed a model runner inside the app. Two reasons:
- Trust. If Aguacatech bundled llama.cpp, every model file would become the app's responsibility: updates, quantizations, hardware-specific kernels. That is a second product to maintain, and LM Studio and Ollama already do it well.
- RAM. A model is usually 4–20 GB resident. If it lived in-process, every relaunch of Aguacatech would reload it. With an external runtime, the model stays warm across app relaunches.
The architecture is then trivially small. Aguacatech is a SwiftUI app that talks to http://localhost:1234/v1/chat/completions, LM Studio's default endpoint. The same client also works against Ollama (base URL http://localhost:11434/v1) and mlx-server (base URL http://localhost:8080/v1), which expose the same API. Streaming is implemented with URLSession.bytes(for:), which reads the Server-Sent Events "data:" lines as they arrive.
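// Runs inside the task that feeds an AsyncThrowingStream<StreamDelta, Error>.
let (bytes, response) = try await session.bytes(for: request)
guard let http = response as? HTTPURLResponse, (200..<300).contains(http.statusCode) else {
    throw URLError(.badServerResponse)  // an error body is JSON, not SSE; fail loudly
}
let decoder = JSONDecoder()  // reuse one decoder for every chunk
for try await line in bytes.lines {
    // SSE frames arrive as "data: {...}"; skip blank separators and anything else.
    guard line.hasPrefix("data: ") else { continue }
    let payload = line.dropFirst(6)
    if payload == "[DONE]" { break }  // OpenAI-style end-of-stream sentinel
    let delta = try decoder.decode(StreamDelta.self, from: Data(payload.utf8))
    continuation.yield(delta)
}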
That's the whole streaming reader. The OpenAI shim is now the lingua franca of local-LLM tooling: every runner speaks it, and so does every prompt-engineering frontend. Build to the standard and you get a whole ecosystem for free.
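The reader assumes two things the post doesn't show: a request that asks for a stream, and a StreamDelta type. Below is a minimal sketch of both, assuming the OpenAI chat-completions shape that these runners accept. The model name, the ChatMessage and ChatRequest types, and this particular StreamDelta layout are illustrative assumptions, not Aguacatech's code.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on Linux
#endif

/// One chat message in the OpenAI format (hypothetical minimal shape).
struct ChatMessage: Codable {
    let role: String  // "system", "user", "assistant" or "tool"
    let content: String
}

/// Request body; `stream: true` makes the server answer with SSE "data:" lines.
struct ChatRequest: Encodable {
    let model: String
    let messages: [ChatMessage]
    let stream: Bool
}

/// One streamed chunk: the new text sits at choices[0].delta.content.
struct StreamDelta: Decodable {
    struct Choice: Decodable {
        struct Delta: Decodable { let content: String? }
        let delta: Delta
    }
    let choices: [Choice]

    /// The text fragment in this chunk, or "" for role-only or empty chunks.
    var text: String { choices.first?.delta.content ?? "" }
}

/// Builds a streaming POST to <baseURL>/chat/completions.
func makeStreamingRequest(baseURL: URL, model: String, messages: [ChatMessage]) throws -> URLRequest {
    var request = URLRequest(url: baseURL.appendingPathComponent("chat/completions"))
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.setValue("text/event-stream", forHTTPHeaderField: "Accept")
    let body = ChatRequest(model: model, messages: messages, stream: true)
    request.httpBody = try JSONEncoder().encode(body)
    return request
}
```

Only the base URL differs between runners. Switching from LM Studio to Ollama is therefore just a matter of passing http://localhost:11434/v1 instead.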
Constraint #3: tools, not magic
When the agent needs to read a file, it calls read_file(path). When it needs to drive Calendar, it calls run_applescript(script). There is no special framework, no "agent autonomy", no implicit capability. Every tool is a value of a single Swift struct type:
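// `nonisolated` opts the type out of the target's default MainActor isolation (Swift 6.2).
nonisolated struct ChatTool {
    let schema: ChatToolSchema                          // name, description, JSON Schema for the arguments
    let run: @Sendable (String) async throws -> String  // JSON arguments in, result text out
}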
With the OpenAI tool-call wire format, the model produces structured JSON arguments that map directly onto each tool's schema, and run receives them as a JSON string. AgentRunner dispatches the call, awaits the result, appends it as a tool-role message, and loops for at most 8 iterations.
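A tool has to get from the JSON string that run receives to typed arguments. One way is a small generic wrapper that decodes a Decodable argument struct before calling the tool body. This is a sketch, not Aguacatech's code. typedTool, ReadFileArgs and the fields of ChatToolSchema are hypothetical, and ChatTool is restated without nonisolated so the block compiles on its own.

```swift
import Foundation

/// Hypothetical stand-in for the post's ChatToolSchema.
struct ChatToolSchema {
    let name: String
    let description: String
    let parametersJSON: String  // JSON Schema sent to the model in the `tools` array
}

struct ChatTool {
    let schema: ChatToolSchema
    let run: @Sendable (String) async throws -> String
}

/// Wraps a body that takes a typed argument struct, so the tool sees Swift
/// types instead of the raw JSON string the model emitted.
func typedTool<Args: Decodable>(
    _ schema: ChatToolSchema,
    body: @escaping @Sendable (Args) async throws -> String
) -> ChatTool {
    ChatTool(schema: schema) { rawJSON in
        // Malformed or incomplete arguments throw here, before the tool runs.
        let args = try JSONDecoder().decode(Args.self, from: Data(rawJSON.utf8))
        return try await body(args)
    }
}

struct ReadFileArgs: Decodable { let path: String }

let readFile = typedTool(
    ChatToolSchema(
        name: "read_file",
        description: "Read a UTF-8 text file from disk.",
        parametersJSON: #"{"type":"object","properties":{"path":{"type":"string"}},"required":["path"]}"#
    )
) { (args: ReadFileArgs) in
    try String(contentsOfFile: args.path, encoding: .utf8)
}
```

When the model sends a malformed argument object, the decode throws. The agent loop can then return that to the model as an error result instead of crashing.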
This is mundane code, and that is the point. The "agent" is not magic; it is a loop. The intelligence is in the model.
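A loop of this shape fits in a few dozen lines. The types here (ModelTurn, ToolCall, Message) and the synchronous model and tool closures are simplifications for illustration. Only the 8-iteration cap and the tool-role messages come from the description of AgentRunner.

```swift
struct ToolCall {
    let id: String
    let name: String
    let arguments: String  // JSON string, as emitted by the model
}

struct Message {
    let role: String                // "user", "assistant" or "tool"
    let content: String
    var toolCallID: String? = nil   // set on tool-role messages
    var toolCalls: [ToolCall] = []  // set on assistant messages that request tools
}

/// What one model call returns: tool calls to run, or the final answer.
enum ModelTurn {
    case toolCalls([ToolCall])
    case final(String)
}

enum AgentError: Error { case iterationLimit }

/// Ask the model, run requested tools, feed results back, repeat (at most maxIterations times).
func runAgent(
    messages initial: [Message],
    maxIterations: Int = 8,
    model: ([Message]) throws -> ModelTurn,
    tools: [String: (String) throws -> String]
) throws -> String {
    var messages = initial
    for _ in 0..<maxIterations {
        switch try model(messages) {
        case .final(let answer):
            return answer
        case .toolCalls(let calls):
            // The assistant turn that requested the tools precedes their results.
            messages.append(Message(role: "assistant", content: "", toolCalls: calls))
            for call in calls {
                let result: String
                if let tool = tools[call.name] {
                    do { result = try tool(call.arguments) } catch { result = "error: \(error)" }
                } else {
                    result = "error: unknown tool \(call.name)"
                }
                // Failures go back to the model as text so it can adapt.
                messages.append(Message(role: "tool", content: result, toolCallID: call.id))
            }
        }
    }
    throw AgentError.iterationLimit
}
```

Tool failures become tool-role messages rather than thrown errors. The model can see them and react to them.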
Constraint #4: approval gates with unified diffs
The first version of the file-write tool wrote files directly. The first time the agent rewrote a config file I had open in another editor, I knew that was wrong.
Every destructive tool now pauses the agent loop and surfaces an approval card in the chat. write_file renders a unified diff between the on-disk content and the proposed write. run_applescript shows the literal script. run_shortcut shows the shortcut name. run_shell_command shows the literal command and cwd. MCP tools flagged destructive show their argument JSON.
The approval card has three buttons: Allow once, Always allow this tool, and Deny. Always-allow is per-tool (not per tool-and-arguments), persists in UserDefaults, and can be wiped by toggling the approval system off and back on. Deny returns an error to the model, which can then adapt and try something else instead of stalling.
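The three buttons map onto a small piece of persistent state. Below is a minimal sketch, assuming the always-allow list is stored as a string array in UserDefaults. The ApprovalStore type, the alwaysAllowedTools key and the error wording are all hypothetical.

```swift
import Foundation

enum ApprovalChoice { case allowOnce, alwaysAllow, deny }

/// Hypothetical per-tool approval store. "Always allow" is keyed by tool
/// name only, never by arguments.
final class ApprovalStore {
    private let defaults: UserDefaults
    private let key = "alwaysAllowedTools"  // assumed key name

    init(defaults: UserDefaults = .standard) { self.defaults = defaults }

    /// Checked before showing the card: a granted tool skips approval.
    func isAlwaysAllowed(_ tool: String) -> Bool {
        defaults.stringArray(forKey: key)?.contains(tool) ?? false
    }

    /// Returns nil to run the tool, or an error string to hand back to the model.
    func record(_ choice: ApprovalChoice, for tool: String) -> String? {
        switch choice {
        case .allowOnce:
            return nil
        case .alwaysAllow:
            var allowed = Set(defaults.stringArray(forKey: key) ?? [])
            allowed.insert(tool)
            defaults.set(allowed.sorted(), forKey: key)
            return nil
        case .deny:
            return "error: the user denied \(tool). Try a different approach."
        }
    }

    /// Toggling the approval system off and back on wipes every grant.
    func reset() { defaults.removeObject(forKey: key) }
}
```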
One small detail that took longer than it should have: the diff for write_file has to be computed against the file as it is right now, not as it was when the model decided to write. Local files mutate between turns. The approval card recomputes the diff each time it is shown.
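Here is a sketch of the recompute-on-display idea: read the file at the moment the card is shown, then diff it against the proposed content. The diff is a plain longest-common-subsequence line diff with unified-diff markers. It has no @@ hunk headers and no context trimming. lineDiff and approvalDiff are hypothetical names, not Aguacatech's implementation.

```swift
import Foundation

/// Splits text into lines; an empty file has no lines at all.
func lines(_ text: String) -> [String] {
    text.isEmpty ? [] : text.split(separator: "\n", omittingEmptySubsequences: false).map(String.init)
}

/// Whole-file line diff: " " unchanged, "-" removed, "+" added.
func lineDiff(old: String, new: String) -> String {
    let a = lines(old), b = lines(new)
    // lcs[i][j] = length of the longest common subsequence of a[i...] and b[j...]
    var lcs = Array(repeating: Array(repeating: 0, count: b.count + 1), count: a.count + 1)
    for i in stride(from: a.count - 1, through: 0, by: -1) {
        for j in stride(from: b.count - 1, through: 0, by: -1) {
            lcs[i][j] = a[i] == b[j] ? lcs[i + 1][j + 1] + 1 : max(lcs[i + 1][j], lcs[i][j + 1])
        }
    }
    // Walk the table; on ties emit the deletion first, so a change reads "-old" then "+new".
    var out: [String] = []
    var i = 0, j = 0
    while i < a.count, j < b.count {
        if a[i] == b[j] {
            out.append(" " + a[i]); i += 1; j += 1
        } else if lcs[i + 1][j] >= lcs[i][j + 1] {
            out.append("-" + a[i]); i += 1
        } else {
            out.append("+" + b[j]); j += 1
        }
    }
    out += a[i...].map { "-" + $0 }
    out += b[j...].map { "+" + $0 }
    return out.joined(separator: "\n")
}

/// Called every time the approval card is shown, so the diff reflects the file as it is now.
func approvalDiff(path: String, proposed: String) -> String {
    // A missing or unreadable file is treated as empty, so a new file shows as all "+".
    let current = (try? String(contentsOfFile: path, encoding: .utf8)) ?? ""
    return lineDiff(old: current, new: proposed)
}
```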
The Quick Actions trick: double-tap Option
One of the most-used surfaces is Quick Actions, a floating panel that reads your clipboard, runs an LLM transformation, and copies the result back. The trigger is double-tap Option, anywhere in macOS.
Implementing this without conflicting with system shortcuts was harder than expected. Modifier-plus-letter combinations are taken by half the apps on a developer's Mac. A pure-modifier gesture (tapping just ⌘ or just ⌥) instead requires listening for .flagsChanged events with both a global and a local NSEvent monitor, and tracking the time between presses:
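import AppKit

// Keep the returned tokens; pass them to NSEvent.removeMonitor(_:) on teardown.
let globalMonitor = NSEvent.addGlobalMonitorForEvents(matching: .flagsChanged) { event in
    handleFlagsChanged(event)  // events sent to other apps (needs Accessibility)
}
let localMonitor = NSEvent.addLocalMonitorForEvents(matching: .flagsChanged) { event in
    handleFlagsChanged(event)  // events sent to Aguacatech's own windows
    return event               // pass the event through unchanged
}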
The handler detects the release of Option (so a tap, not a hold) with no other modifier engaged, within ~350 ms of the previous Option release. Two taps in that window fire the trigger. Adding any other modifier mid-sequence resets it, so ⌥⌘, ⌥⇧ and ⌥⌃ all pass through cleanly to whatever app owns them.
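The timing logic is easiest to test when it is kept apart from AppKit. Below is a minimal sketch of such a state machine. It assumes handleFlagsChanged reduces each event to three values: a timestamp, whether ⌥ is down, and whether any other modifier is down. FlagsSample, DoubleTapDetector and that reduction are hypothetical.

```swift
/// What the detector needs from one flagsChanged event (hypothetical reduction).
struct FlagsSample {
    let timestamp: Double     // seconds, e.g. from NSEvent.timestamp
    let option: Bool          // ⌥ currently down
    let otherModifiers: Bool  // ⌘, ⇧ or ⌃ currently down
}

/// Fires on the second clean ⌥ release within `window` seconds of the first.
struct DoubleTapDetector {
    var window: Double = 0.35
    private var optionWasDown = false
    private var tainted = false  // another modifier joined this sequence
    private var lastCleanRelease: Double?

    /// Feed every flagsChanged sample; returns true when a double tap completes.
    mutating func handle(_ s: FlagsSample) -> Bool {
        defer { optionWasDown = s.option }
        if s.otherModifiers {
            // A chord (⌥⌘, ⌥⇧, ⌥⌃) resets the sequence and belongs to the front app.
            tainted = true
            lastCleanRelease = nil
            return false
        }
        guard !s.option else { return false }  // ⌥ pressed or still held: wait
        // No modifiers are down now, so this sample ends the current press.
        let wasTainted = tainted
        tainted = false
        guard optionWasDown, !wasTainted else { return false }  // not a clean ⌥ release
        if let last = lastCleanRelease, s.timestamp - last <= window {
            lastCleanRelease = nil  // consume both taps
            return true
        }
        lastCleanRelease = s.timestamp  // first tap: start the window
        return false
    }
}
```

In the real handler, a sample could be built from event.timestamp and event.modifierFlags, checking .option against .command, .shift and .control.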
The global monitor requires Accessibility permission. The local monitor works without it, so even before the user grants Accessibility, the gesture works inside Aguacatech's own windows. This makes the unhappy path tolerable.
(There was an earlier ⌘⇧A binding via the Carbon Hot Key API. It crashed with a MainActor isolation trap and the UX was worse anyway. We left HotkeyManager in the source tree for future use, with all dispatch routed through MainActor.assumeIsolated, but it is not wired to anything user-visible.)
What MCP buys you, and what it doesn't
Aguacatech is a Model Context Protocol client. MCP is an open protocol, and because Aguacatech speaks the spec, any compliant MCP server can join the agent's tool registry as mcp__<server>__<tool>. Servers run as local subprocesses (stdio) or as remote HTTP endpoints. The handshake is small: initialize, the notifications/initialized notification, tools/list, and optionally resources/list.
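On the wire, that handshake is JSON-RPC 2.0, and over stdio each message is a single line of JSON. Below is a minimal sketch of the client's opening messages and the registry naming. The protocol version string, the clientInfo values and the helper names are assumptions. A real client also waits for the initialize response before sending the initialized notification.

```swift
import Foundation

/// JSON-RPC 2.0 envelope. A nil `id` (a notification) or nil `params` is omitted.
struct RPCMessage<Params: Encodable>: Encodable {
    var jsonrpc = "2.0"
    var id: Int? = nil
    var method: String
    var params: Params? = nil
}

struct InitializeParams: Encodable {
    struct ClientInfo: Encodable { let name: String; let version: String }
    let protocolVersion: String
    let capabilities: [String: String]  // empty object: no optional client features
    let clientInfo: ClientInfo
}

struct NoParams: Encodable {}

/// One message per line: compact JSONEncoder output contains no raw newlines.
func frame<P: Encodable>(_ message: RPCMessage<P>) throws -> Data {
    var data = try JSONEncoder().encode(message)
    data.append(0x0A)  // "\n"
    return data
}

/// Client side of the opening exchange, in send order.
func handshakeFrames(protocolVersion: String = "2025-06-18") throws -> [Data] {
    let initialize = RPCMessage(id: 1, method: "initialize", params: InitializeParams(
        protocolVersion: protocolVersion, capabilities: [:],
        clientInfo: .init(name: "aguacatech", version: "1.0")))
    let initialized = RPCMessage<NoParams>(method: "notifications/initialized")
    let listTools = RPCMessage<NoParams>(id: 2, method: "tools/list")
    return [try frame(initialize), try frame(initialized), try frame(listTools)]
}

/// Registry naming; assumes server names never contain "__".
func registryName(server: String, tool: String) -> String { "mcp__\(server)__\(tool)" }

func splitRegistryName(_ name: String) -> (server: String, tool: String)? {
    guard name.hasPrefix("mcp__") else { return nil }
    let rest = name.dropFirst(5)
    guard let sep = rest.range(of: "__") else { return nil }
    return (String(rest[..<sep.lowerBound]), String(rest[sep.upperBound...]))
}
```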
What this means in practice: when someone publishes an MCP server for Linear, or Postgres, or browser automation, Aguacatech inherits it. The Power-tier user adds a server config and the tools appear in the next chat. No app update needed.
What MCP doesn't buy you: free safety. MCP servers run with Aguacatech's privileges. Approval gates apply, but the user still has to trust the server's code. We do not yet sandbox stdio servers; that is on the roadmap, alongside Keychain-backed storage for env-var tokens.
Persistence and the part where the user owns the data
- Conversations: JSON at ~/Library/Application Support/aguacatech/conversations.json.
- Logs: JSONL at ~/Library/Application Support/aguacatech/logs.jsonl.
- Memory: SQLite at ~/Library/Application Support/aguacatech/memory.sqlite, embedded with NLEmbedding.
- Clipboard history: AES-GCM-encrypted SQLite, key in Keychain.
- Camera & mic activation log: SQLite, append-only.
- Connection log: SQLite, indexed by opened_at / process / remote_addr.
- License: UserDefaults under the bundle ID.
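The post doesn't describe how memories are retrieved. Assume a brute-force scan over stored vectors, of the kind NLEmbedding's sentenceEmbedding(for:) and vector(for:) produce. Recall then reduces to cosine similarity plus a sort. MemoryItem, cosine and recall are hypothetical names.

```swift
/// One stored memory: its text plus the embedding vector computed when it was saved.
struct MemoryItem {
    let text: String
    let vector: [Double]
}

/// Cosine similarity; returns 0 for mismatched lengths or zero vectors.
func cosine(_ a: [Double], _ b: [Double]) -> Double {
    guard a.count == b.count, !a.isEmpty else { return 0 }
    var dot = 0.0, na = 0.0, nb = 0.0
    for (x, y) in zip(a, b) {
        dot += x * y
        na += x * x
        nb += y * y
    }
    guard na > 0, nb > 0 else { return 0 }
    return dot / (na.squareRoot() * nb.squareRoot())
}

/// Brute-force top-k recall: score every memory against the query vector.
func recall(_ query: [Double], from memories: [MemoryItem], k: Int = 3) -> [MemoryItem] {
    memories
        .map { (item: $0, score: cosine(query, $0.vector)) }
        .sorted { $0.score > $1.score }
        .prefix(k)
        .map { $0.item }
}
```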
To uninstall and forget, remove the app with the in-app Uninstaller (it runs its own orphan scan), then run defaults delete com.aguacatech. No trace remains anywhere else, because nothing was ever sent anywhere else.
The honest measurement
Watch everything to and from Aguacatech while you run it for a week. Wireshark filters match addresses and ports, not apps, so pair the capture with a per-process view such as macOS's nettop. The only outbound traffic you will see goes to whatever you configured as the LLM endpoint. If that is localhost:1234 (LM Studio), the count of packets leaving the machine is zero, by definition. That is the bar.
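Alongside the capture, the configured endpoint itself can be checked up front. Loopback traffic never reaches a physical network interface. Below is a hypothetical helper that reports whether an endpoint URL points at loopback, sketched on the assumption that an app might warn when it does not.

```swift
import Foundation

/// True when the endpoint's host is a loopback name or address.
/// Checks the name only: an edited /etc/hosts could still remap "localhost".
func isLoopbackEndpoint(_ urlString: String) -> Bool {
    guard let url = URL(string: urlString), let rawHost = url.host else { return false }
    // Some Foundation versions keep the brackets around IPv6 literals.
    let host = rawHost.lowercased().trimmingCharacters(in: CharacterSet(charactersIn: "[]"))
    if host == "localhost" || host == "::1" { return true }
    // Any 127.x.y.z address is loopback (127.0.0.0/8).
    let octets = host.split(separator: ".", omittingEmptySubsequences: false)
    return octets.count == 4 && octets[0] == "127" && octets.allSatisfy { UInt8($0) != nil }
}
```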
What's next
- Keychain-backed MCP tokens. Today they live in plaintext on disk in mcp-servers.json. Not great.
- Local model manager v2. Per-conversation model overrides: "use the coder model just for this thread".
- Multi-agent inside scheduled tasks. The wiring is half-done; the headless run still uses the single-role loop.
- Whisper-backed voice for multilingual users. Apple's stack is great for English; Whisper is still the bar for everything else.
If any of this resonates, download Aguacatech (the Free tier is generous), point it at LM Studio, and send a message. Send your thoughts to hello@aguacatech.eu.