§ 01 · Start here

Run local AI, agents, knowledge, tools, channels, and automation from one macOS workstation.

Ai Keeper is a native macOS control surface for Apple Silicon local models and agent workflows. It can run local serving engines, expose OpenAI-compatible APIs, coordinate multi-agent workspaces, index private knowledge, connect messaging channels, and manage advanced security and system diagnostics.

12 primary hubs · 5 local engines · 51 slash commands · 23 channel families

Use this manual

Tip: press ⌘K from any page to search across the manual. Click the Glossary button at the bottom right to look up a term without leaving the page.

What Ai Keeper can do

Local model serving

Run AI models on your own Mac instead of paying a cloud company. The app supports five different "engines" — pick one based on the model you want to use.

Turns local model assets into running instances. Each instance bundles engine, model path, port, context budget, output limit, sampling profile, tool policy, and access policy.

  • omlx: default for most local agent workflows. SSD-tiered KV cache, multi-model serving, native tool calling, continuous batching, OpenAI/Anthropic-compatible APIs.
  • vllm-mlx: tool calling, reasoning parsers, API-key protection, rate limiting.
  • vmlx: MLX serving with adaptive quantization and media-oriented capabilities.
  • mlx-lm: lightweight MLX chat/streaming server with prompt-cache and speculative controls.
  • llama.cpp: GGUF compatibility, Jinja templates, embeddings/rerank, llama.cpp-specific tuning.
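If you are unsure which engine fits, the model's file format narrows the choice quickly. A toy sketch of that decision, assuming a `pick_engine` helper that does not exist in Ai Keeper (the app's Automatic backend selection uses its own logic):

```python
from pathlib import Path

# Hypothetical helper: choose a serving engine from the model asset's format.
# The rules are illustrative only, not Ai Keeper's actual selection logic.
def pick_engine(model_path: str, *, needs_media: bool = False) -> str:
    suffix = Path(model_path).suffix.lower()
    if suffix == ".gguf":
        return "llama.cpp"   # GGUF assets run on llama.cpp
    if needs_media:
        return "vmlx"        # media-oriented MLX serving
    return "omlx"            # default for local agent workflows

print(pick_engine("models/qwen2.5-7b-q4.gguf"))   # llama.cpp
print(pick_engine("models/llama-3.1-8b-mlx"))     # omlx
```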

Unified API proxy

A single web address other apps (VS Code, Cursor, your scripts) can use to talk to your AI. Behind the scenes, Ai Keeper picks the right model.

Single OpenAI/Anthropic/Ollama-compatible endpoint that fans out to local instances or routed providers. System > API Access shows the base URL, ready models, management URL, and copy-paste examples.
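Because the proxy speaks the OpenAI-compatible chat-completions shape, any standard client can talk to it. A minimal sketch of the request body, assuming a placeholder base URL and model id (copy the real values from System > API Access):

```python
import json

# Placeholder values -- substitute what API Access shows for your setup.
BASE_URL = "http://localhost:8080/v1"   # assumed port, not a documented default

payload = {
    "model": "local-model",             # any ready model listed in API Access
    "messages": [{"role": "user", "content": "Hello from the proxy"}],
    "stream": False,
}
body = json.dumps(payload)

# POST `body` to f"{BASE_URL}/chat/completions" with the HTTP client of your
# choice, e.g. requests.post(url, data=body,
#                            headers={"Content-Type": "application/json"}).
print(body)
```

The same payload works whether the proxy routes to a local instance or a routed provider; the client never needs to know which.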

Agents and workspaces

Build AI helpers with their own job, tools, and personality. Group several into a "team" that works together.

Agents are named AI roles with their own model, tool access, skills, memory scope, standing orders, hooks, and personality. Workspaces group agents for repeatable runs — sequential when order matters, mirrored parallel when independent specialists work concurrently.
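The two run modes can be pictured with plain functions standing in for agents (real agents carry models, tools, and memory; this is only the control-flow shape):

```python
from concurrent.futures import ThreadPoolExecutor

# Toy "agents" -- ordinary functions for illustration only.
def researcher(task):
    return f"notes on {task}"

def writer(notes):
    return f"draft from {notes}"

# Sequential: order matters; each agent consumes the previous agent's output.
draft = writer(researcher("local serving"))

# Parallel: independent specialists work on the same task concurrently.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda agent: agent("local serving"),
                            [researcher, lambda t: f"summary of {t}"]))

print(draft)     # draft from notes on local serving
print(results)   # ['notes on local serving', 'summary of local serving']
```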

Knowledge and memory

Give the AI access to your files, notes, and saved facts about you — so it actually knows your work and remembers what you told it.

Knowledge hub combining prompt context, skills, persistent memory, RAG documents, wiki pages, dreaming insights, contacts, and Obsidian vaults. Memory holds durable facts; Documents are source-grounded retrieval; Wiki is curated internal pages; Obsidian connects existing vault notes.

Automation and channels

Schedule the AI to run jobs by itself, and connect it to messaging apps so people can chat with it on Slack, Telegram, iMessage, etc.

Scheduled jobs, triggers, workflows, heartbeat turns, background tasks, hooks, standing orders, webhooks, and shared-inbox channel connectors. 23 channel families: Telegram, Slack, Discord, Twitch, IRC, Mattermost, WhatsApp, Matrix, WebChat, iMessage, Teams, Signal, and more.

Security and operator tools

Tools for keeping things safe and figuring out what went wrong — logs, audit trails, password vault, backups, and lots of advanced controls.

Request logs, inspectors, runtime/crash logs, diagnostics, sessions, usage trends, audit trails, failover chains, remote access, security scans, secrets vault, backups, tool groups, server lanes, ACP, node mesh, device pairing, media capture, exec sessions, personality editing.

First-run operating order

Follow this order on a fresh install. Each step assumes the previous one succeeded — if something fails, fix that layer before moving on.

  1. Open System > Health. Run Diagnostics and Doctor before changing runtime settings. Fix dependency warnings first.
  2. Pick Server or Client mode. Server mode runs models locally. Client mode connects to another Ai Keeper server.
  3. Set the model directory. Choose a disk location with enough space for downloaded MLX or GGUF assets.
  4. Download or register a model. Use Download > HuggingFace for MLX/GGUF search, filtering, model cards, and downloads.
  5. Create a Runtime instance. Start with Automatic backend selection and use Optimize before manual tuning.
  6. Test in Chat. Confirm streaming, attachments, tool approvals, slash commands, and context usage.
  7. Expose API only when needed. Enable the proxy and copy the base URL from API Access.
  8. Add knowledge. Turn on memory or index documents after the base chat path works.
  9. Create agents and workflows. Add multi-agent workspaces, automations, channels, and webhooks after the model and tools are reliable.

Quick concept dictionary

Just the most-used terms. For the full glossary, open the bottom-right Glossary button or visit the Glossary page.

Engine

The "motor" that runs an AI model.

Backend process family. Examples: omlx, vllm-mlx, vmlx, mlx-lm, llama.cpp.

Instance

One running copy of a model with its settings.

Runnable configuration: backend + port + context + output limits + sampling + tools + access + custom args.

Proxy

A single address other apps point at to talk to your AI.

Unified API layer. OpenAI-compatible clients call local instances or routed providers through one base URL.

Token

A word-piece. Models read and write in tokens, not whole words. Roughly 1 token ≈ 0.75 English words.

Smallest unit consumed and produced by the model. Used to measure context, output, and cost.
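The 0.75-words-per-token rule of thumb makes rough budgeting easy. A sketch, assuming a hypothetical `estimate_tokens` helper (a real tokenizer gives exact counts):

```python
# Rule of thumb: ~0.75 English words per token, i.e. ~4/3 tokens per word.
# Only for rough budgeting -- exact counts come from the model's tokenizer.
def estimate_tokens(text: str) -> int:
    words = len(text.split())
    return round(words / 0.75)

print(estimate_tokens("a " * 750))   # 750 words -> ~1000 tokens
```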

Context window

How much the AI can "see" at once. Bigger window = better memory but more RAM.

Total token budget per request. Consumed by system prompt, history, tools, retrieved documents, and output.
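Everything in a request draws from the same budget, so whatever the prompt parts consume is no longer available for the reply. Illustrative arithmetic with made-up numbers (not Ai Keeper defaults):

```python
# All numbers are illustrative. The parts mirror the list above: system prompt,
# history, tool definitions, and retrieved documents share one window.
CONTEXT_WINDOW = 8192

def output_budget(system=350, history=2400, tools=600, retrieved=1800):
    used = system + history + tools + retrieved
    return CONTEXT_WINDOW - used   # tokens left for the model's reply

print(output_budget())             # 3042 tokens remain for output
```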

RAG

The AI searches your files for relevant info, then answers using that info.

Retrieval-Augmented Generation. Indexes local files; relevant chunks are inserted into the prompt at answer time.
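The retrieve-then-augment shape can be shown in a few lines. This toy version scores chunks by keyword overlap; real indexing uses embeddings, and the documents here are invented:

```python
# Invented document chunks standing in for an indexed corpus.
DOCS = [
    "Instances bundle engine, model path, port, and sampling profile.",
    "The proxy exposes one OpenAI-compatible base URL for all clients.",
]

def retrieve(question: str) -> str:
    # Keyword overlap as a stand-in for embedding similarity.
    q = set(question.lower().split())
    return max(DOCS, key=lambda d: len(q & set(d.lower().split())))

def build_prompt(question: str) -> str:
    # The retrieved chunk is spliced into the prompt at answer time.
    return f"Context: {retrieve(question)}\nQuestion: {question}"

print(build_prompt("What does the proxy expose?"))
```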

MCP

A standard for AI agents to borrow tools from external programs.

Model Context Protocol. Ai Keeper can consume MCP tools/resources and expose its own tools to other agents.
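MCP runs over JSON-RPC 2.0; a tool-listing exchange looks roughly like the sketch below. Field names follow the MCP specification as commonly published (`tools/list`, `inputSchema`), but treat this as a shape illustration, not a wire-accurate client:

```python
import json

# A client asks a server what tools it offers.
request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# The server answers with named tools and their JSON Schema inputs.
response = {
    "jsonrpc": "2.0", "id": 1,
    "result": {"tools": [{
        "name": "read_file",
        "description": "Read a file from disk",
        "inputSchema": {"type": "object",
                        "properties": {"path": {"type": "string"}},
                        "required": ["path"]},
    }]},
}

tool_names = [t["name"] for t in response["result"]["tools"]]
print(json.dumps(request), tool_names)
```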

Hook

An auto-action that fires when something happens — like "log every Slack message".

Event-triggered action. Can inject prompts, execute skills, notify webhooks, log memory, or run commands.
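The pattern is a small event-to-action dispatcher: register actions on an event name, fire them when the event occurs. The event name below is made up for the example:

```python
from collections import defaultdict

# event name -> list of registered actions
hooks = defaultdict(list)

def on(event):
    def register(action):
        hooks[event].append(action)
        return action
    return register

@on("channel.message")            # hypothetical event name
def log_message(payload):
    return f"logged: {payload['text']}"

def fire(event, payload):
    # Run every action registered for this event.
    return [action(payload) for action in hooks[event]]

print(fire("channel.message", {"text": "hello from Slack"}))
# ['logged: hello from Slack']
```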


AI·KEEPER MANUAL · LOCAL BY DEFAULT · macOS 14+ · APPLE SILICON