How to operate Ai Keeper in real workflows.
The safest pattern is: prove the model, prove the endpoint, prove the knowledge/tools, then automate. Most problems become easier when you isolate which layer is failing.
Workflow 1: Run A Local Model
- Open System > Health. Run Diagnostics and Doctor. Fix missing dependencies first.
- Open System > Settings > Storage. Confirm the Models Directory points to the intended disk.
- Open Download > HuggingFace. Search for a model. Use MLX for Apple Silicon local serving or GGUF for llama.cpp.
- Open Runtime > Models. Confirm the model appears and inspect size, quantization, fit, and loaded state.
- Open Runtime > Instances. Create an instance, choose the model, leave backend on Automatic, and run Optimize.
- Start the instance. If launch fails, open diagnostics for that instance, then System > Logs.
- Open Chat. Send /status, then ask a small test question.
- If response quality is weak, adjust sampling. If speed is weak, benchmark before changing context or cache settings.
Workflow 2: Expose The OpenAI-Compatible API
- Start at least one ready Runtime instance.
- Open System > API Access. Confirm proxy status, ready instances, and base URL.
- Use the proxy base URL for clients. It usually ends in /v1.
- Use API keys or management keys when exposing beyond localhost.
- Open System > Requests while testing. Check status code, endpoint, latency, prompt tokens, completion tokens, and errors.
- If a client expects Ollama-style routes, use the compatible routes shown in API Access instead of direct backend URLs.
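A minimal client sketch for the steps above, assuming the proxy base URL from System > API Access ends in /v1. The URL, key, and model name here are placeholders, not real values from your install:

```python
import json
import urllib.request

def build_chat_request(base_url: str, api_key: str, model: str, prompt: str):
    """Build an OpenAI-style chat completion request against the proxy.

    base_url is the proxy URL shown in System > API Access (usually
    ending in /v1); the key is whatever API Access issued.
    """
    url = base_url.rstrip("/") + "/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# Placeholder values -- substitute your own proxy URL, key, and model name.
req = build_chat_request("http://localhost:8080/v1", "sk-local-example",
                         "my-local-model", "Say hello in one word.")
# Sending it is a single call: urllib.request.urlopen(req)
```

While testing, keep System > Requests open to confirm each call's status code and latency, as described above.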
Workflow 3: Add Private Knowledge
Memory
For things you want the AI to remember about you forever — your preferences, projects, the way you like things done.
Persistent facts across conversations: preferences, project names, constraints, and durable notes.
- Enable Persistent Memory when you want recall across sessions.
- Enable Auto-Extract only when chat content is safe to store as facts.
- Export as Markdown before deleting or migrating memory.
Documents
For getting answers from your own files. Drop in PDFs, notes, code, spreadsheets — the AI can read and quote them.
Source-grounded answers from files, folders, PDFs, markdown, code, office documents, images, ebooks, and other indexed inputs.
- Choose an embedding-capable instance.
- Use Add Files for selected sources or Add Folder for a project corpus.
- Adjust chunk size and overlap if retrieval is too fragmented.
- Adjust Top-K if answers miss relevant facts or include too much noise.
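The chunk-size and overlap trade-off above can be illustrated with a minimal character-based splitter. This is a sketch, not Ai Keeper's actual indexer; smaller chunks give more precise matches, larger overlap reduces fragmentation at chunk boundaries:

```python
def chunk(text: str, size: int, overlap: int) -> list[str]:
    """Split text into windows of `size` chars; each window starts
    `size - overlap` chars after the previous one, so consecutive
    chunks share `overlap` chars of context."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "abcdefghij"                      # 10 chars
print(chunk(doc, size=4, overlap=2))    # -> ['abcd', 'cdef', 'efgh', 'ghij']
```

Raising Top-K then controls how many of these chunks are handed to the model per question.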
Wiki and Obsidian
A built-in note system with links and tags (Wiki), or use your existing Obsidian vault.
Use Wiki for curated reference pages with tags, links, backlinks, and edit history. Use Obsidian when your knowledge already lives in a vault.
Dreaming
When idle, the AI writes a journal of patterns and ideas it noticed. Read for insights, not for hard facts.
Idle-generated diary entries, memory-palace pages, and improvement ideas. Insight generation, not authoritative memory.
Workflow 4: Build A Multi-Agent Workspace
- Create or start the model instances each agent will use.
- Open Workspaces > Directory and create agents with clear names and roles.
- Assign an instance or provider route to each agent.
- Set Memory Scope. Use broad memory for personal assistants, narrow memory for task specialists, and none for isolated reviews.
- Set Tool Access. Use all tools only for trusted agents. Use selected tools for specialists. Use none for pure reasoning or writing roles.
- Add skills, standing orders, and hooks only when they clearly improve the role.
- Create a workspace, choose agents, choose an orchestrator if one should coordinate, and set execution mode.
- Keep confirmations enabled for terminal commands, Python scripts, file operations, and browser actions until the workflow is proven safe.
- Open Workspaces > Run, launch a task, monitor messages, approve or deny tools, and save useful outcomes.
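The memory-scope and tool-access choices above can be captured as a small configuration sketch. The class and field names are illustrative, not Ai Keeper's API; the point is that scope and tool access are validated, explicit choices per agent:

```python
from dataclasses import dataclass

MEMORY_SCOPES = ("broad", "narrow", "none")
TOOL_ACCESS = ("all", "selected", "none")

@dataclass
class AgentConfig:
    name: str
    role: str
    instance: str                 # model instance or provider route
    memory_scope: str = "none"
    tool_access: str = "none"
    tools: tuple = ()             # only meaningful when tool_access == "selected"

    def __post_init__(self):
        if self.memory_scope not in MEMORY_SCOPES:
            raise ValueError(f"bad memory scope: {self.memory_scope}")
        if self.tool_access not in TOOL_ACCESS:
            raise ValueError(f"bad tool access: {self.tool_access}")
        if self.tools and self.tool_access != "selected":
            raise ValueError("explicit tool list requires tool_access='selected'")

# A task specialist: narrow memory, only the tools its role needs.
reviewer = AgentConfig("code-reviewer", "review pull requests", "local-instance",
                       memory_scope="narrow", tool_access="selected",
                       tools=("read_file", "grep"))
```

A personal assistant would instead get broad memory; an isolated review agent gets none.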
Workflow 5: Automate Safely
Prototype first
Always do the task by hand once before letting the AI do it on a schedule. Catch surprises while you're watching.
Run manually in Chat or Workspaces. Verify model, tools, prompts, and permissions behave before scheduling.
Choose trigger type
Pick what kicks off the automation: a clock, an external event, or a regular heartbeat.
Jobs for schedules, Webhooks for external events, Workflows for multi-step operations, Heartbeat for recurring agent turns, Background for long-running jobs.
Apply authority rules
Decide what the automation can do on its own and what it must ask permission for first.
Combine policies, standing orders, hooks, and approval gates to scope autonomous behavior.
Monitor
Always set up some way to see if the automation actually did what you wanted — logs, notifications, the audit trail.
Use logs, request history, background task state, audit trail, and notifications to verify each run.
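The authority layering above (policies, standing orders, approval gates) can be sketched as one decision function. This is illustrative only, with a deliberate default-deny so actions nobody listed never run silently:

```python
def gate(action: str, allow: set, deny: set, require_approval: set) -> str:
    """Decide what an automation may do with one action.

    Deny beats approval, approval beats allow; anything unlisted
    is denied, so new side effects never execute unreviewed.
    """
    if action in deny:
        return "deny"
    if action in require_approval:
        return "ask"
    if action in allow:
        return "allow"
    return "deny"   # default-deny for unknown actions

policy = dict(allow={"read_file", "search"},
              deny={"delete_file"},
              require_approval={"shell", "send_email"})
print(gate("shell", **policy))        # -> ask
print(gate("read_file", **policy))    # -> allow
print(gate("format_disk", **policy))  # -> deny
```

The approval gates described above play the "ask" role: they pause the run until you approve or deny in the UI.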
Workflow 6: Connect Channels
- Open Extensions > Channels.
- Add a connector or native source. Telegram has its own first-class service (Extensions > Channels > Telegram). Core webhook/streaming connectors include Slack, Discord, Twitch, IRC, Mattermost, Nextcloud Talk, WhatsApp, Matrix, and WebChat.
- Use extended channel support for Google Chat, Signal, BlueBubbles, iMessage, Microsoft Teams, Feishu/Lark, LINE, Nostr, Synology Chat, Tlon/Urbit, Zalo, QQ Bot, and WeChat.
- Configure transport: polling, webhook, stream, or local.
- Set channel IDs, workspace IDs, server URLs, webhook paths, verify tokens, recipients, or phone-number fields as required by the platform.
- Configure voice handoff templates only for connectors that use external voice flows. WebChat voice is local and does not need templates.
- Use DM Access to control what unknown senders can do, and Auto-Reply to control when the assistant answers on its own.
- Watch the shared Inbox and channel health before enabling broad automation.
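Several of the webhook-based platforms above hand you a verify token at setup time. A minimal check sketch (token values are placeholders) uses a constant-time comparison rather than plain equality:

```python
import hmac

def verify_webhook(received: str, expected: str) -> bool:
    """Compare the platform's verify token against the configured one.

    hmac.compare_digest avoids the timing side channel that a
    plain == string comparison can leak on secret values.
    """
    return hmac.compare_digest(received.encode(), expected.encode())

print(verify_webhook("tok-123", "tok-123"))  # -> True
print(verify_webhook("tok-123", "tok-999"))  # -> False
```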
Slash Command Dictionary
Slash commands are shortcuts you type in chat starting with /. Type /commands in Chat to see a live list with autocomplete.
The 51 built-in commands below are registered by SlashCommandHandler. Custom commands installed by skills, plugins, or ClawHub packages appear alongside them in /commands. Tab-completion is case-insensitive.
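The case-insensitive tab-completion rule above behaves like a simple prefix match. A sketch (the command list here is a small sample, and the matcher is illustrative, not SlashCommandHandler's code):

```python
def complete(prefix: str, commands: list[str]) -> list[str]:
    """Case-insensitive prefix match over registered slash commands."""
    p = prefix.lstrip("/").lower()
    return sorted(c for c in commands if c.lstrip("/").lower().startswith(p))

registered = ["/status", "/stop", "/steer", "/summarize", "/send"]
print(complete("/St", registered))   # -> ['/status', '/steer', '/stop']
```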
Manage the conversation: /new, /clear, /reset, /stop, /regenerate, /compact, /export-session — start over, kill a runaway response, regenerate the last reply, summarize, or save the chat to a file.
Lifecycle and persistence controls for the active session: clear context, reset agent state, stop generation, regenerate last response, force-trigger compaction, or export the transcript.
Change how the AI responds: /models, /think, /reasoning, /fast, /verbose, /elevated — see what models are available, turn on slow careful thinking, or switch between quick / detailed / expert reply styles.
List runnable/running models, toggle thinking mode, show step-by-step reasoning, or switch response modes (fast, verbose, elevated/expert).
"What's going on right now?": /status, /context, /session, /whoami, /id, /usage — show running models, what the AI's seeing in this chat, who you are, and how much you've spent.
Inspect active model and tasks, effective prompt and policies for the current agent, session/thread binding, identity and channel info, and runtime usage/limits.
One-click prompts: /summarize, /translate, /code, /explain, /fix, /review, /brainstorm, /proscons — type the command, drop a paragraph or file, get the obvious thing back.
Single-shot prompts that pre-shape the next turn. Useful before piping a paragraph, snippet, or attached file.
What the AI knows about you: /memory — see the saved facts. Edit them in Knowledge > Memory.
Inspect the assistant's stored memory facts. Extract policy and per-fact controls live under Knowledge > Memory.
What tools/extensions are available: /tools, /commands, /plugins, /mcp, /hooks, /orders, /tasks — list everything the AI can use right now.
List active-agent tools, all slash commands, installed ClawHub packages, connected MCP servers, active hooks, standing orders, and scheduled tasks.
Multi-agent control: /agent, /agents, /subagents, /focus, /unfocus, /steer, /queue, /kill — turn agent mode on, hand off to another agent, focus on one task, or stop a runaway agent.
Toggle agent mode, manage agents and sub-agents, focus/unfocus a task, redirect to another agent or mode, switch queue mode, or terminate a session.
Granting and limiting power: /approve, /exec, /acp, /allowlist, /activation — say yes/no to pending tool approvals, run a shell command, or change who's allowed to DM the assistant.
Resolve pending exec/tool approvals, configure exec policy or run a shell command, edit ACP policies, manage DM/channel allowlists, or change channel activation policy.
Outbound and live audio: /send sends messages to another channel, /vc manages voice chat, /restart restarts what's running.
/send targets a channel; /vc shows or manages voice-channel status; /restart restarts the active runtime or connector.
Not registered, and therefore not listed above: /model (use /models), /config, and /help. Note that /vc is voice-channel control, not version control.
Provider Guide
Local
Models running on your own Mac. Free, private, works offline.
Local instances for privacy, offline operation, or predictable cost. Tune context and output limits to fit RAM.
CLI providers
For Claude Code and ChatGPT Codex — they sign in through their own desktop apps, not with a regular API key.
CLI-based agent providers. Auth is handled by the provider's installed binary rather than an API key route.
Cloud APIs
Pay another company (OpenAI, Anthropic, Google, etc.) to run a model for you. Need an API key from them.
Available routes: OpenAI, Anthropic, Google Gemini, Groq, Mistral, DeepSeek, xAI, OpenRouter, Together AI, Perplexity, Fireworks AI, Cerebras, Custom.
Failover
A backup plan: if one provider stops working (rate-limited, slow, down), the app tries the next one automatically.
Failover chains trip on rate-limit, auth failure, latency, or downtime. Key rotation distributes load across multiple keys.
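The failover-chain and key-rotation behavior described above can be sketched as follows. Provider names, callables, and keys are all placeholders; real code would catch specific error types rather than bare Exception:

```python
import itertools

def call_with_failover(providers, request):
    """Try each provider in order; trip to the next on any failure.

    `providers` is a list of (name, callable) pairs whose callables
    raise on rate limits, auth failures, or timeouts.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(request)
        except Exception as exc:      # real code would narrow this
            errors.append((name, exc))
    raise RuntimeError(f"all providers failed: {errors}")

# Key rotation: cycle several keys for one provider to spread load.
keys = itertools.cycle(["key-A", "key-B"])

def flaky(req):
    raise TimeoutError("rate limited")

def healthy(req):
    return f"ok via {next(keys)}"

name, result = call_with_failover([("primary", flaky), ("backup", healthy)], "hi")
print(name, result)   # -> backup ok via key-A
```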
Context caps
Some cloud providers limit how much you can send/receive in one go, even if you set a higher number. The app respects their limit automatically.
Provider-enforced max prompt+output windows. Ai Keeper clamps to the smaller of the user-configured value or the provider's metadata cap.
Cost
Cloud models cost money per use. Check the Usage screen to see what you've spent. Use local models for routine stuff to save money.
Track token consumption and cost trends in System > Usage. Use local models for high-volume routine work; reserve cloud providers for quality- or capability-sensitive tasks.
Troubleshooting FAQ
An instance won't start. Run System > Health first. Then switch the backend back to Automatic and remove any custom command-line flags. Most "won't start" issues are missing dependencies or a bad flag.
Check System > Health (Diagnostics + Doctor), instance diagnostics, System > Logs, dependency status, model format vs. backend choice, custom args, and port conflicts. Use Automatic backend and clear custom args to reduce variables.
A client can't reach the API. Make sure at least one instance is running and that whatever app is calling Ai Keeper is using the proxy URL (the one that ends in /v1).
Verify a ready instance, proxy enabled, route exists, API key correct, client targeting the proxy base URL. Inspect endpoint/status/latency in System > Requests.
Replies get cut off or stay too short. Bump up the max output tokens setting. Also check whether you accidentally turned on /fast mode, which keeps replies brief.
Increase max output tokens; check for active fast/terse modes; inspect stop sequences and structured-output constraints.
The AI is stuck in a loop. Try raising the Repetition Penalty in settings. Don't go too high or it'll start dropping necessary words.
Increase repetition penalty; adjust frequency/presence penalties for loops. For code, avoid excessive penalties — they break required syntax. Lower temperature only if randomness is the actual issue.
The AI tried to use a tool but couldn't. Type /tools to see what's actually available, then check if you've approved the tool and whether the right model family is selected.
Audit tool policy, parser, model family alignment, MCP server state, approval queue, and per-surface web/tool-execution permissions. Use /tools, the MCP tabs, and request logs.
The AI didn't find what's in your docs. Try increasing Top-K (give it more pieces to read), or shrink the chunk size for more precise matches.
Search indexed docs manually first; increase Top-K; tune chunk size/overlap; re-index stale corpora; verify the selected instance produces embeddings (Runtime Role = Embedding).
The Mac is running out of memory. Stop any models you're not using right now. Don't keep three big models warm if you only need one.
Stop unused instances, lower context window, reduce cache budgets, enable KV cache quantization, lower idle threshold, or enable auto-unload on critical pressure.
An automation did something unexpected. Pause it. Open the Audit Trail to see what it actually did. Re-test the same task manually before turning it back on.
Pause the job, inspect Audit Trail, review standing orders/hooks, check workspace safety settings, require approvals for external side effects. Prototype manually before re-enabling.
A connector (Slack, Telegram, etc.) isn't getting messages. Check that the connector is on, the URL/token is right, and look at Channel Health.
Verify connector enabled state, transport, webhook URL, verify token, server URL, workspace/channel IDs, and Channel Health Monitor state. Cross-check inbound traffic in shared Inbox.
Another Mac can't reach this one. Make sure Server mode is on, the API key matches, and your firewall isn't blocking the port.
Confirm Server mode, management URL, Allow Client Access toggle, matching API key, firewall/network reachability, and port. Keep Web Tool Execution off unless explicitly required.