What each important control means, how it changes outcomes, and when to touch it.
Settings exist at two levels. System settings define global defaults and access rules. Runtime instance settings define how a specific model runs. Existing instances keep their own values when you change new-instance defaults.
System > Settings
Each row has two layers. The plain-English line is for new users. The operator-detail line gives the precise behavior and the rule for when to change the value.
Just changes how the app looks. Pick whatever's easiest on your eyes.
Light, dark, or system-follow. Does not change model output or runtime behavior.
Squishes the spacing so more fits on screen. Turn on if your screen feels cramped.
Reduces global UI padding. Useful on small displays or when monitoring many panes side-by-side.
Shows little help bubbles when you hover over things. Leave on until the app feels familiar.
Inline contextual help. Turn off once you know the interface to reduce visual noise.
Hides sections of the left menu you don't use, so the app looks simpler.
Shows or hides primary sidebar hubs. Hidden hubs still exist and remain reachable via route aliases or direct links.
Makes Ai Keeper start automatically when your Mac starts.
Enable for always-on proxy, channel connectors, scheduled automations, or server mode. Disable if you only run models on demand.
Remembers what you were typing so you don't lose it if the app closes.
Persists unfinished chat input to local storage. Disable if drafts should not persist on disk.
The voice-to-text model. Pick a small one for quick notes; a larger one if you record meetings or noisy audio.
Model repo used for voice messages, microphone capture, and the audio_transcribe tool. Larger models are slower but handle accents and noise better.
When a chat gets very long, the app automatically writes a short summary of the older parts so the conversation can keep going.
Summarizes older messages when the conversation approaches the context-window threshold. Keeps long sessions alive but can lose exact phrasing — export important transcripts before it kicks in.
How "full" a chat has to get before the auto-summary kicks in. Lower means it summarizes sooner.
Percent of the context window at which compaction triggers. Lower preserves memory for new content; higher preserves raw transcript longer.
Which model writes the summary. Leave blank to use the same model you're chatting with.
Optional model id used during compaction. A small local or cheap cloud model makes compaction fast and inexpensive without affecting chat quality.
A history of summaries the app has made, so you can see when and how much was condensed.
Tracked compaction events with token-reduction stats. Use to verify auto-compaction is firing as expected.
Connection And Access
This Mac is the one doing the heavy lifting — it runs the models and other devices can connect to it.
Local instances run here; the management API and proxy accept remote clients. Pick this on the machine with the most RAM, fastest disk, and your model storage.
This Mac is a remote control — it talks to another Mac that's running the models.
Defers runtime, RAG, and automation to a remote server. Cannot start local instances or host channels itself. Use on a laptop or secondary machine.
The number other apps use to reach Ai Keeper on this Mac. Like the door number for a building.
TCP port for the management API and web surfaces. Change only for port conflicts or network policy. Clients must update their URL after a change.
A password that lets other devices talk to your Ai Keeper. Don't share it.
Bearer secret required for remote management API calls and web dashboard access. Rotate immediately if exposed.
Lets you open Ai Keeper in a normal web browser, not just the app.
Serves the web UI at the management root URL. Disable to fully block browser access.
Lets other Macs running Ai Keeper connect to this one.
Permits remote Ai Keeper clients to use the management API. Disable when the host should be local-only.
Lets the web version actually run things on your computer (open files, run commands). Risky — only turn on for people you trust.
Permits web-initiated tool calls to execute locally. Keep off for read-only browsing; enable only for trusted users on trusted networks.
Only used in client mode — the address of the other Mac you're connecting to.
Client-mode-only. Copy verbatim from the server's Connection settings (includes scheme, host, port).
Only used in client mode — the password to talk to the other Mac. Get it from that Mac's settings.
Client-mode credential matching the remote server's Management API Key. Required for any remote use.
A web address other apps (like coding tools) can use to talk to your local AI as if it were ChatGPT.
OpenAI-compatible endpoint path, typically ending in /v1. Use in any external client that expects an OpenAI-shaped API.
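Any OpenAI client library can point at this endpoint. A minimal Python sketch, assuming the endpoint is http://127.0.0.1:8080/v1 and an instance named local-model is running (copy the real URL and model id from the app):

    from openai import OpenAI

    # Placeholder values; local servers typically ignore the API key.
    client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="unused-locally")
    reply = client.chat.completions.create(
        model="local-model",
        messages=[{"role": "user", "content": "Say hello."}],
    )
    print(reply.choices[0].message.content)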
Storage And Maintenance
The folder where downloaded AI models are kept. They're big — put this on a drive with lots of free space.
Library path scanned for model assets. Use a fast SSD; MLX and GGUF weights can be tens of gigabytes each.
Where Ai Keeper saves your settings, chats, and memory.
Application support location for state, sessions, and configuration. Back this up regularly.
Temporary copies of files you downloaded. Safe to clear if you need disk space.
Cached Hugging Face downloads. Deleting frees disk; assets re-download on next use.
Speed-up files the engine builds while running. Clear them if a model starts acting weird.
Runtime cache at ~/.omlx/cache. Rebuilds on next use; clear when stale state causes startup or generation issues.
A safe spot for your API keys and passwords so you never have to paste them into chat.
Encrypted local credential vault. Use exclusively — never inline secrets in prompts, custom args, or saved profiles.
Keeps the supporting tools Ai Keeper uses up to date by itself.
Updates Homebrew-managed packages (llama.cpp, oMLX, Python). Disable when reproducibility matters more than freshness.
A list of helper tools the app needs. If something is broken here, fix this first.
Required and optional runtime packages. Use Re-check, Install, Update, Repair, and Uninstall actions to recover local runtime health.
New Instance Defaults
How much the AI can "see" at once — your message, the chat history, attached files, and its reply, all measured in word-pieces. Bigger means it remembers more, but costs more memory.
Default token budget for new local instances. Consumed by system prompt, history, tools, retrieved chunks, and output. Larger windows increase RAM and prefill time.
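The budget arithmetic is simple enough to sketch; the token counts below are illustrative, not measured:

    # Everything the model sees, plus its reply, must fit inside the window.
    def fits(window, system_prompt, history, tools, retrieved, max_output):
        return system_prompt + history + tools + retrieved + max_output <= window

    # An 8192-token window with a 1024-token reply cap leaves 7168 tokens for input.
    print(fits(8192, 400, 5200, 300, 900, 1024))  # True: 7824 <= 8192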
A cap on how long each AI reply can be. Leave blank to let the model decide.
Per-response output length cap. Lower for chatty models that ramble; higher for code, long-form writing, or reports.
How long a model can sit unused before the app offers to unload it and free up memory.
Idle timer for memory-freeing prompts. Lower for aggressive RAM recovery; higher keeps models warm for a faster next response.
If your Mac is running out of memory, the app stops unused models automatically instead of asking you.
Stops instances idle past the threshold when macOS reports critical memory pressure. Enable on low-RAM systems; disable if you prefer manual control.
Runtime Instance Settings
A nickname for this running model. Pick something that tells you what it's for.
Human-readable label. Use names that reveal model, role, or route — e.g. qwen-coder-fast, llama-vision-vlm.
Which AI model this instance will run.
Local asset or provider model. MLX for Apple Silicon local serving; GGUF for llama.cpp; provider model for cloud/CLI routes.
Who's allowed to reach this model. localhost means only your Mac; broader values let other devices in.
Bind address. 127.0.0.1/localhost for local-only; 0.0.0.0 exposes to the network — never enable without an API key and firewall rule.
The number this instance listens on. Like a doorway number — only one model can use it at a time.
TCP listening port. Change to avoid conflicts when running multiple instances or to match a client's hardcoded expectation.
Which engine runs the model. Leave on Automatic unless a model needs a specific one.
Automatic, omlx, vllm-mlx, vmlx, mlx-lm, or llama.cpp. Override only when a format or feature requires a specific engine.
A button that picks good settings for you based on your Mac and the chosen model. Run it before fiddling manually.
Analyzer that chooses engine and settings from model metadata + hardware profile. Reset reapplies defaults and clears local overrides.
A slider that trades faster replies against better-quality replies.
Family-specific preset slider. Fast lowers latency; Best uses higher-effort sampling and reasoning when memory and time permit.
How much memory this instance is allowed to set aside for short-term speed-ups. Bigger is faster on repeat questions but uses more RAM.
Runtime cache budget. Raise for repeated prompts or sustained workloads; lower to protect system RAM headroom.
When you send a very long prompt, it's processed in pieces. This sets the piece size.
Splits long prefill into chunks. Useful for huge contexts; values too small add scheduling overhead.
How the instance handles several requests at once.
Continuous batching improves multi-user throughput; stricter policies make single-user latency more predictable.
How many recent prompts the instance remembers verbatim to speed up repeats.
Number of cached prompt-prefix entries. Helps system prompts, agent scaffolds, and templated workflows.
A memory cap for the prompt-remembering feature above.
RAM ceiling for prompt cache. Raise only if cache-hit rate matters more than headroom for new requests.
Advanced — splits the model across multiple chips. Most users leave this alone.
Parallel split degree. Backend- and hardware-dependent. Wrong values fail launch — verify with the engine's docs first.
Power-user box for typing raw command-line flags directly. Skip unless you know what you're doing.
Raw backend CLI arguments. Verify against the backend's --help output before saving — a single bad flag prevents launch.
Generation And Tool Behavior
How creative versus consistent the AI is. Low = predictable; high = surprising.
Sampling randomness. Low for deterministic code, extraction, and tests; higher for brainstorming or prose variety.
Limits how adventurous each word choice can be. Most people leave this alone.
Nucleus sampling cumulative probability. Lower narrows the candidate pool; higher allows broader alternatives.
Limits the number of words the AI considers each step. Default is usually best.
Hard cap on candidate token count. Lower stabilizes small models; the default is tuned for the model family.
Filters out very unlikely word choices so creative replies don't go off the rails.
Rejects tokens whose probability is too small relative to the top candidate. Keeps creative output coherent without making it rigid.
Discourages the AI from saying the same thing over and over.
Penalizes repeated token patterns. Raise for stuck loops; lower if it starts avoiding necessary syntax or exact terms in code.
Pushes the AI to use a wider vocabulary. Good for writing; risky for code.
Discourages tokens by usage frequency. Useful for prose; risky for code, citations, and exact identifiers.
Encourages the AI to bring up new ideas instead of staying on one.
Boosts unseen tokens to encourage topic shifts. Useful for ideation; avoid for tight, factual answers.
Locks in a "luck number" so the AI gives the same answer every time. Useful for testing.
PRNG seed for repeatability. Fix for regression tests; leave default for normal conversation.
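Most of the sampling rows above map onto standard fields in OpenAI-style requests. A hedged sketch (field support varies by backend, and extensions such as min_p are not part of the OpenAI schema):

    from openai import OpenAI

    client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="unused-locally")
    reply = client.chat.completions.create(
        model="local-model",
        messages=[{"role": "user", "content": "List every date in the text below."}],
        temperature=0.2,        # low randomness: extraction, code, tests
        top_p=0.9,              # nucleus sampling pool
        frequency_penalty=0.3,  # damp verbatim repetition
        presence_penalty=0.0,   # no push toward new topics
        seed=42,                # fixed seed for repeatable test runs
    )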
Decides whether the AI is allowed to actually use tools (browse, run code, read files), and how strict to be about it.
Controls tool exposure and call interpretation. Auto for agent work; Restricted or Disabled for sensitive conversations.
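On the wire, tool exposure is simply the list of tool definitions attached to each request. An OpenAI-style sketch with a hypothetical tool name and schema; a Restricted or Disabled policy would shrink or empty this list:

    tools = [{
        "type": "function",
        "function": {
            "name": "read_file",  # hypothetical tool
            "description": "Read a text file from disk.",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    }]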
How the app understands tool requests from the model. If tool calls keep failing, try a different one.
Parser for model-family-specific tool-call syntax. Switch when calls are malformed — match the model family (Llama, Hermes, Mistral, etc.).
Lets the AI take more time to "think" before answering. Better answers, slower replies.
Enables/disables reasoning preambles. Use deep thinking for hard planning; disable for speed or models that degrade with decorated prompts.
Forces the AI to reply in a specific shape (like a form). Useful when another tool needs to read the answer.
Constrains response format to a JSON schema or grammar. Use for automation; disable for freeform chat.
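A sketch of a schema-constrained request, assuming the backend accepts OpenAI-style response_format (exact support varies by engine, and the schema here is hypothetical):

    from openai import OpenAI

    client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="unused-locally")
    schema = {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "priority": {"type": "integer"},
        },
        "required": ["title", "priority"],
    }
    reply = client.chat.completions.create(
        model="local-model",
        messages=[{"role": "user", "content": "File a ticket: the nightly build is broken."}],
        response_format={
            "type": "json_schema",
            "json_schema": {"name": "ticket", "schema": schema},
        },
    )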
What kind of job this instance does — chat, look-at-images, search, audio, etc. The app uses this to send the right work to the right model.
Declares specialization: LLM, VLM, Embedding, Reranker, Audio STT, Audio TTS, or Audio STS. Routing and UI selectors filter on this.
Backend-Specific Controls
These controls only appear on engines that support them. Most users never touch them.
KV cache quantization
Squeezes the model's short-term memory smaller so more fits in RAM. Stronger squeeze saves more memory but can hurt quality.
Compresses the key/value attention cache. Off preserves quality at full memory cost. Q8_0 ≈ half the memory with negligible quality loss. Q4_0 saves more but can affect generation stability.
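The memory claim is easy to sanity-check with the usual cache-size formula, a rough estimate that ignores block and scale overhead (the model shape below is hypothetical):

    # Keys plus values, per layer, per cached token.
    def kv_cache_gib(layers, kv_heads, head_dim, tokens, bytes_per_value):
        return 2 * layers * kv_heads * head_dim * tokens * bytes_per_value / 1024**3

    # 32 layers, 8 KV heads, head_dim 128, 32k cached tokens:
    print(kv_cache_gib(32, 8, 128, 32768, 2.0))  # fp16   -> 4.0 GiB
    print(kv_cache_gib(32, 8, 128, 32768, 1.0))  # ~Q8_0  -> 2.0 GiB
    print(kv_cache_gib(32, 8, 128, 32768, 0.5))  # ~Q4_0  -> 1.0 GiB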
Speculative prefill
Lets the model skip re-reading parts of the prompt it has already seen. Speeds up agent loops and very long chats.
Reuses a percentage of cached prefix state above a similarity threshold. Reduces redundant prompt processing for repeated agent scaffolds.
Speculative decoding
Uses a tiny "draft" model to predict ahead, then the main model checks. Faster replies — but you have to load two models.
Draft-model accelerated decoding. Improves throughput at the cost of additional VRAM/RAM and configuration complexity.
Adapters
Tiny add-on files that customize a model for a specific task without retraining the whole thing.
LoRA / adapter paths attached at load time. Verify base-model compatibility before launch — mismatches cause silent quality drops.
llama.cpp extras
A pile of advanced knobs that only the llama.cpp engine exposes. Skip unless you're tuning a specific GGUF model.
GGUF-oriented controls: Mirostat, rope scaling, tail-free sampling, dynamic temperature, slot similarity, concurrency. Only relevant on the llama.cpp backend.
Cloud providers
If you don't want to run a model locally, you can use one from a company like OpenAI, Anthropic, or Google instead. Most need an API key from that company; some (Claude Code, ChatGPT Codex) just use their CLI app.
Provider list: ChatGPT Codex, Claude Code, OpenAI, Anthropic, Google Gemini, Groq, Mistral, DeepSeek, xAI, OpenRouter, Together AI, Perplexity, Fireworks AI, Cerebras, Custom. CLI providers auth via their installed binary; the rest use API keys from the Secret Store.
Security Settings
A diary the app keeps of everything it did — every tool it ran, every file it touched. Useful for "wait, what did it do last night?" moments.
Append-only event log of tool calls, approvals, exec sessions, channel events, and identity changes. Export periodically for forensic review.
A health check that flags risky settings: open doors, missing passwords, plugins from unknown sources.
Posture report covering exposed ports, allowlist gaps, default-deny coverage, secret hygiene, and unsigned plugins. Treat warnings as blockers.
A locked vault for your passwords, API keys, and tokens. Always store them here.
Encrypted local credential vault for API keys, OAuth tokens, webhook secrets, and connector credentials. Never inline these in prompts or saved profiles.
Save a copy of all your settings, chats, and notes so you can restore them later if something breaks.
Snapshot, restore, and export of state, sessions, knowledge, prompts, and configuration. Restore wipes post-snapshot state — don't roll back past the creation of channel webhooks that are still in use.
A safety net that catches sensitive info (like phone numbers or social security numbers) before it leaves your machine — and catches sneaky inputs trying to trick the AI.
Input/output filters for PII, prompt-injection patterns, and policy keywords. Layer over Tool Policy and approvals; not a sole defense.
A guest list — only people on the list can DM the assistant on connected channels (Telegram, Slack, etc.).
Per-channel allowlist of senders permitted to direct-message the assistant. Configure welcome/rejection messages; enable Auto-Reply only for paired senders.
Connectivity Settings
A backup plan: if your first AI provider fails (down, rate-limited, slow), the app automatically tries the next one on the list.
Ordered list of providers/instances per route, with cooldowns and key rotation. Keep a local fallback at the tail of the chain for offline resilience.
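The routing logic amounts to ordered fallback. A minimal sketch of the idea, not the app's actual implementation:

    def first_healthy(providers, prompt):
        # Try each provider in order; fall through on errors or rate limits.
        for call in providers:
            try:
                return call(prompt)
            except Exception:
                continue  # a real router also honors per-provider cooldowns
        raise RuntimeError("all providers in the chain failed")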
Settings for letting other devices reach this Mac. Don't turn on without setting a password (the API key) first.
Server-mode reachability profile. Always pair with a Management API Key — never enable without authentication.
Advanced — lets one Ai Keeper server serve different groups (teams, apps, batch jobs) without their traffic stepping on each other.
Server-side routing lanes that group endpoints by purpose, priority, or tenant. Use to isolate batch traffic from latency-sensitive interactive traffic.
Lets your AI agents talk to other AI agents using a shared language. Advanced — only relevant if you're hooking up multiple agent systems.
Agent Communication Protocol surface for structured agent-to-agent messages. Configure inbound auth and allowlists before exposing.
Connects multiple Macs running Ai Keeper into a small network so they can share work.
Peer-to-peer discovery and work-routing across hosts. Signed pairing only; keep per-peer roles narrow — a mesh is not a substitute for proper auth.
Pair an iPhone, iPad, or other Mac with this server, like pairing a Bluetooth speaker.
Trust handshake for iOS/iPadOS or other Mac clients via QR or short code. Revoke a pairing immediately if the device is lost.
Web addresses that other services can ping to trigger something here, or that Ai Keeper pings to notify other services.
Inbound HTTP triggers and outbound notification targets. Each webhook holds its own signing secret — rotate after any audit-trail anomaly.
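Verifying a signed inbound webhook usually follows the HMAC pattern below. The header name and encoding are the app's own, so treat both as placeholders:

    import hashlib
    import hmac

    def verify(secret: str, raw_body: bytes, signature: str) -> bool:
        # Hex-encoded HMAC-SHA256 of the raw body is the common convention.
        expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, signature)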
Media & Devices Settings
Mic input, spoken replies, and how voice gets handed off when chatting in places like Slack.
Microphone capture, TTS playback, and channel voice-handoff templates. WebChat voice is local; external-channel voice uses the configured handoff template.
Lets the AI see your screen. Be careful — share specific windows, not the whole desktop, so notifications don't leak.
Permission and pipeline for sharing the screen with vision-capable models. Limit to specific windows or displays — full-screen capture exposes notifications and unrelated apps.
Lets the AI use your webcam. Off by default for privacy.
Webcam capture for VLM and presence flows. Keep disabled by default; enable per-session.
The app's display language and the language the AI replies in by default.
App localization and default assistant language hint. Models that respect locale hints will switch reply language accordingly.
Operator Settings
Bundles of tools you can give an AI agent in one go, like a "web tools" or "file tools" pack — instead of toggling each one.
Named bundles of individual tools (e.g. fs-read, web, shell-safe). Assign groups to agent roles for cleaner audit and rename-safety.
Starter files an agent can drop into a folder when starting work — like project boilerplate.
File scaffolds dropped at task start (lint configs, README skeletons, .editorconfig). Keep templates idempotent.
Persistent terminal windows the AI can keep using, instead of opening a fresh one for every command. Faster, but cap the lifetime so they don't get stale.
Persistent shell/REPL handles the model can attach to. Avoids spawn-per-call overhead. Limit per-session lifetime — an idle session that survived a long compaction may carry stale state.
A document describing how your default assistant should sound — its voice, tone, what it cares about. Different from a system prompt because it sticks across all chats.
Long-form persona document applied outside per-conversation system prompts. Edit here for permanent voice; edit the agent's role prompt for task-specific behavior.
An address book the assistant can use — names, groups, tags, and how to reach each person.
People, groups, tags, and reachability metadata used by channels and presence routing. Treat as personal data; back up encrypted, scrub on export.
Connects to your Obsidian notes vault so the assistant can read (and optionally write) to it.
Vault path, note search, and import. Read-only by default. Enable write only on an isolated vault — never your primary notes.
Knowledge & Memory Settings
Decides whether the AI remembers things about you across separate chats and days.
Survival of memory facts across sessions/devices. Enable for personal-assistant flows; disable for task-specialist or audit roles.
Lets the AI save things from your conversation as long-term notes by itself. Turn on only if you're OK with what's said being remembered.
Permits the model to write durable facts during chat. Each write is captured in the Audit Trail. Enable only on chats safe to persist.
Which model is in charge of indexing your documents for search. A small, dedicated one is best.
Runtime that produces embeddings for Documents and Wiki search. Use a small, fast embedding model — not the main chat model.
When documents get indexed, they're cut into pieces. This sets the size of those pieces and how much they overlap. Smaller for precise lookups, bigger for context.
Document split parameters. Smaller chunks favor precise retrieval; larger preserve narrative continuity. Re-index after any change.
How many document pieces the AI gets to read when you ask it a question. More = thorough but noisy; less = focused but might miss facts.
Number of retrieved chunks per query. Raise when answers miss facts; lower when irrelevant context floods the prompt.
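A sketch of what size and overlap mean during splitting. This version counts characters for clarity; real indexers usually count tokens:

    def split(text: str, size: int = 800, overlap: int = 100) -> list[str]:
        # Sliding window: each chunk repeats the last `overlap` characters of
        # the previous one, so facts on a boundary are not cut in half.
        step = size - overlap
        return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]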
How often the AI runs in the background to "think about" your notes and journal new ideas. Treat the output like suggestions, not facts.
Idle-synthesis schedule. Output is suggestion-grade, not authoritative memory. Disable on shared or untrusted workstations.
Automation & Workspace Settings
Decides whether the AI asks for permission before doing things. Pick "Confirm" until you're sure a workflow is safe.
Per-tool / per-agent gating: Auto, Confirm, Block. Default to Confirm for anything touching files, network, or external services; lift to Auto only after a flow has run clean under Confirm.
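The three modes reduce to a simple gate. An illustrative sketch, not the app's code:

    def allowed(action: str, mode: str) -> bool:
        if mode == "auto":
            return True
        if mode == "block":
            return False
        # "confirm": ask the operator before every run
        return input(f"Allow {action}? [y/N] ").strip().lower() == "y"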
Permanent rules the AI must follow, like "never share my address" or "always reply in English". They stick around forever.
Persistent instructions applied globally or per agent. Use for invariants, not transient task details.
Auto-run actions on events — like "every time a Slack message arrives, log it" or "if there's an error, send me a notification".
Event-triggered actions (pre-prompt, post-tool, on-channel-message, on-error). Each can inject context, run a skill, log memory, call a webhook, or exec. Audit before enabling on production channels.
How often a recurring AI task wakes up to do something. Always set a stop condition so it doesn't run forever.
Recurring agent-turn schedule. Always attach explicit max-iterations — heartbeats without a stop condition burn tokens silently.
A friendly way to schedule recurring jobs ("every Monday at 9am"). Always check the next-run preview to make sure it's right.
Visual cron-expression builder. Validate the next-fire preview before saving — typo'd cron strings are a leading cause of silent automation failures.
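"Every Monday at 9am" is the 5-field expression "0 9 * * 1". If you want to double-check a preview outside the app, the third-party croniter package computes next-fire times from the same syntax:

    from datetime import datetime
    from croniter import croniter  # third-party: pip install croniter

    expr = "0 9 * * 1"  # minute, hour, day-of-month, month, day-of-week
    print(croniter(expr, datetime.now()).get_next(datetime))  # next Monday 09:00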
Long jobs that run in the background. Always wire up a notification so you know when they finish.
Long-running non-interactive jobs. Cap concurrency. Always attach a notification or webhook so completion is observable.