Ai Keeper
Native macOS App

Your Local AI
Command Center

Run, manage, and orchestrate AI models on your Mac. Multi-backend inference, OpenAI-compatible API, 20+ built-in tools, and seamless cloud integration — all in one beautiful native app.

4
Inference Backends
13+
Cloud Providers
20+
Built-in Tools
100%
Local & Private

Everything you need to run
AI locally

From model management to production-grade serving, Ai Keeper gives you full control over your AI stack.

💬

Intelligent Chat

Real-time streaming conversations with persistent history, message bookmarking, regeneration, and export. Supports images, audio, file uploads, and multimodal models.

Multimodal
🤖

Agent Mode

Autonomous multi-step task execution with goal decomposition, automatic tool selection, and safety approval gates. Let AI plan and execute complex workflows.

Autonomous
🔌

Multi-Instance Orchestration

Run multiple models simultaneously on different ports. Independent start/stop controls, auto-restart on failure, GPU memory management, and per-instance configuration.

Multi-Model
🌐

OpenAI-Compatible API

Built-in proxy server that exposes all your local models through a standard OpenAI-compatible API. Dynamic routing, model aliasing, rate limiting, and API key auth.

API
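Because the proxy speaks the standard OpenAI wire format, any OpenAI-style client can talk to it. A minimal sketch using only the Python standard library (the port, path prefix, and API key below are placeholders, not Ai Keeper defaults; use whatever your instance exposes):

```python
import json
import urllib.request

def build_chat_request(model, messages, base_url="http://localhost:8080/v1",
                       api_key="sk-local"):
    """Build a standard OpenAI-style chat completion request.

    base_url and api_key are placeholders -- substitute your
    instance's actual endpoint and key.
    """
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

# With a local instance running, sending it looks like:
# req = build_chat_request("my-local-model",
#                          [{"role": "user", "content": "Hello"}])
# with urllib.request.urlopen(req) as resp:
#     reply = json.loads(resp.read())
#     print(reply["choices"][0]["message"]["content"])
```

Model aliasing means the `model` field can name an alias rather than a concrete checkpoint, so clients do not need to change when you swap the underlying model.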
🧠

Persistent Memory

Automatic fact extraction from conversations, organized by category. Your AI remembers your identity, preferences, projects, and instructions across sessions.

Context-Aware
📚

RAG & Documents

Ingest PDFs, text, and markdown files. Automatic chunking, vector embedding, and similarity search with a built-in SQLite vector store for grounded responses.

Retrieval
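Retrieval of this kind boils down to embedding each chunk as a vector and ranking chunks by similarity to the query. A toy sketch of that idea in plain Python, with hand-written stand-in vectors (not Ai Keeper's actual implementation, which uses real embeddings and a SQLite vector store):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, chunks, k=2):
    """Rank (text, vector) chunks by similarity to the query vector."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

chunks = [
    ("intro to llamas", [0.9, 0.1, 0.0]),
    ("gpu memory tips", [0.1, 0.9, 0.2]),
    ("sqlite internals", [0.0, 0.2, 0.9]),
]
print(top_k([0.1, 0.8, 0.3], chunks, k=1))  # -> ['gpu memory tips']
```

The retrieved chunks are then prepended to the prompt, which is what grounds the model's response in your documents.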
📊

Real-Time Monitoring

Live dashboards with GPU memory, CPU usage, token throughput, request latency, and health metrics. Rolling performance charts with anomaly detection.

Live
🚀

Model Downloads

Browse and download models directly from HuggingFace. Filter by format (MLX, GGUF), size, popularity, and recency. One-click download with progress tracking.

HuggingFace
⚖️

Benchmark & Compare

Side-by-side model comparison with identical prompts. Throughput benchmarking, time-to-first-token measurement, and performance metrics with CSV export.

Analysis
📩

Telegram Integration

Bridge your AI to Telegram. Chat with your local models from anywhere, with voice message transcription, file attachments, and full conversation context.

Remote
🔋

MCP & Plugins

Extend with Model Context Protocol servers (stdio & SSE) and a native plugin system. Create tool, preprocessor, and postprocessor plugins with Python.

Extensible
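A tool plugin is essentially a named, described callable the model can invoke. The interface below is purely illustrative (the class shape, `name`, `description`, and `run` are assumptions for this sketch, not Ai Keeper's actual plugin API, which is defined in its own docs):

```python
# Hypothetical sketch only: Ai Keeper defines its own plugin interface.
# The class shape, attribute names, and dict schema here are assumptions.

class WordCountTool:
    """A toy 'tool' plugin: counts the words in a piece of text."""

    name = "word_count"
    description = "Count the words in a text string."

    def run(self, arguments: dict) -> dict:
        # Tools receive model-supplied arguments and return a result
        # the model can read back into the conversation.
        text = arguments.get("text", "")
        return {"words": len(text.split())}

tool = WordCountTool()
print(tool.run({"text": "local models are fast"}))  # -> {'words': 4}
```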
🗣️

Voice & TTS

Speak to your AI with Whisper-powered voice transcription. Hear responses with text-to-speech via system voices, OpenAI API, or custom TTS servers.

Audio

Four engines. One app.

Choose the best backend for your hardware and model. Switch between engines without leaving the app.

vllm-mlx
Full-featured inference with continuous batching, native tool calling parsers, and production-grade serving.
mlx-lm
Lightweight and fast. Broad model support with minimal overhead, perfect for quick local inference.
vmlx
JANG adaptive quantization, image generation, speech-to-text, and text-to-speech in a single runtime.
llama.cpp
Run GGUF models with grammar-constrained output, speculative decoding, and efficient CPU/GPU inference.

20+ built-in tools your
AI can use

Give your models superpowers. From web browsing to code execution, every tool runs locally on your machine.

🔍 Web Search
🌐 Web Fetch
💻 Python Execution
📄 File Operations
📋 Clipboard Read/Write
🎨 Image Generation
👁️ Vision Analysis
📶 HTTP Requests
📃 PDF Reading
🎤 Audio Transcription
🗃️ JSON/CSV/SQLite
📦 Archive Tools
🎬 Video Extraction
🛠️ Git Operations
📖 Log Tailing
🔭 Model Info
🎓 Browser Automation
⏱️ Scheduled Tasks
🔨 File Patching
🔒 MCP Servers

Local-first. Cloud-ready.

Seamlessly switch between local models and 13+ cloud providers. One unified interface, one API.

OpenAI
Anthropic
Google Gemini
Groq
Mistral
DeepSeek
xAI
OpenRouter
Together AI
Perplexity
Fireworks
Cerebras
Custom API

Built for power users

Simultaneous model instances
0ms
No cloud latency for local models
100%
Private — your data never leaves your Mac
Native
Built with SwiftUI for macOS
MLX
Apple Silicon optimized
API
OpenAI-compatible endpoint

Every detail, covered

🔬 API Playground

Interactive API testing with templates, custom headers, and instant response inspection.

📖 Request Inspector

Capture and debug every API request with headers, bodies, timing, and error tracking.

📡 Management Server

Remote API for headless operation with HMAC auth, rate limiting, and a built-in web dashboard.
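HMAC auth generally means signing each request with a shared secret and sending the digest in a header, so the server can verify the caller without the secret ever crossing the wire. A generic sketch with Python's standard library (the canonical string format and header name are assumptions; check the management API's own scheme):

```python
import hashlib
import hmac

def sign_request(secret: bytes, method: str, path: str, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 over method, path, and body.

    The canonical string format here is illustrative, not Ai Keeper's spec.
    """
    message = method.encode() + b"\n" + path.encode() + b"\n" + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()

sig = sign_request(b"shared-secret", "POST", "/api/instances",
                   b'{"action": "start"}')
# The digest would then travel in a request header (name assumed), e.g.:
#   X-Signature: <sig>
print(sig)
```

On the server side the same digest is recomputed and compared with `hmac.compare_digest`, which avoids timing side channels.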

💪 Performance Tuning

Flash attention, KV cache quantization, speculative decoding, LoRA adapters, and tensor parallelism.

🧰 Thinking Models

Extended thinking support for reasoning models with visible chain-of-thought and configurable budgets.

💾 Backup & Restore

Export and import your full configuration, conversations, and memories with one-click backup.

Ai Keeper

Ready to take control
of your AI?

Download Ai Keeper and start running models locally on your Mac today. Free and private.

Download for macOS

Requires macOS 14+ • Apple Silicon