Architecture
A hybrid RAG system combining retrieval and generation for grounded conversations
System Overview
MindVault uses a hybrid AI architecture: Azure AI Foundry with GPT-4o-mini handles semantic retrieval and vector search, while Anthropic's Claude Sonnet 4.6 handles response generation. This separation lets each component be optimized independently for cost and performance.
Rather than fine-tuning models, MindVault uses in-context learning: each query retrieves relevant documents from per-bot vector stores, which are assembled into context and passed to Claude for grounded, in-character generation. The Ian Kelley bot also features a real-time 3D talking avatar with ElevenLabs text-to-speech and lip sync.
The Hybrid RAG Flow
User Query
User sends a message in the chat interface
Frontend to API
Request routed to FastAPI backend with bot ID and message history
Semantic Retrieval
Query embedded and searched in Azure Foundry for relevant documents
Context Assembly
Retrieved documents combined with system prompt and conversation history
Claude Sonnet 4.6
Context + system prompt sent to Anthropic's Claude for in-character generation
Response Delivery
Generated response + source documents returned to frontend
The 26 Bots
Each bot has a unique system prompt, data sources, and personality. The bots span five categories:
Literary (8 bots)
Characters from classic novels: Frankenstein's Creature, Sherlock Holmes, Captain Nemo, Alice, Dracula, The Time Traveller, Dr. Jekyll, The Invisible Man
Philosophers (4 bots)
Historical thinkers: Marcus Aurelius, Sun Tzu, Nietzsche, Machiavelli
Experts (10 bots)
Specialized knowledge: Mythologist, Battlefield Historian, Cryptid Hunter, Ancient Engineer, Plague Doctor, Codebreaker, Alchemist, Cartographer, Dream Interpreter, War Correspondent
NASA (3 bots)
Space exploration: Space Guide, Mission Control, Asteroid Tracker (with live NASA APIs)
Meta (1 bot)
Ian Kelley: AI clone of the builder, with a 3D talking avatar powered by ElevenLabs TTS
Cost Model
MindVault is designed to be cost-efficient with transparent pricing.
| Component | Cost per 1000 ops | Monthly (typical) |
|---|---|---|
| Claude API (Sonnet 4.6 generation) | ~$0.80 per 1000 queries | $5-15 |
| Azure Foundry (embedding + retrieval) | $0.02 per embedding, $0.002 per query | $0.50-2 |
| Frontend hosting (static) | - | $3-5 |
| FastAPI backend (Azure Container Apps) | - | $0.50-3 |
| ElevenLabs TTS (avatar voice) | ~$0.30 per 1000 chars | $1-5 |
| Total | - | $9-25 |
Costs scale linearly with usage. At 100 messages per day average, monthly costs stay under $20. Using GPT-4o-mini for retrieval instead of a larger model keeps vector search costs near zero.
Tech Stack
Frontend
- Astro 4 (static SSG)
- React islands (ChatPanel)
- Tailwind CSS
- 26 custom bot themes
Backend
- FastAPI (Python)
- Azure Container Apps
- Azure Key Vault (secrets)
- Managed identity (RBAC)
- Cloudflare edge SSL
Data
- Azure Foundry (vector DB)
- Semantic search
- Project organization
- ~2.5M tokens indexed
AI
- Claude Sonnet 4.6 (generation)
- GPT-4o-mini (vector retrieval)
- ElevenLabs TTS + lip sync
- TalkingHead.js 3D avatar
- System prompts per bot
Design Philosophy
MindVault reflects several core design choices:
- Transparency: Sources are always shown. Users know where information comes from.
- Character Integrity: Each bot stays in character using carefully crafted system prompts.
- Grounded Responses: Answers draw from real sources, not hallucinations.
- Cost Efficiency: Semantic search + in-context learning vs. fine-tuning.
- Scalability: Serverless architecture with managed identity and container orchestration.
- Beautiful UX: 26 distinct themes, one for each bot, creating immersive experiences.
- Human Touch: A 3D talking avatar with real-time lip sync brings the builder to life.
Built By
MindVault was built by Ian Kelley as a portfolio project demonstrating full-stack AI systems design.
The architecture prioritizes clarity, elegance, and maintainability over complexity.