Architecture

A hybrid RAG system combining retrieval and generation for grounded conversations

System Overview

MindVault uses a hybrid AI architecture: Azure AI Foundry with GPT-4o-mini handles semantic retrieval and vector search, while Anthropic's Claude Sonnet 4.6 handles response generation. This separation lets each component be optimized independently for cost and performance.

Rather than fine-tuning models, MindVault uses in-context learning: each query retrieves relevant documents from per-bot vector stores, which are assembled into context and passed to Claude for grounded, in-character generation. The Ian Kelley bot also features a real-time 3D talking avatar with ElevenLabs text-to-speech and lip sync.

The Hybrid RAG Flow

User Query

User sends a message in the chat interface

→

Frontend to API

Request routed to FastAPI backend with bot ID and message history

→

Semantic Retrieval

Query embedded and searched in Azure Foundry for relevant documents

→

Context Assembly

Retrieved documents combined with system prompt and conversation history

→

Claude Sonnet 4.6

Context + system prompt sent to Anthropic's Claude for in-character generation

→

Response Delivery

Generated response + source documents returned to frontend

The 26 Bots

Each bot has a unique system prompt, data sources, and personality. The bots span five categories:

Literary (8 bots)

Characters from classic novels: Frankenstein's Creature, Sherlock Holmes, Captain Nemo, Alice, Dracula, The Time Traveller, Dr. Jekyll, The Invisible Man

Philosophers (4 bots)

Historical thinkers: Marcus Aurelius, Sun Tzu, Nietzsche, Machiavelli

Experts (10 bots)

Specialized knowledge: Mythologist, Battlefield Historian, Cryptid Hunter, Ancient Engineer, Plague Doctor, Codebreaker, Alchemist, Cartographer, Dream Interpreter, War Correspondent

NASA (3 bots)

Space exploration: Space Guide, Mission Control, Asteroid Tracker (with live NASA APIs)

Meta (1 bot)

Ian Kelley: AI clone of the builder, with a 3D talking avatar powered by ElevenLabs TTS

Cost Model

MindVault is designed to be cost-efficient with transparent pricing.

Component	Cost per 1000 ops	Monthly (typical)
Claude API (Sonnet 4.6 generation)	~$0.80 per 1000 queries	$5-15
Azure Foundry (embedding + retrieval)	$0.02 per embedding, $0.002 per query	$0.50-2
Frontend hosting (static)	-	$3-5
FastAPI backend (Azure Container Apps)	-	$0.50-3
ElevenLabs TTS (avatar voice)	~$0.30 per 1000 chars	$1-5
Total	-	$9-25

Costs scale linearly with usage. At 100 messages per day average, monthly costs stay under $20. Using GPT-4o-mini for retrieval instead of a larger model keeps vector search costs near zero.

Tech Stack

Frontend

Astro 4 (static SSG)
React islands (ChatPanel)
Tailwind CSS
26 custom bot themes

Backend

FastAPI (Python)
Azure Container Apps
Azure Key Vault (secrets)
Managed identity (RBAC)
Cloudflare edge SSL

Data

Azure Foundry (vector DB)
Semantic search
Project organization
~2.5M tokens indexed

AI

Claude Sonnet 4.6 (generation)
GPT-4o-mini (vector retrieval)
ElevenLabs TTS + lip sync
TalkingHead.js 3D avatar
System prompts per bot

Design Philosophy

MindVault reflects several core design choices:

Transparency: Sources are always shown. Users know where information comes from.
Character Integrity: Each bot stays in character using carefully crafted system prompts.
Grounded Responses: Answers draw from real sources, not hallucinations.
Cost Efficiency: Semantic search + in-context learning vs. fine-tuning.
Scalability: Serverless architecture with managed identity and container orchestration.
Beautiful UX: 26 distinct themes, one for each bot, creating immersive experiences.
Human Touch: A 3D talking avatar with real-time lip sync brings the builder to life.

Built By

MindVault was built by Ian Kelley as a portfolio project demonstrating full-stack AI systems design.

The architecture prioritizes clarity, elegance, and maintainability over complexity.