Architecture

A hybrid RAG system combining retrieval and generation for grounded conversations

System Overview

MindVault uses a hybrid AI architecture: Azure AI Foundry with GPT-4o-mini handles semantic retrieval and vector search, while Anthropic's Claude Sonnet 4.6 handles response generation. This separation lets each component be optimized independently for cost and performance.

Rather than fine-tuning models, MindVault uses in-context learning: each query retrieves relevant documents from per-bot vector stores, which are assembled into context and passed to Claude for grounded, in-character generation. The Ian Kelley bot also features a real-time 3D talking avatar with ElevenLabs text-to-speech and lip sync.

The Hybrid RAG Flow

1

User Query

User sends a message in the chat interface

2

Frontend to API

Request routed to FastAPI backend with bot ID and message history

3

Semantic Retrieval

Query embedded and searched in Azure Foundry for relevant documents

4

Context Assembly

Retrieved documents combined with system prompt and conversation history

5

Claude Sonnet 4.6

Context + system prompt sent to Anthropic's Claude for in-character generation

6

Response Delivery

Generated response + source documents returned to frontend

The 26 Bots

Each bot has a unique system prompt, data sources, and personality. The bots span five categories:

Literary (8 bots)

Characters from classic novels: Frankenstein's Creature, Sherlock Holmes, Captain Nemo, Alice, Dracula, The Time Traveller, Dr. Jekyll, The Invisible Man

Philosophers (4 bots)

Historical thinkers: Marcus Aurelius, Sun Tzu, Nietzsche, Machiavelli

Experts (10 bots)

Specialized knowledge: Mythologist, Battlefield Historian, Cryptid Hunter, Ancient Engineer, Plague Doctor, Codebreaker, Alchemist, Cartographer, Dream Interpreter, War Correspondent

NASA (3 bots)

Space exploration: Space Guide, Mission Control, Asteroid Tracker (with live NASA APIs)

Meta (1 bot)

Ian Kelley: AI clone of the builder, with a 3D talking avatar powered by ElevenLabs TTS

Cost Model

MindVault is designed to be cost-efficient with transparent pricing.

Component Cost per 1000 ops Monthly (typical)
Claude API (Sonnet 4.6 generation) ~$0.80 per 1000 queries $5-15
Azure Foundry (embedding + retrieval) $0.02 per embedding, $0.002 per query $0.50-2
Frontend hosting (static) - $3-5
FastAPI backend (Azure Container Apps) - $0.50-3
ElevenLabs TTS (avatar voice) ~$0.30 per 1000 chars $1-5
Total - $9-25

Costs scale linearly with usage. At 100 messages per day average, monthly costs stay under $20. Using GPT-4o-mini for retrieval instead of a larger model keeps vector search costs near zero.

Tech Stack

Frontend

  • Astro 4 (static SSG)
  • React islands (ChatPanel)
  • Tailwind CSS
  • 26 custom bot themes

Backend

  • FastAPI (Python)
  • Azure Container Apps
  • Azure Key Vault (secrets)
  • Managed identity (RBAC)
  • Cloudflare edge SSL

Data

  • Azure Foundry (vector DB)
  • Semantic search
  • Project organization
  • ~2.5M tokens indexed

AI

  • Claude Sonnet 4.6 (generation)
  • GPT-4o-mini (vector retrieval)
  • ElevenLabs TTS + lip sync
  • TalkingHead.js 3D avatar
  • System prompts per bot

Design Philosophy

MindVault reflects several core design choices:

  • Transparency: Sources are always shown. Users know where information comes from.
  • Character Integrity: Each bot stays in character using carefully crafted system prompts.
  • Grounded Responses: Answers draw from real sources, not hallucinations.
  • Cost Efficiency: Semantic search + in-context learning vs. fine-tuning.
  • Scalability: Serverless architecture with managed identity and container orchestration.
  • Beautiful UX: 26 distinct themes, one for each bot, creating immersive experiences.
  • Human Touch: A 3D talking avatar with real-time lip sync brings the builder to life.

Built By

MindVault was built by Ian Kelley as a portfolio project demonstrating full-stack AI systems design.

The architecture prioritizes clarity, elegance, and maintainability over complexity.