HealthBridge
An AI-powered symptom triage assistant that uses RAG, tool orchestration, and multi-step reasoning to provide reliable health guidance—built with Next.js 16, Vercel AI SDK, and Google Gemini.
Hi! I can help you understand your symptoms. What are you experiencing?
I have a bad headache and slight fever for 3 days
Searching knowledge base...
Checking symptom interactions...
Assessing severity...
Based on my assessment, a headache with fever for 3 days warrants a GP visit.
TRIAGE: AMBER
Overview
The Problem
People searching symptoms online get unreliable information from SEO-driven articles, leading to unnecessary anxiety or missed warning signs. There's no middle ground between Googling symptoms and calling a doctor.
The Solution
A multi-step AI agent that searches a curated medical knowledge base, checks symptom interactions, assesses severity with structured rules, and provides a clear triage recommendation — all through a streaming chat interface.
My Role
Sole AI engineer — designed the system prompt, built 3 specialized tools, implemented the RAG pipeline with embeddings and cosine similarity, and deployed the full-stack application.
Key Deliverables
3 AI tools (RAG search, symptom interaction checker, severity assessor), 6 knowledge documents with embedding-based retrieval, a streaming chat UI, and a safety-first triage system (GREEN/AMBER/RED).
Architecture & Design Decisions
Every architectural choice was driven by one question: how do we give reliable health guidance without hallucinating or missing emergencies?
Key Technical Decisions
RAG Over Fine-Tuning
Fine-tuning a model on medical data is expensive, hard to update, and risks hallucination. RAG lets us ground every answer in specific, verifiable knowledge documents that can be updated instantly.
Tools Over Prompting
Instead of stuffing everything into one prompt, each capability (search, interaction check, severity assessment) is a separate tool the AI calls when needed — making the system modular and debuggable.
Streaming First
Health questions create anxiety. Streaming responses token-by-token reduces perceived wait time and shows the user that something is happening — critical for a medical context.
Safety-First Design
Emergency symptoms (chest pain, stroke signs) bypass all tools and immediately return a RED triage. The system errs on the side of caution — better to over-refer than under-refer.
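The bypass can be sketched as a hardcoded keyword check that runs before any tool call. This is a minimal illustration, not the app's actual rule set — the keyword list and function name are assumptions:

```typescript
// Hypothetical sketch of the emergency bypass: a hardcoded keyword scan
// that runs BEFORE any tool is invoked. Keywords are illustrative only.
const EMERGENCY_KEYWORDS: string[] = [
  "chest pain",
  "can't breathe",
  "cannot breathe",
  "face drooping",
  "slurred speech",
];

function emergencyBypass(message: string): "RED" | null {
  const text = message.toLowerCase();
  // Any match short-circuits the normal tool workflow with a RED triage.
  return EMERGENCY_KEYWORDS.some((k) => text.includes(k)) ? "RED" : null;
}
```

Because the check is plain string matching, it cannot hallucinate: a match always escalates, and a miss falls through to the normal tool workflow.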
In-Memory Vector Cache
Knowledge documents are embedded once on first request, then cached in memory. This avoids re-calling the embedding API on every search while keeping the architecture simple.
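The caching pattern is a simple memoized lookup in front of the embedding call. In this sketch, `callEmbeddingApi` is a stub standing in for the real text-embedding-004 request, so the idea is runnable in isolation:

```typescript
// In-memory embedding cache: embed each text once, then serve from memory.
const embeddingCache = new Map<string, number[]>();
let apiCalls = 0;

// Stub standing in for the real text-embedding-004 API call.
async function callEmbeddingApi(text: string): Promise<number[]> {
  apiCalls++; // in the real app, this is a billable network request
  return Array.from(text).map((c) => c.charCodeAt(0) / 255);
}

async function embedCached(text: string): Promise<number[]> {
  const cached = embeddingCache.get(text);
  if (cached) return cached;
  const vector = await callEmbeddingApi(text);
  embeddingCache.set(text, vector);
  return vector;
}
```

The trade-off: the cache is lost on server restart, but with only 6 documents the one-time re-embedding cost is negligible.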
Engineering Principles
Decompose, Don't Delegate
Break complex reasoning into discrete tool calls instead of asking one prompt to do everything
Ground in Data
Every health claim is backed by a specific knowledge document retrieved via embeddings
Fail Safe
When uncertain, escalate severity — a false alarm is better than a missed emergency
Show the Thinking
Log every agent step to the terminal so the multi-step reasoning is observable and debuggable
The AI Agent: How It Thinks
Three specialized tools that the AI calls in sequence, choosing which to use based on the information available — like a doctor running through a checklist.
searchHealthInfo
RAG-Powered Knowledge Search
searchHealthInfo("tension headache causes and treatment")
Inputs
- query — the symptom or health topic to search
Returns
Top 3 most relevant knowledge chunks ranked by cosine similarity
When Used
Always called first. Grounds the AI's response in verified medical information instead of relying on training data.
checkSymptomInteractions
Dangerous Combination Detector
checkSymptomInteractions(["headache", "fever", "stiff neck"])
Inputs
- symptoms — array of symptoms to check against known dangerous combinations
Returns
Matching combinations with severity level (RED/AMBER) and specific medical advice
When Used
Called when the user reports 2+ symptoms. Catches dangerous combinations like headache + fever + stiff neck (possible meningitis).
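Conceptually, the tool is a lookup against a table of known dangerous combinations. The entries and advice strings below are illustrative placeholders, not the project's real table:

```typescript
// Sketch of the interaction checker: match reported symptoms against a
// table of known dangerous combinations. Entries here are illustrative.
interface Interaction {
  combo: string[];
  severity: "RED" | "AMBER";
  advice: string;
}

const INTERACTIONS: Interaction[] = [
  {
    combo: ["headache", "fever", "stiff neck"],
    severity: "RED",
    advice: "Possible meningitis — seek emergency care.",
  },
  {
    combo: ["headache", "fever"],
    severity: "AMBER",
    advice: "Possible infection or flu — see a GP if it persists.",
  },
];

function checkSymptomInteractions(symptoms: string[]): Interaction[] {
  const reported = new Set(symptoms.map((s) => s.toLowerCase()));
  // A combination matches only if every symptom in it was reported.
  return INTERACTIONS.filter((i) => i.combo.every((s) => reported.has(s)));
}
```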
assessSeverity
Rule-Based Severity Scorer
assessSeverity("headache", 7, 6)
Inputs
- symptom — the main symptom being assessed
- duration_days — how long the user has had it, in days
- pain_level — severity on a 1-10 scale
Returns
Structured assessment with severity (LOW/MODERATE/HIGH), recommendation, and risk flags
When Used
Called after gathering duration and pain information. Uses rule-based logic to consistently categorize severity.
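A rule-based scorer like this can be sketched with simple thresholds. The cutoffs below are assumptions chosen to match the worked example later on the page (headache, 5 days, pain 6/10 → MODERATE, longDuration false); the real rules may differ:

```typescript
// Hedged sketch of the rule-based severity scorer. Thresholds are
// illustrative assumptions, not the app's actual rules.
type Severity = "LOW" | "MODERATE" | "HIGH";

interface Assessment {
  severity: Severity;
  recommendation: string;
  flags: { longDuration: boolean; highPain: boolean };
}

function assessSeverity(
  symptom: string, // kept for the signature; real rules may vary per symptom
  durationDays: number,
  painLevel: number
): Assessment {
  const flags = { longDuration: durationDays >= 7, highPain: painLevel >= 8 };
  let severity: Severity = "LOW";
  if (flags.highPain || (flags.longDuration && painLevel >= 5)) severity = "HIGH";
  else if (painLevel >= 5 || durationDays >= 3) severity = "MODERATE";
  const recommendation =
    severity === "HIGH" ? "See a GP urgently."
    : severity === "MODERATE" ? "Book a GP appointment."
    : "Self-care and monitor.";
  return { severity, recommendation, flags };
}
```

Because the logic is deterministic, the same inputs always produce the same triage — unlike asking the model to "judge severity" in free text.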
The RAG Pipeline
Retrieval-Augmented Generation grounds the AI in real medical knowledge instead of relying on training data that could be outdated or hallucinated.
Load & Chunk
Read all 6 markdown files from the knowledge directory. Split each file by ## headings so each section becomes its own chunk — keeping related information together.
Why: Splitting by headings preserves context better than arbitrary character limits.
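The chunking step can be sketched in a few lines: split the markdown on `## ` headings and keep each section as one chunk. This is a minimal version that assumes each file starts at its first `## ` heading:

```typescript
// Minimal heading-based chunker: split a markdown document on `## `
// headings so each section becomes one chunk. Assumes content starts
// at the first `## ` heading (any preamble would become its own chunk).
function chunkByHeadings(markdown: string): { heading: string; text: string }[] {
  return markdown
    .split(/^## /m)
    .filter((s) => s.trim().length > 0)
    .map((section) => {
      const [firstLine, ...rest] = section.split("\n");
      return { heading: firstLine.trim(), text: rest.join("\n").trim() };
    });
}
```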
Embed Documents
Convert each text chunk into a 768-dimensional vector using Google's text-embedding-004 model. These vectors capture the semantic meaning of each chunk.
Why: Embeddings are cached in memory after the first request — no re-embedding needed.
Embed Query
When the AI calls searchHealthInfo, the user's query is also converted into a vector using the same embedding model.
Why: Using the same model for documents and queries ensures vectors are in the same space.
Cosine Similarity
Compare the query vector against every document chunk vector. Cosine similarity measures the angle between vectors — 1.0 means identical meaning, 0.0 means unrelated.
Why: A small custom implementation — dot product divided by the product of magnitudes — is enough here, with no vector database required.
Return Top 3
Sort all chunks by similarity score and return the 3 most relevant passages. The AI reads these to ground its response in verified information.
Why: Top-3 balances enough context for the AI without overwhelming the prompt.
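Steps 4 and 5 together can be sketched as: score every chunk against the query vector, sort, and slice. The `topK` helper name is an assumption; the math is the standard cosine formula described above:

```typescript
// Cosine similarity: dot product divided by the product of magnitudes.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

// Score every chunk, sort by similarity, return the k best (default 3).
function topK(
  query: number[],
  chunks: { text: string; vector: number[] }[],
  k = 3
) {
  return chunks
    .map((c) => ({ text: c.text, score: cosineSimilarity(query, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

With only 6 documents' worth of chunks, this brute-force scan is effectively instant; an index would only pay off at much larger scale.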
Multi-Step Agent in Action
Three real scenarios showing how the agent reasons through problems—from simple lookups to emergency bypasses.
Simple Lookup
A single vague symptom — the agent searches the knowledge base and asks follow-up questions instead of guessing.
User: "My head hurts"
Step 1 → searchHealthInfo("headache causes and types")
📊 Top match: headaches.md (0.87)
Step 2 → Response:
"I'd like to help you with your headache.
To give you better guidance, can you tell me:
• How long have you had it?
• Where exactly is the pain?
• Pain level from 1-10?"
Multi-Tool Chaining
Two symptoms with details — the agent chains three tools in sequence, building context at each step before synthesizing a response.
User: "I've had a headache and fever for 5 days,
pain is about 6/10"
Step 1 → searchHealthInfo("headache with fever")
📊 Top match: headaches.md (0.84)
Step 2 → checkSymptomInteractions(
["headache", "fever"])
⚠️ Match: "Possible infection or flu"
Severity: AMBER
Step 3 → assessSeverity("headache", 5, 6)
📊 Result: MODERATE
Flag: longDuration = false
Step 4 → Final Response + [TRIAGE: AMBER]
Emergency Bypass
Emergency symptoms skip all tools entirely — the safety rules in the system prompt override the normal workflow.
User: "I'm having chest pain and can't breathe"
Step 1 → Response (NO tools called):
"This sounds like it could be a medical
emergency. Please:
• Call 999 immediately
• Do not wait to see if it improves
• If someone is with you, let them know
[TRIAGE: RED]"
Development Process
Prompt Engineering
Designed the system prompt with role definition, triage rules, safety guardrails, multi-step workflow instructions, and output formatting. Iterated through multiple versions to balance helpfulness with safety.
Knowledge Base + RAG
Wrote 6 medical knowledge documents, built the chunking pipeline (split by ## headings), implemented embedding with text-embedding-004, and wrote cosine similarity search with in-memory caching.
Tool Orchestration
Built 3 typed tools with Zod schemas (RAG search, symptom interaction checker, severity assessor). Configured multi-step execution with stopWhen: stepCountIs(5) and onStepFinish logging.
Streaming UI + Deploy
Connected the chat UI using useChat from @ai-sdk/react with real-time streaming. Added triage badge rendering, auto-scroll, and deployed to Vercel with environment variable configuration.
Key Technical Features
Five capabilities that make the system reliable, safe, and responsive.
Multi-Step Reasoning
The AI agent chains up to 3 tool calls per conversation turn, building context at each step before synthesizing a final response with triage recommendation.
Embedding-Based RAG
6 knowledge documents are embedded with Google's text-embedding-004 model. Cosine similarity search retrieves the 3 most relevant chunks for every query.
Safety-First Triage
A three-tier system (GREEN/AMBER/RED) with hardcoded emergency bypass rules. Chest pain and stroke symptoms skip all tools and immediately direct to emergency services.
Real-Time Streaming
Responses stream token-by-token using Vercel AI SDK's streamText, reducing perceived latency and showing users the AI is actively working on their question.
Typed Tool Contracts
Every tool has a Zod-validated input schema, ensuring the AI can only call tools with correctly structured parameters — no runtime type errors.
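The project uses Zod for this, but the underlying idea — reject malformed tool arguments before the tool body runs — can be sketched without it. The input shape matches assessSeverity; the parser below is a hypothetical stand-in for the Zod schema:

```typescript
// Hand-rolled stand-in for a Zod schema: validate tool arguments at the
// boundary so the tool body only ever sees well-typed input.
interface SeverityInput {
  symptom: string;
  duration_days: number;
  pain_level: number;
}

function parseSeverityInput(raw: unknown): SeverityInput {
  const o = raw as Record<string, unknown>;
  if (typeof o?.symptom !== "string") throw new Error("symptom must be a string");
  if (typeof o?.duration_days !== "number") throw new Error("duration_days must be a number");
  if (typeof o?.pain_level !== "number" || o.pain_level < 1 || o.pain_level > 10)
    throw new Error("pain_level must be a number from 1 to 10");
  return o as unknown as SeverityInput;
}
```

When the model produces out-of-range arguments (say, a pain level of 12), the validation error surfaces immediately at the tool boundary instead of corrupting the assessment downstream.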
Tech Stack & Architecture
Four layers working together—from the AI model to the user interface.
AI / Model Layer
- Google Gemini 2.5 Flash
- Vercel AI SDK v6 (streamText)
- text-embedding-004 for RAG
- Temperature: 0.2 (low creativity)
Tool Layer
- 3 typed tools with Zod schemas
- RAG search (embedding + cosine)
- Symptom interaction lookup
- Rule-based severity assessor
Safety Layer
- GREEN / AMBER / RED triage
- Emergency symptom bypass
- No prescriptions or dosages
- Always recommends professional care
Frontend Layer
- Next.js 16 App Router
- useChat from @ai-sdk/react
- Tailwind CSS styling
- Deployed on Vercel
Learnings & Outcomes
What I Learned
- RAG is more maintainable than fine-tuning — updating a knowledge doc is instant, retraining a model takes hours
- Decomposing AI reasoning into discrete tools makes the system debuggable and each capability independently testable
- Safety rules need to live in the system prompt, not just in tools — emergency bypass must happen before any tool is called
- Multi-step agents need step limits (stopWhen) to prevent infinite loops and control API costs
- Streaming responses are essential for health queries — users need to see the AI is working, not staring at a blank screen
- Building the full pipeline from embeddings to UI taught me how each AI concept connects in a production system
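The step-limit lesson can be sketched as a bounded loop. The real app relies on the AI SDK's stopWhen with stepCountIs(5); this toy `runAgent` just shows the principle that the loop terminates even if the model never produces a final answer:

```typescript
// Toy bounded agent loop: at most maxSteps iterations, stopping early
// when a step declares itself final. Illustrative only — the real app
// delegates this to the AI SDK's stopWhen configuration.
type Step = { type: "tool" | "final"; output: string };

function runAgent(nextStep: (i: number) => Step, maxSteps = 5): Step[] {
  const steps: Step[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const step = nextStep(i);
    steps.push(step);
    if (step.type === "final") break; // model finished before the cap
  }
  return steps; // cap reached: loop ends regardless, bounding API cost
}
```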
Skills Demonstrated
Try HealthBridge
Chat with the AI triage assistant or explore the source code.