
HealthBridge

An AI-powered symptom triage assistant that uses RAG, tool orchestration, and multi-step reasoning to provide reliable health guidance—built with Next.js 16, Vercel AI SDK, and Google Gemini.

AI Engineering · RAG · Tool Use · Streaming · Next.js 16
HealthBridge

Hi! I can help you understand your symptoms. What are you experiencing?

I have a bad headache and slight fever for 3 days

Searching knowledge base...

Checking symptom interactions...

Assessing severity...

Based on my assessment, a headache with fever for 3 days warrants a GP visit.

TRIAGE: AMBER

Overview

The Problem

People searching symptoms online get unreliable information from SEO-driven articles, leading to unnecessary anxiety or missed warning signs. There's no middle ground between Googling symptoms and calling a doctor.

The Solution

A multi-step AI agent that searches a curated medical knowledge base, checks symptom interactions, assesses severity with structured rules, and provides a clear triage recommendation — all through a streaming chat interface.

My Role

Sole AI engineer — designed the system prompt, built 3 specialized tools, implemented the RAG pipeline with embeddings and cosine similarity, and deployed the full-stack application.

Key Deliverables

3 AI tools (RAG search, symptom interaction checker, severity assessor), 6 knowledge documents with embedding-based retrieval, a streaming chat UI, and a safety-first triage system (GREEN/AMBER/RED).

Architecture & Design Decisions

Every architectural choice was driven by one question: how do we give reliable health guidance without hallucinating or missing emergencies?

Key Technical Decisions

RAG Over Fine-Tuning

Fine-tuning a model on medical data is expensive, hard to update, and risks hallucination. RAG lets us ground every answer in specific, verifiable knowledge documents that can be updated instantly.

Tools Over Prompting

Instead of stuffing everything into one prompt, each capability (search, interaction check, severity assessment) is a separate tool the AI calls when needed — making the system modular and debuggable.

Streaming First

Health questions create anxiety. Streaming responses token-by-token reduces perceived wait time and shows the user that something is happening — critical for a medical context.

Safety-First Design

Emergency symptoms (chest pain, stroke signs) bypass all tools and immediately return a RED triage. The system errs on the side of caution — better to over-refer than under-refer.
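In HealthBridge this rule lives in the system prompt, but the same guard can be sketched as a pre-check in application code. A minimal, hypothetical version (the pattern list and function name are illustrative, not the project's actual rules):

```typescript
// Hypothetical emergency pre-check: scan the user's message for
// red-flag phrases before any tool or model call is made.
const EMERGENCY_PATTERNS: RegExp[] = [
  /chest pain/i,
  /can'?t breathe|difficulty breathing/i,
  /face droop|slurred speech|stroke/i,
];

function isEmergency(message: string): boolean {
  // Any single match is enough to trigger a RED triage.
  return EMERGENCY_PATTERNS.some((p) => p.test(message));
}
```

A guard like this would short-circuit the agent loop entirely, returning the RED response before any tool runs.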

In-Memory Vector Cache

Knowledge documents are embedded once on first request, then cached in memory. This avoids re-calling the embedding API on every search while keeping the architecture simple.
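The embed-once-then-cache pattern is a few lines. A minimal sketch, where `fakeEmbed` stands in for the real embedding API call (its body here is a placeholder, not the actual implementation):

```typescript
// Embed knowledge chunks on the first request, then serve from memory.
type Chunk = { text: string; vector: number[] };

let vectorCache: Chunk[] | null = null;

function fakeEmbed(texts: string[]): Chunk[] {
  // Placeholder: a real implementation would call the embedding model.
  return texts.map((text) => ({ text, vector: [text.length] }));
}

function getVectorIndex(texts: string[]): Chunk[] {
  if (vectorCache === null) {
    vectorCache = fakeEmbed(texts); // embed once, on the first request
  }
  return vectorCache; // later calls reuse the cached vectors
}
```

Because the cache is module-level, it survives across requests within the same server instance; a cold start simply re-embeds once.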

Engineering Principles

Decompose, Don't Delegate

Break complex reasoning into discrete tool calls instead of asking one prompt to do everything

Ground in Data

Every health claim is backed by a specific knowledge document retrieved via embeddings

Fail Safe

When uncertain, escalate severity — a false alarm is better than a missed emergency

Show the Thinking

Log every agent step to the terminal so the multi-step reasoning is observable and debuggable

The AI Agent: How It Thinks

Three specialized tools that the AI calls in sequence, choosing which to use based on the information available — like a doctor running through a checklist.

🔍

searchHealthInfo

RAG-Powered Knowledge Search

searchHealthInfo("tension headache causes and treatment")

Inputs

  • query — the symptom or health topic to search

Returns

Top 3 most relevant knowledge chunks ranked by cosine similarity

When Used

Always called first. Grounds the AI's response in verified medical information instead of relying on training data.

⚠️

checkSymptomInteractions

Dangerous Combination Detector

checkSymptomInteractions(["headache", "fever", "stiff neck"])

Inputs

  • symptoms — array of symptoms to check against known dangerous combinations

Returns

Matching combinations with severity level (RED/AMBER) and specific medical advice

When Used

Called when the user reports 2+ symptoms. Catches dangerous combinations like headache + fever + stiff neck (possible meningitis).
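The core of this tool is a lookup against a small table of known-dangerous combinations. A sketch with illustrative entries (the table shape is an assumption; the real data and wording differ):

```typescript
// Hypothetical interaction table: each entry lists a symptom combination,
// its severity tier, and the advice to surface.
type Interaction = {
  symptoms: string[];
  severity: "RED" | "AMBER";
  advice: string;
};

const INTERACTIONS: Interaction[] = [
  {
    symptoms: ["headache", "fever", "stiff neck"],
    severity: "RED",
    advice: "Possible meningitis - seek emergency care.",
  },
  {
    symptoms: ["headache", "fever"],
    severity: "AMBER",
    advice: "Possible infection or flu - see a GP if it persists.",
  },
];

function checkSymptomInteractions(symptoms: string[]): Interaction[] {
  const reported = new Set(symptoms.map((s) => s.toLowerCase()));
  // A combination matches only if every symptom in it was reported.
  return INTERACTIONS.filter((i) =>
    i.symptoms.every((s) => reported.has(s))
  );
}
```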

📊

assessSeverity

Rule-Based Severity Scorer

assessSeverity("headache", 7, 6)

Inputs

  • symptom — the main symptom being assessed
  • duration_days — how long the user has had it
  • pain_level — severity on a 1-10 scale

Returns

Structured assessment with severity (LOW/MODERATE/HIGH), recommendation, and risk flags

When Used

Called after gathering duration and pain information. Uses rule-based logic to consistently categorize severity.
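The rule-based logic can be sketched as a small threshold function. The exact thresholds below are assumptions for illustration, chosen so the output matches the AMBER scenario shown later (5 days, 6/10 pain → MODERATE, `longDuration` false):

```typescript
// Illustrative rule-based scorer; the thresholds are assumed, not the
// project's actual rules.
type Assessment = {
  severity: "LOW" | "MODERATE" | "HIGH";
  flags: { longDuration: boolean; highPain: boolean };
};

function assessSeverity(
  symptom: string,
  durationDays: number,
  painLevel: number
): Assessment {
  const flags = {
    longDuration: durationDays >= 7, // assumed threshold
    highPain: painLevel >= 8,        // assumed threshold
  };
  let severity: Assessment["severity"] = "LOW";
  if (flags.highPain || flags.longDuration) severity = "HIGH";
  else if (painLevel >= 5 || durationDays >= 3) severity = "MODERATE";
  return { severity, flags };
}
```

Keeping this deterministic means the same inputs always produce the same tier, which is easier to audit than asking the model to judge severity freeform.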

The RAG Pipeline

Retrieval-Augmented Generation grounds the AI in real medical knowledge instead of relying on training data that could be outdated or hallucinated.

headaches.md · fever.md · sore-throat.md · stomach-pain.md · dizziness.md · knee-pain.md
Step 1

Load & Chunk

Read all 6 markdown files from the knowledge directory. Split each file by ## headings so each section becomes its own chunk — keeping related information together.

Why: Splitting by headings preserves context better than arbitrary character limits.
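Splitting at `##` headings is one regex with a lookahead, so each heading stays attached to the body below it. A minimal sketch:

```typescript
// Split a markdown document into chunks at `##` headings, keeping each
// heading together with the section text that follows it.
function chunkByHeadings(markdown: string): string[] {
  return markdown
    .split(/^(?=## )/m) // zero-width lookahead keeps the heading in its chunk
    .map((c) => c.trim())
    .filter((c) => c.length > 0);
}
```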

Step 2

Embed Documents

Convert each text chunk into a 768-dimensional vector using Google's text-embedding-004 model. These vectors capture the semantic meaning of each chunk.

Why: Embeddings are cached in memory after the first request — no re-embedding needed.

Step 3

Embed Query

When the AI calls searchHealthInfo, the user's query is also converted into a vector using the same embedding model.

Why: Using the same model for documents and queries ensures vectors are in the same space.

Step 4

Cosine Similarity

Compare the query vector against every document chunk vector. Cosine similarity measures the angle between vectors — a score near 1.0 means near-identical meaning, near 0.0 means unrelated.

Why: A custom implementation — a dot product divided by the product of the vectors' magnitudes — keeps the architecture simple at this small scale.
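The whole similarity function fits in a few lines of TypeScript, assuming plain `number[]` vectors:

```typescript
// Cosine similarity: dot product divided by the product of magnitudes.
// Returns 1 for vectors pointing the same way, 0 for orthogonal ones.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}
```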

Step 5

Return Top 3

Sort all chunks by similarity score and return the 3 most relevant passages. The AI reads these to ground its response in verified information.

Why: Top-3 balances enough context for the AI without overwhelming the prompt.
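The rank-and-slice step can be sketched as a small helper that takes the similarity function as a parameter (names here are illustrative):

```typescript
// Score every chunk against the query vector, sort descending, keep 3.
type Scored = { text: string; score: number };

function topThree(
  queryVec: number[],
  chunks: { text: string; vector: number[] }[],
  similarity: (a: number[], b: number[]) => number
): Scored[] {
  return chunks
    .map((c) => ({ text: c.text, score: similarity(queryVec, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, 3);
}
```

With only six documents' worth of chunks, a full linear scan like this is plenty fast — no approximate-nearest-neighbor index needed.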

Multi-Step Agent in Action

Three real scenarios showing how the agent reasons through problems—from simple lookups to emergency bypasses.

Simple Lookup

A single vague symptom — the agent searches the knowledge base and asks follow-up questions instead of guessing.

User: "My head hurts"

Step 1 → searchHealthInfo("headache causes and types")
   📊 Top match: headaches.md (0.87)

Step 2 → Response:
   "I'd like to help you with your headache.
    To give you better guidance, can you tell me:
    • How long have you had it?
    • Where exactly is the pain?
    • Pain level from 1-10?"

Multi-Tool Chaining

Two symptoms with details — the agent chains three tools in sequence, building context at each step before synthesizing a response.

User: "I've had a headache and fever for 5 days,
        pain is about 6/10"

Step 1 → searchHealthInfo("headache with fever")
   📊 Top match: headaches.md (0.84)

Step 2 → checkSymptomInteractions(
           ["headache", "fever"])
   ⚠️  Match: "Possible infection or flu"
   Severity: AMBER

Step 3 → assessSeverity("headache", 5, 6)
   📊 Result: MODERATE
   Flag: longDuration = false

Step 4 → Final Response + [TRIAGE: AMBER]

Emergency Bypass

Emergency symptoms skip all tools entirely — the safety rules in the system prompt override the normal workflow.

User: "I'm having chest pain and can't breathe"

Step 1 → Response (NO tools called):
   "This sounds like it could be a medical
    emergency. Please:
    • Call 999 immediately
    • Do not wait to see if it improves
    • If someone is with you, let them know

    [TRIAGE: RED]"

Development Process

01

Prompt Engineering

Designed the system prompt with role definition, triage rules, safety guardrails, multi-step workflow instructions, and output formatting. Iterated through multiple versions to balance helpfulness with safety.

02

Knowledge Base + RAG

Wrote 6 medical knowledge documents, built the chunking pipeline (split by ## headings), implemented embedding with text-embedding-004, and wrote cosine similarity search with in-memory caching.

03

Tool Orchestration

Built 3 typed tools with Zod schemas (RAG search, symptom interaction checker, severity assessor). Configured multi-step execution with stopWhen: stepCountIs(5) and onStepFinish logging.

04

Streaming UI + Deploy

Connected the chat UI using useChat from @ai-sdk/react with real-time streaming. Added triage badge rendering, auto-scroll, and deployed to Vercel with environment variable configuration.
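The triage badge rendering hinges on pulling the `[TRIAGE: …]` marker out of the model's final message. A hypothetical parser (the function name and exact marker handling are assumptions):

```typescript
// Extract the triage marker the model appends to its final message,
// e.g. "... warrants a GP visit. [TRIAGE: AMBER]".
type Triage = "GREEN" | "AMBER" | "RED";

function extractTriage(message: string): Triage | null {
  const match = message.match(/\[TRIAGE:\s*(GREEN|AMBER|RED)\]/);
  return match ? (match[1] as Triage) : null;
}
```

The UI can then strip the marker from the displayed text and render a colored badge from the parsed value.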

Key Technical Features

Five capabilities that make the system reliable, safe, and responsive.

🧠

Multi-Step Reasoning

The AI agent chains up to 3 tool calls per conversation turn, building context at each step before synthesizing a final response with triage recommendation.

📚

Embedding-Based RAG

6 knowledge documents are embedded with Google's text-embedding-004 model. Cosine similarity search retrieves the 3 most relevant chunks for every query.

🛡️

Safety-First Triage

A three-tier system (GREEN/AMBER/RED) with hardcoded emergency bypass rules. Chest pain and stroke symptoms skip all tools and immediately direct to emergency services.

⚡

Real-Time Streaming

Responses stream token-by-token using Vercel AI SDK's streamText, reducing perceived latency and showing users the AI is actively working on their question.

🔒

Typed Tool Contracts

Every tool has a Zod-validated input schema, ensuring the AI can only call tools with correctly structured parameters — no runtime type errors.

Tech Stack & Architecture

Four layers working together—from the AI model to the user interface.

AI / Model Layer

  • Google Gemini 2.5 Flash
  • Vercel AI SDK v6 (streamText)
  • text-embedding-004 for RAG
  • Temperature: 0.2 (low creativity)

Tool Layer

  • 3 typed tools with Zod schemas
  • RAG search (embedding + cosine)
  • Symptom interaction lookup
  • Rule-based severity assessor

Safety Layer

  • GREEN / AMBER / RED triage
  • Emergency symptom bypass
  • No prescriptions or dosages
  • Always recommends professional care

Frontend Layer

  • Next.js 16 App Router
  • useChat from @ai-sdk/react
  • Tailwind CSS styling
  • Deployed on Vercel

Learnings & Outcomes

What I Learned

  • RAG is more maintainable than fine-tuning — updating a knowledge doc is instant, retraining a model takes hours
  • Decomposing AI reasoning into discrete tools makes the system debuggable and each capability independently testable
  • Safety rules need to live in the system prompt, not just in tools — emergency bypass must happen before any tool is called
  • Multi-step agents need step limits (stopWhen) to prevent infinite loops and control API costs
  • Streaming responses are essential for health queries — users need to see the AI is working, not staring at a blank screen
  • Building the full pipeline from embeddings to UI taught me how each AI concept connects in a production system

Skills Demonstrated

Prompt Engineering · RAG · Tool Use · Multi-Step Agents · Embeddings · Cosine Similarity · Streaming · Vercel AI SDK · Next.js 16 · Zod

Try HealthBridge

Chat with the AI triage assistant or explore the source code.