We Built an AI Medical Analyst in a Weekend at the Caltech Longevity Hackathon

Written by gokulsrinaths | Published 2025/10/14
Tech Story Tags: ai-in-healthcare | supabase | ocr | healthtech | carltech-longevity-hackathon | ai-pipeline-development | supabase-edge-functions | clinical-data-pipeline

TL;DR: We built a web app that turns unstructured medical documents (PDFs and images) into actionable insights. Users upload a file; we extract text (pdf-parse/tesseract.js), analyze it with a structured LLaMA prompt, store files and metadata in Supabase (Storage + Postgres), and present summaries, categories, abnormalities, and references in a clean UI. There’s also a Supabase Edge Function for server-side/batch processing.

Why longevity?

  • Longevity care is longitudinal and data-heavy. Patients accumulate lab panels, imaging, and clinical notes over years.
  • Clinicians and individuals need fast, explainable triage: What’s abnormal? What changed? What should I read next?
  • A generalizable pipeline that works “in a weekend” helps teams experiment faster and validate real-world impact.

What we shipped at the hackathon

  • A Next.js/React UI with a dead-simple uploader and a results table.
  • Client-side text extraction for PDFs and images to support mixed inputs quickly.
  • A structured LLaMA prompt that returns summary, keywords, categories, abnormal flags, suggested filename, and PubMed titles.
  • Supabase Storage for raw files; Postgres table documents for structured metadata.
  • A Supabase Edge Function to process stored PDFs server-side (useful for background jobs and batch workflows).

Architecture

  • Upload → Extract text → LLM analysis → Persist → Render
  • Two processing paths:
    • Client-led: immediate feedback, great for demos and small files.
    • Server-led (Edge Function): scalable, secure, and good for background/batch processing.

Key moving parts:

  • Frontend: Next.js + Tailwind
  • OCR/Parsing: pdf-parse, tesseract.js
  • AI: LLaMA chat completions API with a rigid, parse-friendly prompt
  • Backend: Supabase (Storage for blobs, Postgres for metadata)
  • Serverless: Supabase Edge Function for server-side PDF processing

Product walk-through

Home page

            <h1 className="text-3xl font-bold leading-tight text-gray-900">
              Medical Document Analysis
            </h1>
          </div>
        </header>
        <main>
          <div className="max-w-7xl mx-auto sm:px-6 lg:px-8">
            <MedicalDocUploader />
  • Clean landing with a single CTA: upload a document.

Upload → extract → analyze → persist (client path)

  • Extracts text conditionally based on file type.
  • Calls LLaMA with a structured prompt to ensure predictable parsing.
  • Uploads original file to Supabase Storage and inserts metadata into documents.
  const handleFileUpload = useCallback(async (event: React.ChangeEvent<HTMLInputElement>) => {
    try {
      setIsUploading(true);
      setError(null);
      
      const file = event.target.files?.[0];
      if (!file) return;

      // Extract text based on file type
      const text = file.type === 'application/pdf' 
        ? await extractTextFromPDF(file)
        : await extractTextFromImage(file);

      // Analyze the text with Llama
      const analysis = await analyzeWithLlama(text, file.name);

      // Upload to Supabase Storage
      const { data: uploadData, error: uploadError } = await supabase.storage
        .from('medical-documents')
        .upload(analysis.renamed_file, file);

      if (uploadError) throw uploadError;

      // Store metadata in Supabase
      const { data: metaData, error: metaError } = await supabase
        .from('documents')
        .insert({
          filename: file.name,
          renamed_file: analysis.renamed_file,
          file_url: uploadData.path,
          summary: analysis.summary,
          keywords: analysis.keywords,
          categories: analysis.categories,
          word_count: countWords(text),
          report_type: detectReportType(text),
          threshold_flags: analysis.threshold_flags,
          pubmed_refs: analysis.pubmed_refs,
          ai_notes: analysis.ai_notes,
          status: 'processed',
          version: 1
        })
        .select()
        .single();

      if (metaError) throw metaError;

      setDocuments(prev => [...prev, metaData]);
    } catch (err) {
      setError(err instanceof Error ? err.message : 'An error occurred');
    } finally {
      setIsUploading(false);
    }
  }, []);
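
The handler above calls countWords and detectReportType without showing them. The sketch below is one plausible shape for those two helpers, not the team's actual code: the keyword patterns and the report type names ("lab", "imaging", "clinical_note", "other") are assumptions made for illustration.

```typescript
// Hypothetical implementations of two helpers the upload handler calls.
// The keyword patterns and type names are illustrative assumptions.

/** Count whitespace-separated tokens in the extracted text. */
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Ordered list: the first report type whose pattern matches wins.
const REPORT_TYPE_PATTERNS: Array<[string, RegExp]> = [
  ["lab", /\b(hemoglobin|glucose|cholesterol|reference range|cbc)\b/i],
  ["imaging", /\b(mri|ct scan|x-ray|ultrasound|radiology)\b/i],
  ["clinical_note", /\b(chief complaint|history of present illness|assessment and plan)\b/i],
];

/** Guess a coarse report type from keywords; falls back to "other". */
function detectReportType(text: string): string {
  for (const [reportType, pattern] of REPORT_TYPE_PATTERNS) {
    if (pattern.test(text)) return reportType;
  }
  return "other";
}
```

Keyword matching like this is cheap enough to run on every upload, but it is only a heuristic; a report that mentions both lab values and imaging will be labelled by whichever pattern comes first.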

The prompt that makes it reliable

  • The LLM is only as useful as its prompt structure. We force a schema, so parsing is straightforward and less brittle than free-form responses.
  const analyzeWithLlama = async (text: string, originalFilename: string) => {
    const prompt = `Analyze this medical document and provide a detailed analysis in the following format:

1. Summary: Provide a clear, plain-English summary
2. Keywords: Extract key medical terms and their values (if any)
3. Categories: Classify into these categories: ${VALID_CATEGORIES.join(", ")}
4. Filename: Suggest a clear, descriptive filename
5. Threshold Flags: Identify any abnormal values and mark as "high", "low", or "normal"
6. PubMed References: Suggest relevant PubMed articles (just article titles)
7. Additional Notes: Any important medical guidance or observations

Document text:
${text}

Please format your response exactly as follows:
Summary: [summary]
Keywords: [key:value pairs]
Categories: [categories]
Filename: [filename]
Flags: [abnormal values]
References: [article titles]
Notes: [additional guidance]`;
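
Because the prompt pins the reply to seven labelled lines, the parsing side can stay small. The following is a minimal sketch of such a parser, assuming the model really does start each field with its label; parseLabelledResponse and splitList are hypothetical names, not functions from the project.

```typescript
// Field labels copied from the "format your response exactly as follows" part of the prompt.
const LABELS = ["Summary", "Keywords", "Categories", "Filename", "Flags", "References", "Notes"] as const;
type Label = (typeof LABELS)[number];

/**
 * Split a labelled LLM reply into one raw string per field.
 * Non-empty lines that do not start with a known label are appended
 * to the current field, so multi-line summaries or notes survive.
 */
function parseLabelledResponse(reply: string): Record<Label, string> {
  const fields: Record<Label, string> = {
    Summary: "", Keywords: "", Categories: "", Filename: "", Flags: "", References: "", Notes: "",
  };
  const labelPattern = new RegExp(`^\\s*(${LABELS.join("|")}):\\s*(.*)$`);
  let current: Label | null = null;

  for (const line of reply.split(/\r?\n/)) {
    const match = labelPattern.exec(line);
    if (match) {
      current = match[1] as Label;
      fields[current] = (match[2] ?? "").trim();
    } else if (current !== null && line.trim() !== "") {
      fields[current] += (fields[current] ? "\n" : "") + line.trim();
    }
  }
  return fields;
}

/** Turn a comma-, semicolon-, or newline-separated field into a clean list. */
function splitList(value: string): string[] {
  return value.split(/[,;\n]/).map((item) => item.trim()).filter((item) => item.length > 0);
}
```

Fields the model omits come back as empty strings rather than undefined, which keeps the later insert from failing on a missing key. One known limit: a continuation line that happens to begin with a label such as "Notes:" would be read as a new field.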

Server-side processing (Edge Function)

  • Useful for background jobs, webhook-driven processing, or scaling beyond client limits.
  • Downloads the file from Supabase Storage, extracts text, sends the same LLaMA prompt, and inserts a documents row.
    // Prepare LLaMA prompt
    const prompt = `Analyze this medical document and provide a detailed analysis in the following format:

1. Summary: Provide a clear, plain-English summary
2. Keywords: Extract key medical terms and their values (if any)
3. Categories: Classify into these categories: ${VALID_CATEGORIES.join(", ")}
4. Filename: Suggest a clear, descriptive filename
5. Threshold Flags: Identify any abnormal values and mark as "high", "low", or "normal"
6. PubMed References: Suggest relevant PubMed articles (just article titles)
7. Additional Notes: Any important medical guidance or observations

Document text:
${text}

Please format your response exactly as follows:
Summary: [summary]
Keywords: [key:value pairs]
Categories: [categories]
Filename: [filename]
Flags: [abnormal values]
References: [article titles]
Notes: [additional guidance]`
    // Insert into Supabase
    const { data: insertData, error: insertError } = await supabase
      .from('documents')
      .insert(documentData)
      .select()
      .single()
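
The excerpts above show only the prompt and the insert. To make the order of the server-side steps concrete, here is a sketch of the flow with each step injected as a dependency. The PipelineDeps interface and processStoredPdf are hypothetical, and the row written here is deliberately partial; a real handler would wrap this in the Edge Function's request handler and fill every column.

```typescript
// Hypothetical dependency interface: each step is injected so the flow
// can be exercised without Supabase, a PDF parser, or an LLM.
interface PipelineDeps {
  download(path: string): Promise<Uint8Array>;
  extractText(pdfBytes: Uint8Array): Promise<string>;
  analyze(text: string, filename: string): Promise<{ summary: string }>;
  insertRow(row: Record<string, unknown>): Promise<void>;
}

/** Download, extract, analyze, persist; report the outcome as a status string. */
async function processStoredPdf(path: string, deps: PipelineDeps): Promise<"processed" | "failed"> {
  const filename = path.split("/").pop() || path;
  try {
    const pdfBytes = await deps.download(path);
    const text = await deps.extractText(pdfBytes);
    if (text.trim() === "") throw new Error("No text extracted");
    const analysis = await deps.analyze(text, filename);
    // Partial row for illustration only; the real table has more required columns.
    await deps.insertRow({ filename, file_url: path, summary: analysis.summary, status: "processed" });
    return "processed";
  } catch (err) {
    console.error(`Processing ${path} failed:`, err);
    return "failed";
  }
}
```

Keeping the steps behind an interface means the same function can run in a batch job or a webhook handler, and a scanned PDF with no text layer fails early instead of sending an empty prompt to the model.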

Implementation details

Database schema (Supabase Postgres)

Use JSONB where the structure can vary or expand over time.

-- documents table
create table if not exists public.documents (
  id bigint generated always as identity primary key,
  created_at timestamp with time zone default now() not null,
  user_id uuid null,
  filename text not null,
  renamed_file text not null,
  file_url text not null,
  summary text not null,
  keywords jsonb not null default '{}'::jsonb,
  categories text[] not null default '{}',
  word_count integer not null,
  report_type text not null,
  threshold_flags jsonb not null default '{}'::jsonb,
  pubmed_refs jsonb not null default '{}'::jsonb,
  ai_notes text not null default '',
  status text not null check (status in ('uploaded','processed','failed')),
  user_notes text null,
  version integer not null default 1
);

-- Optional: RLS policies for multi-tenant setups
alter table public.documents enable row level security;

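-- Example policies (tune for your auth model)
-- Note: using (true) lets every authenticated user read every row.
-- For PHI, restrict reads to the owner, e.g. using (auth.uid() = user_id).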
create policy "Allow read to authenticated users"
  on public.documents for select
  to authenticated
  using (true);

-- Inserts must set user_id to the signed-in user's id, or this check rejects them.
create policy "Insert own documents"
  on public.documents for insert
  to authenticated
  with check (auth.uid() = user_id);

create policy "Update own documents"
  on public.documents for update
  to authenticated
  using (auth.uid() = user_id);
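
On the application side it can help to mirror the table in a type so inserts are checked at compile time. The sketch below is an assumption about how one might do that by hand; the names DocumentRow, NewDocument, and isDocumentStatus are hypothetical, and the JSONB columns are typed loosely because their shape is meant to vary.

```typescript
// Mirrors the check constraint on the status column.
type DocumentStatus = "uploaded" | "processed" | "failed";

// Hand-written mirror of public.documents (hypothetical; not generated by tooling).
interface DocumentRow {
  id: number;
  created_at: string;
  user_id: string | null;
  filename: string;
  renamed_file: string;
  file_url: string;
  summary: string;
  keywords: Record<string, unknown>;        // jsonb
  categories: string[];                     // text[]
  word_count: number;
  report_type: string;
  threshold_flags: Record<string, unknown>; // jsonb
  pubmed_refs: unknown;                     // jsonb
  ai_notes: string;
  status: DocumentStatus;
  user_notes: string | null;
  version: number;
}

// Columns the database fills in are left out of the insert payload.
type NewDocument = Omit<DocumentRow, "id" | "created_at" | "user_notes"> & {
  user_notes?: string | null;
};

const DOCUMENT_STATUSES: readonly DocumentStatus[] = ["uploaded", "processed", "failed"];

/** Runtime guard for values read back from the database or an API. */
function isDocumentStatus(value: unknown): value is DocumentStatus {
  return typeof value === "string" && (DOCUMENT_STATUSES as readonly string[]).indexOf(value) !== -1;
}
```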

Storage bucket

  • Create medical-documents bucket.
  • Lock down access if you’re storing PHI; consider signed URLs and RLS on the storage.objects table.
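
For a private bucket, downloads can go through short-lived signed URLs. The sketch below relies on supabase-js's createSignedUrl(path, expiresIn) storage call but types only the part it needs, so it can be tried without the library; StorageLike and getDownloadUrl are hypothetical names, and the 60-second default expiry is an arbitrary assumption.

```typescript
// Minimal structural types covering the one storage call used here.
interface SignedUrlResult {
  data: { signedUrl: string } | null;
  error: { message: string } | null;
}
interface StorageLike {
  from(bucket: string): {
    createSignedUrl(path: string, expiresIn: number): Promise<SignedUrlResult>;
  };
}

/** Return a time-limited download URL for a file in the medical-documents bucket. */
async function getDownloadUrl(
  storage: StorageLike,
  path: string,
  expiresInSeconds = 60, // assumption: short expiry for sensitive files
): Promise<string> {
  const { data, error } = await storage
    .from("medical-documents")
    .createSignedUrl(path, expiresInSeconds);
  if (error || !data) {
    throw new Error(`Could not sign URL for ${path}: ${error ? error.message : "no data returned"}`);
  }
  return data.signedUrl;
}
```

In an app this would be called as getDownloadUrl(supabase.storage, row.file_url), where supabase is an initialized client.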

Environment configuration

  • Never hardcode secrets client-side. Use environment variables and server-side access.
  • For local dev, rely on .env.local and do not commit it.

Example variables to configure:

  • SUPABASE_URL
  • SUPABASE_ANON_KEY (safe to use from the client as long as your RLS policies are correct)
  • SUPABASE_SERVICE_ROLE_KEY (server-only; never expose to browser)
  • LLAMA_API_KEY (server-only)

For the Edge Function, set:

  • SUPABASE_URL
  • SUPABASE_ANON_KEY (or service role if needed)
  • LLAMA_API_KEY
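
A missing variable is easier to debug at startup than in the middle of a request. The helper below is a minimal sketch of a fail-fast check; loadConfig is a hypothetical name, and the environment is passed in as a plain object so the same code works with process.env in Node or Deno.env.toObject() in an Edge Function.

```typescript
/**
 * Read the required variables from an environment object and throw one
 * error listing everything that is missing or blank.
 */
function loadConfig<K extends string>(
  env: Record<string, string | undefined>,
  required: readonly K[],
): Record<K, string> {
  const config: Record<string, string> = {};
  const missing: string[] = [];
  for (const key of required) {
    const value = env[key];
    if (value === undefined || value.trim() === "") {
      missing.push(key);
    } else {
      config[key] = value;
    }
  }
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return config as Record<K, string>;
}
```

A typical call would be loadConfig(process.env, ["SUPABASE_URL", "SUPABASE_ANON_KEY", "LLAMA_API_KEY"]), after which the result has exactly those three string properties.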

Deploying the Edge Function

  • Install the Supabase CLI
  • Link your project
  • Deploy function
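# Deploy the function to your linked project
supabase functions deploy process-medical-pdf
# Confirm it is listed
supabase functions list
# Run it locally for testing, skipping JWT verification
supabase functions serve process-medical-pdf --no-verify-jwt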

Wire the function behind an HTTP trigger or call it from your app to process files already stored in the bucket.
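
Calling the function from the app might look like the sketch below. It uses the shape of supabase-js's functions.invoke(name, { body }) call, typed structurally so the snippet stands alone. The storagePath payload field is an assumption; the article does not show what the function expects in its request body.

```typescript
// Minimal structural type covering the one supabase-js call used here.
interface FunctionsLike {
  invoke(
    functionName: string,
    options: { body: Record<string, unknown> },
  ): Promise<{ data: unknown; error: { message: string } | null }>;
}

/** Ask the Edge Function to process a file that is already in the bucket. */
async function requestProcessing(functions: FunctionsLike, storagePath: string): Promise<unknown> {
  const { data, error } = await functions.invoke("process-medical-pdf", {
    body: { storagePath }, // hypothetical payload field
  });
  if (error) throw new Error(`process-medical-pdf failed: ${error.message}`);
  return data;
}
```

With an initialized client this would be requestProcessing(supabase.functions, uploadData.path).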

UX considerations

  • Drag-and-drop uploader with clear accept types.
  • Progress and error state visibility.
  • Terse, readable summaries with expandable details.
  • Badges for categories and flags for abnormalities.

Reliability strategies

  • Structured prompt → predictable parsing.
  • Keep LLM temperature low to moderate (0.3–0.7); lower values give more repeatable output.
  • Validate each parsed field and default to safe fallbacks when one is missing or malformed.
  • Track version and status to support re-processing and migrations.
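
Validation with fallbacks can be as simple as three small functions applied to the raw field strings. The sketch below is one way to do it, with hypothetical names throughout. It assumes flags arrive as "name: level" pairs and that the allowed categories are passed in, since the contents of VALID_CATEGORIES are not shown in the article.

```typescript
type Flag = "high" | "low" | "normal";

function isFlag(value: string): value is Flag {
  return value === "high" || value === "low" || value === "normal";
}

/** Keep only "name: level" pairs whose level is one the prompt allows. */
function parseFlags(raw: string): Record<string, Flag> {
  const flags: Record<string, Flag> = {};
  for (const part of raw.split(/[,;\n]/)) {
    const colon = part.lastIndexOf(":");
    if (colon === -1) continue;
    const name = part.slice(0, colon).trim();
    const level = part.slice(colon + 1).trim().toLowerCase();
    if (name !== "" && isFlag(level)) flags[name] = level;
  }
  return flags;
}

/** Keep categories from the allowed list (case-insensitive); never return an empty list. */
function filterCategories(raw: string, valid: readonly string[], fallback = "Uncategorized"): string[] {
  const kept: string[] = [];
  for (const part of raw.split(/[,;\n]/)) {
    const wanted = part.trim().toLowerCase();
    for (const candidate of valid) {
      if (candidate.toLowerCase() === wanted && kept.indexOf(candidate) === -1) kept.push(candidate);
    }
  }
  return kept.length > 0 ? kept : [fallback];
}

/** Make the suggested filename safe to use as a storage key; fall back to the original name. */
function safeFilename(suggested: string, original: string): string {
  const cleaned = suggested.trim().replace(/[^A-Za-z0-9._-]+/g, "_").replace(/^_+|_+$/g, "");
  return cleaned === "" ? original : cleaned;
}
```

The filename check matters more than it looks: the client path uploads to Storage under the model's suggested name, so an empty or slash-laden suggestion would otherwise fail the upload or land the file under an unexpected path.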

Security and compliance

  • Treat all uploads as potentially sensitive (PHI).
  • Don’t expose secrets in the browser. Move LLaMA calls server-side so the API key never reaches the client.
  • Consider de-identification or redaction at upload.
  • Encrypt at rest (Supabase handles storage encryption), and use HTTPS for all calls.
  • Enable RLS on the documents table and use signed URLs for downloads.

Performance and cost

  • OCR (tesseract.js) can be CPU-heavy; pre-processing images helps (deskew, denoise, contrast).
  • Use server-side processing for large PDFs or batch jobs.
  • Cache repeated LLM calls when re-processing the same file or version.
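
The caching idea can be sketched as an in-memory memoizer keyed by document version and a hash of the extracted text. Everything here is an assumption for illustration: memoizeAnalysis is a hypothetical name, the hash is a small FNV-1a variant over UTF-16 code units (not cryptographic), and a deployed version would more likely persist results in Postgres than in process memory.

```typescript
/** Small non-cryptographic hash (FNV-1a over UTF-16 code units) used only to build cache keys. */
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

/** Wrap an analyze function so identical (text, version) pairs reuse the first result. */
function memoizeAnalysis<T>(
  analyze: (text: string) => Promise<T>,
): (text: string, version: number) => Promise<T> {
  const entries = new Map<string, { text: string; result: Promise<T> }>();
  return (text, version) => {
    const key = `${version}:${fnv1a(text)}`;
    const hit = entries.get(key);
    // Compare the stored text too, so a hash collision never returns the wrong analysis.
    if (hit !== undefined && hit.text === text) return hit.result;
    const result = analyze(text);
    entries.set(key, { text, result });
    // Drop failed calls so a later retry reaches the model again.
    result.catch(() => { entries.delete(key); });
    return result;
  };
}
```

Storing the promise rather than the resolved value means two concurrent requests for the same document share one model call.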

What we’d build next

  • Normalized lab values with medical ontologies (e.g., LOINC) and unit conversions.
  • Trend analysis across time and change detection.
  • Confidence scoring and a reviewer checklist for clinical safety.
  • Human-in-the-loop editing with audit trails.
  • Export to FHIR-compatible bundles.

Demo script (5 minutes)

  • Upload a lab report PDF.
  • Show immediate “Processing document…” state.
  • Reveal results: summary, categories, abnormal flags, and PubMed suggestions.
  • Click through the stored file link (signed URL if private).
  • Open Supabase Studio to show the corresponding documents row.

Credits

  • Built at the Caltech Longevity Hackathon by our team in a sprint focused on turning complex medical paperwork into fast, explainable insights.


Written by gokulsrinaths | AI Lead
Published by HackerNoon on 2025/10/14