We Built an AI Medical Analyst in a Weekend at the Caltech Longevity Hackathon

Why longevity?

Longevity care is longitudinal and data-heavy. Patients accumulate lab panels, imaging, and clinical notes over years. Clinicians and individuals need fast, explainable triage: What’s abnormal? What changed? What should I read next? A generalizable pipeline that works “in a weekend” helps teams experiment faster and validate real-world impact.

What we shipped at the hackathon

- A Next.js/React UI with a dead-simple uploader and a results table.
- Client-side text extraction for PDFs and images to support mixed inputs quickly.
- A structured LLaMA prompt that returns summary, keywords, categories, abnormal flags, suggested filename, and PubMed titles.
- Supabase Storage for raw files; a Postgres table, `documents`, for structured metadata.
- A Supabase Edge Function to process stored PDFs server-side (useful for background jobs and batch workflows).

Architecture

Upload → Extract text → LLM analysis → Persist → Render

Two processing paths:

- Client-led: immediate feedback, great for demos and small files.
- Server-led (Edge Function): scalable, secure, and good for background/batch processing.
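
To make the Upload → Extract → Analyze → Persist flow concrete, here is a minimal sketch of how the stages could be composed so that the client-led and server-led paths share one orchestration function. All of the names (`UploadedFile`, `PipelineStages`, `runPipeline`) are hypothetical and do not come from the hackathon code; the Render step is left to the UI.

```typescript
// Sketch only: every name below is hypothetical, not taken from the project.
interface UploadedFile {
  name: string;
  mimeType: string;
  bytes: Uint8Array;
}

interface Analysis {
  summary: string;
  categories: string[];
}

interface DocumentRecord extends Analysis {
  filename: string;
  wordCount: number;
}

// Each stage is injected, so the browser path and the Edge Function path can
// plug in different implementations behind the same orchestration.
interface PipelineStages {
  extractText(file: UploadedFile): Promise<string>;
  analyze(text: string, filename: string): Promise<Analysis>;
  persist(record: DocumentRecord): Promise<DocumentRecord>;
}

async function runPipeline(
  stages: PipelineStages,
  file: UploadedFile,
): Promise<DocumentRecord> {
  // Extract: PDF parsing or OCR, depending on the injected implementation.
  const text = await stages.extractText(file);
  if (text.trim().length === 0) {
    throw new Error(`No text could be extracted from ${file.name}`);
  }

  // Analyze: the LLM call lives behind this stage.
  const analysis = await stages.analyze(text, file.name);

  // Persist: store the structured result; the caller renders what comes back.
  const wordCount = text.trim().split(/\s+/).length;
  return stages.persist({ ...analysis, filename: file.name, wordCount });
}
```

Injecting the stages also makes the flow easy to exercise with in-memory fakes before any OCR, LLM, or database dependency is wired in.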

Key moving parts:

- Frontend: Next.js + Tailwind
- OCR/Parsing: `pdf-parse`, `tesseract.js`
- AI: LLaMA chat completions API with a rigid, parse-friendly prompt
- Backend: Supabase (Storage for blobs, Postgres for metadata)
- Serverless: Supabase Edge Function for server-side PDF processing

Product walk-through

Home page

Medical Document Analysis

- Clean landing with a single CTA: upload a document.

Upload → extract → analyze → persist (client path)

- Extracts text conditionally based on file type.
- Calls LLaMA with a structured prompt to ensure predictable parsing.
- Uploads the original file to Supabase Storage and inserts metadata into `documents`.

```typescript
const handleFileUpload = useCallback(async (event: React.ChangeEvent<HTMLInputElement>) => {
  try {
    setIsUploading(true);
    setError(null);

    const file = event.target.files?.[0];
    if (!file) return;

    // Extract text based on file type (anything that is not a PDF goes through OCR)
    const text = file.type === 'application/pdf'
      ? await extractTextFromPDF(file)
      : await extractTextFromImage(file);

    // Analyze the text with Llama
    const analysis = await analyzeWithLlama(text, file.name);

    // Upload to Supabase Storage under the AI-suggested filename
    const { data: uploadData, error: uploadError } = await supabase.storage
      .from('medical-documents')
      .upload(analysis.renamed_file, file);
    if (uploadError) throw uploadError;

    // Store metadata in Supabase
    const { data: metaData, error: metaError } = await supabase
      .from('documents')
      .insert({
        filename: file.name,
        renamed_file: analysis.renamed_file,
        file_url: uploadData.path,
        summary: analysis.summary,
        keywords: analysis.keywords,
        categories: analysis.categories,
        word_count: countWords(text),
        report_type: detectReportType(text),
        threshold_flags: analysis.threshold_flags,
        pubmed_refs: analysis.pubmed_refs,
        ai_notes: analysis.ai_notes,
        status: 'processed',
        version: 1
      })
      .select()
      .single();
    if (metaError) throw metaError;

    setDocuments(prev => [...prev, metaData]);
  } catch (err) {
    setError(err instanceof Error ? err.message : 'An error occurred');
  } finally {
    setIsUploading(false);
  }
}, []);
```

The prompt that makes it reliable

The LLM is only as useful as its prompt structure. We force a fixed, labeled response format, so parsing is straightforward and less brittle than with free-form responses.

```typescript
const analyzeWithLlama = async (text: string, originalFilename: string) => {
  const prompt = `Analyze this medical document and provide a detailed analysis in the following format:
1. Summary: Provide a clear, plain-English summary
2. Keywords: Extract key medical terms and their values (if any)
3. Categories: Classify into these categories: ${VALID_CATEGORIES.join(", ")}
4. Filename: Suggest a clear, descriptive filename
5. Threshold Flags: Identify any abnormal values and mark as "high", "low", or "normal"
6. PubMed References: Suggest relevant PubMed articles (just article titles)
7. Additional Notes: Any important medical guidance or observations

Document text:
${text}

Please format your response exactly as follows:
Summary: [summary]
Keywords: [key:value pairs]
Categories: [categories]
Filename: [filename]
Flags: [abnormal values]
References: [article titles]
Notes: [additional guidance]`;

  // ... (excerpt ends here; the API request and response parsing are not shown)
};
```

Server-side processing (Edge Function)

- Useful for background jobs, webhook-driven processing, or scaling beyond client limits.
- Downloads the file from Supabase Storage, extracts text, calls the same LLaMA prompt, and inserts a `documents` row.

```typescript
// Prepare LLaMA prompt
const prompt = `Analyze this medical document and provide a detailed analysis in the following format:
1. Summary: Provide a clear, plain-English summary
2. Keywords: Extract key medical terms and their values (if any)
3. Categories: Classify into these categories: ${VALID_CATEGORIES.join(", ")}
4. Filename: Suggest a clear, descriptive filename
5. Threshold Flags: Identify any abnormal values and mark as "high", "low", or "normal"
6. PubMed References: Suggest relevant PubMed articles (just article titles)
7. Additional Notes: Any important medical guidance or observations

Document text:
${text}

Please format your response exactly as follows:
Summary: [summary]
Keywords: [key:value pairs]
Categories: [categories]
Filename: [filename]
Flags: [abnormal values]
References: [article titles]
Notes: [additional guidance]`
```

```typescript
// Insert into Supabase
const { data: insertData, error: insertError } = await supabase
  .from('documents')
  .insert(documentData)
  .select()
  .single()
```

Implementation details

Database schema (Supabase Postgres)

Use JSONB where the structure can vary or expand over time.

```sql
-- documents table
create table if not exists public.documents (
  id bigint generated always as identity primary key,
  created_at timestamp with time zone default now() not null,
  user_id uuid null,
  filename text not null,
  renamed_file text not null,
  file_url text not null,
  summary text not null,
  keywords jsonb not null default '{}'::jsonb,
  categories text[] not null default '{}',
  word_count integer not null,
  report_type text not null,
  threshold_flags jsonb not null default '{}'::jsonb,
  pubmed_refs jsonb not null default '{}'::jsonb,
  ai_notes text not null default '',
  status text not null check (status in ('uploaded','processed','failed')),
  user_notes text null,
  version integer not null default 1
);

-- Optional: RLS policies for multi-tenant setups
alter table public.documents enable row level security;

-- Example policies (tune for your auth model)
-- Note: "using (true)" lets any authenticated user read every row.
create policy "Allow read to authenticated users"
  on public.documents for select
  to authenticated
  using (true);

create policy "Insert own documents"
  on public.documents for insert
  to authenticated
  with check (auth.uid() = user_id);

create policy "Update own documents"
  on public.documents for update
  to authenticated
  using (auth.uid() = user_id);
```

Storage bucket

- Create a `medical-documents` bucket.
- Lock down access if you’re storing PHI; consider signed URLs and RLS on the `storage.objects` table.

Environment configuration

- Never hardcode secrets client-side.
- Use environment variables and server-side access.
- For local dev, rely on `.env.local` and do not commit it.
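
One way to apply the "environment variables and server-side access" advice is to read the secrets once on the server and fail fast when one is missing. The sketch below is an illustration, not the project's code: `ServerConfig`, `requireVar`, and `loadServerConfig` are hypothetical names, while the variable names are the ones this post uses.

```typescript
// Sketch only: ServerConfig, requireVar, and loadServerConfig are hypothetical.
interface ServerConfig {
  supabaseUrl: string;
  supabaseServiceRoleKey: string;
  llamaApiKey: string;
}

// A plain key/value view of the environment, so the loader does not depend on
// a particular runtime.
type Env = Record<string, string | undefined>;

function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Intended for server-side code only: the service role key and the LLaMA key
// should never be bundled into browser code.
function loadServerConfig(env: Env): ServerConfig {
  return {
    supabaseUrl: requireVar(env, "SUPABASE_URL"),
    supabaseServiceRoleKey: requireVar(env, "SUPABASE_SERVICE_ROLE_KEY"),
    llamaApiKey: requireVar(env, "LLAMA_API_KEY"),
  };
}
```

In a Node-based server this could be called as `loadServerConfig(process.env)`, and in a Deno-based Edge Function as `loadServerConfig(Deno.env.toObject())`. Taking the environment as a parameter keeps the loader easy to test.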

Example variables to configure:

- `SUPABASE_URL`
- `SUPABASE_ANON_KEY` (client reads ok if your RLS is correct)
- `SUPABASE_SERVICE_ROLE_KEY` (server-only; never expose to browser)
- `LLAMA_API_KEY` (server-only)

For the Edge Function, set:

- `SUPABASE_URL`
- `SUPABASE_ANON_KEY` (or service role if needed)
- `LLAMA_API_KEY`

Deploying the Edge Function

1. Install the Supabase CLI
2. Link your project
3. Deploy function

```bash
# Deploy the function to the linked project
supabase functions deploy process-medical-pdf

# Confirm it is listed
supabase functions list

# Run it locally without JWT verification (local testing only)
supabase functions serve process-medical-pdf --no-verify-jwt
```

Wire the function behind an HTTP trigger or call it from your app to process files already stored in the bucket.

UX considerations

- Drag-and-drop uploader with clear accept types.
- Progress and error state visibility.
- Terse, readable summaries with expandable details.
- Badges for categories and flags for abnormalities.

Reliability strategies

- Structured prompt → predictable parsing.
- Keep LLM temperature moderate (0.3–0.7) to reduce variance; values toward the low end give more consistent formatting.
- Validate parsed JSON fields; default to safe fallbacks.
- Track `version` and `status` to support re-processing and migrations.
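
The "validate parsed fields; default to safe fallbacks" strategy can be sketched as a small parser for the labeled response format (`Summary:`, `Keywords:`, and so on) that the prompt requests. This is an illustrative sketch under assumptions, not the hackathon implementation: `splitSections`, `parsePairs`, and `parseAnalysis` are hypothetical names, and the exact shape of the `Keywords` and `Flags` lines (comma-separated `name: value` pairs) is assumed.

```typescript
// Sketch only: function names and the assumed "name: value, name: value"
// layout of the Keywords and Flags lines are not from the original project.
type Label =
  | "Summary" | "Keywords" | "Categories" | "Filename"
  | "Flags" | "References" | "Notes";
type FlagLevel = "high" | "low" | "normal";

interface ParsedAnalysis {
  summary: string;
  keywords: Record<string, string>;
  categories: string[];
  renamedFile: string;
  thresholdFlags: Record<string, FlagLevel>;
  pubmedRefs: string[];
  aiNotes: string;
}

// Split the raw response into sections keyed by the labels the prompt asks for.
// Lines without a label are appended to the section that is currently open.
function splitSections(raw: string): Partial<Record<Label, string>> {
  const sections: Partial<Record<Label, string>> = {};
  const labelLine =
    /^\s*(Summary|Keywords|Categories|Filename|Flags|References|Notes)\s*:\s*(.*)$/;
  let current: Label | null = null;
  for (const line of raw.split(/\r?\n/)) {
    const match = labelLine.exec(line);
    if (match) {
      current = match[1] as Label;
      sections[current] = match[2];
    } else if (current !== null) {
      sections[current] = `${sections[current] ?? ""}\n${line}`;
    }
  }
  return sections;
}

// Parse "key: value" items separated by commas, semicolons, or newlines.
function parsePairs(text: string): Record<string, string> {
  const pairs: Record<string, string> = {};
  for (const item of text.split(/[,;\n]/)) {
    const idx = item.indexOf(":");
    if (idx <= 0) continue;
    const key = item.slice(0, idx).trim();
    const value = item.slice(idx + 1).trim();
    if (key !== "" && value !== "") pairs[key] = value;
  }
  return pairs;
}

function parseAnalysis(
  raw: string,
  validCategories: readonly string[],
  fallbackFilename: string,
): ParsedAnalysis {
  const s = splitSections(raw);

  // Keep only categories from the allowed list; drop anything the model invented.
  const allowed = new Set(validCategories.map((c) => c.toLowerCase()));
  const categories = (s.Categories ?? "")
    .split(/[,\n]/)
    .map((c) => c.trim())
    .filter((c) => allowed.has(c.toLowerCase()));

  // Keep only flags whose level is one of the three values the prompt allows.
  const thresholdFlags: Record<string, FlagLevel> = {};
  for (const [name, level] of Object.entries(parsePairs(s.Flags ?? ""))) {
    const normalized = level.toLowerCase();
    if (normalized === "high" || normalized === "low" || normalized === "normal") {
      thresholdFlags[name] = normalized;
    }
  }

  // Make the suggested filename safe for storage paths, or fall back.
  const filename = (s.Filename ?? "").trim().replace(/[^\w.\-]+/g, "_");

  return {
    summary: (s.Summary ?? "").trim(),
    keywords: parsePairs(s.Keywords ?? ""),
    categories,
    renamedFile: filename !== "" ? filename : fallbackFilename,
    thresholdFlags,
    pubmedRefs: (s.References ?? "")
      .split(/[;\n]/)
      .map((r) => r.trim())
      .filter((r) => r !== ""),
    aiNotes: (s.Notes ?? "").trim(),
  };
}
```

With this shape, a malformed or partial response still produces a record that can be inserted (empty objects and arrays, plus the original filename), which keeps one bad completion from failing the whole upload. A value containing a comma, such as `1,200`, would be split incorrectly by `parsePairs`, so a production parser would need a stricter delimiter or a JSON response format.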

Security and compliance

- Treat all uploads as potentially sensitive (PHI).
- Don’t expose secrets in the browser. Move LLaMA calls server-side if needed.
- Consider de-identification or redaction at upload.
- Encrypt at rest (Supabase handles storage encryption), and use HTTPS for all calls.
- RLS across `documents` and signed URLs for downloads.

Performance and cost

- OCR (`tesseract.js`) can be CPU-heavy; pre-processing images helps (deskew, denoise, contrast).
- Use server-side processing for large PDFs or batch jobs.
- Cache repeated LLM calls when re-processing the same file or version.

What we’d build next

- Normalized lab values with medical ontologies (e.g., LOINC) and unit conversions.
- Trend analysis across time and change detection.
- Confidence scoring and a reviewer checklist for clinical safety.
- Human-in-the-loop editing with audit trails.
- Export to FHIR-compatible bundles.

Demo script (5 minutes)

1. Upload a lab report PDF.
2. Show immediate “Processing document…” state.
3. Reveal results: summary, categories, abnormal flags, and PubMed suggestions.
4. Click through the stored file link (signed URL if private).
5. Open Supabase Studio to show the corresponding `documents` row.

Credits

Built at the Caltech Longevity Hackathon by our team in a sprint focused on turning complex medical paperwork into fast, explainable insights.