You've added AI to your app. The demo works beautifully. Your JSON comes back perfectly structured, and you ship it to production feeling like a genius.

Then users start complaining. "The chart is empty." "The timeline is all scrambled." "It just says 'TBD' everywhere."

Welcome to the reality of AI-generated JSON in production. It breaks. A lot. The research backs this up: LLMs achieve roughly an 82% success rate for JSON generation across diverse tasks. That means nearly 1 in 5 requests returns something your app can't use.

I've been building SlideMaker, an AI presentation generator that creates slides with charts, timelines, funnels, and 30 different content types. After 50,000+ generations running at 500-600 per day, I've learned exactly where AI JSON breaks, and how to catch it before users do.

Here's the validation stack that got reliability above 95%.

## The Problem is Worse Than You Think

### JSON Mode ≠ Schema Compliance

First, let's clear up a dangerous misconception. When you enable "JSON mode" on GPT-4, Gemini, or Claude, you're getting a guarantee that the output will be *syntactically valid* JSON. Brackets will match. Quotes will be escaped. It will parse.

That's it.

JSON mode does NOT guarantee:

- Your required fields exist
- Values are the correct type
- Enums contain valid options
- The data makes logical sense

This distinction kills production apps. The JSON parses fine, so no error is thrown. But your frontend receives an object missing half the fields it needs, and things silently break.

### The Four Ways AI JSON Actually Breaks

After analyzing thousands of failed generations, I've categorized them into four buckets:

**1. Missing Fields**

You asked for `title`, `body`, and `image_keywords`. You got `title` and `body`. No error, just a missing field that crashes your image loader.

**2. Wrong Types**

Your schema expects `data: [10, 20, 30]`. The model returns `data: "10, 20, 30"`. Valid JSON. Broken chart.

**3. Invalid Enum Values**

You specified `chart_type` must be one of `bar`, `line`, or `pie`. The model decides `horizontal_bar` sounds better. Your chart library doesn't agree.

**4. Semantic Nonsense**

This is the sneaky one. The JSON is structurally perfect. Every field exists. Every type is correct. But the *meaning* is wrong.

Real example from production: a funnel diagram where the values go UP instead of down.

```json
{
  "type": "funnel",
  "stages": [
    {"label": "Visitors", "value": 100},
    {"label": "Leads", "value": 250},
    {"label": "Customers", "value": 500}
  ]
}
```

Syntactically perfect. Semantically absurd. Funnels go DOWN. That's why they're called funnels. The model knows what a funnel is. It just doesn't always care.
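The first three buckets are mechanical enough that an off-the-shelf schema check catches them. Here's a minimal sketch using the third-party `jsonschema` package (an illustration, not part of my stack): the payload parses fine, but schema validation rejects it. The fourth bucket, semantic nonsense, sails straight through checks like this, which is exactly why the layers below exist.

```python
import json
import jsonschema  # pip install jsonschema

schema = {
    "type": "object",
    "required": ["title", "chart_type", "data"],
    "properties": {
        "title": {"type": "string"},
        "chart_type": {"enum": ["bar", "line", "pie"]},
        "data": {"type": "array", "items": {"type": "number"}},
    },
}

# JSON mode guarantees this parses...
payload = json.loads(
    '{"title": "Sales", "chart_type": "horizontal_bar", "data": "10, 20, 30"}'
)

# ...but schema validation still rejects the invalid enum and the wrong type
try:
    jsonschema.validate(payload, schema)
except jsonschema.ValidationError as e:
    print(f"Schema violation: {e.message}")
```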
## Layer 1: Stop Trusting the Model

The first rule of AI JSON: validate everything. Don't assume the model followed instructions. Check that every field you need actually exists.

```python
def validate_required_fields(slide, required_fields):
    errors = []
    for field in required_fields:
        if field not in slide or not slide[field]:
            errors.append(f"Missing required field: {field}")
    return errors
```

Simple? Yes. Essential? Absolutely.

### Type-Specific Validation

Different output types need different validation rules. A chart needs different fields than a timeline.

```python
VALIDATION_RULES = {
    "chart": {
        "required_fields": ["title", "chart_type", "chart_data", "body"],
        "min_data_points": 3
    },
    "timeline": {
        "required_fields": ["title", "diagram_data"],
        "min_events": 4,
        "max_events": 6
    },
    "funnel": {
        "required_fields": ["title", "diagram_data"],
        "min_stages": 3,
        "max_stages": 5
    }
}
```

Then create a validator registry:

```python
VALIDATORS = {
    "chart": ChartValidator,
    "timeline": TimelineValidator,
    "funnel": FunnelValidator,
    "bullet_points": BulletValidator,
}

def validate_slide(slide):
    slide_type = slide.get("type", "bullet_points")
    validator = VALIDATORS.get(slide_type, DefaultValidator)()
    return validator.validate(slide)
```

Each validator knows exactly what its type needs. No guessing.
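The validator classes themselves never appear in this post, so here's a hedged sketch of what one registry entry might look like. Assumptions flagged: the `(is_valid, errors)` return shape, and a `chart_data` object holding parallel `labels` and `data` lists.

```python
class ChartValidator:
    """Illustrative sketch of a registry entry, not the exact production class."""

    def validate(self, slide):
        rules = VALIDATION_RULES["chart"]
        errors = validate_required_fields(slide, rules["required_fields"])

        chart_data = slide.get("chart_data", {})  # assumed structure
        if not isinstance(chart_data, dict):
            errors.append("chart_data must be an object")
            return False, errors

        labels = chart_data.get("labels", [])
        data = chart_data.get("data", [])

        # Enforce the minimum from the rules table
        if len(data) < rules["min_data_points"]:
            errors.append(f"Chart needs at least {rules['min_data_points']} data points")

        # Labels and values must line up, or the chart library breaks
        if labels and len(labels) != len(data):
            errors.append("Label count must match data count")

        # Catch the classic wrong-type failure: data: "10, 20, 30"
        if not all(isinstance(v, (int, float)) for v in data):
            errors.append("Chart data values must be numeric")

        return len(errors) == 0, errors
```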
## Layer 2: Semantic Validation

Here's where most validation systems stop, and where most production bugs hide. Fields exist. Types are correct. But is the *data* correct?

### Real Semantic Rules That Catch Real Bugs

| Content Type | Validation Rule | Why It Matters |
| --- | --- | --- |
| Funnel | Values must decrease | It's a FUNNEL |
| Timeline | Events must be chronological | It's a TIMELINE |
| Chart | Label count must match data count | Or the chart breaks |
| Bullet Points | Each bullet needs title + body | Or it renders empty |

### Funnel Validation Example

```python
def validate_funnel(slide):
    stages = slide.get("diagram_data", {}).get("stages", [])

    # Check minimum stages
    if len(stages) < 3:
        return False, "Funnel requires at least 3 stages"

    # Check values decrease (this is the semantic part)
    values = [stage.get("value", 0) for stage in stages]
    for i in range(1, len(values)):
        if values[i] >= values[i - 1]:
            return False, "Funnel values must decrease at each stage"

    # Check for flat funnels (all same value)
    if len(set(values)) == 1:
        return False, "All stages have identical values"

    return True, None
```

### Timeline Validation Example

```python
def validate_timeline(slide):
    events = slide.get("diagram_data", {}).get("events", [])

    if len(events) < 4:
        return False, "Timeline needs at least 4 events"

    # Check chronological order
    years = []
    for event in events:
        year_str = event.get("year", "")
        # Extract numeric year (handles "2020", "Q1 2020", "Jan 2020")
        year = extract_year(year_str)
        if year:
            years.append(year)

    if years != sorted(years):
        return False, "Timeline events must be in chronological order"

    return True, None
```
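`extract_year` is referenced above but never shown. A minimal sketch, assuming a four-digit year appears somewhere in the string:

```python
import re

def extract_year(year_str):
    """Pull a four-digit year out of strings like "2020", "Q1 2020", "Jan 2020".

    Illustrative helper, not the production version. Returns None when no
    plausible year is found, so the caller simply skips that event.
    """
    match = re.search(r"\b(?:19|20)\d{2}\b", str(year_str))
    return int(match.group()) if match else None
```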
These checks catch the "perfect JSON, broken output" problem that JSON mode completely misses.

## Layer 3: Catch the Lazy Model

Sometimes the model gets lazy. Instead of generating real content, it outputs placeholders.

```json
{
  "title": "Key Benefits",
  "bullet_points": [
    {"title": "Benefit 1", "body": "Details to be added..."},
    {"title": "Benefit 2", "body": "TBD"},
    {"title": "Benefit 3", "body": "..."}
  ]
}
```

Structurally valid. Completely useless.

### The Forbidden Content List

```python
FORBIDDEN_CONTENT = [
    "tbd", "todo", "placeholder", "...", "xxx",
    "fill in", "to be determined", "coming soon",
    "insert here", "details about", "information about",
    "content about", "lorem ipsum"
]

def check_forbidden_content(text):
    text_lower = text.lower()
    for forbidden in FORBIDDEN_CONTENT:
        if forbidden in text_lower:
            return False, f"Contains placeholder content: '{forbidden}'"
    return True, None
```

### Scan All Text Fields

Don't just check the title. Check everything:

```python
def validate_content_quality(slide):
    errors = []

    # Check title
    title = slide.get("title", "")
    valid, error = check_forbidden_content(title)
    if not valid:
        errors.append(f"Title: {error}")

    # Check body
    body = slide.get("body", "")
    if body:
        valid, error = check_forbidden_content(body)
        if not valid:
            errors.append(f"Body: {error}")

    # Check bullet points
    for i, bullet in enumerate(slide.get("bullet_points", [])):
        bullet_body = bullet.get("body", "")
        valid, error = check_forbidden_content(bullet_body)
        if not valid:
            errors.append(f"Bullet {i+1}: {error}")

    return len(errors) == 0, errors
```
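If your payloads nest deeper than title, body, and bullets, enumerating fields by hand gets brittle. A recursive walk is one way to generalize; this is my sketch, not the article's production code:

```python
def iter_text_fields(node, path=""):
    """Yield (path, text) for every string anywhere in a nested dict/list payload."""
    if isinstance(node, str):
        yield path, node
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from iter_text_fields(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from iter_text_fields(item, f"{path}[{i}]")

def validate_content_quality_deep(slide):
    # Same forbidden-content rule, applied to every text field in the payload
    errors = []
    for path, text in iter_text_fields(slide):
        valid, error = check_forbidden_content(text)
        if not valid:
            errors.append(f"{path}: {error}")
    return len(errors) == 0, errors
```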
## Layer 4: The Retry Loop That Actually Works

Here's the key insight: *retrying with the same prompt tends to reproduce the same failure*. The model made a mistake for a reason. Maybe the schema wasn't clear. Maybe it prioritized brevity over completeness. Whatever the cause, blindly retrying won't fix it.

### Error Context Injection

When validation fails, tell the model exactly what went wrong:

```python
def generate_with_retry(prompt, max_retries=2):
    for attempt in range(max_retries + 1):
        response = call_llm(prompt)

        try:
            data = json.loads(response)
            is_valid, errors = validate_slide(data)

            if is_valid:
                return data

            # Build retry prompt with specific errors
            if attempt < max_retries:
                error_context = "\n".join(f"- {e}" for e in errors)
                prompt = f"""
Previous attempt failed validation:
{error_context}

Please fix these specific issues and regenerate.
Original request: {prompt}
"""
        except json.JSONDecodeError as e:
            if attempt < max_retries:
                prompt = f"""
Previous response was not valid JSON. Error: {str(e)}
Please return ONLY valid JSON with no markdown or extra text.
Original request: {prompt}
"""

    # All retries exhausted
    return None
```

### Why This Works

The retry prompt now contains:

1. The specific validation errors
2. A direct instruction to fix those issues
3. The original request for context

This transforms a ~85% first-attempt success rate into 95%+ after retry.

### Set a Retry Budget

Don't retry forever:

```python
MAX_RETRIES = 2  # Total of 3 attempts

def should_retry(attempt, error_type):
    if attempt >= MAX_RETRIES:
        return False

    # Some errors aren't worth retrying
    if error_type == "rate_limit":
        return True   # Wait and retry
    if error_type == "context_too_long":
        return False  # Need to reduce input, not retry

    return True
```

After exhausting retries, fall back gracefully:

- Use a simpler output type
- Return a partial result with error indication
- Log for manual review

The retry mechanism isn't error handling. It's part of the system.
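The "wait and retry" branch deserves the wait itself. Here's one way to wire a backoff around the budget; `RateLimitError` is a hypothetical stand-in for whatever exception your LLM client actually raises:

```python
import time

class RateLimitError(Exception):
    """Hypothetical stand-in for your client's rate-limit (HTTP 429) exception."""

def generate_with_backoff(prompt, base_delay=1.0):
    for attempt in range(MAX_RETRIES + 1):
        try:
            return call_llm(prompt)  # same client call as above
        except RateLimitError:
            if not should_retry(attempt, "rate_limit"):
                raise
            # Exponential backoff: 1s, 2s, 4s, ...
            time.sleep(base_delay * (2 ** attempt))
```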
## Bonus: The Constraint Family Trick

There's one more problem that validation alone doesn't solve: *repetitive output*.

Ask for a 10-slide presentation and sometimes you get 5 comparison slides. Each one is valid JSON. Each one passes validation. But the presentation is boring and repetitive.

### Constraint Families

Group similar content types into families. Allow only ONE from each family per generation:

```python
CONSTRAINT_FAMILIES = {
    "comparison": ["comparison_split", "before_after", "pros_cons"],
    "highlight": ["big_number", "stat_highlight", "quote"],
    "steps": ["numbered_steps", "process_flow"],
    "data": ["chart", "table"]
}

def check_family_constraints(slides):
    used_families = {}
    for slide in slides:
        slide_type = slide.get("type")
        for family, types in CONSTRAINT_FAMILIES.items():
            if slide_type in types:
                if family in used_families:
                    return False, f"Multiple {family} types: {used_families[family]} and {slide_type}"
                used_families[family] = slide_type
    return True, None
```

This ensures variety without manual intervention.

## The Complete Validation Stack

Here's everything together:

```python
def validate_ai_output(slide, verbosity="balanced"):
    """
    Complete validation pipeline for AI-generated JSON.
    Returns (is_valid, error_list)
    """
    errors = []

    # Layer 1: Required fields
    slide_type = slide.get("type", "unknown")
    rules = VALIDATION_RULES.get(slide_type, {})
    for field in rules.get("required_fields", []):
        if field not in slide or not slide[field]:
            errors.append(f"Missing: {field}")

    # Layer 2: Type-specific semantic validation
    validator = VALIDATORS.get(slide_type)
    if validator:
        valid, semantic_errors = validator().validate(slide)
        if not valid:
            # Normalize validators that return a single error string
            if isinstance(semantic_errors, str):
                semantic_errors = [semantic_errors]
            errors.extend(semantic_errors)

    # Layer 3: Content quality (forbidden patterns)
    valid, quality_errors = validate_content_quality(slide)
    if not valid:
        errors.extend(quality_errors)

    return len(errors) == 0, errors
```

### Use it in your generation

```python
def generate_slide(topic, slide_type):
    prompt = build_prompt(topic, slide_type)
    result = generate_with_retry(prompt, max_retries=2)

    if result is None:
        # Log failure, use fallback
        log_generation_failure(topic, slide_type)
        return create_fallback_slide(topic)

    return result
```
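`create_fallback_slide` is left undefined above. One plausible shape, assuming the simplest content type is the safest thing to degrade to and that the UI wants to know the slide is a fallback:

```python
def create_fallback_slide(topic):
    """Hypothetical sketch: degrade to a minimal bullet slide so the deck still renders."""
    return {
        "type": "bullet_points",
        "title": topic,
        "bullet_points": [
            {"title": "Overview", "body": f"A brief overview of {topic}."}
        ],
        "is_fallback": True,  # assumed flag so the frontend can mark degraded content
    }
```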
## The Mindset Shift

Stop thinking of validation as error handling. It's not. **Validation is part of the product.**

The model is a probabilistic system. It will make mistakes. Your validation layer transforms that probabilistic mess into deterministic, reliable output. JSON mode is step 1, not the solution.

After implementing this stack across 50,000+ generations:

- First-attempt success: ~85-90%
- After retry with error context: 95%+
- User-visible failures: <2%

That's the difference between a demo and a product.

## Quick Reference: Validation Checklist

Before you ship AI JSON to production:

- Required-field validation for each output type
- Type-specific validators with semantic rules
- Forbidden-content scanning
- Smart retry with error context injection
- Retry budget (don't loop forever)
- Graceful fallback when retries are exhausted
- Constraint families for output variety

Don't trust the model. Verify the model.

*Building something with AI-generated structured data? I'd love to hear what validation challenges you've hit.*