If you’re using large language models in real products, “the model gave a sensible answer” is not enough.
What you actually need is:
The model gave a sensible answer, in a strict JSON structure that my code can pass to json.loads() without exploding.
This article walks through a practical framework for turning messy, natural-language LLM outputs into machine-friendly structured JSON, using only prompt design. We’ll cover:
- Why JSON is the natural “bridge format” between LLMs and your backend
- A 4-step prompt pattern for stable JSON output
- Common failure modes (extra text, broken syntax, wrong types…) and how to fix them
- Three real-world prompt templates (e-commerce, customer support, project management)
1. Why JSON? Moving from “human-readable” to “machine-readable”
By default, LLMs talk like people: paragraphs, bullet points, and the occasional emoji.
Example request:
“Compare three popular laptops and give me the core specs.”
A typical answer might be:
- Laptop A: 16-inch display, 2.5K resolution, Intel i7, about £1,300
- Laptop B: 14-inch, 2.2K, AMD R7, around £1,000
- Laptop C: 13.6-inch, Retina-style display, Apple M-series, ~£1,500
Nice for humans. Awful for code.
If you want to:
- Plot prices in a chart
- Filter out all non-touchscreen models
- Load specs into a database
…you’re forced to regex your way through free text. Any tiny format change breaks your parsing.
JSON fixes this in three ways
1. Syntax is strict, parsing is deterministic
- Keys are quoted.
- Arrays use [], objects use {}.
- Every mainstream language has a stable JSON library (json in Python, JSON.parse in JS, etc.).
If the output is valid JSON, parsing is a solved problem.
2. Types are explicit
- Strings, numbers, booleans, arrays, objects.
- You can enforce logic like: price_gbp must be a number, not the string "£1,299".
3. Nested structure matches real data
Think: user → order list → line items. JSON handles this naturally:
{
"user": {
"name": "Alice",
"orders": [
{ "product": "Laptop", "price_gbp": 1299 },
{ "product": "Monitor", "price_gbp": 199 }
]
}
}
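A quick sanity check in Python (with the JSON above stored in a string) shows how directly code can consume this shape:

```python
import json

raw = """
{
  "user": {
    "name": "Alice",
    "orders": [
      { "product": "Laptop", "price_gbp": 1299 },
      { "product": "Monitor", "price_gbp": 199 }
    ]
  }
}
"""

data = json.loads(raw)
# Walk the nested structure exactly as it was described in prose:
# user -> order list -> line items.
total = sum(order["price_gbp"] for order in data["user"]["orders"])
print(total)  # 1498
```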
Example: natural language vs JSON
Free-text output:
“We compared 3 laptops. First, a 16" Lenovo with 2.5K display… second, a 14" HP… third, a 13.6" MacBook Air… prices roughly £1,000–£1,500…”
JSON output:
{
"laptop_analysis": {
"analysis_date": "2025-01-01",
"total_count": 3,
"laptops": [
{
"brand": "Lenovo",
"model": "Slim 7",
"screen": {
"size_inch": 16,
"resolution": "2.5K",
"touch_support": false
},
"processor": "Intel i7",
"price_gbp": 1299
},
{
"brand": "HP",
"model": "Envy 14",
"screen": {
"size_inch": 14,
"resolution": "2.2K",
"touch_support": true
},
"processor": "AMD Ryzen 7",
"price_gbp": 1049
},
{
"brand": "Apple",
"model": "MacBook Air M2",
"screen": {
"size_inch": 13.6,
"resolution": "Retina-class",
"touch_support": false
},
"processor": "Apple M2",
"price_gbp": 1249
}
]
}
}
Now your pipeline can do:
import json

data = json.loads(output)
for laptop in data["laptop_analysis"]["laptops"]:
...
No brittle parsing. No surprises.
2. A 4-step pattern for “forced JSON” prompts
Getting an LLM to output proper JSON isn’t magic. A robust prompt usually has four ingredients:
- Format instructions – “Only output JSON, nothing else.”
- A concrete JSON template – the exact keys and structure you expect.
- Validation rules – type constraints, required fields, allowed values.
- Few-shot examples – one or two “here’s the input, here’s the JSON” samples.
Let’s go through them.
Step 1 – Hard-lock the output format
You must explicitly fight the model’s “chatty” instinct.
Bad instruction:
“Please use JSON format, you can also add explanations.”
You will inevitably get:
Here is your analysis:
{
...
}
Hope this helps!
Your parser will absolutely die.
Use strict wording instead:
You MUST return ONLY valid JSON.
- Do NOT include any explanations, comments, or extra text.
- The output must be a single JSON object.
- If you include any non-JSON content, the result is invalid.
You can go even stricter by wrapping it:
[HARD REQUIREMENT]
Return output wrapped between the markers ---BEGIN JSON--- and ---END JSON---.
Outside these markers there must be NOTHING (no text, no spaces, no newlines).
Example:
---BEGIN JSON---
{"key": "value"}
---END JSON---
Then your code can safely extract the block between those markers before parsing.
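That extraction step can be a small helper in Python; the marker strings match the prompt above, and the function name is just illustrative:

```python
import re

def extract_json_block(text: str) -> str:
    """Return the content between the ---BEGIN JSON--- / ---END JSON--- markers."""
    match = re.search(r"---BEGIN JSON---\s*(.*?)\s*---END JSON---", text, re.DOTALL)
    if match is None:
        raise ValueError("No JSON block found between markers")
    return match.group(1)

# Even if the model adds chatter around the markers, the block survives:
reply = 'Sure!\n---BEGIN JSON---\n{"key": "value"}\n---END JSON---\nHope this helps!'
print(extract_json_block(reply))  # {"key": "value"}
```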
Step 2 – Provide a JSON “fill-in-the-blanks” template
Don’t leave structure to the model’s imagination. Tell it exactly what object you want.
Example: extracting news metadata.
{
"news_extraction": {
"article_title": "", // string, full headline
"publish_time": "", // string, "YYYY-MM-DD HH:MM", or null
"source": "", // string, e.g. "BBC News"
"author": "", // string or null
"key_points": [], // array of 3–5 strings, each ≤ 50 chars
"category": "", // one of: "Politics", "Business", "Tech", "Entertainment", "Sport"
"word_count": 0 // integer, total word count
}
}
Template design tips:
- Prefer English snake_case keys: product_name, price_gbp, word_count.
- Use inline comments to mark types and constraints.
- Explicitly say how to handle optional fields: null instead of an empty string.
- For arrays, describe the item type: tags: [] // array of strings, e.g. ["budget", "lightweight"].
This turns the model’s job into “fill in a form”, not “invent whatever feels right”.
Step 3 – Add lightweight validation rules
The template defines shape. Validation rules define what’s legal inside that shape.
Examples you can include in the prompt:
- Type rules: word_count must be an integer like 850, not "850" or "850 words". Boolean fields must be true or false, not "yes"/"no".
- Escaping rules: if a string contains " or a newline, escape it: He said "hello" → He said \"hello\".
- Array size rules: key_points must contain 3–5 items, never 0 and never more than 5.
- Logic rules: category must be one of the listed values; the model may not invent a new one. If publish_time is unavailable, use null, not "yesterday".
You don’t need a full JSON Schema in the prompt, but a few clear bullets like this reduce errors dramatically.
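As a sketch of what checking those bullets can look like on the backend (field names follow the news template above; validate_news and the exact error strings are illustrative):

```python
ALLOWED_CATEGORIES = {"Politics", "Business", "Tech", "Entertainment", "Sport"}

def validate_news(payload: dict) -> list:
    """Return a list of human-readable violations; an empty list means valid."""
    errors = []
    data = payload.get("news_extraction", {})
    # Type rule: word_count must be a real integer, not "850" or "850 words".
    if not isinstance(data.get("word_count"), int):
        errors.append("word_count must be an integer")
    # Array size rule: key_points must contain 3-5 items.
    points = data.get("key_points")
    if not (isinstance(points, list) and 3 <= len(points) <= 5):
        errors.append("key_points must contain 3-5 items")
    # Logic rule: category must come from the allowed list.
    if data.get("category") not in ALLOWED_CATEGORIES:
        errors.append("category not in the allowed list")
    return errors

good = {"news_extraction": {"word_count": 850,
                            "key_points": ["a", "b", "c"],
                            "category": "Tech"}}
print(validate_news(good))  # []
```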
Step 4 – Use one or two few-shot examples
Models learn fast by imitation. Give them a mini “input → JSON” pair that matches your task.
Example: news extraction.
Prompt snippet:
Example input article:
"[Tech] UK startup launches home battery to cut energy bills
Source: The Guardian Author: Jane Smith Published: 2024-12-30 10:00
A London-based climate tech startup has launched a compact home battery
designed to help households store cheap off-peak electricity and reduce
their energy bills..."
Example JSON output:
{
"news_extraction": {
"article_title": "UK startup launches home battery to cut energy bills",
"publish_time": "2024-12-30 10:00",
"source": "The Guardian",
"author": "Jane Smith",
"key_points": [
"London climate tech startup releases compact home battery",
"Product lets households store off-peak electricity and lower bills",
"Targets UK homeowners looking to reduce reliance on the grid"
],
"category": "Tech",
"word_count": 850
}
}
Then you append your real article and say:
“Now extract data for the following article. Remember: only output JSON in the same format as the example.”
This single example often bumps JSON correctness from “coin flip” to “production-ready”.
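If you assemble prompts in code, the few-shot pair slots in with plain string formatting; build_prompt and its argument names are illustrative, not a library API:

```python
def build_prompt(example_input: str, example_json: str, article: str) -> str:
    """Combine format instructions, one few-shot pair, and the real article."""
    return (
        "You MUST return ONLY valid JSON.\n\n"
        f"Example input article:\n{example_input}\n\n"
        f"Example JSON output:\n{example_json}\n\n"
        "Now extract data for the following article. "
        "Remember: only output JSON in the same format as the example.\n\n"
        f"Article:\n{article}"
    )
```

The hard format instruction sits first and the reminder sits last, so both ends of the prompt fight the model's chatty instinct.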
3. Debugging JSON output: 5 common failure modes
Even with good prompts, you’ll still see issues. Here’s what usually goes wrong and how to fix it.
Problem 1 – Extra natural language before/after JSON
“Here is your result: { … } Hope this helps!”
Why it happens: chatty default behaviour; format instruction too soft.
How to fix:
- Repeat a hard requirement at the end of the prompt.
- Use explicit markers (---BEGIN JSON--- / ---END JSON---) as shown earlier.
- Make sure your few-shot examples contain only JSON, no explanation.
Problem 2 – Broken JSON syntax
Examples:
- Keys without quotes
- Single quotes instead of double quotes
- Trailing commas
- Missing closing braces
Fixes:
- Add a "JSON hygiene" reminder:
  JSON syntax rules:
  - All keys MUST be in double quotes.
  - Use double quotes for strings, never single quotes.
  - No trailing commas after the last element in an object or array.
  - Every { and [ must have a matching } and ].
- For very long or complex structures, generate in steps:
- Step 1: output only the top-level structure.
- Step 2: fill a particular nested array.
- Step 3: add the rest.
- Add a retry loop in your code:
  1. Try json.loads().
  2. If it fails, send the error message back to the model: "Your previous JSON failed to parse with JSONDecodeError: …. Please correct the JSON and output a fixed version. Do not change the structure."
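That retry loop might look like this in Python; call_model stands in for whatever function wraps your LLM API, and the retry count is an arbitrary choice:

```python
import json

MAX_RETRIES = 2

def parse_with_retry(call_model, prompt: str) -> dict:
    """Parse the model's output as JSON, feeding parse errors back on failure."""
    output = call_model(prompt)
    for _ in range(MAX_RETRIES):
        try:
            return json.loads(output)
        except json.JSONDecodeError as err:
            # Send the exact parser error back so the model can self-correct.
            output = call_model(
                f"Your previous JSON failed to parse with JSONDecodeError: {err}. "
                "Please correct the JSON and output a fixed version. "
                "Do not change the structure.\n\nPrevious output:\n" + output
            )
    return json.loads(output)  # final attempt; raises if still broken

# Demo with a fake model that fixes its trailing comma on the second try:
replies = iter(['{"price": 10,}', '{"price": 10}'])
print(parse_with_retry(lambda prompt: next(replies), "extract the price"))
```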
Problem 3 – Wrong data types
Examples:
- "price_gbp": "1299.0" instead of 1299.0
- "in_stock": "yes" instead of true
- "word_count": "850 words" instead of 850
Fixes:
- Be blunt in the template comments:
  "price_gbp": 0.0 // number ONLY, like 1299.0, no currency symbol
  "word_count": 0 // integer ONLY, like 850, no text
  "in_stock": false // boolean, must be true or false
- Include bad vs good examples in the prompt:
  Wrong: "word_count": "850 words" → Correct: "word_count": 850
  Wrong: "touch_support": "yes" → Correct: "touch_support": true
- In your backend, add lightweight type coercion where safe (e.g. "1299" → 1299.0), but still log violations.
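A minimal coercion helper along those lines; the always-return-float convention is an assumption here, so adapt it to whatever your schema expects:

```python
import logging

def coerce_number(value, field: str) -> float:
    """Coerce numeric strings like "1299" to floats; log the violation, reject junk."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, so reject it explicitly.
        raise TypeError(f"{field}: expected a number, got a boolean")
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        try:
            coerced = float(value)
        except ValueError as err:
            raise TypeError(f"{field}: cannot coerce {value!r} to a number") from err
        # Coercion succeeded, but the model still broke the type rule: log it.
        logging.warning("%s arrived as string %r; coerced to %s", field, value, coerced)
        return coerced
    raise TypeError(f"{field}: expected a number, got {type(value).__name__}")

print(coerce_number("1299", "price_gbp"))  # 1299.0
```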
Problem 4 – Missing or extra fields
Examples:
- author omitted even though it appeared in the source text
- An unexpected summary field appears
Fixes:
- Spell out required vs forbidden fields:
  The JSON MUST include exactly these fields: article_title, publish_time, source, author, key_points, category, word_count.
  Do NOT add any new fields such as summary, description, tags, etc.
- Add a checklist at the end of the instructions:
Before returning your final JSON, check:
- Are all 7 fields present?
- Are there any extra fields not in the template? Remove them.
- Are all non-optional fields non-null?
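The same checklist can run in your backend as well; a small sketch against the 7-field news template (check_fields is an illustrative name):

```python
REQUIRED_FIELDS = {
    "article_title", "publish_time", "source", "author",
    "key_points", "category", "word_count",
}

def check_fields(payload: dict) -> tuple:
    """Return (missing, unexpected) key sets for the news_extraction object."""
    present = set(payload.get("news_extraction", {}))
    return REQUIRED_FIELDS - present, present - REQUIRED_FIELDS

# A response that dropped "author" and invented "summary":
missing, extra = check_fields({"news_extraction": {
    "article_title": "t", "publish_time": None, "source": "s",
    "key_points": [], "category": "Tech", "word_count": 0,
    "summary": "unrequested",
}})
print(missing, extra)  # {'author'} {'summary'}
```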
Problem 5 – Messy nested structures
This is where things like arrays of objects containing arrays go sideways.
Fixes:
- Break down nested templates:
  "laptops" is an array. Each element is an object with:
  {
    "brand": "",
    "model": "",
    "screen": { "size_inch": 0, "resolution": "", "touch_support": false },
    "processor": "",
    "price_gbp": 0
  }
- Use a dedicated example focused on just one nested element.
- Or ask the model to generate one laptop object first, validate it, then scale to an array.
4. Three ready-to-use JSON prompt templates
Here are three complete patterns you can lift straight into your own system.
Scenario 1 – E-commerce product extraction (for database import)
Goal: From a UK shop’s product description, extract key fields like product ID, category, specs, price, stock, etc.
Prompt core:
Task: Extract key product data from the following product description and return JSON only.
### Output requirements
1. Output MUST be valid JSON, no extra text.
2. Use this template exactly (do not rename keys):
{
"product_info": {
"product_id": "", // string, e.g. "P20250201001"
"product_name": "", // full name, not abbreviated
"category": "", // one of: "Laptop", "Phone", "Appliance", "Clothing", "Food"
"specifications": [], // 2–3 core specs as strings
"price_gbp": 0.0, // number, price in GBP, e.g. 999.0
"stock": 0, // integer, units in stock
"free_shipping": false, // boolean, true if free delivery in mainland UK
"sales_count": 0 // integer, total units sold (0 if not mentioned)
}
}
3. Rules:
- No "£" symbol in price_gbp, number only.
- If no product_id mentioned, use "unknown".
- If no sales info, use 0 for sales_count.
### Product text:
"..."
Example model output:
{
"product_info": {
"product_id": "P20250201005",
"product_name": "Dell XPS 13 Plus 13.4\" Laptop",
"category": "Laptop",
"specifications": [
"Colour: Platinum",
"Memory: 16GB RAM, 512GB SSD",
"Display: 13.4\" OLED, 120Hz"
],
"price_gbp": 1499.0,
"stock": 42,
"free_shipping": true,
"sales_count": 850
}
}
In Python, it’s just:
import json
data = json.loads(model_output)
price = data["product_info"]["price_gbp"]
stock = data["product_info"]["stock"]
And you’re ready to insert into a DB.
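For instance, loading that output into SQLite takes only a few lines; the table name and columns are illustrative, and an in-memory database stands in for your real one:

```python
import json
import sqlite3

# A trimmed example of the model's JSON output:
model_output = (
    '{"product_info": {"product_id": "P20250201005", '
    '"product_name": "Dell XPS 13 Plus", "category": "Laptop", '
    '"price_gbp": 1499.0, "stock": 42}}'
)

info = json.loads(model_output)["product_info"]

conn = sqlite3.connect(":memory:")  # in-memory DB for the demo
conn.execute(
    "CREATE TABLE products (id TEXT, name TEXT, category TEXT, "
    "price_gbp REAL, stock INTEGER)"
)
# Parameterized insert: the JSON types map straight onto column types.
conn.execute(
    "INSERT INTO products VALUES (?, ?, ?, ?, ?)",
    (info["product_id"], info["product_name"], info["category"],
     info["price_gbp"], info["stock"]),
)
row = conn.execute("SELECT price_gbp, stock FROM products").fetchone()
print(row)  # (1499.0, 42)
```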
Scenario 2 – Customer feedback sentiment (for ticket routing)
Goal: Take free-text customer feedback and turn it into structured analysis for your support system.
Template:
{
"feedback_analysis": {
"feedback_id": "", // string, you can generate like "F20250201093001"
"sentiment": "", // "Positive" | "Negative" | "Neutral"
    "core_demand": "",      // one-line summary of what the customer wants
"issue_type": "", // "Delivery" | "Quality" | "After-sales" | "Enquiry"
"urgency_level": 0, // 1 = low, 2 = medium, 3 = high
"keywords": [] // 3–4 noun keywords, e.g. ["laptop", "screen crack"]
}
}
Rule of thumb for urgency:
- Product unusable ("won't turn on", "payment blocked") → 3
- Delays and inconvenience ("parcel 1 day late") → 2
- Simple questions ("how do I…?") → 1
Example output:
{
"feedback_analysis": {
"feedback_id": "F20250201093001",
"sentiment": "Negative",
"core_demand": "Request replacement or refund for dead-on-arrival laptop",
"issue_type": "Quality",
"urgency_level": 3,
"keywords": ["laptop", "won't turn on", "replacement", "refund"]
}
}
Your ticketing system can now:
- Route all "Quality" issues with urgency_level = 3 to a priority queue.
- Show agents a one-line core_demand instead of a wall of text.
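A routing function over that structure is then trivial; the queue names here are placeholders for whatever your ticketing system uses:

```python
def route_ticket(analysis: dict) -> str:
    """Map a feedback_analysis object to a queue name."""
    fa = analysis["feedback_analysis"]
    # High-urgency quality problems jump the queue.
    if fa["issue_type"] == "Quality" and fa["urgency_level"] == 3:
        return "priority-queue"
    if fa["urgency_level"] >= 2:
        return "standard-queue"
    return "self-service"

ticket = {"feedback_analysis": {"sentiment": "Negative",
                                "issue_type": "Quality",
                                "urgency_level": 3}}
print(route_ticket(ticket))  # priority-queue
```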
Scenario 3 – Project task breakdown (for Jira/Trello import)
Goal: Turn a “website redesign” paragraph into a structured task list.
Template:
{
"project": "Website Redesign",
"tasks": [
{
"task_id": "T001", // T + 3 digits
      "task_name": "",       // short, action-oriented name
"owner": "", // "Product Manager" | "Designer" | "Frontend" | "Backend" | "QA"
"due_date": "", // "YYYY-MM-DD", assume project start 2025-02-01
"priority": "", // "High" | "Medium" | "Low"
"dependencies": [] // e.g. ["T001"], [] if none
}
],
"total_tasks": 0 // number of items in tasks[]
}
Rules:
- Cover the full flow: requirements → design → build → test → release.
- Make dependency chains realistic (frontend depends on design, etc.).
- Dates must follow logically from the 2025-02-01 project start.
Example output (shortened):
{
"project": "Website Redesign",
"tasks": [
{
"task_id": "T001",
"task_name": "Gather detailed redesign requirements",
"owner": "Product Manager",
"due_date": "2025-02-03",
"priority": "High",
"dependencies": []
},
{
"task_id": "T002",
"task_name": "Design new homepage and listing UI",
"owner": "Designer",
"due_date": "2025-02-08",
"priority": "High",
"dependencies": ["T001"]
},
{
"task_id": "T003",
"task_name": "Implement login and registration backend",
"owner": "Backend",
"due_date": "2025-02-13",
"priority": "High",
"dependencies": ["T001"]
}
],
"total_tasks": 3
}
You can then POST tasks into Jira/Trello with their APIs and auto-create all tickets.
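As a sketch, here's how the task list could be mapped to Trello-style card payloads before POSTing; the field names (name, desc, idList, due) follow Trello's card-creation API, and you'd swap them for Jira's issue fields as needed:

```python
def tasks_to_cards(plan: dict, list_id: str) -> list:
    """Build one create-card payload per task in the breakdown."""
    cards = []
    for task in plan["tasks"]:
        deps = ", ".join(task["dependencies"]) or "none"
        cards.append({
            "idList": list_id,  # target Trello list
            "name": f'{task["task_id"]} {task["task_name"]}',
            "desc": (f'Owner: {task["owner"]} | '
                     f'Priority: {task["priority"]} | '
                     f'Depends on: {deps}'),
            "due": task["due_date"],
        })
    return cards

plan = {"tasks": [{"task_id": "T001",
                   "task_name": "Gather requirements",
                   "owner": "Product Manager",
                   "due_date": "2025-02-03",
                   "priority": "High",
                   "dependencies": []}]}
print(tasks_to_cards(plan, "list123")[0]["name"])  # T001 Gather requirements
```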
5. From “stable JSON” to “production-ready pipelines”
To recap:
- Why JSON? It's the natural contract between LLMs and code: deterministic parsing, clear types, nested structures.
- How to get it reliably? Use the 4-step pattern:
  - Hard format instructions
  - A strict JSON template
  - Light validation rules
  - One or two good few-shot examples
- How to ship it? Combine prompt-side constraints with backend safeguards:
  - Retry on JSONDecodeError with error feedback to the model.
  - Optional type coercion (e.g. "1299" → 1299.0) with logging.
  - JSON Schema validation for high-stakes use cases (finance, healthcare).
Once you can reliably get structured JSON out of an LLM, you move from:
“The AI wrote something interesting.”
to:
“The AI is now a machine in the pipeline: it reads text, outputs structured data, and my system just works.”
That’s the real unlock.
