JSON & YAML
The Universal Languages Machines Speak
Every API, every config file, every AI model response uses data formats. Master JSON and YAML - the two most important formats in AI automation - and never struggle with data parsing again.
What Are Data Formats & Why They Matter
Data Formats = Languages for Machines to Talk
The Restaurant Menu Analogy
When you order food on Swiggy, the app sends your order to the restaurant in a specific format - not in Hindi, not in English, but in a structured format that both Swiggy's servers and the restaurant's system understand. That structured format is like JSON.
Data formats are standardized ways to organize information so that different systems, APIs, and programs can understand each other. Without them, it would be like trying to order food by sending a painting - nobody would understand!
The Big Three Data Formats:
- JSON (JavaScript Object Notation) - The king of APIs. Every LLM API (OpenAI, Claude, Gemini) uses JSON. Lightweight, human-readable, universally supported.
- YAML (YAML Ain't Markup Language) - The king of configuration. Docker Compose, Kubernetes, GitHub Actions, LangChain configs - all use YAML. Even more readable than JSON.
- XML - The old king. Still used in some enterprise systems but largely replaced by JSON. You'll rarely use it in AI work.
Why AI Engineers Must Know This:
- Every LLM API request and response is JSON
- Prompt templates often use YAML configs
- RAG pipeline configurations are often written in YAML
- Function calling / tool use schemas are JSON
- Docker, CI/CD, and deployment configs are YAML
Note: If APIs are the roads of the internet, JSON is the language spoken on those roads. You cannot build AI automation without being fluent in JSON.
JSON - Deep Dive
JSON: The Language Every API Speaks
JSON Structure - Just 6 Data Types:
{
  "string": "Hello World",
  "number": 42,
  "boolean": true,
  "null_value": null,
  "array": [1, 2, 3, "mixed types allowed"],
  "object": {
    "nested": "objects work too",
    "deep": { "as": { "you": "want" } }
  }
}
That's it! These 6 types (string, number, boolean, null, array, object) can be combined to represent nearly any data you will exchange: LLM responses, user profiles, API configs, and more.
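To see how these six types land in Python, here is a minimal sketch using the standard-library json module. The extra "float" key is an addition for illustration, since JSON has a single number type that Python splits into int and float.

```python
import json

doc = """
{
  "string": "Hello World",
  "number": 42,
  "float": 0.7,
  "boolean": true,
  "null_value": null,
  "array": [1, 2, 3, "mixed types allowed"],
  "object": {"nested": "objects work too"}
}
"""

parsed = json.loads(doc)

# Map each key to the name of the Python type the parser produced.
type_names = {key: type(value).__name__ for key, value in parsed.items()}

for key, name in type_names.items():
    print(f"{key:>10} -> {name}")
```

JSON objects become dicts, arrays become lists, and null becomes None, so the parsed result can be handled with ordinary Python code.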
Real Example - OpenAI API Request:
{
  "model": "gpt-4",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is RAG?"}
  ],
  "temperature": 0.7,
  "max_tokens": 500
}
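In Python you would normally build this request as a dict and let the json module serialize it. The sketch below assumes a hypothetical helper named build_chat_request and sends nothing over the network; it only shows the dict-to-bytes step an HTTP client would perform.

```python
import json


def build_chat_request(system_prompt: str, user_prompt: str) -> dict:
    """Hypothetical helper: assemble the request body shown above."""
    return {
        "model": "gpt-4",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": 0.7,
        "max_tokens": 500,
    }


payload = build_chat_request("You are a helpful assistant.", "What is RAG?")

# An HTTP client would send these bytes as the request body.
body = json.dumps(payload).encode("utf-8")

# Round-trip check: decoding the bytes gives back the same structure.
assert json.loads(body) == payload
print(json.dumps(payload, indent=2))
```

Note that the trailing commas inside the Python dict are fine in Python source; json.dumps() leaves them out of the JSON it produces.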
JSON Rules to Remember:
- Keys MUST be in double quotes (not single quotes)
- No trailing commas allowed
- No comments allowed (unlike YAML)
- Strings must use double quotes
- Numbers can be integers or floating point
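A quick way to internalize these rules is to feed rule-breaking documents to a strict parser. This sketch uses Python's standard json module; the sample strings are made up for illustration.

```python
import json

# Each of these breaks one of the rules listed above.
invalid_documents = {
    "single-quoted key": "{'model': 'gpt-4'}",
    "trailing comma": '{"model": "gpt-4",}',
    "comment": '{"model": "gpt-4"} // default model',
}

rejected = []
for label, text in invalid_documents.items():
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        rejected.append(label)
        print(f"{label}: rejected ({err.msg})")

# A document that follows the rules parses cleanly.
valid = json.loads('{"model": "gpt-4", "temperature": 0.7, "max_tokens": 500}')
print(valid)
```

All three invalid documents raise json.JSONDecodeError, while the valid one yields an int for max_tokens and a float for temperature.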
Note: Pro tip: Use json.dumps(data, indent=2) in Python to pretty-print JSON; this is invaluable when debugging API responses. Use json.loads() to parse a JSON string back into a Python dict.
YAML - The Human-Friendly Format
YAML: When Readability Matters Most
YAML vs JSON - Same Data, Different Look:
# YAML - clean, readable, supports comments!
model: gpt-4
messages:
  - role: system
    content: You are a helpful assistant.
  - role: user
    content: What is RAG?
temperature: 0.7
max_tokens: 500
Compare this to the JSON version above - same data but YAML is much cleaner. No curly braces, no quotes around keys, supports comments with #.
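To check that the two notations really carry the same data, you can parse both and compare the results. This is a small sketch that assumes PyYAML is installed (pip install pyyaml).

```python
import json

import yaml  # third-party: pip install pyyaml

json_text = """
{
  "model": "gpt-4",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is RAG?"}
  ],
  "temperature": 0.7,
  "max_tokens": 500
}
"""

yaml_text = """
# YAML - clean, readable, supports comments!
model: gpt-4
messages:
  - role: system
    content: You are a helpful assistant.
  - role: user
    content: What is RAG?
temperature: 0.7
max_tokens: 500
"""

from_json = json.loads(json_text)
from_yaml = yaml.safe_load(yaml_text)

# Both parsers produce the same Python dict.
same_data = from_json == from_yaml
print("Same data:", same_data)
```

The comment line in the YAML text is discarded by the parser, so it never reaches your program.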
YAML Key Features:
- Indentation-based - Uses spaces (NOT tabs!) for nesting. 2 spaces is standard.
- Comments - Use # for comments. JSON doesn't support this!
- Multi-line strings - Use | for literal blocks or > for folded blocks
- Anchors & Aliases - Reuse values with & and * (DRY principle)
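The features above can be sketched in one small config, assuming PyYAML is installed. The key names are made up for illustration, and the << merge key is a YAML 1.1 convention that PyYAML supports; it goes slightly beyond the plain & and * mentioned above.

```python
import yaml  # third-party: pip install pyyaml

config_text = """
# Anchor (&defaults) marks a block for reuse.
defaults: &defaults
  model: gpt-4
  temperature: 0.7

summarizer:
  <<: *defaults        # alias (*defaults) pulls the anchored keys in
  temperature: 0.2     # explicit keys override merged ones

literal_prompt: |
  Line one
  Line two

folded_prompt: >
  These two lines
  become one.
"""

config = yaml.safe_load(config_text)

print(config["summarizer"])
print(repr(config["literal_prompt"]))
print(repr(config["folded_prompt"]))
```

The literal block (|) keeps its line breaks, while the folded block (>) joins its lines with spaces; both keep a single trailing newline by default.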
Where You'll See YAML in AI:
- docker-compose.yml - Container orchestration
- GitHub Actions - CI/CD workflows (.github/workflows/)
- Kubernetes - Deployment manifests
- LangChain/LlamaIndex - Pipeline configurations
- Prompt templates - Many frameworks use YAML for prompt configs
- Haystack - RAG pipeline definitions
Note: YAML's biggest gotcha: indentation MUST use spaces, never tabs. And it's sensitive to spacing. A single wrong indent can break your entire config. Use a YAML linter!
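The tab problem is easy to reproduce. In this sketch (PyYAML assumed installed, try_parse is a hypothetical helper), the same config is parsed once with space indentation and once with a tab.

```python
import yaml  # third-party: pip install pyyaml

good = "model:\n  name: gpt-4\n"   # indented with two spaces
bad = "model:\n\tname: gpt-4\n"    # indented with a tab


def try_parse(text):
    """Hypothetical helper: return (data, None) on success or (None, error)."""
    try:
        return yaml.safe_load(text), None
    except yaml.YAMLError as err:
        return None, err


config, error = try_parse(good)
_, tab_error = try_parse(bad)

print("spaces:", config)
print("tab   :", type(tab_error).__name__)
```

Catching yaml.YAMLError around every config load gives you a clear failure at startup instead of a confusing one later.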
Working with JSON & YAML in Python
Practical Python Code
JSON in Python:
import json

# Python dict to JSON string
data = {"model": "gpt-4", "temperature": 0.7}
json_string = json.dumps(data, indent=2)  # Pretty print

# JSON string to Python dict
parsed = json.loads(json_string)

# Read JSON file
with open("config.json", "r") as f:
    config = json.load(f)

# Write JSON file
with open("output.json", "w") as f:
    json.dump(data, f, indent=2)
YAML in Python:
import yaml  # pip install pyyaml

# Same example dict as in the JSON snippet
data = {"model": "gpt-4", "temperature": 0.7}

# Read YAML file
with open("config.yaml", "r") as f:
    config = yaml.safe_load(f)  # Always use safe_load!

# Write YAML file
with open("output.yaml", "w") as f:
    yaml.dump(data, f, default_flow_style=False)

# YAML string to Python dict
yaml_string = """
model: gpt-4
temperature: 0.7
"""
parsed = yaml.safe_load(yaml_string)
Security Warning - yaml.safe_load():
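NEVER call yaml.load() with an unsafe loader (yaml.Loader or yaml.UnsafeLoader) on input you do not trust! Those loaders can construct arbitrary Python objects, which can lead to arbitrary code execution. Recent PyYAML versions require an explicit Loader argument for exactly this reason. ALWAYS use yaml.safe_load(), which only builds basic data types.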
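The difference is visible without running anything dangerous. This sketch (PyYAML assumed installed) hands safe_load() a document carrying a Python-specific tag; the harmless os.getcwd stands in for whatever an attacker might try to call.

```python
import yaml  # third-party: pip install pyyaml

# A Python-specific tag that asks the loader to call a function.
untrusted = "!!python/object/apply:os.getcwd []"

try:
    yaml.safe_load(untrusted)
    blocked = False
except yaml.YAMLError as err:
    # safe_load has no constructor for python/* tags, so it refuses.
    blocked = True
    print("Rejected:", type(err).__name__)

# Plain data still loads normally.
settings = yaml.safe_load("model: gpt-4\nretries: 3\n")
print(settings)
```

safe_load() rejects the tagged document with an error instead of calling the function, and plain mappings, lists, and scalars continue to load as usual.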
Note: json.dumps() and json.loads() - remember: dumps = dump to string, loads = load from string. The 's' stands for 'string'.
JSON vs YAML - When to Use Which
Choosing the Right Format
Use JSON When:
- Sending/receiving API data
- Storing structured data in databases
- LLM function calling schemas
- Frontend-backend communication
- Structured output from AI models
Use YAML When:
- Configuration files (docker-compose, k8s)
- CI/CD pipeline definitions
- Human-editable settings
- Prompt template management
- When you need comments in your config
Quick Comparison:
Feature          | JSON              | YAML
-----------------+-------------------+----------------------------
Comments         | Not supported     | Supported (#)
Readability      | Good              | Excellent
Data types       | 6 basic types     | 6 + dates, etc.
Trailing commas  | Not allowed       | Not applicable
Parsing speed    | Faster            | Slower
API standard     | Yes (universal)   | No (config only)
Multi-line text  | Escaped \n        | Native ("|" or ">" blocks)
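Two rows of this comparison are easy to observe from the JSON side using only the standard library. The prompt text and date below are made-up sample values.

```python
import json
from datetime import date

# Multi-line text: JSON stores the line break as an escape sequence.
prompt = "You are a helpful assistant.\nAnswer briefly."
encoded = json.dumps({"prompt": prompt})
has_escaped_newline = "\\n" in encoded and "\n" not in encoded
print(encoded)

# Data types: JSON has no date type, so the encoder refuses a date object.
record = {"created": date(2024, 1, 15)}
try:
    json.dumps(record)
    date_supported = True
except TypeError:
    date_supported = False

# Common workaround: convert unsupported values to strings.
as_text = json.dumps(record, default=str)
print(as_text)
```

With default=str the date travels as an ordinary string, so the receiving side has to know to parse it back into a date.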
Note: Rule of thumb: JSON for data exchange (APIs), YAML for configuration (infrastructure). In AI work, expect to touch JSON more often (every API call) than YAML (configs and pipelines); a 70/30 split is a rough guide, not a measured figure.
Interview Questions
Data Format Interview Questions
Q1: Why do APIs use JSON instead of YAML?
Answer: JSON has a small, strict spec, is fast to parse, is native to JavaScript (and therefore the web), and has a parser in the standard library or a de facto standard package of virtually every language. YAML is great for human readability, but it is slower to parse and has a much more complex spec, and some of its loaders have security pitfalls.
Q2: What's the security risk with YAML?
Answer: YAML supports language-specific tags, and an unsafe loader will act on them. In Python, yaml.load() with yaml.Loader or yaml.UnsafeLoader can deserialize arbitrary Python objects, which can lead to remote code execution when the input is untrusted. Always use yaml.safe_load(), which only deserializes basic data types.
Q3: How would you handle deeply nested JSON from an API?
Answer: Use dict.get() with defaults for safe access, or libraries like jmespath/jsonpath for querying. For validation, use Pydantic models to parse and validate JSON into typed Python objects. For LLM output, use structured output / JSON mode.
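The dict.get() approach can be sketched with the standard library alone. The response shape below is hypothetical, and dig is an illustrative helper rather than a library function.

```python
import json
from typing import Any

# Hypothetical API response, used only for illustration.
response_text = """
{
  "choices": [
    {"message": {"role": "assistant", "content": "RAG combines retrieval with generation."}}
  ],
  "usage": {"total_tokens": 42}
}
"""
response = json.loads(response_text)


def dig(data: Any, *path: Any, default: Any = None) -> Any:
    """Follow dict keys and list indexes in `path`; return `default` on any miss."""
    current = data
    for step in path:
        if isinstance(current, dict):
            if step not in current:
                return default
            current = current[step]
        elif isinstance(current, list) and isinstance(step, int):
            if not -len(current) <= step < len(current):
                return default
            current = current[step]
        else:
            return default
    return current


# Chained .get() works when every level is a dict.
tokens = response.get("usage", {}).get("total_tokens", 0)

# dig() also handles list indexes and missing branches.
content = dig(response, "choices", 0, "message", "content", default="")
missing = dig(response, "choices", 3, "message", "content", default="n/a")

print(tokens, content, missing)
```

For anything beyond a few lookups, a validation library such as Pydantic is usually the better fit, since it reports exactly which field was missing or had the wrong type.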
Note: Data format questions are basic but important. They show you understand the foundations of API communication - essential for any AI automation role.