JSON & YAML

The Universal Languages Machines Speak

Every API, every config file, every AI model response uses data formats. Master JSON and YAML - the two most important formats in AI automation - and never struggle with data parsing again.

What Are Data Formats & Why They Matter

Data Formats = Languages for Machines to Talk

The Restaurant Menu Analogy

When you order food on Swiggy, the app sends your order to the restaurant in a specific format - not in Hindi, not in English, but in a structured format that both Swiggy's servers and the restaurant's system understand. That structured format is like JSON.

Data formats are standardized ways to organize information so that different systems, APIs, and programs can understand each other. Without them, it would be like trying to order food by sending a painting - nobody would understand!

The Big Three Data Formats:

JSON (JavaScript Object Notation) - The king of APIs. Every LLM API (OpenAI, Claude, Gemini) uses JSON. Lightweight, human-readable, universally supported.
YAML (YAML Ain't Markup Language) - The king of configuration. Docker Compose, Kubernetes, GitHub Actions, LangChain configs - all use YAML. Even more readable than JSON.
XML - The old king. Still used in some enterprise systems but largely replaced by JSON. You'll rarely use it in AI work.

Why AI Engineers Must Know This:

Every LLM API request and response is JSON
Prompt templates often use YAML configs
RAG pipeline configurations are in YAML
Function calling / tool use schemas are JSON
Docker, CI/CD, and deployment configs are YAML
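To make the function calling point concrete, here is a minimal sketch of a tool schema written as a Python dict and serialized to JSON. The tool name get_weather, its fields, and the sample arguments are hypothetical; the layout follows the common JSON Schema style, but each provider wraps it slightly differently, so treat this as an illustration rather than any vendor's exact format.

```python
import json

# Hypothetical tool definition in JSON Schema style (not a specific vendor's envelope).
get_weather_tool = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. Pune"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

# Serialize to the JSON text that would travel in a request body.
schema_json = json.dumps(get_weather_tool, indent=2)
print(schema_json)

# Some APIs return the model's chosen arguments as JSON text; parse before use.
raw_arguments = '{"city": "Pune", "unit": "celsius"}'
arguments = json.loads(raw_arguments)

# Check that every required argument is present.
required = get_weather_tool["parameters"]["required"]
missing = [key for key in required if key not in arguments]
print("missing required arguments:", missing)
```

The same dict-in, JSON-out pattern applies to requests, responses, and structured output alike.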

Note: If APIs are the roads of the internet, JSON is the language spoken on those roads. You cannot build AI automation without being fluent in JSON.

JSON - Deep Dive

JSON: The Language Every API Speaks

JSON Structure - Just 6 Data Types:

{
  "string": "Hello World",
  "number": 42,
  "boolean": true,
  "null_value": null,
  "array": [1, 2, 3, "mixed types allowed"],
  "object": {
    "nested": "objects work too",
    "deep": { "as": { "you": "want" } }
  }
}

That's it! These 6 types (string, number, boolean, null, array, object) can represent ANY data structure. LLM responses, user profiles, API configs - everything.
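As a quick check of how these six types arrive in Python, the sketch below parses a shortened copy of the example with the standard json module and prints the Python type produced for each key. The variable names are only for illustration.

```python
import json

text = """
{
  "string": "Hello World",
  "number": 42,
  "boolean": true,
  "null_value": null,
  "array": [1, 2, 3, "mixed types allowed"],
  "object": {"nested": "objects work too"}
}
"""
parsed = json.loads(text)

# Map each key to the name of the Python type the parser produced.
type_names = {key: type(value).__name__ for key, value in parsed.items()}
for key, name in type_names.items():
    print(f"{key:>10} -> {name}")
```

JSON objects become dicts, arrays become lists, and null becomes None, so parsed data can be used with ordinary Python code.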

Real Example - OpenAI API Request:

{
  "model": "gpt-4",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is RAG?"}
  ],
  "temperature": 0.7,
  "max_tokens": 500
}
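In practice a request body like this is rarely typed by hand. A minimal sketch, assuming a hypothetical helper named build_chat_request, builds the same structure as a Python dict and lets json.dumps handle quoting and escaping. No network call is made here.

```python
import json

def build_chat_request(question: str, model: str = "gpt-4") -> str:
    """Return the request body from the example above as JSON text."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": question},
        ],
        "temperature": 0.7,
        "max_tokens": 500,
    }
    return json.dumps(payload)

# Quotes inside the question are escaped automatically.
body = build_chat_request('What is "RAG"?')
print(body)
```

Building the dict first and serializing last avoids the quoting mistakes that come from assembling JSON with string concatenation.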

JSON Rules to Remember:

Keys MUST be in double quotes (not single quotes)
No trailing commas allowed
No comments allowed (unlike YAML)
Strings must use double quotes
Numbers can be integers or floating point
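The rules above are easy to verify with Python's standard json module, which rejects each violation with a JSONDecodeError. The sample strings below are made up for illustration, and the exact error wording varies between Python versions.

```python
import json

candidates = {
    "valid": '{"model": "gpt-4", "temperature": 0.7}',
    "single quotes": "{'model': 'gpt-4'}",
    "trailing comma": '{"model": "gpt-4",}',
    "comment": '{"model": "gpt-4" /* default */}',
}

results = {}
for label, text in candidates.items():
    try:
        json.loads(text)
        results[label] = "ok"
    except json.JSONDecodeError as err:
        # err.msg describes the problem; err.pos would give the character offset.
        results[label] = f"rejected: {err.msg}"

for label, outcome in results.items():
    print(f"{label:>15}: {outcome}")
```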

Note: Pro tip: Use json.dumps(data, indent=2) in Python to pretty-print JSON, which is invaluable when debugging API responses. Use json.loads() to parse a JSON string back into Python objects such as dicts and lists.

YAML - The Human-Friendly Format

YAML: When Readability Matters Most

YAML vs JSON - Same Data, Different Look:

# YAML - clean, readable, supports comments!
model: gpt-4
messages:
  - role: system
    content: You are a helpful assistant.
  - role: user
    content: What is RAG?
temperature: 0.7
max_tokens: 500

Compare this to the JSON version above - same data but YAML is much cleaner. No curly braces, no quotes around keys, supports comments with #.
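To confirm that the two documents really carry the same data, the sketch below parses both and compares the results. It assumes the third-party PyYAML package is installed (pip install pyyaml).

```python
import json
import yaml  # third-party: pip install pyyaml

json_text = """
{
  "model": "gpt-4",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is RAG?"}
  ],
  "temperature": 0.7,
  "max_tokens": 500
}
"""

yaml_text = """
# YAML - clean, readable, supports comments!
model: gpt-4
messages:
  - role: system
    content: You are a helpful assistant.
  - role: user
    content: What is RAG?
temperature: 0.7
max_tokens: 500
"""

from_json = json.loads(json_text)
from_yaml = yaml.safe_load(yaml_text)

# Both parsers produce the same nested dict/list structure.
print(from_json == from_yaml)
```

Once parsed, the rest of a program does not need to know which format the data came from.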

YAML Key Features:

Indentation-based - Uses spaces (NOT tabs!) for nesting. 2 spaces is standard.
Comments - Use # for comments. JSON doesn't support this!
Multi-line strings - Use | for literal blocks or > for folded blocks
Anchors & Aliases - Reuse values with & and * (DRY principle)
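The features above can be seen together in one small document. This is a sketch that assumes PyYAML is installed; the key names are invented for illustration. The merge key (<<) is a YAML 1.1 convention that PyYAML supports, though not every YAML parser does.

```python
import yaml  # third-party: pip install pyyaml

doc = """
# Anchor (&) names a mapping so it can be reused below.
defaults: &defaults
  model: gpt-4
  temperature: 0.7

summarizer:
  # Alias (*) plus the merge key (<<) copies the anchored mapping in.
  <<: *defaults
  temperature: 0.2

# Literal block (|): line breaks are kept.
literal_prompt: |
  Summarize the text.
  Use three bullet points.

# Folded block (>): line breaks become spaces.
folded_prompt: >
  Summarize the text
  in three bullet points.
"""

config = yaml.safe_load(doc)
print(config["summarizer"])
print(repr(config["literal_prompt"]))
print(repr(config["folded_prompt"]))
```

Keys written explicitly under summarizer override the merged defaults, which is what makes anchors useful for per-environment or per-task overrides.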

Where You'll See YAML in AI:

docker-compose.yml - Container orchestration
GitHub Actions - CI/CD workflows (.github/workflows/)
Kubernetes - Deployment manifests
LangChain/LlamaIndex - Pipeline configurations
Prompt templates - Many frameworks use YAML for prompt configs
Haystack - RAG pipeline definitions

Note: YAML's biggest gotcha: indentation MUST use spaces, never tabs. And it's sensitive to spacing. A single wrong indent can break your entire config. Use a YAML linter!
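A short sketch of both failure modes, assuming PyYAML is installed: a tab used for indentation is rejected outright, while a key indented one level too shallow parses without error but lands in the wrong place. The helper try_load and the sample keys are hypothetical.

```python
import yaml  # third-party: pip install pyyaml

good = "llm:\n  model: gpt-4\n  temperature: 0.7\n"
# A tab character used for indentation.
tabbed = "llm:\n\tmodel: gpt-4\n"
# temperature indented too little: it silently becomes a top-level key.
misindented = "llm:\n  model: gpt-4\ntemperature: 0.7\n"

def try_load(text: str):
    """Return the parsed data, or the error text if parsing fails."""
    try:
        return yaml.safe_load(text)
    except yaml.YAMLError as err:
        return f"error: {err}"

print(try_load(good))
print(try_load(tabbed))
print(try_load(misindented))
```

The silent case is the more dangerous one, which is why a linter or a schema check on the loaded config is worth the effort.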

Working with JSON & YAML in Python

Practical Python Code

JSON in Python:

import json

# Python dict to JSON string
data = {"model": "gpt-4", "temperature": 0.7}
json_string = json.dumps(data, indent=2)  # Pretty print

# JSON string to Python dict
parsed = json.loads(json_string)

# Read JSON file
with open("config.json", "r") as f:
    config = json.load(f)

# Write JSON file
with open("output.json", "w") as f:
    json.dump(data, f, indent=2)

YAML in Python:

import yaml  # pip install pyyaml

# Read YAML file
with open("config.yaml", "r") as f:
    config = yaml.safe_load(f)  # Always use safe_load!

# Write YAML file
data = {"model": "gpt-4", "temperature": 0.7}  # same dict as in the JSON example
with open("output.yaml", "w") as f:
    yaml.dump(data, f, default_flow_style=False)

# YAML string to Python dict
yaml_string = """
model: gpt-4
temperature: 0.7
"""
parsed = yaml.safe_load(yaml_string)

Security Warning - yaml.safe_load():

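NEVER call yaml.load() on untrusted input with an unsafe Loader! PyYAML's unsafe loaders can construct arbitrary Python objects, which can lead to arbitrary code execution. Recent PyYAML releases (6.0 and later) require an explicit Loader argument for this reason. ALWAYS use yaml.safe_load(), which only builds basic data types.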
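The difference is easy to observe without running anything dangerous. In this sketch (assuming PyYAML is installed) the document carries a Python-specific tag that asks the loader to call a function. The harmless os.getcwd is used purely as a stand-in, and safe_load refuses to construct it.

```python
import yaml  # third-party: pip install pyyaml

# A document that asks the loader to call a Python function via a custom tag.
# With an unsafe loader, tags in this family can trigger code execution.
untrusted = "!!python/object/apply:os.getcwd []"

try:
    yaml.safe_load(untrusted)
    outcome = "loaded"
except yaml.YAMLError as err:
    # safe_load has no constructor registered for Python-specific tags.
    outcome = f"rejected: {type(err).__name__}"

print(outcome)

# Plain data still loads normally.
plain = yaml.safe_load("model: gpt-4\ntemperature: 0.7")
print(plain)
```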

Note: json.dumps() and json.loads() - remember: dumps = dump to string, loads = load from string. The 's' stands for 'string'.

JSON vs YAML - When to Use Which

Choosing the Right Format

Use JSON When:

Sending/receiving API data
Storing structured data in databases
LLM function calling schemas
Frontend-backend communication
Structured output from AI models

Use YAML When:

Configuration files (docker-compose, k8s)
CI/CD pipeline definitions
Human-editable settings
Prompt template management
When you need comments in your config

Quick Comparison:

Feature          | JSON              | YAML
-----------------+-------------------+------------------
Comments         | Not supported     | Supported (#)
Readability      | Good              | Excellent
Data types       | 6 basic types     | 6 + dates, etc.
Trailing commas  | Not allowed       | Not applicable
Parsing speed    | Faster            | Slower
API standard     | Yes (universal)   | No (config only)
Multi-line text  | Escaped \n        | Native (| or >)
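The multi-line row of the table can be checked with the standard json module: a line break inside a string is stored as an escaped \n in the JSON text and restored when the text is parsed. The prompt text here is made up.

```python
import json

prompt = "Summarize the text.\nUse three bullet points."

encoded = json.dumps({"prompt": prompt})
print(encoded)

# The newline travels as the two characters backslash + n, not a real line break.
has_escape = "\\n" in encoded
has_real_newline = "\n" in encoded
print(has_escape, has_real_newline)

# Decoding restores the original multi-line string.
round_trip_ok = json.loads(encoded)["prompt"] == prompt
print(round_trip_ok)
```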

Note: Rule of thumb: JSON for data exchange (APIs), YAML for configuration (infrastructure). In AI work, you'll typically touch JSON more often, since every API call uses it, and YAML mainly for configs and pipelines. The exact split depends on the project.

Interview Questions

Data Format Interview Questions

Q1: Why do APIs use JSON instead of YAML?

Answer: JSON is lighter, faster to parse, has a small and strict spec, is native to JavaScript (and therefore the web), and is supported by the standard library of most mainstream languages. YAML is great for human readability, but it is slower to parse and has a more complex spec, and some YAML loaders carry security risks.

Q2: What's the security risk with YAML?

Answer: Some YAML loaders can construct arbitrary objects from tags in the document. In Python's PyYAML, yaml.load() with an unsafe Loader can deserialize arbitrary Python objects, which can lead to remote code execution when the input is untrusted. Always use yaml.safe_load(), which only deserializes basic data types.

Q3: How would you handle deeply nested JSON from an API?

Answer: Use dict.get() with defaults for safe access, or libraries like jmespath/jsonpath for querying. For validation, use Pydantic models to parse and validate JSON into typed Python objects. For LLM output, use structured output / JSON mode.
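A minimal standard-library sketch of the safe-access idea: the hypothetical helper dig walks a path of keys and list indexes and returns a default when any step is missing. The response shape is invented, loosely modeled on chat-completion APIs.

```python
from typing import Any

def dig(data: Any, *path: Any, default: Any = None) -> Any:
    """Follow keys/indexes in `path`; return `default` if any step is missing."""
    current = data
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return default
    return current

# Hypothetical response shape for illustration only.
response = {
    "choices": [
        {"message": {"role": "assistant", "content": "RAG is retrieval-augmented generation."}}
    ],
    "usage": {"total_tokens": 42},
}

# Path exists: returns the nested value.
print(dig(response, "choices", 0, "message", "content"))
# Index 1 does not exist: returns the default instead of raising.
print(dig(response, "choices", 1, "message", "content", default="<no answer>"))
# Chained dict.get() with defaults works for dict-only paths.
print(response.get("usage", {}).get("total_tokens", 0))
```

For anything beyond a few lookups, validating the whole payload into typed objects is usually easier to maintain than scattered defaults.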

Note: Data format questions are basic but important. They show you understand the foundations of API communication - essential for any AI automation role.
