Flatten nested JSON data with pd.json_normalize()

Corin, pythonpandasjson
Back

When working with JSON data that has nested structures, converting it directly to a pandas DataFrame can result in columns containing dictionaries or lists. The pd.json_normalize() function flattens these nested structures into separate columns.

Here's a simple example:

import pandas as pd

# Sample nested JSON data
data = [
    {
        "name": "Alice",
        "address": {"city": "New York", "zip": "10001"},
        "age": 30
    },
    {
        "name": "Bob",
        "address": {"city": "San Francisco", "zip": "94102"},
        "age": 25
    }
]

# Flatten the nested structure
df = pd.json_normalize(data)

This creates a DataFrame with columns like name, age, address.city, and address.zip, where nested fields are separated by dots by default.

For deeper nesting, you can specify the record_path parameter to indicate which nested list to flatten, and meta to preserve parent fields. This is especially useful when working with API responses or complex JSON files.

© Corin Faife.RSS