Extracting Specific Keys/Values From A Messed-Up JSON File (Python)
What to do when a messy JSON file gives you a massive headache

When we call certain APIs, the format and structure of the returned JSON object can be pretty messy and confusing. For instance (this example is considered mild, BTW):
data = {
    "type": "video",
    "videoID": "vid001",
    "links": [
        {"type": "video", "videoID": "vid002", "links": []},
        {
            "type": "video",
            "videoID": "vid003",
            "links": [
                {"type": "video", "videoID": "vid004"},
                {"type": "video", "videoID": "vid005"},
            ]
        },
        {"type": "video", "videoID": "vid006"},
        {
            "type": "video",
            "videoID": "vid007",
            "links": [
                {"type": "video", "videoID": "vid008", "links": [
                    {
                        "type": "video",
                        "videoID": "vid009",
                        "links": [{"type": "video", "videoID": "vid010"}]
                    }
                ]}
            ]
        },
    ]
}

Unfortunately this happens more often than I wish it does.
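If the API hands you JSON text rather than a ready-made Python dictionary, it has to be parsed first. Here is a minimal sketch using the standard library's json module; the raw string is a made-up, shortened response used only for illustration.

```python
import json

# Hypothetical raw response text (an assumption, not taken from a real API).
raw = '{"type": "video", "videoID": "vid001", "links": [{"type": "video", "videoID": "vid002", "links": []}]}'

# json.loads turns JSON objects into dicts and JSON arrays into lists.
parsed = json.loads(raw)

print(type(parsed).__name__)          # dict
print(parsed["links"][0]["videoID"])  # vid002
```

Note that strict JSON does not allow trailing commas, so a Python literal that has them would need to be cleaned up before it could be parsed as JSON text.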
Python Code To Extract Specific Key-Value Pairs
def extract(data, keys):
    out = []
    queue = [data]
    while len(queue) > 0:
        # take the next structure to search from the front of the queue
        current = queue.pop(0)
        if type(current) == dict:
            # record any wanted keys found in this dictionary
            for key in keys:
                if key in current:
                    out.append({key:current[key]})
            # queue up nested lists and dictionaries to search later
            for val in current.values():
                if type(val) in [list, dict]:
                    queue.append(val)
        elif type(current) == list:
            # queue up every element of the list to search later
            queue.extend(current)
    return out

x = extract(data, ["videoID"])
print(x)

Here, we wish to extract all videoIDs from the messy dictionary, so we pass ["videoID"] as the keys argument. The output:
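[{'videoID': 'vid001'}, {'videoID': 'vid002'}, {'videoID': 'vid003'}, {'videoID': 'vid006'}, {'videoID': 'vid007'}, {'videoID': 'vid004'}, {'videoID': 'vid005'}, {'videoID': 'vid008'}, {'videoID': 'vid009'}, {'videoID': 'vid010'}]

Note that the IDs come out level by level rather than in numeric order, because the function searches breadth-first: shallower dictionaries are visited before deeper ones.

The Logic Behind The Code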
We need to keep track of two lists: out, which holds our output, and queue, which holds the data structures we still need to search. We first initialize out as an empty list, and queue as a list containing our entire JSON data.
1. Remove the first element from queue, and assign it to current.
2. If current is a dictionary, search it for the keys that we want, and add any found key-value pairs into out.
3. Then add all values that are either lists or dictionaries back into queue so we can search them again later.
4. If current is a list, we add everything inside current back into queue, so we can search the individual elements later. This can be done using the .extend method.
5. Repeat steps 1–4 until queue is empty.
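The steps above can also be sketched with collections.deque as the queue. This is a variation on the article's function, not a replacement for it: list.pop(0) has to shift every remaining element, while deque.popleft() does not, which may matter for very large responses. The name extract_deque and the small sample dictionary are made up for this illustration.

```python
from collections import deque

def extract_deque(data, keys):
    """Breadth-first search for the given keys, using a deque as the queue."""
    out = []
    queue = deque([data])
    while queue:
        current = queue.popleft()  # step 1: take the first element
        if isinstance(current, dict):
            # step 2: record any wanted keys found in this dictionary
            for key in keys:
                if key in current:
                    out.append({key: current[key]})
            # step 3: queue nested lists and dictionaries for later
            queue.extend(v for v in current.values() if isinstance(v, (list, dict)))
        elif isinstance(current, list):
            # step 4: queue every element of the list for later
            queue.extend(current)
    return out

# Hypothetical sample data, smaller than the article's example.
sample = {
    "videoID": "a",
    "links": [
        {"videoID": "b", "links": [{"videoID": "c"}]},
        {"videoID": "d"},
    ],
}

print(extract_deque(sample, ["videoID"]))
# [{'videoID': 'a'}, {'videoID': 'b'}, {'videoID': 'd'}, {'videoID': 'c'}]
```

The result lists "d" before "c" because "d" sits one level higher in the nesting, and a breadth-first search finishes each level before moving deeper.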
Extending Its Functionality
def extract(data, keys):
    out = []
    queue = [data]
    while len(queue) > 0:
        current = queue.pop(0)
        if type(current) == dict:
            for key in keys:                        # CHANGE THIS BLOCK
                if key in current:
                    out.append({key:current[key]})
            for val in current.values():
                if type(val) in [list, dict]:
                    queue.append(val)
        elif type(current) == list:
            queue.extend(current)
    return out

x = extract(data, ["videoID"])
print(x)

To change the behaviour of this function, change this block of code:
for key in keys:
    if key in current:
        out.append({key:current[key]})

Currently, this block of code simply adds ANY key-value pair whose key appears in keys into our output. If you wish to change the way this works, e.g. to conditionally add certain key-value pairs into the output, simply change this block of code to suit your needs.
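As one possible way to do this, here is a sketch that passes the condition in as a function instead of hard-coding it. The names extract_if and keep are hypothetical, as is the sample data; keep receives the key and its value and returns True when the pair should be added.

```python
def extract_if(data, keys, keep):
    """Like extract, but only records pairs for which keep(key, value) is true."""
    out = []
    queue = [data]
    while len(queue) > 0:
        current = queue.pop(0)
        if isinstance(current, dict):
            for key in keys:
                # the modified block: add the pair only if keep() approves it
                if key in current and keep(key, current[key]):
                    out.append({key: current[key]})
            for val in current.values():
                if isinstance(val, (list, dict)):
                    queue.append(val)
        elif isinstance(current, list):
            queue.extend(current)
    return out

# Hypothetical sample data for illustration.
sample = {
    "videoID": "vid001",
    "links": [{"videoID": "vid002"}, {"videoID": "vid013"}],
}

# Keep only IDs whose three-digit numeric suffix is even.
even_ids = extract_if(sample, ["videoID"], lambda key, value: int(value[-3:]) % 2 == 0)
print(even_ids)
# [{'videoID': 'vid002'}]
```

Keeping the condition in a separate function means the search loop stays untouched when the filtering rule changes.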
