avatarLiu Zuo Lin

Free AI web copilot to create summaries, insights and extended knowledge, download it at here

3815

Abstract

/pre></div><div id="27f2"><pre> elif <span class="hljs-keyword">type</span>(current) == <span class="hljs-symbol">list:</span> queue.<span class="hljs-keyword">extend</span>(current) <span class="hljs-keyword">return</span> <span class="hljs-keyword">out</span></pre></div><div id="70d4"><pre>x = <span class="hljs-built_in">extract</span>(data, <span class="hljs-selector-attr">[<span class="hljs-string">"videoID"</span>]</span>) <span class="hljs-function"><span class="hljs-title">print</span><span class="hljs-params">(x)</span></span></pre></div><p id="4f9c">Here, we wish to extract all <code>videoIDs</code> from the messy dictionary, so we pass <code>["videoID"]</code> as the <code>keys</code> argument. The output:</p><div id="681e"><pre>[{<span class="hljs-symbol">'videoID</span><span class="hljs-symbol">':</span> <span class="hljs-symbol">'vid001</span>'}, {<span class="hljs-symbol">'videoID</span><span class="hljs-symbol">':</span> <span class="hljs-symbol">'vid002</span>'}, {<span class="hljs-symbol">'videoID</span><span class="hljs-symbol">':</span> <span class="hljs-symbol">'vid003</span>'}, {<span class="hljs-symbol">'videoID</span><span class="hljs-symbol">':</span> <span class="hljs-symbol">'vid004</span>'}, {<span class="hljs-symbol">'videoID</span><span class="hljs-symbol">':</span> <span class="hljs-symbol">'vid005</span>'}, {<span class="hljs-symbol">'videoID</span><span class="hljs-symbol">':</span> <span class="hljs-symbol">'vid006</span>'}, {<span class="hljs-symbol">'videoID</span><span class="hljs-symbol">':</span> <span class="hljs-symbol">'vid007</span>'}, {<span class="hljs-symbol">'videoID</span><span class="hljs-symbol">':</span> <span class="hljs-symbol">'vid008</span>'}, {<span class="hljs-symbol">'videoID</span><span class="hljs-symbol">':</span> <span class="hljs-symbol">'vid009</span>'}, {<span class="hljs-symbol">'videoID</span><span class="hljs-symbol">':</span> <span class="hljs-symbol">'vid010</span>'}]</pre></div><h1 id="39c1">The Logic Behind The Code</h1><p id="7225">We need to keep track of 2 lists — 1) <code>out</code>, which contains our output and 2) <code>queue</code>, which contains the data structures we wish to search. We first initialize <code>out</code> as an empty list, and <code>queue</code> to contain our entire json data.</p><ol><li>remove the first element from the queue, and assign it to <code>current</code></li><li>If <code>current</code> is a dictionary, search it for the keys that we want, and add any found key-value pairs into <code>out</code>.</li><li>Then add all values that are either lists or dictionaries back into <code>queue</code> so we can search them again later.</li><li>If <code>current</code> is a list, we add everything inside <code>current</code> back into <code>queue</code>, so we can search the individual elements later. This can be done using the <code>.extend</code> method.</li><li>Repeat steps 1–4 until <code>queue</code> is empty.</li></ol><h1 id="62ab">Extending Its Functionality</h1><div id="b8fc"><pre><span class="hljs-title">def</span> extract(<span class="hljs-class"><span class="hljs-keyword">data</span>, keys):</span> out = [] queue = [<span class="hljs-class"><span class="hljs-keyword">data</span>]</span> while len(queue) > <span class="hljs-number">0</span>: current = queue.pop(<span class="hljs-number">0</span>) <span class="hljs-keyword">if</span> <span class="hljs-class"><span class="hljs-keyword">type</span>(<span class="hljs-title">current</span>) == dict:</span></pre></div><div id="3e11"><pre> <span class="hljs-keyword">for</span> <span class="hljs-built_in">key</span> <span class="hljs-keyword">in</span> keys: # CHANGE THIS BLOCK <span class="hljs-keyword">if</span> <sp

Options

an class="hljs-built_in">key</span> <span class="hljs-keyword">in</span> current: out.<span class="hljs-built_in">append</span>({<span class="hljs-built_in">key</span>:current[<span class="hljs-built_in">key</span>]})

        <span class="hljs-keyword">for</span> val <span class="hljs-keyword">in</span> current.<span class="hljs-built_in">values</span>():
            <span class="hljs-keyword">if</span> type(val) <span class="hljs-keyword">in</span> [list, dict]:
                queue.<span class="hljs-built_in">append</span>(val)</pre></div><div id="0afd"><pre>        elif <span class="hljs-keyword">type</span>(current) == <span class="hljs-symbol">list:</span>
        queue.<span class="hljs-keyword">extend</span>(current)
<span class="hljs-keyword">return</span> <span class="hljs-keyword">out</span></pre></div><div id="e8fd"><pre>x = <span class="hljs-built_in">extract</span>(data, <span class="hljs-selector-attr">[<span class="hljs-string">"videoID"</span>]</span>)

<span class="hljs-function"><span class="hljs-title">print</span><span class="hljs-params">(x)</span></span></pre></div><p id="f148">To change the behaviour of this function, change this block of code:</p><div id="ff91"><pre><span class="hljs-keyword">for</span> <span class="hljs-built_in">key</span> <span class="hljs-keyword">in</span> keys: <span class="hljs-keyword">if</span> <span class="hljs-built_in">key</span> <span class="hljs-keyword">in</span> current: out.<span class="hljs-built_in">append</span>({<span class="hljs-built_in">key</span>:current[<span class="hljs-built_in">key</span>]})</pre></div><p id="7a70">Currently, this block of code simply adds ANY key-value pair whose key appears in <code>keys</code> into our output. If you wish to change the way this works eg. conditionally add certain key-value pairs into <code>output</code>, simply change this block of code to suit your needs.</p><h1 id="8bbe">Some Final Words</h1><p id="6c1c"><i>If this article provided value and you wish to support me, do consider signing up for a Medium membership — It’s $5 a month, and you get unlimited access to articles on Medium. If you sign up using my link below, I’ll earn a tiny commission at zero additional cost to you.</i></p><p id="e987"><a href="https://zlliu.medium.com/membership"><b><i>Sign up using my link here to read unlimited Medium articles.</i></b></a></p><p id="d27e"><i>I write coding articles (mainly Python) that I think would have probably helped the younger me speed up my learning curve. Do join my email list to get notified whenever I publish.</i></p><div id="13ec" class="link-block"> <a href="https://zlliu.medium.com/subscribe"> <div> <div> <h2>Get an email whenever Liu Zuo Lin publishes.</h2> <div><h3>Get an email whenever Liu Zuo Lin publishes. By signing up, you will create a Medium account if you don't already have…</h3></div> <div><p>zlliu.medium.com</p></div> </div> <div> <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/0*HdSL4Hj2KLjgUBvt)"></div> </div> </div> </a> </div><p id="014b"><i>More content at <a href="https://plainenglish.io/"><b>PlainEnglish.io</b></a>. Sign up for our <a href="http://newsletter.plainenglish.io/"><b>free weekly newsletter</b></a>. Follow us on <a href="https://twitter.com/inPlainEngHQ"><b>Twitter</b></a></i>, <a href="https://www.linkedin.com/company/inplainenglish/"><b><i>LinkedIn</i></b></a><i>, <a href="https://www.youtube.com/channel/UCtipWUghju290NWcn8jhyAw"><b>YouTube</b></a>, and <a href="https://discord.gg/GtDtUAvyhW"><b>Discord</b></a>.</i></p></article></body>

Extracting Specific Keys/Values From A Messed-Up JSON File (Python)

What to do when a messy JSON file gives you a massive headache

cool art

Sometimes when we call certain APIs, the format and structure of the returned JSON object can sometimes be pretty messy and confusing. For instance (this example is considered mild BTW):

data = {
    "type": "video",
    "videoID": "vid001",
    "links": [
        {"type":"video", "videoID":"vid002", "links":[]},
        {   "type":"video", 
            "videoID":"vid003",
            "links": [
            {"type": "video", "videoID":"vid004"},
            {"type": "video", "videoID":"vid005"},
            ]
        },
        {"type":"video", "videoID":"vid006"},
        {   "type":"video",
            "videoID":"vid007",
            "links": [
            {"type":"video", "videoID":"vid008", "links": [
                {   "type":"video", 
                    "videoID":"vid009",
                    "links": [{"type":"video", "videoID":"vid010"}]
                }
            ]}
        ]},
    ]
}

Unfortunately this happens more often than I wish it does.

Python Code To Extract Specific Key-Value Pairs

def extract(data, keys):
    out = []
    queue = [data]
    while len(queue) > 0:
        current = queue.pop(0)
        if type(current) == dict:
            for key in keys:
                if key in current:
                    out.append({key:current[key]})
            
            for val in current.values():
                if type(val) in [list, dict]:
                    queue.append(val)
        elif type(current) == list:
            queue.extend(current)
    return out
x = extract(data, ["videoID"])
print(x)

Here, we wish to extract all videoIDs from the messy dictionary, so we pass ["videoID"] as the keys argument. The output:

[{'videoID': 'vid001'}, {'videoID': 'vid002'}, {'videoID': 'vid003'}, {'videoID': 'vid004'}, {'videoID': 'vid005'}, {'videoID': 'vid006'}, {'videoID': 'vid007'}, {'videoID': 'vid008'}, {'videoID': 'vid009'}, {'videoID': 'vid010'}]

The Logic Behind The Code

We need to keep track of 2 lists — 1) out, which contains our output and 2) queue, which contains the data structures we wish to search. We first initialize out as an empty list, and queue to contain our entire json data.

  1. remove the first element from the queue, and assign it to current
  2. If current is a dictionary, search it for the keys that we want, and add any found key-value pairs into out.
  3. Then add all values that are either lists or dictionaries back into queue so we can search them again later.
  4. If current is a list, we add everything inside current back into queue, so we can search the individual elements later. This can be done using the .extend method.
  5. Repeat steps 1–4 until queue is empty.

Extending Its Functionality

def extract(data, keys):
    out = []
    queue = [data]
    while len(queue) > 0:
        current = queue.pop(0)
        if type(current) == dict:
            for key in keys:        # CHANGE THIS BLOCK
                if key in current:
                    out.append({key:current[key]})
            
            for val in current.values():
                if type(val) in [list, dict]:
                    queue.append(val)
        elif type(current) == list:
            queue.extend(current)
    return out
x = extract(data, ["videoID"])
print(x)

To change the behaviour of this function, change this block of code:

for key in keys:
    if key in current:
        out.append({key:current[key]})

Currently, this block of code simply adds ANY key-value pair whose key appears in keys into our output. If you wish to change the way this works eg. conditionally add certain key-value pairs into output, simply change this block of code to suit your needs.

Some Final Words

If this article provided value and you wish to support me, do consider signing up for a Medium membership — It’s $5 a month, and you get unlimited access to articles on Medium. If you sign up using my link below, I’ll earn a tiny commission at zero additional cost to you.

Sign up using my link here to read unlimited Medium articles.

I write coding articles (mainly Python) that I think would have probably helped the younger me speed up my learning curve. Do join my email list to get notified whenever I publish.

More content at PlainEnglish.io. Sign up for our free weekly newsletter. Follow us on Twitter, LinkedIn, YouTube, and Discord.

Python
Python3
Json
Python Programming
Recommended from ReadMedium