30 Pandas Practice Questions



ing">'science', 'history', 'geography'])</pre></div>A student is eligible for math olympiad if both math and science scores are above 80. Create a new column <code>olympiad</code> containing boolean values. True means the student is eligible for the math olympiad, and False means that the student is not eligible.<figure id="a929"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*YG3ARBHV-kfRc2SRAlDiJw.png"><figcaption></figcaption></figure><h1 id="dc4b">** Questions</h1><h2 id="04d8">11) Finding the mean + median per subject</h2><div id="1e34"><pre>df = pd.DataFrame([ [60, 70, 72, 90, 74], [76, 70, 80, 84, 62], [92, 70, 64, 82, 94], [88, 68, 98, 90, 100], [86, 70, 78, 66, 96], ], columns=['english', 'math', 'science', 'history', 'geography'])</pre></div><figure id="052a"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*KJGs84VIlsGVx70wtfw6UA.png"><figcaption></figcaption></figure><h2 id="0267">12) Filling NaN with median of column</h2><div id="9f15"><pre>df = pd.DataFrame([ [60, None, 72, 90, 74], [76, 70, 80, None, 62], [92, 70, 64, 82, 94], [None, 68, 98, 90, 100], [86, 70, 78, 66, None], ], columns=['english', 'math', 'science', 'history', 'geography'])</pre></div><figure id="157e"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*a2EJKJ5YZJT9GZIYKinlMw.png"><figcaption></figcaption></figure><h2 id="4cb0">13) Average (mean) price per shop</h2><div id="b07a"><pre>df = pd.DataFrame([ ['A', 'apple', 1.5, 20], ['A', 'orange', 2.0, 30], ['A', 'pear', 2.5, 10], ['B', 'apple', 3.0, 8], ['B', 'orange', 3.5, 20], ['B', 'pear', 4.0, 10], ], columns=['shop', 'fruit', 'price', 'quantity'])</pre></div><figure id="fdaf"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*jPLxcYq0K8d3m86wupQ4sQ.png"><figcaption></figcaption></figure><h2 id="4be3">14) Average (median) price per fruit</h2><div id="5d23"><pre>df = pd.DataFrame([ ['A', 'apple', 1.5, 20], ['A', 'orange', 2.0, 30], ['A', 'pear', 2.5, 10], ['B', 'apple', 
3.0, 8], ['B', 'orange', 3.5, 20], ['B', 'pear', 4.0, 10], ], columns=['shop', 'fruit', 'price', 'quantity'])</pre></div><figure id="fe61"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*EkeY0fTmixRcp1BfDWRWLA.png"><figcaption></figcaption></figure><h2 id="e9ae">15) Area, Circumference, Volume and Surface Area</h2><div id="a449"><pre>df = pd.DataFrame([ [1], [1.5], [2], [10] ], columns=['radius'])</pre></div>Where <code>radius</code> represents the radius of a circle/sphere, and:<ul><li>pi = 3.14159</li><li>area = pi * radius²</li><li>circumference = 2 * pi * radius</li><li>volume = 4 / 3 * pi * radius³</li><li>surface area = 4 * pi * radius²</li></ul><figure id="4145"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*6T1NGKFnGaRMnGCZTgcf6A.png"><figcaption></figcaption></figure><h2 id="22c4">16) Email feature extraction</h2><div id="8cef"><pre>df = pd.DataFrame([ ['[email protected]'], ['[email protected]'], ['[email protected]'], ['[email protected]'], ], columns=['email'])</pre></div>Write some code to extract the following information:<ul><li><code>name</code>: the stuff before the <code>@</code></li><li><code>host</code>: the stuff after the <code>@</code></li><li><code>tld</code>: the stuff after the last <code>.</code></li></ul><figure id="c50d"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*px02cnpk0ab6znWrgddSjA.png"><figcaption></figcaption></figure><h2 id="39d2">17) Moving datetime column to index</h2><div id="7344"><pre>df = pd.DataFrame([ ['2022-01-03', 100], ['2022-01-02', 110], ['2022-01-01', 120], ['2022-01-08', 130], ['2022-01-05', 120], ['2022-01-06', 140], ['2022-01-07', 150], ['2022-01-04', 120], ], columns=['date', 'price'])</pre></div>Write some code to first convert the <code>date</code> column to actual datetime objects, then move it to the index of the dataframe. 
Remember to sort the values by date.<figure id="08a6"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*kx50iKFW-ilsCeITQBU-4A.png"><figcaption></figcaption></figure><h2 id="0173">18) Mean value per month</h2><div id="20f3"><pre>df = pd.DataFrame([ ['2022-01-03', 100], ['2022-01-02', 110], ['2022-01-01', 120], ['2022-02-08', 130], ['2022-02-05', 120], ['2022-05-06', 140], ['2022-05-07', 150], ['2022-05-04', 120], ], columns=['date', 'value'])</pre></div>Here, the dates are in <code>yyyy-mm-dd</code> format. Find the average value per month.<figure id="436b"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*NIUviY8Tk2x5gJlblXZV-g.png"><figcaption></figcaption></figure><h2 id="db18">19) Cleaning dirty numbers</h2><div id="fab6"><pre>df = pd.DataFrame([ ['A', '100,000'], ['B', '80,000'], ['C', '20,200'], ['D', '50,000'], ['E', '10,000'], ], columns=['item', 'cost'])</pre></div>Here, the numbers have commas in them, causing pandas to think that they are strings. Convert each number string to an actual number.<figure id="97c2"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*N79VC4Rf_InUZLVONtRcpg.png"><figcaption></figcaption></figure><h2 id="d425">20) Fixing duplicated entries</h2><div id="5b4f"><pre>df = pd.DataFrame([ ['A', 20], ['B', 25], ['C', 40], ['A', 22], ['B', 1], ['A', 1], ], columns=['item', 'quantity'])</pre></div>Here, multiple items and their quantities are duplicated instead of combined. 
Write some code to combine them instead.<figure id="52fe"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*iOQ1fFlxylSCfXXGSRQhug.png"><figcaption></figcaption></figure><h1 id="2b81">*** Questions</h1><h2 id="0aeb">21) Storage conversion</h2><div id="f79f"><pre>df = pd.DataFrame([ ['A', '1TB'], ['B', '1 tb'], ['C', '256 GB'], ['D', '512MB'], ['E', '512GB'], ], columns=['laptop', 'storage'])</pre></div>Let’s assume that:<ul><li>1 TB == 1000 GB</li><li>1 GB == 1000 MB</li></ul>Clean the <code>storage</code> column, and convert the values to megabytes. Be careful of the inconsistent spaces/casing.<figure id="e4e2"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*fLU_Or2xzJbxTtbxhp7RUA.png"><figcaption></figcaption></figure><h2 id="24bf">22) Feature extraction from dirty data</h2><div id="27c0"><pre>df = pd.DataFrame([ [1, 'name=rocky;age=4'], [2, 'name=ricky;breed=dog'], [3, 'breed=dog'], [4, 'name=ducky;age=5;breed=duck'], [5, 'age=6;breed=cat'], ], columns=['pet_id', 'desc'])</pre></div><p id="582f">Here, the <code>desc</code> column contains multiple key-value pairs stored in the form <code>key1=value1;key2=value2</code>.</p><figure id="4eaf"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*7nc_BYvRpnVT-z9fUeGXVA.png"><figcaption></figcaption></figure><h2 id="0987">23) Grading scores</h2><div id="adaf"><pre>df = pd.DataFrame([ [60, 70, 72, 90, 74], [76, 70, 80, 84, 62], [92, 70, 64, 82, 94], [88, 68, 98, 90, 100], [86, 70, 78, 66, 96], ], columns=['english', 'math', 'science', 'history', 'geography'])</pre></div>Convert each score to a grade:<ul><li>91 to 100 → A</li><li>81 to 90 → B</li><li>71 to 80 → C</li><li>61 to 70 → D</li><li>60 and below → E</li></ul><figure id="e7b9"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*veW926e9zj6s1ey4MPhh9w.png"><figcaption></figcaption></figure><h2 id="38b7">24) Ranking scores</h2><div id="93c1"><pre>df = pd.DataFrame([ [60, 70, 72, 90, 74], [76, 71, 80, 84, 62], [92, 72, 64, 82, 94], [88, 68, 98, 90, 100], [86, 73, 78, 66, 96], ], columns=['english', 'math', 'science', 'history', 'geography'])</pre></div>For each subject, rank each student’s scores using numbers 1 to 5.<figure id="221c"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*8OaZr-BbvyvVbjQBmpsrnQ.png"><figcaption></figcaption></figure><h2 id="8e66">25) Outlier Detection</h2><div id="65cb"><pre>df = pd.DataFrame([ [50,60,70,80], [62,74,50,55], [50,64,71,81], [50,64,72,82], [53,65,67,79], ], columns=['A', 'B', 'C', 'D'])</pre></div>Here, a value is considered an outlier as compared to the other values in its column if:<ul><li>It is more than (mean of column) + (standard deviation of column)</li><li>It is less than (mean of column) - (standard deviation of column)</li></ul><figure id="ff86"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*9GlBOPN-Ghqx710TxpMCgg.png"><figcaption></figcaption></figure><h2 id="d172">26) Dirty data</h2><div id="8f2d"><pre>df = pd.DataFrame([ ['bob', 
'rocky,fifi,baaron'], ['tim', 'lucky,ricky'], ['tom', 'rex,lala'] ], columns=['owner', 'dog'])</pre></div><figure id="6335"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*HxPMhVQt2ugGufwgMbZ3kA.png"><figcaption></figcaption></figure><h2 id="e325">27) Average number per country per team</h2><div id="2f7b"><pre>df = pd.DataFrame([ ['SG', 'A', 1000], ['SG', 'A', 1100], ['SG', 'B', 1200], ['SG', 'B', 1300], ['MY', 'C', 1400], ['MY', 'C', 1500], ['MY', 'D', 1600], ['MY', 'D', 1700], ], columns=['country', 'team', 'number'])</pre></div>Find the average <code>number</code> per country per team.<figure id="8bfe"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*ICnRUZv8CdinUWxRu7LkEA.png"><figcaption></figcaption></figure><h2 id="953f">28) Expanding columns</h2><div id="60e1"><pre>df = pd.DataFrame([ ['A', 100, 110, 120, 130, 140], ['B', 101, 111, 121, 131, 141], ['C', 200, 210, 220, 230, 240], ['D', 303, 310, 320, 330, 340], ['E', 500, 510, 520, 530, 540], ], columns=['stock', 'jan', 'feb', 'mar', 'apr', 'may'])</pre></div>^ This dataframe above might not be very friendly for analysis. 
Write some code to convert it into the new dataframe below.<figure id="12da"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*EbjWsaj3kjzl-irJLK5NiA.png"><figcaption></figcaption></figure><h2 id="67fd">29) Largest per month</h2><div id="8f94"><pre>df = pd.DataFrame([ ['A', 100, 110, 120, 130, 140], ['B', 110, 101, 100, 135, 150], ['C', 100, 140, 100, 135, 60], ['D', 120, 130, 90, 121, 70], ['E', 100, 110, 20, 30, 40], ], columns=['stock', 'jan', 'feb', 'mar', 'apr', 'may'])</pre></div>Write some code to find the largest value per month out of all 5 stocks, and display them in the format below.<figure id="ca9f"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*pYKeu9V0CBmmYjhxZUykVQ.png"><figcaption></figcaption></figure><h2 id="5dfd">30) Dealing with screwed up data</h2><div id="b735"><pre>df = pd.DataFrame([ ['name=rocky,age=4', '', 'breed=dog,gender=male'], ['age=5', 'name=fifi,breed=dog', 'gender=female'], ['breed=cat', 'name=ricky', 'age=6'], ['age=7,name=lucky,breed=bird', '', 'gender=male'], ['', 'age=8', 'name=bucky,breed=chicken,gender=female'] ], columns=['desc1', 'desc2', 'desc3'])</pre></div>Clean the above (extremely messy) dataframe to get the dataframe below. 
Missing values should be converted to NaN.<figure id="2bd6"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*-s93qmKt9A7I9YfqbdcF6w.png"><figcaption></figcaption></figure><h1 id="02ae">Conclusion</h1>Hope these questions were helpful in your practice!<h1 id="40ec">Some Final words</h1>If this story provided value to you, and you wish to show support, you could:<ol><li>Clap multiple times for this story (this really helps me out!)</li><li>Consider signing up for a Medium membership using my link — it’s $5 per month and you get to read unlimited stories on Medium.</li></ol><a href="https://zlliu.medium.com/membership">Sign up using my link here to read unlimited Medium articles.</a>Get my free Ebooks: <a href="https://zlliu.co/books">https://zlliu.co/books</a>I write Python articles (sometimes other stuff) that the younger me would have wanted to read. Do join my email list to get notified whenever I publish.<div id="d77a" class="link-block"> <a href="https://zlliu.medium.com/subscribe"> <div> <div> <h2>Get an email whenever Liu Zuo Lin publishes.</h2> <div><h3>Get an email whenever Liu Zuo Lin publishes. By signing up, you will create a Medium account if you don't already have…</h3></div> <div>zlliu.medium.com</div> </div> <div> <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/0*IgpLf7o7RdjrLUuK)"></div> </div> </div> </a> </div></article></body>