Naina Chaturvedi


Abstract

<div> <h2>Ignito</h2> <div><h3>Data Science, ML, AI and more… Click to read Ignito, by Naina Chaturvedi, a Substack publication. Launched 7 months…</h3></div> <div><p>naina0405.substack.com</p></div> </div> <div> <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/0*_ER1J-h50iqAjH70)"></div> </div> </div> </a> </div><h1 id="7a42">3. Pandas Pivot Table</h1><p id="b578">In pandas, the pivot_table() method takes a DataFrame, groups its rows by one or more index columns, and aggregates the selected values, giving a multidimensional summary of the data.</p><p id="a18a">Import necessary libraries:</p><div id="72cb"><pre>import pandas as pd
import numpy as np</pre></div><p id="485a">Load Data:</p><div id="b17d"><pre>loan = pd.read_csv('/Users/priyeshkucchu/Desktop/loan_train.csv',
                   index_col='Loan_ID')</pre></div><p id="6d8e">Show data:</p><div id="5de6"><pre>loan.head()</pre></div><figure id="121c"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*zh3uKlwMr_eRlkncrrXSkg.png"><figcaption>Loan Data</figcaption></figure><p id="27c3">Pivot Table:</p><div id="4b6b"><pre># Median LoanAmount for each (Gender, Married, Dependents, Self_Employed) group
pivot = loan.pivot_table(values=['LoanAmount'],
                         index=['Gender', 'Married', 'Dependents', 'Self_Employed'],
                         aggfunc=np.median)</pre></div><figure id="27b9"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*RXiQ2zssNdS_85tBTlgWqg.png"><figcaption>Output pivot table</figcaption></figure><div id="7211" class="link-block"> 
</div><h1 id="fe5d">4. 
Pandas Apply</h1><p id="1677">In pandas, the .apply() method applies a user-defined function along an axis of a DataFrame: to each column with axis=0, or to each row with axis=1.</p><p id="b21c">Import necessary libraries:</p><div id="5265"><pre>import pandas as pd</pre></div><p id="d956">Load Data:</p><div id="91e8"><pre>ytdata = pd.read_csv('/Users/priyeshkucchu/Desktop/USvideos.csv')</pre></div><p id="578d">Function to count missing values:</p><div id="fdfe"><pre>def missing_values(x):
    # x is one column (axis=0) or one row (axis=1) of the DataFrame
    return sum(x.isnull())</pre></div><p id="9094">Missing values in each column:</p><div id="48d8"><pre>print(" Missing values in each column :")
ytdata.apply(missing_values, axis=0)</pre></div><p id="7bfa">Output:</p><div id="4efb"><pre>Missing values in each column :</pre></div><div id="b9df"><pre>video_id                    0
trending_date               0
title                       0
channel_title               0
category_id                 0
publish_time                0
tags                        0
views                       0
likes                       0
dislikes                    0
comment_count               0
thumbnail_link              0
comments_disabled           0
ratings_disabled            0
video_error_or_removed      0
description               502
dtype: int64</pre></div><p id="d0fd">Missing values in each row:</p><div id="a9b1"><pre>print(" Missing values in each row :")
ytdata.apply(missing_values, axis=1).head()</pre></div><p id="b89f">Output:</p><div id="cb6a"><pre>Missing values in each row :</pre></div><div id="ad2c"><pre>0    0
1    0
2    0
3    0
4    0
dtype: int64</pre></div><h1 id="9a72">5. Pandas Count</h1><p id="7063">In pandas, the count() method counts the non-NA cells in each column or row.</p><p id="c278">Import necessary libraries:</p><div id="82cf"><pre>import pandas as pd</pre></div><p id="2024">Load Data:</p><div id="686b"><pre>ytdata = pd.read_csv('/Users/priyeshkucchu/Desktop/USvideos.csv')</pre></div><p id="ac4a">Count the number of data points in each column:</p><div id="e372"><pre>ytdata.count(axis=0)</pre></div><figure id="b836"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*rn6r9UEDfkC767rX5hKXnw.png"><figcaption>Output: count of data points in each column</figcaption></figure><p id="4e80">Count the number of null data points in the description column:</p><div id="9b96"><pre>ytdata.description.isnull().value_counts()</pre></div><figure id="b46f"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*7lFnXf8qe2mfIQFkkLpunQ.png"><figcaption>Output: number of null data points in the description column</figcaption></figure><h1 id="48cf">6. 
Pandas Crosstab</h1><p id="cfea">In pandas, the crosstab() function computes a simple <b>cross-tabulation</b> (a frequency table) of two or more factors.</p><p id="a488">Import necessary libraries:</p><div id="2e16"><pre>import pandas as pd</pre></div><p id="f64e">Load Data:</p><div id="b0d2"><pre>data = pd.read_csv('/Users/priyeshkucchu/Desktop/loan_train.csv',
                   index_col='Loan_ID')</pre></div><p id="5997">Cross tab between the Credit_History and Self_Employed columns in the loan data:</p><div id="eafc"><pre># margins=True adds row and column totals; normalize=False keeps raw counts
pd.crosstab(data["Credit_History"], data["Self_Employed"],
            margins=True, normalize=False)</pre></div><figure id="198e"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*qC4GfxyeBKCPbAKVf2rdoQ.png"><figcaption>Output</figcaption></figure><h1 id="aabe">7. 
Pandas str.split</h1><p id="740e">In pandas, the str.split() method splits each string in a Series around a given separator (delimiter).</p><p id="b550">Import necessary libraries:</p><div id="a80f"><pre>import pandas as pd</pre></div><p id="d76e">Create a Data Frame:</p><div id="c620"><pre>df = pd.DataFrame({'Person_name': ['Naina Chaturvedi', 'Alvaro Morte', 'Alex Pina', 'Steve Jobs']})
df</pre></div><figure id="0ec5"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*dbKGDDCR4Jf-wDn5c6I0sg.png"><figcaption>Data Frame df</figcaption></figure><p id="8aae">Extract First and Last Names:</p><div id="279c"><pre># expand=True returns one column per piece: 0 is the first name, 1 the last name
df['first_name'] = df['Person_name'].str.split(' ', expand=True)[0]
df['last_name'] = df['Person_name'].str.split(' ', expand=True)[1]</pre></div><div id="2802"><pre>df</pre></div><figure id="39a2"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*d6Jfwi_qRM7xCEyyNKmmuQ.png"><figcaption>Output: extract first and last name using pandas str.split()</figcaption></figure><h1 id="ab75">8. 
Extract E-mail from text</h1><p id="58bd">Import the necessary libraries and initialize the text:</p><div id="1707"><pre>import re</pre></div><div id="b663"><pre>Enquiries_text = ('For any enquiries or feedback related to our product, '
                  'service, marketing promotions or other general support '
                  'matters. [email protected]')</pre></div><p id="e46a">Extract email using Regular Expression:</p><div id="f933"><pre>re.findall(r"([\w.-]+@[\w.-]+)", Enquiries_text)</pre></div><figure id="daa3"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*aSrntKRKD0SnwVrw0l0XFQ.png"><figcaption>Output: extract email from text</figcaption></figure><h1 id="ee1a">9. Pandas melt</h1><p id="8496">In pandas, the melt() function reshapes a DataFrame from wide to long format.</p><p id="0acc">Import necessary libraries:</p><div id="8a41"><pre>import pandas as pd</pre></div><p id="7b49">Create a Data Frame:</p><div id="0b87"><pre>df = pd.DataFrame({'Person_Name': {0: 'Naina', 1: 'Alex', 2: 'Avarto'},
                   'CourseName': {0: 'Masters', 1: 'Graduate', 2: 'Graduate'},
                   'Age': {0: 27, 1: 20, 2: 22}})</pre></div><p id="2aff">Melt the data frame in two ways:</p><div id="cff0"><pre>m1= pd<span class="hljs-selector-class">.melt</span>(df, id_vars =<span class="hljs-selector-attr">[<span class="hljs-string">'Person_Name'</span>]</span>, value_vars =<span 
class="hljs-selector-attr">[<span class="hljs-string">'CourseName'</span>, <span class="hljs-string">'Age'</span>]</span>)
m1</pre></div><figure id="4c33"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*gOJz2cjSfojkR8E2RuCIbw.png"><figcaption>Output: m1 DataFrame</figcaption></figure><div id="8b75"><pre>m2 = pd.melt(df, id_vars=['Person_Name'], value_vars=['Age'])
m2</pre></div><figure id="5e86"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*9pcd7jeoFIn1e09ZPBqLWQ.png"><figcaption>Output: m2 DataFrame</figcaption></figure><h1 id="de1a">10. Extract Continuous and Categorical Data</h1><p id="9539">Import necessary libraries:</p><div id="12b9"><pre>import pandas as pd</pre></div><p id="895d">Load Data:</p><div id="9437"><pre>Loan_data = pd.read_csv('/Users/priyeshkucchu/Desktop/loan_train.csv')
Loan_data.shape</pre></div><p id="f202">Output: (614, 13)</p><p id="d097">Check data types of columns:</p><div id="55d6"><pre>Loan_data.dtypes</pre></div><figure id="9d88"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*FFLTNVqrAsG63WuPB0RbAg.png"><figcaption>Output: data types of columns in Loan Data</figcaption></figure><p id="fd92">Extract columns containing only categorical data:</p><div id="af4a"><pre># Keep every row here; head() below only limits what is displayed
categorical_variables = Loan_data.select_dtypes("object")
categorical_variables.head()</pre></div><figure id="29a2"><img 
src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*uQct43R3d3ExOan038qguw.png"><figcaption></figcaption></figure><p id="02c8">Extract columns containing only integer data:</p><div id="4294"><pre>integer_variables = Loan_data.select_dtypes("int64")
integer_variables.head()</pre></div><figure id="d236"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*cD7bKCdxiGqFnMrXxAS63A.png"><figcaption></figcaption></figure><p id="811f">Extract columns containing only numerical data:</p><div id="d5ae"><pre>numeric_variables = Loan_data.select_dtypes("number")
numeric_variables.head()</pre></div><figure id="36b7"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*ijGBqybPb6vjN4jWaoW2VA.png"><figcaption></figcaption></figure><h1 id="2b23">11. 
Pandas Eval function for efficient operations</h1><p id="eb3d">The eval() function in pandas evaluates a string expression, which lets it compute operations on DataFrames efficiently.</p><p id="d90e">Import necessary libraries:</p><div id="0787"><pre>import pandas as pd
import numpy as np</pre></div><p id="d1b0">Initialize no_rows, no_cols:</p><div id="67d3"><pre>no_rows, no_cols = 100000, 100
r = np.random.RandomState(50)
# Four random DataFrames of shape (no_rows, no_cols)
df1, df2, df3, df4 = (pd.DataFrame(r.rand(no_rows, no_cols)) for i in range(4))</pre></div><p id="d009">Without Eval function:</p><div id="d1a8"><pre>%timeit df1 + df2 * df3 - df4</pre></div><figure id="7a21"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*c6iSeWHTX599feRg_TbT3A.png"><figcaption>Output without Eval function</figcaption></figure><p id="d829">With Eval function (the eval() version of this expression is about 50% faster and uses much less memory):</p><div id="90e4"><pre>%timeit pd.eval('df1 + df2 * df3 - df4')</pre></div><figure id="0244"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*6tpCkjukqiQg911Op2VF3A.png"><figcaption>Output with Eval function</figcaption></figure><h1 id="f23f">12. 
Pandas Unique</h1><p id="204b">In pandas, the unique() method returns the unique values of a Series in order of appearance.</p><p id="0938">Import necessary libraries:</p><div id="bc69"><pre>import pandas as pd
import numpy as np</pre></div><p id="dab7">Load Data:</p><div id="5760"><pre>crime_data = pd.read_csv("/Users/priyeshkucchu/Desktop/crime.csv",
                         engine='python')</pre></div><p id="59e4">Show data:</p><div id="deca"><pre>crime_data.head()</pre></div><figure id="bee0"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*ssCJTlcXUDrvm300L38o5w.png"><figcaption></figcaption></figure><p id="16c8">Show unique values in the DISTRICT column:</p><div id="552a"><pre>crime_data["DISTRICT"].unique()</pre></div><figure id="8a30"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*KvS5pfp7FJ1o4JnTdMozUA.png"><figcaption>Output: unique values in the DISTRICT column</figcaption></figure><h1 id="1c1a">13. 
IPython Interactive Shell</h1><p id="6c13">Import necessary libraries:</p><div id="18ef"><pre>from IPython.core.interactiveshell import InteractiveShell
# Display the result of every top-level expression in a cell, not only the last one
InteractiveShell.ast_node_interactivity = "all"
import pandas as pd</pre></div><p id="dd79">Load Data:</p><div id="63b8"><pre>data = pd.read_csv('/Users/priyeshkucchu/Desktop/loan_train.csv')</pre></div><p id="082e">Run several commands in one cell and see the output of each:</p><div id="62c5"><pre>data.shape
data.head()
data.dtypes
data.info()</pre></div><figure id="ed06"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*jOoM6sAbAhpOry6iPSOZlQ.png"><figcaption>Output</figcaption></figure><h1 id="6a1d">14. 
Pandas Merge</h1><p id="6796">In pandas, the merge() method joins two datasets on matching values in their key columns.</p><p id="a9e1">Import necessary libraries:</p><div id="b781"><pre>import pandas as pd</pre></div><p id="6185">Initialize Data Frames:</p><div id="6a8c"><pre>df1 = pd.DataFrame({'Left_key': ['Naina', 'Avarto', 'Alex', 'Naina'],
                    'value': [1, 2, 3, 5]})
df2 = pd.DataFrame({'Right_key': ['Naina', 'Avarto', 'Alex', 'Naina'],
                    'value': [5, 6, 7, 8]})</pre></div><figure id="b96c"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*gMF6lF1hzZvXyeZxFSfMQw.png"><figcaption>DataFrames df1 and df2</figcaption></figure><p id="b014">Merge the data frames:</p><div id="0fba"><pre># The shared 'value' column gets the suffixes '_Left' and '_Right'
df1.merge(df2, left_on='Left_key', right_on='Right_key',
          suffixes=('_Left', '_Right'))</pre></div><figure id="a4e1"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*2j1e9X26veefaxaIcujIHQ.png"><figcaption>Output: merge the data frames</figcaption></figure><h1 id="a0d2">15. 
Parse dates in read_csv() to change data type to DateTime</h1><p id="f84c">Import necessary libraries:</p><div id="867e"><pre>import pandas as pd</pre></div><p id="9c1f">Load Data and print the data types of crime data columns:</p><div id="7887"><pre>crime_data = pd.read_csv("/Users/priyeshkucchu/Desktop/crime.csv",
                         engine='python')
crime_data.dtypes</pre></div><figure id="f6cc"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*fIjbOf6E-tfQpi-OOiG8ZQ.png"><figcaption></figcaption></figure><p id="9627">Parse Dates in read_csv():</p><div id="cc68"><pre>crime_data = pd.read_csv("/Users/priyeshkucchu/Desktop/crime.csv",
                         engine='python', parse_dates=["OCCURRED_ON_DATE"])
crime_data.dtypes</pre></div><figure id="df5f"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*pdAT554IK4VIYv6XCDl46A.png"><figcaption>Output: parse dates in read_csv for column OCCURRED_ON_DATE</figcaption></figure><h1 id="774d">16. 
Date Parser</h1><p id="1678">Import necessary libraries:</p><div id="135e"><pre>import datetime
import dateutil.parser</pre></div><p id="c966">Parse Dates:</p><div id="ea0f"><pre>input_date = '04th Dec 2020'
parsed_date = dateutil.parser.parse(input_date)</pre></div><p id="715c">Output date in the designated format:</p><div id="aa3a"><pre>op_date = datetime.datetime.strftime(parsed_date, '%Y-%m-%d')</pre></div><div id="58b4"><pre>print(op_date)</pre></div><figure id="de95"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*E7jxpwgUB6K4guYeIc2T-A.png"><figcaption>Output Date</figcaption></figure><h1 id="6dce">17. 
Invert a Dictionary</h1><p id="46e5">Create a dictionary:</p><div id="6c4b"><pre>l_dict = {'Person_Name': 'Naina', 'Age': 27, 'Profession': 'Software Engineer'}</pre></div><p id="b33a">Original Dictionary:</p><figure id="00e3"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*OgZyAPT5YlPGEc8-hQ7Zig.png"><figcaption></figcaption></figure><p id="1723">Invert dictionary:</p><div id="725d"><pre># Swap keys and values; this assumes the values are unique and hashable
invert_dict = {values: keys for keys, values in l_dict.items()}
invert_dict</pre></div><figure id="afb0"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*e9_kK2UfwVyLKtMWHIrAKA.png"><figcaption></figcaption></figure><h1 id="6af8">18. Pretty Dictionaries</h1><p id="e3d0">Create a dictionary:</p><div id="d0d7"><pre>l_dict = {'Student_ID': 4, 'Student_name': 'Naina', 'Class_Name': '12th',
          'Student_marks': {'maths': 92, 'science': 95, 'computer science': 100, 'English': 91}}</pre></div><p id="cd32">Original Dictionary:</p><figure id="3774"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*ckFTkR-tLzyaXbu7eQaduw.png"><figcaption></figcaption></figure><p id="ea0d">Pretty dictionary using pprint:</p><div id="f07d"><pre>import pprint
pprint.pprint(l_dict)</pre></div><figure 
id="b041"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*9wWH29sB5nG1W10WhKGOug.png"><figcaption>Pretty Dictionary</figcaption></figure><h1 id="7fb2">19. Convert a List of Lists to a Flat List</h1><p id="d813">Import necessary libraries:</p><div id="06c9"><pre>import itertools</pre></div><p id="1adb">Create a nested list:</p><div id="5468"><pre>nested_list = [['Naina'], ['Alex', 'Rhody'], ['Sharron', 'Avarto', 'Grace']]
nested_list</pre></div><figure id="f64b"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*49R87X-mBjtUaQVaJmObog.png"><figcaption></figcaption></figure><p id="c5c9">Flatten the nested list into a single list:</p><div id="fe82"><pre>converted_list = list(itertools.chain.from_iterable(nested_list))</pre></div><div id="ef8b"><pre>print(converted_list)</pre></div><figure id="e622"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*uIOtr-Ve6KmBUDP_brL2Xw.png"><figcaption></figcaption></figure><h1 id="0b6c">20. 
Removing Emojis from Text</h1><div id="5419"><pre>Emoji_text = 'For example, 🤓🏃‍🏢 could mean “I am running to work.”'
# Dropping every non-ASCII character removes the emojis (and also the curly quotes)
final_text = Emoji_text.encode('ascii', 'ignore').decode('ascii')</pre></div><div id="e2d3"><pre>print("Raw tweet with Emoji:", Emoji_text)
print("Final tweet without Emoji:", final_text)</pre></div><figure id="508e"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*Vo8yn3KnaUSiwDBpFW9vTg.png"><figcaption>Output: remove emojis from text</figcaption></figure><h1 id="08e7">21. Apply Pandas Operations in Parallel</h1><p id="1760">The pandarallel library distributes your pandas computations over all available CPUs on your computer, which can give a significant increase in speed.</p><p id="0ebf">Install pandarallel:</p><div id="cf90"><pre>!pip install pandarallel</pre></div><p id="c237">Import necessary libraries:</p><div id="c63c"><pre>%load_ext autoreload
%autoreload 2
import pandas as pd
import time
from pandarallel import pandarallel
import math
import numpy as np
import random
# tqdm.notebook is the public module; tqdm._tqdm_notebook is a deprecated alias
from tqdm.notebook import tqdm_notebook
<span 
class="hljs-title">tqdm_notebook</span>.pandas()</pre></div><p id="6a69">Initialize pandarallel:</p><div id="b953"><pre>pandarallel.initialize(progress_bar=True)</pre></div><figure id="0de3"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*qYVVRoogEWsTCFw3vpfKrg.png"><figcaption></figcaption></figure><p id="9f77">Dataframe:</p><div id="347b"><pre>df = pd.DataFrame({
    'A': [random.randint(8, 15) for i in range(1, 100000)],
    'B': [random.randint(10, 20) for i in range(1, 100000)]
})</pre></div><p id="b0a5">Trigono function:</p><div id="02d4"><pre>def trigono(x):
    return math.sin(x.A**2) + math.sin(x.B**2) + math.tan(x.A**2)</pre></div><p id="e371">Without parallelization:</p><div id="c279"><pre>%%time
first = df.progress_apply(trigono, axis=1)</pre></div><figure id="abf4"><img 
src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*BFj-gDynhcR8jHjZP6LWEA.png"><figcaption></figcaption></figure><p id="0983">With parallelization:</p><div id="8d9e"><pre><span class="hljs-meta">%</span><span class="hljs-meta">%</span>time first_parallel = df.parallel_apply<span class="hljs-comment">(trigono, axis=1)</span></pre></div><figure id="1103"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*DN_vKVqdTZSvlfbZhn5Z_A.png"><figcaption>Output — Apply Panda operations in parallel</figcaption></figure><h1 id="ec47">22. Pandas Cut and qcut</h1><p id="2671">In Pandas,</p><p id="2b35">cut command creates <b>equispaced bins</b> but the frequency of samples is <b>unequal in each bin</b></p><p id="e6a6">qcut command creates <b>unequal size bins</b> but the frequency of samples is <b>equal in each bin.</b></p><p id="477f">Import necessary Libraries:</p><div id="ff46"><pre><span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd <span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np</pre></div><p id="f990">Dataframe:</p><div id="8562"><pre>df_rollno = pd<span class="hljs-selector-class">.DataFrame</span>({<span class="hljs-string">'Roll No'</span>: np<span class="hljs-selector-class">.random</span><span class="hljs-selector-class">.randint</span>(<span class="hljs-number">20</span>, <span class="hljs-number">55</span>, <span class="hljs-number">10</span>)}) df_rollno</pre></div><figure id="36b9"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*AiwjmWyKSqm1ShuJ74OUyA.png"><figcaption></figcaption></figure><p id="5f56">Using Pandas cut function :</p><div id="0723"><pre>df_rollno<span class="hljs-selector-attr">[<span class="hljs-string">'roll_no_bins'</span>]</span> = pd<span class="hljs-selector-class">.cut</span>(x=df_rollno<span class="hljs-selector-attr">[<span class="hljs-string">'Roll No'</span>]</span>, bins=<span class="hljs-selector-attr">[20, 40, 50, 
60]</span>)</pre></div><figure id="4368"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*7JoYZeRYkSVfD-trcgZSYQ.png"><figcaption>Output</figcaption></figure><p id="5131">Using Pandas qcut function:</p><div id="fcf7"><pre>pd.qcut(df_rollno[<span class="hljs-string">'Roll No'</span>], <span class="hljs-attribute">q</span>=6)</pre></div><figure id="54f1"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*LvXXVi2O4uDPFWvZFHWL_Q.png"><figcaption>Output</figcaption></figure><h1 id="3266">23. Pandas Profiling</h1><p id="9ae2">It’s used to generates profile reports from a pandas DataFrame or data sheet.</p><p id="1bcf">Install Pandas Profiling:</p><div id="117d"><pre>pip <span class="hljs-keyword">install</span> pandas-profiling</pre></div><p id="8598">Import necessary libraries:</p><div id="67cc"><pre><span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd <span class="hljs-keyword">import</span> pandas_profiling</pre></div><p id="9291">Load Data:</p><div id="8287"><pre><span class="hljs-attr">Youtube_data</span> = pd.read_csv(<span class="hljs-string">'/Users/priyeshkucchu/Desktop/USvideos.csv'</span>)</pre></div><p id="eec0">Generate Profiling report:</p><div id="a7b9"><pre><span class="hljs-attr">profiling_report</span> = pandas_profiling.ProfileReport(Youtube_data)</pre></div><figure id="76fc"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*FdPkKZrfHUtZsZ3-k7fR_Q.png"><figcaption>Profiling report — Overview</figcaption></figure><figure id="1141"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*6BaaD5Z2doaNj5jPwan5rg.png"><figcaption>Profiling report — Interactions</figcaption></figure><figure id="e806"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*222j6BFuGGqZxksgOHa4kg.png"><figcaption>Profiling report — Correlations</figcaption></figure><h1 id="26cc">All the Complete System Design Series Parts —</h1><blockquote id="f3fb"><p><a 
href="https://readmedium.com/complete-system-design-series-part-1-45bf9c8654bc"><b><i>1. System design basics</i></b></a></p></blockquote><blockquote id="c535"><p><a href="https://readmedium.com/complete-system-design-series-part-2-922f45f2faaf"><b><i>2. Horizontal and vertical scaling</i></b></a></p></blockquote><blockquote id="18a1"><p><a href="https://readmedium.com/part-3-complete-system-design-series-e1362baa8a4c"><b><i>3. Load balancing and Message queues</i></b></a></p></blockquote><blockquote id="4d43"><p><a href="https://readmedium.com/part-4-complete-system-design-series-138bc9fbcfc0"><b><i>4. High level design and low level design, Consistent Hashing, Monolithic and Microservices architecture</i></b></a></p></blockquote><blockquote id="d211"><p><a href="https://readmedium.com/part-5-complete-system-design-series-4b9b04f23608"><b><i>5. Caching, Indexing, Proxies</i></b></a></p></blockquote><blockquote id="10ec"><p><a href="https://readmedium.com/part-6-complete-system-design-series-59a2d8bbf1ed"><b><i>6. Networking, How Browsers work, Content Network Delivery ( CDN)</i></b></a></p></blockquote><blockquote id="2fb1"><p><a href="https://readmedium.com/part-7-complete-system-design-series-1bef528923d6"><b><i>7. Database Sharding, CAP Theorem, Database schema Design</i></b></a></p></blockquote><blockquote id="982a"><p><a href="https://readmedium.com/part-8-complete-system-design-series-57bc88433c8e"><b><i>8. Concurrency, API, Components + OOP + Abstraction</i></b></a></p></blockquote><blockquote id="f09e"><p><a href="https://readmedium.com/part-9-complete-system-design-series-df975c85ec51"><b><i>9. Estimation and Planning, Performance</i></b></a></p></blockquote><blockquote id="9128"><p><b><i>10. <a href="https://readmedium.com/part-10-complete-system-design-series-523b4dd978bf?sk=741f92929c8639a2e4cf218521e8cc4a">Map Reduce, Patterns and Microservices</a></i></b></p></blockquote><blockquote id="f879"><p><b><i>11. 
<a href="https://naina0412.medium.com/part-11-complete-system-design-series-9c8efbc0237a?sk=5bddf2adc78ea4947ae88ab21c94af1c">SQL vs NoSQL and Cloud</a></i></b></p></blockquote><blockquote id="bdf5"><p><a href="https://readmedium.com/most-popular-system-design-questions-mega-compilation-45218129fe26"><b><i>12. Most Popular System Design Questions</i></b></a></p></blockquote><h1 id="a23a">Github —</h1><div id="b414" class="link-block"> <a href="https://github.com/Coder-World04/Complete-System-Design/blob/main/README.md"> <div> <div> <h2>Complete-System-Design/README.md at main · Coder-World04/Complete-System-Design</h2> <div><h3>This repository contains everything you need to become proficient in System Design Topics you should know in System…</h3></div> <div><p>github.com</p></div> </div> <div> <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/)"></div> </div> </div> </a> </div><h1 id="0388">Thanks for Reading. Keep Learning :)</h1><h1 id="137a">Want to read programmers humor?</h1><div id="fd28" class="link-block"> <a href="https://readmedium.com/programming-humor-part-2-f92cf5a26f2b"> <div> <div> <h2>Programming Humor Part 2</h2> <div><h3>Keep laughing because it’s hilarious ….</h3></div> <div><p>medium.com</p></div> </div> <div> <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/0*xkCXqHz7vIXjmjD_.png)"></div> </div> </div> </a> </div><div id="1e2f" class="link-block"> <a href="https://readmedium.com/the-most-hilarious-code-comments-ever-bae3cb1030b5"> <div> <div> <h2>The Most Hilarious Code Comments Ever</h2> <div><h3>Programmer Humor: Yes, coders actually wrote them!</h3></div> <div><p>medium.com</p></div> </div> <div> <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/0*C-cPP9D2MIyeexAT.gif)"></div> </div> </div> </a> </div><div id="93a8" class="link-block"> <a href="https://readmedium.com/coding-sins-hilarious-developer-confessions-f55eb342454e"> <div> <div> <h2>Coding 
Sins: Hilarious Developer Confessions</h2> <div><h3>How ‘whiteboarding’ got mocked</h3></div> <div><p>medium.com</p></div> </div> <div> <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/0*JceCvoRHEHRXyHnb.jpeg)"></div> </div> </div> </a> </div><div id="052b" class="link-block"> <a href="https://readmedium.com/10-witty-programming-jokes-that-will-make-you-go-rofl-a53fbfb91943"> <div> <div> <h2>10 Witty Programming Jokes That Will Make You Go ROFL</h2> <div><h3>These are hilarious ….</h3></div> <div><p>medium.com</p></div> </div> <div> <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/0*c6MUlOF-1Z2Su0-E)"></div> </div> </div> </a> </div><h1 id="d281">Recommended Articles -</h1><div id="f7a3" class="link-block"> <a href="https://readmedium.com/python-iterators-generators-and-decorators-made-easy-659cae26054f"> <div> <div> <h2>Python Iterators, Generators And Decorators Made Easy</h2> <div><h3>A Quick Implementation Guide</h3></div> <div><p>medium.com</p></div> </div> <div> <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/0*XtVnWXUTVVE13f3-.jpeg)"></div> </div> </div> </a> </div><div id="70ed" class="link-block"> <a href="https://readmedium.com/23-data-science-techniques-you-should-know-61bc2c9d1b3a"> <div> <div> <h2>23 Data Science Techniques You Should Know!</h2> <div><h3>Save your precious time by using these hacks</h3></div> <div><p>medium.com</p></div> </div> <div> <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/1*222j6BFuGGqZxksgOHa4kg.png)"></div> </div> </div> </a> </div><div id="b8f3" class="link-block"> <a href="https://readmedium.com/coding-sins-hilarious-developer-confessions-f55eb342454e"> <div> <div> <h2>Coding Sins: Hilarious Developer Confessions</h2> <div><h3>How ‘whiteboarding’ got mocked</h3></div> <div><p>medium.com</p></div> </div> <div> <div style="background-image: 
url(https://miro.readmedium.com/v2/resize:fit:320/0*JceCvoRHEHRXyHnb.jpeg)"></div> </div> </div> </a> </div><div id="c55e" class="link-block"> <a href="https://readmedium.com/5-cool-advanced-pandas-techniques-for-data-scientists-c5a59ae0625d"> <div> <div> <h2>5 Cool Advanced Pandas Techniques for Data Scientists</h2> <div><h3>Use these techniques …</h3></div> <div><p>medium.com</p></div> </div> <div> <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/0*nd1WG4uRgLzMQr8P.jpeg)"></div> </div> </div> </a> </div><div id="bbb9" class="link-block"> <a href="https://readmedium.com/stack-overflow-analyzed-data-from-60-000-software-developers-hours-they-work-languages-they-476ac6ca0197"> <div> <div> <h2>Stack Overflow Analyzed Data from 60,000+ Software Developers — Hours They Work, Languages They…</h2> <div><h3>Here is what they found…</h3></div> <div><p>medium.com</p></div> </div> <div> <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/0*LWGz2247yyjKfW6g.png)"></div> </div> </div> </a> </div><div id="4965" class="link-block"> <a href="https://readmedium.com/advanced-python-made-easy-part-4-a4996ba9fe19"> <div> <div> <h2>Advanced Python Made Easy — Part 4</h2> <div><h3>Use these hacks and techniques…</h3></div> <div><p>medium.com</p></div> </div> <div> <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/0*nd1WG4uRgLzMQr8P.jpeg)"></div> </div> </div> </a> </div><div id="1938" class="link-block"> <a href="https://readmedium.com/advanced-python-made-easy-part-1-ce1e2f17431e"> <div> <div> <h2>Advanced Python Made Easy — Part 1</h2> <div><h3>Use these hacks and techniques…</h3></div> <div><p>medium.com</p></div> </div> <div> <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/0*nd1WG4uRgLzMQr8P.jpeg)"></div> </div> </div> </a> </div></article></body>

23 Data Science Techniques You Should Know!

Save your precious time by using these hacks


Data scientists are in high demand. The job is not easy, so it helps to know a few data science hacks that save time and simplify routine work. In this post, I cover 23 data science hacks that I have used.


Here are some essential data science techniques that every data scientist should know:

  1. Data Exploration and Visualization: Understanding your data is critical for building effective models. This includes exploring the distribution of variables, identifying outliers and missing values, and creating visualizations to help identify patterns and relationships in the data.
  2. Data Preprocessing: Preprocessing is an important step in preparing your data for analysis and modeling. This may include imputing missing values, transforming variables, and scaling data to ensure all variables are on a similar scale.
  3. Feature Engineering: Feature engineering involves creating new variables or transforming existing variables to improve the performance of a model. This can include creating interactions between variables, binning variables, or extracting features from text data.
  4. Model Selection: There are many algorithms to choose from when building a model, and selecting the best algorithm depends on the specific problem you are trying to solve and the properties of your data. Techniques such as cross-validation can be used to compare the performance of different algorithms and select the best one.
  5. Model Evaluation: Once you have built a model, it’s important to evaluate its performance to ensure it generalizes well to new data. Common metrics for evaluating the performance of a model include accuracy, precision, recall, and F1 score.
  6. Hyperparameter tuning: Many machine learning algorithms have hyperparameters that control their behavior. Fine-tuning these hyperparameters can significantly improve the performance of a model. Common techniques for hyperparameter tuning include grid search and random search.
  7. Ensemble Methods: Ensemble methods are techniques that combine the predictions of multiple models to produce a single, more accurate prediction. Common ensemble methods include bagging, random forests, and boosting.

1. Image Augmentation:

Image Augmentation is a very powerful technique that is used to create new and different images from the existing images. It is used to address issues associated with limited data in machine learning.

Import all the necessary libraries :

# importing all the required libraries
%matplotlib inline
import skimage.io as io
from skimage.transform import rotate
import numpy as np
import matplotlib.pyplot as plt

Read Image :

img= io.imread('/Users/priyeshkucchu/Desktop/image.jpeg')

Define Augment function :

def augment_img(img):
    fig,ax = plt.subplots(nrows=1,ncols=5,figsize=(22,12))
    ax[0].imshow(img)
    ax[0].axis('off')
    ax[1].imshow(rotate(img, angle=45, mode = 'wrap'))
    ax[1].axis('off')
    ax[2].imshow(np.fliplr(img))
    ax[2].axis('off')
    ax[3].imshow(np.flipud(img))
    ax[3].axis('off')
    ax[4].imshow(np.rot90(img))
    ax[4].axis('off')       
augment_img(img)
Output: Augmented Image
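
The flips and the 90-degree rotation used above are plain NumPy array operations, so their effect is easy to check on a tiny array. A minimal sketch, assuming a synthetic 2x3 array (`toy_img`, a made-up stand-in for a real image):

```python
import numpy as np

# Hypothetical 2x3 "image": small enough to inspect every pixel by eye
toy_img = np.arange(6).reshape(2, 3)

flipped_lr = np.fliplr(toy_img)   # mirror left-to-right
flipped_ud = np.flipud(toy_img)   # mirror top-to-bottom
rotated_90 = np.rot90(toy_img)    # rotate 90 degrees counter-clockwise

print(toy_img)
print(flipped_lr)
print(flipped_ud)
print(rotated_90)
```

Note that `np.rot90` swaps the height and width of the array, while the two flips keep the shape unchanged.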

2. Pandas Boolean Indexing

Boolean indexing selects subsets of a DataFrame based on the actual values in the data, using a boolean vector (a mask) to filter the rows.

Import necessary libraries

import pandas as pd

Load Data

ytdata= pd.read_csv('/Users/priyeshkucchu/Desktop/USvideos.csv')

Boolean Indexing: show only those rows where category_id is 24 and the number of likes is greater than 12000

ytdata.loc[(ytdata['category_id'] == 24) & (ytdata['likes'] > 12000),
           ["category_id", "likes"]].head()
Output
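
To see the mask itself, here is a minimal sketch on a small, made-up DataFrame (`videos` and its values are assumptions standing in for the YouTube data):

```python
import pandas as pd

# Hypothetical stand-in for the YouTube data
videos = pd.DataFrame({
    "category_id": [24, 24, 10, 24],
    "likes": [15000, 8000, 20000, 13000],
})

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds tighter than == and >.
mask = (videos["category_id"] == 24) & (videos["likes"] > 12000)

selected = videos.loc[mask, ["category_id", "likes"]]
print(mask.tolist())
print(selected)
```

Only the rows where the mask is True survive, and the original index labels are kept.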

3. Pandas Pivot Table

In Pandas, the pivot table function takes a data frame as input and performs grouped operations that provide a multidimensional summarization of the data.

Import necessary libraries :

import pandas as pd
import numpy as np

Load Data :

loan = pd.read_csv('/Users/priyeshkucchu/Desktop/loan_train.csv', index_col='Loan_ID')

Show data :

loan.head()
Loan Data

Pivot Table :

pivot = loan.pivot_table(values=['LoanAmount'],
                         index=['Gender', 'Married', 'Dependents', 'Self_Employed'],
                         aggfunc=np.median)
Output pivot table
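
The grouping behind a pivot table is easier to follow on a few rows. A minimal sketch, assuming a tiny made-up loan table (`loans` and its values are hypothetical). It passes the aggregation as the string `"median"`, which recent pandas versions may prefer over `np.median`:

```python
import pandas as pd

# Hypothetical loan records
loans = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Female", "Male"],
    "Married": ["Yes", "Yes", "No", "No", "No"],
    "LoanAmount": [100, 200, 150, 50, 120],
})

# One row per (Gender, Married) combination, holding the median loan amount
pivot = loans.pivot_table(values="LoanAmount",
                          index=["Gender", "Married"],
                          aggfunc="median")
print(pivot)
```

Combinations that never occur in the data (here, married women) simply do not appear in the result.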

4. Pandas Apply

In Pandas, the .apply() function applies a user-defined function along an axis of a DataFrame: to each column (axis=0) or to each row (axis=1).

Import necessary libraries

import pandas as pd

Load Data

ytdata= pd.read_csv('/Users/priyeshkucchu/Desktop/USvideos.csv')

Define a function that counts missing values:

def missing_values(x):
    return sum(x.isnull())

For missing values in the columns —

print(" Missing values in each column :")
ytdata.apply(missing_values,axis=0)

Output —

Missing values in each column :
video_id                    0
trending_date               0
title                       0
channel_title               0
category_id                 0
publish_time                0
tags                        0
views                       0
likes                       0
dislikes                    0
comment_count               0
thumbnail_link              0
comments_disabled           0
ratings_disabled            0
video_error_or_removed      0
description               502
dtype: int64

For missing values in the rows —

print(" Missing values in each row :")
ytdata.apply(missing_values,axis=1).head()

Output —

Missing values in each row :
0    0
1    0
2    0
3    0
4    0
dtype: int64
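
The difference between the two axes is easiest to see on a frame small enough to check by hand. A minimal sketch, assuming a made-up 3x2 frame (`small`), which also compares the result with the built-in `isnull().sum()`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with three missing cells
small = pd.DataFrame({"a": [1, np.nan, 3], "b": [np.nan, np.nan, 6]})

def missing_values(x):
    # x is one column (axis=0) or one row (axis=1), passed as a Series
    return x.isnull().sum()

per_column = small.apply(missing_values, axis=0)
per_row = small.apply(missing_values, axis=1)
builtin = small.isnull().sum()   # same per-column counts without apply

print(per_column)
print(per_row)
```

For a simple count like this the built-in is usually the faster choice; `apply` earns its place when the per-row or per-column logic has no vectorized equivalent.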

5. Pandas Count

In pandas, the count function counts the non-NA cells in each column or row.

Import necessary libraries :

import pandas as pd

Load Data :

ytdata= pd.read_csv('/Users/priyeshkucchu/Desktop/USvideos.csv')

Count the number of non-null data points in each column :

ytdata.count(axis=0)
Output — Count of data points in each column

Count the null and non-null data points in the description column :

ytdata.description.isnull().value_counts()
Output — No of null data points in the description column
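
Since `count` skips missing values, it can differ from the number of rows. A small sketch on made-up data (the frame `clips` is an assumption):

```python
import pandas as pd

# Hypothetical frame: two of the three descriptions are missing
clips = pd.DataFrame({
    "title": ["a", "b", "c"],
    "description": ["x", None, None],
})

non_null = clips.count()                        # non-NA cells per column
null_count = clips["description"].isnull().sum()  # missing cells in one column
n_rows = len(clips)                             # total rows, missing or not

print(non_null)
print(null_count, n_rows)
```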

6. Pandas Crosstab

In Pandas, this function is used to compute a simple cross-tabulation of two or more factors.

Import necessary libraries :

import pandas as pd

Load Data :

data = pd.read_csv('/Users/priyeshkucchu/Desktop/loan_train.csv', index_col='Loan_ID')

Cross tab between Credit History and Self Employed columns in the loan data :

pd.crosstab(data["Credit_History"],data["Self_Employed"],\
margins=True, normalize = False)
Output
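
The `margins` and `normalize` arguments are easier to read on a handful of rows. A minimal sketch, assuming a made-up five-row table (`applicants`):

```python
import pandas as pd

# Hypothetical applicants
applicants = pd.DataFrame({
    "Credit_History": [1, 1, 0, 1, 0],
    "Self_Employed": ["No", "Yes", "No", "No", "No"],
})

# Raw counts, with an extra "All" row and column holding the totals
counts = pd.crosstab(applicants["Credit_History"],
                     applicants["Self_Employed"], margins=True)

# Row-wise shares: each row sums to 1
shares = pd.crosstab(applicants["Credit_History"],
                     applicants["Self_Employed"], normalize="index")

print(counts)
print(shares)
```

`normalize` also accepts `"columns"` and `"all"` when the shares should be taken over columns or over the whole table.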

7. Pandas str.split

In Pandas, the str.split function splits each string in a Series around a given separator or delimiter.

Import necessary libraries :

import pandas as pd

Create a Data Frame :

df = pd.DataFrame({'Person_name':['Naina Chaturvedi', 'Alvaro Morte', 'Alex Pina', 'Steve Jobs']})
df
Data Frame df

Extract First and Last Names:

df['first_name'] = df['Person_name'].str.split(' ',expand = True)[0]
df['last_name'] = df['Person_name'].str.split(' ', expand = True)[1]
df
Output: Extract First and Last Name using pandas str.split()
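
Taking columns 0 and 1 works when every name has exactly two parts. For longer names, one option is to limit the number of splits with `n=1`, so everything after the first space lands in the second column. A sketch with a made-up example (`people` is an assumption):

```python
import pandas as pd

# Hypothetical names, one of them with more than two parts
people = pd.DataFrame({"Person_name": ["Naina Chaturvedi", "Jean Claude Van Damme"]})

# n=1 splits only at the first space
parts = people["Person_name"].str.split(" ", n=1, expand=True)
people["first_name"] = parts[0]
people["last_name"] = parts[1]

print(people)
```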

8. Extract E-mail from text

Import the necessary libraries and initialize the text :

import re
Enquiries_text = ('For any enquiries or feedback related to our product, '
                  'service, marketing promotions or other general support '
                  'matters. [email protected]')

Extract email using Regular Expression :

re.findall(r"([\w.-]+@[\w.-]+)", Enquiries_text)
Output — Extract Email from Text
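
One thing to watch with this pattern: because `.` is allowed in the domain part, a full stop that ends the sentence gets captured too. A sketch using made-up addresses on the reserved `example.com` and `example.org` domains, with a slightly stricter pattern offered as one possible alternative rather than a complete email validator:

```python
import re

# Hypothetical text; the addresses are placeholders
text = "For support, write to support@example.com or sales.team@example.org."

# Pattern from above: the trailing full stop is swallowed
loose = re.findall(r"[\w.-]+@[\w.-]+", text)

# Stricter sketch: the domain must be dot-separated labels, so it cannot end in "."
strict = re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", text)

print(loose)
print(strict)
```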

9. Pandas melt

In pandas, the melt function reshapes a data frame from wide form to long form.

Import necessary libraries :

import pandas as pd

Create a Data Frame :

df = pd.DataFrame({'Person_Name': {0: 'Naina', 1: 'Alex', 2: 'Avarto'},
                   'CourseName': {0: 'Masters', 1: 'Graduate', 2: 'Graduate'},
                   'Age': {0: 27, 1: 20, 2: 22}})

Melt the data frame in two ways :

m1= pd.melt(df, id_vars =['Person_Name'], value_vars =['CourseName', 'Age'])
m1
Output — m1 Dataframe
m2= pd.melt(df, id_vars =['Person_Name'], value_vars =['Age'])
m2
Output — m2 Dataframe
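
A melted frame has one row per (id, variable) pair, and `pivot` can take it back to wide form. A minimal sketch on a two-person version of the data (`wide` is a shortened, assumed copy of the frame above):

```python
import pandas as pd

wide = pd.DataFrame({
    "Person_Name": ["Naina", "Alex"],
    "CourseName": ["Masters", "Graduate"],
    "Age": [27, 20],
})

# 2 people x 2 melted columns -> 4 rows with columns Person_Name, variable, value
long_df = pd.melt(wide, id_vars=["Person_Name"], value_vars=["CourseName", "Age"])

# Reverse the reshape: one column per distinct entry in "variable"
wide_again = long_df.pivot(index="Person_Name", columns="variable", values="value")

print(long_df)
print(wide_again)
```

Because the `value` column mixes strings and numbers, it is stored with a generic object dtype. That is worth keeping in mind before doing arithmetic on melted data.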

10. Extract Continuous and categorical data

Import necessary libraries :

import pandas as pd

Load Data :

Loan_data = pd.read_csv('/Users/priyeshkucchu/Desktop/loan_train.csv')
Loan_data.shape

Output: (614, 13)

Check data types of columns :

Loan_data.dtypes
Output — Data Types of columns in Loan Data

Extract columns containing only categorical data:

categorical_variables = Loan_data.select_dtypes("object")
categorical_variables.head()

Extract columns containing only integer data:

integer_variables = Loan_data.select_dtypes("int64")
integer_variables.head()

Extract columns containing only numerical data:

numeric_variables = Loan_data.select_dtypes("number")
numeric_variables.head()
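
`select_dtypes` also takes an `exclude` argument, which is handy for grabbing everything that is not numeric. A small sketch on made-up columns (`sample_loans` and its values are assumptions):

```python
import pandas as pd

# Hypothetical loan columns of three different dtypes
sample_loans = pd.DataFrame({
    "Gender": ["Male", "Female"],
    "ApplicantIncome": [5849, 4583],   # integers
    "LoanAmount": [128.0, 66.0],       # floats
})

numeric = sample_loans.select_dtypes(include="number")      # ints and floats
integers = sample_loans.select_dtypes(include="int64")      # ints only
non_numeric = sample_loans.select_dtypes(exclude="number")  # everything else

print(list(numeric.columns))
print(list(integers.columns))
print(list(non_numeric.columns))
```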

11. Pandas Eval function for efficient operations

The eval() function in Pandas uses string expressions to efficiently compute operations using a Data Frame.

Import necessary libraries :

import pandas as pd
import numpy as np

Initialize no_rows, no_cols:

no_rows, no_cols = 100000, 100
r = np.random.RandomState(50)
df1, df2, df3, df4 = (pd.DataFrame(r.rand(no_rows, no_cols))
                      for i in range(4))

Without Eval function

%timeit df1 + df2 * df3 - df4
Output without Eval function

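With Eval function: for large DataFrames, the eval() version of this expression can be noticeably faster (about 50% in this run) and use much less memory, provided the optional numexpr library is installed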

%timeit pd.eval('df1 + df2 * df3 - df4')
Output with Eval function
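
Before relying on the speed-up, it is worth confirming that both forms give the same numbers. A minimal sketch on smaller frames (the sizes are assumptions chosen to keep it quick), passing the frames through `local_dict` so the expression does not depend on the calling scope:

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(50)
df1, df2, df3, df4 = (pd.DataFrame(rng.rand(1000, 10)) for _ in range(4))

# Ordinary pandas arithmetic
plain = df1 + df2 * df3 - df4

# Same expression as a string; names are looked up in local_dict
evaluated = pd.eval(
    "df1 + df2 * df3 - df4",
    local_dict={"df1": df1, "df2": df2, "df3": df3, "df4": df4},
)

same = np.allclose(plain.to_numpy(), evaluated.to_numpy())
print(same)
```

On frames this small, the overhead of parsing the string can outweigh any gain, so timing is best done on data close to the real size.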

12. Pandas Unique

In pandas, the unique function returns the distinct values of a Series in order of appearance.

Import necessary libraries :

import pandas as pd
import numpy as np

Load Data :

crime_data = pd.read_csv("/Users/priyeshkucchu/Desktop/crime.csv", engine='python')

Show data :

crime_data.head()

Show Unique values in the District Codes Column:

crime_data["DISTRICT"].unique()
Output — Unique values in the District codes column
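
Two details are easy to miss: `unique` does not sort its result, and it reports a missing value as one of the distinct values, whereas `nunique` skips it by default. A sketch on made-up district codes (`districts` is an assumption):

```python
import pandas as pd

# Hypothetical district codes with repeats and one missing value
districts = pd.Series(["D14", "C11", "D14", None, "B2", "C11"])

unique_values = districts.unique()   # order of first appearance, missing value included
n_unique = districts.nunique()       # missing value not counted

print(unique_values)
print(n_unique)
```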

13. IPython Interactive Shell

Import necessary libraries :

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
import pandas as pd

Load Data :

data = pd.read_csv('/Users/priyeshkucchu/Desktop/loan_train.csv')

Display the output of every expression in a cell, not just the last one:

data.shape
data.head()
data.dtypes
data.info()
Output

14. Pandas Merge

In pandas, the merge function is used to join two datasets together based on common columns between them.

Import necessary libraries :

import pandas as pd

Initialize Data Frames :

df1 = pd.DataFrame({'Left_key': ['Naina', 'Avarto', 'Alex', 'Naina'], 'value': [1, 2, 3, 5]})
df2 = pd.DataFrame({'Right_key': ['Naina', 'Avarto', 'Alex', 'Naina'], 'value': [5, 6, 7, 8]})
DataFrames df1 and df2

Merge the data frames :

df1.merge(df2, left_on='Left_key', right_on='Right_key', \
suffixes=('_Left', '_Right'))
Output — Merge the data frames
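
Because 'Naina' appears twice in each frame, the merge pairs every left match with every right match, so the result has more rows than either input. A sketch that reruns the merge above and counts the rows:

```python
import pandas as pd

df1 = pd.DataFrame({"Left_key": ["Naina", "Avarto", "Alex", "Naina"], "value": [1, 2, 3, 5]})
df2 = pd.DataFrame({"Right_key": ["Naina", "Avarto", "Alex", "Naina"], "value": [5, 6, 7, 8]})

merged = df1.merge(df2, left_on="Left_key", right_on="Right_key",
                   suffixes=("_Left", "_Right"))

# 2 x 2 = 4 rows for 'Naina', plus 1 each for 'Avarto' and 'Alex'
naina_rows = int((merged["Left_key"] == "Naina").sum())
print(len(merged), naina_rows)
print(list(merged.columns))
```

The suffixes apply only to the overlapping `value` column; the two key columns keep their own names.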

15. Parse dates in read_csv() to change data type to DateTime

Import necessary libraries :

import pandas as pd

Load Data and print the data types of crime data columns:

crime_data = pd.read_csv("/Users/priyeshkucchu/Desktop/crime.csv", engine='python')
crime_data.dtypes

Parse Dates in read_csv():

crime_data = pd.read_csv("/Users/priyeshkucchu/Desktop/crime.csv", engine='python',parse_dates = ["OCCURRED_ON_DATE"])
crime_data.dtypes
Output — Parse dates in read_csv for column OCCURRED_ON_DATE
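
The effect can be reproduced without the file by reading a CSV from a string. A minimal sketch, assuming a two-row sample in which `INCIDENT_NUMBER` and the values are made up for illustration:

```python
import io
import pandas as pd

# Hypothetical two-row CSV held in memory
csv_text = (
    "INCIDENT_NUMBER,OCCURRED_ON_DATE\n"
    "I1,2018-09-02 13:00:00\n"
    "I2,2018-08-21 00:00:00\n"
)

raw = pd.read_csv(io.StringIO(csv_text))   # dates stay as text
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["OCCURRED_ON_DATE"])

print(raw.dtypes)
print(parsed.dtypes)

# With a datetime column, the .dt accessor becomes available
months = parsed["OCCURRED_ON_DATE"].dt.month.tolist()
print(months)
```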

16. Date Parser

Import necessary libraries :

import datetime
import dateutil.parser

Parse Dates:

input_date = '04th Dec 2020'
parsed_date = dateutil.parser.parse(input_date)

Output date in the designated format :

op_date = datetime.datetime.strftime(parsed_date, '%Y-%m-%d')
print(op_date)
Output Date
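
Purely numeric dates are ambiguous, and by default the parser reads the first number as the month. A short sketch with a made-up input string, showing the `dayfirst` option:

```python
import dateutil.parser

ambiguous = "04/12/2020"   # hypothetical input: 4 December or 12 April?

month_first = dateutil.parser.parse(ambiguous)               # default: month first
day_first = dateutil.parser.parse(ambiguous, dayfirst=True)  # read as day/month/year

print(month_first.strftime("%Y-%m-%d"))
print(day_first.strftime("%Y-%m-%d"))
```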

17. Invert a Dictionary

Create a dictionary :

l_dict = {'Person_Name':'Naina',
           'Age' : 27,
           'Profession' : 'Software Engineer'
           }

Invert dictionary :

invert_dict = {values:keys for keys,values in l_dict.items()}
invert_dict
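
The comprehension assumes every value is unique and hashable. When two keys share a value, the later key silently overwrites the earlier one. One way around that, sketched here with a made-up dictionary (`ages`), is to collect the keys for each value in a list:

```python
from collections import defaultdict

# Hypothetical data: two people share the same age
ages = {"Naina": 27, "Alex": 20, "Avarto": 27}

# Plain inversion: 'Naina' is lost because 27 can map to only one key
naive = {value: key for key, value in ages.items()}

# Grouped inversion: each value maps to all the keys that had it
grouped = defaultdict(list)
for key, value in ages.items():
    grouped[value].append(key)
grouped = dict(grouped)

print(naive)
print(grouped)
```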

18. Pretty Dictionaries

Create a dictionary :

l_dict = {'Student_ID': 4,'Student_name' : 'Naina', 'Class_Name': '12th' ,'Student_marks' : {'maths' : 92,
                            'science' : 95,
                            'computer science' : 100,
                            'English' : 91}
          }

Pretty dictionary using pprint:

import pprint
pprint.pprint(l_dict)
Pretty Dictionary

19. Flatten a List of Lists

Import necessary libraries:

import itertools

Create a nested list :

nested_list = [['Naina'], ['Alex', 'Rhody'], ['Sharron', 'Avarto', 'Grace']]
nested_list

Flatten the nested list :

converted_list = list(itertools.chain.from_iterable(nested_list))
print(converted_list)
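
A nested list comprehension gives the same result without an import. Both approaches flatten exactly one level, as this sketch shows (the `deeper` list is an assumption added to illustrate the limit):

```python
import itertools

nested_list = [["Naina"], ["Alex", "Rhody"], ["Sharron", "Avarto", "Grace"]]

flat_chain = list(itertools.chain.from_iterable(nested_list))
flat_comprehension = [name for group in nested_list for name in group]

# Only one level is removed: inner lists below that level stay as lists
deeper = [["Naina", ["Alex"]], ["Rhody"]]
one_level = list(itertools.chain.from_iterable(deeper))

print(flat_chain)
print(flat_comprehension)
print(one_level)
```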

20. Removing Emojis from Text

Emoji_text = 'For example, 🤓🏃‍🏢 could mean “I am running to work.”'
final_text=Emoji_text.encode('ascii', 'ignore').decode('ascii')
print("Raw tweet with Emoji:",Emoji_text)  
print("Final tweet without Emoji:",final_text)
Output — Remove Emojis from Text
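
Encoding to ASCII drops every non-ASCII character, not just emojis, so accented letters and curly quotes disappear as well. If that matters, one alternative is a regular expression limited to emoji code points. The sketch below uses an assumed, non-exhaustive selection of Unicode ranges and a made-up sample string:

```python
import re

# Assumed, non-exhaustive emoji ranges; extend as needed for your data
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\u200d"                 # zero-width joiner used inside emoji sequences
    "\ufe0f"                 # variation selector that requests emoji rendering
    "]+"
)

def strip_emojis(text):
    """Remove characters matched by EMOJI_PATTERN and keep everything else."""
    return EMOJI_PATTERN.sub("", text)

sample = "Café ☕ opens at 9 🤓🏃"   # hypothetical text with an accented letter

ascii_only = sample.encode("ascii", "ignore").decode("ascii")
regex_clean = strip_emojis(sample)

print(ascii_only)    # the accented letter is gone too
print(regex_clean)   # the accented letter survives
```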

21. Apply Pandas Operations in Parallel

The pandarallel library distributes your pandas computations over all available CPUs on your computer, which can significantly speed up slow row-wise operations.

Install pandarallel :

!pip install pandarallel

Import necessary libraries:

%load_ext autoreload
%autoreload 2
import pandas as pd
import time
from pandarallel import pandarallel
import math
import numpy as np
import random
from tqdm.notebook import tqdm_notebook  # public module; tqdm._tqdm_notebook is deprecated
tqdm_notebook.pandas()

Initialize pandarallel :

pandarallel.initialize(progress_bar=True)

Dataframe:

df = pd.DataFrame({
    'A' : [random.randint(8,15) for i in range(1,100000) ],
    'B' : [random.randint(10,20) for i in range(1,100000) ]
})

Trigono function:

def trigono(x):
    return math.sin(x.A**2) + math.sin(x.B**2) + math.tan(x.A**2)

Without parallelization:

%%time
first = df.progress_apply(trigono, axis=1)

With parallelization:

%%time
first_parallel = df.parallel_apply(trigono, axis=1)
Output: Apply pandas operations in parallel
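
Parallelizing `apply` is one way to speed up this computation. When the function is built from element-wise math, NumPy's vectorized functions are another option that needs no extra library. A sketch under that assumption, using a smaller frame generated with `np.random.default_rng` (a substitution for the `random.randint` calls above):

```python
import math
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "A": rng.integers(8, 16, size=1000),    # 8..15 inclusive
    "B": rng.integers(10, 21, size=1000),   # 10..20 inclusive
})

def trigono(x):
    return math.sin(x.A**2) + math.sin(x.B**2) + math.tan(x.A**2)

# Row-by-row: calls the Python function once per row
row_wise = df.apply(trigono, axis=1)

# Vectorized: NumPy evaluates each term on whole columns at once
vectorized = np.sin(df["A"]**2) + np.sin(df["B"]**2) + np.tan(df["A"]**2)

matches = np.allclose(row_wise, vectorized)
print(matches)
```

pandarallel remains useful when the row function cannot be expressed with column-wise operations.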

22. Pandas Cut and qcut

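In Pandas,

the cut function bins values by value range: with an integer bins argument it creates equal-width bins (you can also pass explicit bin edges, as in the example below), so the number of samples in each bin is usually unequal.

the qcut function bins values by quantile: the bins have unequal widths, but each bin holds roughly the same number of samples.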

Import necessary Libraries:

import pandas as pd
import numpy as np

Dataframe:

df_rollno = pd.DataFrame({'Roll No': np.random.randint(20, 55, 10)})
df_rollno

Using Pandas cut function :

df_rollno['roll_no_bins'] = pd.cut(x=df_rollno['Roll No'], bins=[20, 40, 50, 60], include_lowest=True)
Output

Using Pandas qcut function:

pd.qcut(df_rollno['Roll No'], q=6)
Output
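
The contrast shows up clearly on skewed data. A minimal sketch, assuming a made-up series with one large outlier, which also shows why a value equal to the lowest bin edge needs `include_lowest=True`:

```python
import pandas as pd

# Hypothetical skewed data: nine small values and one outlier
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

equal_width = pd.cut(values, bins=3)   # three bins of equal width
equal_freq = pd.qcut(values, q=2)      # two bins split at the median

width_counts = equal_width.value_counts(sort=False)
freq_counts = equal_freq.value_counts(sort=False)
print(width_counts)
print(freq_counts)

# Bins are open on the left by default, so a value equal to the first edge is left out
edge_default = pd.cut(pd.Series([20, 30]), bins=[20, 40])
edge_included = pd.cut(pd.Series([20, 30]), bins=[20, 40], include_lowest=True)
print(edge_default.isna().sum(), edge_included.isna().sum())
```

With equal-width bins almost all the samples crowd into the first bin, while the quantile bins split them evenly.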

23. Pandas Profiling

The pandas-profiling library (since renamed ydata-profiling) generates profile reports from a pandas DataFrame.

Install Pandas Profiling:

pip install pandas-profiling

Import necessary libraries:

import pandas as pd
import pandas_profiling

Load Data:

Youtube_data = pd.read_csv('/Users/priyeshkucchu/Desktop/USvideos.csv')

Generate Profiling report:

profiling_report = pandas_profiling.ProfileReport(Youtube_data)
Profiling report — Overview
Profiling report — Interactions
Profiling report — Correlations

Thanks for Reading. Keep Learning :)