Vincent Schröder


Abstract

...categorical variables. If the number of categories is small compared to the total number of values, it is better to use the category data type instead of object. Depending on the data size, this can save a great amount of memory.

The following code goes over the columns with the object data type. If the number of categories is less than 5 percent of the total number of values, the data type of the column is changed to category.

cols = marketing.select_dtypes(include='object').columns
for col in cols:
    ratio = len(marketing[col].value_counts()) / len(marketing)
    if ratio < 0.05:
        marketing[col] = marketing[col].astype('category')

We have done three steps of data cleaning and manipulation. Depending on the task, there might be more steps.

Let's create a pipe that accomplishes all these tasks.

The pipe function takes functions as inputs. These functions need to take a dataframe as input and return a dataframe. Thus, we need to define a function for each task.

def drop_missing(df):
    # keep only the columns that have at least 60 percent non-missing values
    thresh = len(df) * 0.6
    df.dropna(axis=1, thresh=thresh, inplace=True)
    return df

def remove_outliers(df, column_name):
    # assumes numpy is imported as np
    low = np.quantile(df[column_name], 0.05)
    high = np.quantile(df[column_name], 0.95)
    # between() includes both bounds by default
    return df[df[column_name].between(low, high)]

def to_category(df):
    cols = df.select_dtypes(include='object').columns
    for col in cols:
        ratio = len(df[col].value_counts()) / len(df)
        if ratio < 0.05:
            df[col] = df[col].astype('category')
    return df

You may ask what the point is if we need to define functions anyway. It does not seem to simplify the workflow. That is true for a one-off task, but we need to think more generally. Suppose you are doing the same operations many times. In that case, creating a pipe makes the process easier and also gives you cleaner code.

We have mentioned that the pipe function takes a function as input. If the function we pass to the pipe function takes additional arguments, we can pass them to the pipe function along with the function. This makes the pipe function even more versatile.

For instance, the remove_outliers function takes a column name as an argument. The function removes the outliers in that column.

We can now create our pipe.

marketing_cleaned = (marketing.
                     pipe(drop_missing).
                     pipe(remove_outliers, 'Salary').
                     pipe(to_category))

It looks neat and clean. We can add as many steps as needed. The only criterion is that the functions in the pipe take a dataframe as an argument and return a dataframe. Just like with the remove_outliers function, we can pass the arguments of a function to the pipe function as additional arguments. This flexibility makes pipes more useful.

One important thing to mention is that this pipe modifies the original dataframe, because drop_missing drops columns in place. We should avoid changing the original dataset if possible.

To overcome this issue, we can use a copy of the original dataframe in the pipe. Furthermore, we can add a step at the beginning of the pipe that makes a copy of the dataframe.

def copy_df(df):
    return df.copy()

marketing_cleaned = (marketing.
                     pipe(copy_df).
                     pipe(drop_missing).
                     pipe(remove_outliers, 'Salary').
                     pipe(to_category))

Our pipeline is complete now. Let's compare the original dataframe with the cleaned one to confirm it is working.

marketing.shape
(1000, 10)

marketing.dtypes
Age            object
Gender         object
OwnHome        object
Married        object
Location       object
Salary          int64
Children        int64
History        object
Catalogs        int64
AmountSpent     int64

marketing_cleaned.shape
(900, 10)

marketing_cleaned.dtypes
Age            category
Gender         category
OwnHome        category
Married        category
Location       category
Salary            int64
Children          int64
History        category
Catalogs          int64
AmountSpent       int64

The pipeline is working as expected.

Conclusion

Pipes provide a cleaner and more maintainable syntax for data analysis. Another advantage is that they automate the steps of data cleaning and manipulation.

If you are doing the same operations over and over, you should definitely consider creating a pipeline.

Thank you for reading. Please let me know if you have any feedback.

Asynchronously loaded JavaScript: how to do it with promises

JavaScript: How to execute code from an asynchronously loaded script even when it has not finished loading yet (Promises)

Sometimes you have to fire an action or execute code before a script has completely loaded in the browser (for example, an asynchronously loaded external script). To prevent these actions from causing an error or getting lost entirely, you can use a promise that resolves when the script is loaded. You can then fire the actions inside the promise's handlers and be sure that everything gets executed in the correct order once the script is loaded.

Concept code (copy and paste into the console)

var x = new Promise(function (resolve, reject) {
    setTimeout(resolve, 3000);
});
x.then(function () {
    console.log('1');
});
x.then(function () {
    console.log('2');
});
x.then(function () {
    console.log('3');
});

This should log 1, 2 and 3 after three seconds, once the promise is resolved. The handlers run in the order in which they were attached. That’s the basic concept. Now let’s bring this to a real-world scenario.
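One detail is worth checking before moving on: a handler attached after the promise has already resolved still runs. That is what makes it safe to call `then` at any time, whether the script is still loading or has long finished. The sketch below illustrates this; the `wait` helper, the `lateHandlerDemo` function and the `log` array are illustrative names and not part of the article's code.

```javascript
// Hypothetical helper: a promise that resolves after `ms` milliseconds.
function wait(ms) {
  return new Promise(function (resolve) {
    setTimeout(resolve, ms);
  });
}

// Attaches one handler before and one after the promise resolves
// and records the order in which the handlers run.
function lateHandlerDemo() {
  var log = [];
  var ready = wait(50);

  // Attached while the promise is still pending.
  ready.then(function () {
    log.push('early');
  });

  // Attached well after `ready` has resolved; the handler still runs.
  return wait(100).then(function () {
    return ready.then(function () {
      log.push('late');
      return log;
    });
  });
}

lateHandlerDemo().then(function (log) {
  console.log(log.join(','));
});
```

Both handlers run, the early one first, so the console shows early,late.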

Real world example: loadScriptAsync()

We instantiate a new promise and, inside it, create a new script tag. We bind the promise's resolver to the tag's onload event, which fires once the script is completely loaded, so the promise resolves at exactly that point.

The promise is saved in the “scriptLoaded” variable.

<script>
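var loadScriptAsync = function(uri){
  return new Promise((resolve, reject) => {
    // create the script tag for the external script
    var tag = document.createElement('script');
    tag.src = uri;
    tag.async = true;
    // resolve the promise once the script is completely loaded
    tag.onload = () => {
      resolve();
    };
    // insert the new tag before the first script tag of the page
    var firstScriptTag = document.getElementsByTagName('script')[0];
    firstScriptTag.parentNode.insertBefore(tag, firstScriptTag);
  });
};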
var scriptLoaded = loadScriptAsync('external-script.js');
</script>

Firing the event via the scriptLoaded promise

Now, whenever you want to use methods from the external script, you can use the promise like this:

<script>
scriptLoaded.then(function(){
  window.extvar.execute('test');
});
</script>
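If many places in your code call into the external script, the `then` wrapper can be hidden behind a small helper. The following is only a sketch under my own assumptions: `callWhenLoaded` is a hypothetical name, and it assumes the external script exposes its methods on one object, such as the `window.extvar` used in this article. The object is looked up through a function because it does not exist until the script has loaded.

```javascript
// Hypothetical helper: returns a function that waits for `loaded`
// (a promise such as scriptLoaded) before calling a method on the
// object returned by `getApi`.
function callWhenLoaded(loaded, getApi) {
  return function (method) {
    // Everything after the method name is passed on as arguments.
    var args = Array.prototype.slice.call(arguments, 1);
    return loaded.then(function () {
      var api = getApi();
      return api[method].apply(api, args);
    });
  };
}

// Usage in the page (assumes scriptLoaded and window.extvar from this article):
// var ext = callWhenLoaded(scriptLoaded, function () { return window.extvar; });
// ext('execute', 'test');
// ext('execute', 'another test');
```

Each call returns a promise, so callers can still chain further work or attach their own error handling.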

Copy & paste script

Try it! Create an index.html and paste in this code.

<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Test</title>
<script>
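var loadScriptAsync = function(uri){
  return new Promise((resolve, reject) => {
    // create the script tag for the external script
    var tag = document.createElement('script');
    tag.src = uri;
    tag.async = true;
    // resolve the promise once the script is completely loaded
    tag.onload = () => {
      resolve();
    };
    // insert the new tag before the first script tag of the page
    var firstScriptTag = document.getElementsByTagName('script')[0];
    firstScriptTag.parentNode.insertBefore(tag, firstScriptTag);
  });
};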
var scriptLoaded = loadScriptAsync('external-script.js');
</script>
<script>
scriptLoaded.then(function(){
  window.extvar.execute('test');
});
</script>
</head>
<body>
</body>
</html>

Example external script: external-script.js

Use this as the external script if you don’t already have one.

window.extvar = {};
window.extvar.execute = function(a){ console.log('external event: ' + a); }
console.log('external script loaded');

Of course, this script does not handle errors or reject the promise when the external script fails to load. I just want to make the concept clear.
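For completeness, here is a minimal sketch of how the loader could reject on failure: it binds the script tag's `onerror` event to the promise's `reject`, in the same way `onload` is bound to `resolve`. The name `loadScriptWithErrors` and the error message are my own choices for illustration; everything else follows `loadScriptAsync` from above.

```javascript
// Variant of loadScriptAsync (hypothetical name) that rejects the promise
// when the browser fails to load the script.
var loadScriptWithErrors = function (uri) {
  return new Promise(function (resolve, reject) {
    var tag = document.createElement('script');
    tag.src = uri;
    tag.async = true;
    tag.onload = function () {
      resolve();
    };
    // onerror fires when the script cannot be fetched,
    // for example because of a 404 or a network error.
    tag.onerror = function () {
      reject(new Error('Failed to load script: ' + uri));
    };
    var firstScriptTag = document.getElementsByTagName('script')[0];
    firstScriptTag.parentNode.insertBefore(tag, firstScriptTag);
  });
};

// Usage in the page (assumes window.extvar from this article):
// loadScriptWithErrors('external-script.js')
//   .then(function () { window.extvar.execute('test'); })
//   .catch(function (err) { console.error(err.message); });
```

With this variant, actions queued via `then` are skipped when loading fails, and a single `catch` can decide what to do instead.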

New to promises?

Have a look at this nice introduction:

You can find more about the basic principles behind promises here:

https://github.com/mattdesl/promise-cookbook#the-problem

Happy async coding!
