avatarMelissa Gouty

Here’s an Alternative For Quick Success: Go Slow

“If you really look closely, most overnight successes took a long time.”

Photo by LOGAN WEAVER on Unsplash

It’s not about speed.

I sometimes feel discouraged by the number of articles I read about bloggers who got 1000 followers in 30 days. Or how they made thousands of dollars in the first four months. Or what it was like when their post went viral and had 10,000 views in a few days.

Just yesterday, I got snarky with my friend because she made the comment that her business had grown so fast and become so lucrative in just a few months because she worked hard. I snapped like a high-school girl in heat. I, too, have worked hard. Very hard. But she got to “success” much, much faster than I have.

Hard work does not equate to the speed of success.

Today, I apologized to her. Some people just get to their destination faster than others. My speed to success is a slow burn, but that doesn’t mean I’m not going to get there. . . eventually.

No matter how hard I work, how many hours a day I work, how many times I post, my growth is slow and my income is small. (I’d like to write one of those articles that frustrate me with a headline that screams, “QUADRUPLE Your Income On Medium In Six Short Months,” but I don’t think going from $5.00 to $20.00 would impress anybody.)

Mine is not a flashy race; it’s the slow, laborious, tortured-turtle walk, one small step at a time.

It’s not about publishing every day.

I wish I could publish every day, and I am envious of people who dash out posts in twenty minutes and crank out multiple pieces of writing every day. The gene for speed-writing is not included in my DNA. I do research. I put in links. I anguish over pictures. Dozens of headline possibilities are run through an analyzer before I develop one that works. It takes me several hours, sometimes several days, to complete anything worth posting.

I’m a slow writer.

But I’m steady, plodding, and persistent. And that’s how I’m going to get to success, not in the quick flash of an exploding skyrocket, but in the guise of a plodding turtle. Not like the ambitious, young, self-assured, rising stars, but like the sure-footed, steady-in-the-boat, sixty-two-year-old woman I am.

Doing continual, daily, consistent work is the way to achieve success, whatever your definition of it is. While a lucky few may catapult to the top, the vast majority of us are determined “plodders” making slow progress on our ascent to success. We’re taking consistent, small, mundane steps in a forward direction.

It comes one slow step at a time.

No matter what you do, whether you’re a writer, an artist, an entrepreneur, a manufacturer, or an accountant, you can do something to move your career forward one small step at a time.

  • Do something to your website every day. Upload a blog, a new photo, a quote, or a new reference. Fresh content gives you more credibility and improves Google rankings.
  • Do research for future projects. For me, this means finding sources, taking notes, creating outlines… For you, that might mean networking with colleagues, contacting experts, gathering data, doing sketches… No matter what the task, you’re moving ahead.
  • Gather inspiration by reading, listening to music, and looking at picture books.
  • Prepare rough drafts, reports, charts, infographics, or materials for upcoming jobs.
  • Market yourself. No matter what your business is, you can never stop looking for future work or prospective clients. Do some kind of outreach every day: make phone calls, send emails, search job posting sites.
  • No one ever knows enough. Each and every day — workday or play day — you can at least read blogs, trade journals, or industry newsletters.
  • Listen to a podcast, take a course, or attend a webinar to build your knowledge base. A small investment of time reaps huge benefits: spotting new trends, gaining new perspectives, and finding industry thought leaders.

One small step per day; one giant leap toward success in the future.

It’s different for everyone.

Part of understanding “success” is knowing that I’m not like other people. I’m marching to the beat of my own drummer — a drummer who taps out an exaggerated adagio rhythm.

I don’t have unrealistic expectations of making six figures from blogging. My goal is much less lofty. I simply want to earn $18,000 a year from various writing jobs. I’m not expecting high-powered corporations to search me out for some lucrative contract, but if it happened, I’d be thrilled. I’d be ecstatic if some agent out there would ask to see my book manuscript because they’ve liked some of the pieces I’ve written here. But even finding the elusive agent isn’t a success by itself.

Success for me is to give hope, share information, and spark inspiration with my words. Success for me is a modest income, feeling satisfied in the occupation of my choosing, and finding joy in living my own dream.

I may not set any speed records, but fable-like, I’ll be the slow and steady tortoise that wins the race, tottering over the finish line into the promised land of success late in life — but very, very happy.

As Steve Jobs said,

“If you really look closely, most overnight successes took a long time.”
