Asjad Naqvi


Photo by Efe Kurnaz on Unsplash

Why the Stata Guide on Medium?

(Last updated in Jan 2023 to fix typos (thanks to Staurt Leske), and to make minor edits and updates to redundant parts.)

Dear reader,

First of all, thanks for stopping by, and showing interest in the Stata Guide!

The Guide would not have been possible without the support and encouragement from the community. I have received countless messages, comments, suggestions, and feedback on various articles, all of which have tremendously helped in improving the content. I wanted to write this small article to explain the motivation behind all of this.

Why Stata?

In academia, Stata has been the go-to language for years. In the dozen or so projects I have been involved in at institutions like LUMS, the World Bank, DFID, USAID, J-PAL, CERP, and others, Stata was, and still is, the core software used for analysis. While the current orientation might be more towards open-source, data science-oriented languages like R and Python, my current academic/research circle uses only Stata. And this is unlikely to change in the next few years. I myself started using Stata around 2003 during my Master's program. This was a major leap from the other software available at that point in time, for example, TSP and SPSS. Therefore, for me personally, there is also a lock-in effect. I also constantly work with other languages and tools in addition to Stata, which, in order of time spent, are Mathematica, NetLogo, QGIS, R, and Python.

Stata is powerful software for statistical analysis and has improved considerably in the last decade or so. While it has not focused so much on the graphics and visualization side, a lot exists in its core structure that has not been fully explored. The series of articles presented in the Stata Guide aims to highlight these under-explored elements. Additionally, Stata has had Python integration since version 16, opening up endless possibilities for exciting visualizations.

Moreover, Stata should not be seen as competition to R or Python. Both of these industry-standard tools also provide a rich code base for learning new techniques for coding up visualizations. Plus, both of them integrate well with Stata. Stata's uniqueness lies in its ability to be a great platform for data management and statistical analysis with pre-defined and rigorously vetted packages. In R, for example, one needs to stay well aware of new packages as they get released, which requires a lot of version control. Stata is also relatively homogeneous in its programming structure, while in R, the same task can be coded in a host of different ways (especially since the introduction of tidyverse). Therefore R has high setup costs relative to Stata.

The Stata Guide aims to make the code for these visualizations available for anyone to use. While the focus has mostly been on COVID-19, a lot of the programming aspects introduced in the guides, especially around automation, can be applied anywhere.

Another motivation to write these guides is a personal one. Since I have countless dofiles, currently over 3,500 on my Dropbox, these guides also help me find code snippets. Otherwise, I have to sift through dozens of old dofiles which takes painfully long!

Why COVID-19?

These guides could have been written using any dataset, especially the free ones that come with Stata. But I selected COVID-19 for three reasons.

First, real-world data is not clean and it is usually not in the shape one needs it in. Data management and cleaning usually take up 60–80% of the time spent on projects. COVID-19 datasets are almost clean but still need some work, enough to introduce some basic aspects of data and file management.

Second, unprecedented amounts of data are being generated around COVID-19, which makes it even more interesting to explore. Having dealt with shocks to economic systems as part of my research work, I can say that this level of daily data generation, and its availability in the public domain, has not happened before on a global scale. This also gives users a unique opportunity to deal with very new, innovative, and topical datasets.

Third, given that COVID-19 is an ongoing crisis that affects everyone, the visualizations are something viewers can relate to. One can always look at a COVID-19 figure and come up with a story to explain it. Therefore the appeal of COVID-19 visualizations is global. This is very different from using some generic dataset, with case-specific variables, and just plotting them.

This exercise also affords the opportunity to think through visualizations, especially what message they should contain. If a visualization cannot be interpreted in a few seconds, then it has lost its purpose. For example, the average time spent looking at a tweet is not more than a couple of seconds. If the visualization is not exciting enough, people will not engage. This of course is different from academic papers, where there is space provided to explain a visualization, but the general rule applies there as well.

Why Medium?

I already had a Stata blog on my personal website where the focus was more on automating data streams from online sources (e.g., the Eurostat guide (https://readmedium.com/automating-eurostat-in-stata-part-1-a047941b2b4f) is duplicated here), and Stata-to-Latex guides (these will be added here soon). After a hack in 2019, my WordPress website was taken offline due to malware, and since it was an expensive process to recover it, I decided to shut it down. I now host my personal website on GitHub (https://asjadnaqvi.github.io/).

Additionally, I was looking for a platform to upload the existing Stata articles and write new ones. After scouring the internet, I found that Medium provided the right space for putting up the guides. Plus, the Medium annual subscription is still cheaper than hosting a personal website, and it gives access to tons of interesting content from independent writers that I would also like to support. Every day, hundreds of articles are released on all sorts of topics, ranging from data science to machine learning to visualizations. And this is just the part of Medium that I have personally explored. A lot more still needs to be discovered.

What I found completely missing on Medium was Stata. Since most Stata material is available only in the old-school book format, I decided to start my own little Stata corner under the name of .do the Stata Guide (https://medium.com/the-stata-guide). I have also been offered the chance to put the guides in other Medium data science collections, which have much more visibility, but I am planning to keep my small Stata corner, where I also have control over the content. This is, in any case, more of a side project that I enjoy doing.

Another advantage of Medium is that stories can be edited and updated, especially as links and data sources evolve. This allows content creators to update information frequently, as opposed to books, where one first has to compile a minimum level of new information to release a new edition.

Also, on a side note, since the stories on Medium are metered, I recover, on average, the cost of one coffee per article for the time I spend writing. Nothing spectacular :) At the time of writing this article, I have on average 2,000-plus views and 500-plus reads per month, and I hope that those who read the guides find them useful.

If you think you have something interesting and out-of-the-box to contribute, then please feel free to reach out! Any number of members can be added to publications.

What got me interested in this?

I have always been fascinated with data visualization. Before getting into economics, I had wanted to study graphic design, had interned at a major media house during my high-school days, and was putting together college magazines, brochures, etc. Both my younger siblings are also full-time artists, so I was (and still am) constantly surrounded by discussions around colors, design concepts, printing techniques, paintings, etc.

After my Master's, I went to the USA for my PhD, which was at the New School for Social Research from 2007 to 2011. The New School is right next to Union Square in the heart of Manhattan, and its Parsons School of Design is consistently ranked among the top 10 design departments in the USA. So, I had the opportunity to explore lots of interesting data science exhibitions as well. Around this time, data science was emerging as a new field, and all the major news outlets I was following, like the New York Times and the Washington Post, started hiring data scientists for really cool visualizations. Books were also coming out that curated old visualizations and presented new ones.

Some data visualization books from my collection

Additionally, departments like the MIT Media Lab were focusing more and more on data science and other universities were also following suit. So a lot was going on in this field, with really creative stuff coming out, and with the advent of more data science-oriented languages, websites were becoming more and more interactive. These trends continue to this day with constant innovations in programming languages, web interfaces, data portals, and interactive online tools.

What’s next?

There are still some more guides planned around visualizations! Plus, in 2022, 16 data visualization packages were released. This also gives me some material to write about, which I discuss in a separate post (https://readmedium.com/updates-from-2022-and-plans-for-2023-6673be5f8de8).

There is also more planned on Stata-to-Latex, which remains a user favorite, and one that I am also dealing with constantly as part of my work.

There is also stuff I would like to write on statistical analysis, especially dealing with post-estimation outcomes, but again more from a visual perspective. So please give feedback and comments, as they encourage me to write more as well!

About the author

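I am an economist by profession and I have been using Stata since 2003. I am currently based in Vienna, Austria. You can see my profile, research, and projects on GitHub (https://github.com/asjadnaqvi) or my website (https://asjadnaqvi.github.io/). You can connect with me via Medium (https://medium.com/@asjadnaqvi), Twitter (https://twitter.com/AsjadNaqvi), LinkedIn (https://www.linkedin.com/in/asjad-naqvi-phd-9a539512/), Mastodon (https://econtwitter.net/@asjadnaqvi), Post (https://post.news/asjadnaqvi), or simply via email: [email protected]. If you have questions regarding the Guide or Stata in general, post them on The Code Block Discord server (https://discord.gg/qpHZtX6Xkk).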

The Stata Guide releases awesome new content regularly. Subscribe, Clap, and/or Follow the guide if you like the content!
