Free AI web copilot to create summaries, insights and extended knowledge, download it at here

18205

Abstract

="hljs-built_in">sort date region</pre></div><div id="3a5d"><pre>levelsof date, local(dates)</pre></div><div id="9069"><pre>foreach y of local dates {

summ region</pre></div><div id="bfc5"><pre>*** cases replace stack_cases = new_cases_ma7 if date==y' & region==r(min)' </pre></div><div id="c4c3"><pre>replace stack_cases = new_cases_ma7 + stack_cases[_n-1] if date==y' & region!=r(min)' </pre></div><div id="9b68"><pre>*** deaths </pre></div><div id="ff11"><pre>replace stack_deaths = new_deaths_ma7 if date==y' & region==r(min)' </pre></div><div id="8886"><pre>replace stack_deaths = new_deaths_ma7 + stack_deaths[n-1] if date==y' & region!=r(min)' </pre></div><div id="cd0a"><pre>}</pre></div>Essentially, we are taking the first region observation as it is, and for the subsequent regions iteratively adding up the values. We can also plot the new variables:<div id="b839"><pre>twoway /// (line stack_cases date if region==1) /// (line stack_cases date if region==2) /// (line stack_cases date if region==3) /// (line stack_cases date if region==4) /// (line stack_cases date if region==5) /// (line stack_cases date if region==6), /// legend(off)</pre></div>where we get this graph:<figure id="382a"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*ri_pBxrnhDrJvw8-Sk85ig.png"><figcaption></figcaption></figure>Here the difference between the two lines are the daily cases for each region. This figure is also easier to look at relative to the earlier graph.If you look at the help of the <code>twoway rarea</code> command:<div id="be02"><pre>help twoway_rarea</pre></div>here you will see that the syntax is:<div id="131e"><pre>twoway rarea y1var y2var xvar [if] [in] [, options]</pre></div>This implies that in Stata, if we need to make area graphs, then each region needs to be its own variable. This essentially means we need to reshape the data and make it wide.Before we reshape, we keep the variables we need and rename the new stack* variable for convivence:<div id="a587"><pre>keep region date new_cases stack_cases new_deaths stack_deaths</pre></div><div id="d125"><pre>// rename just to keep life easy ren stack_cases cases ren stack_deaths deaths</pre></div>As discussed in <a href="https://readmedium.com/covid-19-visualizations-with-stata-part-9-customized-bar-graphs-dde096567837">Guide 9</a>, reshaping losses the information on value labels. We can use this three-step process to (a) preserve the labels in locals before reshaping, (b) reshape the data, and, (c) apply the labels after the reshape:<div id="7c6f"><pre>*** preserve the labels</pre></div><div id="83fc"><pre>levelsof region, local(idlabels) // store the id levels

foreach x of local idlabels {
local idlab_x' : label region x'
}</pre></div><div id="f5ba"><pre>*** reshape the data</pre></div><div id="5191"><pre>reshape wide cases new_cases deaths newdeaths, i(date) j(region) order date cases* newcases* deaths*</pre></div><div id="b744"><pre>*** and apply the labels back</pre></div><div id="2283"><pre>foreach x of local idlabels { </pre></div><div id="59ce"><pre> lab var casesx' "idlabx''" lab var new_casesx' "idlab_x''"
lab var deathsx' "idlabx''" lab var new_deathsx' "idlab_x''"
}</pre></div>Since there are locals involved, the above code has to run in one go. If the code runs fine, we should get something like this where each variable is given a label based on the corresponding number in the variable name:<figure id="430d"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*vH4nv6xcg_Qu8RVdCS4P1g.png"><figcaption></figcaption></figure>We can now redraw the line graph shown earlier but now we just do it using variable names rather than <code>if</code> conditions:<div id="65f7"><pre>twoway /// (line cases1 date) /// (line cases2 date) /// (line cases3 date) /// (line cases4 date) /// (line cases5 date) /// (line cases6 date) /// (line cases12 date) /// , legend(off)</pre></div>Note that here I am using the last variable cases12 just to show the extent of the data:<figure id="496f"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*srthcbu7nPKq9l1LpOsNcw.png"><figcaption></figcaption></figure>Since we want areas to stack up, we also need to define a dummy 0 variable for the first country:<div id="0b81"><pre>gen cases0 = 0
gen deaths0 = 0</pre></div><div id="bb43"><pre>twoway /// (line cases0 date) /// (line cases1 date) /// (line cases2 date) /// (line cases3 date) /// (line cases4 date) /// (line cases5 date) /// (line cases6 date) /// (line cases12 date) /// , legend(off)</pre></div>Which just gives us this figure:<figure id="b3af"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*teSaueTmUwQMQOpYPhSzQA.png"><figcaption></figcaption></figure>The importance of having a zero line will become obvious below.Now we have all the data in place, we can re-center the graph around zero on the y-axis. For this we need to take the maximum value of each date, divide it by two, and subtract each region’s observation by it.In order to re-center the graph, we take the highest value, or the value of the last variable and divided it by two:<div id="c305"><pre>ds cases* local items : word count r(varlist)' local items = items' - 1 display items'</pre></div><div id="c038"><pre>gen meanval_cases = casesitems' / 2 gen meanval_deaths = deathsitems' / 2</pre></div><div id="e5c5"><pre>foreach x of varlist cases* { gen x'_norm = x' - meanval_cases }</pre></div><div id="935b"><pre>foreach x of varlist deaths* { gen x'_norm = x' - meanval_deaths }</pre></div><div id="4263"><pre>drop meanval*</pre></div>The first three lines automate to process of counting the number of variables in the dataset. Note, that we do <code>local items = items' — 1</code> to account for the additional cases0 variable we generated earlier, so that the total number of variables in our example are 12, and not 13, for the loop to run properly.The automation helps if regions are added or subtracted, or the graph is looped over multiple regions, for example, generating a stream graph for each continent with all the countries.We can now plot the normalized variable given with the suffix _norm as follows:<div id="a007"><pre>twoway /// (line cases0_norm date) /// (line cases1_norm date) /// (line cases2_norm date) /// (line cases3_norm date) /// (line cases4_norm date) /// (line cases5_norm date) /// (line cases6_norm date) /// (line cases12_norm date) /// , legend(off)</pre></div>which gives us the core skeleton structure we need for stream graphs:<figure id="c9e8"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*etMXlcnFwTN3CGFPTyCCGA.png"><figcaption></figcaption></figure>Here we can see the y=0 line has also been re-centered. We can now convert the above figure into an area graph as follows:<div id="23ac"><pre>twoway /// (rarea cases0_norm cases1_norm date) /// (rarea cases1_norm cases2_norm date) /// (rarea cases2_norm cases3_norm date) /// (rarea cases3_norm cases4_norm date) /// (rarea cases4_norm cases5_norm date) /// (rarea cases5_norm cases6_norm date) /// (rarea cases6_norm cases12_norm date) /// , legend(off)</pre></div>which gives us this figure:<figure id="36a3"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*w2XaJ884kqcAJV24_ty59w.png"><figcaption></figcaption></figure>Notice also the y-axis which is now showing zero in the center and everything is distributed around it. Also note the pattern for generating the rarea graph where two y-variables have a difference of one in the name.<h2 id="b2ca">Labels</h2>We can also automate the labels in three steps.Step 1: generate the mid points of the last data observation:<div id="6abf"><pre>** this part is for the mid points</pre></div><div id="9437"><pre>summ date gen last = 1 if date==r(max)</pre></div><div id="4948"><pre>ds casesnorm local items : word count r(varlist)' local items = items' - 2 display items'</pre></div><div id="de0c"><pre>forval i = 0/items' { local i0 = i' local i1 = i' + 1</pre></div><div id="1622"><pre>gen ycasesi1' = (casesi0'_norm + casesi1'_norm) / 2 if last==1 gen ydeathsi1' = (deathsi0'_norm + deathsi1'_norm) / 2 if last==1</pre></div><div id="3e85"><pre>}</pre></div>Here note the use of the locals and the word count again to automate the whole process. The mid point has to be calculated as starting value + ending value / 2. Since we are counting from zero, we update the <code>items</code> local by reducing it’s value by 2. Within the loop, we define two locals, <code>i0</code> and <code>i1</code>, as indices for variable names, which can then be dynamically used in generating the mid points.Step 2: Next we generate a variable for the share of cases and deaths for the last observation. This is just to indicate how much each region is contributing to the total of the last data point.This is achieved using a fairly straightforward loop:<div id="8517"><pre>** <span class="hljs-keyw

Options

ord">this part is for the shares</pre></div><div id="9d0b"><pre>egen lastsum_cases = rowtotal(new_cases*) if last==1 egen lastsum_deaths = rowtotal(new_deaths*) if last==1</pre></div><div id="62a5"><pre>foreach x of varlist new_cases* { gen x'_share = (x' / lastsum_cases) * 100 }</pre></div><div id="97a6"><pre>foreach x of varlist new_deaths* { gen x'_share = (x' / lastsum_deaths) * 100 }</pre></div><div id="9e09"><pre>drop lastsum*</pre></div>Note here that I am not using the smoothed variable but the actual data for the accurate value of the share of cases. This is also the reason, we carry this variable forward throughout the collapse and reshaping process.Step 3: generate the variables containing the label for the graphs. Here we again use a mix of several locals to automate the label generation process:<div id="0f33"><pre>**** here we generate the labels</pre></div><div id="c8b6"><pre>ds cases*norm local items : word count r(varlist)' local items = items' - 1</pre></div><div id="e69b"><pre>foreach x of numlist 1/`items' {

local t : var lab casesx' </pre></div><div id="cae9"><pre>*** cases gen labelx'_cases = "t'" + " (" + string( new_casesx', "%9.0f") + ", " + string( new_casesx'_share, "%9.0fc") + "%)" if last==1</pre></div><div id="cb91"><pre>*** deaths gen labelx'_deaths = "t'" + " (" + string(new_deathsx', "%9.0f") + ", " + string(new_deaths`x'_share, "%9.0fc") + "%)" if last==1

}</pre></div>In the code above, <code>ds</code> and <code>word count</code> and <code>item</code> define the number for the loop. The local <code>t</code> picks the variable label, and the <code>gen</code> command puts all this information together. Since the data is in a wide form after the reshape, a new variable is generated for each region.We can also plot this manually for the same regions shown earlier:<div id="1e19"><pre>twoway /// (rarea cases0_norm cases1_norm date) /// (rarea cases1_norm cases2_norm date) /// (rarea cases2_norm cases3_norm date) /// (rarea cases3_norm cases4_norm date) /// (rarea cases4_norm cases5_norm date) /// (rarea cases5_norm cases6_norm date) /// (rarea cases6_norm cases12_norm date) /// (scatter ycases1 date if last==1, ms(smcircle) msize(0.2) mlabel(label1_cases) mcolor(black%20) mlabsize(tiny) mlabcolor(black)) /// (scatter ycases2 date if last==1, ms(smcircle) msize(0.2) mlabel(label2_cases) mcolor(black%20) mlabsize(tiny) mlabcolor(black)) /// (scatter ycases3 date if last==1, ms(smcircle) msize(0.2) mlabel(label3_cases) mcolor(black%20) mlabsize(tiny) mlabcolor(black)) /// (scatter ycases4 date if last==1, ms(smcircle) msize(0.2) mlabel(label4_cases) mcolor(black%20) mlabsize(tiny) mlabcolor(black)) /// (scatter ycases5 date if last==1, ms(smcircle) msize(0.2) mlabel(label5_cases) mcolor(black%20) mlabsize(tiny) mlabcolor(black)) /// (scatter ycases6 date if last==1, ms(smcircle) msize(0.2) mlabel(label6_cases) mcolor(black%20) mlabsize(tiny) mlabcolor(black)) /// , legend(off)</pre></div>Which gives us this graph:<figure id="9b2c"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*HKcvRXAKLtm-aib5KvSUZw.png"><figcaption></figcaption></figure>We have now achieved the core structure required to automate the final figure.<h2 id="8cbd">Automate</h2>In the code below, we use a combination of loops and locals to also define the colors and labels for each region segment of the graph:<div id="bdb5"><pre>*** automate the areas, colors, labels</pre></div><div id="e2df"><pre>ds cases*norm local items : word count r(varlist)' local items = items' - 2 display items'</pre></div><div id="6f0b"><pre>forval x = 0/items' {

colorpalette /// "253 253 150" /// "255 197 1" /// "255 152 1" /// " 3 125 80" /// " 2 75 48" /// , n(13) nograph</pre></div><div id="e33c"><pre>local x0 = x' local x1 = x' + 1

local areagraph areagraph' rarea casesx0'_norm casesx1'_norm date, fcolor("r(px1')'") lcolor(black) lwidth(*0.15) || (scatter ycasesx1' date if last==1, ms(smcircle) msize(0.2) mlabel(label`x1'_cases) mcolor(black%20) mlabsize(tiny) mlabcolor(black)) ||

}</pre></div><div id="5f40"><pre>*** get the date ranges in order</pre></div><div id="1676"><pre>summ date local x1 = r(min)' local x2 = r(max)' + 50</pre></div><div id="a665"><pre>*** generate the graph</pre></div><div id="2a21"><pre>graph twoway areagraph' /// , /// legend(off) /// ytitle("", size(small)) /// ylabel(-300000(100000)300000) /// yscale(noline) /// ylabel(, nolabels noticks nogrid) /// xscale(noline) /// xtitle("") /// xlabel(x1'(15)`x2', labsize(*0.6) angle(vertical) glwidth(vvthin) glpattern(solid)) /// title("{fontface Arial Bold: COVID-19 Daily Cases - The World}") /// note("Data sources: Our World in Data. World Bank 2020 classifications used for country groups.", size(tiny))</pre></div>The code above have a lot of parts that fit together. The total number of values that need to be picked are stored in the local <code>items</code>. A custom color palette that has been discussed in the <a href="https://readmedium.com/stata-graphs-define-your-own-color-schemes-4320b16f7ef7">Color guide</a> and applied in the <a href="https://readmedium.com/covid-19-visualizations-with-stata-part-9-customized-bar-graphs-dde096567837">Guide 9 on bar graphs</a>, is also being used here. This can be replaced with any other color scheme. The core body of the figure is stored in the local <code>areagraph</code> which contains two parts; one part for the area graph and one part for the labels including all the customizations of the lines, colors, widths, sizes etc. The color information is also dynamically applied from the values stored in locals after the <code>colorpalette</code> command.The date range is stored in the two locals <code>x1</code> and <code>x2</code>. The graph command call the <code>areagraph </code>local, and the date locals. Y-axis is turned off completely including the line which show the values centered around zero. The title is also customized using the <code>fontface</code> argument (see the <a href="https://readmedium.com/stata-graphs-get-those-fonts-right-c38d35625142">Font Guide</a> for details).From the code above, we get the following final figure:<figure id="f0b1"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*juFGIEfTB1JhjfWM5OlZ3A.png"><figcaption></figcaption></figure>The stream graph above provides a neat representation of how the cases evolve and which region is contributing to the daily cases. It also helps immediately convey the relative scale of the epidemic and how different regions evolve over time. For example, the United States and Europe are major contributors at the time of writing this article. This graph can also be generated per-capita to normalize the regions for more accurate comparisons.<h1 id="b206">Exercise</h1>Generate the graph for daily deaths shown below:<figure id="38fb"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*4aosUCaKFIL6ULW38fy8bw.png"><figcaption></figcaption></figure>This graph uses the custom color palette:<div id="e112"><pre>colorpalette /// "253 253 150" /// "255 197 1" /// "255 152 1" /// " 3 125 80" /// " 2 75 48" ///</pre></div>Don’t forget to clear the locals if you generate the stream graph for daily deaths right after the daily cases graph. Locals can be simply reset as follows:<div id="ed42"><pre>local areagraph</pre></div>Hope you enjoyed this guide!<h1 id="2e65">Other Stata guides</h1><a href="https://readmedium.com/covid-19-data-visualization-with-stata-part-1-an-introduction-to-data-setup-and-customized-6b879a1e8647">Part 1: An introduction to data setup and customized graphs</a><a href="https://readmedium.com/covid-19-visualizations-with-stata-part-2-customizing-color-schemes-206af77d00ce">Part 2: Customizing colors schemes</a><a href="https://readmedium.com/covid-19-visualizations-with-stata-part-3-heat-plots-e2ef5ac1160b">Part 3: Heat plots</a><a href="https://readmedium.com/covid-19-visualizations-with-stata-part-4-maps-fbd4fe2642f6">Part 4: Maps</a><a href="https://readmedium.com/covid-19-visualizations-with-stata-part-5-stacked-area-graphs-ed976d025365">Part 5: Stacked area graphs</a><a href="https://readmedium.com/covid-19-visualizations-with-stata-part-6-animations-f9d2b09985c2">Part 6: Animations</a><a href="https://readmedium.com/covid-19-visualizations-with-stata-part-7-doubling-time-graphs-1-58b7687cbdc0">Part 7: Doubling time graphs</a><a href="https://readmedium.com/covid-19-visualizations-with-stata-part-8-joy-plots-ridge-line-plots-dbe022e7264d">Part 8: Ridge-line plots (Joy plots)</a><a href="https://readmedium.com/covid-19-visualizations-with-stata-part-9-customized-bar-graphs-dde096567837">Part 9: Customized bar graphs</a><a href="https://readmedium.com/covid-19-visualizations-with-stata-part-10-stream-graphs-9d55db12318a">Part 10: Stream graphs</a>If you enjoy these guides and find them useful, then please like and follow <a href="https://medium.com/the-stata-guide">The Stata Guide</a>. Also, please share your visualizations if you use these guides!<h1 id="3c6a">About the author</h1>I am an economist by profession and I have been using Stata since 2003. I am currently based in Vienna, Austria where I work at the <a href="https://www.wu.ac.at/en/ecolecon/institute">Vienna University of Economics and Business (WU)</a> and at the <a href="https://iiasa.ac.at/">International Institute for Applied Systems Analysis (IIASA)</a>. You can find my research work on <a href="https://www.researchgate.net/profile/Asjad_Naqvi">ResearchGate</a> and <a href="https://scholar.google.com/citations?user=oWGGVpYAAAAJ&hl=en">Google Scholar</a>, and Stata code repository on <a href="https://github.com/asjadnaqvi">GitHub</a>. You can follow my COVID-19 related Stata visualizations on my <a href="https://twitter.com/AsjadNaqvi">Twitter</a>. I am also featured on the Stata <a href="https://www.stata.com/covid19/">COVID-19 webpage</a> in the visualization and graphics section.You can connect with me via <a href="https://medium.com/@asjadnaqvi">Medium</a>, <a href="https://twitter.com/AsjadNaqvi">Twitter</a>, <a href="https://www.linkedin.com/in/asjad-naqvi-phd-9a539512/">LinkedIn </a>or simply via email: [email protected].My Medium blog for Stata stuff here: <a href="https://medium.com/the-stata-guide">The Stata Guide</a> where new awesome content is released regularly. Clap, and/or follow if you like these guides!</article></body>

COVID-19 visualizations with Stata Part 10: Stream graphs

In this Stata guide, learn how to create the following Stream graph using publicly available COVID-19 information on daily cases and deaths from the Our World in Data (OWID) database:

Stream graphs are a follow-up of the Stacked-area graphs guide. It is recommended that you go through the earlier guide before using this one since the essential building blocks are the same, and the earlier guide explains the steps carefully. The additional things we will learn in this guide are preserving and labels before reshaping and applying them after reshaping (also introduced in Guide 9 on Bar graphs), automating area graphs, colors, and labels.

This visualization is inspired by Financial Times’ COVID-19 tracker which currently displays the a stream graph on COVID-19 deaths on their website.

Preamble

Like all previous guides, this guide assumes a basic knowledge of Stata. This guide deals with advanced usage of locals, loops, and code structures that require some experience and familiarity with Stata programming. If you are using this guide for the first time, and are new to Stata, then Guide 1 and Guide 2 are highly recommended, followed by the next set of guides which are in increasing order of difficulty.

This guide uses the following folder structure for the work-flow management:

Within the graphs folder, I also create an additional sub-folder called guide10, to store the figures generated here. For details on how to organize your files, please see Guide 1.

In order to make the graphs exactly as they are shown here, several additional item are required:

In order to make the graphs exactly as they are shown here, install the schemepack suite (more info in the Scheme guide and on GitHub):

ssc install schemepack, replace

and set the scheme to White Tableau:

set scheme white_tableau

Install Ben Jann’s colorpalette package (more on colors in Guide 2 and in the Color guide)

ssc install palettes, replace

Set default graph font to Arial Narrow (see the Font guide on customizing fonts)

graph set window fontface "Arial Narrow"

This guide has been written in version 16.1 and should work with version 14 and onwards. Earlier versions might need some modification for implementing custom colors.

Get the data in order

We pull the data from the Our World in Data’s COVID-19 webpage as follows:

************************
***  COVID 19 data   ***
************************

insheet using "https://covid.ourworldindata.org/data/owid-covid-data.csv", clear
save ./raw/full_data_raw.dta, replace

gen date2 = date(date, "YMD")
format date2 %tdDD-Mon-yy
drop date
ren date2 date

ren location country
replace country = "Slovak Republic" if country == "Slovakia"

drop if date < 21915  // 1st Jan 2020

save "./master/OWID_data.dta", replace

All observations before 1st Jan 2020 are dropped in the dataset.

Since OWID uses very broad classifications for continents (five continents in total), we will use the World Bank 2020 classifications for country groupings which provides a very large set of regions.

The Excel file can be downloaded from the page linked above. I have already cleaned the file and uploaded it on my GitHub page in Stata format. It can be directly pulled into Stata and saved in the master folder as follows:

**********************************
***  Country classifications   ***
**********************************
  
copy "https://github.com/asjadnaqvi/COVID19-Stata-Tutorials/blob/master/data/country_codes.dta?raw=true" "./master/country_codes.dta", replace

Next we merge the two files together and drop anything that does not match:

use "./master/OWID_data.dta", clear
merge m:1 country using "./master/country_codes.dta"
drop if _m!=3

keep country date new_cases new_deaths group*

summ date
drop if date>=r(max)

We also only keep the variables we need and drop the last date observation to avoid missing values for some countries. This might not be necessary and it all depends on when the data is pulled from the OWID website. In the past, not all countries were updated at the same time.

If everything works well, the dataset should look something like this:

Stata 16.1 interface using the Dark theme

Each group variable corresponds to the classification defined by the World Bank.

Setup for stream graphs

Since we now have World Bank country groupings, we can split the data in as many groups as we want. For now, I am defining the following 12 regions:

gen region = .

replace region = 1 if group29==1 & country=="United States" 
replace region = 2 if group29==1 & country!="United States" 
replace region = 3 if group20==1 & country=="Brazil" 
replace region = 4 if group20==1 & country!="Brazil"
replace region = 5 if group10==1 
replace region = 6 if group8==1 & group10!=1 & country=="United Kingdom" 
replace region = 7 if group8==1 & group10!=1 & country!="United Kingdom" 
replace region = 8 if group26==1 
replace region = 9 if group37==1 
replace region = 10 if group35==1 & country=="India"
replace region = 11 if group35==1 & country!="India" 
replace region = 12 if group6==1

This of course can be increased or decreased based on the level of detail required. We can also label the values of this variable:

lab de region  1 "United States" 2 "Rest of North America" 3 "Brazil" 4 "Rest of Latin America" 5 "European Union" 6 "United Kingdom" 7 "Rest of Europe" 8 "Middle East and North Africa" 9 "Sub-Saharan Africa" 10 "India" 11 "Rest of South Asia" 12 "East Asia and Pacific"

lab val region region

In the next step, we collapse the data and sum up the daily cases and deaths by date and region combination.

collapse (sum) new_cases new_deaths, by(date region)

format date %tdDD-Mon-yy
format new_cases %9.0fc

*** minor cleaning of negative cases
replace new_cases = 0 if new_cases < 0
replace new_deaths = 0 if new_deaths < 0

Note that all variables that are not defined in the collapse command are automatically dropped from the dataset. We also clean up date format and the variables.

Now we declare the data to be a panel dataset using the xtset command. We can use the panel structure to generate a 7-day moving average to smooth out the series:

xtset region date
tssmooth ma new_cases_ma7  = new_cases , w(6 1 0) 
tssmooth ma new_deaths_ma7  = new_deaths , w(6 1 0)

and we can plot the series for the first few regions to see what it looks like:

twoway ///
 (line new_cases_ma7 date if region==1) ///
 (line new_cases_ma7 date if region==2) ///
 (line new_cases_ma7 date if region==3) ///
 (line new_cases_ma7 date if region==4) ///
 (line new_cases_ma7 date if region==5) ///
 (line new_cases_ma7 date if region==6), ///
  legend(off)

which gives us this graph:

Next we use the logic introduced in Guide 5 on Stacked area graphs and generate a new set of variables which provides cumulative graphs by stacking values on top of each other:

******** new we stack these up

gen stack_cases  = .
gen stack_deaths = .

sort date region

levelsof date, local(dates)

foreach y of local dates {
  
  summ region

*** cases
replace stack_cases  = new_cases_ma7 if date==`y' &  region==`r(min)'

replace stack_cases  = new_cases_ma7  + stack_cases[_n-1]  if  date==`y' & region!=`r(min)'

*** deaths

replace stack_deaths = new_deaths_ma7 if  date==`y' & region==`r(min)'

replace stack_deaths = new_deaths_ma7 + stack_deaths[_n-1]  if  date==`y' & region!=`r(min)'

Essentially, we are taking the first region observation as it is, and for the subsequent regions iteratively adding up the values. We can also plot the new variables:

twoway ///
 (line stack_cases date if region==1) ///
 (line stack_cases date if region==2) ///
 (line stack_cases date if region==3) ///
 (line stack_cases date if region==4) ///
 (line stack_cases date if region==5) ///
 (line stack_cases date if region==6), ///
  legend(off)

where we get this graph:

Here the difference between the two lines are the daily cases for each region. This figure is also easier to look at relative to the earlier graph.

If you look at the help of the twoway rarea command:

help twoway_rarea

here you will see that the syntax is:

twoway rarea y1var y2var xvar [if] [in] [, options]

This implies that in Stata, if we need to make area graphs, then each region needs to be its own variable. This essentially means we need to reshape the data and make it wide.

Before we reshape, we keep the variables we need and rename the new stack_* variable for convivence:

keep region date new_cases stack_cases new_deaths stack_deaths

// rename just to keep life easy  
ren stack_cases   cases
ren stack_deaths  deaths

As discussed in Guide 9, reshaping losses the information on value labels. We can use this three-step process to (a) preserve the labels in locals before reshaping, (b) reshape the data, and, (c) apply the labels after the reshape:

*** preserve the labels

levelsof region, local(idlabels)      // store the id levels
 
foreach x of local idlabels {       
  local idlab_`x' : label region `x'  
 }

*** reshape the data

reshape wide cases new_cases deaths new_deaths, i(date) j(region) 
order date cases* new_cases* deaths*

*** and apply the labels back

foreach x of local idlabels {

 lab var     cases`x'  "`idlab_`x''" 
 lab var new_cases`x'  "`idlab_`x''"   
 lab var     deaths`x' "`idlab_`x''"    
 lab var new_deaths`x' "`idlab_`x''"    
 }

Since there are locals involved, the above code has to run in one go. If the code runs fine, we should get something like this where each variable is given a label based on the corresponding number in the variable name:

We can now redraw the line graph shown earlier but now we just do it using variable names rather than if conditions:

twoway ///
 (line cases1 date) ///
 (line cases2 date) ///
 (line cases3 date) ///
 (line cases4 date) ///
 (line cases5 date) ///
 (line cases6 date) ///
 (line cases12 date) ///
  , legend(off)

Note that here I am using the last variable cases12 just to show the extent of the data:

Since we want areas to stack up, we also need to define a dummy 0 variable for the first country:

gen cases0 = 0  
gen deaths0 = 0

twoway ///
 (line cases0 date) /// 
 (line cases1 date) ///
 (line cases2 date) ///
 (line cases3 date) ///
 (line cases4 date) ///
 (line cases5 date) ///
 (line cases6 date) ///
 (line cases12 date) ///
  , legend(off)

Which just gives us this figure:

The importance of having a zero line will become obvious below.

Now we have all the data in place, we can re-center the graph around zero on the y-axis. For this we need to take the maximum value of each date, divide it by two, and subtract each region’s observation by it.

In order to re-center the graph, we take the highest value, or the value of the last variable and divided it by two:

ds cases*
local items : word count `r(varlist)'
local items = `items' - 1
display `items'

gen meanval_cases  =  cases`items' / 2
gen meanval_deaths = deaths`items' / 2

foreach x of varlist cases* {
 gen `x'_norm  = `x' - meanval_cases
}

foreach x of varlist deaths* {
 gen `x'_norm  = `x' - meanval_deaths
}

drop meanval*

The first three lines automate to process of counting the number of variables in the dataset. Note, that we do local items = `items' — 1 to account for the additional cases0 variable we generated earlier, so that the total number of variables in our example are 12, and not 13, for the loop to run properly.

The automation helps if regions are added or subtracted, or the graph is looped over multiple regions, for example, generating a stream graph for each continent with all the countries.

We can now plot the normalized variable given with the suffix *_norm as follows:

twoway ///
 (line cases0_norm date) ///
 (line cases1_norm date) ///
 (line cases2_norm date) ///
 (line cases3_norm date) ///
 (line cases4_norm date) ///
 (line cases5_norm date) ///
 (line cases6_norm date) ///
 (line cases12_norm date) ///
  , legend(off)

which gives us the core skeleton structure we need for stream graphs:

Here we can see the y=0 line has also been re-centered. We can now convert the above figure into an area graph as follows:

twoway ///
 (rarea cases0_norm  cases1_norm date) ///
 (rarea cases1_norm  cases2_norm date) ///
 (rarea cases2_norm  cases3_norm date) ///
 (rarea cases3_norm  cases4_norm date) ///
 (rarea cases4_norm  cases5_norm date) ///
 (rarea cases5_norm  cases6_norm date) ///
 (rarea cases6_norm cases12_norm date) ///
  , legend(off)

which gives us this figure:

Notice also the y-axis which is now showing zero in the center and everything is distributed around it. Also note the pattern for generating the rarea graph where two y-variables have a difference of one in the name.

Labels

We can also automate the labels in three steps.

Step 1: generate the mid points of the last data observation:

*** this part is for the mid points

summ date
gen last = 1 if date==r(max)

ds cases*norm
local items : word count `r(varlist)'
local items = `items' - 2
display `items'

forval i = 0/`items' {
local i0 = `i'
local i1 = `i' + 1

gen ycases`i1'  = (cases`i0'_norm + cases`i1'_norm) / 2 if last==1
gen ydeaths`i1' = (deaths`i0'_norm + deaths`i1'_norm) / 2 if last==1

Here note the use of the locals and the word count again to automate the whole process. The mid point has to be calculated as starting value + ending value / 2. Since we are counting from zero, we update the items local by reducing it’s value by 2. Within the loop, we define two locals, i0 and i1, as indices for variable names, which can then be dynamically used in generating the mid points.

Step 2: Next we generate a variable for the share of cases and deaths for the last observation. This is just to indicate how much each region is contributing to the total of the last data point.

This is achieved using a fairly straightforward loop:

*** this part is for the shares

egen lastsum_cases  = rowtotal(new_cases*)  if last==1
egen lastsum_deaths = rowtotal(new_deaths*) if last==1

foreach x of varlist new_cases* {
 gen `x'_share = (`x' / lastsum_cases) * 100
 }

foreach x of varlist new_deaths* {
 gen `x'_share = (`x' / lastsum_deaths) * 100
 }

drop lastsum*

Note here that I am not using the smoothed variable but the actual data for the accurate value of the share of cases. This is also the reason, we carry this variable forward throughout the collapse and reshaping process.

Step 3: generate the variables containing the label for the graphs. Here we again use a mix of several locals to automate the label generation process:

**** here we generate the labels

ds cases*norm
local items : word count `r(varlist)'
local items = `items' - 1

foreach x of numlist 1/`items' {
    
 local t : var lab cases`x'

*** cases 
gen label`x'_cases  = "`t'" + " (" + string( new_cases`x', "%9.0f") + ", " + string( new_cases`x'_share, "%9.0fc") + "%)" if last==1

*** deaths
gen label`x'_deaths = "`t'" + " (" + string(new_deaths`x', "%9.0f") + ", " + string(new_deaths`x'_share, "%9.0fc") + "%)" if last==1
  
}

In the code above, ds and word count and item define the number for the loop. The local t picks the variable label, and the gen command puts all this information together. Since the data is in a wide form after the reshape, a new variable is generated for each region.

We can also plot this manually for the same regions shown earlier:

twoway ///
 (rarea cases0_norm  cases1_norm date) ///
 (rarea cases1_norm  cases2_norm date) ///
 (rarea cases2_norm  cases3_norm date) ///
 (rarea cases3_norm  cases4_norm date) ///
 (rarea cases4_norm  cases5_norm date) ///
 (rarea cases5_norm  cases6_norm date) ///
 (rarea cases6_norm cases12_norm date) ///
  (scatter ycases1 date if last==1, ms(smcircle) msize(0.2) mlabel(label1_cases) mcolor(black%20) mlabsize(tiny) mlabcolor(black)) ///
  (scatter ycases2 date if last==1, ms(smcircle) msize(0.2) mlabel(label2_cases) mcolor(black%20) mlabsize(tiny) mlabcolor(black)) ///
  (scatter ycases3 date if last==1, ms(smcircle) msize(0.2) mlabel(label3_cases) mcolor(black%20) mlabsize(tiny) mlabcolor(black)) ///
  (scatter ycases4 date if last==1, ms(smcircle) msize(0.2) mlabel(label4_cases) mcolor(black%20) mlabsize(tiny) mlabcolor(black)) ///
  (scatter ycases5 date if last==1, ms(smcircle) msize(0.2) mlabel(label5_cases) mcolor(black%20) mlabsize(tiny) mlabcolor(black)) ///
  (scatter ycases6 date if last==1, ms(smcircle) msize(0.2) mlabel(label6_cases) mcolor(black%20) mlabsize(tiny) mlabcolor(black)) ///  
  , legend(off)

Which gives us this graph:

We have now achieved the core structure required to automate the final figure.

Automate

In the code below, we use a combination of loops and locals to also define the colors and labels for each region segment of the graph:

*** automate the areas, colors, labels

ds cases*norm
local items : word count `r(varlist)'
local items = `items' - 2
display `items'

forval x = 0/`items' {  

colorpalette    ///
 "253 253 150"  ///
 "255 197   1"  ///
 "255 152   1"  ///
 "  3 125  80"  ///
 "  2  75  48"  ///
 , n(13) nograph

local x0 =  `x'
local x1 =  `x' + 1
 
local areagraph `areagraph' rarea cases`x0'_norm cases`x1'_norm date, fcolor("`r(p`x1')'") lcolor(black) lwidth(*0.15) || (scatter ycases`x1' date if last==1, ms(smcircle) msize(0.2) mlabel(label`x1'_cases) mcolor(black%20) mlabsize(tiny) mlabcolor(black)) ||
 
 }

*** get the date ranges in order

summ date 
local x1 = `r(min)'
local x2 = `r(max)' + 50

*** generate the graph

graph twoway `areagraph' ///
 , ///
  legend(off) ///
  ytitle("", size(small)) ///
  ylabel(-300000(100000)300000) ///
  yscale(noline) ///
  ylabel(, nolabels noticks nogrid) ///
  xscale(noline) ///
  xtitle("") ///
  xlabel(`x1'(15)`x2', labsize(*0.6) angle(vertical) glwidth(vvthin) glpattern(solid)) ///
  title("{fontface Arial Bold: COVID-19 Daily Cases - The World}") ///
  note("Data sources: Our World in Data. World Bank 2020 classifications used for country groups.", size(tiny))

The code above have a lot of parts that fit together. The total number of values that need to be picked are stored in the local items. A custom color palette that has been discussed in the Color guide and applied in the Guide 9 on bar graphs, is also being used here. This can be replaced with any other color scheme. The core body of the figure is stored in the local areagraph which contains two parts; one part for the area graph and one part for the labels including all the customizations of the lines, colors, widths, sizes etc. The color information is also dynamically applied from the values stored in locals after the colorpalette command.

The date range is stored in the two locals x1 and x2. The graph command call the areagraph local, and the date locals. Y-axis is turned off completely including the line which show the values centered around zero. The title is also customized using the fontface argument (see the Font Guide for details).

From the code above, we get the following final figure:

The stream graph above provides a neat representation of how the cases evolve and which region is contributing to the daily cases. It also helps immediately convey the relative scale of the epidemic and how different regions evolve over time. For example, the United States and Europe are major contributors at the time of writing this article. This graph can also be generated per-capita to normalize the regions for more accurate comparisons.

Exercise

Generate the graph for daily deaths shown below:

This graph uses the custom color palette:

colorpalette ///
 "253 253 150" ///
 "255 197   1" ///
 "255 152   1" ///
 "  3 125  80" ///
 "  2  75  48" ///

Don’t forget to clear the locals if you generate the stream graph for daily deaths right after the daily cases graph. Locals can be simply reset as follows:

local areagraph

Hope you enjoyed this guide!

Other Stata guides

Part 1: An introduction to data setup and customized graphs

Part 2: Customizing colors schemes

Part 3: Heat plots

Part 4: Maps

Part 5: Stacked area graphs

Part 6: Animations

Part 7: Doubling time graphs

Part 8: Ridge-line plots (Joy plots)

Part 9: Customized bar graphs

Part 10: Stream graphs

If you enjoy these guides and find them useful, then please like and follow The Stata Guide. Also, please share your visualizations if you use these guides!

About the author

I am an economist by profession and I have been using Stata since 2003. I am currently based in Vienna, Austria where I work at the Vienna University of Economics and Business (WU) and at the International Institute for Applied Systems Analysis (IIASA). You can find my research work on ResearchGate and Google Scholar, and Stata code repository on GitHub. You can follow my COVID-19 related Stata visualizations on my Twitter. I am also featured on the Stata COVID-19 webpage in the visualization and graphics section.

You can connect with me via Medium, Twitter, LinkedIn or simply via email: [email protected].

My Medium blog for Stata stuff here: The Stata Guide where new awesome content is released regularly. Clap, and/or follow if you like these guides!