Randy Runtsch


Use Public Datasets Cataloged on Data.gov to Power Data Science Projects

Data.gov houses metadata that describes over 280,000 free and public datasets published at the U.S. federal, state, and local government levels. It simplifies the process of finding data and acquiring it through downloads or APIs.

Photo of charts and graphs on a laptop PC. Courtesy of Lukas Blazak on unsplash.com.

Introduction

Recently, I published an article on acquiring and analyzing data from analytics.usa.gov, which tracks the public's use of roughly 57,000 U.S. federal government websites. Data.gov, another government site, serves as a public clearinghouse for a vast collection of government datasets of all kinds.

Data.gov contains metadata that describes over 280,000 free public datasets. It catalogs a rich and varied collection of data managed by U.S. federal government entities and, in some cases, by state, local, and tribal governments.

This article describes what Data.gov is, how government entities can publish metadata about their public datasets on the site, the types of data the site catalogs, how to search the data, and how to download datasets from their source entities.

What is Data.gov?

The Technology Transformation Services (TTS) department within the U.S. General Services Administration (GSA) manages Data.gov. The service was established in 2009. To date, TTS has collected, documented, and published metadata for 280,518 datasets.

The following statements guide the work on Data.gov:

Mission: Design and deliver a digital government with and for the American public.

Vision: Trusted modern government experiences for all.

TTS built Data.gov with CKAN and WordPress. It develops its code publicly on GitHub.

Datasets indexed on Data.gov follow the DCAT-US Schema v1.1 guidelines. With this schema, a consistent set of metadata (Title, Description, Tags, Publisher, and so on) is applied to all datasets to make them discoverable and understandable.
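To make the idea of consistent metadata concrete, here is a minimal Python sketch of what a dataset record and a completeness check might look like. The field names are my understanding of the commonly required DCAT-US v1.1 fields (for example, tags are stored under `keyword`), so treat them as an assumption and confirm them against the schema documentation. The sample record and the `missing_fields` helper are hypothetical and exist only for illustration.

```python
from __future__ import annotations

# Assumed list of commonly required DCAT-US v1.1 fields; verify against the schema.
REQUIRED_FIELDS = (
    "title",
    "description",
    "keyword",       # shown as "Tags" on Data.gov
    "modified",
    "publisher",
    "contactPoint",
    "identifier",
    "accessLevel",
)


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Hypothetical record; none of these values come from a real dataset.
sample_record = {
    "title": "Example Fish Stocking Report",
    "description": "Invented dataset used only to illustrate the schema.",
    "keyword": ["fishing", "lakes"],
    "modified": "2023-01-15",
    "publisher": {"name": "Example Agency"},
    "contactPoint": {"fn": "Jane Doe", "hasEmail": "mailto:jane.doe@example.gov"},
    "identifier": "example-fish-stocking-001",
    "accessLevel": "public",
}

print(missing_fields(sample_record))
```

A record that passes a check like this carries the same basic descriptors as every other dataset in the catalog, which is what makes searching and filtering across agencies possible.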

How do Government Entities Add Datasets to Data.gov?

The OPEN Government Data Act, part of the Foundations for Evidence-Based Policymaking Act of the U.S. Congress, requires the federal government to make its data available to the public in open and machine-readable form. At the same time, it must ensure privacy and security.

How-to guides show government entities how to publish metadata that describes their public datasets on the site. Consistent metadata improves discoverability and impact.

Data.gov is primarily a federal government site and service. But state, local, and tribal governments can publish metadata to describe their public datasets on the platform as well.

What Types of Data does Data.gov Store?

As mentioned above, Data.gov stores metadata that describes thousands of datasets hosted elsewhere. It does not hold the data itself. Instead, it stores and displays links to the downloadable files and APIs that provide the data.

With over 280,000 datasets, the data indexed on Data.gov is too vast to describe concisely. Here is a small sample of available data types:

  • Gross domestic product
  • Climate and weather
  • Tax revenues, rates, and refunds
  • Geospatial data
  • Housing statistics
  • Oceans
  • Census and population
  • Wind and solar power
  • Education
  • Birth and death rates

How can I Find Data on Data.gov?

The Data.gov data catalog makes it easy to search for datasets. The screenshot below provides tips to find datasets using the data catalog search page.

Data.gov data catalog search screen. Image captured by the author.

How can I Acquire Data and API Information from Data.gov?

Data and APIs can be accessed by clicking on the data source links in the search results.

Download Data

As shown in the screenshot below, searching for the keyword "fishing" on the data catalog search page returns 14,230 datasets. The first two datasets contain data about fish stocking and fishing facilities in North Dakota.

Dataset results for a search on “fishing.” Image captured by the author.

In this example, clicking a data source link takes you either to a website hosted by North Dakota or directly to a dataset file.

Clicking on CSV downloads a CSV file. When opened in Excel, the file shows data about fish that the state stocked in its lakes.

Subset of North Dakota fish stocking dataset. Image captured by the author.
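A downloaded CSV file can also be read in a program rather than in Excel. The following Python sketch totals stocked fish by species using only the standard library. The column names (`lake`, `species`, `number_stocked`) and the sample rows are hypothetical, since the real file's layout may differ; adjust them to match the header row of the file you download.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the real file's column names may differ.
SAMPLE_CSV = """lake,species,number_stocked
Lake A,Walleye,1000
Lake B,Northern Pike,250
Lake A,Walleye,500
"""


def total_by_species(csv_file) -> Counter:
    """Sum the number_stocked column per species from an open CSV file object."""
    totals = Counter()
    for row in csv.DictReader(csv_file):
        totals[row["species"]] += int(row["number_stocked"])
    return totals


# An in-memory file stands in for the downloaded CSV.
totals = total_by_species(io.StringIO(SAMPLE_CSV))
print(totals.most_common())
```

To run it against a real download, open the file with `open(path, newline="")` and pass the file object to `total_by_species`.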

API Information

Metadata API

Data.gov manages metadata about datasets, not the raw data. While the search facility makes it easy to find datasets of interest, the CKAN API and the CSW endpoint can be used in programs to query datasets and retrieve metadata. See Data Harvesting for more information.
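As a rough sketch of querying metadata from a program, the Python code below builds a CKAN `package_search` request and extracts dataset titles from the JSON response. The base URL is an assumption on my part (it is the catalog's CKAN action API path as I understand it), and the helper names are hypothetical. Check the Data.gov developer documentation for the current endpoint before relying on it.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL for the catalog's CKAN action API; confirm before use.
CKAN_BASE = "https://catalog.data.gov/api/3/action"


def build_search_url(query: str, rows: int = 5) -> str:
    """Build a package_search URL for a free-text query."""
    params = urllib.parse.urlencode({"q": query, "rows": rows})
    return f"{CKAN_BASE}/package_search?{params}"


def extract_titles(payload: dict) -> list:
    """Pull dataset titles out of a package_search response body."""
    if not payload.get("success"):
        return []
    return [pkg.get("title", "") for pkg in payload["result"]["results"]]


def search_titles(query: str, rows: int = 5) -> list:
    """Call the API (requires network access) and return matching dataset titles."""
    with urllib.request.urlopen(build_search_url(query, rows), timeout=30) as resp:
        return extract_titles(json.load(resp))
```

Calling `search_titles("fishing")` would return up to five dataset titles if the endpoint is reachable. Splitting URL construction and response parsing from the network call keeps each piece easy to test on its own.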

Dataset APIs

Most datasets are available as downloadable files. Fewer datasets have APIs that can be used to access data. To find the APIs or information about them, search the Data.gov dataset catalog with API selected in the Formats filter section.

Search results for “fishing” and formats that include “API.” Image captured by the author.
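The same filtering can be approximated in code. CKAN dataset records list their resources, each with a `format` and a `url`, so a program can pick out the API entries from a search result. The sketch below assumes API resources are labeled with the format `API`, as in the catalog's Formats filter. The sample record and its URLs are invented for illustration.

```python
def api_resources(package: dict) -> list:
    """Return (name, url) pairs for resources whose format is 'API'."""
    found = []
    for res in package.get("resources", []):
        # Normalize the format label, since casing and spacing may vary.
        if (res.get("format") or "").strip().upper() == "API":
            found.append((res.get("name", ""), res.get("url", "")))
    return found


# Hypothetical dataset record shaped like one entry of a CKAN search result.
sample_package = {
    "title": "Example Fish Distribution",
    "resources": [
        {"name": "Download", "format": "CSV", "url": "https://example.gov/fish.csv"},
        {"name": "Query service", "format": "API", "url": "https://example.gov/fish/api"},
    ],
}

print(api_resources(sample_package))
```

Filtering on the client side like this avoids depending on any particular server-side query syntax, at the cost of downloading records that have no API resources.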

Clicking the API link for the Great Smoky Mountains National Park Fish Distribution (2014) dataset opens a new browser tab. It displays a page with information about accessing the data.

ArcGIS page displayed when the API link for the Great Smoky Mountains National Park Fish Distribution (2015) dataset is clicked. Image captured by the author.

Upcoming Articles about Data.gov

Watch for upcoming articles about Data.gov such as these:

  • Write a program that uses the CKAN API to access dataset metadata.
  • Explore a variety of types of datasets.
  • Explore data tools to support the work of data practitioners.
  • Highlight interesting or unusual datasets.

Conclusion

Data.gov provides easy-to-find metadata about datasets managed by all levels of government within the United States. The platform simplifies the process of finding and acquiring data through downloads or APIs.

Data.gov is an example of government done well.

About the Author

Randy Runtsch is a writer, data engineer, data analyst, programmer, photographer, cyclist, and adventurer. He and his wife live in southeastern Minnesota, U.S.A.

Watch for Randy’s upcoming articles on public datasets to drive data analytics insights and decision-making, programming, data analytics, photography, bicycle touring, and more. You can see some of his photographs at shootproof.com and shutterstock.com.
