Francisco Luna

Summary

This webpage provides a list of five datasets that can be used to practice data cleaning, along with their respective links and brief descriptions.

Abstract

The webpage titled "5 Datasets to Practice Data Cleaning" offers a collection of datasets that can be used for data cleaning practice. Each dataset is accompanied by a brief description and a link to its source on Kaggle. The datasets cover various topics, including movies and TV shows from IMDb and Netflix, food choices of college students, data science job postings on Glassdoor, audiobooks from Audible.in, and salaries of different positions in the Mexican federal government. These datasets are ideal for anyone looking to improve their data cleaning skills and gain experience working with diverse datasets.

Bullet points

  • The webpage lists five datasets for practicing data cleaning.
  • The first dataset is about movies and TV shows from IMDb and Netflix.
  • The second dataset focuses on the food choices of college students.
  • The third dataset contains data science job postings from Glassdoor.
  • The fourth dataset is about audiobooks from Audible.in.
  • The fifth dataset provides information on salaries of different positions in the Mexican federal government.
  • All datasets are available on Kaggle.

5 Datasets to Practice Data Cleaning

Photo by Brooke Lark on Unsplash

1. Movies Dataset

This dataset was web-scraped from IMDb's listing of top Netflix movies and TV shows.

Link: https://www.kaggle.com/datasets/bharatnatrayn/movies-dataset-for-feature-extracion-prediction?select=movies.csv
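Scraped text like this often carries stray newlines, padded whitespace, year ranges inside parentheses and numbers with thousands separators. The sketch below shows a few helper functions for those cases. It assumes, without checking the actual file, that the raw values look like the hypothetical samples used at the bottom; column names and formats in movies.csv may differ.

```python
import re
from typing import List, Optional


def clean_text(value: str) -> str:
    """Collapse runs of whitespace (including newlines) into one space and trim the ends."""
    return re.sub(r"\s+", " ", value).strip()


def parse_genres(value: str) -> List[str]:
    """Split a comma-separated genre string into a list of trimmed genre names."""
    return [genre.strip() for genre in clean_text(value).split(",") if genre.strip()]


def parse_start_year(value: str) -> Optional[int]:
    """Return the first four-digit year in strings like '(2010-2022)', or None if absent."""
    match = re.search(r"\d{4}", value)
    return int(match.group()) if match else None


def parse_votes(value: str) -> Optional[int]:
    """Convert a vote count such as '21,062' to an int; blanks and non-numeric text give None."""
    digits = value.replace(",", "").strip()
    return int(digits) if digits.isdecimal() else None


# Hypothetical raw values, assumed to resemble the scraped columns.
print(parse_genres("\nAction, Horror, Thriller            "))  # ['Action', 'Horror', 'Thriller']
print(parse_start_year("(2010-2022)"))  # 2010
print(parse_votes("21,062"))  # 21062
print(parse_votes(""))  # None
```

Returning None instead of raising keeps a single bad cell from stopping a whole pass over the file, so you can count and inspect the failures afterwards.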

2. Food choices

This dataset covers the food choices of college students. The inspiration behind it is to understand how important nutrition information is to them.

Link: https://www.kaggle.com/datasets/borapajo/food-choices?select=food_coded.csv
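Survey data usually mixes numeric answers with free text in the same column. A common first step is to parse what you can, set aside what you cannot, and then decide how to fill the gaps. The sketch below assumes a hypothetical free-text GPA column with the made-up values shown; it does not reflect the real contents of food_coded.csv, and median imputation is just one possible choice.

```python
from statistics import median
from typing import List, Optional, Tuple


def coerce_numeric(values: List[str]) -> Tuple[List[Optional[float]], List[str]]:
    """Parse each string as a float; unparseable entries become None and are also returned for review."""
    cleaned: List[Optional[float]] = []
    rejected: List[str] = []
    for value in values:
        try:
            cleaned.append(float(value.strip()))
        except ValueError:
            cleaned.append(None)
            rejected.append(value)
    return cleaned, rejected


def fill_with_median(values: List[Optional[float]]) -> List[float]:
    """Replace None with the median of the present values.

    Raises statistics.StatisticsError if every value is missing.
    """
    present = [v for v in values if v is not None]
    fill = median(present)
    return [fill if v is None else v for v in values]


# Hypothetical survey answers for a free-text GPA column.
raw_gpa = ["3.0", " 3.5", "Personal", "", "4.0"]
gpa, rejected = coerce_numeric(raw_gpa)
print(gpa)  # [3.0, 3.5, None, None, 4.0]
print(rejected)  # ['Personal', '']
print(fill_with_median(gpa))  # [3.0, 3.5, 3.5, 3.5, 4.0]
```

Keeping the rejected strings lets you review them before deciding whether they should be fixed by hand, dropped or imputed.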

3. Data Science Job Posting on Glassdoor

The title is self-explanatory: this dataset contains job postings for data scientists on Glassdoor.

Link: https://www.kaggle.com/datasets/rashikrahmanpritom/data-science-job-posting-on-glassdoor?select=Uncleaned_DS_jobs.csv
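Job-posting data often packs several values into one text field. This sketch assumes, as a hypothetical, that salary estimates look like "$137K-$171K (Glassdoor est.)" and that some company names have a rating appended after a newline, as in "Example Corp\n3.1". Check those formats against Uncleaned_DS_jobs.csv before relying on the functions.

```python
import re
from typing import Optional, Tuple

# Matches ranges like "$137K-$171K"; anything after the range is ignored.
SALARY_RE = re.compile(r"\$(\d+)K\s*-\s*\$(\d+)K")


def parse_salary_range(text: str) -> Optional[Tuple[int, int]]:
    """Extract (min, max) salary in dollars from a string like '$137K-$171K (Glassdoor est.)'."""
    match = SALARY_RE.search(text)
    if match is None:
        return None
    low, high = (int(group) * 1000 for group in match.groups())
    return low, high


def split_company_rating(text: str) -> Tuple[str, Optional[float]]:
    """Split 'Company\\n3.1' into ('Company', 3.1); return (name, None) when no rating is appended."""
    name, sep, rating = text.rpartition("\n")
    if sep and re.fullmatch(r"\d+(?:\.\d+)?", rating.strip()):
        return name.strip(), float(rating)
    return text.strip(), None


# Hypothetical raw values.
print(parse_salary_range("$137K-$171K (Glassdoor est.)"))  # (137000, 171000)
print(split_company_rating("Example Corp\n3.1"))  # ('Example Corp', 3.1)
print(split_company_rating("Acme Analytics"))  # ('Acme Analytics', None)
```

Splitting the rating into its own field gives you a numeric column you can analyze separately from the company name.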

4. Audible Dataset

The data were scraped from the Audible.in website. The dataset also includes the code used to collect the data.

Link: https://www.kaggle.com/datasets/snehangsude/audible-dataset?select=audible_uncleaned.csv
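Scraped listing pages tend to leave labels and units glued to the values. The sketch below assumes hypothetical formats: author fields prefixed with "Writtenby:" and written in CamelCase, durations such as "2 hrs and 20 mins", and ratings such as "4.5 out of 5 stars41 ratings". These are illustrations and may not match audible_uncleaned.csv exactly.

```python
import re
from typing import Optional, Tuple


def clean_person(text: str, prefix: str) -> str:
    """Drop a label such as 'Writtenby:' and insert a space between glued CamelCase words."""
    name = text[len(prefix):] if text.startswith(prefix) else text
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name)


def duration_to_minutes(text: str) -> Optional[int]:
    """Convert strings like '2 hrs and 20 mins' or '45 mins' to total minutes; None if no match."""
    hours = re.search(r"(\d+)\s*hrs?\b", text)
    minutes = re.search(r"(\d+)\s*mins?\b", text)
    if hours is None and minutes is None:
        return None
    total = int(hours.group(1)) * 60 if hours else 0
    return total + (int(minutes.group(1)) if minutes else 0)


STARS_RE = re.compile(r"([\d.]+) out of 5 stars\s*([\d,]+) ratings?")


def parse_stars(text: str) -> Tuple[Optional[float], Optional[int]]:
    """Split '4.5 out of 5 stars41 ratings' into (4.5, 41); unmatched text gives (None, None)."""
    match = STARS_RE.search(text)
    if match is None:
        return None, None
    return float(match.group(1)), int(match.group(2).replace(",", ""))


# Hypothetical raw values.
print(clean_person("Writtenby:JaneDoe", "Writtenby:"))  # Jane Doe
print(duration_to_minutes("2 hrs and 20 mins"))  # 140
print(parse_stars("4.5 out of 5 stars41 ratings"))  # (4.5, 41)
print(parse_stars("Not rated yet"))  # (None, None)
```

Converting durations to a single unit (minutes) makes them sortable and comparable, which the original text form is not.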

5. Mexican Federal Government Salaries

This dataset contains the salaries of different positions in the federal government of Mexico.

Link: https://www.kaggle.com/datasets/ivansabik/mexican-federal-government-salaries
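Whichever of these datasets you start with, a quick audit of missing values and duplicate rows is a useful first step. The sketch below makes no assumption about column names. It works on any iterable of CSV lines, so you can pass an open file directly. The sample table and its "position" and "salary" columns are made up for illustration.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable


def audit_csv(lines: Iterable[str]) -> Dict[str, object]:
    """Count rows, blank cells per column, and duplicate rows (compared after trimming whitespace)."""
    reader = csv.DictReader(lines)
    fields = reader.fieldnames or []
    missing: Counter = Counter()
    seen: Counter = Counter()
    total = 0
    for row in reader:
        total += 1
        # Short rows get None for absent fields; treat that the same as an empty string.
        values = tuple((row.get(field) or "").strip() for field in fields)
        for field, value in zip(fields, values):
            if not value:
                missing[field] += 1
        seen[values] += 1
    duplicates = sum(count - 1 for count in seen.values())
    return {"rows": total, "missing_by_column": dict(missing), "duplicate_rows": duplicates}


# Hypothetical sample table.
sample = "position,salary\nAnalista,15000\nAnalista,15000\nDirector,\n"
print(audit_csv(io.StringIO(sample)))
# {'rows': 3, 'missing_by_column': {'salary': 1}, 'duplicate_rows': 1}
```

For a downloaded file, something like `with open("salaries.csv", newline="", encoding="utf-8") as f: print(audit_csv(f))` should work, where "salaries.csv" is a placeholder name. Adjust the encoding if the file is not UTF-8.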
