How to Generate Realistic (Fake) Data for Your Projects Using Mockaroo
Mockaroo, a beginner-friendly web UI for generating randomized datasets, lets aspiring data professionals simulate realistic data for analyses and ML models. In this article, we’ll use Mockaroo to create a sample dataset for a data engineering project.

Disclaimer: I’m not affiliated with Mockaroo; I’m just an admirer of the tool.
Purpose
Mockaroo’s free version lets users generate up to 1,000 rows of data in formats such as CSV, XML, JSON and SQL, choosing from 150+ data types. Before I found Mockaroo, I downloaded data from Kaggle. Although Kaggle offers a wide range of datasets, many of them are already preprocessed, which deprives beginning data engineers, scientists and analysts of an authentic development experience. I relied on Mockaroo especially while learning SQL, because it was one of the few tools I knew of that could generate data and export it as SQL.
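Since the SQL format produces SQL statements, you can practice queries on an export without installing a database server. Below is a minimal sketch that loads a small, hand-written stand-in for a SQL export into an in-memory SQLite database using Python’s built-in `sqlite3` module. The table name `MOCK_DATA`, its columns and its rows are hypothetical examples, not actual Mockaroo output; a real export’s statements and types depend on your schema and export settings.

```python
import sqlite3

# Hypothetical stand-in for a SQL export: a CREATE TABLE statement
# followed by INSERT statements. This is NOT real Mockaroo output.
SAMPLE_EXPORT = """
create table MOCK_DATA (
    id INT,
    first_name VARCHAR(50),
    department VARCHAR(50),
    salary INT
);
insert into MOCK_DATA (id, first_name, department, salary) values (1, 'Ada', 'Engineering', 98000);
insert into MOCK_DATA (id, first_name, department, salary) values (2, 'Grace', 'Engineering', NULL);
insert into MOCK_DATA (id, first_name, department, salary) values (3, 'Alan', 'Research', 87000);
insert into MOCK_DATA (id, first_name, department, salary) values (4, 'Edsger', NULL, 91000);
"""


def load_export(sql_text: str) -> sqlite3.Connection:
    """Run every statement in the export against a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(sql_text)
    return conn


conn = load_export(SAMPLE_EXPORT)

# Practice query: average salary per department.
# AVG ignores NULL salaries on its own; rows with a NULL department are excluded.
avg_by_dept = conn.execute(
    """
    SELECT department, AVG(salary) AS avg_salary
    FROM MOCK_DATA
    WHERE department IS NOT NULL
    GROUP BY department
    ORDER BY department
    """
).fetchall()

# Blank values arrive as NULL, which Python's sqlite3 returns as None.
missing_dept = conn.execute(
    "SELECT COUNT(*) FROM MOCK_DATA WHERE department IS NULL"
).fetchone()[0]

print(avg_by_dept)
print(missing_dept)
```

Using `:memory:` keeps the practice database disposable; pointing `sqlite3.connect` at a file path instead would keep it between sessions.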
How It Works: Mockaroo

The first step in using Mockaroo is to define a schema: a list of fields, each with a name and a data type. Mockaroo supports standard data types such as strings and integers, as well as specialized ones such as healthcare codes and cryptocurrency values. For data engineers looking for data preprocessing experience, one of Mockaroo’s best features is the ability to set the percentage of blank values in each field. Users can also apply Mockaroo’s conditional syntax to create custom values within the provided fields.
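To make the blank-percentage setting concrete, here is a small sketch of its effect on a single column: a chosen share of values is replaced with blanks. This is an illustration only, not Mockaroo’s implementation. The helper name `inject_blanks` is hypothetical, and this version blanks an exact share of rows at random positions, whereas a generator might instead treat the percentage as a per-row probability.

```python
import random
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def inject_blanks(values: Sequence[T], blank_pct: float, seed: int = 0) -> list[Optional[T]]:
    """Return a copy of `values` with round(len * blank_pct / 100) entries set to None.

    Hypothetical helper that imitates a "% blank" option on a generated field.
    """
    if not 0 <= blank_pct <= 100:
        raise ValueError("blank_pct must be between 0 and 100")
    rng = random.Random(seed)  # seeded so the result is reproducible
    n_blank = round(len(values) * blank_pct / 100)
    blank_positions = set(rng.sample(range(len(values)), n_blank))
    return [None if i in blank_positions else v for i, v in enumerate(values)]


ages = list(range(20, 60))  # 40 synthetic ages
ages_with_blanks = inject_blanks(ages, blank_pct=25, seed=42)
print(sum(v is None for v in ages_with_blanks))  # 10 of 40 values are blank
```

Seeding the random generator makes the blanks land in the same rows every run, which helps when you want to compare cleaning strategies on identical data.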

I chose my fields and set the share of blank values to between 5% and 30% per column. When I import the data into Python with pandas, those blank cells come through as NaN values that I can filter out or fill in; in SQL, they would be NULL instead. In addition to standard data types, I also generated a column of empty arrays.
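Here is a minimal sketch of what those blanks look like once they reach pandas. It reads a tiny, hand-written CSV (a hypothetical stand-in for an export, with made-up column names) in which some cells are empty. `pandas.read_csv` treats empty fields as missing by default, so they arrive as `NaN`, and you can then count, drop or fill them.

```python
import io

import pandas as pd

# Hypothetical CSV with empty cells, standing in for an export with blank values.
csv_text = """id,first_name,department,salary
1,Ada,Engineering,98000
2,Grace,Engineering,
3,Alan,,87000
4,Edsger,Research,91000
"""

df = pd.read_csv(io.StringIO(csv_text))

# Empty cells are read as NaN; count them per column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Option 1: drop rows that are missing a salary.
with_salary = df.dropna(subset=["salary"])

# Option 2: fill missing values instead of dropping rows
# (a placeholder label for text, the median for numbers).
filled = df.fillna({"department": "Unknown", "salary": df["salary"].median()})
print(filled)
```

Which option fits depends on the analysis: dropping rows is simplest, while filling keeps every record at the cost of introducing assumed values.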

The final step is to download the data and import it into Python as a DataFrame, which you can use to practice cleaning, sorting and aggregating realistic data.
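A short sketch of that workflow with pandas, covering one cleaning step, a sort and an aggregation. The file name `MOCK_DATA.csv`, the function name `summarize_by_department` and the columns are assumptions for illustration. The function accepts any path or file-like object that `pandas.read_csv` accepts, so the usage example passes an in-memory CSV instead of a downloaded file.

```python
import io

import pandas as pd


def summarize_by_department(source) -> pd.DataFrame:
    """Load a CSV export, clean it, and aggregate salaries per department.

    `source` can be a path such as "MOCK_DATA.csv" (hypothetical file name)
    or any file-like object accepted by pandas.read_csv.
    """
    df = pd.read_csv(source)

    # Cleaning: normalize department names, then drop rows missing key fields.
    df["department"] = df["department"].str.strip().str.title()
    df = df.dropna(subset=["department", "salary"])

    # Aggregation: head count and mean salary per department.
    summary = (
        df.groupby("department")
        .agg(employees=("id", "count"), mean_salary=("salary", "mean"))
        .reset_index()
    )

    # Sorting: highest mean salary first.
    return summary.sort_values("mean_salary", ascending=False, ignore_index=True)


# Usage with a small in-memory CSV instead of a downloaded file.
sample = io.StringIO(
    "id,first_name,department,salary\n"
    "1,Ada,engineering,98000\n"
    "2,Grace, Engineering ,102000\n"
    "3,Alan,research,87000\n"
    "4,Edsger,,91000\n"
    "5,Barbara,Research,\n"
)
print(summarize_by_department(sample))
```

Normalizing text before grouping matters here: without the strip and title-case step, "engineering" and " Engineering " would be counted as separate departments.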

Mockaroo API
While Mockaroo’s flagship product is its static data generator, it also lets users create mock APIs, so developers can prototype and preview how their APIs will behave before deployment. Users can define a dummy URL, path variables, query strings and entity bodies just as they would for a real API.
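To show what those pieces mean, here is a self-contained local sketch; it is not Mockaroo’s service or API, which is configured through its web UI. It uses a hypothetical route `/users/<id>` with a path variable, parses the query string with Python’s standard `urllib.parse`, and returns a canned JSON entity body. The route, the `fields` parameter, the domain and the response shape are all invented for illustration.

```python
import json
import re
from urllib.parse import parse_qs, urlsplit

# Hypothetical mock route: GET /users/<id> returns a canned user record.
ROUTE = re.compile(r"^/users/(?P<id>\d+)$")

FAKE_USERS = {
    42: {"id": 42, "first_name": "Ada", "email": "ada@example.com"},
}


def handle(method: str, url: str) -> tuple[int, str]:
    """Return (status code, JSON body) for a request against the mock route."""
    parts = urlsplit(url)
    match = ROUTE.match(parts.path)
    if method != "GET" or match is None:
        return 404, json.dumps({"error": "not found"})

    user_id = int(match.group("id"))  # path variable
    query = parse_qs(parts.query)     # query string, e.g. ?fields=id,email
    user = FAKE_USERS.get(user_id)
    if user is None:
        return 404, json.dumps({"error": "no such user"})

    # Optional ?fields=a,b parameter limits which fields are returned.
    if "fields" in query:
        wanted = query["fields"][0].split(",")
        user = {k: v for k, v in user.items() if k in wanted}
    return 200, json.dumps(user)


status, body = handle("GET", "https://mock.example.com/users/42?fields=id,email")
print(status, body)
```

A front end or client script can be written against responses like these before the real backend exists, which is the same idea a hosted mock API serves.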

Take-Away
Mockaroo and similar tools help beginning data engineers and data science students generate data quickly, so they can focus on the programming, logical and analytical skills needed to excel in a data career. Although Mockaroo is a good place to generate dummy data quickly or to learn about data types, to advance in any data discipline you’ll want to work toward creating your own datasets through aggregation and web scraping.