Python Faker — How to Generate Fake Data Easily
This article is part of the Python Libraries Series. Find more below!
I’ve talked a lot about data science previously. One common need in data science is to gather a lot of data to train models. More generally, generating realistic sample data is a common need in software development. From building a prototype to testing an application, or populating a database for demonstration purposes, generating fake data can save you time and effort.
Today, we’ll discover Faker, a library you can use for this task!
Getting Started with Faker
Let’s start with installing Faker.
pip install Faker
Depending on your environment, Faker may occasionally fail to work. I don’t know why. For me, it works everywhere except when I run it from a PyCharm environment. If it doesn’t work, try reinstalling it or creating a new environment.
Now, let’s discover the basic Faker syntax.
Faker provides a wide range of data types that you can generate, including names, addresses, phone numbers, email addresses, dates, and much more. To generate fake data, you create an instance of the Faker class and call its methods to generate specific data types.
from faker import Faker
fake = Faker()
name = fake.name()
print(name)
Faker also allows you to customize the generated data to suit your needs. You can specify the locale, which determines the language and region of the generated data. For example, if you want to generate data in French, you can pass the locale as a parameter when creating the Faker instance:
from faker import Faker
fake = Faker('fr_FR')
name = fake.name()
print(name)
In addition to locales, Faker provides various methods to customize the generated data. You can set the seed value to generate the same data repeatedly, generate random numbers within a specific range, and even create custom providers to generate data specific to your domain.
Generating Basic Sample Data
We’ve seen how to generate a sample name. We can also generate a company name using fake.company() .
company_name = fake.company()
print(company_name)
In addition to names, you can generate random addresses with fake.address(), phone numbers with fake.phone_number(), and emails with fake.email(). It’s always the same syntax!
Working with Specific Data Types
In addition to generating basic sample data like names and addresses, Python Faker can also generate specific data types, such as dates, times, text, and numbers.
Let’s start with dates and times. To generate a random date, you can use the date_between method. Look, here is the syntax:
from datetime import date
from faker import Faker
fake = Faker()
# Pass date objects as bounds; relative strings such as "-30d" also work
start_date = date(2022, 1, 1)
end_date = date(2022, 12, 31)
random_date = fake.date_between(start_date=start_date, end_date=end_date)
print(random_date)
For generating times, it’s even easier: we just have to use fake.time().
Then, for creating dummy text and paragraphs, we have some options.
First, we can use fake.word() to generate a single word. More interesting, we can use fake.sentence() to generate a sentence. And even more interesting, we can generate complete paragraphs with fake.paragraph() !
paragraph = fake.paragraph(nb_sentences=3)
print(paragraph)
And here is the kind of output you get (the words are random, so yours will differ):
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam condimentum risus sed velit porttitor, eu rhoncus lorem laoreet. Sed tempor erat ac erat semper, at interdum est euismod.
Then, we can also generate numbers or currencies. It’s as easy as before:
random_number = fake.random_number(digits=3)
print(random_number)
# 724
For generating currency values, you can use the currency_code and currency_symbol methods.
currency_code = fake.currency_code()
print(currency_code)
# USD
currency_symbol = fake.currency_symbol()
print(currency_symbol)
# $
Customizing and Localizing Sample Data
To ensure consistent data generation, you can set a seed value in Python Faker. This seed value guarantees that every time you run your code, the same set of random data will be generated.
It’s particularly useful for testing, or for showing your code to someone.
from faker import Faker
fake = Faker()
# seed() is a class method: calling fake.seed() on the instance raises an error in current versions
Faker.seed(1234)
name = fake.name()
print(name)
address = fake.address()
print(address)
Then, Faker allows you to customize the generated data by applying specific patterns. This customization can be useful when you need to generate data that follows a particular format or structure.
For example, let’s say you want to generate random phone numbers in a specific format. You can use the numerify method, which replaces hash marks (#) in a string with random digits. Here's an example:
phone_number = fake.numerify(text="###-###-####")
print(phone_number)
# 987-654-3210
Similarly, if you want to generate random email addresses with a specific domain, you can use the bothify method. It replaces question marks (?) in a string with random letters and hash marks (#) with random digits. Here's an example:
email_address = fake.bothify(text="[email protected]")
print(email_address)
# [email protected]
Finally, we’ve already talked about localizing data, but I’ll show it to you one more time with an example so that you really see how it works.
from faker import Faker
fake = Faker("fr_FR")
name = fake.name()
print(name)
address = fake.address()
print(address)
phone_number = fake.phone_number()
print(phone_number)
And here is the output:
Raymond Noël de la Garcia
15, avenue Leconte
31477 Henry-sur-Cordier
0486538507
Raymond Noël, a perfectly French name!
Advanced Features — Providers
Let’s say you want to generate data with complex relationships between entities. This can be beneficial when creating interconnected datasets.
For instance, if you need to generate data with relationships between customers and orders, you can subclass the BaseProvider class from the faker.providers module. This lets you define custom data providers that generate data with specific relationships.
For example, to generate customers and orders with a one-to-many relationship:
from faker import Faker
from faker.providers import BaseProvider
fake = Faker()
class CustomProvider(BaseProvider):
    def customer(self):
        # self.generator is the Faker generator this provider is attached to
        return {
            "name": self.generator.name(),
            "email": self.generator.email(),
            "address": self.generator.address()
        }
    def order(self, customer):
        # The order keeps a reference to the customer it belongs to
        return {
            "customer": customer,
            "product": self.generator.word(),
            "quantity": self.generator.random_int(min=1, max=10)
        }
fake.add_provider(CustomProvider)
customer = fake.customer()
order = fake.order(customer)
print(customer)
print(order)
Here is a sample output, if you’re curious:
>>> print(customer)
{'name': 'Courtney Wolfe', 'email': '[email protected]', 'address': '7284 Daniel Islands\nNorth Eddie, KS 45545'}
>>> print(order)
{'customer': {'name': 'Courtney Wolfe', 'email': '[email protected]', 'address': '7284 Daniel Islands\nNorth Eddie, KS 45545'}, 'product': 'inside', 'quantity': 5}
Then, you can handle unique constraints and data validation with Faker. This ensures that the generated data adheres to specific rules.
To handle unique constraints, you can use the unique attribute of a Faker instance. Any provider method called through fake.unique returns a value that this instance has not returned before. Here's an example:
from faker import Faker
fake = Faker()
# fake.unique is a proxy object, not a decorator: call provider methods through it
username1 = fake.unique.user_name()
username2 = fake.unique.user_name()
print(username1)
print(username2)
For data validation, Faker does not ship validators as such, but the faker.utils.decorators module provides small helpers, such as slugify, that normalize what a provider method returns. Here’s a sample code:
from faker import Faker
from faker.providers import BaseProvider
from faker.utils import decorators
fake = Faker()
class CustomProvider(BaseProvider):
    # title_slug is an illustrative name; calling it slug would shadow Faker's built-in fake.slug()
    # slugify lowercases the returned text and joins its words with hyphens
    @decorators.slugify
    def title_slug(self):
        return self.generator.sentence(nb_words=3)
fake.add_provider(CustomProvider)
slug = fake.title_slug()
print(slug)
# e.g. give-improve-happy
Find more in the doc!
More Examples
Let’s end this article with some more examples.
Generating e-commerce data:
from faker import Faker
fake = Faker()
product_name = fake.word()
product_description = fake.paragraph()
price = fake.random_int(min=10, max=100)
customer_name = fake.name()
customer_email = fake.email()
print(product_name)
print(product_description)
print(price)
print(customer_name)
print(customer_email)
Generating user data for social media platforms:
from faker import Faker
fake = Faker()
username = fake.user_name()
email = fake.email()
birthdate = fake.date_of_birth(minimum_age=18, maximum_age=65)
profile_picture = fake.image_url()
print(username)
print(email)
print(birthdate)
print(profile_picture)
Generating sample data for financial applications:
from faker import Faker
fake = Faker()
account_number = fake.bban()
transaction_amount = fake.random_int(min=10, max=1000)
currency_code = fake.currency_code()
print(account_number)
print(transaction_amount)
print(currency_code)
As you can see in the examples, there are some methods I have not talked about. That’s why I think you should have a look at the documentation. I can’t cover everything about Faker here; it would be too long.
Final Note
I discovered Faker only a few weeks ago and I already love it. It would have saved me so much time if I had known about this library before.
That’s why I’m sharing it with you. It is not very well known and yet it is very useful for some specific tasks. Now, I hope you’ll have fun generating data with Faker!