Dummy data is randomly generated data that can be substituted for live data. Whether you are a developer, software engineer, or data scientist, sometimes you need dummy data to test what you have built, be it a web app, a mobile app, or a machine learning model.
If you are using Python, you can use the Faker package to create dummy data of many types, for example dates, transactions, names, text, and times. Faker is a simple Python package that generates fake data of different data types.
The Faker package is heavily inspired by PHP Faker, Perl Faker, and Ruby Faker.
In this article, you will learn different ways to create dummy data with the Faker Python package.
You can install the package with pip as follows:
pip install Faker
Note: From version 4.0.0, Faker dropped support for Python 2 and from version 5.0.0 it only supports Python 3.6 and above.
To create and initialize a Faker generator, create an instance of the Faker class.
from faker import Faker
fake = Faker()
Now you can start creating whatever dummy data you need.
You can use the name() method to create full fake names.
for _ in range(10):
    print(fake.name())
Mathew Brown
Mrs. Julie Chavez
Calvin Little
Manuel Ponce
Alyssa Jackson DVM
Amy Delgado
Matthew Smith
Sarah Rojas
Crystal Werner
Tina Moore
Note: You can also use the first_name() method to create the first name and the last_name() method to create the last name.
If you are working with dates, Faker provides different ways to create fake dates and times. The following examples show 10 different ways to create dummy date and time data.
print(fake.date_between(start_date="-3y", end_date="-1y"))  # a date between 3 years ago and 1 year ago
print(fake.month())
print(fake.date_time())
print(fake.year())
print(fake.month_name())
print(fake.date_time_this_year())
print(fake.time())
print(fake.timezone())
print(fake.day_of_week())
print(fake.time_object())
2019-05-31
02
2012-05-31 17:53:01
2002
November
2021-06-30 00:34:48
08:17:51
Africa/Gaborone
Thursday
17:59:37
If you want to create fake personal and identity information, you can use the profile() and simple_profile() methods from the Faker library.
The simple_profile() method creates a basic fake profile with personal information such as username, name, sex, address, email, and birthdate.
generateProfile = Faker()
generateProfile.simple_profile()
{'username': 'qfowler',
'name': 'Matthew Greene',
'sex': 'M',
'address': 'USNV Lopez\nFPO AA 45803',
'mail': '[email protected]',
'birthdate': datetime.date(1995, 8, 14)}
The profile() method creates a fuller fake profile: on top of the simple profile fields, it adds fields such as job, company, residence, blood_group, current_location, and others.
generateProfile.profile()
{'job': 'Designer, television/film set',
'company': 'Murillo, Short and Townsend',
'ssn': '893-14-6729',
'residence': '6596 Daniel Spring Suite 910\nJonesborough, ID 59049',
'current_location': (Decimal('4.2622025'), Decimal('-39.109752')),
'blood_group': 'O-',
'website': ['https://hardin-johnson.org/',
'https://patterson.com/',
'https://george-snyder.info/'],
'username': 'samuelbooth',
'name': 'Shawna Spencer',
'sex': 'F',
'address': '125 Darrell Extension Suite 575\nPort Michaelbury, PA 12381',
'mail': '[email protected]',
'birthdate': datetime.date(1989, 11, 25)}
You can also create many profiles and save them in a pandas DataFrame for analysis. In the following example, we create 1000 profiles with just 3 lines of code.
import pandas as pd
generateProfile = Faker()
# generate 1000 profiles
data = [generateProfile.profile() for i in range(1000)]
# save profiles in pandas dataframe
df = pd.DataFrame(data)
print(df)
Let’s observe the column names of the 1000 profiles created.
print(df.columns)
Index(['job', 'company', 'ssn', 'residence', 'current_location', 'blood_group',
'website', 'username', 'name', 'sex', 'address', 'mail', 'birthdate'], dtype='object')
We have 13 columns in the dataset. Now you can use the dummy data you generated for data analysis and visualization.
If you are working on a software project, you can use the Faker library to generate fake text data to test some features in your web or mobile app. The Faker library provides 4 different methods to create text data as follows.
(a) Create a Single Paragraph
generateText = Faker()
generateText.text()
'Goal everything traditional to. Suggest stage stop international. Hold line south across new charge national.\nClose money commercial success force. Five decision even environment notice every.'
(b) Create Multiple Paragraphs
generateTexts = Faker()
generateTexts.texts()
['Together require growth wind picture raise. Production task tree consumer recognize personal.',
'Be six whose answer. Mr oil successful under particular option.\nStep nor once rise. Eye thank try stay only test service. Then senior within capital action. Gun already entire sign garden.',
'Painting now term direction. Will inside natural bar purpose major.\nOther hear subject do their. Institution between education would laugh example on. Real statement kid specific able foreign.']
(c) Create a Single Sentence
generateSentence = Faker()
generateSentence.sentence()
'Pass front responsibility.'
(d) Create Multiple Sentences
generateSentences = Faker()
generateSentences.sentences()
['Maintain take star someone could kitchen employee.',
'Pay should own word begin.',
'Citizen place although old despite stay.']
The Faker library supports the creation of localized data. Pass the locale as an argument to the Faker class; by default, it uses the en_US locale.
A list of the supported locales and their localized providers is available in the Faker documentation.
In the following example, we create 10 Chinese names using the zh_CN locale.
fake_local = Faker('zh_CN')
for _ in range(10):
    print(fake_local.name())
李小红
赵桂香
陈小红
罗建华
宋华
刘秀芳
郭秀华
朱秀云
金艳
侯琴
Since version 3.0.0, you can also pass a list of locales to use several of them at once.
multiple_fake = Faker(['uk_UA', 'en_US', 'ja_JP'])
for _ in range(10):
    print(multiple_fake.city())
長生郡長生村
Christieland
Rileyshire
長生郡白子町
Port Curtisborough
Pruittview
селище Одарка
хутір Богодар
село Альберт
横浜市都筑区
In the above example, we generated city names from 3 different locales (uk_UA, en_US, and ja_JP).
To get the same fake data every time you run your code, seed the generator first; the same seed then produces the same output.
myGenerator = Faker()
myGenerator.random.seed(1234)
for _ in range(10):
    print(myGenerator.country())
Slovakia (Slovak Republic)
Kazakhstan
Brazil
Albania
Bermuda
United States Minor Outlying Islands
Western Sahara
Wallis and Futuna
Sri Lanka
Mozambique
Note: You can use any integer as the seed, as long as you use the same value each time you want to reproduce the output.
If you learned something new or enjoyed reading this article, please share it so that others can see it. Until then, see you in the next post!
You can also find me on Twitter @Davis_McDavid.