In 2022, Gartner named Microsoft, with Power BI, a Leader in its Magic Quadrant for Analytics and Business Intelligence Platforms.
With the aid of business intelligence tools like Microsoft Power BI, raw data can be extracted, cleaned, and analyzed to produce insights that help organizations make data-driven decisions.
In this article, we will look at the 13 Best Datasets for Power BI Practice, which are essential in helping data professionals build their proficiency in Power BI.
The Sample Superstore Sales dataset provides sales data for a fictional retail company, including information on products, orders and customers.
This dataset includes the following variables:
Adventure Works DW is a sample data warehouse database commonly used with Microsoft SQL Server Analysis Services (SSAS). It offers a dimensional data model for a fictional bicycle manufacturer, Adventure Works Cycles. It also comprises information on product catalogues, sales, customer demographics, and time-based data for analysis and reporting.
This dataset includes the following variables:
Customer - This includes customer demographics, such as age, gender, education, and income.
Sales - This includes sales information, such as sales territory, salesperson, and order date.
Product - This includes product categories, subcategories, and product names.
Date - This includes the date and related attributes such as quarter, month, day, and day of the week.
Geography - This includes customers' state, city, postal code and sales orders.
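To make the idea of a dimensional model concrete, here is a minimal sketch in plain Python of how a sales fact table can be joined to a product dimension and summarized by category. The table contents, key names (`product_key`, `sales_amount`), and the `sales_by_category` helper are hypothetical and only loosely modelled on the variables described above; they are not the actual Adventure Works schema.

```python
from collections import defaultdict

# Hypothetical dimension table: product key -> descriptive attributes.
dim_product = {
    1: {"name": "Road Bike", "category": "Bikes"},
    2: {"name": "Mountain Bike", "category": "Bikes"},
    3: {"name": "Helmet", "category": "Accessories"},
}

# Hypothetical fact table: one row per sales order line.
fact_sales = [
    {"product_key": 1, "order_date": "2013-01-15", "sales_amount": 1200.0},
    {"product_key": 2, "order_date": "2013-01-20", "sales_amount": 900.0},
    {"product_key": 3, "order_date": "2013-02-03", "sales_amount": 35.0},
    {"product_key": 3, "order_date": "2013-02-10", "sales_amount": 35.0},
]


def sales_by_category(facts, products):
    """Look up each fact row's product and sum sales per product category."""
    totals = defaultdict(float)
    for row in facts:
        category = products[row["product_key"]]["category"]
        totals[category] += row["sales_amount"]
    return dict(totals)


print(sales_by_category(fact_sales, dim_product))
# {'Bikes': 2100.0, 'Accessories': 70.0}
```

In Power BI, the same relationship would typically be defined between the tables in the data model, so that a visual can slice the sales total by any product attribute.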
To download this dataset, you can click
This real-world dataset comprises data on flight numbers, airlines, departure and arrival times, and the reasons for any delays or cancellations. With this dataset, Power BI users can perform data analysis and create interactive dashboards that identify the most common causes of flight disruptions by studying the frequency of delays and cancellations by airline.
It comprises the following variables:
NYC Taxi Data is a large and complex dataset that contains information on taxi trips in New York City, including trip durations, fare amounts, and pickup and drop-off locations. It covers millions of trips and spans several years, providing a rich source of information about urban mobility and transportation patterns in the city.
By analyzing this data, you can gain insights into various areas of the taxi industry in NYC. For example, you can visualize the distribution of trips over time and space, and identify hot spots of taxi activity in the city.
The dataset includes the following variables:
Trip Duration - The duration of the trip, in seconds.
Trip Distance - The distance travelled by taxi, in miles.
Number of Passengers - Total number of passengers in the taxi.
Fare Amount - The fare charged to the passenger, in dollars.
Payment Method - The method of payment used by passengers (e.g., credit card or cash).
Pickup and Drop-off Location - The GPS coordinates of the pickup and drop-off locations.
Trip Type - This indicates whether the trip is a dispatched trip (green taxi or for-hire) or a street hail (yellow taxi).
Pickup and Drop-off Time - The time and date at which the pickup and drop-off took place.
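As a rough illustration of how these variables combine, the sketch below counts trips by pickup hour and derives an average speed from trip distance and duration. The sample rows and the field names (`pickup_time`, `trip_duration_s`, `trip_distance_mi`, `fare_amount`) are made up for the example and will not match the column names in the published files exactly.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows; the real files contain millions of trips.
trips = [
    {"pickup_time": "2019-01-01 08:15:00", "trip_duration_s": 600,
     "trip_distance_mi": 2.0, "fare_amount": 9.5},
    {"pickup_time": "2019-01-01 08:40:00", "trip_duration_s": 1200,
     "trip_distance_mi": 5.0, "fare_amount": 18.0},
    {"pickup_time": "2019-01-01 17:05:00", "trip_duration_s": 900,
     "trip_distance_mi": 3.0, "fare_amount": 12.0},
]


def trips_per_hour(rows):
    """Count trips by the hour of day (0-23) in which the pickup happened."""
    hours = (
        datetime.strptime(r["pickup_time"], "%Y-%m-%d %H:%M:%S").hour
        for r in rows
    )
    return dict(Counter(hours))


def average_speed_mph(row):
    """Derive speed in miles per hour from distance (miles) and duration (seconds)."""
    return row["trip_distance_mi"] * 3600 / row["trip_duration_s"]


print(trips_per_hour(trips))         # {8: 2, 17: 1}
print(average_speed_mph(trips[0]))   # 12.0
```

A derived column such as average speed is also a handy sanity check: implausibly high or zero values usually point to bad duration or distance records that should be filtered out before building a dashboard.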
To download this dataset, click
The Global Superstore dataset is a simulation of retail sales operations with stores in multiple countries. It includes information about customers, orders and products, which is particularly useful for exploring retail sales data, as it offers a large and diverse set of data that can be used to analyze customer behaviour, product performance and sales patterns.
It comprises the following variables:
Order ID - A unique identifier for each order.
Order Date - The date and time the order was placed.
Ship Date - The date and time the order was shipped.
Ship Mode - The method used to ship the order (e.g. standard, express).
Customer ID - A unique identifier for each customer.
Customer Name - The full name of the customer.
Segment - The customer segment such as Home Office or Corporate.
Country - The country where the customer resides.
City - The city where the customer resides.
State - The state where the customer resides.
Postal Code - The postal code of the customer's residence.
Region - The geographic region where the customer resides.
Product ID - A unique identifier for each product.
Category - The broad product category, such as Furniture, Office Supplies, or Technology.
Sub-Category - The specific product sub-category, such as Chairs, Paper, or Phones.
Product Name - The name of the product.
Sales - The total sales revenue for the product.
Quantity - The number of units of the product sold.
Discount - The discount applied to the product.
Profit - The total profit earned from the product.
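One common exercise with these columns is a profit margin per category. The sketch below is a minimal, assumption-laden example: the four rows are invented, and `profit_margin_by_category` is a hypothetical helper, not something shipped with the dataset. It divides total profit by total sales within each category rather than averaging row-level ratios, so that large orders carry proportionally more weight.

```python
from collections import defaultdict

# Hypothetical order lines using the Category, Sales, and Profit columns.
orders = [
    {"Category": "Furniture", "Sales": 200.0, "Profit": 20.0},
    {"Category": "Furniture", "Sales": 300.0, "Profit": 30.0},
    {"Category": "Technology", "Sales": 500.0, "Profit": 125.0},
    {"Category": "Office Supplies", "Sales": 100.0, "Profit": -5.0},
]


def profit_margin_by_category(rows):
    """Return total profit divided by total sales for each category."""
    sales = defaultdict(float)
    profit = defaultdict(float)
    for row in rows:
        sales[row["Category"]] += row["Sales"]
        profit[row["Category"]] += row["Profit"]
    # Skip categories with zero sales to avoid dividing by zero.
    return {cat: profit[cat] / sales[cat] for cat in sales if sales[cat] != 0}


for category, margin in profit_margin_by_category(orders).items():
    print(f"{category}: {margin:.1%}")
```

In Power BI, the equivalent would usually be written as a measure so that the ratio is recalculated for whatever filters or slicers are active.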
To download this dataset, click
This comprehensive dataset provides historical weather information for the Seattle, Washington area. It can be used to study climate and weather patterns as well as the impact of weather on various industries and activities, such as tourism, agriculture, and transportation.
Some of the critical variables in the Seattle Weather Data include:
This dataset contains information on GDP, life expectancy, and literacy rates for various nations throughout the world. It also includes many economic and social variables.
Some of the variables included in this dataset are:
Gross Domestic Product (GDP)
Inflation
Unemployment rate
Government debt
Trade balance
Life expectancy
Infant mortality rate
Access to electricity
Literacy rate
Mobile cellular subscriptions
Note: The variables included in the dataset depend on the year and the country being analyzed.
You can download the dataset directly from the
The US Health Dataset provides comprehensive information on health behaviour and health status, including data on healthcare utilization, physical activity and chronic diseases. It can be used to study trends in public health and to investigate the impact of lifestyle and health behaviour on health outcomes.
The US Health Data is sourced from the Centers for Disease Control and Prevention (CDC), the National Center for Health Statistics (NCHS), and the Agency for Healthcare Research and Quality (AHRQ).
The common variables in this dataset include:
Demographic information - Age, gender, race, and ethnicity
Health status indicators - Self-reported health, chronic conditions, and disability
Healthcare utilization measures - Hospitalizations, emergency room visits, and primary care visits
Health behaviours - Smoking, exercise, and diet
Health outcomes - Life expectancy, mortality rates, and incidence of specific diseases
Healthcare costs - Total medical expenditures, out-of-pocket costs, and insurance coverage
Access to healthcare - Insurance coverage, availability of healthcare providers, and proximity to healthcare facilities
Note: Variables included in the US Health Dataset can vary depending on the data source.
The Stack Overflow Survey Results dataset contains responses from the annual Stack Overflow developer survey. It covers various aspects of developer experience, such as salary and compensation, preferred technologies, and work satisfaction. It can be used to explore and gain insights into the state of the developer community.
This dataset contains a large number of variables, including but not limited to the following:
Personal Information - Age, gender, country, and education level.
Employment - Type of employment, company size, and job title.
Development Experience - Years of experience, primary programming language, and development environment.
Salary and Compensation - Salary, currency, and benefits.
Work Satisfaction - Job satisfaction, career satisfaction, and job search.
Technology Usage - Preferred operating system, programming language, development environment, and tooling.
Community Involvement - Contributions to open-source projects, Stack Overflow reputation, and participation in developer communities.
The dataset can be downloaded directly from the
This popular open-source dataset offers information on the passengers onboard the Titanic ship when it sank on April 15, 1912.
Some of the variables included in the dataset:
PassengerId - A unique identifier for each passenger.
Survived - This shows whether the passenger survived or not (0 = No, 1 = Yes).
Pclass - A passenger's class (1 = 1st, 2 = 2nd, 3 = 3rd).
Name - A passenger's name.
Sex - A passenger's gender.
Age - A passenger's age.
SibSp - The number of siblings/spouses aboard.
Parch - The number of parents/children aboard.
Ticket - The ticket number.
Fare - The fare paid for the ticket.
Cabin - The cabin number.
Embarked - The port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton).
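A typical first analysis with these columns is the survival rate per group. The following is a small sketch under stated assumptions: the six rows are illustrative only, and `survival_rate_by` is a hypothetical helper. Because Survived is coded as 0 or 1, summing it within a group gives the number of survivors directly.

```python
from collections import defaultdict

# A handful of illustrative rows using the column names listed above.
passengers = [
    {"PassengerId": 1, "Survived": 0, "Pclass": 3, "Sex": "male"},
    {"PassengerId": 2, "Survived": 1, "Pclass": 1, "Sex": "female"},
    {"PassengerId": 3, "Survived": 1, "Pclass": 3, "Sex": "female"},
    {"PassengerId": 4, "Survived": 1, "Pclass": 1, "Sex": "female"},
    {"PassengerId": 5, "Survived": 0, "Pclass": 3, "Sex": "male"},
    {"PassengerId": 6, "Survived": 0, "Pclass": 3, "Sex": "male"},
]


def survival_rate_by(rows, key):
    """Return the share of survivors for each distinct value of `key`."""
    survivors = defaultdict(int)
    totals = defaultdict(int)
    for row in rows:
        survivors[row[key]] += row["Survived"]  # 0 or 1
        totals[row[key]] += 1
    return {group: survivors[group] / totals[group] for group in totals}


print(survival_rate_by(passengers, "Pclass"))  # {3: 0.25, 1: 1.0}
print(survival_rate_by(passengers, "Sex"))     # {'male': 0.0, 'female': 1.0}
```

With only six rows the rates are not meaningful; the point is the grouping pattern, which maps naturally onto a bar chart with Pclass or Sex on the axis.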
You can download the dataset on
The Wine Quality dataset contains information on red and white wine samples. The goal of this Power BI dataset is to classify the quality of the wine based on chemical properties like pH, density, alcohol content and citric acid content.
The common variables included in this dataset:
Fixed Acidity - The amount of fixed acids in the wine, expressed in g/dm^3.
Volatile Acidity - The amount of volatile acids in the wine, expressed in g/dm^3.
Citric Acid - The amount of citric acid in the wine, expressed in g/dm^3.
Residual Sugar - The amount of residual sugar in the wine, expressed in g/dm^3.
Chlorides - The amount of chloride in the wine, expressed in g/dm^3.
Free Sulfur Dioxide - The amount of free sulfur dioxide in the wine, expressed in mg/dm^3.
Total Sulfur Dioxide - The amount of total sulfur dioxide in the wine, expressed in mg/dm^3.
Density - The density of the wine, expressed in g/cm^3.
pH - The pH level of the wine.
Sulphates - The amount of sulphates in the wine, expressed in g/dm^3.
Alcohol - The alcohol content of the wine, expressed in % vol.
Quality - The quality rating of the wine, on a scale of 0 to 10.
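Before attempting any classification, it can help to bucket the quality score and compare a chemical property across the buckets. The sketch below does this for alcohol content. The sample values are invented, and the cut-off of 7 for a "good" wine is an assumption made for illustration, not part of the dataset.

```python
from statistics import mean

# Hypothetical samples using three of the columns described above.
wines = [
    {"alcohol": 9.4, "pH": 3.51, "quality": 5},
    {"alcohol": 9.8, "pH": 3.20, "quality": 5},
    {"alcohol": 11.2, "pH": 3.16, "quality": 6},
    {"alcohol": 12.8, "pH": 3.30, "quality": 7},
    {"alcohol": 13.0, "pH": 3.40, "quality": 8},
]


def quality_band(score, good_threshold=7):
    """Label a wine 'good' if its score meets the (assumed) threshold."""
    return "good" if score >= good_threshold else "not good"


def mean_alcohol_by_band(rows):
    """Average the alcohol content within each quality band."""
    groups = {}
    for row in rows:
        groups.setdefault(quality_band(row["quality"]), []).append(row["alcohol"])
    return {band: mean(values) for band, values in groups.items()}


for band, avg in mean_alcohol_by_band(wines).items():
    print(f"{band}: {avg:.2f} % vol")
```

A banded column like this can be added as a calculated column and used as a legend or slicer, which makes differences between the groups easier to see than the raw 0 to 10 score.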
You can download the dataset from
The US Crime Rates dataset provides information on crime rates in the United States. It is organized by geographical region, time period, or other relevant factors and is mostly used to analyze crime trends and patterns as well as to support criminal justice decision-making and law enforcement. It is also commonly used for exploratory data analysis and visualization and can be used to create interactive dashboards and reports in Power BI.
Some of the variables included in the dataset:
M - The percentage of males aged 14–24.
Po1 - The per capita expenditure on police protection in 1960.
Po2 - The per capita expenditure on police protection in 1959.
M.F - The number of males per 100 females.
You can download the dataset from
This dataset is a collection of data on Airbnb listings, including price, amenities, type of property, number of bedrooms and location in New York City. It is commonly used for exploratory data analysis and visualization, with a focus on the distribution of listings and prices across different locations and neighbourhoods.
Some of the variables included in the dataset:
Id - Airbnb's unique identifier for the listing.
Host Id - Airbnb's unique identifier for the host.
Host name - The name of the host.
Neighbourhood Group - The neighbourhood group (e.g., Manhattan or Brooklyn).
Host identity verification - This shows whether the host's identity is verified or unconfirmed.
The dataset can be accessed on Kaggle by clicking
US Health Data - Healthcare analytics, comparison of states, and analysis of healthcare spending and outcomes.
Wine Quality - Quality analysis, prediction of wine quality based on its chemical properties, and wine preference analysis and recommendations.
These datasets and common use cases will help you better understand the role of Power BI in helping organizations make smarter, real-time decisions.
They are also available for anyone to download and use freely.