
How to Automate Spreadsheets With Python and Openpyxl

by Jess in Tech, October 25th, 2023

Too Long; Didn't Read

Learn how to perform simple data automation operations, such as accessing cell values, applying conditional formatting, and working with formulas and calculations.

Spreadsheets have long been a critical tool for managing information for individuals and organizations. However, manual spreadsheet tasks can be time-consuming and error-prone. This guide shows how Python and Openpyxl, a powerful Python library for reading and writing Excel files, can change how you manage your spreadsheets by automating many of these tedious tasks.


In this article, you will learn how to perform simple data automation operations, such as accessing cell values, applying conditional formatting, and working with formulas and calculations, so you can streamline your routine spreadsheet tasks.

Getting Started

Let’s start with setting up your environment.

Installing Python

Download the most recent version of Python for your operating system from the official Python website (python.org), then follow the installation instructions to complete the process.

Installing Openpyxl

Next, add Openpyxl to your toolbox. Open your terminal or command prompt and run the following command to install Openpyxl with pip, Python's package manager.


pip install openpyxl


If the installation succeeds, pip finishes with a message confirming that openpyxl was installed successfully.



Now you can use Openpyxl to automate your spreadsheet operations.
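As a quick sanity check, you can confirm that Python imports the library and print the installed version. This is a minimal sketch; the version number you see depends on what pip installed, and creating an empty in-memory workbook is just a smoke test that needs no files.

```python
import openpyxl

# If this import succeeds, Openpyxl is installed in the active environment
print(f"Openpyxl version: {openpyxl.__version__}")

# An in-memory workbook confirms the library works without touching the disk
wb = openpyxl.Workbook()
print(f"Default sheet: {wb.active.title}")  # Default sheet: Sheet
```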

Basic Spreadsheet Operations

Imagine you have an Excel sheet filled with valuable data, such as employee information. For this tutorial, we will use the sample employee dataset provided by The Spreadsheet Guru, which you can download from their website. The examples below read a sheet named Data, with each employee's full name in column B, annual salary in column J, and city in column M. We will use this Excel sheet to show how to perform various spreadsheet operations with Openpyxl.


Loading and Accessing Data

Let’s start by loading the Excel sheet we downloaded earlier from your local machine. In my project, the employee_data.xlsx file and the working.py script are in the same directory. Because a relative file name like 'employee_data.xlsx' is resolved against the current working directory, run working.py from that directory (or pass the file's full path).



To load the Excel file with Openpyxl, execute the following Python code.


import openpyxl

wb = openpyxl.load_workbook('employee_data.xlsx')


This code produces no output. If it runs without raising an error, the workbook has been loaded successfully.
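Because load_workbook() is silent, a quick way to see what was loaded is to inspect the workbook's sheet names and the size of the used range. So that the sketch runs anywhere, it first writes a tiny stand-in file, sample_employees.xlsx (a hypothetical name with made-up rows), to a temporary folder; with the real dataset you would simply load employee_data.xlsx.

```python
import os
import tempfile

import openpyxl

# Build a small stand-in workbook (hypothetical data) so the example is self-contained
folder = tempfile.mkdtemp()
path = os.path.join(folder, "sample_employees.xlsx")
demo = openpyxl.Workbook()
demo.active.title = "Data"
demo.active.append(["EEID", "Full Name"])
demo.active.append(["E001", "Ada Example"])
demo.save(path)

# Load it the same way as employee_data.xlsx and inspect what was read
wb = openpyxl.load_workbook(path)
print(wb.sheetnames)                     # names of all worksheets
sheet = wb["Data"]
print(sheet.max_row, sheet.max_column)   # number of used rows and columns
print(sheet.dimensions)                  # used range as an A1-style string
```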

Accessing Cell Values

One of the easiest ways to confirm that the above code opens the file is to read and display a cell value. Let's print the full name of the employee in cell B2. To access a cell value, you must first access the worksheet it belongs to; in this case, the sheet is named Data.


import openpyxl

# Load the Excel file
wb = openpyxl.load_workbook('employee_data.xlsx')

# Access the sheet
sheet = wb['Data']

# Access the cell value (column B holds the full name)
full_name = sheet['B2'].value

# Print the cell value
print(f"The full name is: {full_name}")


Running this script prints the full name stored in cell B2.
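Openpyxl also lets you address a cell by numeric row and column through sheet.cell(), which is handy inside loops. In this small sketch, both lookups read the same cell, B2; the workbook is an in-memory stand-in with a made-up name.

```python
import openpyxl

# In-memory stand-in for the Data sheet (hypothetical values)
wb = openpyxl.Workbook()
sheet = wb.active
sheet.title = "Data"
sheet["B1"] = "Full Name"
sheet["B2"] = "Ada Example"

# A1-style lookup and numeric lookup (row 2, column 2 = B) hit the same cell
by_coordinate = sheet["B2"].value
by_index = sheet.cell(row=2, column=2).value
print(by_coordinate == by_index)  # True
```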



This is a simple example of accessing cell values in an Excel sheet. Now, let’s do something more complex.


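Suppose you want to display all employees' full names and annual salaries, which are stored in columns B and J, respectively. With values_only=True, iter_rows() yields each row as a tuple of cell values, so column B is at index 1 and column J at index 9 (indexes start at 0), and min_row=2 skips the header row. Python can help you achieve this task with the following code: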


import openpyxl

# Load the Excel file
wb = openpyxl.load_workbook('employee_data.xlsx')

# Access the sheet
sheet = wb['Data']

for row in sheet.iter_rows(min_row=2, values_only=True):
    full_name = row[1]       # column B
    annual_salary = row[9]   # column J
    print(f"Employee: {full_name}, Annual Salary: {annual_salary}")


When you execute this code, it prints one line per employee with their full name and annual salary.
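Hard-coded indexes such as row[9] break if a column is inserted or reordered. One option, sketched below, is to read the header row once and look columns up by name. The header labels "Full Name" and "Annual Salary" are assumptions about the dataset, and the in-memory rows are made up.

```python
import openpyxl

# In-memory stand-in with a header row (labels and values are hypothetical)
wb = openpyxl.Workbook()
sheet = wb.active
sheet.append(["EEID", "Full Name", "Annual Salary"])
sheet.append(["E001", "Ada Example", 52000])
sheet.append(["E002", "Ben Sample", 48000])

# Map each header label to its zero-based position in the row tuples
header = next(sheet.iter_rows(min_row=1, max_row=1, values_only=True))
col = {name: index for index, name in enumerate(header)}

employees = []
for row in sheet.iter_rows(min_row=2, values_only=True):
    name, salary = row[col["Full Name"]], row[col["Annual Salary"]]
    employees.append((name, salary))
    print(f"Employee: {name}, Annual Salary: {salary}")
```

If a column moves, only the header changes; the lookups by name keep working.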



Let’s try another example. Filtering is one of the everyday operations you perform with Excel data. Let's filter all employees whose names start with 'E'.


import openpyxl

wb = openpyxl.load_workbook('employee_data.xlsx')

sheet = wb['Data']

for row in sheet.iter_rows(min_row=2, values_only=True):
    full_name = row[1]  # column B

    # Filter employees whose name starts with 'E' (skip empty or non-text cells)
    if isinstance(full_name, str) and full_name.startswith("E"):
        print(f"Employee: {full_name}")


This code prints only the employees whose full names start with "E".
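Instead of printing the matches, you can collect them into a new worksheet so the filtered list stays in the workbook. A minimal sketch on made-up rows; the sheet name "Filtered" is hypothetical.

```python
import openpyxl

# In-memory stand-in for the Data sheet (hypothetical rows)
wb = openpyxl.Workbook()
data = wb.active
data.title = "Data"
data.append(["EEID", "Full Name"])
data.append(["E001", "Ada Example"])
data.append(["E002", "Eve Sample"])
data.append(["E003", None])  # an empty name cell is skipped safely

# Create a sheet for the results and copy the header row over
filtered = wb.create_sheet("Filtered")
filtered.append(["EEID", "Full Name"])

for row in data.iter_rows(min_row=2, values_only=True):
    full_name = row[1]
    # isinstance() guards against empty (None) or numeric cells
    if isinstance(full_name, str) and full_name.startswith("E"):
        filtered.append(row)

print(wb.sheetnames)  # ['Data', 'Filtered']
```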


Modifying Cell Values

Modifying cell values is a fundamental operation you will perform frequently. For example, you might need to update employees' phone numbers or addresses programmatically. Let's say everyone previously based in Beijing is relocating to Miami, and you want to update the data accordingly and save it to a new Excel file. The following code makes this change for every affected employee in one pass by scanning column M (column 13), which holds each employee's city:


import openpyxl

wb = openpyxl.load_workbook('employee_data.xlsx')

sheet = wb['Data']

# Change cell values from Beijing to Miami
for row in sheet.iter_rows(min_row=2, min_col=13, max_col=13):
    for cell in row:
        if cell.value == 'Beijing':
            cell.value = 'Miami'

# Save the modified workbook
wb.save('employee_data_updated.xlsx')


This snippet saves the result as a new file, employee_data_updated.xlsx, in which every employee previously based in Beijing is now listed in Miami. The original employee_data.xlsx is left unchanged.
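If you run this kind of update often, it can help to wrap it in a function that also reports how many cells changed. The function name replace_in_column() is hypothetical, and it strips surrounding spaces before comparing, which is a design choice you may or may not want for your data.

```python
import openpyxl


def replace_in_column(sheet, column_letter, old, new, header_rows=1):
    """Replace cells equal to `old` (ignoring surrounding spaces) with `new`; return the count."""
    changed = 0
    # sheet[column_letter] returns every cell in that column, top to bottom
    for cell in sheet[column_letter][header_rows:]:
        if isinstance(cell.value, str) and cell.value.strip() == old:
            cell.value = new
            changed += 1
    return changed


# In-memory stand-in; column M holds the city, as in the example above
wb = openpyxl.Workbook()
sheet = wb.active
sheet["M1"] = "City"
sheet["M2"] = "Beijing"
sheet["M3"] = "Seattle"
sheet["M4"] = "Beijing "  # a stray trailing space still matches

count = replace_in_column(sheet, "M", "Beijing", "Miami")
print(f"Updated {count} cells")
```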

Advanced Operations

In this section, we will investigate more challenging operations with Openpyxl.

Conditional Formatting

Excel's conditional formatting feature is very effective. Suppose you want to highlight in green all employees earning more than $50,000 annually. The simplest approach with Openpyxl is to loop over the rows and apply a fill to each matching cell:


import openpyxl
from openpyxl.styles import PatternFill

wb = openpyxl.load_workbook('employee_data.xlsx')

sheet = wb['Data']

# Define the green fill style
green_fill = PatternFill(start_color='00FF00', end_color='00FF00', fill_type='solid')

# Iterate through rows starting from the second row (assuming the headers are in the first row)
for row_number, row in enumerate(sheet.iter_rows(min_row=2, values_only=True), start=2):
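    annual_salary = row[9]

    # Check if the Annual Salary is a number greater than $50,000
    # (empty or text cells are skipped instead of raising a TypeError)
    if isinstance(annual_salary, (int, float)) and annual_salary > 50000:
        # If the condition is met, apply a green fill to the Annual Salary cell (column J)
        cell = sheet.cell(row=row_number, column=10)
        cell.fill = green_fill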

wb.save('employee_data_updated.xlsx')


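The comments explain each step. After you run this code, every Annual Salary cell above $50,000 in employee_data_updated.xlsx has a green background. Note that this is a static fill applied by the script, not an Excel conditional formatting rule, so the highlighting does not change if a salary is edited later.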
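To have Excel itself re-evaluate the highlighting whenever salaries change, you can attach a genuine conditional formatting rule to the range. Openpyxl provides rule helpers such as CellIsRule in openpyxl.formatting.rule. A minimal sketch, assuming the salaries occupy J2:J1001; the in-memory values and the output file name are made up.

```python
import os
import tempfile

import openpyxl
from openpyxl.formatting.rule import CellIsRule
from openpyxl.styles import PatternFill

# In-memory stand-in for the Data sheet (hypothetical salaries in column J)
wb = openpyxl.Workbook()
sheet = wb.active
sheet.title = "Data"
sheet["J1"] = "Annual Salary"
sheet["J2"] = 52000
sheet["J3"] = 48000

green_fill = PatternFill(start_color="00FF00", end_color="00FF00", fill_type="solid")

# A "cell value is greater than 50000" rule: Excel evaluates it for every cell
# in the range and updates the highlighting whenever the values change
rule = CellIsRule(operator="greaterThan", formula=["50000"], fill=green_fill)
sheet.conditional_formatting.add("J2:J1001", rule)

path = os.path.join(tempfile.mkdtemp(), "conditional_formatting_demo.xlsx")
wb.save(path)
```

The trade-off: a rule stays live inside Excel, while the loop-based fill is simpler to reason about when you only need a one-off snapshot.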

Similarly, if you want to make salaries greater than $50,000 bold and italic, you can use the following code:



import openpyxl
from openpyxl.styles import Font

wb = openpyxl.load_workbook('employee_data.xlsx')

sheet = wb['Data']

# Define the bold and italic font styles combined
bold_italic_font = Font(bold=True, italic=True)

for row_number, row in enumerate(sheet.iter_rows(min_row=2, values_only=True), start=2):
    annual_salary = row[9]

    if isinstance(annual_salary, (int, float)) and annual_salary > 50000:
        # If the condition is met, apply the bold and italic font style to the Annual Salary cell
        cell = sheet.cell(row=row_number, column=10)
        cell.font = bold_italic_font

wb.save('employee_data_updated.xlsx')


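In the saved file, those salary values now appear in bold italic. Because this script also starts from the original employee_data.xlsx, saving to employee_data_updated.xlsx overwrites the output of the earlier examples; load employee_data_updated.xlsx instead if you want to combine the changes.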
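Assigning Font(bold=True, italic=True) replaces the cell's whole font, so any existing font name, size, or color is reset. Style objects attached to a cell cannot be edited in place, so a common pattern is to copy the current font, change only the attributes you need, and assign it back. A minimal sketch on an in-memory cell with a made-up Arial 14-point font:

```python
from copy import copy

import openpyxl
from openpyxl.styles import Font

wb = openpyxl.Workbook()
sheet = wb.active
cell = sheet["J2"]
cell.value = 52000
cell.font = Font(name="Arial", size=14)  # hypothetical pre-existing formatting

# Copy the current font, adjust it, then reassign it to the cell
new_font = copy(cell.font)
new_font.bold = True
new_font.italic = True
cell.font = new_font

print(cell.font.name, cell.font.size, cell.font.bold, cell.font.italic)
```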


Formulas and Calculations

Excel is well known for its formula capabilities, and with Openpyxl you can automate writing these formulas. Let’s find the sum, average, and median of all salaries, which occupy cells J2 through J1001. Although we could calculate these values in Python, we will write Excel formulas such as “=SUM(J2:J1001)” into the sheet to show how to automate Excel formulas.


import openpyxl

wb = openpyxl.load_workbook('employee_data.xlsx')

sheet = wb['Data']

# Insert labels and formulas to find the sum, average, and median
sheet['O2'].value = "Sum"
sheet['P2'].value = "=SUM(J2:J1001)"

sheet['O3'].value = "Average"
sheet['P3'].value = "=AVERAGE(J2:J1001)"

sheet['O4'].value = "Median"
sheet['P4'].value = "=MEDIAN(J2:J1001)"

wb.save('employee_data_updated.xlsx')


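The updated Excel file now contains the labels in cells O2 to O4 and the formulas in cells P2, P3, and P4; Excel calculates and displays the results when you open the file. Openpyxl itself does not evaluate formulas, so reading P2 back with Openpyxl returns the string "=SUM(J2:J1001)" rather than the total.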
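Two refinements are worth considering. Building the range from sheet.max_row avoids hard-coding J1001 when the number of employees changes, and computing the same statistics with Python's statistics module gives you a cross-check, since Openpyxl only stores the formula text. A sketch on made-up salaries:

```python
import statistics

import openpyxl

# In-memory stand-in: header in J1 and hypothetical salaries below it
wb = openpyxl.Workbook()
sheet = wb.active
sheet["J1"] = "Annual Salary"
salaries = [48000, 52000, 61000, 75000]
for row_number, salary in enumerate(salaries, start=2):
    sheet.cell(row=row_number, column=10, value=salary)  # column 10 is J

# Build the range from the last used row instead of hard-coding J1001
# (max_row counts the last used row in any column, so keep other data below it in mind)
last_row = sheet.max_row
sheet["O2"] = "Sum"
sheet["P2"] = f"=SUM(J2:J{last_row})"
sheet["O3"] = "Average"
sheet["P3"] = f"=AVERAGE(J2:J{last_row})"
sheet["O4"] = "Median"
sheet["P4"] = f"=MEDIAN(J2:J{last_row})"

# Openpyxl stores the formula text; Excel computes the result when the file is opened
print(sheet["P2"].value)  # =SUM(J2:J5)

# Cross-check in Python: these are the values Excel should display
print(sum(salaries), statistics.mean(salaries), statistics.median(salaries))
```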


Batch Processing of Multiple Files

Consider a folder containing several Excel files, each requiring the same data processing steps. You can automate this batch processing with Python.

import os
import openpyxl

# Define the directory containing Excel files
directory = 'excel_files/'

# Iterate through files in the directory
for filename in os.listdir(directory):
    if filename.endswith('.xlsx'):
        file_path = os.path.join(directory, filename)
       
        # Load and process each file
        wb = openpyxl.load_workbook(file_path)
        # Perform operations…
       
        wb.save(file_path)


This script applies the same processing to every .xlsx file in the folder, which saves time and effort. Update the directory path to match your setup and replace the placeholder comment with the operations you need. Note that wb.save(file_path) overwrites each original file, so keep a backup or save to a different folder.
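Here is one way to fill in that pattern, sketched with pathlib. It skips the temporary "~$" lock files Excel creates while a workbook is open, and it saves results to a separate output folder so the originals stay untouched. The folder names, the process_workbook() helper, and the "Processed" marker it writes are hypothetical placeholders for your own steps.

```python
import tempfile
from pathlib import Path

import openpyxl


def process_workbook(wb):
    """Hypothetical placeholder: replace with your own processing steps."""
    wb.active["A1"] = "Processed"


def process_folder(input_dir, output_dir):
    """Process every real .xlsx file in input_dir and save copies to output_dir."""
    output_dir.mkdir(parents=True, exist_ok=True)
    processed = []
    for path in sorted(input_dir.glob("*.xlsx")):
        # Excel creates "~$name.xlsx" lock files while a workbook is open
        if path.name.startswith("~$"):
            continue
        wb = openpyxl.load_workbook(str(path))
        process_workbook(wb)
        wb.save(str(output_dir / path.name))  # the original file is left untouched
        processed.append(path.name)
    return processed


# Demo on two throwaway workbooks plus a fake lock file in a temporary folder
root = Path(tempfile.mkdtemp())
input_dir = root / "excel_files"
input_dir.mkdir()
for name in ["a.xlsx", "b.xlsx"]:
    openpyxl.Workbook().save(str(input_dir / name))
(input_dir / "~$a.xlsx").write_bytes(b"")

done = process_folder(input_dir, root / "processed")
print(done)  # ['a.xlsx', 'b.xlsx']
```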

Combining Data from Multiple Sheets

Another routine Excel task you will encounter is combining data from various sheets or workbooks. Python can make this process easier. For example, if you have sales figures for multiple quarters in separate workbooks, you can combine them using Python.


import openpyxl

# Load the target workbook where data will be consolidated
target_wb = openpyxl.load_workbook('combined_sales.xlsx')
target_sheet = target_wb['CombinedData']

# List of source workbooks
source_files = ['sales_data_q1.xlsx', 'sales_data_q2.xlsx']

for source_file in source_files:
    source_wb = openpyxl.load_workbook(source_file)
    source_sheet = source_wb.worksheets[0]  # First worksheet (wb.active may point elsewhere)

    # Append rows from source to target, skipping the header
    for row in source_sheet.iter_rows(min_row=2, values_only=True):
        target_sheet.append(row)

# Save the combined data
target_wb.save('combined_sales.xlsx')


This code appends the data rows from each source workbook to the CombinedData sheet of combined_sales.xlsx, which must already exist. Update the file names to match your own files.
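If you would rather build the combined file from scratch instead of loading an existing one, one option is to create it with openpyxl.Workbook(), copy the header once, and add a column recording which file each row came from. In this sketch the quarterly files, their "Region" and "Sales" columns, and the figures are all made up.

```python
import os
import tempfile

import openpyxl

folder = tempfile.mkdtemp()

# Create two hypothetical quarterly files so the sketch runs anywhere
sample_data = {
    "sales_data_q1.xlsx": [("North", 1200), ("South", 900)],
    "sales_data_q2.xlsx": [("North", 1350)],
}
for name, rows in sample_data.items():
    wb = openpyxl.Workbook()
    ws = wb.active
    ws.append(["Region", "Sales"])
    for row in rows:
        ws.append(row)
    wb.save(os.path.join(folder, name))

# Build the combined workbook from scratch
target_wb = openpyxl.Workbook()
target_sheet = target_wb.active
target_sheet.title = "CombinedData"

source_files = ["sales_data_q1.xlsx", "sales_data_q2.xlsx"]
for index, source_file in enumerate(source_files):
    source_wb = openpyxl.load_workbook(os.path.join(folder, source_file))
    source_sheet = source_wb.worksheets[0]  # first worksheet
    if index == 0:
        # Copy the header once, adding a column for the source file name
        header = next(source_sheet.iter_rows(max_row=1, values_only=True))
        target_sheet.append(list(header) + ["Source File"])
    # Append the data rows, tagging each with the file it came from
    for row in source_sheet.iter_rows(min_row=2, values_only=True):
        target_sheet.append(list(row) + [source_file])

target_wb.save(os.path.join(folder, "combined_sales.xlsx"))
print(target_sheet.max_row)  # 4: one header row plus three data rows
```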

Best Practices

Let’s look at some best practices to follow when automating spreadsheet operations with Python and Openpyxl.

Optimizing Code Efficiency

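Efficiency is crucial, especially when working with large datasets. Avoid unnecessary passes over the data (for example, read each sheet once and reuse the values), and use Openpyxl's read-only and write-only modes for very large files. When processing many files, consider multiprocessing to handle several workbooks in parallel; because parsing workbooks is largely CPU-bound, multithreading usually brings little speedup in Python.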
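Openpyxl's optimized modes are a concrete first step for large files: load_workbook(..., read_only=True) streams rows instead of building the whole sheet in memory, and Workbook(write_only=True) writes rows as you append them. A minimal sketch; the file name large_demo.xlsx and its generated rows are made up.

```python
import os
import tempfile

import openpyxl

path = os.path.join(tempfile.mkdtemp(), "large_demo.xlsx")

# Write-only mode: there is no default sheet, and rows can only be appended
writer = openpyxl.Workbook(write_only=True)
ws = writer.create_sheet("Data")
ws.append(["ID", "Value"])
for i in range(1, 10001):
    ws.append([i, i * 2])
writer.save(path)

# Read-only mode: rows are streamed lazily from the file
reader = openpyxl.load_workbook(path, read_only=True)
total = 0
for row in reader["Data"].iter_rows(min_row=2, values_only=True):
    total += row[1]
reader.close()  # read-only workbooks keep the file open until closed

print(total)  # 100010000
```

The trade-off is flexibility: write-only sheets cannot be read back or edited cell by cell, and read-only sheets cannot be modified.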

Version Control and Collaboration

Using version control tools like Git allows you to keep track of changes, collaborate effectively, and keep a revision history for your automation scripts.

Documentation and Comments

Document your code thoroughly with comments and docstrings. This helps others understand your automation scripts quickly and helps you maintain the scripts as they grow.

Testing and Validation

You must thoroughly test your automation scripts in a controlled environment before deploying them in production. Make sure they perform as expected and are capable of handling a variety of situations.
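One way to make scripts robust is to validate the input before changing anything. The sketch below checks that the expected headers exist and that every salary is numeric, returning a list of problems instead of failing halfway through. The function name validate_sheet() and the header labels "Full Name" and "Annual Salary" are assumptions, not part of the dataset's documented layout.

```python
import openpyxl

REQUIRED_HEADERS = ["Full Name", "Annual Salary"]  # assumed header labels


def validate_sheet(sheet):
    """Return a list of human-readable problems found in the sheet."""
    header = next(sheet.iter_rows(min_row=1, max_row=1, values_only=True), ())
    problems = [f"Missing header: {name}" for name in REQUIRED_HEADERS if name not in header]
    if problems:
        return problems
    salary_index = header.index("Annual Salary")
    for row_number, row in enumerate(sheet.iter_rows(min_row=2, values_only=True), start=2):
        salary = row[salary_index]
        if not isinstance(salary, (int, float)):
            problems.append(f"Row {row_number}: non-numeric salary {salary!r}")
    return problems


# Hypothetical data with one bad row
wb = openpyxl.Workbook()
sheet = wb.active
sheet.append(["Full Name", "Annual Salary"])
sheet.append(["Ada Example", 52000])
sheet.append(["Ben Sample", "n/a"])

print(validate_sheet(sheet))  # ["Row 3: non-numeric salary 'n/a'"]
```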


You can learn about other third-party packages for working with spreadsheets in Python in a separate article.

Conclusion

Using Python and Openpyxl to automate spreadsheets has many benefits. Whether you are creating reports, conducting data analysis, or speeding up data validation, Python and Openpyxl offer the tools and flexibility you need.


By mastering spreadsheet automation, you can gain fresh insights from your data and save time, money, and effort. We hope this guide helps you use Python and Openpyxl to build smarter spreadsheets and make automation a fundamental skill in your daily work.