Working with Spreadsheets using Python (Part 1)

Written by i_tarun_gupta | Published 2017/02/05
Tech Story Tags: python | microsoft-excel | spreadsheets | experience | technology

TLDRvia the TL;DR App

Working with Python is a bliss in itself, most of the times. I have been involved with Python for more than two and a half years now, and for most of that time I hardly had any issues until I had to work with spreadsheets.

Python has easy to write syntax and dynamic style, which is why it is a great language for beginners to start with. But the benefits do not end there. Python can, and is used in many big products, such as the framework Django.

Django powers social media apps such as Instagram. Other examples are BitTorrent, Google App Engine, and Ubuntu Software Center. Even the core functionality of YouTube is written in Python.

All these examples testify to the robustness and usefulness of Python.

Being a great scripting language, developers prefer to use Python in scenarios where they have to play with and manipulate data. In my experience of working in a product based company, I have come across projects that require the data collected from products to be reported back to the clients.

This requires fetching data from the database and manipulating it into a presentable form, which is generally in a spreadsheet format. Having some experience with Python, I decided to apply this to making spreadsheets.

These are tips for those who work with an abundance of data and want to harness that using spreadsheets. I have come across a multitude of issues as a developer and would like to share some of those here, and how you can work your way around them.

There are many third-party packages available to work with spreadsheets using Python. Some of them are listed at:

Python Excel_A website for people who need to work with Excel files in Python_www.python-excel.org

You may know that Microsoft Excel provides two file extensions for spreadsheets, one is .xls and other being .xlsx . The former corresponds to files for Microsoft Excel 2003 or earlier, and the latter is for the newer versions.

The first problem that arises in working with spreadsheets using Python is because of these two different extensions. The package xlwt supports the .xls extension of Excel, and openpyxl supports the .xlsx extensions of Excel.

I will be demonstrating how to use the xlwt package for working with spreadsheets. You can install this package using the following command on terminal or command prompt (Prerequisite: pip must be installed on your system. For installation instructions of pip , visit this link)

pip install xlwt

Note: This article uses xlwt==1.3.0

After the installation is complete, open a text editor to create a new file and give it any name you like; I am giving it the name excelScript.py .

Inside the script, write the following code:

import xlwt

def main():book = xlwt.Workbook()sheet = book.add_sheet('Sheet 1')book.save('Sample.xls')

if __name__ == '__main__':main()

The code above creates a blank Excel file with the name “Sample.xls” , containing a single sheet named “Sheet 1". This was an example of simple .xls file creation. You can run the script above using

python excelScript.py

in your command line.

To write data to the Excel sheet, change the code to the following:

import xlwt

def main():book= xlwt.Workbook()sheet = book.add_sheet('Sheet 1')

# sheet.write(r, c, <text>)  
sheet.write(0, 0, 'sample')

book.save('Sample.xls')

if __name__ == '__main__':main()

This will write “sample” on zeroth row and zeroth column. Keep in mind that the xlwt package reads row and column indices starting at zero. This way you can write into your Excel file and create as many sheets you like. Run this script using the command

python excelScript.py

in your command line. This is a common way to write to Excel using the xlwt package where you can merge the columns or rows. You can give it styling also. The following code demonstrates how to do so:

import xlwt

def main():book= xlwt.Workbook()sheet = book.add_sheet('Sheet 1')

# sheet.write\_merge(r1, r2, c1, c2, <text>)  
# For styling add  
# xlwt.easyxf('align: horz center, vert center')  
# add the above after <text> as 6th argument to write\_merge

sheet.write\_merge(0, 1, 0, 1, 'sample 1', xlwt.easyxf('align: horz center, vert center'))

book.save('Sample.xls')

if __name__ == '__main__':main()

This will merge columns (0, 1) and rows (0, 1) to make a single column and write “sample 1” into it which will be both horizontally and vertically center aligned. xlwt.easyxf does the styling for you. If you open the xls file, you will see that the previously written sample is gone.

For information on more methods provided by xlwt you can read the documentation here.

Please note that if you use _xlwt_ and if a file with the same name exists then it will be replaced.

The procedure above is good if you want to create a new .xls file, and if you are working with an older format of the Microsoft Excel spreadsheet.

However, this package has its limitations as you can only create 65536 rows and 256 columns in the older versions of Excel. Similarly, you can write only these many rows and columns using this package. If you have more data to write than this, then unfortunately this package cannot help you. You will have to use another Python package called openpyxl .

openpyxl - A Python library to read/write Excel 2010 xlsx/xlsm files - openpyxl 2.4.2 documentation_Edit description_openpyxl.readthedocs.io

You can read more about how to use openpyxl to work with Excel files in part 2 of this article.

Continuing with the xlwt package, I found that it is very easy to create new .xls files. But what if you already have an existing .xls file and want to append new data to the existing file. In such a case the xlwt package does not comply. Instead, you have to work around this. Note that if you have a .xls file then you cannot use openpyxl , which does support data appending into already existing files, because that is incompatible with older versions of Excel files.

For the fix I am going to explain, you need two more Python packages. Namely, the xlrd package for reading the Excel file, and the xlutils package to make a copy of the Excel object into the xlwt object so that you can write into it.

To install these packages use the following commands:

pip install xlrdpip install xlutils

The following procedure can be applied to append data to an existing .xls file using xlwt. In the excelScript.py file, change the code to the following:

from xlutils.copy import copyimport xlrdimport xlwt

def main():rb = xlrd.open_workbook('sample.xls')

wb = copy(rb)  
sheet = wb.get\_sheet(0)  
 
sheet.write(5, 5, 'sample 2')  
sheet.write\_merge(3, 4, 3, 4, 'sample 3', xlwt.easyxf('align: horse center, vert center'))

wb.save('sample.xls')

if __name__ == '__main__';main()

When you run this script using the

python excelScript.py

command, and open the Excel file created, you will notice that “sample 1” was written to .xls previously present there and “sample 2” and “sample 3” were also added to that file. Voila! This is the functionality we needed. But how did we achieve this?

Here is what happened. First you read “sample.xls” using the xlrd package and saved it into a variable rb in an object format. Then you copied that object into a variable wb using the copy method of xlutils package. This means that at that moment, both variables rb & wb contain the same object data. Now, we access the first sheet of Excel using the index zero and store its object data into variable sheet , and now you can use all the functional capabilities of the xlwt package to finally save your Excel sheet with the same name as before, “sample.xls”.

If you have carefully followed then you may have figured out that we really didn’t append data to an existing file. Instead what we did was to copy the data of the existing file into memory using xlrd , then writing it to a new Excel file using the copy method of xlutils package, and then writing the new data into that new Excel file, while saving it with same name as before so that it can be overwritten.

I am afraid there isn’t a more elegant way to achieve this if you are working with older .xls files.

The story doesn’t end here. This was all about working with .xls files but what if you wanted to do the same thing but this time you have got .xlsx files to work with. In such a case, you are in luck because you can work with openpyxl. You can read more about it in part 2 of this article.

Hope you found this useful! See you in part 2.

Thanks for reading!

If you enjoyed reading this and would like to read some pieces of life through my eyes, please visit here.


Published by HackerNoon on 2017/02/05