While this post seeks to break down the concept of a generator, it assumes you are comfortable with the basics of programming, in this case Python programming. Thanks!
I would venture to say that generators can make your code more efficient, easier to maintain, and lighter on memory. Interestingly enough, I didn’t start using Python generators heavily until after college, in my software engineering career. Up until that point, I loaded mostly everything into memory, not caring about efficiency and performance (we were all beginners at some point 😂). Now the tide has turned and I have seen the glory of generators.
Let’s assume you have a file named dummy.txt and you want to read in and print what’s in the file. There are two ways to tackle this problem: you could read the whole file into memory or you could read line-by-line.
# dummy.txt contains the following lines
# Hello, this is dummy.txt
# I want to show you that generators are the way to go!
# You do not have to use generators all the time
# But they sure do help when you need them
# you load the whole file into memory
with open("dummy.txt", "r") as txtfile:
    lines_in_file = txtfile.readlines()
print(lines_in_file)
>>> ['Hello, this is dummy.txt\n',
'I want to show you that generators are the way to go!\n',
'You do not have to use generators all the time\n',
'But they sure do help when you need them\n']
# you only read/load into memory a limited number of characters
with open("dummy.txt", "r") as txtfile2:
    line_in_file = txtfile2.readline(10)
print(line_in_file)
>>> Hello, thi
Looking at this example, readlines() loads ALL the contents of the .txt file into memory via the variable lines_in_file. An alternative approach loads a subset into the variable line_in_file using readline(size). The size parameter is the maximum number of characters to read (bytes, if the file is opened in binary mode); if it is omitted or negative, readline() reads through the next newline character.
While dummy.txt was only 4 lines of text, what if you had a text file with hundreds of thousands of lines that you had to parse and then do more computation on? Or what about a data structure you are working with whose size you don’t know? Do you want to load all of it into memory?! While you could use readlines(), it is safer and more resource-efficient to use readline(size), as you are only reading either a full line of the file or a piece of a line.
How does this relate to generators? Well, generators operate in a similar vein to readline(size). They evaluate values as necessary (in our case, "necessary" means hitting the size limit or reaching a terminating newline character) and not all at once like readlines().
Note: neither readline(size) nor readlines() is a generator. They are used for example purposes.
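For comparison, here is a minimal sketch of what an actual generator for line-by-line reading might look like. The function name read_lines and the temporary file standing in for dummy.txt are assumptions made for illustration, not part of the original example.

```python
import os
import tempfile


def read_lines(path):
    """Yield one line at a time instead of building a list of every line."""
    with open(path, "r") as txtfile:
        for line in txtfile:  # file objects hand out lines lazily
            yield line.rstrip("\n")


# Hypothetical stand-in for dummy.txt so the sketch runs anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Hello, this is dummy.txt\n")
    tmp.write("But they sure do help when you need them\n")
    dummy_path = tmp.name

lines = read_lines(dummy_path)
first_line = next(lines)   # only the first line has been requested so far
remaining = list(lines)    # drain the rest, which also closes the file
os.remove(dummy_path)

print(first_line)
print(remaining)
```

Nothing is read until next() asks for a line, so the caller decides how much of the file ever gets pulled into memory.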
Generators are a way of returning values as you need them vs. all at once (if you have a relatively large data structure). If generators could talk they would say, “hey dude, I’ll give you what you need as you need it, but not everything at one time”. Sometimes it isn’t efficient to load all the contents of a data structure into a variable (especially one of unknown length), as you only have so much memory on your computer. Usually, it’s best to conserve your computational resources when you can, and generators can help a ton in that area. Let’s look at an implementation to get a fuller picture 😃.
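As a rough way to check the memory point, the sketch below compares a list that holds every value with a generator that produces the same values on demand. The squares function and the 100,000 limit are hypothetical choices for illustration, and the exact sizes printed will vary by Python version and platform.

```python
import sys


def squares(limit):
    """Hypothetical generator: yield n * n for n in 0..limit-1, one at a time."""
    for n in range(limit):
        yield n * n


# Eager: all 100,000 squares exist in memory at once.
squares_list = [n * n for n in range(100_000)]

# Lazy: nothing is computed until a value is asked for.
squares_gen = squares(100_000)

list_size = sys.getsizeof(squares_list)  # the list's own storage, in bytes
gen_size = sys.getsizeof(squares_gen)    # the generator object, in bytes
print(list_size, gen_size)

# Both give the same values; the generator just hands them out one by one.
total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)
```

The generator object stays small no matter how large the limit is, because it only remembers where it is in the loop rather than every value it will produce.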
# generator function
def rand_statements():
    yield "this is the first statement"
    print("first statement has passed")
    yield "this is the second statement"

# create generator object from function
rand_genobject = rand_statements()
l1 = [1, 2]
for number in l1:
    print(number)
    print(next(rand_genobject))
>>> 1
>>> this is the first statement
>>> 2
>>> first statement has passed
>>> this is the second statement
In the for loop above, each value in the list l1 is printed first, and then next() is called on the generator object to print the next value from the generator function. Notice that after the last value in l1 is printed (the number 2 in this case), the generator doesn’t start again at the beginning of the function. It kinda starts in the middle, right after the first yield statement.
The key idea when creating your generator function is the usage of yield. yield basically means “stop here and send this value to the caller (which in this case is the next(rand_genobject) call inside the for loop), and next time I’m called, continue the function starting with the next line of code (which would be the print statement in rand_statements())”. This is very different from a return statement in a function. When a normal function is called more than once, every call starts execution at the beginning of the function. The generator function starts from where it left off, similar to when you yield at a yield sign: the function pauses, then keeps going, giving you what you need when you need it (i.e. if you have another yield statement, the generator object will evaluate that next value). This is also called saving the state of the function and accessing it when you re-enter.
The next() that is called on the generator object is saying “hey, this is the value that was yielded, my guy”.
For my TLDR readers and others who would appreciate a summary on generators:
At a high level, that’s it! It’s not necessary to use generators for every use case imaginable in your Python programming, but it doesn’t hurt to try them out and make your programs more efficient! My goal with this blog post was to give a general working knowledge and get the wheels spinning in your head (hopefully I did that 🤞🏾).
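If you want something small to try out, here is a sketch of the pause-and-resume behavior described earlier, stepping through a generator by hand with next(). The countdown function is a hypothetical example, not from the code above. It also shows one thing the earlier example never reaches: once a generator has nothing left to yield, next() raises StopIteration.

```python
def countdown(start):
    """Hypothetical generator that counts down from start to 1."""
    current = start
    while current > 0:
        yield current   # pause here and hand current to the caller
        current -= 1    # resume here on the following next() call


counter = countdown(3)
first = next(counter)
second = next(counter)
third = next(counter)

# A fourth next() has nothing left to yield, so Python raises StopIteration.
try:
    next(counter)
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, third, exhausted)
# prints: 3 2 1 True
```

A for loop over a generator handles StopIteration for you, which is why you rarely see it unless you call next() yourself.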
Until next time! ✌🏾