This post breaks down the concept of a generator, and it assumes you are comfortable with the basics of programming, in this case Python programming. Thanks!
Suppose you have a text file, dummy.txt, and you want to read in and print what's in the file. There are two ways to tackle this problem: you could read the whole file into memory, or you could read it line by line.
```python
# dummy.txt contains the following lines:
# Hello, this is dummy.txt
# I want to show you that generators are the way to go!
# You do not have to use generators all the time
# But they sure do help when you need them

# you load the whole file into memory
with open("dummy.txt", "r") as txtfile:
    lines_in_file = txtfile.readlines()
    print(lines_in_file)
# Output:
# ['Hello, this is dummy.txt\n', 'I want to show you that generators are the way to go!\n', 'You do not have to use generators all the time\n', 'But they sure do help when you need them\n']

# you only read/load in memory a certain number of characters
with open("dummy.txt", "r") as txtfile2:
    line_in_file = txtfile2.readline(10)
    print(line_in_file)
# Output:
# Hello, thi
```
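With the readlines() function, you load ALL the contents of the .txt file into memory via the variable lines_in_file. An alternative approach, readline(size), loads only a subset into the variable line_in_file. The size parameter caps the number of characters read (bytes for a file opened in binary mode); if it is omitted or negative, readline() reads up to and including the next newline character.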
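To see how the size argument behaves in practice, here is a small sketch. It uses io.StringIO as an in-memory stand-in for dummy.txt (an assumption made only so the snippet runs without a file on disk), and the variable names are illustrative.

```python
import io

# In-memory stand-in for dummy.txt (assumption: used so no file is needed).
txtfile = io.StringIO(
    "Hello, this is dummy.txt\n"
    "I want to show you that generators are the way to go!\n"
)

first_chunk = txtfile.readline(10)  # read at most 10 characters
rest_of_line = txtfile.readline()   # no size given: read through the newline
nothing = txtfile.readline(0)       # size 0: read nothing at all

print(repr(first_chunk))   # 'Hello, thi'
print(repr(rest_of_line))  # 's is dummy.txt\n'
print(repr(nothing))       # ''
```

Each call picks up where the previous one stopped, so the second readline() returns only the remainder of the first line.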
While dummy.txt was only 4 lines of text, what if you had a text file with hundreds of thousands of lines that you had to parse and then do more computation on? Or what about a data structure whose size you don't know in advance? Do you want to load all of it into memory?! While you could use readlines(), it is safer and more resource-efficient to use readline(size), as you are only reading either a full line in the file or a piece of a line.
Generators work the same way. They evaluate a value only when it is needed (in our case, "needed" is determined by the size parameter or by reaching a terminating newline character) and not all at once like readlines().
Note: neither readline(size) nor readlines() is a generator. They are used here only for example purposes.
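For comparison, here is a rough sketch of what an actual generator for the file-reading problem might look like. The function name read_lines is hypothetical, and the snippet writes its own throwaway copy of dummy.txt to a temporary directory so that it can run on its own.

```python
import os
import tempfile

def read_lines(path):
    """Hypothetical helper: yield one line at a time instead of a full list."""
    with open(path, "r") as txtfile:
        for line in txtfile:         # the file object hands out lines one by one
            yield line.rstrip("\n")

# Create a throwaway dummy.txt so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "dummy.txt")
    with open(path, "w") as f:
        f.write("Hello, this is dummy.txt\n"
                "You do not have to use generators all the time\n")

    line_gen = read_lines(path)
    first_line = next(line_gen)  # only the first line has been produced so far
    remaining = list(line_gen)   # drain whatever is left

print(first_line)  # Hello, this is dummy.txt
print(remaining)   # ['You do not have to use generators all the time']
```

The caller decides how many lines to pull, so a very large file never has to sit in a single list.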
```python
# generator function
def rand_statements():
    yield "this is the first statement"
    print("first statement has passed")
    yield "this is the second statement"

# create generator object from function
rand_genobject = rand_statements()

l1 = [1, 2]
for number in l1:
    print(number)
    print(next(rand_genobject))

# Output:
# 1
# this is the first statement
# 2
# first statement has passed
# this is the second statement
```
The first number in the list l1 is printed first, and then next() is called on the generator object to print the value from the generator function. Notice that after the last value in l1 is printed (the number 2 in this case), the generator object doesn't start at the beginning of the function. It kinda starts in the middle, right after the first yield.
yield basically means "stop here and send my value to the caller (which in this case is next() inside the for loop), and next time I'm called, continue the function starting with the next line of code (which would be the print statement in rand_statements())". This idea is very different from a return statement in a function. When a normal function is called more than once, every call starts execution at the beginning of the function. The generator function picks up where it left off. It's similar to yielding at a yield sign: the function pauses and then keeps going, giving you what you need when you need it (i.e. if there is another yield statement, the generator object will evaluate that next value). This is also called saving the state of the function and accessing it when you re-enter.
The next() that is called on the generator object is saying "hey, this is the value that was yielded, my guy".
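The difference between return and yield can be sketched side by side. The function names below are made up for illustration. The last part shows what happens when next() is called after the generator has run out of yield statements: Python raises StopIteration.

```python
def return_statement():
    # A normal function: every call starts again from the first line.
    return "this is the first statement"

def yield_statements():
    # A generator function: each next() resumes after the previous yield.
    yield "this is the first statement"
    yield "this is the second statement"

calls = [return_statement(), return_statement()]
print(calls)   # ['this is the first statement', 'this is the first statement']

gen = yield_statements()
yields = [next(gen), next(gen)]
print(yields)  # ['this is the first statement', 'this is the second statement']

# No yield statements are left, so one more next() raises StopIteration.
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)  # True
```

A for loop handles StopIteration for you, which is why looping directly over a generator object ends cleanly.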