Topics Covered
Helpful links
- Full Advanced Python Course Link
- Gitlab Code Page
- Additional Help at Python.org
- Google Colab: The Easiest Way to Code
Generators are a special kind of function that returns what is called a lazy iterator. Lazy iterators are objects that you can loop over like a list. But, unlike lists, lazy iterators do not store all of their contents in your system’s memory at once. Maybe you have a complicated function that needs to remember where it left off each time it’s called. Or have you ever had to analyze data so huge that your FPS dropped to 0? This is where generators and the Python yield statement come into play. If you want to optimize your data functions, then you are in the right place. Do you want a better understanding of iterators in Python? Then take a look at our posts Python For Loops and Python While Loops at Introduction to Python.
What is a Generator?
Generators allow you to declare a function that behaves like an iterator, i.e. it can be used in a for loop, but it does not store all of its contents in your system’s memory. The definition comes from Python.org, which has some additional reading.
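As a minimal sketch of that definition, the function below yields values one at a time and is used directly in a for loop. The name count_up_to is hypothetical and chosen only for illustration.

```python
def count_up_to(limit):
    """Yield the integers 1..limit one at a time (illustrative helper)."""
    current = 1
    while current <= limit:
        yield current  # hand back one value, then pause here
        current += 1

# The generator object is consumed by a plain for loop, just like a list
collected = []
for value in count_up_to(3):
    collected.append(value)

print(collected)
```

Calling count_up_to(3) does not run the loop body right away; the values are produced only as the for loop asks for them.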
Course Objectives:
- Learn to use Generators on large files
- Make an infinite loop
- Test whether a generator or iterator should be used
- More on the yield statement
- Learn a few generator methods
- Cheat Sheet
Learn to use Generators on large files
With a better understanding of generators, let’s take a look at the first example below. If we wanted to find out the number of lines in this CSV file, we would iterate over the file, count each row, and print the total.
# Open the file and count its rows one line at a time
with open("sample_data/california_housing_test.csv") as csvFile:
    rowCount = 0
    for row in csvFile:
        rowCount += 1
print(f"Row count is {rowCount}")
Row count is 3001
But what if we needed to iterate over a larger file that has an undetermined number of rows? If you try to iterate over large files you will see that it takes time to run, and sometimes it can even cause your computer to hang. So how do we iterate over a large file then? The answer is generators using the yield statement. To see why, first look at a version that reads the whole file with read() and returns it:
def csvReader(file_name):
    # read() loads the entire file into memory as a single string
    thefile = open(file_name)
    result = thefile.read()
    return result

csvFile = csvReader("sample_data/california_housing_train.csv")
rowCount = 0
# Looping over a string visits one character at a time, not one row
for row in csvFile:
    rowCount += 1
print(f"Row count is {rowCount}")
Row count is 1706430
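The count of 1706430 above is much larger than the number of rows because read() returns the whole file as a single string, and a for loop over a string visits one character at a time. The sketch below illustrates this with a small in-memory sample, using io.StringIO in place of a real file. The sample text and variable names are made up for illustration.

```python
import io

# Hypothetical three-row CSV held in memory so the example needs no file on disk
sample = "a,b\n1,2\n3,4\n"

# read() returns one big string; a for loop over a string visits characters
whole_text = io.StringIO(sample).read()
char_count = 0
for ch in whole_text:
    char_count += 1

# Looping over the file object itself visits one line at a time
line_count = 0
for line in io.StringIO(sample):
    line_count += 1

print(char_count, line_count)  # 12 3
```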
The example below uses a generator and the yield statement to iterate over the same file row by row. First you will notice that the yield statement has to be inside a function to work. Additionally, the amount of memory used is a lot less. This is because as soon as the function has a row, it hands that row back and continues from where it left off. With a return statement, by contrast, the code would read the whole file, store the data, then give back the result and stop.
def csvReader(file_name):
    # Yield one row at a time instead of loading the whole file
    with open(file_name, "r") as thefile:
        for row in thefile:
            yield row

csvFile = csvReader("sample_data/california_housing_train.csv")
rowCount = 0
for row in csvFile:
    rowCount += 1
print(f"Row count is {rowCount}")
Row count is 17001
Just as you can write list comprehensions, you can write a generator expression with a similar syntax. This can keep you from having to create a function to iterate over a file. The example function above can also be written as one line.
csv_gen = (row for row in open("sample_data/california_housing_train.csv"))
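As a rough sketch of how such a generator expression might be consumed, the example below counts rows with sum(). It uses io.StringIO with made-up contents as a stand-in for the CSV file, so the names fake_file, row_count and leftover are assumptions for illustration only.

```python
import io

# Hypothetical in-memory stand-in for the CSV file used in the examples above
fake_file = io.StringIO("header\nrow1\nrow2\n")

# Generator expression: nothing is read until something consumes it
csv_gen = (row for row in fake_file)

# sum() pulls rows one at a time, so the rows are never all held in a list
row_count = sum(1 for _ in csv_gen)
print(f"Row count is {row_count}")

# A generator can only be consumed once; a second pass finds nothing left
leftover = list(csv_gen)
```

Note the parentheses: square brackets would build the full list in memory, while parentheses create the lazy generator object.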
Make an infinite loop
Let’s take a break from iterating over files and dig a little more into generators. Let’s create a generator function that will run forever unless you stop it. With that being said, let’s see what happens with the return statement and then with the yield statement.
def function_infinite_sequence():
    num = 0
    while True:
        return num
        num += 1  # never reached: return has already exited the function

print(type(function_infinite_sequence()))
for i in function_infinite_sequence():
    print(i)
<class 'int'>
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in ()
6
7 print(type(function_infinite_sequence()))
----> 8 for i in function_infinite_sequence():
9 print(i)
TypeError: 'int' object is not iterable
def generator_infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1

print(type(generator_infinite_sequence()))
for i in generator_infinite_sequence():
    print(i)
What’s the difference between return and yield
In the examples above, the first is a function with a return statement and the second is a function with a yield statement. Notice that the return and yield statements are in the same place. However, the function with return just returns an integer, while the one with yield creates a generator. You also can’t use a for loop with the return version, because return exits the function on the first pass and hands back a single integer, which is not iterable. With the yield version you can also use next() to step through the generator one value at a time for testing if needed. This will give you whatever value the yield statement produces at that point.
step = generator_infinite_sequence()
next(step)
0
next(step)
1
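Since the generator never finishes on its own, a loop over it needs some way to stop. One possible approach, sketched here, is itertools.islice from the standard library, or a break on a condition. The variable names first_five and small_values are illustrative.

```python
from itertools import islice

def generator_infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1

# islice stops after five items, so the endless generator is only asked for five values
first_five = list(islice(generator_infinite_sequence(), 5))
print(first_five)  # [0, 1, 2, 3, 4]

# A plain for loop works too, as long as it breaks out on some condition
small_values = []
for i in generator_infinite_sequence():
    if i >= 3:
        break
    small_values.append(i)
```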
Test whether a generator or iterator should be used
Sometimes when you are creating iterators you have to choose between speed and memory usage. In the example below we have a list comprehension and a generator expression. One will create a list and the other will create a generator object. With that information you can check how big each one is and how fast it runs. This can be done by importing sys and cProfile to check this valuable information.
nums_squared_lc = [num**2 for num in range(5)]
nums_squared_gc = (num**2 for num in range(5))
print(nums_squared_lc)
print(nums_squared_gc)
[0, 1, 4, 9, 16]
<generator object <genexpr> at 0x7f8552809dd0>
import sys
import cProfile
nums_squared_lc = [num**2 for num in range(10000)]
nums_squared_gc = (num**2 for num in range(10000))
print(sys.getsizeof(nums_squared_lc))
print(sys.getsizeof(nums_squared_gc))
87632
128
print(cProfile.run('sum([i * 2 for i in range(10000)])'))
print(cProfile.run('sum((i * 2 for i in range(10000)))'))
         5 function calls in 0.002 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001    0.001    0.001 <string>:1(<listcomp>)
        1    0.000    0.000    0.002    0.002 <string>:1(<module>)
        1    0.000    0.000    0.002    0.002 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.sum}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

None
         10005 function calls in 0.003 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10001    0.002    0.000    0.002    0.000 <string>:1(<genexpr>)
        1    0.000    0.000    0.003    0.003 <string>:1(<module>)
        1    0.000    0.000    0.003    0.003 {built-in method builtins.exec}
        1    0.001    0.001    0.003    0.003 {built-in method builtins.sum}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

None
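Reading the numbers above, the list takes 87632 bytes while the generator object takes 128, and the list version ran in 0.002 seconds against 0.003 for the generator. The sketch below repeats the size check for several lengths to suggest how the two scale. The helper names squares_list and squares_gen are hypothetical, the exact byte counts depend on your Python version, and sys.getsizeof measures only the container object itself, not the integers it refers to.

```python
import sys

def squares_list(n):
    # Builds every value up front and keeps them all in memory
    return [num**2 for num in range(n)]

def squares_gen(n):
    # Builds nothing yet; values are produced on demand
    return (num**2 for num in range(n))

for n in (10, 1_000, 100_000):
    list_size = sys.getsizeof(squares_list(n))
    gen_size = sys.getsizeof(squares_gen(n))
    print(f"n={n}: list={list_size} bytes, generator={gen_size} bytes")
```

The list grows with n while the generator object stays the same size, which is the memory side of the trade-off. The profile output above shows the other side: the generator needs one extra function call per item.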
More on the yield statement
When using the yield statement you are basically controlling the iteration steps in the generator. The generator runs until it reaches a yield statement, then suspends and hands the yielded value back to the caller. When the next value is requested, the function continues from that point, unlike a return statement, which ends the function. When a generator suspends, the state of the function is saved. This includes any variables local to the generator, the location in your code, and any exception handling. Once you have iterated over all the values, though, your generator will raise a StopIteration exception.
def youryield():
    yieldline = "This will print the first string"
    yield yieldline
    yieldline = "This will print the second string"
    yield yieldline

twoyields = youryield()
print(next(twoyields))
This will print the first string
print(next(twoyields))
This will print the second string
print(next(twoyields))
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
in ()
----> 1 print(next(twoyields))
StopIteration:
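If you would rather not hit that exception while stepping through a generator by hand, a few options are sketched below: pass a default to next(), catch StopIteration yourself, or let a for loop handle it. The default string and the variable names are illustrative.

```python
def youryield():
    yield "This will print the first string"
    yield "This will print the second string"

twoyields = youryield()
first = next(twoyields)
second = next(twoyields)

# A default passed as the second argument to next() is returned instead of raising
third = next(twoyields, "no more values")

# Catching the exception explicitly also works
try:
    next(twoyields)
    exhausted = False
except StopIteration:
    exhausted = True

# A for loop (or comprehension) handles StopIteration for you and simply ends
all_lines = [line for line in youryield()]
```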
Learn a few generator methods
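The original section has no content here, so what follows is only a hedged sketch of the three methods that Python generator objects provide beyond next(): send(), throw() and close(). The running_total generator and all variable names are hypothetical and exist only to show the calls.

```python
def running_total():
    """Hypothetical generator: keeps a total of the numbers sent into it."""
    total = 0
    try:
        while True:
            # yield hands out the current total and receives the sent value
            received = yield total
            if received is not None:
                total += received
    except ValueError:
        # throw(ValueError(...)) lands here; yield one last marker value
        yield "reset requested"

gen = running_total()
start = next(gen)          # run to the first yield; required before send()
after_five = gen.send(5)   # resume, add 5, pause at the next yield
after_ten = gen.send(10)   # resume, add 10, pause again

thrown_result = gen.throw(ValueError("reset"))  # raise inside the generator
gen.close()                # raise GeneratorExit at the paused yield and finish

closed_value = next(gen, "closed")  # a closed generator has nothing left
```

In short, send() resumes the generator and makes the paused yield expression evaluate to the sent value, throw() raises an exception at the paused yield, and close() shuts the generator down.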