Slicing an iterator

Posted on Sun 22 September 2019 in Python

What does slicing mean?

In Python, slicing means getting a piece (a subset) of a sequence. We can slice sequences (ordered data types that can be indexed) such as lists, tuples, and strings. However, we cannot slice collections like dicts and sets, because they are not sequences.

To slice a sequence we use the [] (slice) operator.

Slicing a list produces another list

>>> animals = ['Lion', 'Elephant', 'Deer', 'Zebra']
>>> animals[1:3]
['Elephant', 'Deer']
>>> animals_subset = animals[1:3]
>>> animals_subset
['Elephant', 'Deer']

Slicing a string produces another string

>>> name = "Guido van Rossum"
>>> first_name = name[:5]
>>> first_name is not name
True

Slicing a set raises TypeError

>>> engineers = {'John', 'Jane', 'Jack', 'Janice'}
>>> engineers[1:4]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'set' object is not subscriptable

Slicing an iterator

Often you need to get a piece of an iterator. However, we cannot use the slice operator on an iterator.

>>> an_iterator = enumerate(['Nitin', 'George', 'Cherian'])
>>> an_iterator
<enumerate object at 0x7f36c7922798>
>>> an_iterator[1:]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'enumerate' object is not subscriptable

In the above code, enumerate() returns an enumerate object, which is an iterator. The iterator protocol requires only the __next__ and __iter__ special methods; it does not include __getitem__, which the slice operator relies on. That is why we get a TypeError if we try to use the slice operator on it.
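
To make this concrete, here is a minimal sketch that inspects the same kind of enumerate object with hasattr. The variable names (has_next, has_iter, has_getitem) are my own and are used only for illustration.

```python
an_iterator = enumerate(['Nitin', 'George', 'Cherian'])

# The two methods that make up the iterator protocol are present.
has_next = hasattr(an_iterator, '__next__')
has_iter = hasattr(an_iterator, '__iter__')

# The method that indexing and slicing rely on is missing.
has_getitem = hasattr(an_iterator, '__getitem__')

print(has_next, has_iter, has_getitem)  # True True False

# Calling iter() on an iterator returns the iterator itself.
print(iter(an_iterator) is an_iterator)  # True
```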

Then how do we slice an iterator? The islice function of the itertools module comes to the rescue.

itertools.islice can take two forms:

itertools.islice(iterable, stop) - Yields items from a slice of iterable, similar to sequence[:stop], except that the iterable can be any iterable including iterators and not just sequences.

itertools.islice(iterable, start, stop, step=1) - Yields items from a slice of iterable, similar to sequence[start:stop:step], except that the iterable can be any iterable including iterators and not just sequences. Unlike sequence slicing, islice does not accept negative values for start, stop, or step.
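
Here is a small sketch of both forms side by side. The generator count_up is a hypothetical helper I made up for this illustration; it is infinite, so it could never be turned into a list and sliced the usual way.

```python
from itertools import islice

def count_up():
    """Hypothetical infinite generator: yields 0, 1, 2, ..."""
    n = 0
    while True:
        yield n
        n += 1

# One-argument form: similar to sequence[:5]
first_five = list(islice(count_up(), 5))

# Start/stop form: similar to sequence[2:6]
middle = list(islice(count_up(), 2, 6))

# Start/stop/step form: similar to sequence[0:10:3]
every_third = list(islice(count_up(), 0, 10, 3))

print(first_five)   # [0, 1, 2, 3, 4]
print(middle)       # [2, 3, 4, 5]
print(every_third)  # [0, 3, 6, 9]

# Negative values are rejected, unlike with sequence slicing.
rejected_negative = False
try:
    islice(count_up(), -1)
except ValueError:
    rejected_negative = True
print(rejected_negative)  # True
```

Because islice is lazy, it pulls only as many items from count_up() as each slice needs, which is why slicing an infinite generator terminates here.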

Now, let's apply the islice function to the enumerate object.

>>> from itertools import islice
>>> an_iterator = enumerate(['Nitin', 'George', 'Cherian'])
>>> name_slice = islice(an_iterator, 1, None)
>>> name_slice
<itertools.islice object at 0x7f36c7918b88>
>>> list(name_slice)
[(1, 'George'), (2, 'Cherian')]

In the above code, we slice the enumerate object using islice with start=1 and stop=None. islice returns an iterator over the slice of the enumerate object. We then use the list constructor to get the individual elements of the slice.
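
One thing worth keeping in mind: islice does not copy anything. It advances the underlying iterator, so the items it yields are gone from the original afterwards. A minimal sketch, with variable names (first_two, remaining) chosen only for this example:

```python
from itertools import islice

an_iterator = enumerate(['Nitin', 'George', 'Cherian'])

# Taking a slice consumes those items from the underlying iterator.
first_two = list(islice(an_iterator, 2))

# Whatever was not consumed is still available on the original iterator.
remaining = list(an_iterator)

print(first_two)  # [(0, 'Nitin'), (1, 'George')]
print(remaining)  # [(2, 'Cherian')]
```

This differs from slicing a list, where the original list is left untouched.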

Another use-case of slicing an iterator

A more real-world use case is getting the first n lines of a large debug log file.

Rather than reading the entire log file into a list and then taking the first n lines, it is more efficient to apply the islice function to the file object, like so:

>>> from itertools import islice
>>> with open("debuglog") as f:
...   first_10_lines = list(islice(f, 10))
...   print(first_10_lines)
... 
['Oct 30 09:00:01 Lanner1515-148-157 syslog-ng[786]: Configuration reload request received, reloading configuration;\n', 'Oct 30 09:00:03 Lanner1515-148-157 systemd[1]: Removed slice User Slice of root.\n', 'Oct 30 09:00:03 Lanner1515-148-157 systemd[1]: Stopping User Slice of root.\n', 'Oct 30 09:00:03 Lanner1515-148-157 ssConfigClient: 4933:CONFIG:INFO [CibStatus.cpp:pollCIBStatus:442] Changing polling threshold to 3 due to status Inactive reason Controller container Initialization is in progress\n', "Oct 30 09:00:11 Lanner1515-148-157 ssConfigClient: Last message '4933:CONFIG:INFO [Ci' repeated 1 times, suppressed by syslog-ng on Lanner1515-148-157\n", 'Oct 30 09:00:11 Lanner1515-148-157 alarm_man: Application starting, Version:  18.2.1.6, Built on Wed Oct 24 09:30:38 UTC 2018 (pid=16149)\n', 'Oct 30 09:00:11 Lanner1515-148-157 alarm_man: Application complete.\n', 'Oct 30 09:00:13 Lanner1515-148-157 ssConfigClient: 4933:CONFIG:INFO [CibStatus.cpp:pollCIBStatus:442] Changing polling threshold to 3 due to status Inactive reason Controller container Initialization is in progress\n', "Oct 30 09:00:54 Lanner1515-148-157 ssConfigClient: Last message '4933:CONFIG:INFO [Ci' repeated 8 times, suppressed by syslog-ng on Lanner1515-148-157\n", 'Oct 30 09:00:54 Lanner1515-148-157 alarm_man: Application starting, Version:  18.2.1.6, Built on Wed Oct 24 09:30:38 UTC 2018 (pid=17430)\n']

In the above code, the file object f is an iterator. We apply the islice function to it to get the first 10 lines of the file and then use the list constructor to produce a list of those 10 lines.
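
The same idea works for a range of lines in the middle of a file. The sketch below uses io.StringIO as a stand-in for a real log file so that it runs anywhere; the contents ("line 1", "line 2", ...) and the name fake_log are made up for illustration.

```python
import io
from itertools import islice

# Stand-in for a log file with 100 lines, so no real file is needed.
fake_log = io.StringIO("".join(f"line {i}\n" for i in range(1, 101)))

# Lines 11 through 15 (counting from 1): skip the first 10, stop after the 15th.
selected = [line.rstrip("\n") for line in islice(fake_log, 10, 15)]

print(selected)
# ['line 11', 'line 12', 'line 13', 'line 14', 'line 15']
```

Note that islice still has to read and discard the first 10 lines to reach the start of the slice; it just never keeps them in memory or reads past the stop point.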

Conclusion

Sooner or later in Python, we need to get a slice of an iterator. In this post, I have introduced the islice function of the itertools module, which can be used for exactly that. It works on any iterable, not just iterators. I encourage you to read the islice docs for more information.

Try to find the places in your code base where you could have used the islice function for more efficient slicing.

That's it for this week, readers. Until next time, happy coding in Python!