PEP 234 – Iterators

Author: Ka-Ping Yee <ping at zesty.ca>, Guido van Rossum <guido at python.org>
Status: Final
Type: Standards Track
Created: 30-Jan-2001
Python-Version: 2.1
Post-History: 30-Apr-2001

Table of Contents

Abstract
C API Specification
Python API Specification
Dictionary Iterators
File Iterators
Rationale
Resolved Issues
Mailing Lists
Copyright

Abstract

This document proposes an iteration interface that objects can provide to control the behaviour of for loops. Looping is customized by providing a method that produces an iterator object. The iterator provides a get next value operation that produces the next item in the sequence each time it is called, raising an exception when no more items are available.

In addition, specific iterators over the keys of a dictionary and over the lines of a file are proposed, and a proposal is made to allow spelling dict.has_key(key) as key in dict.

Note: this is an almost complete rewrite of this PEP by the second author, describing the actual implementation checked into the trunk of the Python 2.2 CVS tree. It is still open for discussion. Some of the more esoteric proposals in the original version of this PEP have been withdrawn for now; these may be the subject of a separate PEP in the future.

C API Specification

A new exception is defined, StopIteration, which can be used to signal the end of an iteration.

A new slot named tp_iter for requesting an iterator is added to the type object structure. This should be a function of one PyObject * argument returning a PyObject *, or NULL. To use this slot, a new C API function PyObject_GetIter() is added, with the same signature as the tp_iter slot function.

Another new slot, named tp_iternext, is added to the type structure, for obtaining the next value in the iteration. To use this slot, a new C API function PyIter_Next() is added. The signature for both the slot and the API function is as follows, although the NULL return conditions differ: the argument is a PyObject * and so is the return value. When the return value is non-NULL, it is the next value in the iteration. When it is NULL, then for the tp_iternext slot there are three possibilities:

- No exception is set; this implies the end of the iteration.
- The StopIteration exception (or a derived exception class) is set; this implies the end of the iteration.
- Some other exception is set; this means that an error occurred that should be propagated normally.

The higher-level PyIter_Next() function clears the StopIteration exception (or derived exception) when it occurs, so its NULL return conditions are simpler:

- No exception is set; this means iteration has ended.
- Some exception is set; this means an error occurred, and should be propagated normally.

Iterators implemented in C should not implement a next() method with similar semantics as the tp_iternext slot! When the type’s dictionary is initialized (by PyType_Ready()), the presence of a tp_iternext slot causes a method next() wrapping that slot to be added to the type’s tp_dict. (Exception: if the type doesn’t use PyObject_GenericGetAttr() to access instance attributes, the next() method in the type’s tp_dict may not be seen.) (Due to a misunderstanding in the original text of this PEP, in Python 2.2, all iterator types implemented a next() method that was overridden by the wrapper; this has been fixed in Python 2.3.)

To ensure binary backwards compatibility, a new flag Py_TPFLAGS_HAVE_ITER is added to the set of flags in the tp_flags field, and to the default flags macro. This flag must be tested before accessing the tp_iter or tp_iternext slots. The macro PyIter_Check() tests whether an object has the appropriate flag set and has a non-NULL tp_iternext slot. There is no such macro for the tp_iter slot (since the only place where this slot is referenced should be PyObject_GetIter(), and this can check for the Py_TPFLAGS_HAVE_ITER flag directly).

(Note: the tp_iter slot can be present on any object; the tp_iternext slot should only be present on objects that act as iterators.)

For backwards compatibility, the PyObject_GetIter() function implements fallback semantics when its argument is a sequence that does not implement a tp_iter function: a lightweight sequence iterator object is constructed in that case which iterates over the items of the sequence in the natural order.
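The fallback can be observed from Python. The following is a minimal sketch, assuming a modern Python 3 interpreter; the class name Squares is hypothetical. The class defines only __getitem__(), yet it can still be iterated over: the fallback iterator indexes from 0 upward until IndexError is raised.

```python
class Squares:
    """Hypothetical sequence-like class with no __iter__() method."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        # The fallback iterator calls this with 0, 1, 2, ...
        # and stops when IndexError is raised.
        if index >= self.n:
            raise IndexError(index)
        return index * index


squares = list(Squares(4))   # relies on the sequence-iterator fallback
```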

The Python bytecode generated for for loops is changed to use new opcodes, GET_ITER and FOR_ITER, that use the iterator protocol rather than the sequence protocol to get the next value for the loop variable. This makes it possible to use a for loop to loop over non-sequence objects that support the tp_iter slot. Other places where the interpreter loops over the values of a sequence should also be changed to use iterators.
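As an illustration of what the loop opcodes do, the sketch below spells out a for loop by hand using the iterator protocol, then uses the standard dis module to collect the opcode names compiled for an ordinary for loop. It assumes a modern CPython 3 interpreter, where the built-in next() is available; the function names are hypothetical, and opcode names are an implementation detail that may differ between versions.

```python
import dis


def loop_with_for(seq):
    result = []
    for item in seq:
        result.append(item)
    return result


def loop_by_hand(seq):
    # Roughly what GET_ITER and FOR_ITER do on behalf of the for loop.
    result = []
    it = iter(seq)              # GET_ITER: ask the object for an iterator
    while True:
        try:
            item = next(it)     # FOR_ITER: fetch the next value
        except StopIteration:   # exhaustion ends the loop
            break
        result.append(item)
    return result


# Names of the opcodes the compiler emitted for the ordinary for loop.
opnames = {ins.opname for ins in dis.get_instructions(loop_with_for)}
```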

Iterators ought to implement the tp_iter slot as returning a reference to themselves; this is needed to make it possible to use an iterator (as opposed to a sequence) in a for loop.

Iterator implementations (in C or in Python) should guarantee that once the iterator has signalled its exhaustion, subsequent calls to tp_iternext or to the next() method will continue to do so. It is not specified whether an iterator should enter the exhausted state when an exception (other than StopIteration) is raised. Note that Python cannot guarantee that user-defined or 3rd party iterators implement this requirement correctly.
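The guarantee can be checked against a built-in iterator. A minimal sketch, assuming a modern Python 3 interpreter, where the iterator's next operation is reached through the built-in next(it):

```python
it = iter([1, 2])
values = [next(it), next(it)]

exhausted_calls = 0
for _ in range(3):
    try:
        next(it)
    except StopIteration:
        # Once exhausted, every later call keeps raising StopIteration.
        exhausted_calls += 1
```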

Python API Specification

The StopIteration exception is made visible as one of the standard exceptions. It is derived from Exception.

A new built-in function is defined, iter(), which can be called in two ways:

- iter(obj) calls PyObject_GetIter(obj).
- iter(callable, sentinel) returns a special kind of iterator that calls the callable to produce a new value, and compares the return value to the sentinel value. If the return value equals the sentinel, this signals the end of the iteration and StopIteration is raised rather than returning normally; if the return value does not equal the sentinel, it is returned as the next value from the iterator. If the callable raises an exception, this is propagated normally; in particular, the function is allowed to raise StopIteration as an alternative way to end the iteration. (This functionality is available from the C API as PyCallIter_New(callable, sentinel).)
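The two-argument form can be sketched as follows, assuming a modern Python 3 interpreter. The helper make_reader() and the function stop_at_once() are hypothetical names introduced only for this illustration.

```python
def make_reader(values):
    """Return a zero-argument callable that hands out values one by one.

    Hypothetical helper for illustration only.
    """
    remaining = list(values)

    def read():
        return remaining.pop(0)

    return read


# The iterator calls read() until it returns the sentinel 0;
# the sentinel itself is never produced.
before_sentinel = list(iter(make_reader([3, 1, 4, 0, 5]), 0))


def stop_at_once():
    # Raising StopIteration is the alternative way to end the iteration.
    raise StopIteration


ended_early = list(iter(stop_at_once, None))
```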

Iterator objects returned by either form of iter() have a next() method. This method either returns the next value in the iteration, or raises StopIteration (or a derived exception class) to signal the end of the iteration. Any other exception should be considered to signify an error and should be propagated normally, not taken to mean the end of the iteration.

Classes can define how they are iterated over by defining an __iter__() method; this should take no additional arguments and return a valid iterator object. A class that wants to be an iterator should implement two methods: a next() method that behaves as described above, and an __iter__() method that returns self.
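A class acting as an iterator might look like the sketch below. It is written for a modern Python 3 interpreter, where the method described here as next() is spelled __next__(); the class name Countdown is hypothetical.

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself so it can be used in a for loop.
        return self

    def __next__(self):   # spelled next() in the Python 2 API described here
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


counted = [n for n in Countdown(3)]
```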

The two methods correspond to two distinct protocols:

1. An object can be iterated over with for if it implements __iter__() or __getitem__().
2. An object can function as an iterator if it implements next().

Container-like objects usually support protocol 1. Iterators are currently required to support both protocols. The semantics of iteration come only from protocol 2; protocol 1 is present to make iterators behave like sequences; in particular so that code receiving an iterator can use a for-loop over the iterator.
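The difference between the two protocols can be sketched as follows, assuming a modern Python 3 interpreter (where next() is spelled __next__()). The class name Bag is hypothetical.

```python
class Bag:
    """Hypothetical container: it can be iterated over, but is not an iterator."""

    def __init__(self, *items):
        self.items = list(items)

    def __iter__(self):
        # Hand out a fresh iterator on every call.
        return iter(self.items)


bag = Bag("a", "b")
first_pass = list(bag)
second_pass = list(bag)          # a container can be looped over again

it = iter(bag)                   # 'it' supports both protocols
same_object = iter(it) is it     # an iterator's __iter__() returns itself
consumed = list(it)
leftover = list(it)              # already exhausted, so nothing is left
```

The container yields a new iterator per loop, while the iterator itself is a one-shot object that can still be handed to a for loop.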

Dictionary Iterators

Dictionaries implement a sq_contains slot that implements the same test as the has_key() method. This means that we can write
```
if k in dict: ...
```
which is equivalent to
```
if dict.has_key(k): ...
```
Dictionaries implement a tp_iter slot that returns an efficient iterator that iterates over the keys of the dictionary. During such an iteration, the dictionary should not be modified, except that setting the value for an existing key is allowed (deletions or additions are not, nor is the update() method). This means that we can write
```
for k in dict: ...
```
which is equivalent to, but much faster than
```
for k in dict.keys(): ...
```
as long as the restriction on modifications to the dictionary (either by the loop or by another thread) is not violated.
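The restriction can be seen in a modern Python 3 interpreter, where CPython reports a change of dictionary size during iteration as a RuntimeError. A minimal sketch, with illustrative variable names:

```python
prices = {"apple": 1, "pear": 2}

# Allowed: assigning a new value to an existing key while iterating.
for key in prices:
    prices[key] *= 10

# Not allowed: adding (or deleting) a key while iterating.
error = None
try:
    for key in prices:
        prices[key + "_copy"] = 0
except RuntimeError as exc:
    error = exc
```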
Add methods to dictionaries that return different kinds of iterators explicitly:
```
for key in dict.iterkeys(): ...

for value in dict.itervalues(): ...

for key, value in dict.iteritems(): ...
```
This means that for x in dict is shorthand for for x in dict.iterkeys().

Other mappings, if they support iterators at all, should also iterate over the keys. However, this should not be taken as an absolute rule; specific applications may have different requirements.
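For readers on a modern Python 3 interpreter, note that iterkeys(), itervalues() and iteritems() no longer exist there; keys(), values() and items() fill the same role. The sketch below uses those Python 3 spellings, with an illustrative dictionary:

```python
ages = {"ann": 31, "bob": 27}

keys_implicit = [k for k in ages]            # for x in dict
keys_explicit = [k for k in ages.keys()]     # iterkeys() in Python 2.2
values = [v for v in ages.values()]          # itervalues() in Python 2.2
items = [(k, v) for k, v in ages.items()]    # iteritems() in Python 2.2
```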

File Iterators

The following proposal is useful because it provides us with a good answer to the complaint that the common idiom to iterate over the lines of a file is ugly and slow.

Files implement a tp_iter slot that is equivalent to iter(f.readline, ""). This means that we can write

```
for line in file:
    ...
```

as a shorthand for

```
for line in iter(file.readline, ""):
    ...
```

which is equivalent to, but faster than

```
while 1:
    line = file.readline()
    if not line:
        break
    ...
```

This also shows that some iterators are destructive: they consume all the values and a second iterator cannot easily be created that iterates independently over the same values. You could open the file for a second time, or seek() to the beginning, but these solutions don’t work for all file types, e.g. they don’t work when the open file object really represents a pipe or a stream socket.
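Both the equivalence and the destructive behaviour can be tried out without a real file. The sketch below assumes a modern Python 3 interpreter and uses io.StringIO as a stand-in for an open text file; that substitution is an assumption of the example, not part of the proposal.

```python
import io

text = "first\nsecond\nthird\n"

# io.StringIO stands in for an open text file (an assumption of this sketch).
by_loop = [line for line in io.StringIO(text)]
by_readline = list(iter(io.StringIO(text).readline, ""))

f = io.StringIO(text)
first_pass = list(f)     # consumes every line
second_pass = list(f)    # nothing left: the iteration was destructive
f.seek(0)                # works here, but not for a pipe or socket
after_seek = list(f)
```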

Because the file iterator uses an internal buffer, mixing this with other file operations (e.g. file.readline()) doesn’t work right. Also, the following code:

```
for line in file:
    if line == "\n":
        break
for line in file:
    print line,
```

doesn’t work as you might expect, because the iterator created by the second for-loop doesn’t take the buffer read-ahead by the first for-loop into account. A correct way to write this is:

```
it = iter(file)
for line in it:
    if line == "\n":
        break
for line in it:
    print line,
```

(The rationale for these restrictions is that for line in file ought to become the recommended, standard way to iterate over the lines of a file, and this should be as fast as can be. The iterator version is considerably faster than calling readline(), due to the internal buffer in the iterator.)
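The shared-iterator pattern can be exercised as in the sketch below, which assumes a modern Python 3 interpreter and again uses io.StringIO in place of a real file. The sample text and the variable names are illustrative.

```python
import io

f = io.StringIO("header: 1\n\nbody line 1\nbody line 2\n")

it = iter(f)
header = []
for line in it:
    if line == "\n":      # a blank line separates header from body
        break
    header.append(line)

# Reusing the same iterator continues right after the blank line.
body = [line for line in it]
```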

Rationale

If all the parts of the proposal are included, this addresses many concerns in a consistent and flexible fashion. Among its chief virtues are the following four – no, five – no, six – points:

It provides an extensible iterator interface.
It allows performance enhancements to list iteration.
It allows big performance enhancements to dictionary iteration.
It allows one to provide an interface for just iteration without pretending to provide random access to elements.
It is backward-compatible with all existing user-defined classes and extension objects that emulate sequences and mappings, even mappings that only implement a subset of {__getitem__, keys, values, items}.
It makes code iterating over non-sequence collections more concise and readable.

Resolved Issues

The following topics have been decided by consensus or BDFL pronouncement.

Two alternative spellings for next() have been proposed but rejected: __next__(), because it corresponds to a type object slot (tp_iternext); and __call__(), because this is the only operation.
Arguments against __next__(): while many iterators are used in for loops, it is expected that user code will also call next() directly, so having to write __next__() is ugly; also, a possible extension of the protocol would be to allow for prev(), current() and reset() operations; surely we don’t want to use __prev__(), __current__(), __reset__().

Arguments against __call__() (the original proposal): taken out of context, x() is not very readable, while x.next() is clear; there’s a danger that every special-purpose object wants to use __call__() for its most common operation, causing more confusion than clarity.

(In retrospect, it might have been better to go for __next__() and have a new built-in, next(it), which calls it.__next__(). But alas, it’s too late; this has been deployed in Python 2.2 since December 2001.)
Some folks have requested the ability to restart an iterator. This should be dealt with by calling iter() on a sequence repeatedly, not by the iterator protocol itself. (See also requested extensions below.)
It has been questioned whether an exception to signal the end of the iteration isn’t too expensive. Several alternatives for the StopIteration exception have been proposed: a special value End to signal the end, a function end() to test whether the iterator is finished, even reusing the IndexError exception.
- A special value has the problem that if a sequence ever contains that special value, a loop over that sequence will end prematurely without any warning. If the experience with null-terminated C strings hasn’t taught us the problems this can cause, imagine the trouble a Python introspection tool would have iterating over a list of all built-in names, assuming that the special End value was a built-in name!
- Calling an end() function would require two calls per iteration. Two calls is much more expensive than one call plus a test for an exception. Especially the time-critical for loop can test very cheaply for an exception.
- Reusing IndexError can cause confusion because it can be a genuine error, which would be masked by ending the loop prematurely.
Some have asked for a standard iterator type. Presumably all iterators would have to be derived from this type. But this is not the Python way: dictionaries are mappings because they support __getitem__() and a handful of other operations, not because they are derived from an abstract mapping type.
Regarding if key in dict: there is no doubt that the dict.has_key(x) interpretation of x in dict is by far the most useful interpretation, probably the only useful one. There has been resistance against this because x in list checks whether x is present among the values, while the proposal makes x in dict check whether x is present among the keys. Given that the symmetry between lists and dictionaries is very weak, this argument does not have much weight.
The name iter() is an abbreviation. Alternatives proposed include iterate(), traverse(), but these appear too long. Python has a history of using abbrs for common builtins, e.g. repr(), str(), len().
Resolution: iter() it is.
Using the same name for two different operations (getting an iterator from an object and making an iterator for a function with a sentinel value) is somewhat ugly. I haven’t seen a better name for the second operation though, and since they both return an iterator, it’s easy to remember.
Resolution: the builtin iter() takes an optional argument, which is the sentinel to look for.
Once a particular iterator object has raised StopIteration, will it also raise StopIteration on all subsequent next() calls? Some say that it would be useful to require this, others say that it is useful to leave this open to individual iterators. Note that this may require an additional state bit for some iterator implementations (e.g. function-wrapping iterators).
Resolution: once StopIteration is raised, calling it.next() continues to raise StopIteration.

Note: this was in fact not implemented in Python 2.2; there are many cases where an iterator’s next() method can raise StopIteration on one call but not on the next. This has been remedied in Python 2.3.
It has been proposed that a file object should be its own iterator, with a next() method returning the next line. This has certain advantages, and makes it even clearer that this iterator is destructive. The disadvantage is that this would make it even more painful to implement the “sticky StopIteration” feature proposed in the previous bullet.
Resolution: tentatively rejected (though there are still people arguing for this).
Some folks have requested extensions of the iterator protocol, e.g. prev() to get the previous item, current() to get the current item again, finished() to test whether the iterator is finished, and maybe even others, like rewind(), __len__(), position().
While some of these are useful, many of these cannot easily be implemented for all iterator types without adding arbitrary buffering, and sometimes they can’t be implemented at all (or not reasonably). E.g. anything to do with reversing directions can’t be done when iterating over a file or function. Maybe a separate PEP can be drafted to standardize the names for such operations when they are implementable.

Resolution: rejected.
There has been a long discussion about whether
```
for x in dict: ...
```
should assign x the successive keys, values, or items of the dictionary. The symmetry between if x in y and for x in y suggests that it should iterate over keys. This symmetry has been observed by many independently and has even been used to “explain” one using the other. This is because for sequences, if x in y iterates over y comparing the iterated values to x. If we adopt both of the above proposals, this will also hold for dictionaries.

The argument against making for x in dict iterate over the keys comes mostly from a practicality point of view: scans of the standard library show that there are about as many uses of for x in dict.items() as there are of for x in dict.keys(), with the items() version having a small majority. Presumably many of the loops using keys() use the corresponding value anyway, by writing dict[x], so (the argument goes) by making both the key and value available, we could support the largest number of cases. While this is true, I (Guido) find the correspondence between for x in dict and if x in dict too compelling to break, and there’s not much overhead in having to write dict[x] to explicitly get the value.

For fast iteration over items, use for key, value in dict.iteritems(). I’ve timed the difference between
```
for key in dict: dict[key]
```
and
```
for key, value in dict.iteritems(): pass
```
and found that the latter is only about 7% faster.

Resolution: By BDFL pronouncement, for x in dict iterates over the keys, and dictionaries have iteritems(), iterkeys(), and itervalues() to return the different flavors of dictionary iterators.
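A comparable measurement can be sketched with the standard timeit module. This is not the benchmark used for the figure quoted above: it assumes a modern Python 3 interpreter, where items() replaces iteritems(), and an arbitrary 1000-entry dictionary, so the 7% figure should not be expected to reproduce.

```python
import timeit

data = {i: str(i) for i in range(1000)}   # arbitrary test dictionary


def by_key_lookup():
    for key in data:
        data[key]


def by_items():
    # items() is the Python 3 spelling of iteritems().
    for key, value in data.items():
        pass


# Best of a few repeats; absolute numbers depend on machine and version.
t_lookup = min(timeit.repeat(by_key_lookup, number=200, repeat=3))
t_items = min(timeit.repeat(by_items, number=200, repeat=3))
```

Comparing t_lookup with t_items gives the relative cost of the two loop styles on the machine at hand.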

Mailing Lists

The iterator protocol has been discussed extensively in a mailing list on SourceForge:

http://lists.sourceforge.net/lists/listinfo/python-iterators

Initially, some of the discussion was carried out at Yahoo; archives are still accessible:

http://groups.yahoo.com/group/python-iter

Copyright

This document is in the public domain.

Source: https://github.com/python/peps/blob/main/peps/pep-0234.rst

Last modified: 2025-02-01 08:55:40 GMT
