ICS 33 Spring 2024
Notes and Examples: The Python Data Model
Background
As we've explored some features of Python that you may not have encountered before, a recurring theme has emerged. For the most part, the functions and classes built into Python and its standard library don't possess special abilities that aren't available within the code you write. Objects of pre-existing types can be context managers, but objects of your types can, too. Objects of many pre-existing types can be iterated using for loops, but so can objects of your types. You have to be aware of the mechanisms that make these things possible, but the important thing is that they are possible; most mechanisms that underlie Python are exposed and documented, which means your code can hook into them just as well as built-in code can. This is among the hallmarks of a well-designed programming language.
Collectively, the mechanisms that govern how Python objects interact with each other within a running Python program are known as the Python data model. Both in this course and your prior coursework, you'll already have seen some parts of this model, even if you've never heard it put into these terms before. Here are a few familiar ideas from the Python data model.
- An object's representation is determined by its __repr__ method.
- When an object is constructed, it is initialized by its __init__ method. Any arguments passed to its constructor are forwarded to its __init__ method.
- After the context expression of a with statement is evaluated, its result becomes a context manager, at which point its __enter__ method is called. When the with statement is exited, its __exit__ method is called.
- When an object is iterated, its __iter__ method is called, which returns an iterator. That iterator, in turn, provides a __next__ method that is called to produce each value in the iteration, one at a time.
- When a method is called on an object, an extra argument (bound to a parameter usually named self) is added before the others to represent the object on which the method was called. (There are some technical details that make this happen, too, and this is also something that you can modify when the need arises. Very few aspects of Python are magical.)
As we've seen before, all of the mechanisms described above rely on what we call protocols, which rely on the presence of one or more dunder methods, such as __init__, __enter__, __exit__, __iter__, __next__, and __get__. Virtually every time we interact with any object for any purpose in Python, at least one dunder method is called behind the scenes, even if we rarely call them ourselves. These dunder methods form the basis of how objects interact; their presence, alongside the fact that their meanings are documented and well-understood by seasoned Python programmers, ensures that we can modify most of what happens during those interactions when our designs require it. We can't change the fact that iterating an object causes __iter__ and __next__ methods to be called, but we can change what happens when they are, which means we can make iteration behave in any way we'd like. We can't change that the with statement requires __enter__ and __exit__ methods, but, by providing our own __enter__ and __exit__ methods, we can apply the concept of context management to any problem for which it's suited.
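As a minimal sketch of that idea, the following shows objects of our own classes taking part in a for loop and a with statement purely by supplying the dunder methods those features call. The names Countdown, CountdownIterator, and Section are hypothetical and were made up for this illustration; they are not part of these notes or of Python.

```python
class Countdown:
    """Hypothetical iterable that counts down from a starting value to 1."""

    def __init__(self, start):
        self._start = start

    def __iter__(self):
        # A for loop calls __iter__ to obtain an iterator.
        return CountdownIterator(self._start)


class CountdownIterator:
    """Hypothetical iterator that produces the values of a Countdown."""

    def __init__(self, current):
        self._current = current

    def __iter__(self):
        return self

    def __next__(self):
        # Called once per value; raising StopIteration ends the iteration.
        if self._current <= 0:
            raise StopIteration
        value = self._current
        self._current -= 1
        return value


class Section:
    """Hypothetical context manager that records when it is entered and exited."""

    def __init__(self, log):
        self._log = log

    def __enter__(self):
        self._log.append('enter')
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._log.append('exit')
        return False  # Do not suppress any exception raised in the block.


log = []

with Section(log):
    for value in Countdown(3):
        log.append(value)
```

Neither the for loop nor the with statement knows anything about these classes in advance; each simply calls the dunder methods it requires.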
So, the key to improving our ability to design Python programs — writing programs that are "Pythonic," in the sense that using our functions and classes feels just like using the ones that are built in — is understanding as much of the Python data model as we can. We don't need every detail all the time, but every available detail is applicable to a problem we might have. When we solve problems the way Python does, we find that our functions and classes naturally "plug into" the things that are built in, and vice versa. When an entire community of programmers uses a programming language the same way, the community's ability to solve problems rapidly increases, because common problems are solved once and solved well, with the tools used to solve them combining naturally in ways that no one considered originally. We can stand on the shoulders of giants without losing our balance.
The price to be paid is that we have to learn the details. The payoff, though, is immense, so we'd be well-served to spend the time learning them. And, fortunately, the details rarely change, except to the extent that new details are added; once we know how iterators work, for example, that's how they'll likely continue to work for as long as we use Python. So, we at least want to be aware of what's available and the common threads that tie the features of Python's data model together. We can always look the details up again when we need them, even if we've forgotten some of them by then, but it's a lot harder to look things up when we don't know what we're looking for.
So, let's dive in and see what else we can find in the Python data model.
Lengths
In Python, we say that an object is sized if it can be asked for a length. The usual way to ask is to pass the object as an argument to the built-in len function. Strings, lists, tuples, ranges, sets, and dictionaries are all examples of sized objects, though not all objects are sized; integers, for example, are not.
>>> len('Boo')
3
>>> len([1, 2, 3, 4, 5])
5
>>> len(18)
Traceback (most recent call last):
...
TypeError: object of type 'int' has no len()
The MyRange class we wrote in a previous lecture is a good example of a class whose objects ought to be sized. If MyRange(10) comprises ten integers, then we could reasonably say its length is 10, but this feature was missing from our implementation.
>>> len(MyRange(10))
Traceback (most recent call last):
...
TypeError: object of type 'MyRange' has no len()
Fortunately, there is a simple protocol by which we can add this feature to a class. All we need to do is write one extra method in our class:
- __len__(self), which returns an integer specifying the length of self.
Since a MyRange doesn't actually store any of its values, we'd need to iterate them if we wanted to count them, which means we could build a list of those values and then ask that list for its length.
class MyRange:
    ...
    def __len__(self):
        return len([x for x in self])
But we should be aware of the costs of our solutions; this works, but could we do substantially better? If there are n values in the range, this requires O(n) time to iterate through them, as well as O(n) memory to store them in a list. But if all we want to know is how many values are in our range, we have no need to store them; we just need to count them. What if we used a generator comprehension instead? Generators have no length, which rules out len(x for x in self), but we could transform each value into a 1 and then sum them up.
class MyRange:
    ...
    def __len__(self):
        return sum(1 for x in self)
If there are ten values in our range, we'll be summing the ten 1's that we generated, so this should produce the right answer. This technique reduces our memory usage to O(1), because we're now generating one value at a time, ignoring it (in favor of the value 1), and then adding 1 to a running sum. This is roughly equivalent to having written a loop instead.
class MyRange:
    ...
    def __len__(self):
        count = 0
        for value in self:
            count += 1
        return count
So, this is an improvement from a memory perspective, but we're still spending O(n) time, because we're still iterating the values in our range from beginning to end. A larger improvement would be to eliminate the iteration of the values altogether, though this would only be possible if we could find some other way to deduce how many there are. Fortunately, the values in a MyRange follow a straightforward pattern, so we can instead calculate the length with an arithmetic formula, by dividing the difference between stop and start by step, then applying a little bit of finesse to handle the edge cases properly.
import math

class MyRange:
    ...
    def __len__(self):
        # ceil counts a final partial step; max(0, ...) covers ranges where stop <= start.
        return max(0, math.ceil((self._stop - self._start) / self._step))
This version runs in O(1) time and uses O(1) memory. It's always made up of one subtraction, one division, one ceiling operation, and determining the maximum of exactly two integers. Whether the range is extremely long or very short, the sequence of operations is always the same, so its cost remains constant, regardless of the range's length.
Note, too, that if MyRange also supported negative step values (ours didn't), then we'd need to adjust our formula some more, but it would still be possible to calculate a length in both constant time and constant memory.
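One way that adjustment might look is sketched below, as a standalone function rather than a method. The name range_length is hypothetical, and this is only one possible formulation: it normalizes a negative step into the positive case and uses integer-only ceiling division, which also avoids the rounding that true division (which produces a float) can introduce for very large integers.

```python
def range_length(start, stop, step):
    """Hypothetical helper: how many values a range-like sequence would contain."""
    if step == 0:
        raise ValueError('step must not be zero')

    if step > 0:
        span = stop - start
    else:
        # Counting downward is the mirror image of counting upward.
        span = start - stop
        step = -step

    # Ceiling division using only integers: ceil(span / step) == -(-span // step)
    return max(0, -(-span // step))
```

However long the range is, this performs a fixed handful of arithmetic operations, so it still runs in constant time and constant memory.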
Truthiness
There are many situations in Python where objects are treated as truth values, which is to say that they're considered either to be truthy (i.e., treated as though they're a boolean True) or falsy (i.e., like a boolean False). This is why the conditional expression of an if statement or a while loop can evaluate to any type of object, or why an iterable containing any types of objects can be passed to the built-in functions any or all.
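A short sketch of objects being used as truth values in each of those places follows. The helper function describe and the variable names are hypothetical, chosen only for this illustration.

```python
def describe(value):
    """Hypothetical helper: report how an if statement treats a value."""
    # Any object can be the condition of an if statement.
    if value:
        return 'truthy'
    else:
        return 'falsy'


results = [describe(value) for value in (None, True, False, 0, 'Boo', [])]

# A while loop can use an object as its condition, too. Here, the loop
# ends when the list becomes empty.
remaining = [1, 2, 3]
popped = []
while remaining:
    popped.append(remaining.pop())

# any and all accept iterables containing any types of objects.
any_result = any([None, 0, 'Boo'])
all_result = all([True, 'Boo', None])
```

In each case, Python decides on our behalf whether the object counts as truthy or falsy, without requiring an actual boolean.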
Making that feature work requires a decision on the fundamental question: Which objects are considered truthy and which are considered falsy? The design of Python answers that question for its built-in types, including rules such as these.
- None is falsy.
- True is truthy, while False is falsy.
- Strings and lists are falsy when they're empty and truthy otherwise.
But what about objects of the classes we write? Under what conditions are they truthy? Under what conditions are they falsy? And, most importantly, can we decide those conditions, instead of leaving it to Python to decide?
>>> class Person:
...     def __init__(self, name):
...         self._name = name
...
>>> p1 = Person('Alex')
>>> bool(p1)
True    # A Person with a non-empty name is truthy.
>>> p2 = Person('')
>>> bool(p2)
True    # A Person with an empty name is also truthy.
From our experimentation, it appears that objects of our classes are always truthy, but there's more to the story than meets the eye. Given what we know already about the Python data model, we can reasonably expect that one or more dunder methods will allow us to alter this outcome.
How lengths impact truthiness
We saw previously that we can give objects a length by writing a __len__ method in their class. We've also seen that empty strings and empty lists, whose lengths are zero, are considered to be falsy. What happens to objects of our classes when they have lengths?
>>> len(MyRange(10))
10
>>> bool(MyRange(10))
True # A MyRange with a non-zero length is truthy.
>>> len(MyRange(5, 5))
0
>>> bool(MyRange(5, 5))
False # A MyRange with a zero length is falsy.
For objects that are sized (i.e., those that implement a __len__ method), their lengths can be used to determine truthiness. If calculating lengths is inexpensive, and if we're happy with that behavior (which is in line with objects that are built into Python, so we'd need a good reason to feel otherwise about it), then we're done. (This is one reason why implementing our methods efficiently is so important; it has a compounding benefit, since one method can often form the basis of others, as well, so that one fast operation becomes many fast operations.)
Still, not all objects are sized, and we might nonetheless want to control their truthiness. Or, we might be able to determine truthiness more cheaply than we can calculate a length. What do we do then?
Directly overriding truthiness
Adding a __bool__(self) method to a class directly overrides how its truthiness is determined, independent of whether it has a length. This means that determining the truthiness of an object is really a process that has as many as three steps.
1. If the object's class has a __bool__ method, its result is used to determine the object's truthiness.
2. Otherwise, if its class has a __len__ method, its result is used instead, with zero being falsy and anything non-zero being truthy.
3. Otherwise, the object is truthy.
This explains why objects of our previous Person class were always truthy: In the absence of __bool__ or __len__ methods in a class, this is Python's default. So, if we want to override that default, we'll need at least one of those methods.
>>> class Person:
...     def __init__(self, name):
...         self._name = name
...     def __bool__(self):
...         return self._name == 'Boo'
...
>>> p1 = Person('Boo')
>>> bool(p1)
True    # Boo is truthy
>>> p2 = Person('Alex')
>>> bool(p2)
False   # Everyone else is falsy
This is an aspect of Python's data model that we'll see play out repeatedly. It's often the case that providing one operation (in this case, a length) will automatically supply a default behavior for others (in this case, truthiness), though we can do something other than that default when it's appropriate from the perspectives of correctness or performance. This makes the common situations easier to implement, while still allowing us to implement things more carefully when we need to.
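The interplay between the default and the override can be sketched with three small classes. Inbox, Folder, and Label are hypothetical names invented for this illustration: one class provides both __bool__ and __len__, one provides only __len__, and one provides neither.

```python
class Inbox:
    """Hypothetical class: sized by its messages, but truthy only when something is unread."""

    def __init__(self, messages, unread_count):
        self._messages = list(messages)
        self._unread_count = unread_count

    def __len__(self):
        return len(self._messages)

    def __bool__(self):
        # When both methods exist, __bool__ decides truthiness and __len__ is not consulted.
        return self._unread_count > 0


class Folder:
    """Hypothetical class with only __len__, so its length decides its truthiness."""

    def __init__(self, messages):
        self._messages = list(messages)

    def __len__(self):
        return len(self._messages)


class Label:
    """Hypothetical class with neither method, so its objects are always truthy."""


read_inbox = Inbox(['hello', 'again'], unread_count=0)
empty_folder = Folder([])
full_folder = Folder(['hello'])
label = Label()
```

Here, read_inbox has a length of 2 yet is falsy, because its __bool__ method takes precedence over the length-based default.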
Indexing
Some kinds of objects in Python can be indexed, which generally means that we can think of them as containing other objects, but that they give us a way to uniquely identify each of those objects so that we can ask for them individually and know definitively which one we'll get back.
The simplest example of indexing is asking a list for one of its elements given an index. Since lists are designed around an indexing scheme where the first element has the index 0, the second element has the index 1, and so on, then when we ask for the element at a particular index, it's clear which one we're asking for. Strings and ranges have that same design characteristic, so they can be indexed similarly.
>>> values = [1, 3, 5, 7, 9]
>>> values[4]
9       # 4 is the index, in this case, so we want the fifth element.
>>> range(1, 100, 4)[3]
13      # Here, we want the fourth value in the range.
>>> 'Boo is happy'[0]
'B'     # We're looking for a string containing the first character of 'Boo is happy'.
Dictionaries can also be indexed, albeit in a somewhat different way. A dictionary contains unique keys, with a value associated with each of them. So, when you index a dictionary, you're asking a different question: What is the value associated with this key? Still, the syntax is the same, and the underlying idea is, too: Give me the value that's uniquely identified by this index (where, for a dictionary, those indices are really its keys).
>>> d = {'A': 27, 'B': 17, 'C': 0}
>>> d['B']
17
For some kinds of objects that allow indexing — though not all kinds — we can also assign into those indexes. Again, the syntax is the same for all such indexed objects, and the underlying idea is also the same, though the implementation details differ from one type of object to another.
>>> values[3] = 13
>>> values
[1, 3, 5, 13, 9] # One object in the list has been replaced.
>>> d['B'] = 1
>>> d
{'A': 27, 'B': 1, 'C': 0} # The value associated with a key has been replaced.
>>> range(1, 100, 4)[3] = 10
Traceback (most recent call last):
...
TypeError: 'range' object does not support item assignment
# Ranges are immutable, so we can't assign into them.
Those objects that allow assignment into indexes usually also allow deletion of an index, using the del statement.
>>> del values[3]
>>> values
[1, 3, 5, 9]
>>> del d['A']
>>> d
{'B': 1, 'C': 0}
That many kinds of objects support the same syntax with potentially different implementation details suggests again that dunder methods are being called behind the scenes here.
Dunder methods for implementing indexing
When we want objects to support indexing, we add at least one dunder method to their class.
- __getitem__(self, index), which returns the value associated with the specified index.
Note that the word "index" does not necessarily mean a non-negative integer, or even an integer at all. It's up to the __getitem__ method to decide what constitutes a valid index and what an index means. (This is what makes it possible to index lists with integers, while being able to index dictionaries with arbitrary hashable keys. Their __getitem__ methods are written differently.)
If we want to support assigning into an index and deletion of an index, there are additional dunder methods we can add alongside __getitem__.
- __setitem__(self, index, value), which assigns the specified value into the specified index.
- __delitem__(self, index), which deletes the value at the specified index.
Indexing is one feature that Python's built-in range provides that our MyRange class doesn't. Rectifying that would be a matter of adding a __getitem__ method to our MyRange class. (Since ranges are immutable, we wouldn't want to add __setitem__ or __delitem__.) Like our __len__ method, __getitem__ can calculate its answer in O(1) time using O(1) memory, so it would be best to do so.
class MyRange:
    ...
    def __getitem__(self, index):
        if type(index) is not int:
            raise TypeError(f'MyRange index must be int, but was {type(index).__name__}')
        elif index < 0 or index >= len(self):
            raise IndexError('MyRange index was out of range')
        return self._start + index * self._step
Since __getitem__ accepts a parameter other than self and needs to perform calculations based on that parameter's value, some validation was necessary, so that non-integer indices and out-of-range indices would raise exceptions with descriptive error messages instead of returning invalid answers.
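For a mutable object, all three indexing methods can work together. The sketch below assumes a hypothetical SparseGrid class (not part of these notes) whose indices are (row, column) tuples rather than integers, which also illustrates that each class decides for itself what a valid index is.

```python
class SparseGrid:
    """Hypothetical grid indexed by (row, column) tuples, storing only assigned cells."""

    def __init__(self, rows, columns, default=0):
        self._rows = rows
        self._columns = columns
        self._default = default
        self._cells = {}

    def _validate(self, index):
        # This class decides that a valid index is a tuple of two in-range integers.
        if type(index) is not tuple or len(index) != 2:
            raise TypeError('SparseGrid index must be a (row, column) tuple')
        row, column = index
        if type(row) is not int or type(column) is not int:
            raise TypeError('SparseGrid row and column must be ints')
        if not (0 <= row < self._rows and 0 <= column < self._columns):
            raise IndexError('SparseGrid index was out of range')

    def __getitem__(self, index):
        self._validate(index)
        return self._cells.get(index, self._default)

    def __setitem__(self, index, value):
        self._validate(index)
        self._cells[index] = value

    def __delitem__(self, index):
        self._validate(index)
        # In this sketch, deleting a cell resets it to the default value.
        self._cells.pop(index, None)


grid = SparseGrid(3, 3)
grid[1, 2] = 'x'        # Calls __setitem__ with the index (1, 2).
stored = grid[1, 2]     # Calls __getitem__ with the index (1, 2).
del grid[1, 2]          # Calls __delitem__ with the index (1, 2).
after_delete = grid[1, 2]
```

Writing grid[1, 2] passes the single tuple (1, 2) as the index, so no special syntax is needed to support two-part indices.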
How the presence of indexing impacts other operations
When a class has both a __len__ method and a __getitem__ method that accepts non-negative indices, an interesting thing happens: Even without an __iter__ method, its objects become iterable automatically. This is because __len__ and __getitem__ combine into something called the sequence protocol, which means that objects supporting that combination of methods are what we call sequences.
If we know that an object is a sequence, we know that it can be iterated without an __iter__ method, via calls to __getitem__ and __len__. To understand why, let's briefly experiment with a class that includes these methods.
>>> class ThreeSequence:
...     def __len__(self):
...         return 3
...     def __getitem__(self, index):
...         if 0 <= index < len(self):
...             return index * 3
...         else:
...             raise IndexError
...
>>> s = ThreeSequence()
>>> s[0]
0       # If s can be indexed with integers, isn't 0 the first index?
>>> s[1]
3       # In that case, isn't 1 the second index?
>>> s[2]
6       # And isn't 2 the third?
>>> len(s)
3       # Doesn't this tell us that s[3] would fail if we tried it?
>>> index = 0
>>> while index < len(s):
...     print(s[index])
...     index += 1
...     # Therefore, isn't this a reliable pattern for iterating such a sequence?
0
3
6
So, as it turns out, when we iterate an object, there's a bit more to the story than we've seen.
- If the object's class has an __iter__ method, it's called, and its result is the iterator that will be used to manage the iteration.
- Otherwise, if its class has __len__ and __getitem__ methods, an iterator that runs something akin to the while loop above is used instead, except that the values are returned to us individually instead of printed in the Python shell. (One way to implement that ourselves would be with a generator function, though Python handles the details internally for us.)
In fact, iteration works in the presence of a __getitem__ method that accepts non-negative indexes, even in the absence of a __len__ method, in which case successively larger indexes are passed to __getitem__ until it raises an IndexError, at which point the iteration is considered to have ended.
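That fallback can be sketched with a class that has a __getitem__ method and nothing else, alongside a generator function that approximates what Python does for us. Both Squares and iterate_by_index are hypothetical names used only for this illustration; Python's actual internal implementation is not shown here.

```python
class Squares:
    """Hypothetical class with __getitem__ but neither __iter__ nor __len__."""

    def __init__(self, count):
        self._count = count

    def __getitem__(self, index):
        if 0 <= index < self._count:
            return index * index
        raise IndexError('Squares index was out of range')


def iterate_by_index(sequence):
    """Roughly what Python does on our behalf: ask for indexes 0, 1, 2, ...
    until __getitem__ raises an IndexError."""
    index = 0
    while True:
        try:
            value = sequence[index]
        except IndexError:
            return
        yield value
        index += 1


automatic = list(Squares(4))                   # Python's built-in fallback
by_hand = list(iterate_by_index(Squares(4)))   # Our approximation of it
```

Both approaches produce the same values, which suggests that the IndexError raised by __getitem__ is what marks the end of the iteration.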
However, the __len__ method is useful in concert with __getitem__ for another reason: It also provides the automatic ability to iterate an object in reverse, since something akin to the following while loop can be used instead.
>>> index = len(s)
>>> while index > 0:
...     index -= 1
...     print(s[index])
...     # This is a reliable pattern for iterating a sequence in reverse, if we know its length.
...     # Without knowing the length, Python couldn't efficiently know where to start.
6
3
0
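In practice, reverse iteration is usually requested with the built-in reversed function, which relies on this combination of methods when a class has no __reversed__ method. The sketch below repeats the ThreeSequence class from above so that it stands on its own; NoLength is a hypothetical class added to show what happens when __len__ is missing.

```python
class ThreeSequence:
    def __len__(self):
        return 3

    def __getitem__(self, index):
        if 0 <= index < len(self):
            return index * 3
        else:
            raise IndexError


class NoLength:
    """Hypothetical class that supports indexing but has no __len__ method."""

    def __getitem__(self, index):
        if 0 <= index < 3:
            return index * 3
        else:
            raise IndexError


forward = list(ThreeSequence())
backward = list(reversed(ThreeSequence()))

# Forward iteration still works without a length, but reversed needs to
# know where to start, so it refuses.
forward_without_length = list(NoLength())
try:
    reversed(NoLength())
    reversible_without_length = True
except TypeError:
    reversible_without_length = False
```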
Additionally, objects implementing indexing (with or without a __len__ method) have another similar automatically implemented behavior.
>>> [i in s for i in range(8)]
[True, False, False, True, False, False, True, False]
# The 'in' operator can be used to see if they contain a value,
# though this will be done using iteration, which will take linear time
# for each use of the 'in' operator.
When we write a class that implements a sequence, we'll quite often want to provide our own implementations of these three features — iteration, reverse iteration, and "contains" — especially if we can do so more performantly than the default. If so, we could add these three dunder methods to a class.
- The __iter__(self) method we saw before would provide custom iteration, by returning an iterator. (Note that if __iter__(self) is written as a generator function, then its result will be an iterator automatically.)
- A __reversed__(self) method would provide reverse iteration. It returns a reverse iterator (i.e., an iterator that produces the values in reverse order).
- A __contains__(self, value) method would determine whether a value is part of the sequence, returning True if so or False otherwise.
MyRange would benefit from an implementation of __contains__, for example, since its result could then be determined in constant time using some straightforward arithmetic, rather than iterating every value in a potentially large range. There's no reason it should cost more to evaluate 100000 in MyRange(1000000) than it does to evaluate 0 in MyRange(1), but only a custom __contains__ method will make that possible. On the other hand, the automatic implementations of forward and reverse iteration arising from MyRange's indexing feature are probably fine.
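A sketch of all three methods in one place follows. SimpleRange is a hypothetical stand-in for MyRange, written so the example is self-contained; it assumes positive steps only, always takes an explicit start, and treats only int values as possible members, all of which are simplifications made for this illustration.

```python
class SimpleRange:
    """Hypothetical stand-in for MyRange, supporting positive steps only."""

    def __init__(self, start, stop, step=1):
        if step <= 0:
            raise ValueError('this sketch supports positive steps only')
        self._start = start
        self._stop = stop
        self._step = step

    def __len__(self):
        # Integer-only ceiling division, never less than zero.
        return max(0, -(-(self._stop - self._start) // self._step))

    def __iter__(self):
        # Written as a generator function, so calling it returns an iterator.
        value = self._start
        while value < self._stop:
            yield value
            value += self._step

    def __reversed__(self):
        # Begin at the last value actually in the range and walk backward.
        value = self._start + (len(self) - 1) * self._step
        while value >= self._start:
            yield value
            value -= self._step

    def __contains__(self, value):
        # Constant time: within bounds, and a whole number of steps from the start.
        return (type(value) is int
                and self._start <= value < self._stop
                and (value - self._start) % self._step == 0)


large = SimpleRange(0, 1000000)
found = 100000 in large        # Answered by arithmetic, without iterating.
```

With __contains__ in place, the in operator no longer falls back on iteration, so its cost doesn't depend on how many values the range contains.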
So, it's worth knowing what features are provided automatically (and how they're provided automatically), because when these automatic implementations are performant enough for our needs, it means fewer features that we need to build, test, and maintain over time. Those positive decisions compound as programs and teams grow.
Slicing
Indexing allows us to obtain a single object within another, such as one element of a list or the value associated with a key in a dictionary. A variant of indexing that we've not yet considered is what Python calls slicing, which allows us to take a sequence of objects and obtain a subsequence, containing some of the objects while skipping others. The slice will usually be the same type — so, for example, a slice of a list will be a list, a slice of a string will be a string, and so on.
>>> values = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
>>> values[2:6]
[5, 7, 9, 11]
>>> values[2:8:2]
[5, 9, 13]
>>> values[:3]
[1, 3, 5]
>>> values[7:]
[15, 17, 19]
>>> 'Boo is happy today'[:3]
'Boo'
>>> range(0, 20, 2)[3:7]
range(6, 14, 2)
>>> 'Boo'[::]
'Boo'
Syntactically, slicing looks a lot like indexing. If we start with an expression whose value can be sliced, we can follow that expression with brackets, in which we can write either two or three values separated by colons. Those three values, analogous to the values that describe a range, are named start, stop, and step. In the slice notation, all three values are optional.
This raises the question of what mechanism is used to implement slicing. We've seen previously that indexing is implemented via the __getitem__ dunder method, and we can see that slicing uses a similar bracket-surrounded notation, so is it safe for us to assume that __getitem__ does slicing, too? If so, how do we tell the difference between indexing and slicing? One way to find out is to experiment a bit.
>>> class Thing:
...     def __getitem__(self, index):
...         print(f'type(index) = {type(index)}')
...         print(f'index = {index}')
...         return None
...
>>> t = Thing()
>>> t[4]
type(index) = <class 'int'>
index = 4
>>> t[1:17:6]
type(index) = <class 'slice'>
index = slice(1, 17, 6)
>>> t[1:17]
type(index) = <class 'slice'>
index = slice(1, 17, None)
>>> t[:17]
type(index) = <class 'slice'>
index = slice(None, 17, None)
>>> t[::]
type(index) = <class 'slice'>
index = slice(None, None, None)
From this experimentation, we can deduce a few things:
- When we slice an object, its __getitem__ method is called, just like when we perform an indexing operation.
- When __getitem__ is called during a slicing operation, its parameter is a slice object.
- A slice object contains three values, which default to None when not explicitly specified.
So, if we want to implement slicing in a class, we'll need to add some functionality to our __getitem__ method to detect that its parameter is a slice and, if so, handle it specially. How do we interact with a slice object?
>>> s = slice(1, 17, 6)
>>> s.start, s.stop, s.step
(1, 17, 6) # We can access its start, stop, and step attributes.
>>> s.step = 100
Traceback (most recent call last):
...
AttributeError: readonly attribute
# Like ranges, slices are immutable.
>>> start, stop, step = s.indices(10)
>>> start, stop, step
(1, 10, 6) # We can ask it what the applicable start, stop, and step
# values would be for a given length. In this case, we've asked this
# for a length of 10, which is why the applicable stop is less than
# the original one.
>>> defaulted = slice(None, None, None)
>>> [type(x) for x in (defaulted.start, defaulted.stop, defaulted.step)]
[<class 'NoneType'>, <class 'NoneType'>, <class 'NoneType'>]
# When a slice is constructed with Nones, they aren't defaulted
# to anything; they remain Nones.
>>> dstart, dstop, dstep = defaulted.indices(10)
>>> dstart, dstop, dstep
(0, 10, 1) # Even if the start, stop, and step are all None, the
# indices method returns integer results.
>>> [index for index in range(*defaulted.indices(10))]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
# If indices returns a tuple of three values, we can unpack it into
# three arguments (the start, stop, and step), pass them to range (which
# also understands the concept of a start, stop, and step), and we now have
# a range of the indices that make up our slice.
Now that we understand the building blocks available to us, we have an idea of how we might add slicing to our MyRange class, by reorganizing its __getitem__ method to allow the given index to either be an integer or a slice, then using the slice's indices method to help us figure out the appropriate result when given a slice.
class MyRange:
    ...
    def __getitem__(self, index):
        if type(index) is int:
            if 0 <= index < len(self):
                return self._start + index * self._step
            else:
                raise IndexError('MyRange index was out of range')
        elif type(index) is slice:
            start, stop, step = index.indices(len(self))
            start_value = self._start + start * self._step
            stop_value = min(self._start + stop * self._step, self._stop)
            step_value = step * self._step
            return MyRange(start_value, stop_value, step_value)
        else:
            raise TypeError(f'MyRange index must be int or slice, but was {type(index).__name__}')
It's also possible to assign to a slice of an object, as well as delete a slice. Implementing support for those operations requires similar modifications to __setitem__ and __delitem__, whose index parameter will be a slice object in these situations.
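To see these pieces working together outside of MyRange, here is a minimal sketch. The Shelf class, its _items attribute, and the example values are all hypothetical names invented for illustration; the only built-in machinery it relies on is the slice type and its indices method, as explored above.

```python
class Shelf:
    """A hypothetical collection that supports indexing, slicing, and deletion."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __repr__(self):
        return f'Shelf({self._items!r})'

    def __getitem__(self, index):
        if type(index) is slice:
            # indices() fills in defaults and clamps the slice to our length.
            start, stop, step = index.indices(len(self))
            return Shelf(self._items[i] for i in range(start, stop, step))
        elif type(index) is int:
            return self._items[index]
        else:
            raise TypeError(f'Shelf index must be int or slice, but was {type(index).__name__}')

    def __delitem__(self, index):
        if type(index) is slice:
            # Remove the highest positions first, so the positions we
            # haven't reached yet don't shift underneath us.
            for i in sorted(range(*index.indices(len(self))), reverse=True):
                del self._items[i]
        elif type(index) is int:
            del self._items[index]
        else:
            raise TypeError(f'Shelf index must be int or slice, but was {type(index).__name__}')


shelf = Shelf('abcdef')
middle = shelf[1:5:2]    # slicing builds a new Shelf from positions 1 and 3
del shelf[::2]           # deleting a slice removes positions 0, 2, and 4
```

Returning a new Shelf from a slice, rather than a plain list, mirrors the way slicing a list gives back a list; that is a design choice in this sketch, not a requirement of the data model.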
Hashing
Python draws a distinction between the objects that are hashable and those that aren't. Conceptually, hashable objects have two qualities that others don't.
If we want to hash an object, we can call the built-in Python function hash and pass it the object as an argument. The algorithm used to hash an object is not particularly important to us, but you'll notice how differently objects can hash even when their values are fairly similar to each other; this turns out not to be an accident, for reasons you'll learn more about in ICS 46.
>>> hash(3)
3
>>> hash('Boo')
-6365711242479792522
>>> hash('Boo!')
-6359222305862117936
>>> hash((1, 2))
-3550055125485641917
>>> hash((1, 2, 3))
529344067295497451
If there are objects that are unhashable, we wouldn't expect to be able to pass them to the hash function. Mutable objects generally won't be hashable, so we wouldn't expect to be able to hash a list. Let's try it.
>>> hash([1, 2, 3])
Traceback (most recent call last):
...
TypeError: unhashable type: 'list'
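The same experiment can be repeated across a few more built-in types. The sketch below wraps the attempt in a hypothetical helper, is_hashable, which is not part of Python; it simply reports whether calling hash raised a TypeError. The choice of sets and frozensets as extra examples is ours, not something from the notes above.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, or False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


results = {
    'tuple of ints': is_hashable((1, 2, 3)),
    'list of ints': is_hashable([1, 2, 3]),
    # A tuple is immutable, but hashing it requires hashing what it holds.
    'tuple holding a list': is_hashable((1, [2, 3])),
    'frozenset': is_hashable(frozenset({1, 2})),
    'set': is_hashable({1, 2}),
}
```

The immutable types come out hashable and the mutable ones don't, with the tuple holding a list as the instructive case: it fails because one of its elements can't be hashed.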
How does the hash function know whether an object is hashable? As you likely expect, there is a dunder method called __hash__ that calculates an object's hash. Hashable objects are the ones that have a __hash__ method; unhashable objects are the ones that don't. The job of the __hash__ method is to combine the information in the object together into a single integer, taking all of that information into account, so that objects that are different in some way will be likely to hash differently. A simple but effective way to do that is to create a tuple containing all of the object's attributes, then pass that tuple to the built-in hash function. This leads to a simple implementation of __hash__ for our MyRange class, whose objects are immutable, so we might reasonably expect to be able to hash them, as well.
class MyRange:
    ...
    def __hash__(self):
        return hash((self._start, self._stop, self._step))
Remember, though, that the reason we want to be able to hash objects is so we can store them in a hash table, which is to say that we want to be able to arrange them in a way that we can use their hashes to find them again easily. But hashes are not guaranteed to be unique; it's possible for two different objects to hash identically. So, just because we find an object that has a particular hash, we can't know whether it's the object we're looking for; we just know that it's an object that ended up in the same place. Because of that, when objects are hashable, there's one other important thing we'll need to be able to do with them: compare them to other objects to see if they're equivalent. To do that, we'll need to dig a little further into the Python data model.
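To make the "same place, but not necessarily the same object" idea concrete, here is a deliberately tiny sketch of a hash table. TinyHashSet, its fixed count of eight buckets, and the method names are all assumptions made for illustration; Python's real set and dict are far more sophisticated, and nothing here describes how they are actually built.

```python
class TinyHashSet:
    """A hypothetical, deliberately tiny hash table with a fixed bucket count."""

    def __init__(self, bucket_count=8):
        self._buckets = [[] for _ in range(bucket_count)]

    def _bucket_for(self, item):
        # The hash only decides which bucket to look in.
        return self._buckets[hash(item) % len(self._buckets)]

    def add(self, item):
        bucket = self._bucket_for(item)
        if item not in bucket:    # searching a list relies on equality
            bucket.append(item)

    def __contains__(self, item):
        # Different items can share a bucket, so equality has the final say.
        return item in self._bucket_for(item)


numbers = TinyHashSet()
numbers.add(1)
numbers.add(9)    # 9 lands in the same bucket as 1, since 9 % 8 == 1 % 8
```

Asking whether 17 is in numbers leads to the bucket holding 1 and 9, but the equality comparison correctly reports that 17 isn't there. That final step is why hashable objects also need a meaningful notion of equivalence.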
Comparison operators
Python gives us the ability to compare objects in various ways, and its data model allows us to control how most of those comparisons are implemented when they involve objects of our classes. Before we can implement these kinds of comparisons, we ought to be sure we understand the kinds of comparisons that can be done, because there are some subtleties that we need to take into account. What should it mean for two objects to be "equal"? What should it mean for one object to be "less than" another?
Identity and equivalence
First, let's be sure we understand Python's idea of equality. When we compare two objects and ask "Are these the same?", what are we actually asking? Is there always one question we're trying to answer, or are there different ones?
Like many programming languages, Python's design distinguishes between two ideas of equality: identity and equivalence. Because we might be interested in knowing either of these things, a separate syntax exists for each of them.
>>> a = [1, 2, 3]
>>> b = [1, 2, 3]
>>> id(a), id(b)
(1994268245696, 1994268381632)
# The id function returns an object's identity, which is
# unique to each object, even objects that have identical meaning.
>>> a is b
False # The is operator returns True only when two objects have
# the same identity, regardless of whether they have identical meaning.
>>> a == b
True # The == operator returns True when two objects have
# an equivalent meaning, even if they have different identities.
Neither of these operators is definitively better than the other; they're simply meant to solve different problems, so the key is knowing what problem you have, which will allow you to make the right choice for your needs.
- The is operator (and its opposite, is not) is likely to be the faster-running of the two, since it's likely implemented behind the scenes as an inexpensive comparison of two memory addresses. So, if identity is truly the only thing you want to check, using is would be the better choice. (You might have seen me compare types using the is operator for exactly this reason.)
- The == operator (and its opposite, !=) is capable of determining whether two objects are equivalent, even if their identities are different (or maybe even if their types are different). Because it asks a more complex question, it can take longer to run, but that's neither here nor there if the problem calls for it; fast is better than slow, but slow is better than incorrect. (In practice, I find that I use == more often than is, because it's more often that I'm interested in equivalence than identity.)
Python provides no way to override the built-in meaning of the is operator. The only circumstance in which a is b is True is when both a and b have the same identity. That's all there is to it, and all there can be: Either they're the same object or they aren't.
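A short sketch may help keep the two questions apart. The variable names and the describe function below are hypothetical; the use of is to test for None is a widely followed Python convention rather than something stated in these notes.

```python
first = [1, 2, 3]
second = first         # a second name for the same object
third = list(first)    # a separate object with the same contents

same_object = first is second                            # one object, two names
separate_but_equal = (first == third, first is third)    # equivalent, not identical

whole = 1
fractional = 1.0
equal_across_types = whole == fractional    # an int and a float can be equivalent


def describe(value):
    # There is only one None object, so an identity check is the
    # customary way to test for it.
    return 'nothing' if value is None else f'something: {value!r}'
```

Here identity is the right question when we care about which object we have (as with None), while equivalence is the right question when we care about what the object means.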
Equivalence, on the other hand, is something that would naturally need to be implemented differently in different classes; after all, what it means for two integers to be equivalent is very different from what it means for two lists of strings to be equivalent. Consequently, the Python data model provides a mechanism for us to specify what it means for two objects of our classes to be equivalent. Before we get to that, though, let's finish our conversation about comparisons in Python.
Relational comparisons
A common feature of programming languages allows us to compare two objects relationally, which means that we want to compare them on the basis of their natural ordering, so we can determine which (if any) is smaller than the other. Integers, for example, have a natural ordering that most of us learned when we were quite young: 2 is greater than 1, 5 is greater than 1, 6 is less than 17, and so on. So, it's unsurprising to most novice Python programmers to discover that there are operators that perform those kinds of comparisons.
>>> 2 > 1
True
>>> 17 < 6
False
For some types, their natural ordering is obvious enough that it hardly needs to be explained to us. For other types, there could potentially be more than one reasonable way to order them. For example, what rule causes the following behavior?
>>> [1, 2] < [1, 3]
True
>>> [2, 3] < [1, 2]
False
>>> [1, 2] < [1, 2, 3]
True
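The answer is that this is a well-known technique called lexicographical ordering, which is a fancy term for a simple idea:
- Compare the first elements of the two lists, then the second elements, and so on, until a pair of corresponding elements differs; the list with the smaller element at that position is the smaller list.
- If one list runs out of elements before any difference is found, it is a prefix of the other, and the shorter list is the smaller one.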
(Note that this is the same algorithm we use to sort English words into alphabetical order, comparing one letter at a time until we find a difference, or until one word turns out to be a prefix of the other.)
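The letter-by-letter comparison described in that note can be written out by hand. The function below, lexicographically_less, is a hypothetical name for a sketch of the idea; it is not how Python's lists are implemented internally, only a way of checking our understanding against the built-in behavior.

```python
def lexicographically_less(left, right):
    """Return True if sequence left orders before sequence right."""
    for a, b in zip(left, right):
        if a != b:
            # The first position where the two differ decides the outcome.
            return a < b
    # No difference in the shared positions, so one is a prefix of the
    # other (or they are equal); only a proper prefix orders first.
    return len(left) < len(right)


pairs = [
    ([1, 2], [1, 3]),
    ([2, 3], [1, 2]),
    ([1, 2], [1, 2, 3]),
    ([1, 2], [1, 2]),
]
# Compare our answers with the ones the built-in < operator gives.
agreement = [lexicographically_less(a, b) == (a < b) for a, b in pairs]
```

For these examples, every entry in agreement is True, which suggests the sketch captures the rule behind the REPL results shown earlier.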
For still other types — most types in a large program fit into this category — there's no natural way to order them at all; they simply don't have a notion of "less than" or "greater than" associated with them, so a sensible design would render such a comparison impossible altogether. (A good software design is as much about disallowing unreasonable things as it is about allowing reasonable ones.)
Since different types of objects will need to handle relational comparisons differently, Python's data model provides hooks for us to control how they behave, too. Now that we understand the problem we're solving in enough detail, all that remains are the implementation details. Bring on the dunder methods!
Implementing equality comparisons
If we want to provide a custom implementation of equivalence for the objects of a class, we can do so by adding an __eq__ method to its class.
- __eq__(self, other), which returns True if self is equivalent to other, False if self is inequivalent to other, or NotImplemented if the equality operation is not supported for the types of the arguments.
(NotImplemented, like True, False, and None, is a constant value in Python, whose type is NotImplementedType.)
Notably, if we don't write an __eq__ method, our classes still provide an implementation of equality automatically, but it's based only on identity (i.e., two objects are equal if and only if they have the same identity). So, if we want anything other than that, we'll need to implement an __eq__ method.
In our MyRange class, we might implement it by checking that the other object is also a MyRange, and that their _start, _stop, and _step attributes are equivalent.
class MyRange:
    ...
    def __eq__(self, other):
        if type(other) is MyRange:
            return self._start == other._start \
                   and self._stop == other._stop \
                   and self._step == other._step
        else:
            return NotImplemented
Let's try our implementation out.
>>> r1 = MyRange(0, 10)
>>> r2 = MyRange(0, 10)
>>> r3 = MyRange(0, 10, 2)
# r1 and r2 are equivalent but have different identities.
# r3 is not equivalent to either r1 or r2.
>>> r1 is r2
False # As expected, r1 and r2 have different identities.
>>> r1 == r2
True # As expected, r1 and r2 are equivalent, by our new definition.
>>> r1 == r3
False # r1 and r3 are not.
>>> r1 != r2, r1 != r3
(False, True) # Interestingly, the != operator seems to be working, as well.
The last of these expressions points us to something interesting: If we implement equality, we get an implementation of inequality automatically. Thinking about it conceptually, this seems sensible enough; if we've specified the conditions under which two objects are equivalent, then two objects are inequivalent under the opposite conditions. So, Python could reasonably implement r1 != r2 as not r1.__eq__(r2).
Still, as usual, Python provides a way for us to specify inequality, in case we could do it ourselves more performantly.
- __ne__(self, other), which returns True if self is inequivalent to other or False otherwise.
In practice, it will rarely if ever be the case that we'd want it to return a different result from not r1.__eq__(r2), but we might sometimes be able to find a more efficient way to calculate that result than checking for equivalence and then negating it.
Strangely, adding an __ne__ method to a class without an __eq__ method does not automatically provide a meaning of equality. In other words, even though we know that r1.__eq__(r2) is the same as not r1.__ne__(r2), Python does not perform this conversion automatically. So, generally, we can think of __eq__ as necessary when customizing equivalence and __ne__ only as a potential optimization.
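We can confirm this asymmetry with a small experiment. The class OnlyNe below is a hypothetical example that customizes != and nothing else; its name and its value attribute are invented for illustration.

```python
class OnlyNe:
    """A hypothetical class that customizes != but not ==."""

    def __init__(self, value):
        self.value = value

    def __ne__(self, other):
        if type(other) is OnlyNe:
            return self.value != other.value
        return NotImplemented


a = OnlyNe(1)
b = OnlyNe(1)

unequal = a != b    # uses our __ne__, which compares the stored values
equal = a == b      # no __eq__ was written, so this falls back to identity
```

Both unequal and equal come out False here, which means a and b are neither "not equal" nor "equal". That contradiction is a good reason to treat __eq__ as the method to write first.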
Clarifying the relationship between equality and hashing
There's one other thing worth noting about equality: For reasons we discussed in our conversation about hashing, classes that provide a __hash__ method must also provide an __eq__ method, because there's a vitally important relationship that must be maintained between the meanings of equivalence and hashing.
- If two objects are equivalent, they must have the same hash.
That implication only applies in one direction: Two objects with the same hash will not necessarily be equivalent, even though we'd prefer that they were. The reason we can't make this happen is a practical one: Part of what we do when we hash an object is simplify it to a value that's lower-fidelity; some information will almost surely be lost, which means inequivalent objects will unavoidably have the same hash sometimes. While we'd like to make that as unlikely as possible, we can't avoid it in general.
Still, because of that implication, it's quite often the case that taking control over hashing or equality implies that we should take control of the other, as well, because their meanings are necessarily intertwined. Fortunately, there are some additional mechanics that help us to avoid writing classes that accidentally fail to meet these constraints, particularly because objects support both hashability and identity-based equality by default.
>>> class Thing:
...     pass
...
>>> hash(Thing())
107648736801 # Objects are hashable by default
>>> Thing() == Thing()
False # Objects support equality (albeit using only their
# identities) by default.
>>> t1 = Thing()
>>> t2 = Thing()
>>> t1 == t2
False # Since t1 and t2 refer to separate objects (with separate
# identities), they are not considered equivalent.
>>> hash(t1) == hash(t2)
False # Inequivalent objects can have different hashes, which
# makes this answer allowable -- though True would be
# allowable, too.
>>> t1 == t1
True
>>> hash(t1) == hash(t1)
True # Equivalent objects must have the same hash, which
# is correct by default.
The provided defaults meet the basic requirement, but this raises the question of what happens if we implement our own custom behavior for one of these operations without implementing the other. To determine the answer to that question, we'll need to think about both operations separately.
What happens if we implement hashing without equality?
>>> class HashableThing:
...     def __hash__(self):
...         return 999
...
>>> hash(HashableThing())
999
>>> h1 = HashableThing()
>>> h2 = HashableThing()
>>> h1 == h2, hash(h1) == hash(h2)
(False, True) # Inequivalent objects are allowed to have the same hash.
>>> h1 == h1, hash(h1) == hash(h1)
(True, True) # Equivalent objects must have the same hash.
Essentially, nothing can go wrong here, as long as our __hash__ method is written in a way that uses only what's stored within the object, since the default behavior of equality is to use identity (i.e., no two different objects are equal). What we're trying to avoid is equivalent objects having different hashes; that can't happen when we implement __hash__, as long as we implement it properly.
On the other hand, what happens if we implement equality without hashing?
>>> class UnhashableThing:
...     def __init__(self, value):
...         self.value = value
...     def __eq__(self, other):
...         return isinstance(other, UnhashableThing) and self.value == other.value
...
>>> UnhashableThing(13) == UnhashableThing(13)
True # Objects storing the same values are equivalent.
>>> UnhashableThing(11) == UnhashableThing(7)
False # Objects storing different values are inequivalent.
# But without having implemented custom hashing behavior,
# how can we be sure their hashes will be different?
>>> hash(UnhashableThing(11)) == hash(UnhashableThing(7))
Traceback (most recent call last):
...
TypeError: unhashable type: 'UnhashableThing'
# Oh! UnhashableThings aren't hashable at all!
>>> UnhashableThing.__hash__ is None
True # It's because UnhashableThing has no __hash__ method.
As a safety mechanism, when we write an __eq__ method in a class without a __hash__ method being written in that same class, Python automatically sets the value of __hash__ in the class dictionary to None, specifically to avoid the problem we otherwise would have created: Specifying a way for two objects to be equivalent without having ensured that their hashes would be the same.
Of course, this mechanism isn't fool-proof, since nothing prevents us from writing __eq__ and __hash__ methods that don't fit together properly, but there is at least a sensible default at work here, which prevents us from accidentally introducing a problem we hadn't considered thoroughly.
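One way to keep __eq__ and __hash__ fitting together is to have both of them look at exactly the same attributes. The sketch below uses a hypothetical Point class, with _x and _y attributes invented for illustration, and assumes its objects are never modified after construction.

```python
class Point:
    """A hypothetical two-dimensional point, treated as immutable."""

    def __init__(self, x, y):
        self._x = x
        self._y = y

    def __eq__(self, other):
        if type(other) is Point:
            return (self._x, self._y) == (other._x, other._y)
        return NotImplemented

    def __hash__(self):
        # Hash exactly the attributes that __eq__ compares, so that
        # equivalent points are guaranteed to have the same hash.
        return hash((self._x, self._y))


p1 = Point(1, 2)
p2 = Point(1, 2)
visited = {p1}
found = p2 in visited    # a different object, but an equivalent one
```

Because p1 and p2 are equivalent and hash identically, the set treats them as the same element, which is the behavior the constraint exists to protect.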
Implementing relational comparisons
Relational comparisons can also be customized in Python, and dunder methods will again be our tool of choice when performing that customization. But it's wise for us to begin by understanding the default; what happens if we don't customize relational comparisons in a class? If we don't specify the details, under what circumstances are objects "less than" other objects?
>>> class Thing:
...     pass
...
>>> t1 = Thing()
>>> t2 = Thing()
>>> t1 < t2
Traceback (most recent call last):
...
TypeError: '<' not supported between instances of 'Thing' and 'Thing'
Unlike equality comparisons, for which there is a default behavior if you don't specify one, relational comparisons have no default. If a class doesn't describe how its objects are to be compared relationally, they can't be. So, if we want relational comparisons for objects of our classes, we'll need to implement them ourselves. There are four kinds of relational comparisons (<, <=, >, and >=), so we can do this using (up to) four dunder methods that provide their implementations.
- __lt__(self, other), which returns True if self is less than other or False otherwise.
- __gt__(self, other), which returns True if self is greater than other or False otherwise.
- __le__(self, other), which returns True if self is less than or equal to other or False otherwise.
- __ge__(self, other), which returns True if self is greater than or equal to other or False otherwise.
As with equality, you don't need to implement all four methods to be able to perform all four comparisons, since there are known relationships between their results.
- When t1.__lt__(t2) is True, we expect t2.__gt__(t1) to be True.
- When t1.__le__(t2) is True, we expect t2.__ge__(t1) to be True.
Consequently, an implementation of either __lt__ or __gt__ can be used for both the < and > operators, and an implementation of either __le__ or __ge__ can be used for both the <= and >= operators.
>>> class Thing:
...     def __init__(self, value):
...         self.value = value
...     def __lt__(self, other):
...         print(f'__lt__: self.value = {self.value}, other.value = {other.value}')
...         return self.value < other.value
...
>>> t1 = Thing(3)
>>> t2 = Thing(9)
>>> t1 < t2
__lt__: self.value = 3, other.value = 9
True # Here, __lt__ was called on t1, with t2 being other.
>>> t1 > t2
__lt__: self.value = 9, other.value = 3
False # Here, __lt__ was called on t2, with t1 being other.
Even though the presence of both __eq__ and __lt__ could theoretically be enough to implement a <= operator, Python doesn't implement that conversion for us. So, if we want relational comparisons, we probably want at least these three dunder methods: __eq__, __lt__, and __le__, with the other three (__ne__, __gt__, and __ge__) providing a mechanism for optimization if we need it.
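Although the language itself won't derive <= from __eq__ and __lt__, the standard library offers a class decorator, functools.total_ordering, that fills in the missing relational methods from the ones we supply. It isn't covered in these notes, so treat the following as a side sketch; the Version class and its attributes are hypothetical.

```python
from functools import total_ordering


@total_ordering
class Version:
    """A hypothetical version number ordered by (major, minor)."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def _key(self):
        return (self.major, self.minor)

    def __eq__(self, other):
        if type(other) is Version:
            return self._key() == other._key()
        return NotImplemented

    def __lt__(self, other):
        if type(other) is Version:
            return self._key() < other._key()
        return NotImplemented


older = Version(1, 2)
newer = Version(1, 10)
# We wrote only __eq__ and __lt__; the decorator supplies <=, >, and >=.
checks = (older < newer, older <= older, newer > older, newer >= older)
```

The trade-off is convenience against speed: the derived methods are built on top of the ones we wrote, so writing all of them by hand remains an option when performance matters.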
Mixed-type comparisons
Comparisons can be done between objects of different classes, which may then have different implementations of these same dunder methods. If we evaluate x == y, but x and y have different types, then whose __eq__ method is called? One way to find out is to experiment.
>>> class A:
...     def __eq__(self, other):
...         print('A.__eq__')
...         return True
...
>>> class B:
...     def __eq__(self, other):
...         print('B.__eq__')
...         return True
...
>>> a1 = A()
>>> b1 = B()
>>> a1 == b1
A.__eq__
True
>>> b1 == a1
B.__eq__
True
From our experimentation, it appears that the rule is simple: When we write x == y, x.__eq__(y) is called. But what happens if x's class has no __eq__ method, but y's does?
>>> class C:
...     pass
...
>>> c1 = C()
>>> c1 == b1
B.__eq__
True
In that case, Python is calling b1.__eq__(c1) instead, so that we can get a better answer than we might get if we relied on C's default equality (which is based only on identity).
It turns out that there's a little more to the underlying rule than this (we can flesh this out a bit further when we talk about inheritance and subclassing), but this is a good enough understanding for us to proceed with for now: The __eq__ method of the left-hand object is used, unless only the right-hand object has an __eq__ method.
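Returning NotImplemented from __eq__ gives us a related way to hand the question to the right-hand object, even when the left-hand object has an __eq__ method of its own. In the sketch below, Minutes and Seconds are hypothetical classes; only Seconds knows how to compare itself with a Minutes object.

```python
class Minutes:
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if type(other) is Minutes:
            return self.value == other.value
        # We don't know this type, so let Python ask the other operand.
        return NotImplemented


class Seconds:
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if type(other) is Seconds:
            return self.value == other.value
        if type(other) is Minutes:
            return self.value == other.value * 60
        return NotImplemented


# Minutes.__eq__ declines, so Python tries Seconds.__eq__ with the
# operands swapped, which knows how to answer.
cross_type = Minutes(2) == Seconds(120)

# Neither side knows how to compare these, so Python falls back
# to comparing identities.
unrelated = Minutes(2) == 'two'
```

This is why returning NotImplemented is preferable to returning False for unfamiliar types: it leaves room for the other class to supply a better answer.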
Arithmetic operators
Another set of operators whose meaning we might like to redefine for our classes is the set of arithmetic operators. Consider the kinds of operators we can apply to integers in Python; below are many (but not all) of them.
>>> i = 40
>>> j = 16
>>> +i
40 # The unary plus operator returns its operand unchanged.
>>> -i
-40 # The unary minus operator returns the negation of its operand.
>>> (i + j, i - j, i * j)
(56, 24, 640) # We can add, subtract, and multiply two integers.
>>> (i / j, i // j, i % j)
(2.5, 2, 8) # There are two kinds of division, along with modulo.
>>> i ** 2
1600 # We can exponentiate integers.
Once again, dunder methods form the basis of how these operators are actually implemented, so if we want to define the meaning of these operators when used on objects of our own classes, we can do so by implementing the appropriate dunder methods. They follow a few recurring patterns, so we'll stick with a couple of examples; you can read about the rest of them in the Python Data Model documentation when you need them.
The unary operators + and - have a single operand, so that suggests that they would best be implemented by dunder methods that accept a self parameter and no others.
- __pos__(self) implements the unary plus operator.
- __neg__(self) implements the unary minus operator.
>>> class Thing:
...     def __init__(self, value):
...         self.value = value
...     def __repr__(self):
...         return f'Thing({self.value})'
...     def __neg__(self):
...         return Thing(-self.value)
...
>>> t1 = Thing(17)
>>> -t1
Thing(-17) # Our result is negated.
>>> t1
Thing(17) # t1 is unchanged, as it should be.
That last expression points out something important: We don't expect the arithmetic operators to modify their operands, so we need to be sure to return new values instead of modifying the existing ones. This aligns the behavior of our classes with the types that are built into Python.
The various binary operators have two operands, so they might instead be implemented as dunder methods taking two parameters: a self and one other. We saw that pattern already when we implemented the comparison operators, and it recurs here.
- __add__(self, other) returns the sum of self and other.
- __sub__(self, other) returns the difference when subtracting other from self.
- __mul__(self, other) returns the product of self and other.
- __truediv__(self, other) returns the quotient of self and other, without taking the floor of the result.
- __floordiv__(self, other) returns the floor of the quotient of self and other.
- __pow__(self, other) returns the result of raising self to the power other.
Let's try implementing a simple __add__ method as a first example.
>>> class Thing:
...     def __init__(self, value):
...         self.value = value
...     def __repr__(self):
...         return f'Thing({self.value})'
...     def __add__(self, other):
...         return Thing(self.value + other.value)
...
>>> t1 = Thing(3)
>>> t2 = Thing(9)
>>> t1 + t2
Thing(12)
In a complete implementation, any of these methods would best return NotImplemented when the operation is not supported on the types of the arguments. For example, if you want it to be possible to add objects of your class Money to other objects of the same class, but not others, then you would return NotImplemented whenever the other parameter has a type other than Money. This rule turns out not only to be hygienic, but it also forms the basis of additional automation; if Python knows that an operator isn't implemented, it might be able to try an equivalent alternative instead.
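The Money example from that paragraph might look something like the following sketch. The notes name the class but don't show it, so the cents attribute and everything else about its design are assumptions made for illustration.

```python
class Money:
    """A hypothetical amount of money, stored as a whole number of cents."""

    def __init__(self, cents):
        self.cents = cents

    def __repr__(self):
        return f'Money({self.cents})'

    def __add__(self, other):
        if type(other) is Money:
            return Money(self.cents + other.cents)
        # Adding anything else to Money isn't supported.
        return NotImplemented


total = Money(250) + Money(125)

try:
    Money(250) + 3
except TypeError as e:
    # When no alternative pans out, Python turns NotImplemented
    # into a TypeError on our behalf.
    error_message = str(e)
```

Notice that our method never raises the TypeError itself; it only reports that it can't help, and Python decides what to do next.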
Reflected arithmetic operators
When we redefine arithmetic operators, the usual rule is that the operation x ? y turns into a call to the equivalent dunder method, with x being the first argument (i.e., passed into self) and y being the second (i.e., passed into other). So, for example, when evaluating x + y, Python attempts to call x.__add__(y), the result of which becomes the result of the addition.
However, when x.__add__(y) is not supported, Python makes one more attempt to add x and y, by calling into a reflected version of the operator instead. In the case of addition, that reflected operation is implemented by the dunder method __radd__, so if x.__add__(y) is unsupported, Python attempts to call y.__radd__(x). The mechanism it uses to determine whether x.__add__(y) is supported is simple: If the __add__ method doesn't exist, or if it returns NotImplemented, it's unsupported.
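The fallback described above can be modeled in ordinary Python. This is a simplified sketch, not how the interpreter is actually written: the function add_like_python and the Offset class are hypothetical names, and the sketch leaves out one refinement (when the right operand's type is a subclass of the left operand's type, Python may try the reflected method first).

```python
def add_like_python(x, y):
    """A simplified model of how x + y is resolved (an illustration only)."""
    # Dunder methods are looked up on the type, not on the instance.
    forward = getattr(type(x), '__add__', None)
    if forward is not None:
        result = forward(x, y)
        if result is not NotImplemented:
            return result

    # The reflected method is only tried when the operand types differ.
    if type(x) is not type(y):
        reflected = getattr(type(y), '__radd__', None)
        if reflected is not None:
            result = reflected(y, x)
            if result is not NotImplemented:
                return result

    raise TypeError(
        f'unsupported operand type(s) for +: '
        f'{type(x).__name__!r} and {type(y).__name__!r}')


class Offset:
    """Hypothetical class that supports only reflected addition with an int."""

    def __init__(self, amount):
        self.amount = amount

    def __radd__(self, other):
        if type(other) is int:
            return Offset(other + self.amount)
        return NotImplemented


print(add_like_python(2, 3))                    # 5
print(add_like_python(1, Offset(20)).amount)    # 21
print((1 + Offset(20)).amount)                  # 21
```

In the second call, int's own __add__ returns NotImplemented because it knows nothing about Offset, which is what gives Offset.__radd__ its chance.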
There's a reason why the reflected arithmetic operators need to be implemented separately: Not all of these operators are commutative. We would expect x + y and y + x to return the same result (in many cases, but certainly not all!), but we would not expect the same from x - y and y - x. So, if Python is going to reverse the order of the arguments, our code also needs to know that it's being asked to perform the reverse of the usual operation, which is why a different dunder method is called.
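Subtraction makes the need for a separate method easy to see. In this sketch, Meters is a hypothetical class invented for the example; the point is only that __sub__ and __rsub__ must compute different things because self sits on opposite sides of the operator.

```python
class Meters:
    """Hypothetical class for illustration: a length in meters."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f'Meters({self.value})'

    def __sub__(self, other):
        # Handles Meters - number, where self is the left operand.
        if type(other) in (int, float):
            return Meters(self.value - other)
        return NotImplemented

    def __rsub__(self, other):
        # Handles number - Meters, where self is the right operand,
        # so the order of the subtraction is flipped.
        if type(other) in (int, float):
            return Meters(other - self.value)
        return NotImplemented


print(Meters(10) - 3)     # Meters(7)
print(3 - Meters(10))     # Meters(-7)
```

If Python simply called __sub__ with the arguments swapped, the second expression would wrongly produce Meters(7).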
__radd__(self, other) returns the sum of other and self.
__rsub__(self, other) returns the difference when subtracting self from other.
__rmul__(self, other) returns the product of other and self.
__rtruediv__(self, other) returns the quotient of other and self, without taking the floor of the result.
__rfloordiv__(self, other) returns the floor of the quotient of other and self.
__rpow__(self, other) returns the result of raising other to the power of self.
So, why is it so important for us to have this ability? Let's suppose that we have our own class called Thing, and that we want to be able to add Thing objects to integers. We might start by writing this.
>>> class Thing:
...     def __init__(self, value):
...         self.value = value
...     def __repr__(self):
...         return f'Thing({self.value})'
...     def __add__(self, other):
...         if type(other) is Thing:
...             return Thing(self.value + other.value)
...         elif type(other) is int:
...             return Thing(self.value + other)
...         else:
...             return NotImplemented
...
>>> t = Thing(17)
>>> t + 1
Thing(18) # So far, so good!
>>> 1 + t
Traceback (most recent call last):
...
TypeError: unsupported operand type(s) for +: 'int' and 'Thing'
# This is why we need __radd__!
At this point, there are two ways we might think about solving the problem.
Modify the __add__ method in the int class, so that it is able to add integers to Things.
Write a __radd__ method in the Thing class, which performs reflected addition of an integer to a Thing.
And, of course, the first option isn't actually a choice we have, because we can't (and shouldn't) modify Python's built-in int class. So, ultimately, our only option is reflected addition.
>>> class Thing:
...     def __init__(self, value):
...         self.value = value
...     def __repr__(self):
...         return f'Thing({self.value})'
...     def _add_values(self, other):
...         if type(other) is Thing:
...             return self.value + other.value
...         elif type(other) is int:
...             return self.value + other
...         else:
...             return None
...     def __add__(self, other):
...         new_value = self._add_values(other)
...         return Thing(new_value) if new_value is not None else NotImplemented
...     def __radd__(self, other):
...         new_value = self._add_values(other)
...         return Thing(new_value) if new_value is not None else NotImplemented
...
>>> t = Thing(17)
>>> t + 1
Thing(18)
>>> 1 + t
Thing(18) # Problem solved!
Augmented arithmetic operators
There is one more variation on arithmetic operators that we need to consider, as well. Python provides augmented arithmetic operators, which have the job of modifying an existing object rather than building a new one. For immutable objects, this is a distinction without a difference, but when objects are mutable, there can be a substantial performance benefit in implementing the augmented arithmetic operators differently from the others. Lists provide a good example of this.
>>> values = [1, 2, 3, 4, 5]
>>> values + [6, 7, 8]
[1, 2, 3, 4, 5, 6, 7, 8]
>>> values
[1, 2, 3, 4, 5] # The + operator left values intact.
>>> values += [6, 7, 8]
>>> values
[1, 2, 3, 4, 5, 6, 7, 8] # But += changed it.
You're probably already acquainted with the conceptual difference between values + [6, 7, 8] and values += [6, 7, 8], but a little asymptotic analysis tells us that there's a significant performance difference here, as well.
values + additional must leave values intact, meaning that it must build an entirely new list. This means all of the elements of values need to be copied into that new list, followed by all of the ones that we're adding to the end of it. If there are n elements in values and m elements in additional, we'll spend O(n + m) time on this operation (i.e., linear, but proportional to the sum of the lengths of the two lists).
values += additional appends each value in additional to the end of values directly. This means it doesn't matter how many elements are in values, because none of them will need to be relocated; it only matters how many elements are in additional. If there are n elements in values and m elements in additional, we'll spend O(m) time on this operation. It's still linear, but it's linear with respect to the length of one list, rather than the sum of the lengths of both. This difference can be quite large in practice, as it's not uncommon to add a small number of elements to a large list.
So, there can certainly be a performance benefit in implementing the augmented arithmetic operators separately from the others.
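Both halves of that analysis can be observed directly. The sketch below first uses a second name for the same list to show that + builds a new object while += modifies the existing one, then times the two operations with the standard timeit module. The list sizes and repeat count are arbitrary assumptions, and the actual timings will vary from one machine to another, so no particular numbers are claimed here.

```python
import timeit

values = [1, 2, 3, 4, 5]
alias = values                  # A second name for the same list object.

combined = values + [6, 7, 8]
print(combined is values)       # False: + built a brand-new list.
print(alias)                    # [1, 2, 3, 4, 5]

values += [6, 7, 8]
print(values is alias)          # True: += extended the existing list.
print(alias)                    # [1, 2, 3, 4, 5, 6, 7, 8]

# Rough timing sketch; the sizes and the repeat count are arbitrary choices.
big = list(range(200_000))
small = [1, 2, 3]

def concatenate():
    return big + small          # Copies every element of big each time.

def extend_in_place():
    target = big                # Local alias, so no global declaration is needed.
    target += small             # Appends three elements to the existing list.

print(timeit.timeit(concatenate, number=100))
print(timeit.timeit(extend_in_place, number=100))
```

With a large list on the left and a small one on the right, we'd expect the first timing to be much larger than the second, in keeping with O(n + m) versus O(m).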
For immutable objects, we won't want to implement them, because the default behavior (make a new object with the new value) is exactly what we'd want. For example, our most recent Thing class supports += already, since Python will automatically turn t += 3 into the equivalent t = t + 3 instead.
>>> t = Thing(17)
>>> id(t)
2008047601600
>>> t += 3
>>> t
Thing(20)
>>> id(t)
2008047602800 # The id has changed here, because t + 3 built a new Thing object.
If Thing objects are intended to be mutable (we never decided about that, since we're just noodling), then we might instead want to implement augmented addition, so they'd be modified directly. (This is especially true if constructing Thing objects were more expensive than filling in a single integer attribute; it's for types like lists where this distinction matters the most.) Augmented arithmetic operators, like the others, are implemented using dunder methods.
__iadd__(self, other), which adds other to self in-place (i.e., modifying self).
__isub__(self, other), which subtracts other from self in-place.
__imul__(self, other), which multiplies self by other in-place.
__itruediv__(self, other), which divides self by other in-place, without taking the floor of the result.
__ifloordiv__(self, other), which floor-divides self by other in-place.
__ipow__(self, other), which raises self to the power of other in-place.
These methods return the updated result, which will usually just be self (albeit with modifications), or they'll return NotImplemented when the types of the operands are not supported.
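The return value matters more than it might seem, because x += y rebinds x to whatever the augmented method returns. The sketch below uses two hypothetical classes, Tally and ForgetfulTally, to show what happens when the return statement is left out; neither class comes from these notes.

```python
class Tally:
    """Hypothetical mutable counter, for illustration only."""

    def __init__(self, count=0):
        self.count = count

    def __iadd__(self, other):
        if type(other) is not int:
            return NotImplemented
        self.count += other
        return self         # The name on the left of += is rebound to this.


class ForgetfulTally(Tally):
    def __iadd__(self, other):
        self.count += other
        # Deliberate bug: no return statement, so this method returns None.


good = Tally()
good += 5
print(good.count)           # 5

bad = ForgetfulTally()
bad += 5
print(bad)                  # None
```

The ForgetfulTally object really was updated, but the variable bad no longer refers to it, so returning self is what keeps the variable pointing at the modified object.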
Implementing augmented addition in our Thing class might look like this, then.
class Thing:
    ...
    def __iadd__(self, other):
        new_value = self._add_values(other)
        if new_value is not None:
            self.value = new_value
            return self
        else:
            return NotImplemented
Note, too, that there are no reflected versions of the augmented arithmetic operators, for the simple reason that the object on the left-hand side of the operation is always the one being modified, so we would expect its class to be the one to know how to implement that modification.
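Putting the pieces together, here is a runnable sketch of a mutable Thing that combines the methods shown earlier with __iadd__. One detail is an assumption rather than something taken from the notes: __radd__ is written as an alias of __add__, a shortcut that works only because this particular addition is commutative.

```python
class Thing:
    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f'Thing({self.value})'

    def _add_values(self, other):
        if type(other) is Thing:
            return self.value + other.value
        elif type(other) is int:
            return self.value + other
        else:
            return None

    def __add__(self, other):
        new_value = self._add_values(other)
        return Thing(new_value) if new_value is not None else NotImplemented

    # Shortcut assumed for this sketch: the addition is commutative,
    # so reflected addition can reuse __add__ unchanged.
    __radd__ = __add__

    def __iadd__(self, other):
        new_value = self._add_values(other)
        if new_value is not None:
            self.value = new_value
            return self
        else:
            return NotImplemented


t = Thing(17)
original = t
t += 3
print(t)                    # Thing(20)
print(t is original)        # True: the existing object was modified.

n = 1
n += Thing(17)              # int has no __iadd__, so this becomes n = n + Thing(17).
print(n)                    # Thing(18)
```

The last three lines show what the absence of reflected augmented operators means in practice. Since int doesn't implement __iadd__, Python falls back to ordinary addition, where Thing's __radd__ gets its usual chance, and n ends up bound to a new Thing rather than a modified integer.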