
ICS 33 Spring 2024
Notes and Examples: The Python Data Model


Background

As we've explored some features of Python that you may not have encountered before, a recurring theme has emerged. For the most part, the functions and classes built into Python and its standard library don't possess special abilities that aren't available within the code you write. Objects of pre-existing types can be context managers, but objects of your types can, too. Objects of many pre-existing types can be iterated using for loops, but so can objects of your types. You have to be aware of the mechanisms that make these things possible, but the important thing is that they are possible; most mechanisms that underlie Python are exposed and documented, which means your code can hook into them just as well as built-in code can. This is among the hallmarks of a well-designed programming language.

Collectively, the mechanisms that govern how Python objects interact with each other within a running Python program are known as the Python data model. Both in this course and your prior coursework, you'll already have seen some parts of this model, even if you've never heard it put into these terms before. Here are a few familiar ideas from the Python data model.

As we've seen before, all of the mechanisms described above rely on what we call protocols, which rely on the presence of one or more dunder methods, such as __init__, __enter__, __exit__, __iter__, __next__, and __get__. Virtually every time we interact with any object for any purpose in Python, at least one dunder method is called behind the scenes, even if we rarely call them ourselves. These dunder methods form the basis of how objects interact; their presence, alongside the fact that their meanings are documented and well-understood by seasoned Python programmers, ensures that we can modify most of what happens during those interactions when our designs require it. We can't change the fact that iterating an object causes __iter__ and __next__ methods to be called, but we can change what happens when they are, which means we can make iteration behave in any way we'd like. We can't change that the with statement requires __enter__ and __exit__ methods, but, by providing our own __enter__ and __exit__ methods, we can apply the concept of context management to any problem for which it's suited.
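As a minimal sketch of what hooking into one of these protocols looks like, the hypothetical Countdown class below (our own illustration, not part of the course's code) supplies __iter__ and __next__. That alone is enough for a for loop, or a function like list, to consume its objects.

```python
class Countdown:
    """Hypothetical example: an iterator that counts from start down to 1."""

    def __init__(self, start):
        self._current = start

    def __iter__(self):
        # An iterator returns itself when asked for an iterator.
        return self

    def __next__(self):
        # Raising StopIteration is how an iterator signals that it's finished.
        if self._current <= 0:
            raise StopIteration

        value = self._current
        self._current -= 1
        return value


# list() never mentions __iter__ or __next__, but it calls both behind the scenes.
print(list(Countdown(3)))    # [3, 2, 1]
```

We never call the dunder methods ourselves here; we only decide what happens when Python calls them.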

So, the key to improving our ability to design Python programs — writing programs that are "Pythonic," in the sense that using our functions and classes feels just like using the ones that are built in — is understanding as much of the Python data model as we can. We don't need every detail all the time, but every available detail is applicable to a problem we might have. When we solve problems the way Python does, we find that our functions and classes naturally "plug into" the things that are built in, and vice versa. When an entire community of programmers uses a programming language the same way, the community's ability to solve problems rapidly increases, because common problems are solved once and solved well, with the tools used to solve them combining naturally in ways that no one considered originally. We can stand on the shoulders of giants without losing our balance.

The price to be paid is that we have to learn the details. The payoff, though, is immense, so we'd be well-served to spend the time learning them. And, fortunately, the details rarely change, except to the extent that new details are added; once we know how iterators work, for example, that's how they'll likely continue to work for as long as we use Python. So, we at least want to be aware of what's available and the common threads that tie the features of Python's data model together. We can always look the details up again when we need them, even if we've forgotten some of them by then, but it's a lot harder to look things up when we don't know what we're looking for.

So, let's dive in and see what else we can find in the Python data model.


Lengths

In Python, we say that an object is sized if it can be asked for a length. The usual way to ask is to pass the object as an argument to the built-in len function. Strings, lists, tuples, ranges, sets, and dictionaries are all examples of sized objects, though not all objects are sized; integers, for example, are not.

>>> len('Boo')
    3
>>> len([1, 2, 3, 4, 5])
    5
>>> len(18)
    Traceback (most recent call last):
      ...
    TypeError: object of type 'int' has no len()

The MyRange class we wrote in a previous lecture is a good example of a class whose objects ought to be sized — if MyRange(10) comprises ten integers, then we could reasonably say its length is 10 — but this feature was missing from our implementation.

>>> len(MyRange(10))
    Traceback (most recent call last):
      ...
    TypeError: object of type 'MyRange' has no len()

Fortunately, there is a simple protocol by which we can add this feature to a class. All we need to do is write one extra method in our class:

Since a MyRange doesn't actually store any of its values, we'd need to iterate them if we wanted to count them, which means we could build a list of those values and then ask that list for its length.

class MyRange:
    ...

    def __len__(self):
        return len([x for x in self])

But we should be aware of the costs of our solutions; this works, but could we do substantially better? If there are n values in the range, this requires O(n) time to iterate through them, as well as O(n) memory to store them in a list. But if all we want to know is how many values are in our range, we have no need to store them; we just need to count them. What if we used a generator comprehension instead? Generators have no length, which rules out len(x for x in self), but we could transform each value into a 1 and then sum them up.

class MyRange:
    ...

    def __len__(self):
        return sum(1 for x in self)

If there are ten values in our range, we'll be summing the ten 1's that we generated, so this should produce the right answer. This technique reduces our memory usage to O(1), because we're now generating one value at a time, ignoring it (in favor of the value 1), and then adding 1 to a running sum. This is roughly equivalent to having written a loop instead.

class MyRange:
    ...

    def __len__(self):
        count = 0

        for value in self:
            count += 1

        return count

So, this is an improvement from a memory perspective, but we're still spending O(n) time, because we're still iterating the values in our range from beginning to end. A larger improvement would be to eliminate the iteration of the values altogether, though this would only be possible if we could find some other way to deduce how many there are. Fortunately, the values in a MyRange follow a straightforward pattern, so we can instead calculate the length with an arithmetic formula, by dividing the difference between stop and start by step, then applying a little bit of finesse to handle the edge cases properly.

import math

class MyRange:
    ...

    def __len__(self):
        return max(0, math.ceil((self._stop - self._start) / self._step))

This version runs in O(1) time and uses O(1) memory. It's always made up of one subtraction, one division, one ceiling operation, and determining the maximum of exactly two integers. Whether the range is extremely long or very short, the sequence of operations is always the same, so its cost remains constant, regardless of the range's length.

Note, too, that if MyRange also supported negative step values (ours didn't), then we'd need to adjust our formula some more, but it would still be possible to calculate a length in both constant time and constant memory.
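As a hedged sketch of what such an adjusted formula might look like, the hypothetical helper range_length below handles both positive and negative steps. It's written as a standalone function rather than a MyRange method so that it can run by itself, and it uses integer floor division in place of math.ceil, which is our own choice here to avoid floating-point rounding when the values are very large.

```python
def range_length(start, stop, step):
    """Count the values start, start + step, ... that fall before stop."""
    if step == 0:
        raise ValueError('step must not be zero')
    elif step > 0:
        # With integers, -(-a // b) is the ceiling of a / b.
        return max(0, -((start - stop) // step))
    else:
        # A negative step counts downward, so we measure from stop up to start.
        return max(0, -((stop - start) // -step))


print(range_length(0, 10, 3))     # 4, for the values 0, 3, 6, 9
print(range_length(10, 0, -3))    # 4, for the values 10, 7, 4, 1
print(range_length(5, 5, 1))      # 0, because the range is empty
```

Either branch performs a fixed number of arithmetic operations, so the cost stays constant no matter how long the range is.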


Truthiness

There are many situations in Python where objects are treated as truth values, which is to say that they're considered either to be truthy (i.e., treated as though they're a boolean True) or falsy (i.e., like a boolean False). This is why the conditional expression of an if statement or a while loop can evaluate to any type of object, or why an iterable containing any types of objects can be passed to the built-in functions any or all.

Making that feature work requires a decision on the fundamental question: Which objects are considered truthy and which are considered falsy? The design of Python answers that question for its built-in types, including rules such as these.

But what about objects of the classes we write? Under what conditions are they truthy? Under what conditions are they falsy? And, most importantly, can we decide those conditions, instead of leaving it to Python to decide?

>>> class Person:
...     def __init__(self, name):
...         self._name = name
...
>>> p1 = Person('Alex')
>>> bool(p1)
    True       # A Person with a non-empty name is truthy.
>>> p2 = Person('')
>>> bool(p2)
    True       # A Person with an empty name is also truthy.

From our experimentation, it appears that objects of our classes are always truthy, but there's more to the story than meets the eye. Given what we know already about the Python data model, we can reasonably expect that one or more dunder methods will allow us to alter this outcome.

How lengths impact truthiness

We saw previously that we can give objects a length by writing a __len__ method in their class. We've also seen that empty strings and empty lists — whose lengths are zero — are considered to be falsy. What happens to objects of our classes when they have lengths?

>>> len(MyRange(10))
    10
>>> bool(MyRange(10))
    True            # A MyRange with a non-zero length is truthy.
>>> len(MyRange(5, 5))
    0
>>> bool(MyRange(5, 5))
    False           # A MyRange with a zero length is falsy.

For objects that are sized (i.e., those that implement a __len__ method), their lengths can be used to determine truthiness. If calculating lengths is inexpensive, and if we're happy with that behavior — which is in line with objects that are built into Python, so we'd need a good reason to feel otherwise about it — then we're done. (This is one reason why implementing our methods efficiently is so important; it has a compounding benefit, since one method can often form the basis of others, as well, so that one fast operation becomes many fast operations.)

Still, not all objects are sized, and we might nonetheless want to control their truthiness. Or, we might be able to determine truthiness more cheaply than we're able to calculate a length. What do we do then?

Directly overriding truthiness

Adding a __bool__(self) method to a class directly overrides how its truthiness is determined, independent of whether it has a length. This means that determining the truthiness of an object is really a process that has as many as three steps.
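Python's documented order for truth value testing is to call __bool__ if the class has one, otherwise to call __len__ (if there is one) and treat a non-zero length as truthy, and otherwise to consider the object truthy. The three hypothetical classes below, which are ours and exist only to illustrate that order, each stop at a different step.

```python
class NoSpecialMethods:
    """Neither __bool__ nor __len__, so objects are truthy by default."""


class SizedOnly:
    """No __bool__, so truthiness falls back to the length."""

    def __len__(self):
        return 0


class BoolAndLen:
    """Has both methods, so __bool__ wins and __len__ is not consulted."""

    def __len__(self):
        return 0

    def __bool__(self):
        return True


print(bool(NoSpecialMethods()))    # True
print(bool(SizedOnly()))           # False, because the length is zero
print(bool(BoolAndLen()))          # True, despite the zero length
```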

This explains why objects of our previous Person class were always truthy: In the absence of __bool__ or __len__ methods in a class, this is Python's default. So, if we want to override that default, we'll need at least one of those methods.

>>> class Person:
...     def __init__(self, name):
...         self._name = name
...     def __bool__(self):
...         return self._name == 'Boo'
...
>>> p1 = Person('Boo')
>>> bool(p1)
    True          # Boo is truthy
>>> p2 = Person('Alex')
>>> bool(p2)
    False         # Everyone else is falsy

This is an aspect of Python's data model that we'll see play out repeatedly. It's often the case that providing one operation (in this case, a length) will automatically supply a default behavior for others (in this case, truthiness), though we can do something other than that default when it's appropriate from the perspectives of correctness or performance. This makes the common situations easier to implement, while still allowing us to implement things more carefully when we need to.


Indexing

Some kinds of objects in Python can be indexed, which generally means that we can think of them as containing other objects, but that they give us a way to uniquely identify each of those objects so that we can ask for them individually and know definitively which one we'll get back.

The simplest example of indexing is asking a list for one of its elements given an index. Since lists are designed around an indexing scheme where the first element has the index 0, the second element has the index 1, and so on, it's clear which element we're asking for when we specify a particular index. Strings and ranges have that same design characteristic, so they can be indexed similarly.

>>> values = [1, 3, 5, 7, 9]
>>> values[4]
    9   # ^^^ 4 is the index, in this case, so we want the fifth element.
>>> range(1, 100, 4)[3]
    13            # ^^^ Here, we want the fourth value in the range.
>>> 'Boo is happy'[0]
    'B'         # ^^^ We're looking for a string containing the first character of 'Boo is happy'.

Dictionaries can also be indexed, albeit in a somewhat different way. A dictionary contains unique keys, with a value associated with each of them. So, when you index a dictionary, you're asking a different question: What is the value associated with this key? Still, the syntax is the same, and the underlying idea is, too: Give me the value that's uniquely identified by this index (where, for a dictionary, those indices are really its keys).

>>> d = {'A': 27, 'B': 17, 'C': 0}
>>> d['B']
    17

For some kinds of objects that allow indexing — though not all kinds — we can also assign into those indexes. Again, the syntax is the same for all such indexed objects, and the underlying idea is also the same, though the implementation details differ from one type of object to another.

>>> values[3] = 13
>>> values
    [1, 3, 5, 13, 9]             # One object in the list has been replaced.
>>> d['B'] = 1
>>> d
    {'A': 27, 'B': 1, 'C': 0}    # The value associated with a key has been replaced.
>>> range(1, 100, 4)[3] = 10
    Traceback (most recent call last):
      ...
    TypeError: 'range' object does not support item assignment
                                 # Ranges are immutable, so we can't assign into them.

Those objects that allow assignment into indexes usually also allow deletion of an index, using the del statement.

>>> del values[3]
>>> values
    [1, 3, 5, 9]
>>> del d['A']
>>> d
    {'B': 1, 'C': 0}

That many kinds of objects support the same syntax with potentially different implementation details suggests again that dunder methods are being called behind the scenes here.

Dunder methods for implementing indexing

When we want objects to support indexing, we add at least one dunder method to their class.

Note that the word "index" does not necessarily mean a non-negative integer, or even an integer at all. It's up to the __getitem__ method to decide what constitutes a valid index and what an index means. (This is what makes it possible to index lists with integers, while being able to index dictionaries with arbitrary hashable keys. Their __getitem__ methods are written differently.)

If we want to support assigning into an index and deletion of an index, there are additional dunder methods we can add alongside __getitem__.
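To make the three indexing operations concrete, here is a minimal sketch built around a hypothetical Scoreboard class of our own invention, which treats player names as case-insensitive indices. It pairs __getitem__ with __setitem__ (for assigning into an index) and __delitem__ (for deleting an index), and it shows that the class alone decides what counts as a valid index.

```python
class Scoreboard:
    """Hypothetical example: scores indexed by case-insensitive player name."""

    def __init__(self):
        self._scores = {}

    def __getitem__(self, name):
        # Called when we evaluate scoreboard[name].
        return self._scores[name.lower()]

    def __setitem__(self, name, score):
        # Called when we execute scoreboard[name] = score.
        self._scores[name.lower()] = score

    def __delitem__(self, name):
        # Called when we execute del scoreboard[name].
        del self._scores[name.lower()]


scoreboard = Scoreboard()
scoreboard['Boo'] = 17
print(scoreboard['BOO'])    # 17, since names are compared case-insensitively
del scoreboard['boo']
```

After the del statement, indexing with any spelling of 'Boo' raises a KeyError, because the underlying dictionary no longer has that key.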

Indexing is one feature that Python's built-in range provides that our MyRange class doesn't. Rectifying that would be a matter of adding a __getitem__ method to our MyRange class. (Since ranges are immutable, we wouldn't want to add __setitem__ or __delitem__.) Like our __len__ method, __getitem__ can calculate its answer in O(1) time using O(1) memory, so it would be best to do so.

class MyRange:
    ...

    def __getitem__(self, index):
        if type(index) is not int:
            raise TypeError(f'MyRange index must be int, but was {type(index).__name__}')
        elif index < 0 or index >= len(self):
            raise IndexError('MyRange index was out of range')

        return self._start + index * self._step

Since __getitem__ accepts a parameter other than self, but needs to perform calculations based on that parameter's value, some validation was necessary, so that non-integer indices and out-of-range indices would raise exceptions with descriptive error messages instead of returning invalid answers.

How the presence of indexing impacts other operations

When a class has both a __len__ method and a __getitem__ method that accepts non-negative indices, an interesting thing happens: Even without an __iter__ method, its objects become iterable automatically. This is because __len__ and __getitem__ combine together into something called the sequence protocol, which means that objects supporting that combination of methods are what we call sequences.

If we know that an object is a sequence, we know that it can be iterated without an __iter__ method, via calls to __getitem__ and __len__. To understand why, let's briefly experiment with a class that includes these methods.

>>> class ThreeSequence:
...     def __len__(self):
...         return 3
...     def __getitem__(self, index):
...         if 0 <= index < len(self):
...             return index * 3
...         else:
...             raise IndexError
...
>>> s = ThreeSequence()
>>> s[0]
    0        # If s can be indexed with integers, isn't 0 the first index?
>>> s[1]
    3        # In that case, isn't 1 the second index?
>>> s[2]
    6        # And isn't 2 the third?
>>> len(s)
    3        # Doesn't this tell us that s[3] would fail if we tried it?
>>> index = 0
>>> while index < len(s):
...     print(s[index])
...     index += 1
...          # Therefore, isn't this a reliable pattern for iterating such a sequence?
    0
    3
    6

So, as it turns out, when we iterate an object, there's a bit more to the story than we've seen.

In fact, iteration works in the presence of a __getitem__ method that accepts non-negative indexes, even in the absence of a __len__ method, in which case successively larger indexes are passed to __getitem__ until it raises an IndexError, at which point the iteration is considered to have ended.
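A small sketch can confirm that fallback. The hypothetical FirstSquares class below (our own example) has a __getitem__ method but neither __len__ nor __iter__, yet its objects can still be iterated.

```python
class FirstSquares:
    """Hypothetical example: indexable, but with no __len__ or __iter__."""

    def __getitem__(self, index):
        if 0 <= index < 4:
            return index * index

        # Raising IndexError is what tells Python the iteration has ended.
        raise IndexError('FirstSquares index was out of range')


# Python passes 0, 1, 2, ... to __getitem__ until an IndexError is raised.
print(list(FirstSquares()))    # [0, 1, 4, 9]
```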

However, the __len__ method is useful in concert with __getitem__ for another reason: It also provides the automatic ability to iterate an object in reverse, since something akin to the following while loop can be used instead.

>>> index = len(s)
>>> while index > 0:
...     index -= 1
...     print(s[index])
...          # This is a reliable pattern for iterating a sequence in reverse, if we know its length.
...          # Without knowing the length, Python couldn't efficiently know where to start.
    6
    3
    0
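We don't have to write that loop by hand. As a brief sketch, the built-in reversed function accepts an object that supports the sequence protocol, so it works on a class like ThreeSequence (repeated here so the example runs by itself) even though the class has no __reversed__ method.

```python
class ThreeSequence:
    def __len__(self):
        return 3

    def __getitem__(self, index):
        if 0 <= index < len(self):
            return index * 3
        else:
            raise IndexError


# reversed() uses __len__ to find the last index, then calls __getitem__
# with successively smaller indexes.
print(list(reversed(ThreeSequence())))    # [6, 3, 0]
```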

Additionally, objects implementing indexing (with or without a __len__ method) have another similar automatically implemented behavior.

>>> [i in s for i in range(8)]
    [True, False, False, True, False, False, True, False]
                   # The 'in' operator can be used to see if they contain a value,
                   # though this will be done using iteration, which will take linear time
                   # for each use of the 'in' operator.

When we write a class that implements a sequence, we'll quite often want to provide our own implementations of these three features — iteration, reverse iteration, and "contains" — especially if we can do so more performantly than the default. If so, we could add these three dunder methods to a class.

MyRange would benefit from an implementation of __contains__, for example, since its result could then be determined in constant time using some straightforward arithmetic, rather than iterating every value in a potentially large range. There's no reason it should cost more to evaluate 100000 in MyRange(1000000) than it does to evaluate 0 in MyRange(1), but only a custom __contains__ method will make that possible. On the other hand, the automatic implementations of forward and reverse iteration arising from MyRange's indexing feature are probably fine.
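Here is one way such a __contains__ method might look. This is a sketch under stated assumptions: the MyRange below is a cut-down stand-in for the one from the earlier lecture, its constructor signature is our guess, it supports only positive steps, and it treats only int values as possible members (a simplification compared to the built-in range).

```python
class MyRange:
    """A stand-in for the course's MyRange; positive steps only."""

    def __init__(self, start, stop=None, step=1):
        # Assumed constructor: MyRange(stop) or MyRange(start, stop[, step]).
        if stop is None:
            start, stop = 0, start

        if step <= 0:
            raise ValueError('this sketch supports only positive steps')

        self._start = start
        self._stop = stop
        self._step = step

    def __contains__(self, value):
        # Simplification: only integers are considered members.
        if type(value) is not int:
            return False

        # The value must be within bounds and a whole number of steps from start.
        return (self._start <= value < self._stop
                and (value - self._start) % self._step == 0)


print(100000 in MyRange(1000000))    # True
print(0 in MyRange(1))               # True
print(4 in MyRange(1, 10, 2))        # False, since only 1, 3, 5, 7, 9 are present
```

Both comparisons and the remainder calculation take the same amount of work whether the range holds one value or a million.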

So, it's worth knowing what features are provided automatically (and how they're provided automatically), because when these automatic implementations are performant enough for our needs, it means fewer features that we need to build, test, and maintain over time. Those positive decisions compound as programs and teams grow.


Slicing

Indexing allows us to obtain a single object within another, such as one element of a list or the value associated with a key in a dictionary. A variant of indexing that we've not yet considered is what Python calls slicing, which allows us to take a sequence of objects and obtain a subsequence, containing some of the objects while skipping others. The slice will usually be the same type as the object it came from; for example, a slice of a list will be a list, a slice of a string will be a string, and so on.

>>> values = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
>>> values[2:6]
    [5, 7, 9, 11]
>>> values[2:8:2]
    [5, 9, 13]
>>> values[:3]
    [1, 3, 5]
>>> values[7:]
    [15, 17, 19]
>>> 'Boo is happy today'[:3]
    'Boo'
>>> range(0, 20, 2)[3:7]
    range(6, 14, 2)
>>> 'Boo'[::]
    'Boo'

Syntactically, slicing looks a lot like indexing. If we start with an expression whose value can be sliced, we can follow that expression with brackets, in which we can write either two or three values separated by colons. Those three values, analogous to the values that describe a range, are named start, stop, and step. In the slice notation, all three values are optional.

This raises the question of what mechanism is used to implement slicing. We've seen previously that indexing is implemented via the __getitem__ dunder method, and we can see that slicing uses a similar bracket-surrounded notation, so is it safe for us to assume that __getitem__ does slicing, too? If so, how do we tell the difference between indexing and slicing? One way to find out is to experiment a bit.

>>> class Thing:
...     def __getitem__(self, index):
...         print(f'type(index) = {type(index)}')
...         print(f'index = {index}')
...         return None
...
>>> t = Thing()
>>> t[4]
    type(index) = <class 'int'>
    index = 4
>>> t[1:17:6]
    type(index) = <class 'slice'>
    index = slice(1, 17, 6)
>>> t[1:17]
    type(index) = <class 'slice'>
    index = slice(1, 17, None)
>>> t[:17]
    type(index) = <class 'slice'>
    index = slice(None, 17, None)
>>> t[::]
    type(index) = <class 'slice'>
    index = slice(None, None, None)

From this experimentation, we can deduce a few things:

So, if we want to implement slicing in a class, we'll need to add some functionality to our __getitem__ method to detect that its parameter is a slice and, if so, handle it specially. How do we interact with a slice object?

>>> s = slice(1, 17, 6)
>>> s.start, s.stop, s.step
    (1, 17, 6)        # We can access its start, stop, and step attributes.
>>> s.step = 100
    Traceback (most recent call last):
      ...
    AttributeError: readonly attribute
                      # Like ranges, slices are immutable.
>>> start, stop, step = s.indices(10)
>>> start, stop, step
    (1, 10, 6)        # We can ask it what the applicable start, stop, and step
                      # values would be for a given length.  In this case, we've asked this
                      # for a length of 10, which is why the applicable stop is less than
                      # the original one.
>>> defaulted = slice(None, None, None)
>>> [type(x) for x in (defaulted.start, defaulted.stop, defaulted.step)]
    [<class 'NoneType'>, <class 'NoneType'>, <class 'NoneType'>]
                      # When a slice is constructed with Nones, they aren't defaulted
                      # to anything; they remain Nones.
>>> dstart, dstop, dstep = defaulted.indices(10)
>>> dstart, dstop, dstep
    (0, 10, 1)        # Even if the start, stop, and step are all None, the
                      # indices method returns integer results.
>>> [index for index in range(*defaulted.indices(10))]
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
                      # If indices returns a tuple of three values, we can unpack it into
                      # three arguments (the start, stop, and step), pass them to range (which
                      # also understands the concept of a start, stop, and step), and we now have
                      # a range of the indices that make up our slice.

Now that we understand the building blocks available to us, we have an idea of how we might add slicing to our MyRange class, by reorganizing its __getitem__ method to allow the given index to either be an integer or a slice, then using the slice's indices method to help us figure out the appropriate result when given a slice.

class MyRange:
    ...

    def __getitem__(self, index):
        if type(index) is int:
            if 0 <= index < len(self):
                return self._start + index * self._step
            else:
                raise IndexError('MyRange index was out of range')
        elif type(index) is slice:
            start, stop, step = index.indices(len(self))

            start_value = self._start + start * self._step
            stop_value = min(self._start + stop * self._step, self._stop)
            step_value = step * self._step

            return MyRange(start_value, stop_value, step_value)
        else:
            raise TypeError(f'MyRange index must be int or slice, but was {type(index).__name__}')

It's also possible to assign to a slice of an object, as well as delete a slice. Implementing support for those operations requires similar modifications to __setitem__ and __delitem__, whose index parameter will be a slice object in these situations.
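To see those slice objects arrive, we can repeat the kind of experiment we ran with __getitem__. The hypothetical SliceRecorder class below is ours, written only for illustration; it stores nothing useful and simply records what its __setitem__ and __delitem__ methods are given.

```python
class SliceRecorder:
    """Hypothetical example that records the index objects it receives."""

    def __init__(self):
        self.received = []

    def __setitem__(self, index, value):
        # For recorder[1:4] = ..., index is the object slice(1, 4, None).
        self.received.append(('set', index, value))

    def __delitem__(self, index):
        # For del recorder[::2], index is the object slice(None, None, 2).
        self.received.append(('del', index))


recorder = SliceRecorder()
recorder[1:4] = ['a', 'b']
del recorder[::2]
print(recorder.received)
# [('set', slice(1, 4, None), ['a', 'b']), ('del', slice(None, None, 2))]
```

A real implementation would check whether index is a slice and then use its indices method, much as our __getitem__ did.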


Hashing

Python draws a distinction between the objects that are hashable and those that aren't. Conceptually, hashable objects have two qualities that others don't.

If we want to hash an object, we can call the built-in Python function hash and pass it the object as an argument. The algorithm used to hash an object is not particularly important to us, but you'll notice how differently objects can hash even when their values are fairly similar to each other; this turns out not to be an accident, for reasons you'll learn more about in ICS 46.

>>> hash(3)
    3
>>> hash('Boo')
    -6365711242479792522
>>> hash('Boo!')
    -6359222305862117936
>>> hash((1, 2))
    -3550055125485641917
>>> hash((1, 2, 3))
    529344067295497451

If there are objects that are unhashable, we wouldn't expect to be able to pass them to the hash function. Mutable objects generally won't be hashable, so we wouldn't expect to be able to hash a list. Let's try it.

>>> hash([1, 2, 3])
    Traceback (most recent call last):
      ...
    TypeError: unhashable type: 'list'

How does the hash function know whether an object is hashable? As you likely expect, there is a dunder method called __hash__ that calculates an object's hash. Hashable objects are the ones that have a __hash__ method; unhashable objects are the ones that don't. The job of the __hash__ method is to combine the information in the object into a single integer, taking all of that information into account, so that objects that are different in some way will be likely to hash differently. A simple but effective way to do that is to create a tuple containing all of the object's attributes, then pass that tuple to the built-in hash function. This leads to a simple implementation of __hash__ for our MyRange class, whose objects are immutable, so we might reasonably expect to be able to hash them, as well.

class MyRange:
    ...

    def __hash__(self):
        return hash((self._start, self._stop, self._step))
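
To see the tuple-based technique in isolation, here is a minimal sketch that uses a hypothetical Point class (the class and its attribute names are assumptions made for this example, not part of MyRange). Two separately built objects with the same attribute values hash identically, because each hash is computed from an equal tuple.

```python
class Point:
    """A hypothetical class, treated as immutable, for illustration only."""

    def __init__(self, x, y):
        self._x = x
        self._y = y

    def __hash__(self):
        # Gather every attribute into a tuple, then let the built-in
        # hash function combine them into a single integer.
        return hash((self._x, self._y))


# Two separate objects built from the same values hash the same way.
same_hash = hash(Point(3, 4)) == hash(Point(3, 4))

# Each one's hash is exactly the hash of the corresponding tuple.
matches_tuple = hash(Point(3, 4)) == hash((3, 4))
```

Both same_hash and matches_tuple are True here. Note that this says nothing yet about whether the two Point objects are considered equivalent.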

Remember, though, that the reason we want to be able to hash objects is so we can store them in a hash table, which is to say that we want to be able to arrange them in a way that we can use their hashes to find them again easily. But hashes are not guaranteed to be unique; it's possible for two different objects to hash identically. So, just because we find an object that has a particular hash, we can't know whether it's the object we're looking for; we just know that it's an object that ended up in the same place. Because of that, when objects are hashable, there's one other important thing we'll need to be able to do with them: compare them to other objects to see if they're equivalent. To do that, we'll need to dig a little further into the Python data model.
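
The toy hash table below is a deliberately simplified sketch of that idea, not a description of how Python's sets and dictionaries are actually built. The function names and the choice of eight buckets are assumptions made for illustration. It shows two different values landing in the same bucket, which is why the lookup has to finish with an equality comparison.

```python
def make_table(bucket_count):
    # One (initially empty) list per bucket.
    return [[] for _ in range(bucket_count)]


def add(table, value):
    # The hash chooses the bucket.
    bucket = table[hash(value) % len(table)]
    if value not in bucket:
        bucket.append(value)


def contains(table, value):
    # The hash only tells us where to look; == tells us whether
    # what we found there is what we were looking for.
    bucket = table[hash(value) % len(table)]
    return any(candidate == value for candidate in bucket)


table = make_table(8)
add(table, 1)
add(table, 9)                           # hash(9) % 8 is 1, the same bucket as 1

shared_bucket = table[1]
found_nine = contains(table, 9)
found_seventeen = contains(table, 17)   # Same bucket again, but nothing equal to 17
```

After this runs, shared_bucket holds both 1 and 9, found_nine is True, and found_seventeen is False, even though 17 would have been stored in that same bucket.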


Comparison operators

Python gives us the ability to compare objects in various ways, and its data model allows us to control how most of those comparisons are implemented when they involve objects of our classes. Before we can implement these kinds of comparisons, we ought to be sure we understand the kinds of comparisons that can be done, because there are some subtleties that we need to take into account. What should it mean for two objects to be "equal"? What should it mean for one object to be "less than" another?

Identity and equivalence

First, let's be sure we understand Python's idea of equality. When we compare two objects and ask "Are these the same?", what are we actually asking? Is there always one question we're trying to answer, or are there different ones?

Like many programming languages, Python's design distinguishes between two ideas of equality: identity and equivalence. Because we might be interested in knowing either of these things, a separate syntax exists for each of them.

>>> a = [1, 2, 3]
>>> b = [1, 2, 3]
>>> id(a), id(b)
    (1994268245696, 1994268381632)
                       # The id function returns an object's identity, which is
                       # unique to each object, even objects that have identical meaning.
>>> a is b
    False              # The is operator returns True only when two objects have
                       # the same identity, regardless of whether they have identical meaning.
>>> a == b
    True               # The == operator returns True when two objects have
                       # an equivalent meaning, even if they have different identities.
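
One place where the distinction becomes practical is aliasing. In the short sketch below (the variable names are chosen only for illustration), c is a second name for the same list as a, so a change made through c is visible through a, while the separate but equivalent list b is unaffected.

```python
a = [1, 2, 3]
b = [1, 2, 3]
c = a                   # c is another name for the same list object as a

c.append(4)             # Modify that one object through the name c

alias_is_same = c is a               # Same identity
a_sees_change = a == [1, 2, 3, 4]    # The change is visible through a
b_unchanged = b == [1, 2, 3]         # b is a different object, so it's untouched
still_equivalent = a == b            # a and b no longer have the same meaning
```

The first three results are True and the last is False.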

Neither of these operators is definitively better than the other; they're simply meant to solve different problems, so the key is knowing what problem you have, which will allow you to make the right choice for your needs.

Python provides no way to override the built-in meaning of the is operator. The only circumstance in which a is b is True is when both a and b have the same identity. That's all there is to it, and all there can be: Either they're the same object or they aren't.

Equivalence, on the other hand, is something that would naturally need to be implemented differently in different classes; after all, what it means for two integers to be equivalent is very different from what it means for two lists of strings to be equivalent. Consequently, the Python data model provides a mechanism for us to specify what it means for two objects of our classes to be equivalent. Before we get to that, though, let's finish our conversation about comparisons in Python.

Relational comparisons

A common feature of programming languages allows us to compare two objects relationally, which means that we want to compare them on the basis of their natural ordering, so we can determine which (if any) is smaller than the other. Integers, for example, have a natural ordering that most of us learned when we were quite young: 2 is greater than 1, 5 is greater than 1, 6 is less than 17, and so on. So, it's unsurprising to most novice Python programmers to discover that there are operators that perform those kinds of comparisons.

>>> 2 > 1
    True
>>> 17 < 6
    False

For some types, their natural ordering is obvious enough that it hardly needs to be explained to us. For other types, there could potentially be more than one reasonable way to order them. For example, what rule causes the following behavior?

>>> [1, 2] < [1, 3]
    True
>>> [2, 3] < [1, 2]
    False
>>> [1, 2] < [1, 2, 3]
    True

The answer is that this is a well-known technique called lexicographical ordering, which is a fancy term for a simple idea:

(Note that this is the same algorithm we use to sort English words into alphabetical order, comparing one letter at a time until we find a difference, or until one word turns out to be a prefix of the other.)
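
One way to express that idea in code is sketched below. The function name is hypothetical, and this is only meant to mirror the behavior we observed for lists, not to show how Python implements it internally: compare element by element until a difference is found, and, if none is found, treat the shorter sequence as the smaller one.

```python
def lexicographically_less(left, right):
    for left_item, right_item in zip(left, right):
        if left_item != right_item:
            # The first position where the sequences differ decides the result.
            return left_item < right_item

    # No difference in the positions they share, so one is a prefix of the
    # other (or they're the same); the shorter one, if any, comes first.
    return len(left) < len(right)


checks = [
    ([1, 2], [1, 3]),
    ([2, 3], [1, 2]),
    ([1, 2], [1, 2, 3]),
    ([1, 2], [1, 2]),
]

results = [lexicographically_less(left, right) for left, right in checks]
builtin_results = [left < right for left, right in checks]
```

For these four pairs, results and builtin_results agree with each other.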

For still other types — most types in a large program fit into this category — there's no natural way to order them at all; they simply don't have a notion of "less than" or "greater than" associated with them, so a sensible design would render such a comparison impossible altogether. (A good software design is as much about disallowing unreasonable things as it is about allowing reasonable ones.)

Since different types of objects will need to handle relational comparisons differently, Python's data model provides hooks for us to control how they behave, too. Now that we understand the problem we're solving in enough detail, all that remains are the implementation details. Bring on the dunder methods!

Implementing equality comparisons

If we want to provide a custom implementation of equivalence for the objects of a class, we can do so by adding an __eq__ method to its class.

(NotImplemented — like True, False, and None — is a constant value in Python, whose type is NotImplementedType.)

Notably, if we don't write an __eq__ method, our classes still provide an implementation of equality automatically, but it's based only on identity (i.e., two objects are equal if and only if they have the same identity). So, if we want anything other than that, we'll need to implement an __eq__ method.

In our MyRange class, we might implement it by checking that the other object is also a MyRange, and that their _start, _stop, and _step attributes are equivalent.

class MyRange:
    ...

    def __eq__(self, other):
        if type(other) is MyRange:
            return self._start == other._start \
                   and self._stop == other._stop \
                   and self._step == other._step
        else:
            return NotImplemented

Let's try our implementation out.

>>> r1 = MyRange(0, 10)
>>> r2 = MyRange(0, 10)
>>> r3 = MyRange(0, 10, 2)
                    # r1 and r2 are equivalent but have different identities.
                    # r3 is not equivalent to either r1 or r2.
>>> r1 is r2
    False           # As expected, r1 and r2 have different identities.
>>> r1 == r2
    True            # As expected, r1 and r2 are equivalent, by our new definition.
>>> r1 == r3
    False           # r1 and r3 are not.
>>> r1 != r2, r1 != r3
    (False, True)   # Interestingly, the != operator seems to be working, as well.
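
It's also worth seeing what returning NotImplemented accomplishes when the other object has an unrelated type. The sketch below uses a hypothetical Celsius class (an assumption made so the example is self-contained) whose __eq__ follows the same pattern as MyRange's. Calling __eq__ directly shows the NotImplemented value, while the == operator turns that into an ordinary False after finding that neither operand knows how to do the comparison.

```python
class Celsius:
    """A hypothetical class, for illustration only."""

    def __init__(self, degrees):
        self._degrees = degrees

    def __eq__(self, other):
        if type(other) is Celsius:
            return self._degrees == other._degrees
        else:
            # We don't know how to compare ourselves to this type.
            return NotImplemented


direct_result = Celsius(20).__eq__(20)    # The raw answer from our method
operator_result = Celsius(20) == 20       # What the == operator reports
not_equal_result = Celsius(20) != 20      # What the != operator reports
```

Here, direct_result is NotImplemented, operator_result is False, and not_equal_result is True.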

The last of these expressions points us to something interesting: If we implement equality, we get an implementation of inequality automatically. Thinking about it conceptually, this seems sensible enough; if we've specified the conditions under which two objects are equivalent, then two objects are inequivalent under the opposite conditions. So, Python could reasonably implement r1 != r2 as not r1.__eq__(r2).

Still, as usual, Python provides a way for us to specify inequality, in case we could do it ourselves more performantly.

In practice, it will rarely if ever be the case that we'd want it to return a different result from not r1.__eq__(r2), but we might sometimes be able to find a more efficient way to calculate that result than checking for equivalence and then negating it.

Strangely, adding an __ne__ method to a class without an __eq__ method does not automatically provide a meaning of equality. In other words, even though we know that r1.__eq__(r2) is the same as not r1.__ne__(r2), Python does not perform this conversion automatically. So, generally, we can think of __eq__ as necessary when customizing equivalence and __ne__ only as a potential optimization.
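
A quick experiment makes that asymmetry visible. The OnlyNe class below is hypothetical and deliberately incomplete: it defines __ne__ but not __eq__, so that we can watch the two operators disagree.

```python
class OnlyNe:
    """A hypothetical class that customizes != but not ==, for illustration only."""

    def __init__(self, value):
        self.value = value

    def __ne__(self, other):
        if type(other) is OnlyNe:
            return self.value != other.value
        else:
            return NotImplemented


x = OnlyNe(5)
y = OnlyNe(5)

ne_result = x != y     # Uses our __ne__, which compares the values
eq_result = x == y     # Still uses the default, which compares identities
eq_self = x == x       # An object always has the same identity as itself
```

The result is that ne_result and eq_result are both False: x and y are reported as neither unequal nor equal, which is exactly the kind of inconsistency that writing __eq__ (and leaving __ne__ to be derived from it) avoids.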

Clarifying the relationship between equality and hashing

There's one other thing worth noting about equality: For reasons we discussed in our conversation about hashing, classes that provide a __hash__ method must also provide an __eq__ method, because there's a vitally important relationship that must be maintained between the meanings of equivalence and hashing: if two objects are equivalent, they must have the same hash.

That implication only applies in one direction: Two objects with the same hash will not necessarily be equivalent, even though we'd prefer that they were. The reason we can't make this happen is a practical one: Part of what we do when we hash an object is simplify it to a value that's lower-fidelity; some information will almost surely be lost, which means inequivalent objects will unavoidably have the same hash sometimes. While we'd like to make that as unlikely as possible, we can't avoid it in general.

Still, because of that implication, it's quite often the case that taking control over hashing or equality implies that we should take control of the other, as well, because their meanings are necessarily intertwined. Fortunately, there are some additional mechanics that help us to avoid writing classes that accidentally fail to meet these constraints, particularly because objects support both hashability and identity-based equality by default.

>>> class Thing:
...     pass
...
>>> hash(Thing())
    107648736801    # Objects are hashable by default
>>> Thing() == Thing()
    False           # Objects support equality (albeit using only their
                    # identities) by default.
>>> t1 = Thing()
>>> t2 = Thing()
>>> t1 == t2
    False           # Since t1 and t2 refer to separate objects (with separate
                    # identities), they are not considered equivalent.
>>> hash(t1) == hash(t2)
    False           # Inequivalent objects can have different hashes, which
                    # makes this answer allowable -- though True would be
                    # allowable, too.
>>> t1 == t1
    True
>>> hash(t1) == hash(t1)
    True            # Equivalent objects must have the same hash, which
                    # is correct by default.

The provided defaults meet the basic requirement, but this raises the question of what happens if we implement our own custom behavior for one of these operations without implementing the other. To determine the answer to that question, we'll need to think about both operations separately.

What happens if we implement hashing without equality?

>>> class HashableThing:
...     def __hash__(self):
...         return 999
...
>>> hash(HashableThing())
    999
>>> h1 = HashableThing()
>>> h2 = HashableThing()
>>> h1 == h2, hash(h1) == hash(h2)
    (False, True)   # Inequivalent objects are allowed to have the same hash.
>>> h1 == h1, hash(h1) == hash(h1)
    (True, True)    # Equivalent objects must have the same hash.

Essentially, nothing can go wrong here, as long as our __hash__ method is written in a way that uses only what's stored within the object, since the default behavior of equality is to use identity (i.e., no two different objects are equal). What we're trying to avoid is equivalent objects having different hashes; that can't happen when we implement __hash__, as long as we implement it properly.

On the other hand, what happens if we implement equality without hashing?

>>> class UnhashableThing:
...     def __init__(self, value):
...         self.value = value
...     def __eq__(self, other):
...         return isinstance(other, UnhashableThing) and self.value == other.value
...
>>> UnhashableThing(13) == UnhashableThing(13)
    True     # Objects storing the same values are equivalent.
>>> UnhashableThing(11) == UnhashableThing(7)
    False    # Objects storing different values are inequivalent.
             # But without having implemented custom hashing behavior,
             # how can we be sure their hashes will be different?
>>> hash(UnhashableThing(11)) == hash(UnhashableThing(7))
    Traceback (most recent call last):
      ...
    TypeError: unhashable type: 'UnhashableThing'
             # Oh!  UnhashableThings aren't hashable at all!
>>> UnhashableThing.__hash__ is None
    True     # It's because UnhashableThing's __hash__ has been set to None.

As a safety mechanism, when we write an __eq__ method in a class without a __hash__ method being written in that same class, Python automatically sets the value of __hash__ in the class dictionary to None, specifically to avoid the problem we otherwise would have created: Specifying a way for two objects to be equivalent without having ensured that their hashes would be the same.

Of course, this mechanism isn't fool-proof, since nothing prevents us from writing __eq__ and __hash__ methods that don't fit together properly, but there is at least a sensible default at work here, which prevents us from accidentally introducing a problem we hadn't considered thoroughly.
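
One straightforward way to keep the two methods fitting together is to have __hash__ use exactly the attributes that __eq__ compares. The Card class below is a hypothetical example of that approach (its name and attributes are assumptions made for illustration). Because equivalent cards always hash identically, they behave sensibly when stored in a set.

```python
class Card:
    """A hypothetical class, treated as immutable, for illustration only."""

    def __init__(self, rank, suit):
        self._rank = rank
        self._suit = suit

    def __eq__(self, other):
        if type(other) is Card:
            return (self._rank, self._suit) == (other._rank, other._suit)
        else:
            return NotImplemented

    def __hash__(self):
        # Hash exactly the attributes that __eq__ compares, so that
        # equivalent cards are guaranteed to have the same hash.
        return hash((self._rank, self._suit))


hand = {Card('A', 'spades'), Card('A', 'spades'), Card('7', 'hearts')}

distinct_count = len(hand)               # The two equivalent aces collapse into one
found = Card('7', 'hearts') in hand      # A newly built, equivalent card is found
```

Here, distinct_count is 2 and found is True.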

Implementing relational comparisons

Relational comparisons can also be customized in Python, and dunder methods will again be our tool of choice when performing that customization. But it's wise for us to begin by understanding the default; what happens if we don't customize relational comparisons in a class? If we don't specify the details, under what circumstances are objects "less than" other objects?

>>> class Thing:
...     pass
...
>>> t1 = Thing()
>>> t2 = Thing()
>>> t1 < t2
    Traceback (most recent call last):
      ...
    TypeError: '<' not supported between instances of 'Thing' and 'Thing'

Unlike equality comparisons, for which there is a default behavior if you don't specify one, relational comparisons have no default. If a class doesn't describe how its objects are to be compared relationally, they can't be. So, if we want relational comparisons for objects of our classes, we'll need to implement them ourselves. There are four kinds of relational comparisons (<, <=, >, and >=), so we can do this using (up to) four dunder methods that provide their implementations.

As with equality, you don't need to implement all four methods to be able to perform all four comparisons, since there are known relationships between their results.

Consequently, an implementation of either __lt__ or __gt__ can be used for both the < and > operators, and an implementation of either __le__ or __ge__ can be used for both the <= and >= operators.

>>> class Thing:
...     def __init__(self, value):
...         self.value = value
...     def __lt__(self, other):
...         print(f'__lt__: self.value = {self.value}, other.value = {other.value}')
...         return self.value < other.value
...
>>> t1 = Thing(3)
>>> t2 = Thing(9)
>>> t1 < t2
    __lt__: self.value = 3, other.value = 9
    True              # Here, __lt__ was called on t1, with t2 being other.
>>> t1 > t2
    __lt__: self.value = 9, other.value = 3
    False             # Here, __lt__ was called on t2, with t1 being other.

Even though the presence of both __eq__ and __lt__ could theoretically be enough to implement a <= operator, Python doesn't implement that conversion for us. So, if we want relational comparisons, we probably want at least these three dunder methods: __eq__, __lt__, and __le__, with the other three (__ne__, __gt__, and __ge__) providing a mechanism for optimization if we need it.
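
To make that recommendation concrete, here is a sketch of a hypothetical Version class (the name and its major/minor attributes are assumptions made for this example) that implements only __eq__, __lt__, and __le__. All six comparison operators work on it, with >, >=, and != being derived from the three methods we wrote.

```python
class Version:
    """A hypothetical class ordered by (major, minor), for illustration only."""

    def __init__(self, major, minor):
        self._key = (major, minor)

    def __eq__(self, other):
        if type(other) is Version:
            return self._key == other._key
        return NotImplemented

    def __lt__(self, other):
        if type(other) is Version:
            return self._key < other._key
        return NotImplemented

    def __le__(self, other):
        if type(other) is Version:
            return self._key <= other._key
        return NotImplemented


old = Version(3, 9)
new = Version(3, 12)

# old > new is answered by new.__lt__(old); old >= new by new.__le__(old).
results = (old < new, old <= new, old > new, old >= new, old == new, old != new)

# Sorting also works, since it relies on the < operator.
sorted_first = sorted([new, old])[0]
```

Here, results is (True, True, False, False, False, True), and sorted_first is the same object as old.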

Mixed-type comparisons

Comparisons can be done between objects of different classes, which may then have different implementations of these same dunder methods. If we evaluate x == y, but x and y have different types, then whose __eq__ method is called? One way to find out is to experiment.

>>> class A:
...     def __eq__(self, other):
...         print('A.__eq__')
...         return True
...
>>> class B:
...     def __eq__(self, other):
...         print('B.__eq__')
...         return True
...
>>> a1 = A()
>>> b1 = B()
>>> a1 == b1
    A.__eq__
    True
>>> b1 == a1
    B.__eq__
    True

From our experimentation, it appears that the rule is simple: When we write x == y, x.__eq__(y) is called. But what happens if x's class has no __eq__ method, but y's does?

>>> class C:
...     pass
...
>>> c1 = C()
>>> c1 == b1
    B.__eq__
    True

In that case, Python is calling b1.__eq__(c1) instead, so that we can get a better answer than we might get if we relied on C's default equality (which is based only on identity).

It turns out that there's a little more to the underlying rule than this — we can flesh this out a bit further when we talk about inheritance and subclassing — but this is a good enough understanding for us to proceed with for now: The __eq__ method of the left-hand object is used, unless only the right-hand object has an __eq__ method.
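
One further detail can be observed with a small experiment: the right-hand object's __eq__ is also consulted when the left-hand object's __eq__ exists but returns NotImplemented. The Meters and Kilometers classes below are hypothetical, and the calls list is only there so we can see the order in which the methods run.

```python
calls = []    # Records which __eq__ methods were called, in order


class Meters:
    def __init__(self, amount):
        self.amount = amount

    def __eq__(self, other):
        calls.append('Meters.__eq__')
        if type(other) is Meters:
            return self.amount == other.amount
        # Meters knows nothing about other types.
        return NotImplemented


class Kilometers:
    def __init__(self, amount):
        self.amount = amount

    def __eq__(self, other):
        calls.append('Kilometers.__eq__')
        if type(other) is Kilometers:
            return self.amount == other.amount
        elif type(other) is Meters:
            # Kilometers does know how to compare itself to Meters.
            return self.amount * 1000 == other.amount
        return NotImplemented


result = Meters(2000) == Kilometers(2)
```

Afterward, result is True and calls is ['Meters.__eq__', 'Kilometers.__eq__']: the left-hand object was asked first, and the right-hand object was asked only because the left-hand one declined.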


Arithmetic operators

Another set of operators whose meaning we might like to redefine for our classes is the set of arithmetic operators. Consider the kinds of operators we can apply to integers in Python; below are many (but not all) of them.

>>> i = 40
>>> j = 16
>>> +i
    40             # The unary plus operator returns its operand unchanged.
>>> -i
    -40            # The unary minus operator returns the negation of its operand.
>>> (i + j, i - j, i * j)
    (56, 24, 640)  # We can add, subtract, and multiply two integers.
>>> (i / j, i // j, i % j)
    (2.5, 2, 8)    # There are two kinds of division, along with modulo.
>>> i ** 2
    1600           # We can exponentiate integers.

Once again, dunder methods form the basis of how these operators are actually implemented, so if we want to define the meaning of these operators when used on objects of our own classes, we can do so by implementing the appropriate dunder methods. They follow a few recurring patterns, so we'll stick with a couple of examples; you can read about the rest of them in the Python Data Model documentation when you need them.

The unary operators + and - have a single operand, so that suggests that they would best be implemented by dunder methods that accept a self parameter and no others.

>>> class Thing:
...     def __init__(self, value):
...         self.value = value
...     def __repr__(self):
...         return f'Thing({self.value})'
...     def __neg__(self):
...         return Thing(-self.value)
...
>>> t1 = Thing(17)
>>> -t1
    Thing(-17)     # Our result is negated.
>>> t1
    Thing(17)      # t1 is unchanged, as it should be.

That last expression points out something important: We don't expect the arithmetic operators to modify their operands, so we need to be sure to return new values instead of modifying the existing ones. This aligns the behavior of our classes with the types that are built into Python.

The various binary operators have two operands, so they might instead be implemented as dunder methods taking two parameters: a self and one other. We saw that pattern already when we implemented the comparison operators, and it recurs here.

Let's try implementing a simple __add__ method as a first example.

>>> class Thing:
...     def __init__(self, value):
...         self.value = value
...     def __repr__(self):
...         return f'Thing({self.value})'
...     def __add__(self, other):
...         return Thing(self.value + other.value)
...
>>> t1 = Thing(3)
>>> t2 = Thing(9)
>>> t1 + t2
    Thing(12)

In a complete implementation, any of these methods would best return NotImplemented when the operation is not supported on the types of the arguments. For example, if you want it to be possible to add objects of your class Money to other objects of the same class, but not others, then you would return NotImplemented whenever the other parameter has a type other than Money. This rule turns out not only to be hygienic, but it also forms the basis of additional automation; if Python knows that an operator isn't implemented, it might be able to try an equivalent alternative instead.
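
A minimal sketch of that Money scenario might look like the following. Storing the amount as a whole number of cents is an assumption made for this example. Adding two Money objects works, while adding a Money and an int leads Python to raise a TypeError, because our method reported that it doesn't support the operation.

```python
class Money:
    """A hypothetical class storing an amount in cents, for illustration only."""

    def __init__(self, cents):
        self.cents = cents

    def __repr__(self):
        return f'Money({self.cents})'

    def __add__(self, other):
        if type(other) is Money:
            return Money(self.cents + other.cents)
        else:
            # Tell Python we can't handle this; it will look for an
            # alternative and, finding none, raise a TypeError for us.
            return NotImplemented


total = Money(500) + Money(250)

try:
    Money(500) + 3
except TypeError as e:
    error_message = str(e)
```

Afterward, total is Money(750), and error_message describes the operand types as unsupported for +. Notice that we never raised the TypeError ourselves.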

Reflected arithmetic operators

When we redefine arithmetic operators, the usual rule is that the operation x ? y turns into a call to the equivalent dunder method, with x being the first argument (i.e., passed into self) and y being the second (i.e., passed into other). So, for example, when evaluating x + y, Python attempts to call x.__add__(y), the result of which becomes the result of the addition.

However, when x.__add__(y) is not supported, Python makes one more attempt to add x and y, by calling into a reflected version of the operator instead. In the case of addition, that reflected operation is implemented by the dunder method __radd__, so if x.__add__(y) is unsupported, Python attempts to call y.__radd__(x) instead. The mechanism it uses to determine whether x.__add__(y) is supported is simple: If the __add__ method doesn't exist, or if it returns NotImplemented, it's unsupported.

There's a reason why the reflected arithmetic operators need to be implemented separately: Not all of these operators are commutative. We would expect x + y and y + x to return the same result — in many cases, but certainly not all! — but we would not expect the same from x - y and y - x. So, if Python is going to reverse the order of the arguments, we'll also need to know that we need to perform the reverse of the usual operation, so a different dunder method is called.
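
Subtraction shows why the two methods can't simply share one implementation. In the sketch below, Num is a hypothetical class made up for this example; its __rsub__ method is called with self being the right-hand operand, so it has to compute other - self.value rather than self.value - other.

```python
class Num:
    """A hypothetical wrapper around an integer, for illustration only."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f'Num({self.value})'

    def __sub__(self, other):
        # Handles self - other, where self is the left-hand operand.
        if type(other) is Num:
            return Num(self.value - other.value)
        elif type(other) is int:
            return Num(self.value - other)
        return NotImplemented

    def __rsub__(self, other):
        # Handles other - self, where self is the right-hand operand,
        # so the order of the subtraction is reversed.
        if type(other) is int:
            return Num(other - self.value)
        return NotImplemented


left_result = Num(10) - 3     # Calls Num(10).__sub__(3)
right_result = 3 - Num(10)    # int can't do this, so Num(10).__rsub__(3) is called
```

Here, left_result is Num(7) and right_result is Num(-7).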

So, why is it so important for us to have this ability? Let's suppose that we have our own class called Thing, and that we want to be able to add Thing objects to integers. We might start by writing this.

>>> class Thing:
...     def __init__(self, value):
...         self.value = value
...     def __repr__(self):
...         return f'Thing({self.value})'
...     def __add__(self, other):
...         if type(other) is Thing:
...             return Thing(self.value + other.value)
...         elif type(other) is int:
...             return Thing(self.value + other)
...         else:
...             return NotImplemented
...
>>> t = Thing(17)
>>> t + 1
    Thing(18)       # So far, so good!
>>> 1 + t
    Traceback (most recent call last):
      ...
    TypeError: unsupported operand type(s) for +: 'int' and 'Thing'
                    # This is why we need __radd__!

At this point, there are two ways we might think about solving the problem.

And, of course, the first option isn't actually a choice we have, because we can't (and shouldn't) modify Python's built-in int class. So, ultimately, our only option is reflected addition.

>>> class Thing:
...     def __init__(self, value):
...         self.value = value
...     def __repr__(self):
...         return f'Thing({self.value})'
...     def _add_values(self, other):
...         if type(other) is Thing:
...             return self.value + other.value
...         elif type(other) is int:
...             return self.value + other
...         else:
...             return None
...     def __add__(self, other):
...         new_value = self._add_values(other)
...         return Thing(new_value) if new_value is not None else NotImplemented
...     def __radd__(self, other):
...         new_value = self._add_values(other)
...         return Thing(new_value) if new_value is not None else NotImplemented
...
>>> t = Thing(17)
>>> t + 1
    Thing(18)
>>> 1 + t
    Thing(18)       # Problem solved!

Augmented arithmetic operators

There is one more variation on arithmetic operators that we need to consider, as well. Python provides augmented arithmetic operators, which have the job of modifying an existing object rather than building a new one. For immutable objects, this is a distinction without a difference, but when objects are mutable, there can be a substantial performance benefit in implementing the augmented arithmetic operators differently from the others. Lists provide a good example of this.

>>> values = [1, 2, 3, 4, 5]
>>> values + [6, 7, 8]
    [1, 2, 3, 4, 5, 6, 7, 8]
>>> values
    [1, 2, 3, 4, 5]              # The + operator left values intact.
>>> values += [6, 7, 8]
>>> values
    [1, 2, 3, 4, 5, 6, 7, 8]     # But += changed it.

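You've probably been previously acquainted with the conceptual difference between values + [6, 7, 8] and values += [6, 7, 8], but a little asymptotic analysis tells us that there's a significant performance difference here, as well. Evaluating values + [6, 7, 8] builds a new list and copies every element of both operands into it, so its cost grows with the combined length of the two lists. Evaluating values += [6, 7, 8] appends to the existing list, so its (amortized) cost grows only with the length of the list on the right-hand side.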

So, there can certainly be a performance benefit in implementing the augmented arithmetic operators separately from the others.
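The difference is easiest to observe through object identity rather than timing. The following is a small sketch (the name alias is ours, chosen for illustration) showing that + produces a separate list while += extends the one that already exists.

```python
values = [1, 2, 3, 4, 5]
alias = values                   # a second name for the same list object

combined = values + [6, 7, 8]    # builds a brand-new list
assert combined is not values
assert values == [1, 2, 3, 4, 5]         # values was left intact

values += [6, 7, 8]              # extends the existing list in place
assert values is alias                   # still the very same object
assert alias == [1, 2, 3, 4, 5, 6, 7, 8] # so the change shows through alias
```

Because += modified the list directly, any other name referring to that list sees the change, which is worth keeping in mind when deciding whether a type should behave this way.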

For immutable objects, we won't want to implement them, because the default behavior (make a new object with the new value) is exactly what we'd want. For example, our most recent Thing class supports += already: since it has no __iadd__ method, Python automatically falls back to __add__, so t += 3 behaves like t = t + 3.

>>> t = Thing(17)
>>> id(t)
    2008047601600
>>> t += 3
>>> t
    Thing(20)
>>> id(t)
    2008047602800    # The id has changed here, because t + 3 built a new Thing object.
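One consequence of this fallback is that other names bound to the original object are unaffected. Here is a minimal sketch, using a hypothetical Amount class that defines __add__ but deliberately omits __iadd__.

```python
class Amount:
    """Hypothetical class with __add__ but no __iadd__."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if type(other) is int:
            return Amount(self.value + other)
        return NotImplemented


a = Amount(17)
alias = a            # a second name for the same object

a += 3               # no __iadd__, so this behaves like a = a + 3

assert a.value == 20
assert a is not alias        # a now names a newly built object
assert alias.value == 17     # the original object was never modified
```

This is the behavior we'd expect from an immutable type: the old object keeps its value, and only the name on the left-hand side is rebound.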

If Thing objects are intended to be mutable (we never decided about that, since we're just noodling), then we might instead want to implement augmented addition, so they'd be modified directly. (This is especially true if constructing Thing objects were more expensive than filling in a single integer attribute; it's for types like lists that this distinction matters the most.) Augmented arithmetic operators, like the others, are implemented using dunder methods.

These methods, such as __iadd__ for += and __isub__ for -=, return the updated result, which will usually just be self (albeit with modifications), or they'll return NotImplemented when the types of the operands are not supported.

Implementing augmented addition in our Thing class might look like this, then.

class Thing:
    ...

    def __iadd__(self, other):
        new_value = self._add_values(other)

        if new_value is not None:
            self.value = new_value
            return self
        else:
            return NotImplemented
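Putting the pieces together, a self-contained version might look like the sketch below. It assembles the methods shown earlier in these notes (omitting __radd__ for brevity) and assumes we've decided that Thing objects are mutable; the name alias is introduced only to observe the effect.

```python
class Thing:
    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f'Thing({self.value})'

    def _add_values(self, other):
        if type(other) is Thing:
            return self.value + other.value
        elif type(other) is int:
            return self.value + other
        else:
            return None

    def __add__(self, other):
        new_value = self._add_values(other)
        return Thing(new_value) if new_value is not None else NotImplemented

    def __iadd__(self, other):
        new_value = self._add_values(other)

        if new_value is not None:
            self.value = new_value   # modify the existing object
            return self
        else:
            return NotImplemented


t = Thing(17)
alias = t

t += 3
assert t is alias                    # no new object was built this time
assert repr(alias) == 'Thing(20)'

try:
    t += 'x'                         # __iadd__ and then __add__ both decline
except TypeError:
    unsupported = True
else:
    unsupported = False
```

When __iadd__ returns NotImplemented, Python falls back to ordinary addition before giving up, so the TypeError only appears once __add__ has declined, too.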

Note, too, that there are no reflected versions of the augmented arithmetic operators, for the simple reason that the object on the left-hand side of the operation is always the one being modified, so we would expect its class to be the one to know how to implement that modification.
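That doesn't mean += fails whenever the left-hand operand's class is unaware of the right-hand one. When the left-hand class has no suitable augmented method, Python falls back to ordinary addition, which can still reach the right-hand operand's __radd__. A minimal sketch of that, with a pared-down Thing class written just for this illustration:

```python
class Thing:
    """Pared-down Thing with only reflected addition, for illustration."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f'Thing({self.value})'

    def __radd__(self, other):
        if type(other) is int:
            return Thing(other + self.value)
        return NotImplemented


n = 1
t = Thing(17)

n += t    # int has no __iadd__, and int.__add__ declines, so Thing.__radd__ runs

assert type(n) is Thing   # n has been rebound to a new Thing object
assert n.value == 18
assert t.value == 17      # the right-hand operand was not modified
```

Nothing was modified in place here; the name n was simply rebound to the result of 1 + t, which is consistent with the idea that only the left-hand object's class can implement an in-place modification.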