
ICS 33 Spring 2024
Notes and Examples: The Python Data Model

Background

As we've explored some features of Python that you may not have encountered before, a recurring theme has emerged. For the most part, the functions and classes built into Python and its standard library don't possess special abilities that aren't available within the code you write. Objects of pre-existing types can be context managers, but objects of your types can, too. Objects of many pre-existing types can be iterated using for loops, but so can objects of your types. You have to be aware of the mechanisms that make these things possible, but the important thing is that they are possible; most mechanisms that underlie Python are exposed and documented, which means your code can hook into them just as well as built-in code can. This is among the hallmarks of a well-designed programming language.

Collectively, the mechanisms that govern how Python objects interact with each other within a running Python program are known as the Python data model. Both in this course and your prior coursework, you'll already have seen some parts of this model, even if you've never heard it put into these terms before. Here are a few familiar ideas from the Python data model.

As we've seen before, all of the mechanisms described above rely on what we call protocols, which rely on the presence of one or more dunder methods, such as __init__, __enter__, __exit__, __iter__, __next__, and __get__. Virtually every time we interact with any object for any purpose in Python, at least one dunder method is called behind the scenes, even if we rarely call them ourselves. These dunder methods form the basis of how objects interact; their presence, alongside the fact that their meanings are documented and well-understood by seasoned Python programmers, ensures that we can modify most of what happens during those interactions when our designs require it. We can't change the fact that iterating an object causes __iter__ and __next__ methods to be called, but we can change what happens when they are, which means we can make iteration behave in any way we'd like. We can't change that the with statement requires __enter__ and __exit__ methods, but, by providing our own __enter__ and __exit__ methods, we can apply the concept of context management to any problem for which it's suited.
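To make "called behind the scenes" a little more concrete, here is a rough sketch of what a for loop does with the iteration protocol. The function name manual_for is made up for illustration, and this only approximates what Python does internally.

```python
def manual_for(iterable):
    """Collect the values a for loop would visit, driving the protocol by hand."""
    results = []
    iterator = iter(iterable)          # Usually calls the object's __iter__ method

    while True:
        try:
            value = next(iterator)     # Calls the iterator's __next__ method
        except StopIteration:
            break                      # The iterator signals that it's finished

        results.append(value)

    return results
```

Calling manual_for on a list or a string visits the same values, in the same order, that a for loop would.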

So, the key to improving our ability to design Python programs — writing programs that are "Pythonic," in the sense that using our functions and classes feels just like using the ones that are built in — is understanding as much of the Python data model as we can. We don't need every detail all the time, but every available detail is applicable to a problem we might have. When we solve problems the way Python does, we find that our functions and classes naturally "plug into" the things that are built in, and vice versa. When an entire community of programmers uses a programming language the same way, the community's ability to solve problems rapidly increases, because common problems are solved once and solved well, with the tools used to solve them combining naturally in ways that no one considered originally. We can stand on the shoulders of giants without losing our balance.

The price to be paid is that we have to learn the details. The payoff, though, is immense, so we'd be well-served to spend the time learning them. And, fortunately, the details rarely change, except to the extent that new details are added; once we know how iterators work, for example, that's how they'll likely continue to work for as long as we use Python. So, we at least want to be aware of what's available and the common threads that tie the features of Python's data model together. We can always look the details up again when we need them, even if we've forgotten some of them by then, but it's a lot harder to look things up when we don't know what we're looking for.

So, let's dive in and see what else we can find in the Python data model.

Lengths

In Python, we say that an object is sized if it can be asked for a length. The usual way to ask is to pass the object as an argument to the built-in len function. Strings, lists, tuples, ranges, sets, and dictionaries are all examples of sized objects, though not all objects are sized; integers, for example, are not.

>>> len('Boo')
    3
>>> len([1, 2, 3, 4, 5])
    5
>>> len(18)
    Traceback (most recent call last):
    TypeError: object of type 'int' has no len()

The MyRange class we wrote in a previous lecture is a good example of a class whose objects ought to be sized — if MyRange(10) comprises ten integers, then we could reasonably say its length is 10 — but this feature was missing from our implementation.

>>> len(MyRange(10))
    Traceback (most recent call last):
    TypeError: object of type 'MyRange' has no len()

Fortunately, there is a simple protocol by which we can add this feature to a class. All we need to do is write one extra method in our class:

Since a MyRange doesn't actually store any of its values, we'd need to iterate them if we wanted to count them, which means we could build a list of those values and then ask that list for its length.

class MyRange:

    def __len__(self):
        return len([x for x in self])

But we should be aware of the costs of our solutions; this works, but could we do substantially better? If there are n values in the range, this requires O(n) time to iterate through them, as well as O(n) memory to store them in a list. But if all we want to know is how many values are in our range, we have no need to store them; we just need to count them. What if we used a generator comprehension instead? Generators have no length, which rules out len(x for x in self), but we could transform each value into a 1 and then sum them up.

class MyRange:

    def __len__(self):
        return sum(1 for x in self)

If there are ten values in our range, we'll be summing the ten 1's that we generated, so this should produce the right answer. This technique reduces our memory usage to O(1), because we're now generating one value at a time, ignoring it (in favor of the value 1), and then adding 1 to a running sum. This is roughly equivalent to having written a loop instead.

class MyRange:

    def __len__(self):
        count = 0

        for value in self:
            count += 1

        return count

So, this is an improvement from a memory perspective, but we're still spending O(n) time, because we're still iterating the values in our range from beginning to end. A larger improvement would be to eliminate the iteration of the values altogether, though this would only be possible if we could find some other way to deduce how many there are. Fortunately, the values in a MyRange follow a straightforward pattern, so we can instead calculate the length with an arithmetic formula, by dividing the difference between stop and start by step, then applying a little bit of finesse to handle the edge cases properly.

import math

class MyRange:

    def __len__(self):
        return max(0, math.ceil((self._stop - self._start) / self._step))

This version runs in O(1) time and uses O(1) memory. It's always made up of one subtraction, one division, one ceiling operation, and determining the maximum of exactly two integers. Whether the range is extremely long or very short, the sequence of operations is always the same, so its cost remains constant, regardless of the range's length.

Note, too, that if MyRange supported negative step values (ours didn't), then we'd need to adjust our formula some more, but it would still be possible to calculate a length in constant time and memory.
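One possible adjustment for negative steps is sketched below. It's written as a standalone function, with the hypothetical name range_length, rather than as a method of MyRange, and it swaps in integer-only ceiling division in place of math.ceil; neither choice comes from our earlier implementation.

```python
def range_length(start, stop, step):
    """Return how many values range(start, stop, step) would contain."""
    if step == 0:
        raise ValueError('step must not be zero')

    if step > 0:
        span = stop - start
    else:
        # Counting downward is the mirror image of counting upward.
        span, step = start - stop, -step

    # Ceiling division using only integers: -(-a // b) == ceil(a / b) when b > 0.
    return max(0, -(-span // step))
```

However long the range is, this still performs the same small, fixed sequence of arithmetic operations.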

Truthiness

There are many situations in Python where objects are treated as truth values, which is to say that they're considered either to be truthy (i.e., treated as though they're a boolean True) or falsy (i.e., like a boolean False). This is why the conditional expression of an if statement or a while loop can evaluate to any type of object, or why an iterable containing any types of objects can be passed to the built-in functions any or all.

Making that feature work requires a decision on the fundamental question: Which objects are considered truthy and which are considered falsy? The design of Python answers that question for its built-in types, including rules such as these.
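A handful of the built-in rules can be checked directly. The values chosen here are just commonly cited cases, so this is an illustrative sample rather than a complete statement of Python's rules.

```python
# Each of these is treated as falsy: None, False, numeric zeros, and empty containers.
falsy_examples = [None, False, 0, 0.0, '', [], (), {}, set(), range(0)]

# Each of these is treated as truthy: non-zero numbers and non-empty containers.
truthy_examples = [True, 1, -1, 0.5, 'Boo', [0], (None,), {'a': 1}, {0}, range(3)]

all_falsy = not any(falsy_examples)    # any() is False only if every element is falsy
all_truthy = all(truthy_examples)      # all() is True only if every element is truthy
```

Note that [0] and (None,) are truthy even though their only elements are falsy; it's the container's emptiness that matters, not what it contains.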

But what about objects of the classes we write? Under what conditions are they truthy? Under what conditions are they falsy? And, most importantly, can we decide those conditions, instead of leaving it to Python to decide?

>>> class Person:
...     def __init__(self, name):
...         self._name = name
>>> p1 = Person('Alex')
>>> bool(p1)
    True       # A Person with a non-empty name is truthy.
>>> p2 = Person('')
>>> bool(p2)
    True       # A Person with an empty name is also truthy.

From our experimentation, it appears that objects of our classes are always truthy, but there's more to the story than meets the eye. Given what we know already about the Python data model, we can reasonably expect that one or more dunder methods will allow us to alter this outcome.

How lengths impact truthiness

We saw previously that we can give objects a length by writing a __len__ method in their class. We've also seen that empty strings and empty lists — whose lengths are zero — are considered to be falsy. What happens to objects of our classes when they have lengths?

>>> len(MyRange(10))
    10
>>> bool(MyRange(10))
    True            # A MyRange with a non-zero length is truthy.
>>> len(MyRange(5, 5))
    0
>>> bool(MyRange(5, 5))
    False           # A MyRange with a zero length is falsy.

For objects that are sized (i.e., those that implement a __len__ method), their lengths can be used to determine truthiness. If calculating lengths is inexpensive, and if we're happy with that behavior — which is in line with objects that are built into Python, so we'd need a good reason to feel otherwise about it — then we're done. (This is one reason why implementing our methods efficiently is so important; it has a compounding benefit, since one method can often form the basis of others, as well, so that one fast operation becomes many fast operations.)

Still, not all objects are sized, but we might nonetheless want to control their truthiness. Or, we might be able to determine truthiness more cheaply than we can calculate a length. What do we do then?

Directly overriding truthiness

Adding a __bool__(self) method to a class directly overrides how its truthiness is determined, independent of whether it has a length. This means that determining the truthiness of an object is really a process that has as many as three steps. If the class has a __bool__ method, it's called and its result is used. Otherwise, if the class has a __len__ method, it's called, and the object is truthy if its length is non-zero. Otherwise, the object is truthy.

This explains why objects of our previous Person class were always truthy: In the absence of __bool__ or __len__ methods in a class, this is Python's default. So, if we want to override that default, we'll need at least one of those methods.
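The order in which these methods are consulted can be seen in a small experiment. The three class names below are hypothetical and exist only to isolate each case.

```python
class Neither:
    pass                          # No __bool__ or __len__, so always truthy


class SizedOnly:
    def __init__(self, length):
        self._length = length

    def __len__(self):
        return self._length       # Truthy exactly when this is non-zero


class Both:
    def __len__(self):
        return 0                  # On its own, this would make the object falsy...

    def __bool__(self):
        return True               # ...but __bool__ is consulted first


results = [bool(Neither()), bool(SizedOnly(0)), bool(SizedOnly(3)), bool(Both())]
```

Here, results is [True, False, True, True]. The last case shows that a __bool__ method wins even when __len__ returns zero.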

>>> class Person:
...     def __init__(self, name):
...         self._name = name
...     def __bool__(self):
...         return self._name == 'Boo'
>>> p1 = Person('Boo')
>>> bool(p1)
    True          # Boo is truthy
>>> p2 = Person('Alex')
>>> bool(p2)
    False         # Everyone else is falsy

This is an aspect of Python's data model that we'll see play out repeatedly. It's often the case that providing one operation (in this case, a length) will automatically supply a default behavior for others (in this case, truthiness), though we can do something other than that default when it's appropriate from the perspectives of correctness or performance. This makes the common situations easier to implement, while still allowing us to implement things more carefully when we need to.

Indexing

Some kinds of objects in Python can be indexed, which generally means that we can think of them as containing other objects, but that they give us a way to uniquely identify each of those objects so that we can ask for them individually and know definitively which one we'll get back.

The simplest example of indexing is asking a list for one of its elements given an index. Since lists are designed around an indexing scheme where the first element has the index 0, the second element has the index 1, and so on, then when we ask for the element at a particular index, it's clear which one we're asking for. Strings and ranges have that same design characteristic, so they can be indexed similarly.

>>> values = [1, 3, 5, 7, 9]
>>> values[4]
    9   # ^^^ 4 is the index, in this case, so we want the fifth element.
>>> range(1, 100, 4)[3]
    13            # ^^^ Here, we want the fourth value in the range.
>>> 'Boo is happy'[0]
    'B'         # ^^^ We're looking for a string containing the first character of 'Boo is happy'.

Dictionaries can also be indexed, albeit in a somewhat different way. A dictionary contains unique keys, with a value associated with each of them. So, when you index a dictionary, you're asking a different question: What is the value associated with this key? Still, the syntax is the same, and the underlying idea is, too: Give me the value that's uniquely identified by this index (where, for a dictionary, those indices are really its keys).

>>> d = {'A': 27, 'B': 17, 'C': 0}
>>> d['B']
    17

For some kinds of objects that allow indexing — though not all kinds — we can also assign into those indexes. Again, the syntax is the same for all such indexed objects, and the underlying idea is also the same, though the implementation details differ from one type of object to another.

>>> values[3] = 13
>>> values
    [1, 3, 5, 13, 9]             # One object in the list has been replaced.
>>> d['B'] = 1
>>> d
    {'A': 27, 'B': 1, 'C': 0}    # The value associated with a key has been replaced.
>>> range(1, 100, 4)[3] = 10
    Traceback (most recent call last):
    TypeError: 'range' object does not support item assignment
                                 # Ranges are immutable, so we can't assign into them.

Those objects that allow assignment into indexes usually also allow deletion of an index, using the del statement.

>>> del values[3]
>>> values
    [1, 3, 5, 9]
>>> del d['A']
>>> d
    {'B': 1, 'C': 0}

That many kinds of objects support the same syntax with potentially different implementation details suggests again that dunder methods are being called behind the scenes here.

Dunder methods for implementing indexing

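When we want objects to support indexing, we add at least one dunder method to their class: __getitem__(self, index), which is called when an object is indexed and returns the value identified by the given index.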

Note that the word "index" does not necessarily mean a non-negative integer, or even an integer at all. It's up to the __getitem__ method to decide what constitutes a valid index and what an index means. (This is what makes it possible to index lists with integers, while being able to index dictionaries with arbitrary hashable keys. Their __getitem__ methods are written differently.)
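As an illustration of a __getitem__ whose indices aren't single integers, here is a sketch of a hypothetical Grid class indexed by (row, column) tuples. The class, its constructor parameters, and its error messages are all invented for this example.

```python
class Grid:
    """A hypothetical fixed-size grid indexed by (row, column) tuples."""

    def __init__(self, rows, cols, fill=0):
        self._rows = rows
        self._cols = cols
        self._cells = [fill] * (rows * cols)

    def __getitem__(self, index):
        # Here, a valid index is a tuple of two integers, rather than one integer.
        if type(index) is not tuple or len(index) != 2:
            raise TypeError('Grid index must be a (row, column) tuple')

        row, col = index

        if not (0 <= row < self._rows and 0 <= col < self._cols):
            raise IndexError('Grid index was out of range')

        return self._cells[row * self._cols + col]
```

Writing grid[1, 2] passes the tuple (1, 2) to __getitem__ as its index, so it's entirely up to Grid to decide what that means.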

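If we want to support assigning into an index and deletion of an index, there are additional dunder methods we can add alongside __getitem__: __setitem__(self, index, value), which is called when we assign into an index, and __delitem__(self, index), which is called when we delete an index using the del statement.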
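A minimal sketch of a class supporting all three indexing operations follows. The Inventory class and its rule about negative counts are hypothetical, chosen only to show where each piece of syntax ends up.

```python
class Inventory:
    """A hypothetical mapping-like class that stores item counts by name."""

    def __init__(self):
        self._counts = {}

    def __getitem__(self, name):
        return self._counts[name]          # Called for: inventory[name]

    def __setitem__(self, name, count):
        if count < 0:
            raise ValueError('count must not be negative')

        self._counts[name] = count         # Called for: inventory[name] = count

    def __delitem__(self, name):
        del self._counts[name]             # Called for: del inventory[name]
```

Because __setitem__ is an ordinary method, it can validate what's being stored, which is something a plain dictionary wouldn't do for us.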

Indexing is one feature that Python's built-in range provides that our MyRange class doesn't. Rectifying that would be a matter of adding a __getitem__ method to our MyRange class. (Since ranges are immutable, we wouldn't want to add __setitem__ or __delitem__.) Like our __len__ method, __getitem__ can calculate its answer in O(1) time using O(1) memory, so it would be best to do so.

class MyRange:

    def __getitem__(self, index):
        if type(index) is not int:
            raise TypeError(f'MyRange index must be int, but was {type(index).__name__}')
        elif index < 0 or index >= len(self):
            raise IndexError('MyRange index was out of range')

        return self._start + index * self._step

Since __getitem__ accepts a parameter other than self, and needs to perform calculations based on that parameter's value, some validation was necessary, so that non-integer indices and out-of-range indices would raise exceptions with descriptive error messages instead of returning invalid answers.

How the presence of indexing impacts other operations

When a class has both a __len__ method and a __getitem__ method that accepts non-negative indices, an interesting thing happens: Even without an __iter__ method, its objects become iterable automatically. This is because __len__ and __getitem__ combine together into something called the sequence protocol, which means that objects supporting that combination of methods are what we call sequences.

If we know that an object is a sequence, we know that it can be iterated without an __iter__ method, via calls to __getitem__ and __len__. To understand why, let's briefly experiment with a class that includes these methods.

>>> class ThreeSequence:
...     def __len__(self):
...         return 3
...     def __getitem__(self, index):
...         if 0 <= index < len(self):
...             return index * 3
...         else:
...             raise IndexError
>>> s = ThreeSequence()
>>> s[0]
    0        # If s can be indexed with integers, isn't 0 the first index?
>>> s[1]
    3        # In that case, isn't 1 the second index?
>>> s[2]
    6        # And isn't 2 the third?
>>> len(s)
    3        # Doesn't this tell us that s[3] would fail if we tried it?
>>> index = 0
>>> while index < len(s):
...     print(s[index])
...     index += 1
...          # Therefore, isn't this a reliable pattern for iterating such a sequence?
    0
    3
    6

So, as it turns out, when we iterate an object, there's a bit more to the story than we've seen.

In fact, iteration works in the presence of a __getitem__ method that accepts non-negative indexes, even in the absence of a __len__ method, in which case successively larger indexes are passed to __getitem__ until it raises an IndexError, at which point the iteration is considered to have ended.
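This fallback can be observed with a class that has a __getitem__ method and nothing else. The Countdown class here is a hypothetical example, not something from our earlier work.

```python
class Countdown:
    """A hypothetical class with __getitem__, but no __iter__ or __len__."""

    def __init__(self, start):
        self._start = start

    def __getitem__(self, index):
        if type(index) is not int or index < 0 or index > self._start:
            raise IndexError('Countdown index was out of range')

        return self._start - index


# Indexes 0, 1, 2, and 3 succeed; index 4 raises IndexError, which ends the iteration.
values = list(Countdown(3))
```

Afterward, values is [3, 2, 1, 0], even though Countdown never said anything explicit about how to iterate it.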

However, the __len__ method is useful in concert with __getitem__ for another reason: It also provides the automatic ability to iterate an object in reverse, since something akin to the following while loop can be used instead.

>>> index = len(s)
>>> while index > 0:
...     index -= 1
...     print(s[index])
...          # This is a reliable pattern for iterating a sequence in reverse, if we know its length.
...          # Without knowing the length, Python couldn't efficiently know where to start.

Additionally, objects implementing indexing (with or without a __len__ method) have another similar automatically implemented behavior.

>>> [i in s for i in range(8)]
    [True, False, False, True, False, False, True, False]
                   # The 'in' operator can be used to see if they contain a value,
                   # though this will be done using iteration, which will take linear time
                   # for each use of the 'in' operator.

When we write a class that implements a sequence, we'll quite often want to provide our own implementations of these three features (iteration, reverse iteration, and "contains"), especially if we can do so more performantly than the default. If so, we could add these three dunder methods to a class: __iter__(self) for iteration, __reversed__(self) for reverse iteration, and __contains__(self, value) for the in operator.

MyRange would benefit from an implementation of __contains__, for example, since its result could then be determined in constant time using some straightforward arithmetic, rather than iterating every value in a potentially large range. There's no reason it should cost more to evaluate 100000 in MyRange(1000000) than it does to evaluate 0 in MyRange(1), but only a custom __contains__ method will make that possible. On the other hand, the automatic implementations of forward and reverse iteration arising from MyRange's indexing feature are probably fine.
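A constant-time __contains__ might look something like the following sketch. The class shown is a pared-down stand-in for MyRange, not the version from the earlier lecture: the constructor is an assumption, only positive steps are handled, and non-integer values are simply treated as absent.

```python
class MyRange:
    """A pared-down stand-in for MyRange, supporting positive steps only."""

    def __init__(self, start, stop=None, step=1):
        if stop is None:
            start, stop = 0, start

        self._start = start
        self._stop = stop
        self._step = step

    def __contains__(self, value):
        if type(value) is not int:
            return False

        # The value must lie within the bounds and be a whole number of
        # steps away from the start.  No iteration is required.
        return (self._start <= value < self._stop
                and (value - self._start) % self._step == 0)
```

With this in place, 100000 in MyRange(1000000) costs the same handful of operations as 0 in MyRange(1).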

So, it's worth knowing what features are provided automatically (and how they're provided automatically), because when these automatic implementations are performant enough for our needs, it means fewer features that we need to build, test, and maintain over time. Those positive decisions compound as programs and teams grow.

Slicing

Indexing allows us to obtain a single object within another, such as one element of a list or the value associated with a key in a dictionary. A variant of indexing that we've not yet considered is what Python calls slicing, which allows us to take a sequence of objects and obtain a subsequence, containing some of the objects while skipping others. The slice will usually be the same type — so, for example, a slice of a list will be a list, a slice of a string will be a string, and so on.

>>> values = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
>>> values[2:6]
    [5, 7, 9, 11]
>>> values[2:8:2]
    [5, 9, 13]
>>> values[:3]
    [1, 3, 5]
>>> values[7:]
    [15, 17, 19]
>>> 'Boo is happy today'[:3]
    'Boo'
>>> range(0, 20, 2)[3:7]
    range(6, 14, 2)
>>> 'Boo'[::]
    'Boo'

Syntactically, slicing looks a lot like indexing. If we start with an expression whose value can be sliced, we can follow that expression with brackets, in which we can write either two or three values separated by colons. Those three values, analogous to the values that describe a range, are named start, stop, and step. In the slice notation, all three values are optional.

This raises the question of what mechanism is used to implement slicing. We've seen previously that indexing is implemented via the __getitem__ dunder method, and we can see that slicing uses a similar bracket-surrounded notation, so is it safe for us to assume that __getitem__ does slicing, too? If so, how do we tell the difference between indexing and slicing? One way to find out is to experiment a bit.

>>> class Thing:
...     def __getitem__(self, index):
...         print(f'type(index) = {type(index)}')
...         print(f'index = {index}')
...         return None
>>> t = Thing()
>>> t[4]
    type(index) = <class 'int'>
    index = 4
>>> t[1:17:6]
    type(index) = <class 'slice'>
    index = slice(1, 17, 6)
>>> t[1:17]
    type(index) = <class 'slice'>
    index = slice(1, 17, None)
>>> t[:17]
    type(index) = <class 'slice'>
    index = slice(None, 17, None)
>>> t[::]
    type(index) = <class 'slice'>
    index = slice(None, None, None)

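From this experimentation, we can deduce a few things: both indexing and slicing call __getitem__; when we index, the index itself (here, an int) is passed as the argument; when we slice, a slice object is passed instead; and any of the start, stop, or step values that we omit appears in that slice as None.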

So, if we want to implement slicing in a class, we'll need to add some functionality to our __getitem__ method to detect that its parameter is a slice and, if so, handle it specially. How do we interact with a slice object?

>>> s = slice(1, 17, 6)
>>> s.start, s.stop, s.step
    (1, 17, 6)        # We can access its start, stop, and step attributes.
>>> s.step = 100
    Traceback (most recent call last):
    AttributeError: readonly attribute
                      # Like ranges, slices are immutable.
>>> start, stop, step = s.indices(10)
>>> start, stop, step
    (1, 10, 6)        # We can ask it what the applicable start, stop, and step
                      # values would be for a given length.  In this case, we've asked this
                      # for a length of 10, which is why the applicable stop is less than
                      # the original one.
>>> defaulted = slice(None, None, None)
>>> [type(x) for x in (defaulted.start, defaulted.stop, defaulted.step)]
    [<class 'NoneType'>, <class 'NoneType'>, <class 'NoneType'>]
                      # When a slice is constructed with Nones, they aren't defaulted
                      # to anything; they remain Nones.
>>> dstart, dstop, dstep = defaulted.indices(10)
>>> dstart, dstop, dstep
    (0, 10, 1)        # Even if the start, stop, and step are all None, the
                      # indices method returns integer results.
>>> [index for index in range(*defaulted.indices(10))]
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
                      # If indices returns a tuple of three values, we can unpack it into
                      # three arguments (the start, stop, and step), pass them to range (which
                      # also understands the concept of a start, stop, and step), and we now have
                      # a range of the indices that make up our slice.
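Putting those pieces together, a __getitem__ method that handles both indexing and slicing could be shaped like this sketch. The Letters class is hypothetical and deliberately simple; it's meant to show the pattern of detecting a slice and using its indices method, not to prescribe how any particular class should do it.

```python
class Letters:
    """A hypothetical immutable sequence of characters supporting indexing and slicing."""

    def __init__(self, text):
        self._text = text

    def __len__(self):
        return len(self._text)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # indices() replaces Nones and clamps out-of-bounds values for our length.
            start, stop, step = index.indices(len(self))
            chosen = [self._text[i] for i in range(start, stop, step)]
            return Letters(''.join(chosen))       # A slice of a Letters is a Letters
        elif type(index) is int:
            if not 0 <= index < len(self):
                raise IndexError('Letters index was out of range')

            return self._text[index]
        else:
            raise TypeError(
                f'Letters index must be int or slice, but was {type(index).__name__}')

    def __repr__(self):
        return f'Letters({self._text!r})'
```

Since indices also resolves negative steps, a slice such as Letters('abc')[::-1] works without any extra code in the slicing branch.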

Now that we understand the building blocks available to us, we have an idea of how we might add slicing to our MyRang