Snapshot of https://docs.pola.rs/user-guide/expressions/strings/ saved on 2025-5-9 18:07.

Strings

The following section discusses operations performed on string data, which is a frequently used data type when working with dataframes. String processing functions are available in the namespace str.

Working with strings in other dataframe libraries can be highly inefficient because strings have unpredictable lengths. Polars mitigates these inefficiencies by following the Arrow Columnar Format specification, so you can write performant queries on string data too.

The string namespace

When working with string data you will likely need to access the namespace str, which aggregates 40+ functions that let you work with strings. As an example of how to access functions from within that namespace, the snippet below shows how to compute the length of the strings in a column in terms of the number of bytes and the number of characters:

str.len_bytes · str.len_chars

import polars as pl

df = pl.DataFrame(
    {
        "language": ["English", "Dutch", "Portuguese", "Finnish"],
        "fruit": ["pear", "peer", "pêra", "päärynä"],
    }
)

result = df.with_columns(
    pl.col("fruit").str.len_bytes().alias("byte_count"),
    pl.col("fruit").str.len_chars().alias("letter_count"),
)
print(result)

str.len_bytes · str.len_chars

use polars::prelude::*;

let df = df! (
    "language" => ["English", "Dutch", "Portuguese", "Finnish"],
    "fruit" => ["pear", "peer", "pêra", "päärynä"],
)?;

let result = df
    .clone()
    .lazy()
    .with_columns([
        col("fruit").str().len_bytes().alias("byte_count"),
        col("fruit").str().len_chars().alias("letter_count"),
    ])
    .collect()?;

println!("{}", result);

shape: (4, 4)
┌────────────┬─────────┬────────────┬──────────────┐
│ language   ┆ fruit   ┆ byte_count ┆ letter_count │
│ ---        ┆ ---     ┆ ---        ┆ ---          │
│ str        ┆ str     ┆ u32        ┆ u32          │
╞════════════╪═════════╪════════════╪══════════════╡
│ English    ┆ pear    ┆ 4          ┆ 4            │
│ Dutch      ┆ peer    ┆ 4          ┆ 4            │
│ Portuguese ┆ pêra    ┆ 5          ┆ 4            │
│ Finnish    ┆ päärynä ┆ 10         ┆ 7            │
└────────────┴─────────┴────────────┴──────────────┘

Note

If you are working exclusively with ASCII text, then the results of the two computations will be the same and using len_bytes is recommended since it is faster.
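
The two counts differ because UTF-8 stores non-ASCII characters in more than one byte. As a rough illustration in plain Python (not the Polars API, and assuming that Python's len on a str counts characters the same way len_chars does for these inputs), the sketch below reproduces the numbers from the table above:

```python
# Plain-Python illustration (not the Polars API): compare the number of
# UTF-8 bytes with the number of characters for each fruit name.
fruits = ["pear", "peer", "pêra", "päärynä"]

# Number of bytes in the UTF-8 encoding, analogous to str.len_bytes.
byte_counts = [len(fruit.encode("utf-8")) for fruit in fruits]
# Number of Unicode code points, analogous to str.len_chars.
char_counts = [len(fruit) for fruit in fruits]

print(byte_counts)  # [4, 4, 5, 10]
print(char_counts)  # [4, 4, 4, 7]
```

For the two ASCII-only names the counts agree, which is the case the note refers to.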

Parsing strings

Polars offers multiple methods for checking and parsing elements of a string column, namely checking for the existence of given substrings or patterns, and counting, extracting, or replacing them. We will demonstrate some of these operations in the upcoming examples.

Check for the existence of a pattern

We can use the function contains to check for the presence of a pattern within a string. By default, the argument to the function contains is interpreted as a regular expression. If you want to specify a literal substring, set the parameter literal to True.

For the special cases where you want to check if the strings start or end with a fixed substring, you can use the functions starts_with or ends_with, respectively.

str.contains · str.starts_with · str.ends_with

result = df.select(
    pl.col("fruit"),
    pl.col("fruit").str.starts_with("p").alias("starts_with_p"),
    pl.col("fruit").str.contains("p..r").alias("p..r"),
    pl.col("fruit").str.contains("e+").alias("e+"),
    pl.col("fruit").str.ends_with("r").alias("ends_with_r"),
)
print(result)

str.contains · str.starts_with · str.ends_with · Available on feature regex

let result = df
    .clone()
    .lazy()
    .select([
        col("fruit"),
        col("fruit")
            .str()
            .starts_with(lit("p"))
            .alias("starts_with_p"),
        col("fruit").str().contains(lit("p..r"), true).alias("p..r"),
        col("fruit").str().contains(lit("e+"), true).alias("e+"),
        col("fruit").str().ends_with(lit("r")).alias("ends_with_r"),
    ])
    .collect()?;

println!("{}", result);

shape: (4, 5)
┌─────────┬───────────────┬───────┬───────┬─────────────┐
│ fruit   ┆ starts_with_p ┆ p..r  ┆ e+    ┆ ends_with_r │
│ ---     ┆ ---           ┆ ---   ┆ ---   ┆ ---         │
│ str     ┆ bool          ┆ bool  ┆ bool  ┆ bool        │
╞═════════╪═══════════════╪═══════╪═══════╪═════════════╡
│ pear    ┆ true          ┆ true  ┆ true  ┆ true        │
│ peer    ┆ true          ┆ true  ┆ true  ┆ true        │
│ pêra    ┆ true          ┆ false ┆ false ┆ false       │
│ päärynä ┆ true          ┆ true  ┆ false ┆ false       │
└─────────┴───────────────┴───────┴───────┴─────────────┘

Regex specification

Polars relies on the Rust crate regex to work with regular expressions, so you may need to refer to the syntax documentation to see what features and flags are supported. In particular, note that the flavor of regex supported by Polars is different from Python's module re.

Extract a pattern

The function extract lets us extract patterns from the string values in a column. It accepts a regex pattern with one or more capture groups and returns the capture group selected by the second argument.

str.extract

df = pl.DataFrame(
    {
        "urls": [
            "http://vote.com/ballon_dor?candidate=messi&ref=polars",
            "http://vote.com/ballon_dor?candidat=jorginho&ref=polars",
            "http://vote.com/ballon_dor?candidate=ronaldo&ref=polars",
        ]
    }
)
result = df.select(
    pl.col("urls").str.extract(r"candidate=(\w+)", group_index=1),
)
print(result)

str.extract

let df = df! (
    "urls" => [
        "http://vote.com/ballon_dor?candidate=messi&ref=polars",
        "http://vote.com/ballon_dor?candidat=jorginho&ref=polars",
        "http://vote.com/ballon_dor?candidate=ronaldo&ref=polars",
    ]
)?;

let result = df
    .clone()
    .lazy()
    .select([col("urls").str().extract(lit(r"candidate=(\w+)"), 1)])
    .collect()?;

println!("{}", result);

shape: (3, 1)
┌─────────┐
│ urls    │
│ ---     │
│ str     │
╞═════════╡
│ messi   │
│ null    │
│ ronaldo │
└─────────┘

To extract all occurrences of a pattern within a string, we can use the function extract_all. In the example below, we extract all numbers from a string using the regex pattern (\d+), which matches one or more digits. The resulting output of the function extract_all is a list containing all instances of the matched pattern within the string.

str.extract_all

df = pl.DataFrame({"text": ["123 bla 45 asd", "xyz 678 910t"]})
result = df.select(
    pl.col("text").str.extract_all(r"(\d+)").alias("extracted_nrs"),
)
print(result)

str.extract_all

let df = df! (
    "text" => ["123 bla 45 asd", "xyz 678 910t"]
)?;

let result = df
    .clone()
    .lazy()
    .select([col("text")
        .str()
        .extract_all(lit(r"(\d+)"))
        .alias("extracted_nrs")])
    .collect()?;

println!("{}", result);

shape: (2, 1)
┌────────────────┐
│ extracted_nrs  │
│ ---            │
│ list[str]      │
╞════════════════╡
│ ["123", "45"]  │
│ ["678", "910"] │
└────────────────┘

Replace a pattern

Akin to the functions extract and extract_all, Polars provides the functions replace and replace_all. These accept a regex pattern or a literal substring (if the parameter literal is set to True) and perform the replacements specified. The function replace will make at most one replacement whereas the function replace_all will make all the non-overlapping replacements it finds.

str.replace · str.replace_all

df = pl.DataFrame({"text": ["123abc", "abc456"]})
result = df.with_columns(
    pl.col("text").str.replace(r"\d", "-"),
    pl.col("text").str.replace_all(r"\d", "-").alias("text_replace_all"),
)
print(result)

str.replace · str.replace_all · Available on feature regex

let df = df! (
    "text" => ["123abc", "abc456"]
)?;

let result = df
    .clone()
    .lazy()
    .with_columns([
        col("text").str().replace(lit(r"\d"), lit("-"), false),
        col("text")
            .str()
            .replace_all(lit(r"\d"), lit("-"), false)
            .alias("text_replace_all"),
    ])
    .collect()?;

println!("{}", result);

shape: (2, 2)
┌────────┬──────────────────┐
│ text   ┆ text_replace_all │
│ ---    ┆ ---              │
│ str    ┆ str              │
╞════════╪══════════════════╡
│ -23abc ┆ ---abc           │
│ abc-56 ┆ abc---           │
└────────┴──────────────────┘

Modifying strings

Case conversion

Converting the casing of a string is a common operation and Polars supports it out of the box with the functions to_lowercase, to_titlecase, and to_uppercase:

str.to_lowercase · str.to_titlecase · str.to_uppercase

addresses = pl.DataFrame(
    {
        "addresses": [
            "128 PERF st",
            "Rust blVD, 158",
            "PoLaRs Av, 12",
            "1042 Query sq",
        ]
    }
)

addresses = addresses.select(
    pl.col("addresses").alias("originals"),
    pl.col("addresses").str.to_titlecase(),
    pl.col("addresses").str.to_lowercase().alias("lower"),
    pl.col("addresses").str.to_uppercase().alias("upper"),
)
print(addresses)

str.to_lowercase · str.to_titlecase · str.to_uppercase · Available on feature nightly

let addresses = df! (
    "addresses" => [
        "128 PERF st",
        "Rust blVD, 158",
        "PoLaRs Av, 12",
        "1042 Query sq",
    ]
)?;

let addresses = addresses
    .clone()
    .lazy()
    .select([
        col("addresses").alias("originals"),
        col("addresses").str().to_titlecase(),
        col("addresses").str().to_lowercase().alias("lower"),
        col("addresses").str().to_uppercase().alias("upper"),
    ])
    .collect()?;

println!("{}", addresses);

shape: (4, 4)
┌────────────────┬────────────────┬────────────────┬────────────────┐
│ originals      ┆ addresses      ┆ lower          ┆ upper          │
│ ---            ┆ ---            ┆ ---            ┆ ---            │
│ str            ┆ str            ┆ str            ┆ str            │
╞════════════════╪════════════════╪════════════════╪════════════════╡
│ 128 PERF st    ┆ 128 Perf St    ┆ 128 perf st    ┆ 128 PERF ST    │
│ Rust blVD, 158 ┆ Rust Blvd, 158 ┆ rust blvd, 158 ┆ RUST BLVD, 158 │
│ PoLaRs Av, 12  ┆ Polars Av, 12  ┆ polars av, 12  ┆ POLARS AV, 12  │
│ 1042 Query sq  ┆ 1042 Query Sq  ┆ 1042 query sq  ┆ 1042 QUERY SQ  │
└────────────────┴────────────────┴────────────────┴────────────────┘

Stripping characters from the ends

Polars provides five functions in the namespace str that let you strip characters from the ends of the string:

Function           Behaviour
strip_chars        Removes leading and trailing occurrences of the characters specified.
strip_chars_end    Removes trailing occurrences of the characters specified.
strip_chars_start  Removes leading occurrences of the characters specified.
strip_prefix       Removes an exact substring prefix if present.
strip_suffix       Removes an exact substring suffix if present.

Similarity to Python string methods

strip_chars is similar to Python's string method strip and strip_prefix/strip_suffix are similar to Python's string methods removeprefix and removesuffix, respectively.
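
For readers who know the Python methods, the analogy can be sketched in plain Python (not the Polars API). The address is hypothetical, and the pairing of lstrip and rstrip with strip_chars_start and strip_chars_end is an assumption based on the descriptions in the table above:

```python
# Plain-Python analogue of the Polars functions named above.
address = "128 Perf St, 42"  # hypothetical value
chars = ", 0123456789"

# str.strip treats its argument as a set of characters, like strip_chars.
stripped = address.strip(chars)
# lstrip / rstrip are assumed to mirror strip_chars_start / strip_chars_end.
start = address.lstrip(chars)
end = address.rstrip(chars)
# removeprefix / removesuffix (Python 3.9+) match the exact substring,
# like strip_prefix / strip_suffix.
no_prefix = address.removeprefix("128 ")
no_suffix = address.removesuffix(", 42")
# A prefix that is not present leaves the string unchanged.
unchanged = address.removeprefix("821 ")

print(stripped)   # Perf St
print(start)      # Perf St, 42
print(end)        # 128 Perf St
print(no_prefix)  # Perf St, 42
print(no_suffix)  # 128 Perf St
print(unchanged)  # 128 Perf St, 42
```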

It is important to understand that the first three functions interpret their string argument as a set of characters, whereas the functions strip_prefix and strip_suffix interpret their string argument as a literal string.

str.strip_chars · str.strip_chars_end · str.strip_chars_start · str.strip_prefix · str.strip_suffix

addr = pl.col("addresses")
chars = ", 0123456789"
result = addresses.select(
    addr.str.strip_chars(chars).alias("strip"),
    addr.str.strip_chars_end(chars).alias("end"),
    addr.str.strip_chars_start(chars).alias("start"),
    addr.str.strip_prefix("128 ").alias("prefix"),
    addr.str.strip_suffix(", 158").alias("suffix"),
)
print(result)

str.strip_chars · str.strip_chars_end · str.strip_chars_start · str.strip_prefix · str.strip_suffix

let addr = col("addresses");
let chars = lit(", 0123456789");
let result = addresses
    .clone()
    .lazy()
    .select([
        addr.clone().str().strip_chars(chars.clone()).alias("strip"),
        addr.clone()
            .str()
            .strip_chars_end(chars.clone())
            .alias("end"),
        addr.clone()
            .str()
            .strip_chars_start(chars.clone())
            .alias("start"),
        addr.clone().str().strip_prefix(lit("128 ")).alias("prefix"),
        addr.clone()
            .str()
            .strip_suffix(lit(", 158"))
            .alias("suffix"),
    ])
    .collect()?;

println!("{}", result);

shape: (4, 5)
┌───────────┬───────────────┬────────────────┬────────────────┬───────────────┐
│ strip     ┆ end           ┆ start          ┆ prefix         ┆ suffix        │
│ ---       ┆ ---           ┆ ---            ┆ ---            ┆ ---           │
│ str       ┆ str           ┆ str            ┆ str            ┆ str           │
╞═══════════╪═══════════════╪════════════════╪════════════════╪═══════════════╡
│ Perf St   ┆ 128 Perf St   ┆ Perf St        ┆ Perf St        ┆ 128 Perf St   │
│ Rust Blvd ┆ Rust Blvd     ┆ Rust Blvd, 158 ┆ Rust Blvd, 158 ┆ Rust Blvd     │
│ Polars Av ┆ Polars Av     ┆ Polars Av, 12  ┆ Polars Av, 12  ┆ Polars Av, 12 │
│ Query Sq  ┆ 1042 Query Sq ┆ Query Sq       ┆ 1042 Query Sq  ┆ 1042 Query Sq │
└───────────┴───────────────┴────────────────┴────────────────┴───────────────┘

If no argument is provided, the three functions strip_chars, strip_chars_end, and strip_chars_start remove whitespace by default.

Slicing

Besides extracting substrings as specified by patterns, you can also slice strings at specified offsets to produce substrings. The general-purpose function for slicing is slice and it takes the starting offset and the optional length of the slice. If the length of the slice is not specified or if it's past the end of the string, Polars slices the string all the way to the end.

The functions head and tail are specialised versions used for slicing the beginning and end of a string, respectively.

str.slice · str.head · str.tail

df = pl.DataFrame(
    {
        "fruits": ["pear", "mango", "dragonfruit", "passionfruit"],
        "n": [1, -1, 4, -4],
    }
)

result = df.with_columns(
    pl.col("fruits").str.slice(pl.col("n")).alias("slice"),
    pl.col("fruits").str.head(pl.col("n")).alias("head"),
    pl.col("fruits").str.tail(pl.col("n")).alias("tail"),
)
print(result)

str.str_slice · str.str_head · str.str_tail

let df = df! (
    "fruits" => ["pear", "mango", "dragonfruit", "passionfruit"],
    "n" => [1, -1, 4, -4],
)?;

let result = df
    .clone()
    .lazy()
    .with_columns([
        col("fruits")
            .str()
            .slice(col("n"), lit(NULL))
            .alias("slice"),
        col("fruits").str().head(col("n")).alias("head"),
        col("fruits").str().tail(col("n")).alias("tail"),
    ])
    .collect()?;

println!("{}", result);

shape: (4, 5)
┌──────────────┬─────┬─────────┬──────────┬──────────┐
│ fruits       ┆ n   ┆ slice   ┆ head     ┆ tail     │
│ ---          ┆ --- ┆ ---     ┆ ---      ┆ ---      │
│ str          ┆ i64 ┆ str     ┆ str      ┆ str      │
╞══════════════╪═════╪═════════╪══════════╪══════════╡
│ pear         ┆ 1   ┆ ear     ┆ p        ┆ r        │
│ mango        ┆ -1  ┆ o       ┆ mang     ┆ ango     │
│ dragonfruit  ┆ 4   ┆ onfruit ┆ drag     ┆ ruit     │
│ passionfruit ┆ -4  ┆ ruit    ┆ passionf ┆ ionfruit │
└──────────────┴─────┴─────────┴──────────┴──────────┘

API documentation

In addition to the examples covered above, Polars offers various other string manipulation functions. To explore these additional methods, you can go to the API documentation of your chosen programming language for Polars.