Strings
The following section discusses operations performed on string data, which is a frequently used data
type when working with dataframes. String processing functions are available in the namespace str.
Working with strings in other dataframe libraries can be highly inefficient because strings have
unpredictable lengths. Polars mitigates these inefficiencies by following the Arrow Columnar Format
specification, so you can write performant data queries on string data too.
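As a rough illustration of why a columnar layout helps with variable-length strings, the sketch below packs a list of strings into one contiguous UTF-8 byte buffer plus a list of offsets, which is the idea behind the variable-size binary layout described in the Arrow specification. The helper names `pack_strings` and `get_string` are hypothetical, and this plain-Python sketch is not a description of how Polars is implemented internally.

```python
def pack_strings(strings):
    """Pack strings into one UTF-8 buffer plus offsets (hypothetical helper)."""
    data = bytearray()
    offsets = [0]
    for s in strings:
        data += s.encode("utf-8")
        # Each offset marks where the next string starts in the buffer.
        offsets.append(len(data))
    return bytes(data), offsets


def get_string(data, offsets, i):
    """Read the i-th string back using only two offset lookups."""
    return data[offsets[i]:offsets[i + 1]].decode("utf-8")


# "\u00ea" is ê and "\u00e4" is ä, written as escapes to avoid encoding ambiguity.
data, offsets = pack_strings(["pear", "p\u00eara", "p\u00e4\u00e4ryn\u00e4"])
print(offsets)
print(get_string(data, offsets, 2))
```

With this layout, the byte length of any element is the difference between two adjacent offsets, so it can be read without scanning the string itself.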
The string namespace
When working with string data you will likely need to access the namespace str, which aggregates
40+ functions that let you work with strings. As an example of how to access functions from within
that namespace, the snippet below shows how to compute the length of the strings in a column in
terms of the number of bytes and the number of characters:
str.len_bytes · str.len_chars
import polars as pl

df = pl.DataFrame(
    {
        "language": ["English", "Dutch", "Portuguese", "Finish"],
        "fruit": ["pear", "peer", "pêra", "päärynä"],
    }
)

result = df.with_columns(
    pl.col("fruit").str.len_bytes().alias("byte_count"),
    pl.col("fruit").str.len_chars().alias("letter_count"),
)
print(result)
str.len_bytes · str.len_chars
use polars::prelude::*;

let df = df! (
    "language" => ["English", "Dutch", "Portuguese", "Finish"],
    "fruit" => ["pear", "peer", "pêra", "päärynä"],
)?;

let result = df
    .clone()
    .lazy()
    .with_columns([
        col("fruit").str().len_bytes().alias("byte_count"),
        col("fruit").str().len_chars().alias("letter_count"),
    ])
    .collect()?;
println!("{}", result);
shape: (4, 4)
┌────────────┬─────────┬────────────┬──────────────┐
│ language   ┆ fruit   ┆ byte_count ┆ letter_count │
│ ---        ┆ ---     ┆ ---        ┆ ---          │
│ str        ┆ str     ┆ u32        ┆ u32          │
╞════════════╪═════════╪════════════╪══════════════╡
│ English    ┆ pear    ┆ 4          ┆ 4            │
│ Dutch      ┆ peer    ┆ 4          ┆ 4            │
│ Portuguese ┆ pêra    ┆ 5          ┆ 4            │
│ Finish     ┆ päärynä ┆ 10         ┆ 7            │
└────────────┴─────────┴────────────┴──────────────┘
Note
If you are working exclusively with ASCII text, then the results of the two computations will be the
same and using len_bytes is recommended since it is faster.
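The difference between the two counts can be reproduced in plain Python, which may help build intuition. This is only an analogue, not Polars code: it assumes the strings are stored as UTF-8 and that the accented letters are single precomposed code points.

```python
# "\u00ea" is ê and "\u00e4" is ä, written as escapes so each is one code point.
fruits = ["pear", "peer", "p\u00eara", "p\u00e4\u00e4ryn\u00e4"]

# Bytes needed to store each string as UTF-8.
byte_counts = [len(fruit.encode("utf-8")) for fruit in fruits]
# Number of Unicode code points in each string.
char_counts = [len(fruit) for fruit in fruits]
# For ASCII-only strings the two counts always agree.
ascii_only = [fruit.isascii() for fruit in fruits]

print(byte_counts)  # [4, 4, 5, 10]
print(char_counts)  # [4, 4, 4, 7]
print(ascii_only)   # [True, True, False, False]
```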
Parsing strings
Polars offers multiple methods for checking and parsing elements of a string column, namely checking
for the existence of given substrings or patterns, and counting, extracting, or replacing them. We
will demonstrate some of these operations in the upcoming examples.
Check for the existence of a pattern
We can use the function contains to check for the presence of a pattern within a string. By
default, the argument to the function contains is interpreted as a regular expression. If you want
to specify a literal substring, set the parameter literal to True.
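To see why the regex-versus-literal distinction matters, consider the pattern `p..r`. The sketch below uses Python's `re` module and the `in` operator as stand-ins for the two interpretations; it is an analogue only, and the sample strings are made up for illustration.

```python
import re

samples = ["pear", "p..r", "sparrow"]
pattern = "p..r"

# Regex interpretation: each "." matches any single character.
as_regex = [re.search(pattern, s) is not None for s in samples]
# Literal interpretation: the dots must appear verbatim.
as_literal = [pattern in s for s in samples]

print(as_regex)    # [True, True, True]
print(as_literal)  # [False, True, False]
```

Only the string that really contains two dots matches under the literal interpretation.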
For the special cases where you want to check if the strings start or end with a fixed substring,
you can use the functions starts_with or ends_with, respectively.
str.contains · str.starts_with · str.ends_with
result = df.select(
    pl.col("fruit"),
    pl.col("fruit").str.starts_with("p").alias("starts_with_p"),
    pl.col("fruit").str.contains("p..r").alias("p..r"),
    pl.col("fruit").str.contains("e+").alias("e+"),
    pl.col("fruit").str.ends_with("r").alias("ends_with_r"),
)
print(result)
str.contains · str.starts_with · str.ends_with · Available on feature regex
let result = df
    .clone()
    .lazy()
    .select([
        col("fruit"),
        col("fruit")
            .str()
            .starts_with(lit("p"))
            .alias("starts_with_p"),
        col("fruit").str().contains(lit("p..r"), true).alias("p..r"),
        col("fruit").str().contains(lit("e+"), true).alias("e+"),
        col("fruit").str().ends_with(lit("r")).alias("ends_with_r"),
    ])
    .collect()?;
println!("{}", result);
shape: (4, 5)
┌─────────┬───────────────┬───────┬───────┬─────────────┐
│ fruit   ┆ starts_with_p ┆ p..r  ┆ e+    ┆ ends_with_r │
│ ---     ┆ ---           ┆ ---   ┆ ---   ┆ ---         │
│ str     ┆ bool          ┆ bool  ┆ bool  ┆ bool        │
╞═════════╪═══════════════╪═══════╪═══════╪═════════════╡
│ pear    ┆ true          ┆ true  ┆ true  ┆ true        │
│ peer    ┆ true          ┆ true  ┆ true  ┆ true        │
│ pêra    ┆ true          ┆ false ┆ false ┆ false       │
│ päärynä ┆ true          ┆ true  ┆ false ┆ false       │
└─────────┴───────────────┴───────┴───────┴─────────────┘
Regex specification
Polars relies on the Rust crate regex to work with regular expressions, so you may need to
refer to the syntax documentation to see what features and flags are supported. In particular, note
that the flavor of regex supported by Polars is different from Python's module re.
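One concrete difference, as far as the regex crate's documentation states, is that the crate does not support look-around or backreferences, whereas Python's `re` does. The sketch below shows a backreference pattern that works in Python; it is an assumption here that the same pattern would be rejected when passed to a Polars string function, so check the crate's syntax documentation before porting patterns.

```python
import re

# Backreference: \1 must repeat whatever group 1 captured,
# so this pattern finds a doubled word character.
doubled_letter = re.compile(r"(\w)\1")

fruits = ["pear", "peer"]
has_doubled_letter = [doubled_letter.search(f) is not None for f in fruits]
print(has_doubled_letter)  # [False, True]
```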
Extract a pattern
The function extract allows us to extract patterns from the string values in a column. The
function extract accepts a regex pattern with one or more capture groups and extracts the capture
group specified as the second argument.
df = pl.DataFrame(
    {
        "urls": [
            "http://vote.com/ballon_dor?candidate=messi&ref=polars",
            "http://vote.com/ballon_dor?candidat=jorginho&ref=polars",
            "http://vote.com/ballon_dor?candidate=ronaldo&ref=polars",
        ]
    }
)
result = df.select(
    pl.col("urls").str.extract(r"candidate=(\w+)", group_index=1),
)
print(result)
let df = df! (
    "urls" => [
        "http://vote.com/ballon_dor?candidate=messi&ref=polars",
        "http://vote.com/ballon_dor?candidat=jorginho&ref=polars",
        "http://vote.com/ballon_dor?candidate=ronaldo&ref=polars",
    ]
)?;
let result = df
    .clone()
    .lazy()
    .select([col("urls").str().extract(lit(r"candidate=(\w+)"), 1)])
    .collect()?;
println!("{}", result);
shape: (3, 1)
┌─────────┐
│ urls    │
│ ---     │
│ str     │
╞═════════╡
│ messi   │
│ null    │
│ ronaldo │
└─────────┘
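The null in the second row arises because that URL spells the parameter `candidat`, so the pattern never matches. A plain-Python analogue of the same logic, using `re` and a hypothetical helper `extract_group`, might look like this; it illustrates the behaviour and is not Polars' implementation.

```python
import re

urls = [
    "http://vote.com/ballon_dor?candidate=messi&ref=polars",
    "http://vote.com/ballon_dor?candidat=jorginho&ref=polars",
    "http://vote.com/ballon_dor?candidate=ronaldo&ref=polars",
]
pattern = re.compile(r"candidate=(\w+)")


def extract_group(pattern, text, group_index):
    """Return the requested capture group, or None when nothing matches."""
    match = pattern.search(text)
    return match.group(group_index) if match else None


extracted = [extract_group(pattern, url, 1) for url in urls]
print(extracted)  # ['messi', None, 'ronaldo']
```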
To extract all occurrences of a pattern within a string, we can use the function extract_all. In
the example below, we extract all numbers from a string using the regex pattern (\d+), which
matches one or more digits. The resulting output of the function extract_all is a list containing
all instances of the matched pattern within the string.
df = pl.DataFrame({"text": ["123 bla 45 asd", "xyz 678 910t"]})
result = df.select(
    pl.col("text").str.extract_all(r"(\d+)").alias("extracted_nrs"),
)
print(result)
let df = df! (
    "text" => ["123 bla 45 asd", "xyz 678 910t"]
)?;
let result = df
    .clone()
    .lazy()
    .select([col("text")
        .str()
        .extract_all(lit(r"(\d+)"))
        .alias("extracted_nrs")])
    .collect()?;
println!("{}", result);
shape: (2, 1)
┌────────────────┐
│ extracted_nrs  │
│ ---            │
│ list[str]      │
╞════════════════╡
│ ["123", "45"]  │
│ ["678", "910"] │
└────────────────┘
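For readers who know Python's standard library, the result resembles what `re.findall` returns for each string. The sketch below is only an analogue under that assumption, not Polars code.

```python
import re

texts = ["123 bla 45 asd", "xyz 678 910t"]

# findall returns every non-overlapping match, left to right.
extracted_nrs = [re.findall(r"\d+", text) for text in texts]
print(extracted_nrs)  # [['123', '45'], ['678', '910']]
```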
Replace a pattern
Akin to the functions extract and extract_all, Polars provides the functions replace and
replace_all. These accept a regex pattern or a literal substring (if the parameter literal is
set to True) and perform the replacements specified. The function replace will make at most one
replacement whereas the function replace_all will make all the non-overlapping replacements it
finds.
str.replace · str.replace_all
df = pl.DataFrame({"text": ["123abc", "abc456"]})
result = df.with_columns(
    pl.col("text").str.replace(r"\d", "-"),
    pl.col("text").str.replace_all(r"\d", "-").alias("text_replace_all"),
)
print(result)
str.replace · str.replace_all · Available on feature regex
let df = df! (
    "text" => ["123abc", "abc456"]
)?;
let result = df
    .clone()
    .lazy()
    .with_columns([
        col("text").str().replace(lit(r"\d"), lit("-"), false),
        col("text")
            .str()
            .replace_all(lit(r"\d"), lit("-"), false)
            .alias("text_replace_all"),
    ])
    .collect()?;
println!("{}", result);
shape: (2, 2)
┌────────┬──────────────────┐
│ text   ┆ text_replace_all │
│ ---    ┆ ---              │
│ str    ┆ str              │
╞════════╪══════════════════╡
│ -23abc ┆ ---abc           │
│ abc-56 ┆ abc---           │
└────────┴──────────────────┘
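The "at most one" versus "all" distinction corresponds roughly to the `count` argument of Python's `re.sub`. The sketch below is a standard-library analogue that happens to reproduce the table above; it is not how Polars performs the replacement.

```python
import re

texts = ["123abc", "abc456"]

# count=1 stops after the first match, like a single replacement.
replaced_once = [re.sub(r"\d", "-", text, count=1) for text in texts]
# The default count=0 replaces every non-overlapping match.
replaced_all = [re.sub(r"\d", "-", text) for text in texts]

print(replaced_once)  # ['-23abc', 'abc-56']
print(replaced_all)   # ['---abc', 'abc---']
```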
Modifying strings
Case conversion
Converting the casing of a string is a common operation and Polars supports it out of the box with
the functions to_lowercase, to_titlecase, and to_uppercase:
str.to_lowercase · str.to_titlecase · str.to_uppercase
addresses = pl.DataFrame(
    {
        "addresses": [
            "128 PERF st",
            "Rust blVD, 158",
            "PoLaRs Av, 12",
            "1042 Query sq",
        ]
    }
)

addresses = addresses.select(
    pl.col("addresses").alias("originals"),
    pl.col("addresses").str.to_titlecase(),
    pl.col("addresses").str.to_lowercase().alias("lower"),
    pl.col("addresses").str.to_uppercase().alias("upper"),
)
print(addresses)
str.to_lowercase · str.to_titlecase · str.to_uppercase · Available on feature nightly
let addresses = df! (
    "addresses" => [
        "128 PERF st",
        "Rust blVD, 158",
        "PoLaRs Av, 12",
        "1042 Query sq",
    ]
)?;

let addresses = addresses
    .clone()
    .lazy()
    .select([
        col("addresses").alias("originals"),
        col("addresses").str().to_titlecase(),
        col("addresses").str().to_lowercase().alias("lower"),
        col("addresses").str().to_uppercase().alias("upper"),
    ])
    .collect()?;
println!("{}", addresses);
shape: (4, 4)
┌────────────────┬────────────────┬────────────────┬────────────────┐
│ originals      ┆ addresses      ┆ lower          ┆ upper          │
│ ---            ┆ ---            ┆ ---            ┆ ---            │
│ str            ┆ str            ┆ str            ┆ str            │
╞════════════════╪════════════════╪════════════════╪════════════════╡
│ 128 PERF st    ┆ 128 Perf St    ┆ 128 perf st    ┆ 128 PERF ST    │
│ Rust blVD, 158 ┆ Rust Blvd, 158 ┆ rust blvd, 158 ┆ RUST BLVD, 158 │
│ PoLaRs Av, 12  ┆ Polars Av, 12  ┆ polars av, 12  ┆ POLARS AV, 12  │
│ 1042 Query sq  ┆ 1042 Query Sq  ┆ 1042 query sq  ┆ 1042 QUERY SQ  │
└────────────────┴────────────────┴────────────────┴────────────────┘
Stripping characters from the ends
Polars provides five functions in the namespace str that let you strip characters from the ends of
the string:
Function | Behaviour |
---|---|
strip_chars | Removes leading and trailing occurrences of the characters specified. |
strip_chars_end | Removes trailing occurrences of the characters specified. |
strip_chars_start | Removes leading occurrences of the characters specified. |
strip_prefix | Removes an exact substring prefix if present. |
strip_suffix | Removes an exact substring suffix if present. |
Similarity to Python string methods
strip_chars is similar to Python's string method strip, and strip_prefix/strip_suffix
are similar to Python's string methods removeprefix and removesuffix, respectively.
It is important to understand that the first three functions interpret their string argument as a
set of characters whereas the functions strip_prefix and strip_suffix interpret their string
argument as a literal string.
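Because the text above compares these functions to Python's own string methods, the set-versus-literal distinction can be sketched with those methods directly. The address below is a made-up example chosen so that the two interpretations disagree, and `removeprefix`/`removesuffix` require Python 3.9 or later.

```python
address = "1282 Main St, 158"
chars = ", 0123456789"

# Set-of-characters interpretation: keep stripping while the
# outermost character belongs to the set.
strip_both = address.strip(chars)    # "Main St"
strip_start = address.lstrip(chars)  # "Main St, 158"
strip_end = address.rstrip(chars)    # "1282 Main St"

# The same argument, "128 ", behaves differently under each interpretation.
as_char_set = address.lstrip("128 ")      # "Main St, 158"
as_literal = address.removeprefix("128 ")  # unchanged, "1282" is not "128 "
without_suffix = address.removesuffix(", 158")  # "1282 Main St"

print(as_char_set)
print(as_literal)
```

The character-set version removes the whole leading `1282 ` because every one of those characters is in the set, while the literal version leaves the string untouched because it does not start with exactly `128 `.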
str.strip_chars · str.strip_chars_end · str.strip_chars_start · str.strip_prefix · str.strip_suffix
addr = pl.col("addresses")
chars = ", 0123456789"
result = addresses.select(
    addr.str.strip_chars(chars).alias("strip"),
    addr.str.strip_chars_end(chars).alias("end"),
    addr.str.strip_chars_start(chars).alias("start"),
    addr.str.strip_prefix("128 ").alias("prefix"),
    addr.str.strip_suffix(", 158").alias("suffix"),
)
print(result)
str.strip_chars · str.strip_chars_end · str.strip_chars_start · str.strip_prefix · str.strip_suffix
let addr = col("addresses");
let chars = lit(", 0123456789");
let result = addresses
    .clone()
    .lazy()
    .select([
        addr.clone().str().strip_chars(chars.clone()).alias("strip"),
        addr.clone()
            .str()
            .strip_chars_end(chars.clone())
            .alias("end"),
        addr.clone()
            .str()
            .strip_chars_start(chars.clone())
            .alias("start"),
        addr.clone().str().strip_prefix(lit("128 ")).alias("prefix"),
        addr.clone()
            .str()
            .strip_suffix(lit(", 158"))
            .alias("suffix"),
    ])
    .collect()?;
println!("{}", result);
shape: (4, 5)
┌───────────┬───────────────┬────────────────┬────────────────┬───────────────┐
│ strip     ┆ end           ┆ start          ┆ prefix         ┆ suffix        │
│ ---       ┆ ---           ┆ ---            ┆ ---            ┆ ---           │
│ str       ┆ str           ┆ str            ┆ str            ┆ str           │
╞═══════════╪═══════════════╪════════════════╪════════════════╪═══════════════╡
│ Perf St   ┆ 128 Perf St   ┆ Perf St        ┆ Perf St        ┆ 128 Perf St   │
│ Rust Blvd ┆ Rust Blvd     ┆ Rust Blvd, 158 ┆ Rust Blvd, 158 ┆ Rust Blvd     │
│ Polars Av ┆ Polars Av     ┆ Polars Av, 12  ┆ Polars Av, 12  ┆ Polars Av, 12 │
│ Query Sq  ┆ 1042 Query Sq ┆ Query Sq       ┆ 1042 Query Sq  ┆ 1042 Query Sq │
└───────────┴───────────────┴────────────────┴────────────────┴───────────────┘
If no argument is provided, the three functions strip_chars, strip_chars_end, and
strip_chars_start remove whitespace by default.
Slicing
Besides extracting substrings as specified by patterns, you can also slice
strings at specified offsets to produce substrings. The general-purpose function for slicing is
slice and it takes the starting offset and the optional length of the slice. If the length of
the slice is not specified or if it's past the end of the string, Polars slices the string all the
way to the end.
The functions head and tail are specialised versions used for slicing the beginning and end of a
string, respectively.
str.slice · str.head · str.tail
df = pl.DataFrame(
    {
        "fruits": ["pear", "mango", "dragonfruit", "passionfruit"],
        "n": [1, -1, 4, -4],
    }
)

result = df.with_columns(
    pl.col("fruits").str.slice(pl.col("n")).alias("slice"),
    pl.col("fruits").str.head(pl.col("n")).alias("head"),
    pl.col("fruits").str.tail(pl.col("n")).alias("tail"),
)
print(result)
str.str_slice · str.str_head · str.str_tail
let df = df! (
    "fruits" => ["pear", "mango", "dragonfruit", "passionfruit"],
    "n" => [1, -1, 4, -4],
)?;

let result = df
    .clone()
    .lazy()
    .with_columns([
        col("fruits")
            .str()
            .slice(col("n"), lit(NULL))
            .alias("slice"),
        col("fruits").str().head(col("n")).alias("head"),
        col("fruits").str().tail(col("n")).alias("tail"),
    ])
    .collect()?;
println!("{}", result);
shape: (4, 5)
┌──────────────┬─────┬─────────┬──────────┬──────────┐
│ fruits       ┆ n   ┆ slice   ┆ head     ┆ tail     │
│ ---          ┆ --- ┆ ---     ┆ ---      ┆ ---      │
│ str          ┆ i64 ┆ str     ┆ str      ┆ str      │
╞══════════════╪═════╪═════════╪══════════╪══════════╡
│ pear         ┆ 1   ┆ ear     ┆ p        ┆ r        │
│ mango        ┆ -1  ┆ o       ┆ mang     ┆ ango     │
│ dragonfruit  ┆ 4   ┆ onfruit ┆ drag     ┆ ruit     │
│ passionfruit ┆ -4  ┆ ruit    ┆ passionf ┆ ionfruit │
└──────────────┴─────┴─────────┴──────────┴──────────┘
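The handling of negative values in the table above can be mirrored with ordinary Python slicing, which may make the rows easier to read. This is a sketch that reproduces the rows shown for nonzero `n` only; it is not a specification of how Polars treats every edge case (for instance, `n = 0` would behave differently in the `tail` expression below).

```python
fruits = ["pear", "mango", "dragonfruit", "passionfruit"]
offsets = [1, -1, 4, -4]

# slice(n): start at offset n (negative counts from the end), run to the end.
sliced = [fruit[n:] for fruit, n in zip(fruits, offsets)]
# head(n): first n characters, or everything except the last |n| if n < 0.
heads = [fruit[:n] for fruit, n in zip(fruits, offsets)]
# tail(n): last n characters, or everything except the first |n| if n < 0.
tails = [fruit[-n:] for fruit, n in zip(fruits, offsets)]

print(sliced)  # ['ear', 'o', 'onfruit', 'ruit']
print(heads)   # ['p', 'mang', 'drag', 'passionf']
print(tails)   # ['r', 'ango', 'ruit', 'ionfruit']
```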
API documentation
In addition to the examples covered above, Polars offers various other string manipulation
functions. To explore these additional methods, you can go to the API documentation of your chosen
programming language for Polars.