🧊 The Ultimate Polars Cheat Sheet for Python Data Pros
Everything you need to know to get started — and go pro — with Polars, the lightning-fast DataFrame library built in Rust.
📦 Getting Started
import polars as pl
Read from CSV / Create from dict
df = pl.read_csv("data.csv")
df = pl.DataFrame({
    "name": ["Alice", "Bob"],
    "age": [25, 30],
    "city": ["NY", "SF"]
})
📊 Basic Exploration
df.head()
df.tail(10)
df.shape
df.columns
df.dtypes
df.describe()
df.null_count()
🔍 Filtering Rows
df.filter(pl.col("age") > 25)
df.filter(
    (pl.col("age") > 25) &
    (pl.col("city") == "NY")
)
➕ Add / Update Columns
# Simple column
df.with_columns((pl.col("sales") * pl.col("units")).alias("total"))
# Conditional logic
df.with_columns(
    pl.when(pl.col("age") >= 30)
    .then(pl.lit("Senior"))    # wrap in pl.lit(): bare strings are treated as column names
    .otherwise(pl.lit("Junior"))
    .alias("category")
)
# Literal value
df.with_columns(pl.lit("USA").alias("country"))
✂️ Selecting Columns
df.select(["name", "age"])
# Regex
df.select(pl.col("^a.*$")) # columns that start with "a" (regex must be anchored with ^ and $)
# All columns except
df.select([pl.exclude("name")])
📚 Sorting & Ranking
df.sort("age")
df.sort("sales", descending=True)
# Top N rows by value
df.sort("sales", descending=True).head(10)
🔁 Groupby & Aggregations
df.group_by("city").agg([
    pl.col("sales").sum().alias("total_sales"),
    pl.col("sales").mean().alias("avg_sales")
])
🔗 Joins
df1.join(df2, on="id", how="inner")
df1.join(df2, on="id", how="left")
🔄 Reshaping
# Melt
df.melt(id_vars="city", value_vars=["sales", "units"])
# Transpose (swap rows and columns)
df.transpose()
🧮 Pivot Tables
df.pivot(
    values="sales",
    index="date",
    on="region",
    aggregate_function="sum"
)
⏳ Working with Dates & Times
df = df.with_columns(pl.col("timestamp").cast(pl.Datetime))
df.with_columns([
    pl.col("timestamp").dt.year().alias("year"),
    pl.col("timestamp").dt.month().alias("month"),
    pl.col("timestamp").dt.weekday().alias("weekday")
])
Create a date range (pl.date_range takes date/datetime objects, not strings; eager=True returns a Series):
from datetime import date
pl.date_range(date(2023, 1, 1), date(2023, 1, 10), interval="1d", eager=True)
🧹 Missing Data
df.null_count()
df.drop_nulls()
df.fill_null(strategy="forward")
df.fill_null(strategy="mean")
⚙️ Lazy Evaluation
lazy_df = (
    df.lazy()
    .filter(pl.col("sales") > 1000)
    .with_columns((pl.col("sales") * pl.col("units")).alias("total"))
    .group_by("region")
    .agg(pl.col("total").sum())
)
result = lazy_df.collect()
✅ Lazy mode delays execution and optimizes the full pipeline before running — perfect for large data.
🧠 Expressions & Functions
pl.col("sales") # Select a column
pl.lit(1) # Literal
pl.exclude("id") # All columns except...
# String methods
pl.col("name").str.contains("a")
pl.col("name").str.to_lowercase()
# List operations
pl.col("tags").list.len()  # length of each list (the .arr namespace is for fixed-size arrays)
📤 Exporting Data
df.write_csv("out.csv")
df.write_parquet("out.parquet")
📊 Performance & Memory
df.estimated_size("mb") # Approximate memory footprint
lazy_df.profile() # Lazy execution performance stats
🔁 Pandas to Polars: 20 Common Translations with Code Examples
Whether you're migrating a project or just exploring Polars, this guide will show you how to convert your Pandas muscle memory into fast, expressive Polars code.
📌 Imports & Setup
import pandas as pd
import polars as pl
Let’s assume df in Pandas becomes df_pl in Polars (you can also just name it df in both cases).
1. Reading CSV
# Pandas
df = pd.read_csv("data.csv")
# Polars
df = pl.read_csv("data.csv")
2. Viewing the Data
# Pandas
df.head()
# Polars
df.head()
3. Column Selection
# Pandas
df[["name", "age"]]
# Polars
df.select(["name", "age"])
4. Filtering Rows
# Pandas
df[df["age"] > 30]
# Polars
df.filter(pl.col("age") > 30)
5. Chained Filters
# Pandas
df[(df["age"] > 30) & (df["city"] == "NY")]
# Polars
df.filter((pl.col("age") > 30) & (pl.col("city") == "NY"))
6. Adding a New Column
# Pandas
df["total"] = df["price"] * df["qty"]
# Polars
df = df.with_columns((pl.col("price") * pl.col("qty")).alias("total"))
7. Conditional Columns
# Pandas
import numpy as np
df["level"] = np.where(df["score"] > 80, "High", "Low")
# Polars
df = df.with_columns(
    pl.when(pl.col("score") > 80)
    .then(pl.lit("High"))    # wrap in pl.lit(): bare strings are treated as column names
    .otherwise(pl.lit("Low"))
    .alias("level")
)
8. Groupby Aggregation
# Pandas
df.groupby("dept")["salary"].mean()
# Polars
df.group_by("dept").agg(pl.col("salary").mean())
9. Sorting
# Pandas
df.sort_values(by="score", ascending=False)
# Polars
df.sort("score", descending=True)
10. Merging (Joins)
# Pandas
pd.merge(df1, df2, on="id", how="left")
# Polars
df1.join(df2, on="id", how="left")
11. Handling Missing Values
# Pandas
df.dropna()
df.fillna(0)
# Polars
df.drop_nulls()
df.fill_null(0)
12. Datetime Operations
# Pandas
df["date"].dt.year
# Polars
df.with_columns(pl.col("date").dt.year().alias("year"))
13. Pivot Table
# Pandas
df.pivot_table(index="region", columns="month", values="sales", aggfunc="sum")
# Polars
df.pivot(
    values="sales",
    index="region",
    on="month",
    aggregate_function="sum"
)
14. Melting (Unpivoting)
# Pandas
df.melt(id_vars="region", value_vars=["q1", "q2"])
# Polars
df.melt(id_vars="region", value_vars=["q1", "q2"])
15. String Operations
# Pandas
df[df["name"].str.contains("John")]
# Polars
df.filter(pl.col("name").str.contains("John"))
16. Rename Columns
# Pandas
df.rename(columns={"old": "new"})
# Polars
df.rename({"old": "new"})
17. Apply a Custom Function
# Pandas
df["double"] = df["x"].apply(lambda x: x * 2)
# Polars
df = df.with_columns((pl.col("x") * 2).alias("double"))
# ⚠️ Polars avoids row-wise apply — vectorized ops are preferred!
18. Rolling Window Aggregations
# Pandas
df["rolling_avg"] = df["sales"].rolling(7).mean()
# Polars
df = df.with_columns(
    pl.col("sales").rolling_mean(window_size=7).alias("rolling_avg")
)
19. Exporting to CSV
# Pandas
df.to_csv("out.csv")
# Polars
df.write_csv("out.csv")
20. Lazy Evaluation for Pipelines
# Pandas
df = pd.read_csv("data.csv")
df = df[df["sales"] > 1000]
df["total"] = df["sales"] * df["qty"]
result = df.groupby("region")["total"].sum()
# Polars (Lazy Mode)
result = (
    pl.scan_csv("data.csv")    # scan_csv builds a lazy query; nothing is read yet
    .filter(pl.col("sales") > 1000)
    .with_columns((pl.col("sales") * pl.col("qty")).alias("total"))
    .group_by("region")
    .agg(pl.col("total").sum())
    .collect()
)
💡 Tips for Migrating to Polars
Avoid apply() and row-wise logic — use vectorized expressions
Use pl.col(...) to reference columns inside transformations
Switch to .lazy() for multi-step processing — it’s much faster on large data
Use .alias() to rename or assign new columns in expressions
Chain methods fluently — Polars is built for that
🚀 Summary
Polars is a modern, fast, and expressive library that’s ideal for:
✅ Large datasets
✅ Multi-threaded workloads
✅ Fast filtering, grouping, and reshaping
✅ Functional pipelines with .lazy()
It may not fully replace Pandas (yet), but it will supercharge your workflows if you give it a shot.