🧊 The Ultimate Polars Cheat Sheet for Python Data Pros
Everything you need to know to get started — and go pro — with Polars, the lightning-fast DataFrame library built in Rust.
📦 Getting Started
import polars as pl
Read from CSV / Create from dict
df = pl.read_csv("data.csv")
df = pl.DataFrame({
    "name": ["Alice", "Bob"],
    "age": [25, 30],
    "city": ["NY", "SF"]
})
📊 Basic Exploration
df.head()
df.tail(10)
df.shape
df.columns
df.dtypes
df.describe()
df.null_count()
🔍 Filtering Rows
df.filter(pl.col("age") > 25)
df.filter(
    (pl.col("age") > 25) &
    (pl.col("city") == "NY")
)
➕ Add / Update Columns
# Simple column
df.with_columns((pl.col("sales") * pl.col("units")).alias("total"))
# Conditional logic
df.with_columns(
    pl.when(pl.col("age") >= 30)
    .then(pl.lit("Senior"))    # wrap in pl.lit(): bare strings are treated as column names
    .otherwise(pl.lit("Junior"))
    .alias("category")
)
# Literal value
df.with_columns(pl.lit("USA").alias("country"))
✂️ Selecting Columns
df.select(["name", "age"])
# Regex
df.select(pl.col("^a.*$")) # columns that start with "a" (regex must be anchored with ^ and $)
# All columns except
df.select([pl.exclude("name")])
📚 Sorting & Ranking
df.sort("age")
df.sort("sales", descending=True)
# Top N rows by value
df.sort("sales", descending=True).head(10)
🔁 Groupby & Aggregations
df.group_by("city").agg([
    pl.col("sales").sum().alias("total_sales"),
    pl.col("sales").mean().alias("avg_sales")
])
🔗 Joins
df1.join(df2, on="id", how="inner")
df1.join(df2, on="id", how="left")
🔄 Reshaping
# Melt
df.melt(id_vars="city", value_vars=["sales", "units"])
# Transpose (swap rows and columns)
df.transpose()
🧮 Pivot Tables
df.pivot(
    values="sales",
    index="date",
    on="region",
    aggregate_function="sum"
)
⏳ Working with Dates & Times
df = df.with_columns(pl.col("timestamp").cast(pl.Datetime))
df.with_columns([
    pl.col("timestamp").dt.year().alias("year"),
    pl.col("timestamp").dt.month().alias("month"),
    pl.col("timestamp").dt.weekday().alias("weekday")
])
Create a date range (pl.date_range takes date/datetime objects, not strings; eager=True returns a Series):
from datetime import date
pl.date_range(date(2023, 1, 1), date(2023, 1, 10), interval="1d", eager=True)
🧹 Missing Data
df.null_count()
df.drop_nulls()
df.fill_null(strategy="forward")
df.fill_null(strategy="mean")
⚙️ Lazy Evaluation
lazy_df = (
    df.lazy()
    .filter(pl.col("sales") > 1000)
    .with_columns((pl.col("sales") * pl.col("units")).alias("total"))
    .group_by("region")
    .agg(pl.col("total").sum())
)
result = lazy_df.collect()
✅ Lazy mode delays execution and optimizes the full pipeline before running — perfect for large data.
🧠 Expressions & Functions
pl.col("sales") # Select a column
pl.lit(1) # Literal
pl.exclude("id") # All columns except...
# String methods
pl.col("name").str.contains("a")
pl.col("name").str.to_lowercase()
# List operations
pl.col("tags").list.len()  # length of each list (the .arr namespace is for fixed-size arrays)
📤 Exporting Data
df.write_csv("out.csv")
df.write_parquet("out.parquet")
📊 Performance & Memory
df.estimated_size("mb") # Approximate memory footprint
lazy_df.profile() # Lazy execution performance stats
🔁 Pandas to Polars: 20 Common Translations with Code Examples
Whether you're migrating a project or just exploring Polars, this guide will show you how to convert your Pandas muscle memory into fast, expressive Polars code.
📌 Imports & Setup
import pandas as pd
import polars as pl
Let’s assume df in Pandas becomes df_pl in Polars (you can also just name it df in both cases).
1. Reading CSV
# Pandas
df = pd.read_csv("data.csv")
# Polars
df = pl.read_csv("data.csv")
2. Viewing the Data
# Pandas
df.head()
# Polars
df.head()
3. Column Selection
# Pandas
df[["name", "age"]]
# Polars
df.select(["name", "age"])
4. Filtering Rows
# Pandas
df[df["age"] > 30]
# Polars
df.filter(pl.col("age") > 30)
5. Chained Filters
# Pandas
df[(df["age"] > 30) & (df["city"] == "NY")]
# Polars
df.filter((pl.col("age") > 30) & (pl.col("city") == "NY"))
6. Adding a New Column
# Pandas
df["total"] = df["price"] * df["qty"]
# Polars
df = df.with_columns((pl.col("price") * pl.col("qty")).alias("total"))
7. Conditional Columns
# Pandas
import numpy as np
df["level"] = np.where(df["score"] > 80, "High", "Low")
# Polars
df = df.with_columns(
    pl.when(pl.col("score") > 80)
    .then(pl.lit("High"))    # wrap in pl.lit(): bare strings are treated as column names
    .otherwise(pl.lit("Low"))
    .alias("level")
)
8. Groupby Aggregation
# Pandas
df.groupby("dept")["salary"].mean()
# Polars
df.group_by("dept").agg(pl.col("salary").mean())
9. Sorting
# Pandas
df.sort_values(by="score", ascending=False)
# Polars
df.sort("score", descending=True)
10. Merging (Joins)
# Pandas
pd.merge(df1, df2, on="id", how="left")
# Polars
df1.join(df2, on="id", how="left")
11. Handling Missing Values
# Pandas
df.dropna()
df.fillna(0)
# Polars
df.drop_nulls()
df.fill_null(0)
12. Datetime Operations
# Pandas
df["date"].dt.year
# Polars
df.with_columns(pl.col("date").dt.year().alias("year"))
13. Pivot Table
# Pandas
df.pivot_table(index="region", columns="month", values="sales", aggfunc="sum")
# Polars
df.pivot(
    values="sales",
    index="region",
    on="month",
    aggregate_function="sum"
)
14. Melting (Unpivoting)
# Pandas
df.melt(id_vars="region", value_vars=["q1", "q2"])
# Polars
df.melt(id_vars="region", value_vars=["q1", "q2"])
15. String Operations
# Pandas
df[df["name"].str.contains("John")]
# Polars
df.filter(pl.col("name").str.contains("John"))
16. Rename Columns
# Pandas
df.rename(columns={"old": "new"})
# Polars
df.rename({"old": "new"})
17. Apply a Custom Function
# Pandas
df["double"] = df["x"].apply(lambda x: x * 2)
# Polars
df = df.with_columns((pl.col("x") * 2).alias("double"))
# ⚠️ Polars avoids row-wise apply — vectorized ops are preferred!
18. Rolling Window Aggregations
# Pandas
df["rolling_avg"] = df["sales"].rolling(7).mean()
# Polars
df = df.with_columns(
    pl.col("sales").rolling_mean(window_size=7).alias("rolling_avg")
)
19. Exporting to CSV
# Pandas
df.to_csv("out.csv")
# Polars
df.write_csv("out.csv")
20. Lazy Evaluation for Pipelines
# Pandas
df = pd.read_csv("data.csv")
df = df[df["sales"] > 1000]
df["total"] = df["sales"] * df["qty"]
result = df.groupby("region")["total"].sum()
# Polars (Lazy Mode)
result = (
    pl.scan_csv("data.csv")    # scan_csv builds a lazy query; nothing is read yet
    .filter(pl.col("sales") > 1000)
    .with_columns((pl.col("sales") * pl.col("qty")).alias("total"))
    .group_by("region")
    .agg(pl.col("total").sum())
    .collect()
)
💡 Tips for Migrating to Polars
Avoid apply() and row-wise logic — use vectorized expressions
Use pl.col(...) to reference columns inside transformations
Switch to .lazy() for multi-step processing — it’s much faster on large data
Use .alias() to rename or assign new columns in expressions
Chain methods fluently — Polars is built for that
🚀 Summary
Polars is a modern, fast, and expressive library that’s ideal for:
✅ Large datasets
✅ Multi-threaded workloads
✅ Fast filtering, grouping, and reshaping
✅ Functional pipelines with .lazy()
It may not fully replace Pandas (yet), but it will supercharge your workflows if you give it a shot.