Getting Started with Polars

Updated March 17, 2026 · intermediate

Polars is a high-performance DataFrame library written in Rust and exposed to Python. If you have used pandas, Polars will feel familiar, but it is typically much faster on large data and is built around an expression-based API that keeps common operations concise and easy to chain.

Why Polars?

Polars was built from the ground up for speed. Unlike pandas, which is built on NumPy and runs most operations on a single thread, Polars uses:

  • Rust for the core computation engine
  • Apache Arrow for memory-efficient columnar data representation
  • Parallel execution by default across all available CPU cores

In many published benchmarks, Polars outperforms pandas by roughly 5-10x on typical DataFrame operations, and the gap tends to widen as datasets grow, although actual gains depend on the workload. But performance is not the only reason to switch:

  • Eager and lazy modes: lazy mode adds a query optimizer that can dramatically speed up complex pipelines
  • Cleaner API: method chaining with expressions reduces intermediate variables and row-by-row loops
  • Stricter type handling: Polars is strict about data types, so type errors surface earlier instead of being silently coerced
  • No GIL bottleneck: the Rust engine releases Python's Global Interpreter Lock while it computes, so work can run in parallel

Installation

Install Polars with pip:

pip install polars

Or with conda:

conda install polars -c conda-forge

Besides the standard polars package, Polars has published alternative builds such as polars-lts-cpu, intended for older CPUs that lack the newer instruction sets the default build relies on. Most users want the standard package.

Creating DataFrames

Start by importing Polars:

import polars as pl

Create a DataFrame from a dictionary:

df = pl.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "city": ["London", "Paris", "Berlin"]
})
print(df)

Basic Operations

Selecting Columns

Use select() to choose columns:

df.select(["name", "age"])

To get a single column as a Series, index with square brackets (Polars does not support pandas-style attribute access such as df.name):

df["name"]

Filtering Rows

Filter with filter():

df.filter(pl.col("age") > 28)

Adding New Columns

Use with_columns() to add or transform columns:

df.with_columns(
    (pl.col("age") + 1).alias("age_next_year"),
    (pl.col("age") * 2).alias("age_doubled")
)

Aggregations with GroupBy

Group and aggregate:

df.group_by("city").agg(
    pl.col("age").mean().alias("avg_age"),
    pl.col("name").count().alias("count")
)

Lazy vs Eager Execution

Polars has two execution modes:

  • Eager: executes immediately
  • Lazy: builds a query plan and optimizes it before execution

Switch to lazy mode with .lazy():

lazy_df = df.lazy()
result = (
    lazy_df
    .filter(pl.col("age") > 25)
    .select(["name", "age"])
    .collect()
)

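For large datasets or complex pipelines, prefer lazy mode: the optimizer can push filters down to the data source and skip columns the query never uses. This pays off most when you start from a lazy reader such as pl.scan_csv() rather than calling .lazy() on data that is already in memory.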

Practical Examples

Reading a CSV and Computing Statistics

import polars as pl

# scan_csv() returns a LazyFrame, so the file is only read at collect()
summary = (
    pl.scan_csv("sales.csv")
    .group_by("product_category")
    .agg([
        pl.col("revenue").sum().alias("total_revenue"),
        pl.col("quantity").mean().alias("avg_quantity"),
        pl.col("id").n_unique().alias("num_products")
    ])
    .sort("total_revenue", descending=True)
    .collect()
)

print(summary)

Handling Missing Data

Polars represents missing values as null, which is distinct from NaN (an ordinary floating-point value):

df = pl.DataFrame({
    "a": [1, 2, None, 4],
    "b": ["x", None, "z", "w"]
})

# Drop rows with any nulls
df.drop_nulls()

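# Fill nulls with 0 (only columns whose type matches the fill value,
# here the integer column "a", are filled; "b" keeps its null)
df.fill_null(0)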

See Also