Exercise 2: Performing a Big Data workflow with Spark and Polars

In this session, we will use Apache Spark, a widely used Big Data processing engine, to process large datasets in ML projects. The session covers setting up a Spark environment, reading and writing data, and performing transformations and aggregations. We will also introduce Polars, a fast DataFrame library written in Rust with Python bindings, and compare its features to those of Spark.
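As a preview of the kind of Spark workflow the notebook walks through, here is a minimal sketch in PySpark. The file paths and column names ("data/events.csv", "status", "timestamp", "user_id") are placeholders, not part of the exercise data.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Set up a local Spark environment (single JVM, all available cores).
spark = SparkSession.builder.appName("exercise-2").master("local[*]").getOrCreate()

# Read a CSV file into a DataFrame, letting Spark infer the schema.
events = spark.read.csv("data/events.csv", header=True, inferSchema=True)

# Transform: keep valid rows and derive a date column from a timestamp.
valid = (
    events
    .filter(F.col("status") == "ok")
    .withColumn("day", F.to_date("timestamp"))
)

# Aggregate: count events per user per day.
daily_counts = valid.groupBy("user_id", "day").agg(F.count("*").alias("n_events"))

# Write the result back out as Parquet.
daily_counts.write.mode("overwrite").parquet("output/daily_counts")

spark.stop()
```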

The session provides hands-on exercises to reinforce your understanding of, and skills in, working with Spark and Polars for processing big data in ML projects.
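For comparison, the same pipeline can be sketched with Polars' lazy API (syntax shown for recent Polars releases); the column names and paths again mirror the hypothetical Spark example above.

```python
import polars as pl

daily_counts = (
    pl.scan_csv("data/events.csv")       # lazy scan: nothing is read yet
    .filter(pl.col("status") == "ok")
    .with_columns(
        pl.col("timestamp").str.to_datetime().dt.date().alias("day")
    )
    .group_by("user_id", "day")
    .agg(pl.len().alias("n_events"))
    .collect()                           # execute the optimized query plan
)

daily_counts.write_parquet("output/daily_counts.parquet")
```

Both engines build a query plan and optimize it before execution; the main practical difference is that Spark distributes work across a cluster, while Polars runs on a single machine and leans on multithreading and lazy evaluation.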

Notebooks

Polars

Spark