Exercise 2: Performing a Big Data workflow with Spark and Polars

In this session, we will use Apache Spark, a widely used Big Data processing engine, to process large datasets in ML projects. The session covers setting up a Spark environment, reading and writing data, and performing transformations and aggregations. We will also introduce Polars, a fast DataFrame library written in Rust with Python bindings, and compare its features to those of Spark.
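As a preview of the kind of Spark workflow the notebook walks through, here is a minimal sketch in PySpark. The file paths and column names ("data/events.csv", "status", "timestamp", "user_id") are placeholders, not part of the exercise data.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Set up a local Spark environment (single JVM, all available cores).
spark = SparkSession.builder.appName("exercise-2").master("local[*]").getOrCreate()

# Read a CSV file into a DataFrame, letting Spark infer the schema.
events = spark.read.csv("data/events.csv", header=True, inferSchema=True)

# Transform: keep valid rows and derive a date column from a timestamp.
valid = (
    events
    .filter(F.col("status") == "ok")
    .withColumn("day", F.to_date("timestamp"))
)

# Aggregate: count events per user per day.
daily_counts = valid.groupBy("user_id", "day").agg(F.count("*").alias("n_events"))

# Write the result back out as Parquet.
daily_counts.write.mode("overwrite").parquet("output/daily_counts")

spark.stop()
```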

The session provides hands-on exercises to reinforce your understanding of, and skills in, working with Spark and Polars for processing big data in ML projects.
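For comparison, the same pipeline can be sketched with Polars' lazy API (syntax shown for recent Polars releases); the column names and paths again mirror the hypothetical Spark example above.

```python
import polars as pl

daily_counts = (
    pl.scan_csv("data/events.csv")       # lazy scan: nothing is read yet
    .filter(pl.col("status") == "ok")
    .with_columns(
        pl.col("timestamp").str.to_datetime().dt.date().alias("day")
    )
    .group_by("user_id", "day")
    .agg(pl.len().alias("n_events"))
    .collect()                           # execute the optimized query plan
)

daily_counts.write_parquet("output/daily_counts.parquet")
```

Both engines build a query plan and optimize it before execution; the main practical difference is that Spark distributes work across a cluster, while Polars runs on a single machine and leans on multithreading and lazy evaluation.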

Notebooks

Polars

Spark