
Big Data workflows

Introduction to Big Data workflows: ML projects often require processing datasets that are too large for conventional single-machine tools, commonly referred to as Big Data.

This is demonstrated using Spark, a powerful Big Data processing engine that makes it possible to work with large datasets in ML projects. The introduction covers setting up a Spark environment, reading and writing data, and performing transformations and aggregations on data.

Recommended DataCamp exercises

PySpark Documentation
Polars Documentation