In this discussion, I explore evolving trends in data orchestration and data modelling, highlighting how the tooling has advanced and what it offers data engineers. Airflow has been the dominant player since 2014, but the data engineering landscape has changed significantly since then and now addresses more sophisticated use cases and requirements, including support for multiple programming languages, integrations, and enhanced scalability. I will examine contemporary and perhaps unconventional tools that streamline my data engineering processes and make it easier to create, manage, and orchestrate robust, durable, and scalable data pipelines.
Over the last decade we have witnessed a “Cambrian explosion” of ETL frameworks for data extraction, transformation, and orchestration. Unsurprisingly, many of them are open source and Python-based.
The most popular ones:
- Airflow, 2014
- Luigi, 2014
- Prefect, 2018
- Temporal, 2019
- Flyte, 2020
- Dagster, 2020
- Mage, 2021
- Orchestra, 2023