Extract, Transform, Load (ETL) and Extract, Load, Transform (ELT) are two fundamental concepts in the context of data processing, used to describe data ingestion and transformation design paradigms. While these terms are often used interchangeably, they refer to slightly different concepts, suit different use cases, and impose different pipeline designs.
In this article, we will explore the differences and similarities of both ETL and ELT and discuss how the landscape in cloud computing and data engineering has affected data processing design patterns. Furthermore, we will outline the main advantages and disadvantages both have to offer in modern data teams. Lastly, we will discuss Streaming ETL, an emerging data-processing pattern that aims to solve various disadvantages of more traditional batch approaches.
Ingesting and persisting data from external sources into a destination system involves three distinct steps.
Extract
The ‘Extract’ step involves all processes required in order to pull data from a source system. Such sources include Application Programming Interfaces (APIs), database systems, files, and Internet of Things (IoT) devices, while the data itself can take any form: structured, semi-structured or unstructured. Data pulled during this step are usually referred to as ‘raw data’.
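As a minimal sketch of the Extract step, the snippet below pulls raw records from a newline-delimited JSON file. The file here stands in for an external source system (in a real pipeline this would be an API client, a database cursor, or an IoT feed), and the `extract` function name is our own illustrative choice, not a standard API.

```python
import json
import os
import tempfile

def extract(path):
    """Pull raw records from a newline-delimited JSON file, untouched."""
    with open(path) as f:
        return [json.loads(line) for line in f]

# Simulate a source system by writing two raw records to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
    f.write('{"id": 1, "country": "United States"}\n')
    f.write('{"id": 2, "country": "Germany"}\n')
    source_path = f.name

raw_data = extract(source_path)
os.unlink(source_path)
print(len(raw_data))  # 2 raw records extracted
```

Note that the extract step deliberately does not modify the records: whatever the source emits is what lands downstream, which is exactly what makes this data ‘raw’.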
Transform
During the ‘Transform’ step, the pipeline applies transformations on top of the raw data in order to achieve a certain goal. This goal is usually related to business or technical requirements. Some commonly applied transformations include data modification (e.g. mapping `United States` to `US`), record or attribute selection, joins with other data sources or even data validations.
Load
During the ‘Load’ step, the data (either raw or transformed) are loaded into a destination system. Usually, the destination is an OLAP system (i.e. a Data Warehouse or…