ETL (Extract-Transform-Load) and ELT (Extract-Load-Transform) are two terms commonly used in the realm of Data Engineering and more specifically in the context of data ingestion and transformation.
While these terms are often used interchangeably, they refer to slightly different concepts and have different implications for the design of a data pipeline.
In this post, we will clarify the definitions of ETL and ELT processes, outline the differences between the two, and discuss the advantages and disadvantages both have to offer to engineers and data teams in general.
And most importantly, I am going to describe how recent changes in the way modern data teams are structured have reshaped the landscape of the ETL vs ELT debate.
The main point of contention when comparing ETL and ELT is obviously the sequence in which the Extract, Load and Transform steps are executed within a data pipeline.
For now, let’s set this execution sequence aside and focus on the terminology itself, discussing what each individual step is supposed to do.
Extract: This step refers to the process of pulling data from a persistent source. This source could be a database, an API endpoint, a file, or really anything that contains data in any form, structured or unstructured.
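As a minimal sketch of the Extract step, consider pulling raw records out of a database. Everything here is hypothetical for illustration (an in-memory SQLite database stands in for any production database, API, or file source); the key point is that extraction only pulls the data out, without reshaping it.

```python
import sqlite3

def extract(conn: sqlite3.Connection) -> list[tuple]:
    # Extract: pull every raw record from the source as-is.
    # No cleaning or reshaping happens at this stage.
    return conn.execute("SELECT id, country FROM users").fetchall()

# Hypothetical source data, standing in for a real persistent source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "United Kingdom"), (2, "France")],
)

raw = extract(conn)
print(raw)  # [(1, 'United Kingdom'), (2, 'France')]
```

In a real pipeline the source would of course be external, but the shape of the step is the same: read, don’t touch.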
Transform: In this step, the pipeline is expected to perform changes to the structure or format of the data in order to achieve a certain goal. A transformation could be an attribute selection, a modification of records (e.g. transforming 'United Kingdom' into 'UK'), a data validation, a join to another source, or really anything that changes the format of the raw input data.
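A minimal sketch of the Transform step, using the post’s own example: normalise country names (e.g. 'United Kingdom' into 'UK'), select only the attributes we care about, and drop records that fail a simple validation. The field names and mapping are illustrative assumptions, not part of any real schema.

```python
# Hypothetical lookup table for the country-name normalisation.
COUNTRY_CODES = {"United Kingdom": "UK", "United States": "US"}

def transform(records: list[dict]) -> list[dict]:
    out = []
    for rec in records:
        # Data validation: skip records with a missing email.
        if not rec.get("email"):
            continue
        # Attribute selection (drop unused fields) plus record
        # modification ('United Kingdom' -> 'UK').
        out.append({
            "email": rec["email"],
            "country": COUNTRY_CODES.get(rec["country"], rec["country"]),
        })
    return out

rows = [
    {"email": "a@example.com", "country": "United Kingdom", "age": 42},
    {"email": None, "country": "France", "age": 30},
]
clean = transform(rows)
print(clean)  # [{'email': 'a@example.com', 'country': 'UK'}]
```

Whether this logic runs before or after loading is exactly what separates ETL from ELT; the operations themselves look the same either way.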
Load: The load step refers to the process of copying the data (either the raw or the transformed version) into the target system…