ETL Design Pattern : E-LT-L
I was ramping up C# to create BI Framework and I hit upon the term “Design Patterns”. I decided to go through few patterns only, as I did not have patience to complete the book, also I assumed I don’t need it. Finally at the end of day, I was able to implement a couple of patterns in my BI Framework application. During this time, I wondered whether there are any pre-defined design patterns in Data Warehouses, ETL, Cubes, and Universes. My quest begins … In this blog, I will focus on ETL alone.
I started looking back at my projects to analyze if I had used any design patterns or whether I hit upon the recurring problems that could have been solved with alternate designs. What I found was quite interesting. I have been using best practices defined for each component all along. Then why were we slogging all the time. What were the problems we were facing every day? Now the right time! I decided to plunge deep in each component to find out exactly what I could have done differently during design which could have made my life easier. And that is where I came up with an ETL design pattern E-LT-L.
Before I explain what this E-LT-L means, let’s look at ETL first. ETL stands for Extraction, Transformation and Loading. In ETL we extract data from multiple data sources and transform the incoming data in a format compatible with data warehouse structure, followed by loading into data marts. Usually companies use some commercial tools like SSIS, BO Data Integrator Designer, etc… The developers then make full use of these tools and they end up using most of the functionality provided in these tools. And to an extent, this looks right too, as the companies have made an investment for this purpose.
Finally when companies, evaluate their ROI, the results are amazing. By following industry standards and using commercial ETL tools, training their development teams, the results do not look good. A new set of problems like performance problems, steep learning curves, fixing the parts that are not broken, buying additional hardware, etc. have come up. From ETL perspective, even though, most companies have a designated job server, they do not get a good performance. After a while in production, when there data marts grow in size or the size of incoming data increases, the performance of ETL job takes a hit. To resolve these issues, many things are done like increasing configuration of job server, make changes in database structure, use bulk loading options (favorite choice for techies), split the jobs, pushing few things like summarization over to weekends, etc.. This results in making the system more complex than it should be which has a direct hit on overall IT cost.
So exactly what went wrong?
This is where a new pattern E-LT-L comes into existence. Most of the recurring problems can be resolved using this design pattern.
E-LT-L stands for Extraction, Loading and Transformation, and Loading. This basically suggests that, once the data is extracted, instead of applying transformations (T) in the staging area, load this data into data mart; and then apply transformations there (LT). Since, incoming (raw) data is in data mart already, it would make more sense to use database objects (stored procedures) for transformations instead of row-based transformations available in ETL tools. Using database based transformations would resolve most of the problems.
Effectively, this pattern calls for BULK loading and transformations at correct place without moving huge amount of data around.
This may sound strange and many people would agree to debate. To prove the point, I would present few scenarios and let you decide what is good for your implementation.
Scenario 1: Let’s assume you have 1 million incoming records that need to do look up for say customers. No matter what tool you use and how much you configure it, it has to run some sql against customer master, which is typically huge in may DW installations, to get the customer code. This will become a bottleneck as the sql would be executed multiple times. Also to add to the woes, the “customer code” column value has to travel from database server to job server and is stored in the placeholder (variable) in incoming row. Instead if you code the lookup transformations, in stored procedure, with one sql you can update all the rows by simply joining the staging table with customer master table. By all counts, the performance of this sql cannot be beaten.
Scenario 2: We have incoming product master for a large retail chain. We need to implement slowly changing dimension type 2 here (insert new records and update existing records if already present. Any ETL tool would implement this transformation row by row and it can get painfully slow. This transform can be easily developed in stored procedure using MERGE sql statement. For more experienced developers this can be made meta data driven.
To summarize, the transformations done using stored procedures help as follows:
- It does not move data between database servers and job servers. We get a good boost in performance.
- The data is processed in bulk.
- Database engine (Oracle,SQL Server) is designed to do this job in the most efficient manner than any third party ETL tool can do.
- It is always easy to fix/debug the stored procedure than some transform in ETL tool.
- Small learning curve for developers to ramp up on the ETL tools.
- Deployment of ETL packages (which is surely a problem in ETL tools). It takes a lot of effort.
I am sure, many of you would be thinking then why we shall use any commercial tool for ETL at all. These tools have their own significance. We would need a commercial ETL tool to achieve other goals of ETL, such as:
- Defining a workflow where we decide upon what tasks shall execute and in what order. What to do if some task fails.
- Executing tasks in parallel.
- Even though a stored procedure is created to transform some data, it needs to be called from an appropriate place in ETL.
- Use the logging/notifications capabilities of ETL tool, which usually are very efficient and simple.
- Use the scheduling and other features readily available in commercial ETL tools
What this E-LT-L pattern suggests is the design of ETL architecture. By having good ETL tool and good design of ETL, we can write efficient and manageable jobs.
If you have any questions or comments, you can reach me at jagdishm@aditi.com.
Architecture & Design, BI - Business Intelligence, Performance














Hi could you suggest books that might help me better understand ETL patterns. I’m looking for best practises in defining formats and standards for extracting data from various ERP systems.
Hi do you have any suggested readings to help me better understand various ETL patterns ? I am attempting to define standards and a format for extracting data from 3rd party ERP’s and would like to know if you have any helpful ideas on best practices ?
@Sonny
Hi Sonny,
Ralph Kimball’s book on ETL is a good source to start with. He has explained the standards and document templates.
Hope this helps.
Jagdish Malani
Hi,
I need help designing a data migration tool. In which my source can be any form(Excel, DB, flat file, etc) for now, i am taking an excel file with few tables and columns. I want to map columns from source to columns on teh destination. How do i store this mapping in teh DB?