TL;DR: Whether we extract, load, and transform (ELT) data or extract, transform, and load (ETL) it is a real business decision today. Small businesses that don’t expect large data influxes or heavy analysis may be best served by the traditional ETL form of data handling. Yet the ELT vs ETL discussion also shows how larger companies aiming at competitive business intelligence can profit from an ELT model today. 

One of the big questions in business intelligence is the ideal order for data extraction, loading, and transformation. And even though the ELT vs ETL question may sound simple, the order in which we handle data carries real technological weight for any business, including businesses that handle data under a hybrid structure. 

In this article, we explore the difference between ELT and ETL to show when each order’s benefits come into play. 

What are we talking about when we compare ELT vs ETL?

In both cases, we’re speaking about ways to store and process information that comes from external sources. The acronyms stand for Extract, Load, and Transform versus Extract, Transform, and Load. 

One key point is that the general objective of both models is the same: we handle data for the same reasons in both scenarios. 

What changes is the order of the steps in that data-handling process. 

Extract, transform, and load: the traditional way

In this first configuration, we transform our data before we load it. This has been the traditional way to handle data for the past 20 years. The plus side of this order is that information conforms to the schema we give it before it ever lands in storage. 

What’s not so great about it is that we need to transform data on the way in, which requires plenty of resources. When data arrives in large volumes, this gets complicated: a business’s processing capacity simply has to keep growing, which is exactly how and why ELT came around. 
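To make the order concrete, here’s a minimal ETL sketch in Python, with SQLite standing in for the warehouse. The file, table, and column names are illustrative assumptions, not a prescription; the point is simply that anything that doesn’t fit the schema is handled before the load.

```python
# Minimal ETL sketch: the transform happens in memory, *before* the load.
# "orders.csv", the "orders" table, and its columns are illustrative assumptions.
import csv
import sqlite3

def extract(path):
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    # Enforce the target schema up front: keep only the fields we model,
    # normalize types, and drop anything that doesn't fit.
    cleaned = []
    for row in rows:
        try:
            cleaned.append((row["customer_id"], float(row["amount"]), row["date"][:10]))
        except (KeyError, ValueError):
            continue  # a row that breaks the schema never reaches storage
    return cleaned

def load(rows, db="warehouse.db"):
    con = sqlite3.connect(db)
    con.execute("CREATE TABLE IF NOT EXISTS orders "
                "(customer_id TEXT, amount REAL, order_date TEXT)")
    con.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    con.commit()
    con.close()

if __name__ == "__main__":
    load(transform(extract("orders.csv")))  # E -> T -> L
```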

Extract, load, and transform: a new form

Under this second order, we extract data from a source and load it without any alterations. The transformation of that data happens at the end of the process.

Depending on a company’s purpose, this order of data handling can make processes more efficient. It can also be quicker, safer, and more or less costly than the traditional ETL form we described above.

With the information already in a data warehouse, we can transform it whenever we need to. And this pattern is very much in vogue right now. 
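Here’s the same idea flipped into a minimal ELT sketch, again in Python with SQLite standing in for the warehouse (and assuming its bundled JSON functions are available): raw events are written untouched, and the shaping happens later, as a query.

```python
# Minimal ELT sketch: load raw records untouched, transform later with a query.
# Table names, fields, and the sample events are illustrative assumptions;
# json_extract requires SQLite's JSON functions to be available.
import json
import sqlite3

con = sqlite3.connect("lake.db")

# Load: write every event exactly as it arrives, as a raw JSON blob.
con.execute("CREATE TABLE IF NOT EXISTS raw_events (payload TEXT)")
events = [
    {"customer_id": "c-1", "amount": "19.90", "type": "purchase"},
    {"customer_id": "c-2", "amount": "5.00", "type": "refund"},
]
con.executemany("INSERT INTO raw_events VALUES (?)",
                [(json.dumps(e),) for e in events])
con.commit()

# Transform: only when a question comes up do we shape the raw data.
totals = con.execute("""
    SELECT json_extract(payload, '$.customer_id') AS customer,
           SUM(CAST(json_extract(payload, '$.amount') AS REAL)) AS total
    FROM raw_events
    WHERE json_extract(payload, '$.type') = 'purchase'
    GROUP BY customer
""").fetchall()
print(totals)
```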

But company size matters

ETL is quick and easy to implement, which makes it especially useful for small or medium-sized businesses. For companies that start off with neatly ordered data, the process is simple to handle. Yet, as data volumes grow, it can slip out of their hands: they can no longer write all that data, because the volumes are so large there simply isn’t enough processing capacity. 

ELT, on the other hand, is harder to set up and can take longer, but it gives better results in the long run. 

An example I like to use to picture this dilemma is building a large storage room for everything you receive, versus keeping a small storage room that tons of information must somehow be squeezed into. The trick is knowing when to build which. 

Think of Rappi in Bogotá, Colombia, for example. The delivery service company once mentioned it never stored the initial data it got from consumers. Had it stored all of that raw data, it would now have a data lake on which to apply business intelligence, comparing customer preferences for every year since its launch. 

When a business is small, you know its data architecture and don’t really expect it to grow. If a business is big, or we know it will grow quickly, ETL won’t work: sooner rather than later, it will run out of capacity. 

With a larger storage room, you can choose what you organize, where, and how. 

And only transform data that matters

Whenever we transform, we can do so only on the data that matters to us for some reason. And that reason can be anything we deem important or anything we want to check. 

What’s valuable here is keeping the option open to reshape whatever data interests us later. When we transform data before we load it, we can’t go back and draw new analyses or conclusions from what’s already been loaded: the transformation isn’t retroactive. 

Having all of a business’s raw data in reliable storage works wonders for high-level business intelligence, however. Once you decide to go digging for leads or information, you can ask any question that interests you – and get answers. And it’s all written quickly, too, because we’re not inspecting the data as it arrives; we’re simply storing it all, very swiftly, at highly competitive speeds and rates. 
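A small sketch of that retroactivity, building on the hypothetical raw_events table from the ELT example above: a question nobody planned for at load time can still be answered with a new query, because nothing was thrown away.

```python
# "Retroactive" analysis sketch, reusing the hypothetical raw_events table
# from the ELT example: a brand-new question, answered long after load time,
# because the raw data was never thrown away.
import sqlite3

con = sqlite3.connect("lake.db")
con.execute("""
    CREATE TABLE IF NOT EXISTS refunds_by_customer AS
    SELECT json_extract(payload, '$.customer_id') AS customer,
           COUNT(*) AS refund_count
    FROM raw_events
    WHERE json_extract(payload, '$.type') = 'refund'
    GROUP BY customer
""")
print(con.execute("SELECT * FROM refunds_by_customer").fetchall())
```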

Cloud computing can certainly lend a hand 

It used to be that companies relied on on-premise solutions with their own servers, upgrading their hardware over time for more capacity. 

With cloud computing, however, companies can simply pay for services. Working in the cloud is much less costly because you only pay for the instances in which you process data, which can happen when volumes are low or at the most convenient times of day, for shorter spans. 

Processed this way, information may come in much bigger volumes, but companies no longer have to worry about that when transforming their data. Businesses can simply rent storage and processing, and then decide how to interact with those data banks. The focus can move to extracting real business intelligence out of the data you have. 

Cloud computing can also help end users link diverse data sources and bring that information into a given platform. But that requires transformation before loading. 

These data sets in the cloud are, furthermore, unstructured. They’re not SQL tables but raw material. And the rawer and less conditioned the data is, the better. This is what Google and Facebook do, for example. 

At Blankfactor, we work 100% in the cloud. On the ELT vs ETL dilemma, we can say we use the best of both worlds. We tend to focus more on ETL processes than ELT, but we aim to abide, at all times, by whatever regulations and rules clients need or want in order to keep their transactions and data secure. Within a single project, we can work with both ETL and ELT orders. 

Consider all security risks, as well

On the note of security, ELT’s particular challenge is guaranteeing data security, since the process carries risks of its own. That’s because, under this model, data is simply written as it arrives; it isn’t processed first. 

Depending on the industry, data may need to be transformed before it can be stored, which is why many banks prefer ETL. Banking institutions typically run small transformations on incoming information, or choose not to store certain sensitive data at all, to guarantee that what they store complies with legal policies and standards – especially given the regulatory regimes in different parts of the globe. 
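A tiny sketch of what that compliance-driven transform can look like, assuming for illustration that card numbers and national IDs are among the fields that must not be stored in full: the sensitive values are masked or dropped before the load, so the warehouse never sees them.

```python
# Compliance-style transform sketch: sensitive fields are dropped or masked
# *before* the load, so the warehouse never receives them. The field names
# ("ssn", "card_number") are illustrative assumptions.
def sanitize(record):
    safe = dict(record)
    safe.pop("ssn", None)           # never store this field at all
    card = safe.pop("card_number", "")
    safe["card_last4"] = card[-4:]  # keep only the last four digits
    return safe

raw = {"client_id": "c-7", "card_number": "4111111111111111",
       "ssn": "123-45-6789", "amount": 80.0}
print(sanitize(raw))
# {'client_id': 'c-7', 'amount': 80.0, 'card_last4': '1111'}
```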

Facebook, on the other hand, simply writes every piece of information it gets. It then relies on engineers to seek out the information that interests them and write it elsewhere. No one stopped first to decide how it should be regulated or organized, or what to do with it. Today, that unlimited access to new information is a great technological advantage. 

The truth is that the latest tech is all ELT. Businesses now store everything they get, even if they’re not using it, because whenever they need something, they can simply rent the tools to pull the data they want out of that stored bank. It’s also the least expensive option nowadays. 

An expanded banking example

Right now, our banks keep a record of every transaction we make. Yet there are other kinds of data they hold that barely ever change. If we build a transactions table that also carries the relevant client information, every transaction keeps a record of what happened in that client’s account. So, even if a client later removes their data, the transactions table still preserves it. For that to work, though, I need to have transformed the data before writing it. 
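Here’s a minimal sketch of that idea, with invented field names: client details are copied into each transaction row at write time – an ETL-style transform before the load – so the history survives even if the client record later changes or disappears.

```python
# Sketch of transforming before writing: each transaction row carries a
# snapshot of the (invented) client fields, so the record of what happened
# survives even if the client data later changes or is removed.
import sqlite3

con = sqlite3.connect("bank.db")
con.execute("""CREATE TABLE IF NOT EXISTS transactions (
    tx_id TEXT, amount REAL, client_id TEXT,
    client_name TEXT, client_branch TEXT)""")

clients = {"c-42": {"name": "Ana Ruiz", "branch": "Bogotá"}}

def record_transaction(tx_id, amount, client_id):
    client = clients[client_id]  # transform: snapshot the slowly-changing data
    con.execute("INSERT INTO transactions VALUES (?, ?, ?, ?, ?)",
                (tx_id, amount, client_id, client["name"], client["branch"]))
    con.commit()

record_transaction("t-001", 250.0, "c-42")
del clients["c-42"]  # the client record goes away...
print(con.execute("SELECT * FROM transactions").fetchall())  # ...the history does not
```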

In banking, we also commonly need to integrate with systems that are 40 or 50 years old, including some written in COBOL back in the late 1980s. Some user transactions today are still being written in COBOL, in fact. In those cases, we have to transform the data before loading it, as ELT wouldn’t work. But even in these circumstances, you can move banking user transactions through a portal, for example, or set up a user interaction with a web or mobile app that uses only non-sensitive information. Doing so means applying business intelligence to data that hasn’t gone through any processing – and the best part is that it won’t affect response times. 

[CTA] Find out more about data engineering.

Was this article insightful? Then don’t forget to look at other blog posts and follow us on LinkedIn, Twitter, Facebook, and Instagram.