Azure Data Factory vs on-premise ETL tools
15 September 2020
Choosing the right ETL tool for your company is a complex task. Both Azure Data Factory and on-premise tools have their strengths and weaknesses. It's important to understand the decision criteria and nuances involved to pick the right tool for the job. In this article we'll clarify the key differences and help you make the right decision for your business.
Computing engine
A classic ETL tool like SSIS, PDI, Talend, ... is a desktop tool that requires a good-sized server for the tool to run on. Such a server requires management and monitoring to keep track of system health. This means that the size of the server needs to be determined upfront. The ETL tool also has to be maintained and upgraded when necessary. Azure Data Factory on the other hand is a cloud-based, serverless service that doesn't require any hardware or any installation. Since no server is required, no monitoring or managing is needed. Updates of Data Factory are fully managed by Microsoft Azure and are implemented automatically.
Costs
Comparing the cost of the 2 systems is an impossible job since costs depends on multiple parameters and the pricing models differ totally. When using a classic ETL tool the cost is especially determined by the price of the used server and the license cost of the ETL tool. This price can vary depending on the type of the server installed and the choice of the ETL tool.
Data Factory uses a pay-as-you go system. This means that no up-front costs are necessary and that you will only be charged for the services you use. Pricing for Data Factory is roughly based on the time needed to process the data and the performance-scaling you use.
The difficulty here is to find a perfect equilibrium between time and performance to lower the cost. A disadvantage of this pricing model is that the invoice will be less predictable than a classic environment. Every month the cost will vary depending of different parameters. However MS Azure foresees a built-in logging and monitoring service where costs are monitored automatically and collected into a user-friendly dashboard. You can create budgets to manage costs and create alerts that automatically notify anomalies and overspending risks.
Performance
In terms of performance, both classic ETL tools and Azure Data Factory have controls to help optimize runtimes, which are more or less comparable. However, in good written ETL, the answer to performance issues is often to add more resources (memory, CPU, disk I/O). This is where Azure Data Factory is more flexible, because you can easily change the performance settings with one click, and it is even possible at pipeline-level or for a single execution. By using this technique you can assign more resources to heavy-loaded pipelines which results in a better performance at lower costs.
Scalability
When using a classic ETL tool, scaling can be a bottleneck for an agile business. Classic ETL tools are not always created to handle big volumes of data and the type of server cannot always be switched easily, which will result in a higher time to market. This is one of the main reasons why cloud is so conductive: your system can be upgraded in no time to each desired level.
Azure Data Factory is not standalone
When using Data Factory, not only standard ETL-transformations are embedded, but also more advanced components are integrated such as Azure Databricks, Azure Machine Learning, HDInsight, Azure Data Lake Analytics, etc. This enables the developer to use more advanced services of Microsoft Azure within a data-pipeline. When using these components resources will be provided automatically by the trigger of the Data Factory-pipeline. This can be very useful when creating for example predictive pipelines, or to test a specific use-case. This architecture is not possible within a classic ETL tool since these components are not standardly integrated.
How can we help?
As an experienced Microsoft partner, Datashift helps companies develop new data platforms in the cloud. The specific solution we use is Azure Data Factory v2. We also help build your business case to migrate existing architecture to Data Factory & Microsoft Azure. Don't hesitate to reach out for more info!