Hive vs Iceberg Tables in AWS Athena
Choosing the Best Option for Your Data Pipelines with dbt
Imagine you're tasked with managing clickstream data collected with Snowplow. In this blog post, we'll guide you through our solution, step by step.
Tired of manually provisioning and managing your infrastructure? Then it's time to adopt best practices and treat your infrastructure as code. In this blog post, we'll dive into the world of Infrastructure as Code (IaC) using one of the most popular tools available: Terraform.
By the end of this post, you'll have a better understanding of how to use Terraform to deploy your AWS Glue PySpark jobs, giving you a more automated and scalable infrastructure. So, let's get started and spark your infrastructure!
AWS Glue, the serverless ETL service of AWS, supports several job types, including Spark and Python shell jobs. In this article, we'll focus on Python shell jobs and explain how you can make the most of your S3 data lake by querying it with Athena from within a Python shell job.
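To give a flavour of what querying Athena from a Python shell job involves, here is a minimal sketch using the boto3 Athena client calls `start_query_execution`, `get_query_execution` and `get_query_results`. It is not necessarily the approach the article takes. The helper name `run_athena_query`, its polling and timeout parameters, and the database and S3 output location in the usage note are hypothetical. The client is passed in as an argument so the function can be exercised without AWS credentials.

```python
import time


def run_athena_query(athena_client, query, database, output_location,
                     poll_seconds=1.0, timeout_seconds=300.0):
    """Run `query` with Athena and return every result row as a list of strings.

    `athena_client` is assumed to be a boto3 Athena client, e.g. boto3.client("athena").
    This helper and its parameters are hypothetical, written for illustration only.
    """
    start = athena_client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = start["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a terminal state is reached.
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = athena_client.get_query_execution(
            QueryExecutionId=query_id)["QueryExecution"]["Status"]
        state = status["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            reason = status.get("StateChangeReason", "")
            raise RuntimeError(f"Athena query {query_id} {state}: {reason}")
        if time.monotonic() > deadline:
            raise TimeoutError(f"Athena query {query_id} still {state}")
        time.sleep(poll_seconds)

    # Page through the results using NextToken. For SELECT queries,
    # the first row contains the column names. NULL values come back as None.
    rows = []
    kwargs = {"QueryExecutionId": query_id}
    while True:
        page = athena_client.get_query_results(**kwargs)
        for row in page["ResultSet"]["Rows"]:
            rows.append([col.get("VarCharValue") for col in row["Data"]])
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
    return rows
```

Inside a job, a call might look like `run_athena_query(boto3.client("athena"), "SELECT ...", "my_database", "s3://my-athena-results/")`, where both names are placeholders. Results come back as strings because that is how `get_query_results` returns them. For larger result sets, reading the CSV file that Athena writes to the output location is often more practical than paging through the API.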