AWS

Hive vs Iceberg Tables in AWS Athena

Cedric Raeymaeckers and Sofie Theys on 10 May 2024

Choosing the Best Option for Your Data Pipelines with dbt

How to build a cost-effective and robust streaming data pipeline

Pieter Coremans on 26 February 2024

Envision a situation where you're tasked with managing clickstream data received via Snowplow. In this blog post, we'll guide you through our solution, step by step.

aws modern data platform

Spark your Infrastructure: Terraform to deploy AWS Glue Pyspark job

Stefanie Turelinckx on 17 April 2023

Tired of manually provisioning and managing your infrastructure? Then it's time to adopt best practices and treat your infrastructure as code. In this blog post, we'll dive into the world of Infrastructure as Code (IaC) using one of the most popular tools available: Terraform.

By the end of this post, you'll have a better understanding of how to use Terraform to deploy your AWS Glue PySpark jobs, giving you a more automated and scalable infrastructure. So, let's get started and spark your infrastructure!

aws modern data platform

How to query your S3 Data Lake using Athena within an AWS Glue Python shell job

Cedric Raeymaeckers on 16 November 2022

AWS Glue, the serverless ETL service of AWS, supports two types of jobs: Spark and Python shell. In this article, we'll focus on Python shell jobs and explain how you can make optimal use of your S3 Data Lake using Athena within Python shell jobs.
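To give a flavour of what querying Athena from a Python shell job involves, here is a minimal sketch. It is not the code from the article: the function name `run_athena_query` and its parameters are hypothetical, and it assumes you pass in a client created with `boto3.client("athena")`. It starts a query, polls until Athena reports a terminal state, and returns the first page of results as dictionaries.

```python
import time


def run_athena_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Run a query through an Athena client and return the rows as dicts.

    `athena` is assumed to be a boto3 Athena client (hypothetical wiring);
    `output_location` is an S3 URI where Athena writes its result files.
    """
    # Start the query. Athena runs it asynchronously and returns an ID.
    execution_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        response = athena.get_query_execution(QueryExecutionId=execution_id)
        status = response["QueryExecution"]["Status"]
        if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if status["State"] != "SUCCEEDED":
        reason = status.get("StateChangeReason", "no reason given")
        raise RuntimeError(f"Athena query {status['State']}: {reason}")

    # Only the first page of results is read here; larger result sets
    # would need pagination via NextToken.
    results = athena.get_query_results(QueryExecutionId=execution_id)
    rows = results["ResultSet"]["Rows"]
    if not rows:
        return []

    # For SELECT queries the first row holds the column names.
    header = [cell.get("VarCharValue") for cell in rows[0]["Data"]]
    return [
        dict(zip(header, (cell.get("VarCharValue") for cell in row["Data"])))
        for row in rows[1:]
    ]
```

Inside a Glue Python shell job this could be called as `run_athena_query(boto3.client("athena"), "SELECT ...", "my_database", "s3://my-bucket/athena-results/")`, where the database and bucket names are placeholders. Passing the client in as an argument keeps the function easy to test without AWS access.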
