Airflow — The Easy Way

“Running Airflow on AWS EC2 & RDS using docker-compose”


Hello Folks,

Let's start the year on a roll. Wishing you all a successful year of learning.

I am Kunal Shah, an AWS Certified Solutions Architect, helping clients achieve optimal solutions on the cloud. A cloud enabler by choice, with 6+ years of experience in the IT industry. I love to talk about Cloud Technology, Digital Transformation, Analytics, DevOps, Operational Efficiency, Cost Optimization, Cloud Networking & Security.

You can reach out to me @ linkedin.com/in/kunal-shah07

Abstract

Some readers reached out to me asking for an easier, development-friendly playground for setting up Airflow on AWS. So, for a quick setup of Apache Airflow, we will deploy it using docker-compose and run it on an AWS EC2 instance backed by an AWS RDS instance.

Here I am with Airflow — The Easy Way

Table Of Contents

• Introduction

• Prerequisites

• Architecture

• AWS Infrastructure Provisioning

• Airflow Provisioning

• Environment Validation

• Cleanup

Introduction -

Airflow — for an introduction to Apache Airflow itself, please check my first blog.

docker-compose — It is used to run multiple containers as a single service. For example, suppose you had an application which required NGINX and MySQL; you could create one file which would start both containers as a service, without the need to start each one separately.
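To make that concrete, here is a rough shell illustration (the image names and options are just examples, not part of this setup):

# without docker-compose - start each container by hand
docker run -d --name web nginx
docker run -d --name db -e MYSQL_ROOT_PASSWORD=example mysql

# with docker-compose - one file describes both services, one command starts them
docker-compose up -d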

The docker-compose.yaml contains several service definitions:

1. airflow-scheduler — The scheduler monitors all tasks and DAGs, then triggers the task instances once their dependencies are complete.

2. airflow-webserver — The webserver, available at localhost:8080.

3. airflow-worker — The worker that executes the tasks given by the scheduler.

4. airflow-init — The initialization service.

5. flower — The flower app for monitoring the environment, available at localhost:5555.

6. redis — The redis broker that forwards messages from the scheduler to the workers.

• Some directories in the containers are mounted, which means that their contents are synchronized between the EC2 host and the containers.

  • ./dags — you can put your DAG files here.
  • ./logs — contains logs from task execution and scheduler.
  • ./plugins — you can put your custom plugins here.

Prerequisites -

• You must have access to an AWS account with the required roles or permissions. The steps below can be run from an AWS EC2 instance (Ubuntu) in that AWS account with the necessary access permissions.

• AWS Services — Full Access to RDS, EC2, IAM, S3, VPC.

• Tools Dependencies — AWS CLI (V2), Cron, docker-compose.

Architecture -

[Architecture diagram]

AWS Infrastructure Provisioning -

• Create two S3 buckets for DAGs & Plugins from AWS Console.

• An Amazon EC2 instance with the latest Ubuntu AMI.

• Amazon RDS PostgreSQL Database

• Deploy the CloudFormation scripts from Repo

• Your AWS EC2 Instance & AWS RDS Instance are ready to use.

• Install AWS CLI version 2 & configure. docs.aws.amazon.com/cli/latest/userguide/ge..

$ aws configure

AWS Access Key ID [None]: (Your Access Key)

AWS Secret Access Key [None]: (Your Secret Key)

Default region name [None]: (Your Region)

Default output format [None]: json
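Optionally, confirm that the credentials work before moving on:

$ aws sts get-caller-identity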

• Install Ubuntu Desktop & XRDP for remote RDP.

# sudo apt-get update && sudo apt-get upgrade

# sudo apt install tasksel

# sudo tasksel install ubuntu-desktop

# reboot (you have to log in to the EC2 instance again & run the below command)

# sudo apt-get install xrdp
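Depending on the AMI, you may also need to make sure the xrdp service is enabled and running:

# sudo systemctl enable --now xrdp

# sudo systemctl status xrdp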

• Now you can either change the ubuntu user's password or create a new user (example commands below).

• This will be used for RDP authentication.
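For example, either of the below will do (the new username is just an example):

# sudo passwd ubuntu

# sudo adduser rdpuser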

• Install vim editor -> apt install vim

• Install Cron -> apt install cron

• (Optional) Install the Google Chrome browser. Run the below commands in the given order.

> # wget dl.google.com/linux/direct/google-chrome-st..

> # sudo apt install ./google-chrome-stable_current_amd64.deb

Airflow Provisioning -

• Copy the docker-compose.yaml file to the AWS EC2 instance & update the below parameters (example values follow).

> 'AIRFLOW__CORE__SQL_ALCHEMY_CONN'

> 'AIRFLOW__CELERY__RESULT_BACKEND'
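For reference, these typically take the standard SQLAlchemy / Celery result-backend URL forms pointing at the RDS endpoint; the user, password, endpoint and database name below are placeholders for your own values:

> AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:<password>@<rds-endpoint>:5432/airflow

> AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:<password>@<rds-endpoint>:5432/airflow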

• Set the env variables -> echo -e "AIRFLOW_UID=50000\nAIRFLOW_GID=0" > .env

• Create local folders on EC2 instance -> mkdir ./dags ./logs ./plugins

• Install docker-compose -> sudo curl -L "github.com/docker/compose/releases/download..$(uname -s)-$(uname -m)" -o /usr/bin/docker-compose
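After downloading, the binary usually needs to be made executable; a version check then confirms the install:

> # sudo chmod +x /usr/bin/docker-compose

> # docker-compose --version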

• Set a crontab to sync the S3 buckets to the EC2 local folders.

> # crontab -e

> # add the below entries inside the editor (see the example entries after this block).

> # /usr/local/bin/aws s3 sync s3:// /root/dags

> # /usr/local/bin/aws s3 sync s3:// /root/plugins

> # Change the s3 paths as per your environment's bucket folders.
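For example, entries that sync every minute could look like the below (the bucket names are placeholders for your own buckets):

> * * * * * /usr/local/bin/aws s3 sync s3://<your-dags-bucket> /root/dags

> * * * * * /usr/local/bin/aws s3 sync s3://<your-plugins-bucket> /root/plugins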

• Start the Cron service -> service cron start

• Deploy Airflow through docker-compose -> docker-compose up -d
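Note: depending on your docker-compose.yaml, you may need to run the one-off database initialization service once before bringing the full stack up:

> # docker-compose up airflow-init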

• Please verify the container status using the below commands from the EC2 bash terminal.

> # docker ps

> # docker-compose run airflow-worker airflow info


• To get your custom DAGs & plugins to show up in the Airflow Web UI, upload the DAG & plugin files to the respective S3 buckets created earlier; the cron job will sync them to the EC2 folders. A quick test DAG is sketched below.
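As a quick test, a minimal DAG can be written locally and pushed to the DAGs bucket; the file name, dag_id and bucket name below are placeholders:

cat > hello_dag.py <<'EOF'
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

# Minimal DAG: a single task that prints a message, triggered manually from the UI.
with DAG(
    dag_id="hello_airflow",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    BashOperator(task_id="say_hello", bash_command="echo hello from airflow")
EOF

aws s3 cp hello_dag.py s3://<your-dags-bucket>/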

Environment Validation -

• RDP to the AWS EC2 instance & open the Google Chrome browser.

• Enter Webserver URL — localhost:8080
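You can also hit the webserver's health endpoint from the EC2 terminal to confirm it is up:

> # curl localhost:8080/health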


• Enter Credentials > username — airflow > password — airflow

• After login, check the DAGs & start running them.
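You can also list the DAGs from the CLI to confirm the S3 sync worked:

> # docker-compose run airflow-worker airflow dags list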


• As you trigger a DAG, the scheduler queues its task instances and the Celery worker container executes the code included in the DAG.

[Screenshot: DAGs running status]

• Check the RDS connections on the AWS Console; it will show the current connections from the Airflow containers.

• Voilaaaa..!! Airflow is ready on AWS EC2 & RDS.

Pros - easy, fast, developer-friendly setup

Cons - not production-ready, performance issues, slowness

Cleanup -

• Stop the Airflow containers -> docker-compose stop (see the note below for a fuller cleanup).

• Delete the CloudFormation stacks that created the AWS EC2 & RDS resources.

• Delete the S3 buckets created from console.
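If you also want to remove the containers, networks, volumes and images that docker-compose created (note that this deletes local Airflow data), the below can be run before deleting the stacks:

> # docker-compose down --volumes --rmi all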

THANK YOU & FOLLOW FOR MORE..

I had fun deploying this setup & playing around with AWS EC2, RDS & Airflow.

Hope you guys like it & start playing around.

More things lined up around AWS. Stay tuned..

“Nothing is particularly hard if you break it down into small bits”
