
How To Set up a Local Airflow Installation With Kubernetes

At the end of this guided and insightful installation, you will unlock new skills in setting up a local Airflow installation running in Kubernetes.

By Karthik Rajashekaran · May 22, 2024 · Tutorial


Apache Airflow is a platform to programmatically author, schedule, and monitor workflows. This definition comes from Airbnb, where the platform was developed. In 2014, it was used to manage the startup's complex data engineering pipelines, and it has since been widely adopted. Companies using Apache Airflow include Slack, Zoom, eBay, and Adobe, among others.

In this article, we dive into the intricacies of setting up a local Airflow installation running in Kubernetes.

By the end of this guided and insightful installation, you will have unlocked new skills in setting up a local Airflow installation running in Kubernetes. These skills will accelerate your Airflow development and help you stay ahead in today's competitive tech landscape.

Don't be discouraged by the intricacies of the installation; you can install locally on your favorite OS flavor, Windows or Mac!

The statistics speak!  

[Image: Airflow adoption statistics. Credit: Astronomer]

Are you ready to learn? 

Let's dive in for an insightful installation! 

1. Installation Prerequisites 

Please make sure that the following prerequisites are installed on your local machine:

If you are on a Mac, you will need the following tools:

  • I use brew on my Mac, but installing these tools should be easy enough on other operating systems; the brew commands are sketched below.
  • kubectl: the Kubernetes command-line tool, a precious CLI designed to manage Kubernetes objects and clusters.
  • helm: a valuable package manager for Kubernetes, enabling developers and operators to package, configure, and deploy applications and services on Kubernetes clusters.
  • k3d: a small program designed to run a Kubernetes cluster in Docker.
  • yq: manipulates YAML files and simplifies editing them from the command line.

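On a Mac, the Homebrew installs look like this (a sketch; these are the standard formula names):

brew install kubectl
brew install helm
brew install k3d
brew install yq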

If you are on Windows, you can install the same tools by following their respective documentation:

  • For Kubernetes tools
  • For Helm 
  • For k3d 
  • For yq: using Chocolatey, run choco install yq

2. Building the Airflow Image With a Dockerfile

Docker remains the de facto standard for containerization in the industry; below is the Dockerfile.

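A minimal sketch; the commented dependency line is an assumption to adapt to whatever your DAGs need:

# Dockerfile
FROM apache/airflow:2.3.3
# (assumption) uncomment to bake extra Python dependencies into the image
# RUN pip install --no-cache-dir <your-packages>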

Here we build an image named apache/airflow with the version tag 2.3.3.

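Assuming the Dockerfile above sits in the current directory, the build command is:

docker build -t apache/airflow:2.3.3 .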

3. Set Up a Local Image Registry to Load the Airflow Image

The below command is used to create a local container registry named "myregistry.localhost" with a specified port number of 12345. This local registry allows for storing and accessing Docker images within the Kubernetes cluster managed by k3d. It provides a convenient way to manage and distribute container images locally without the need for an external registry service.

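A sketch of the k3d command (note that k3d prefixes the name it creates with k3d-, so check k3d registry list for the exact hostname it reports):

k3d registry create myregistry.localhost --port 12345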

4. Upload the Airflow Image to a Local Image Registry

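A sketch of the tag-and-push sequence, assuming the registry is reachable from the host as myregistry.localhost:12345 (adjust the hostname to whatever k3d registry list reports):

docker tag apache/airflow:2.3.3 myregistry.localhost:12345/airflow:2.3.3
docker push myregistry.localhost:12345/airflow:2.3.3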

5. Add the Registry to /etc/hosts

The below entry is added to the /etc/hosts file; it maps the registry hostname to the localhost IP address. This is often done in development environments where services run locally and a hostname needs to resolve to the localhost IP address.

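Append the entry (requires sudo):

echo '127.0.0.1 myregistry.localhost' | sudo tee -a /etc/hosts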

6. Create the Kubernetes Cluster

Set configuration for mounting the local DAGs folder to the Kubernetes cluster.

The below command sets up the mounting of the DAGs folder to each of the Kubernetes nodes. This is important for connecting the local filesystem directly to each Kubernetes pod that will be running as a part of the Airflow cluster.

The file k3d.yaml contains the mounts from the Mac filesystem to the k3d container.

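A sketch of k3d.yaml; the cluster name, the local DAGs path, and the /tmp/dags mount target inside the nodes are assumptions, so adjust them to your setup:

# k3d.yaml
apiVersion: k3d.io/v1alpha5
kind: Simple
metadata:
  name: airflow-cluster
servers: 1
agents: 2
volumes:
  # mount the local DAGs folder into every node
  - volume: /Users/<you>/airflow/dags:/tmp/dags
    nodeFilters:
      - server:*
      - agent:*
registries:
  use:
    # name as reported by `k3d registry list` (assumption)
    - k3d-myregistry.localhost:12345

Create the cluster from the config:

k3d cluster create --config k3d.yaml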

We need a Kubernetes cluster on which to run Airflow and k3d makes that very simple.


7. Test Airflow Image Creation Over Kubernetes

The command creates a new container named airflow-test using the specified Docker image and then starts an interactive Bash shell inside the container.

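A sketch of the command:

docker run --rm -it --name airflow-test myregistry.localhost:12345/airflow:2.3.3 bash
# inside the container, a quick sanity check:
airflow version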

8. Deploy the Kubernetes Dashboard

#service-account.yaml
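A commonly used version, following the upstream dashboard documentation (the admin-user name is conventional, not fixed by this tutorial):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: admin-user
    namespace: kubernetes-dashboard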

This applies the configuration in the YAML file service-account.yaml to the Kubernetes cluster.

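A sketch of the commands; the dashboard version is an assumption, so check the project's releases for the current one:

# deploy the dashboard (this creates the kubernetes-dashboard namespace)
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.7.0/aio/deploy/recommended.yaml
# grant the admin-user service account access
kubectl apply -f service-account.yaml
# generate a login token (kubectl create token needs Kubernetes 1.24+)
kubectl -n kubernetes-dashboard create token admin-user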

This step deploys the Kubernetes dashboard to the cluster. It applies the required configuration from the official Kubernetes dashboard repository. 


9. Kubernetes Dashboard URL
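Start kubectl proxy and open the dashboard URL in a browser:

kubectl proxy
# then browse to:
# http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/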

Sign in with the token from the above step.

10. Instantiate Airflow Over Kubernetes

override-values.yaml contains the mount points from the k3d container into the Airflow scheduler and workers.
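A sketch of override-values.yaml; the keys follow the official apache-airflow Helm chart, while the repository, tag, and claim name carry over assumptions from the earlier steps:

# override-values.yaml
executor: KubernetesExecutor
defaultAirflowRepository: myregistry.localhost:12345/airflow
defaultAirflowTag: "2.3.3"
dags:
  persistence:
    # mount the DAGs volume into the scheduler and worker pods
    enabled: true
    existingClaim: dags-pvc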

The Kubernetes Executor allows you to run all the Airflow tasks on Kubernetes as separate Pods.


pvc-claim.yaml: a Persistent Volume Claim is a request for storage resources from the cluster.

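A sketch of pvc-claim.yaml that pairs a static hostPath volume with the claim; the names, size, and the /tmp/dags path (matching the k3d mounts above) are assumptions:

# pvc-claim.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: dags-pv
spec:
  storageClassName: manual
  capacity:
    storage: 1Gi
  # nominal here: every k3d node mounts the same host folder
  accessModes:
    - ReadWriteMany
  hostPath:
    path: /tmp/dags
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dags-pvc
  namespace: airflow
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi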

Helm command to install Airflow in Kubernetes:

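A sketch using the official chart; the release and namespace names are assumptions:

kubectl create namespace airflow
kubectl apply -f pvc-claim.yaml
helm repo add apache-airflow https://airflow.apache.org
helm repo update
helm install airflow apache-airflow/airflow --namespace airflow -f override-values.yaml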

11. Forward the Airflow UI From the k3d Container to the Local Machine

This command sets up port forwarding from a local port (8080) to the port (8080) of the service named airflow-webserver within the airflow namespace in Kubernetes. This allows the browser to access the web server of the Airflow application running in the Kubernetes cluster locally on your machine.

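The command, assuming the release and namespace from step 10:

kubectl port-forward svc/airflow-webserver 8080:8080 --namespace airflow
# then open http://localhost:8080 (the chart's default login is admin/admin unless overridden)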

12. Stop the Cluster

The below command is used to stop the Kubernetes cluster created using k3d.

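Assuming the cluster name from step 6:

k3d cluster stop airflow-cluster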

13. Start the Cluster

The below command restarts the stopped Kubernetes cluster using k3d.

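Again assuming the cluster name from step 6:

k3d cluster start airflow-cluster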

14. Delete the Airflow Instance

Never run this unless you want to completely delete the Airflow setup and create a new one.

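A sketch, assuming the Helm release from step 10:

helm uninstall airflow --namespace airflow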

15. Delete the Cluster

The below command deletes the cluster entirely:
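Assuming the cluster name from step 6:

k3d cluster delete airflow-cluster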

Conclusion 

Those are the steps to install Apache Airflow locally on Kubernetes. Widely adopted across the industry, Airflow is a crucial skill for developers, DevOps engineers, machine learning engineers, and many other roles. Data is ubiquitous, and knowing how to build and operate workflows provides significant value to companies in this competitive tech economy.
