Analyze Your ALB/NLB Logs With ClickHouse

Explore AWS Load Balancer Logs with ClickHouse for efficient log analysis and implement a scalable solution to analyze AWS NLB or ALB access logs in real time.

By Andrei Tserakhau · May 07, 2024 · Tutorial

In the dynamic world of cloud computing, data engineers are constantly challenged with managing and analyzing vast amounts of data. A critical aspect of this challenge is effectively handling AWS Load Balancer Logs. This article examines the integration of AWS Load Balancer Logs with ClickHouse for efficient log analysis. We start by exploring AWS’s method of storing these logs in S3 and its queuing system for data management. The focus then shifts to setting up a log analysis framework using S3 and ClickHouse, highlighting the process with Terraform. The goal is to provide a clear and practical guide for implementing a scalable solution for analyzing AWS NLB or ALB access logs in real time.

To understand the application of this process, consider a standard application using an AWS Load Balancer. Load Balancers, as integral components of AWS services, direct logs to an S3 bucket. This article will guide you through each step of the process, demonstrating how to make these crucial load-balancer logs available for real-time analysis in ClickHouse, facilitated by Terraform. However, before delving into the specifics of Terraform’s capabilities, it’s important to first comprehend the existing infrastructure and the critical Terraform configurations that enable the interaction between S3 and SQS for the ALB.

Existing Infrastructure

Setting Up the S3 Log Storage

Begin by establishing an S3 bucket for ALB log storage. This initial step links the bucket to your ALB and starts with creating the bucket itself, as shown in the snippet below (see /example_projects/transfer/nlb_observability_stack/s3.tf#L1-L3).

HCL
 
resource "aws_s3_bucket" "nlb_logs" {
  bucket = var.bucket_name
}


This bucket serves as the primary repository for the ALB logs. With it in place, enable access logging on the load balancer and point it at the bucket:

HCL
 
resource "aws_lb" "alb" {
  /* your config
	*/

  dynamic "access_logs" {
    for_each = var.access_logs_bucket != null ? { enabled = true } : {}

    content {
      enabled = true
      bucket  = var.bucket_name
      prefix  = var.access_logs_bucket_prefix
    }
  }
}


Next, we configure an SQS queue that works in tandem with the S3 bucket. The queue's configuration is outlined below.

HCL
 
resource "aws_sqs_queue" "nlb_logs_queue" {
  name   = var.sqs_name
  policy = <<POLICY
{
  "Version": "2012-10-17",
  "Id": "sqspolicy",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "sqs:SendMessage",
      "Resource": "arn:aws:sqs:*:*:${var.sqs_name}",
      "Condition": {
        "ArnEquals": { "aws:SourceArn": "${aws_s3_bucket.nlb_logs.arn}" }
      }
    }
  ]
}
POLICY
}


This creates the SQS queue with a policy that allows the designated S3 bucket to publish event notifications to it.

As logs are delivered, they are automatically organized within a dedicated folder:

[Figure: Log files automatically organized within a dedicated folder in the S3 bucket]

New log files are generated continuously, so we need a streamlined way to be notified about them and process them. Following the guidelines in Amazon S3's notification configuration documentation, we configure the bucket to push a message to the SQS queue created above whenever a new log file lands, ensuring prompt handling and processing of newly generated logs.

This linkage is established through an S3 bucket notification configuration (see /example_projects/transfer/nlb_observability_stack/s3.tf#L54-L61).

HCL
 
resource "aws_s3_bucket_notification" "nlb_logs_bucket_notification" {
  bucket = aws_s3_bucket.nlb_logs.id

  queue {
    queue_arn = aws_sqs_queue.nlb_logs_queue.arn
    events    = ["s3:ObjectCreated:*"]
  }
}
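
If the bucket stores anything besides the load-balancer logs, the notification can be narrowed so that only log objects trigger messages. The filter_prefix and filter_suffix arguments are standard for aws_s3_bucket_notification; the values below are illustrative, not taken from the referenced project:

HCL

resource "aws_s3_bucket_notification" "nlb_logs_bucket_notification" {
  bucket = aws_s3_bucket.nlb_logs.id

  queue {
    queue_arn     = aws_sqs_queue.nlb_logs_queue.arn
    events        = ["s3:ObjectCreated:*"]
    # Only react to objects under the access-log prefix (illustrative values).
    filter_prefix = var.access_logs_bucket_prefix
    filter_suffix = ".log.gz"
  }
}
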


The configurations established thus far form the core infrastructure for our log storage system. We have set up the S3 bucket, configured the SQS queue, and linked the two through the bucket notification, so that each component plays its part in the following orchestrated setup:

[Figure: Composed architecture, in which the S3 bucket, the SQS queue, and their interconnection store and manage logs within the AWS environment]

Logs are now in your S3 bucket, but reading these logs may be challenging. Let’s take a look at a data sample:

Plain Text
 
tls 2.0 2024-01-02T23:58:58 net/preprod-public-api-dt-tls/9f8794be28ab2534 4d9af2ddde90eb82 84.247.112.144:33342 10.0.223.207:443 244 121 0 15 - arn:aws:acm:eu-central-1:840525340941:certificate/5240a1e4-c7fe-44c1-9d89-c256213c5d23 - ECDHE-RSA-AES128-GCM-SHA256 tlsv12 - 18.193.17.109 - - "%ef%b5%bd%8" 2024-01-02T23:58:58


The snippet above is a sample of the log data residing in the S3 bucket. Understanding its format and content will help us build an efficient strategy to parse and store it.
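
Before wiring up the full pipeline, you can also peek at the raw files directly from ClickHouse using its s3 table function. The bucket URL and credentials below are placeholders you would substitute with your own:

SQL

-- Read a few raw log lines straight from S3 (URL and credentials are placeholders).
SELECT line
FROM s3(
    'https://<your-bucket>.s3.eu-central-1.amazonaws.com/<prefix>/AWSLogs/**/*.log.gz',
    '<aws_access_key_id>',
    '<aws_secret_access_key>',
    'LineAsString'
)
LIMIT 5;
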

Let’s move this data to DoubleCloud Managed ClickHouse.

Configuring VPC and ClickHouse With DoubleCloud

The next step involves adding a Virtual Private Cloud (VPC) and a managed ClickHouse instance. These will act as the primary storage systems for our logs, ensuring secure and efficient log management (see /example_projects/transfer/nlb_observability_stack/network.tf#L1-L7).

HCL
 
resource "doublecloud_network" "nlb-network" {
  project_id      = var.project_id
  name            = var.network_name
  region_id       = var.region
  cloud_type      = var.cloud_type
  ipv4_cidr_block = var.ipv4_cidr
}
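
The network above, along with the ClickHouse and Transfer resources that follow, is managed with the DoubleCloud Terraform provider alongside the AWS provider used earlier. If you are assembling the configuration from scratch, a minimal required_providers block might look like the sketch below; the registry source for the DoubleCloud provider is an assumption here, so check the referenced project's provider configuration:

HCL

terraform {
  required_providers {
    # AWS provider for the S3 bucket, SQS queue, and load balancer.
    aws = {
      source = "hashicorp/aws"
    }
    # DoubleCloud provider for the network, ClickHouse cluster, and transfer
    # (registry source assumed; verify against the project's versions file).
    doublecloud = {
      source = "doublecloud/doublecloud"
    }
  }
}
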


With the network in place, the next step is to establish a ClickHouse cluster within this VPC, giving our logs a seamless and secure storage destination (see /example_projects/transfer/nlb_observability_stack/ch.tf#L1-L35).

HCL
 
resource "doublecloud_clickhouse_cluster" "nlb-logs-clickhouse-cluster" {
  project_id = var.project_id
  name       = var.clickhouse_cluster_name
  region_id  = var.region
  cloud_type = var.cloud_type
  network_id = resource.doublecloud_network.nlb-network.id

  resources {
    clickhouse {
      resource_preset_id = var.clickhouse_cluster_resource_preset
      disk_size          = 34359738368 # 32 GiB, specified in bytes
      replica_count      = 1
    }
  }

  config {
    log_level       = "LOG_LEVEL_INFORMATION"
    max_connections = 120
  }

  access {
    data_services = ["transfer"]
    ipv4_cidr_blocks = [
      {
        value       = var.ipv4_cidr
        description = "VPC CIDR"
      }
    ]
  }
}

data "doublecloud_clickhouse" "nlb-logs-clickhouse" {
  project_id = var.project_id
  id         = doublecloud_clickhouse_cluster.nlb-logs-clickhouse-cluster.id
}
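
To use the cluster outside Terraform, for example with the clickhouse-client session shown later, you can surface its connection details as outputs. The user and password attributes appear in the transfer configuration below; the host attribute name is an assumption and may differ in the data source's schema:

HCL

# Connection details for external tools (attribute names partly assumed; see note above).
output "clickhouse_host" {
  value = data.doublecloud_clickhouse.nlb-logs-clickhouse.connection_info.host
}

output "clickhouse_user" {
  value = data.doublecloud_clickhouse.nlb-logs-clickhouse.connection_info.user
}

output "clickhouse_password" {
  value     = data.doublecloud_clickhouse.nlb-logs-clickhouse.connection_info.password
  sensitive = true # required because the value is a secret
}
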


Integrating S3 Logs With ClickHouse

To link S3 and ClickHouse, we utilize DoubleCloud Transfer, an ELT (Extract, Load, Transform) tool. The setup for DoubleCloud Transfer includes configuring both the source and target endpoints. Below is the Terraform code outlining the setup for the source endpoint (see /example_projects/transfer/nlb_observability_stack/transfer.tf#L1-L197).

HCL
 
resource "doublecloud_transfer_endpoint" "nlb-s3-s32ch-source" {
  name       = var.transfer_source_name
  project_id = var.project_id
  settings {
    object_storage_source {
      provider {
        bucket                = var.bucket_name
        path_prefix           = var.bucket_prefix
        aws_access_key_id     = var.aws_access_key_id
        aws_secret_access_key = var.aws_access_key_secret
        region                = var.region
        endpoint              = var.endpoint
        use_ssl               = true
        verify_ssl_cert       = true
      }
      format {
        csv {
          delimiter = " " // space as delimiter
          advanced_options {
          }
          additional_options {
          }
        }
      }
      event_source {
        sqs {
          queue_name = var.sqs_name
        }
      }
      result_table {
        add_system_cols = true
        table_name      = var.transfer_source_table_name
        table_namespace = var.transfer_source_table_namespace
      }
      result_schema {
        data_schema {
          fields {
            field {
              name     = "type"
              type     = "string"
              required = false
              key      = false
              path     = "0"
            }
            field {
              name     = "version"
              type     = "string"
              required = false
              key      = false
              path     = "1"
            }
            /*
	            Rest of Fields
            */	
            field {
              name     = "tls_connection_creation_time"
              type     = "datetime"
              required = false
              key      = false
              path     = "21"
            }
          }
        }
      }
    }
  }
}


This Terraform snippet details the setup of the source endpoint, including S3 connection specifications, data format, SQS queue for event notifications, and the schema for data in the S3 bucket. Next, we focus on establishing the target endpoint, which is straightforward with ClickHouse (see /example_projects/transfer/nlb_observability_stack/transfer.tf#L199-L215).

HCL
 
resource "doublecloud_transfer_endpoint" "nlb-ch-s32ch-target" {
  name       = var.transfer_target_name
  project_id = var.project_id
  settings {
    clickhouse_target {
      clickhouse_cleanup_policy = "DROP"
      connection {
        address {
          cluster_id = doublecloud_clickhouse_cluster.nlb-logs-clickhouse-cluster.id
        }
        database = "default"
        password = data.doublecloud_clickhouse.nlb-logs-clickhouse.connection_info.password
        user     = data.doublecloud_clickhouse.nlb-logs-clickhouse.connection_info.user
      }
    }
  }
}


The preceding code snippets for the source and target endpoints can now be combined to create a complete transfer configuration, as demonstrated in the following Terraform snippet (see /example_projects/transfer/nlb_observability_stack/transfer.tf#L217-L224).

HCL
 
resource "doublecloud_transfer" "nlb-logs-s32ch" {
  name       = var.transfer_name
  project_id = var.project_id
  source     = doublecloud_transfer_endpoint.nlb-s3-s32ch-source.id
  target     = doublecloud_transfer_endpoint.nlb-ch-s32ch-target.id
  type       = "INCREMENT_ONLY"
  activated  = false
}


With this transfer in place, the delivery pipeline is complete. Note that the transfer is created with activated = false; activate it (through Terraform or the DoubleCloud console) once both endpoints are ready and you want ingestion to begin.

[Figure: The complete delivery pipeline, primed for seamless data flow]

This integrated system, incorporating S3, SQS, the VPC, ClickHouse, and the orchestrated configurations, stands ready to handle, process, and analyze log data efficiently at any scale.

Exploring Logs in ClickHouse

With ClickHouse set up, we now turn our attention to analyzing the data. This section guides you through querying the structured logs to extract valuable insights. To begin interacting with the newly created database, use the clickhouse-client tool:

Shell
 
clickhouse-client \
	--host $CH_HOST \
	--port 9440 \
	--secure \
	--user admin \
	--password $CH_PASSWORD


Begin by assessing the overall log count in your dataset. A straightforward query in ClickHouse will help you understand the scope of data you’re dealing with, providing a baseline for further analysis.

SQL
 
SELECT count(*)
FROM logs_alb

Query id: 6cf59405-2a61-451b-9579-a7d340c8fd5c

┌──count()─┐
│ 15935887 │
└──────────┘

1 row in set. Elapsed: 0.457 sec.


Now, we'll focus on retrieving a specific row from our dataset. Executing this targeted query allows us to inspect the contents of an individual log entry in detail.

SQL
 
SELECT *
FROM logs_alb
LIMIT 1
FORMAT Vertical

Query id: 44fc6045-a5be-47e2-8482-3033efb58206

Row 1:
──────
type:                         tls
version:                      2.0
time:                         2023-11-20 21:05:01
elb:                          net/*****/*****
listener:                     92143215dc51bb35
client_port:                  10.0.246.57:55534
destination_port:             10.0.39.32:443
connection_time:              1
tls_handshake_time:           -
received_bytes:               0
sent_bytes:                   0
incoming_tls_alert:           -
chosen_cert_arn:              -
chosen_cert_serial:           -
tls_cipher:                   -
tls_protocol_version:         -
tls_named_group:              -
domain_name:                  -
alpn_fe_protocol:             -
alpn_be_protocol:             -
alpn_client_preference_list:  -
tls_connection_creation_time: 2023-11-20 21:05:01
__file_name:                  api/AWSLogs/******/elasticloadbalancing/eu-central-1/2023/11/20/****-central-1_net.****.log.gz
__row_index:                  1
__data_transfer_commit_time:  1700514476000000000
__data_transfer_delete_time:  0

1 row in set. Elapsed: 0.598 sec.
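
The trailing system columns added by the transfer (add_system_cols = true in the source endpoint) can be useful on their own. For example, assuming __data_transfer_commit_time is stored as an integer nanosecond timestamp, as the sample row suggests, you can estimate how far ingestion lags behind the newest log entries:

SQL

-- Compare the newest log timestamp with the newest transfer commit time
-- (__data_transfer_commit_time is in nanoseconds since the epoch).
SELECT
    max(time) AS latest_log_time,
    fromUnixTimestamp64Nano(toInt64(max(__data_transfer_commit_time))) AS latest_commit_time
FROM logs_alb;
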


Next, we'll conduct a simple yet revealing analysis. By running a “group by” query, we aim to identify the most frequently accessed destination ports in our dataset.

SQL
 
SELECT
    destination_port,
    count(*)
FROM logs_alb
GROUP BY destination_port

Query id: a4ab55db-9208-484f-b019-a5c13d779063

┌─destination_port──┬─count()─┐
│ 10.0.234.156:443  │   10148 │
│ 10.0.205.254:443  │   12639 │
│ 10.0.209.51:443   │   13586 │
│ 10.0.223.207:443  │   10125 │
│ 10.0.39.32:443    │ 4860701 │
│ 10.0.198.39:443   │   13837 │
│ 10.0.224.240:443  │    9546 │
│ 10.10.162.244:443 │  416893 │
│ 10.0.212.130:443  │    9955 │
│ 10.0.106.172:443  │ 4860359 │
│ 10.10.111.92:443  │  416908 │
│ 10.0.204.18:443   │    9789 │
│ 10.10.24.126:443  │  416881 │
│ 10.0.232.19:443   │   13603 │
│ 10.0.146.100:443  │ 4862200 │
└───────────────────┴─────────┘

15 rows in set. Elapsed: 1.101 sec. Processed 15.94 million rows, 405.01 MB (14.48 million rows/s., 368.01 MB/s.)
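
From here you can build whatever aggregations you need. As one more sketch, assuming the time column was mapped to a datetime type (as tls_connection_creation_time is in the schema above) and the byte counters to numeric columns, the following query summarizes traffic per hour:

SQL

-- Hourly connection counts and traffic volume
-- (assumes time is a DateTime and received_bytes/sent_bytes are numeric).
SELECT
    toStartOfHour(time) AS hour,
    count()             AS connections,
    sum(received_bytes) AS bytes_in,
    sum(sent_bytes)     AS bytes_out
FROM logs_alb
GROUP BY hour
ORDER BY hour;
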


Conclusion

This article has outlined a comprehensive approach to analyzing AWS Load Balancer Logs using ClickHouse, facilitated by DoubleCloud Transfer and Terraform. We began with the fundamental setup of S3 and SQS for log storage and notification, before integrating a VPC and ClickHouse for efficient log management. Through practical examples and code snippets, we demonstrated how to configure and utilize these tools for real-time log analysis.

The seamless integration of these technologies not only simplifies the log analysis process but also enhances its efficiency, offering insights that are crucial for optimizing cloud operations. Explore the complete example in our Terraform project here for a hands-on experience with log querying in ClickHouse. The power of ClickHouse in processing large datasets, coupled with the flexibility of AWS services, forms a robust solution for modern cloud computing challenges.

As cloud technologies continue to evolve, the techniques and methods discussed in this article remain pertinent for IT professionals seeking efficient and scalable solutions for log analysis.
