duckdb-alb-log-analyzer
chokkoyamada/duckdb-alb-log-analyzer
SKILL.md
name: duckdb-alb-log-analyzer
description: Analyze AWS Application Load Balancer (ALB) logs stored in S3 using DuckDB. Use when users request ALB log analysis, error investigation, performance analysis, traffic analysis, or need to query ALB access logs. Supports analyzing response times, status codes, error patterns, and traffic trends from S3-stored logs.
ALB Log Analyzer
Analyze AWS Application Load Balancer (ALB) logs using DuckDB for fast, flexible S3-based queries.
Quick Start
Secure Method (Recommended for Named Profiles)
For AWS named profiles, use these secure scripts that keep credentials private:
# One-time setup (credentials read from ~/.aws/credentials)
./scripts/setup_with_profile.sh your-profile-name
# Load logs (profile name only, credentials stay secure)
./scripts/load_with_profile.sh your-profile-name 's3://bucket/path/**/*.log.gz'
# Analyze
./scripts/analyze.sh errors
./scripts/analyze.sh performance
Security: Only profile names appear in commands and logs. Credentials are read internally from ~/.aws/credentials.
Standard Method
Basic workflow for ALB log analysis:
- Setup DuckDB with AWS extensions
- Load logs from S3 into a table
- Run analysis queries (errors, performance, traffic, etc.)
- Export results or create custom queries
Prerequisites
Ensure DuckDB is installed:
# Install DuckDB (if not already installed)
brew install duckdb # macOS
# or download from https://duckdb.org/docs/installation/
AWS credentials must be configured. See AWS Credentials Setup section below for details.
Database File Location
By default, the DuckDB database file is stored at:
/tmp/alb-log-analyzer-${USER}/alb_analysis.duckdb
Important: The database file is automatically deleted when you run setup or load commands. This ensures:
- Clean start with each analysis
- No accumulation of old data in /tmp
- Reduced disk space usage
This temporary location also:
- Keeps your skill directory clean
- Prevents data from being packaged with the skill
- May be cleared on system restart
To use a custom location, set the DB_FILE environment variable:
export DB_FILE=/path/to/custom/database.duckdb
Setup
Method 1: Using CREDENTIAL_CHAIN (Recommended)
This method automatically detects AWS credentials from environment variables, ~/.aws/credentials, or IAM roles:
./scripts/analyze.sh setup
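As a reference point, the CREDENTIAL_CHAIN setup boils down to a few SQL statements. A minimal sketch of what scripts/setup.sql likely resembles (the bundled script may differ in detail):
-- Minimal sketch; the bundled scripts/setup.sql may differ.
INSTALL aws;
LOAD aws;
INSTALL httpfs;
LOAD httpfs;
-- Resolve credentials via the AWS SDK chain (env vars, ~/.aws/credentials, IAM role, ...)
CREATE SECRET aws_s3 (
    TYPE s3,
    PROVIDER credential_chain
);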
Method 2: Using Environment Variables
If CREDENTIAL_CHAIN doesn't work, explicitly set environment variables.
For default profile:
# Set AWS credentials
export AWS_ACCESS_KEY_ID=your_key_id
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_DEFAULT_REGION=ap-northeast-1
# Setup DuckDB
./scripts/analyze.sh setup-env
For named profile:
# Export credentials from your profile
export AWS_PROFILE=your-profile-name
export AWS_ACCESS_KEY_ID=$(aws configure get aws_access_key_id --profile your-profile-name)
export AWS_SECRET_ACCESS_KEY=$(aws configure get aws_secret_access_key --profile your-profile-name)
export AWS_DEFAULT_REGION=$(aws configure get region --profile your-profile-name)
# Setup DuckDB
./scripts/analyze.sh setup-env
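Under the hood, an environment-variable setup amounts to a secret whose credential chain is limited to environment variables. A rough sketch, assuming scripts/setup_s3_env.sql does something along these lines:
-- Rough sketch; the bundled scripts/setup_s3_env.sql may differ.
INSTALL aws;
LOAD aws;
INSTALL httpfs;
LOAD httpfs;
-- Only consult AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_DEFAULT_REGION
CREATE SECRET aws_s3_env (
    TYPE s3,
    PROVIDER credential_chain,
    CHAIN 'env'
);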
Method 3: Manual SQL
duckdb alb_analysis.duckdb < scripts/setup.sql
# or
duckdb alb_analysis.duckdb < scripts/setup_s3_env.sql
Verify Setup
Diagnose your AWS credentials configuration:
./scripts/analyze.sh diagnose
This will show:
- Extension installation status
- Environment variable configuration
- Configured secrets in DuckDB
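If you prefer to check the same things by hand, DuckDB exposes this state through built-in table functions (column names can vary slightly between DuckDB versions):
-- Are the aws/httpfs extensions installed and loaded?
SELECT extension_name, installed, loaded
FROM duckdb_extensions()
WHERE extension_name IN ('aws', 'httpfs');
-- Which secrets are configured? (values are redacted)
SELECT name, type, provider, scope
FROM duckdb_secrets();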
AWS Credentials Setup
DuckDB needs AWS credentials to access S3. Choose one of these methods:
Option 1: AWS CLI Configuration (Recommended)
If you have AWS CLI configured, DuckDB can use those credentials:
# Check if AWS CLI is configured
aws s3 ls
# If not configured, run:
aws configure
Option 2: Environment Variables
Export credentials in your shell:
export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
export AWS_DEFAULT_REGION=ap-northeast-1
Option 3: AWS Credentials File
Create or edit ~/.aws/credentials:
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
And ~/.aws/config:
[default]
region = ap-northeast-1
Option 4: AWS Profile (For Named Profiles)
If you're using a named AWS profile (not [default]), you need to explicitly export the credentials.
Easy way (using helper script):
# List available profiles
aws configure list-profiles
# Load credentials from your profile
source scripts/load_profile.sh your-profile-name
# Setup and use
./scripts/analyze.sh setup-env
./scripts/analyze.sh load 's3://your-bucket/path/**/*.log.gz'
Manual way:
# Check your profiles
aws configure list-profiles
# Export credentials from your profile
export AWS_PROFILE=your-profile-name
export AWS_ACCESS_KEY_ID=$(aws configure get aws_access_key_id --profile your-profile-name)
export AWS_SECRET_ACCESS_KEY=$(aws configure get aws_secret_access_key --profile your-profile-name)
export AWS_DEFAULT_REGION=$(aws configure get region --profile your-profile-name)
# Setup DuckDB with these credentials
./scripts/analyze.sh setup-env
# Load logs
./scripts/analyze.sh load 's3://your-bucket/path/**/*.log.gz'
Why this is needed: DuckDB doesn't automatically read from named AWS profiles. It only supports environment variables or the [default] profile. By exporting the profile's credentials as environment variables, DuckDB can access them.
Option 5: IAM Role (EC2/ECS)
If running on EC2 or ECS, DuckDB can use the instance's IAM role automatically.
Loading Logs
Load ALB logs from S3 into a DuckDB table.
Using the analyze.sh script
# Load logs from S3
./scripts/analyze.sh load 's3://my-bucket/AWSLogs/123456789012/elasticloadbalancing/us-east-1/2024/11/**/*.log.gz'
Manual SQL approach
- Generate SQL from template:
# Replace placeholders
sed -e "s|{{TABLE_NAME}}|alb_logs|g" \
-e "s|{{S3_PATH}}|s3://my-bucket/path/to/logs/**/*.log.gz|g" \
scripts/load_template.sql > load_logs.sql
- Execute:
duckdb alb_analysis.duckdb < load_logs.sql
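For orientation, the template boils down to reading the space-delimited, gzip-compressed log lines with read_csv and materializing them into a table. The sketch below is abridged and assumes the column names used elsewhere in this skill; the real scripts/load_template.sql declares the complete schema from references/alb_schema.md:
-- Abridged sketch; the real template declares every ALB field.
CREATE OR REPLACE TABLE alb_logs AS
SELECT *
FROM read_csv(
    's3://my-bucket/path/to/logs/**/*.log.gz',
    delim = ' ',
    quote = '"',
    escape = '"',
    header = false,
    columns = {
        'type': 'VARCHAR',
        'timestamp': 'TIMESTAMP',
        'elb': 'VARCHAR',
        'client_ip_port': 'VARCHAR',
        'target_ip_port': 'VARCHAR',
        'request_processing_time': 'DOUBLE',
        'target_processing_time': 'DOUBLE',
        'response_processing_time': 'DOUBLE',
        'elb_status_code': 'INTEGER',
        'target_status_code': 'VARCHAR',
        'received_bytes': 'BIGINT',
        'sent_bytes': 'BIGINT',
        'request': 'VARCHAR',
        'user_agent': 'VARCHAR'
        -- ...the remaining trailing fields (ssl_cipher, error_reason, transformed_host, ...)
        --    must also be declared, exactly as listed in references/alb_schema.md
    }
);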
S3 Path Patterns
Use glob patterns to load multiple files:
# Single month
s3://bucket/AWSLogs/account-id/elasticloadbalancing/region/2024/11/**/*.log.gz
# Multiple months
s3://bucket/AWSLogs/account-id/elasticloadbalancing/region/2024/*/**/*.log.gz
# Specific day
s3://bucket/AWSLogs/account-id/elasticloadbalancing/region/2024/11/01/**/*.log.gz
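If a single glob can't express the range you need (say, two non-adjacent months), DuckDB's read functions also accept a list of paths. This is a hypothetical adaptation of the load step rather than something the bundled scripts do for you:
-- Hypothetical: read two separate month prefixes into one table
CREATE OR REPLACE TABLE alb_logs AS
SELECT *
FROM read_csv(
    ['s3://bucket/AWSLogs/account-id/elasticloadbalancing/region/2024/09/**/*.log.gz',
     's3://bucket/AWSLogs/account-id/elasticloadbalancing/region/2024/11/**/*.log.gz'],
    delim = ' ', quote = '"', escape = '"', header = false
    -- plus the same columns = { ... } list used by load_template.sql
);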
Analysis Tasks
Error Analysis
Analyze HTTP errors, status code distributions, and error patterns:
./scripts/analyze.sh errors
This provides:
- Status code distribution
- Error details (non-200 responses)
- 5xx errors by hour
- Common error reasons
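The underlying queries are ordinary SQL over alb_logs. Roughly what the status-code and error-reason breakdowns look like (the bundled analyze_errors.sql may differ in detail):
-- Status code distribution
SELECT
  elb_status_code,
  COUNT(*) as requests,
  ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) as pct
FROM alb_logs
GROUP BY elb_status_code
ORDER BY requests DESC;
-- Most common error reasons on 5xx responses
SELECT error_reason, COUNT(*) as occurrences
FROM alb_logs
WHERE elb_status_code >= 500
GROUP BY error_reason
ORDER BY occurrences DESC
LIMIT 20;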
Performance Analysis
Analyze response times and latency:
./scripts/analyze.sh performance
This provides:
- Response time statistics (avg, p50, p95, p99)
- Slowest requests
- Response time trends by hour
- Slow request percentage
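Roughly what the percentile statistics look like in DuckDB SQL; a sketch that assumes target_processing_time is -1 when a request never reached a target (analyze_performance.sql may differ in detail):
SELECT
  COUNT(*) as requests,
  ROUND(AVG(target_processing_time), 3) as avg_time,
  ROUND(quantile_cont(target_processing_time, 0.50), 3) as p50,
  ROUND(quantile_cont(target_processing_time, 0.95), 3) as p95,
  ROUND(quantile_cont(target_processing_time, 0.99), 3) as p99,
  ROUND(100.0 * SUM(CASE WHEN target_processing_time > 1.0 THEN 1 ELSE 0 END) / COUNT(*), 2) as slow_pct
FROM alb_logs
WHERE target_processing_time >= 0;  -- exclude requests that never reached a target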
Custom Queries
Execute custom SQL queries:
# Using a custom SQL file
./scripts/analyze.sh query my_query.sql
# Or directly with DuckDB
duckdb alb_analysis.duckdb
Common Analysis Patterns
Find requests from specific IP
SELECT *
FROM alb_logs
WHERE client_ip_port LIKE '192.168.1.%'
ORDER BY timestamp DESC;
Analyze specific URL path
SELECT
elb_status_code,
COUNT(*) as count,
ROUND(AVG(target_processing_time), 3) as avg_time
FROM alb_logs
WHERE request LIKE '%/api/users%'
GROUP BY elb_status_code;
Traffic by time of day
SELECT
EXTRACT(HOUR FROM timestamp) as hour,
COUNT(*) as request_count
FROM alb_logs
GROUP BY hour
ORDER BY hour;
Filter by date range
SELECT *
FROM alb_logs
WHERE timestamp >= '2024-11-01' AND timestamp < '2024-12-01'
AND elb_status_code >= 500;
Advanced Usage
Create aggregated summaries
CREATE TABLE daily_summary AS
SELECT
DATE_TRUNC('day', timestamp) as day,
COUNT(*) as total_requests,
SUM(CASE WHEN elb_status_code >= 500 THEN 1 ELSE 0 END) as errors_5xx,
ROUND(AVG(target_processing_time), 3) as avg_response_time
FROM alb_logs
GROUP BY day;
Export results
-- Export to CSV
COPY (SELECT * FROM alb_logs WHERE elb_status_code >= 500)
TO 'errors.csv' (HEADER, DELIMITER ',');
-- Export to Parquet
COPY alb_logs TO 'alb_logs.parquet' (FORMAT PARQUET);
Multiple table analysis
Load logs into separate tables for comparison:
# Load last week's logs
sed -e "s|{{TABLE_NAME}}|alb_logs_last_week|g" \
-e "s|{{S3_PATH}}|s3://bucket/logs/2024/10/**/*.log.gz|g" \
scripts/load_template.sql | duckdb alb_analysis.duckdb
# Compare with this week
SELECT
'last_week' as period,
COUNT(*) as requests,
AVG(target_processing_time) as avg_time
FROM alb_logs_last_week
UNION ALL
SELECT
'this_week' as period,
COUNT(*) as requests,
AVG(target_processing_time) as avg_time
FROM alb_logs;
Troubleshooting
S3 Access Issues
If you get "Access Denied" or credential errors when loading from S3:
Step 1: Diagnose the issue
./scripts/analyze.sh diagnose
This shows your AWS credential configuration status.
Step 2: Verify AWS credentials work
Test with AWS CLI:
aws s3 ls s3://your-bucket/path/to/logs/
If this fails, fix your AWS credentials first.
Step 3: Try different setup methods
3a. If using default AWS profile:
# Set credentials
export AWS_ACCESS_KEY_ID=$(aws configure get aws_access_key_id)
export AWS_SECRET_ACCESS_KEY=$(aws configure get aws_secret_access_key)
export AWS_DEFAULT_REGION=$(aws configure get region)
# Setup DuckDB with environment variables
./scripts/analyze.sh setup-env
# Try loading again
./scripts/analyze.sh load 's3://your-bucket/path/**/*.log.gz'
3b. If using a named AWS profile:
# Check which profile you're using
aws configure list-profiles
# Replace 'your-profile-name' with your actual profile
export AWS_PROFILE=your-profile-name
export AWS_ACCESS_KEY_ID=$(aws configure get aws_access_key_id --profile your-profile-name)
export AWS_SECRET_ACCESS_KEY=$(aws configure get aws_secret_access_key --profile your-profile-name)
export AWS_DEFAULT_REGION=$(aws configure get region --profile your-profile-name)
# Setup DuckDB with environment variables
./scripts/analyze.sh setup-env
# Try loading again
./scripts/analyze.sh load 's3://your-bucket/path/**/*.log.gz'
Step 4: Alternative - Load from local files
If S3 access still doesn't work, download logs locally:
# Download logs from S3
aws s3 sync s3://your-bucket/path/to/logs/ ./local-logs/
# Modify load_template.sql to use local path
sed -e "s|{{TABLE_NAME}}|alb_logs|g" \
-e "s|{{S3_PATH}}|./local-logs/**/*.log.gz|g" \
scripts/load_template.sql | duckdb alb_analysis.duckdb
"Extension aws not found"
Run setup script first:
./scripts/analyze.sh setup
Or manually:
duckdb alb_analysis.duckdb < scripts/setup.sql
"Invalid Input Error" when loading
This usually means the S3 path pattern doesn't match any files:
# Check if files exist
aws s3 ls --recursive s3://your-bucket/path/to/logs/ | head
# Verify the path pattern matches your bucket structure
# ALB logs are typically at:
# s3://bucket/AWSLogs/{account-id}/elasticloadbalancing/{region}/{year}/{month}/{day}/*.log.gz
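You can also test the pattern from inside DuckDB: glob() lists the objects a pattern expands to (this assumes the S3 setup step has already run in the same database):
-- Should return at least one row if the pattern matches real objects
SELECT file
FROM glob('s3://your-bucket/AWSLogs/**/*.log.gz')
LIMIT 10;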
Empty results after loading
# Check table contents
duckdb alb_analysis.duckdb -c "SELECT COUNT(*) FROM alb_logs"
# If 0, verify S3 path and try loading again
Out of memory
Use a persistent database file instead of an in-memory database (:memory:):
# Already using persistent storage by default
# alb_analysis.duckdb is a file-based database
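If a load or query still exhausts memory, you can cap DuckDB's memory use and let it spill to disk. These are standard DuckDB settings, not something the skill's scripts configure for you:
-- Cap working memory and give DuckDB a scratch directory to spill to
SET memory_limit = '4GB';
SET temp_directory = '/tmp/duckdb_spill';
-- Relaxing insertion-order preservation also reduces memory pressure on large loads
SET preserve_insertion_order = false;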
Permission errors on scripts
chmod +x scripts/analyze.sh
Region-specific issues
Ensure the region matches your S3 bucket:
export AWS_DEFAULT_REGION=ap-northeast-1 # Change to your region
./scripts/analyze.sh setup-env
References
ALB Log Schema
See references/alb_schema.md for complete field definitions and data types.
Key fields:
- timestamp: Request timestamp
- elb_status_code: HTTP status from ALB
- target_status_code: HTTP status from target
- request: Full HTTP request line
- target_processing_time: Response time from target
- client_ip_port: Client address and port
- target_ip_port: Target address and port
- error_reason: Error details if applicable
- transformed_host, transformed_uri, request_transform_status: Request transformation fields
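A useful pattern with these fields is comparing elb_status_code to target_status_code: when the target code is missing or differs, the ALB itself produced the response and error_reason usually explains why. A hedged example, assuming target_status_code is loaded as text ('-' or NULL when absent):
-- Error responses generated by the ALB rather than the target
SELECT
  elb_status_code,
  target_status_code,
  error_reason,
  COUNT(*) as requests
FROM alb_logs
WHERE elb_status_code >= 400
  AND (target_status_code IS NULL
       OR target_status_code = '-'
       OR CAST(elb_status_code AS VARCHAR) <> target_status_code)
GROUP BY elb_status_code, target_status_code, error_reason
ORDER BY requests DESC;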
Query Examples
See references/query_examples.md for comprehensive query examples including:
- Error analysis queries
- Performance metrics
- Traffic analysis
- SSL/TLS analysis
- Data export patterns
Scripts Reference
analyze.sh
Main analysis script with subcommands:
# Setup (uses CREDENTIAL_CHAIN)
./scripts/analyze.sh setup
# Setup with environment variables (if CREDENTIAL_CHAIN doesn't work)
./scripts/analyze.sh setup-env
# Diagnose S3 access and credentials
./scripts/analyze.sh diagnose
# Load logs
./scripts/analyze.sh load '<s3_path>'
# Analyze errors
./scripts/analyze.sh errors
# Analyze performance
./scripts/analyze.sh performance
# Custom query
./scripts/analyze.sh query <sql_file>
# Options
--db <file> # Database file (default: alb_analysis.duckdb)
--table <name> # Table name (default: alb_logs)
setup_with_profile.sh (Secure - Recommended)
Securely setup DuckDB with AWS profile. Credentials are read internally and never exposed in commands:
# Usage
./scripts/setup_with_profile.sh your-profile-name
# Security: Only profile name is visible, credentials read from ~/.aws/credentials
Security advantage: Credentials stay private. Only profile names appear in command history and logs.
load_with_profile.sh (Secure - Recommended)
Securely load logs with AWS profile. Credentials are read internally and never exposed in commands:
# Usage
./scripts/load_with_profile.sh your-profile-name 's3://bucket/path/**/*.log.gz' [table_name]
# Security: Only profile name and S3 path are visible
Security advantage: Credentials stay private. Only profile names appear in command history and logs.
load_profile.sh (Legacy - For Manual Setup)
Helper script to load AWS credentials from a named profile into environment variables:
# Usage
source scripts/load_profile.sh your-profile-name
# This exports:
# - AWS_PROFILE
# - AWS_ACCESS_KEY_ID
# - AWS_SECRET_ACCESS_KEY
# - AWS_DEFAULT_REGION
Note: Must be sourced (not executed) to export variables to your current shell.
Security consideration: This exports credentials to environment variables, which may be visible in process listings. Use setup_with_profile.sh and load_with_profile.sh for better security.
SQL Files
- setup.sql: Initialize DuckDB extensions with CREDENTIAL_CHAIN
- setup_s3_env.sql: Initialize DuckDB extensions with environment variables
- diagnose_s3.sql: Diagnose AWS credentials and S3 access
- load_template.sql: Template for loading logs (requires variable substitution)
- analyze_errors.sql: Error analysis queries
- analyze_performance.sql: Performance analysis queries
README
DuckDB ALB Log Analyzer - Claude Code Skill
A Claude Code skill for analyzing AWS Application Load Balancer (ALB) logs using DuckDB. Query ALB access logs stored in S3 directly for error analysis, performance analysis, traffic analysis, and more.
Features
- Fast Analysis: High-performance S3 data processing with DuckDB
- Flexible Queries: Analyze logs freely using SQL
- Secure: Safe handling of AWS credentials
- Easy Setup: Get started with simple commands
- Claude Code Integration: Request analysis using natural language in Claude Code
Prerequisites
- DuckDB installed
- AWS CLI configured or AWS credentials available
- ALB logs stored in S3
Installing DuckDB
# macOS
brew install duckdb
# Other platforms
# See https://duckdb.org/docs/installation/
Installation
Method 1: Plugin Marketplace (Recommended)
Install directly from the Claude Code plugin marketplace:
# In Claude Code, run:
/plugin marketplace add chokkoyamada/duckdb-alb-log-analyzer
# Then install the plugin:
/plugin install duckdb-alb-log-analyzer
Method 2: Manual Installation
Install this skill manually into Claude Code:
# Navigate to skills directory
cd ~/.claude/skills/
# Clone this repository
git clone https://github.com/chokkoyamada/duckdb-alb-log-analyzer.git duckdb-alb-log-analyzer
# Grant execute permissions to scripts
chmod +x duckdb-alb-log-analyzer/scripts/*.sh
Quick Start
Secure Method (Recommended)
For AWS named profiles, this method keeps credentials private:
cd ~/.claude/skills/duckdb-alb-log-analyzer
# One-time setup (credentials read from ~/.aws/credentials)
./scripts/setup_with_profile.sh your-profile-name
# Load logs (profile name only, credentials stay secure)
./scripts/load_with_profile.sh your-profile-name 's3://bucket/path/**/*.log.gz'
# Run analysis
./scripts/analyze.sh errors # Error analysis
./scripts/analyze.sh performance # Performance analysis
Security: Only profile names appear in commands and logs. Credentials are read internally from ~/.aws/credentials.
Standard Method
# Setup (uses CREDENTIAL_CHAIN)
./scripts/analyze.sh setup
# Load logs
./scripts/analyze.sh load 's3://my-bucket/AWSLogs/123456789012/elasticloadbalancing/us-east-1/2024/**/*.log.gz'
# Error analysis
./scripts/analyze.sh errors
# Performance analysis
./scripts/analyze.sh performance
Using with Claude Code
After installing the skill, you can request analysis using natural language in Claude Code:
# In Claude Code
"Analyze ALB logs from S3 and investigate errors from the past week"
"Find requests with slow response times"
"Analyze requests from a specific IP address"
Database File Location
By default, the DuckDB database file is stored at:
/tmp/alb-log-analyzer-${USER}/alb_analysis.duckdb
Important: The database file is automatically deleted when you run setup or load commands. This ensures:
- Clean start with each analysis
- No accumulation of old data in /tmp
- Reduced disk space usage
To use a custom location, set the DB_FILE environment variable:
export DB_FILE=/path/to/custom/database.duckdb
Key Features
Error Analysis
Analyze HTTP errors, status code distributions, and error patterns:
./scripts/analyze.sh errors
Provides:
- Status code distribution
- Error details (non-200 responses)
- 5xx errors by hour
- Common error reasons
Performance Analysis
Analyze response times and latency:
./scripts/analyze.sh performance
Provides:
- Response time statistics (avg, p50, p95, p99)
- Slowest requests
- Response time trends by hour
- Slow request percentage
Custom Queries
Execute custom SQL queries:
# Using a custom SQL file
./scripts/analyze.sh query my_query.sql
# Or directly with DuckDB
duckdb /tmp/alb-log-analyzer-${USER}/alb_analysis.duckdb
AWS Credentials Setup
DuckDB needs AWS credentials to access S3. Choose one of these methods:
Option 1: AWS CLI Configuration (Recommended)
If you have AWS CLI configured, DuckDB can use those credentials:
# Check if AWS CLI is configured
aws s3 ls
# If not configured
aws configure
Option 2: Environment Variables
Export credentials in your shell:
export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
export AWS_DEFAULT_REGION=ap-northeast-1
Option 3: AWS Credentials File
Create or edit ~/.aws/credentials:
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
For detailed configuration options, see SKILL.md.
Troubleshooting
S3 Access Issues
If you get "Access Denied" or credential errors when loading from S3:
# Diagnose the issue
./scripts/analyze.sh diagnose
# Verify AWS credentials work
aws s3 ls s3://your-bucket/path/to/logs/
For detailed troubleshooting guide, see SKILL.md.
Directory Structure
.
├── SKILL.md # Claude Code skill definition and documentation
├── README.md # This file
├── references/ # Reference documentation
│ ├── alb_schema.md # ALB log schema definitions
│ └── query_examples.md # Query examples
└── scripts/ # Analysis scripts
├── analyze.sh # Main analysis script
├── setup_with_profile.sh # Secure setup with AWS profile
├── load_with_profile.sh # Secure log loading with AWS profile
├── load_profile.sh # Profile credentials helper
├── setup.sql # DuckDB initialization (CREDENTIAL_CHAIN)
├── setup_s3_env.sql # DuckDB initialization (environment variables)
├── diagnose_s3.sql # S3 access diagnostics
├── load_template.sql # Log loading template
├── analyze_errors.sql # Error analysis queries
└── analyze_performance.sql # Performance analysis queries
Reference
ALB Log Schema
See references/alb_schema.md for complete field definitions and data types.
Key fields:
- timestamp: Request timestamp
- elb_status_code: HTTP status from ALB
- target_status_code: HTTP status from target
- request: Full HTTP request line
- target_processing_time: Response time from target
- client_ip_port: Client address and port
- error_reason: Error details if applicable
Query Examples
See references/query_examples.md for comprehensive query examples including:
- Error analysis queries
- Performance metrics
- Traffic analysis
- SSL/TLS analysis
- Data export patterns
License
MIT License
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.