Data Engineering
525 skills in Data & AI > Data Engineering
observability-patterns
Comprehensive observability setup patterns for Google ADK agents including logging configuration, Cloud Trace integration, BigQuery Agent Analytics, and third-party observability tools (AgentOps, Phoenix, Weave). Use when implementing monitoring, debugging agent behavior, analyzing agent performance, setting up tracing, or when user mentions observability, logging, tracing, BigQuery analytics, AgentOps, Phoenix, Arize, or Weave.
idea-generation
Work with IdeaForge's AI idea generation system. Triggers: generation flow, AI prompts, scoring system, duplicate detection, real-time logs, generation debugging. Pipeline: API → PromptBuilder → AI → Parse → Dedupe → Save.
postcss-config
PostCSS configuration template and validation logic for Tailwind CSS processing with Autoprefixer. Includes 4 required standards (required base plugins, critical plugin order with tailwindcss first and autoprefixer last, file naming as postcss.config.js, required dependencies). Use when creating or auditing postcss.config.js files to ensure correct CSS build pipeline.
run-resource-design
Guide for designing Run resources in OptAIC. Use when creating PipelineRun, ExperimentRun, BacktestRun, PortfolioOptimizationRun, TrainingRun, InferenceRun, or MonitoringRun. Covers execution tracking, metrics, output artifacts, and lineage.
deployment-patterns
Deploy projects to Vercel, Railway, or Docker with platform-specific best practices. Use when deploying applications, configuring deployment settings, debugging deployment failures, or setting up CI/CD pipelines. Triggers on "deploy to vercel", "railway deployment", "docker build", "deployment failed", "configure vercel.json".
error-handling-patterns
Learn when to fail fast versus degrade gracefully. Production-tested error handling strategies for GitHub Actions, CI/CD pipelines, and platform automation.
data-engineering
ETL pipelines, Apache Spark, data warehousing, and big data processing. Use for building data pipelines, processing large datasets, or data infrastructure.
devops
Deploy and manage cloud infrastructure on Cloudflare (Workers, R2, D1, KV, Pages, Durable Objects, Browser Rendering), Docker containers, and Google Cloud Platform (Compute Engine, GKE, Cloud Run, App Engine, Cloud Storage). Use when deploying serverless functions to the edge, configuring edge computing solutions, managing Docker containers and images, setting up CI/CD pipelines, optimizing cloud infrastructure costs, implementing global caching strategies, working with cloud databases, or building cloud-native applications.
backend-development
Build robust backend systems with modern technologies (Node.js, Python, Go, Rust), frameworks (NestJS, FastAPI, Django), databases (PostgreSQL, MongoDB, Redis), APIs (REST, GraphQL, gRPC), authentication (OAuth 2.1, JWT), testing strategies, security best practices (OWASP Top 10), performance optimization, scalability patterns (microservices, caching, sharding), DevOps practices (Docker, Kubernetes, CI/CD), and monitoring. Use when designing APIs, implementing authentication, optimizing database queries, setting up CI/CD pipelines, handling security vulnerabilities, building microservices, or developing production-ready backend systems.
bigquery
Comprehensive guide for using BigQuery CLI (bq) to query and inspect tables in Monzo's BigQuery projects, with emphasis on data sensitivity and INFORMATION_SCHEMA queries.
fp-ts
Master the fp-ts library for typed functional programming in TypeScript, including Option, Either, Task, TaskEither, Reader, State, IO, Array, Record, pipe/flow composition, Do notation, optics (lenses/prisms), and integration with the Effect-TS ecosystem. Use when working with fp-ts data types, composing functional pipelines, handling effects functionally, implementing monadic patterns, or using fp-ts utilities for type-safe functional code.
storyteller
Transform abstract or metaphorical narrative into a concrete visual story structure. USE WHEN: Converting poetic or theatrical narrative from diverse-content-gen into scene-by-scene visual breakdowns ready for screenwriter formatting. PIPELINE POSITION: diverse-content-gen → **storyteller** → screenwriter → production-validator → imagine → arch-v. PRIMARY FUNCTION: Bridge the gap between "altar pribadi" ("personal altar", an abstract metaphor) and "woman returns daily to same beach spot" (a filmable scene). OUTPUT: Scene breakdown with concrete visual actions, preserved emotional core, and story logic documentation.
mongodb
Guide for implementing MongoDB, a document database platform with CRUD operations, aggregation pipelines, indexing, replication, sharding, search capabilities, and comprehensive security. Use when working with MongoDB databases, designing schemas, writing queries, optimizing performance, configuring deployments (Atlas/self-managed/Kubernetes), implementing security, or integrating with applications through 15+ official drivers. (project)
cloudflare-ci-cd-github-actions
Use this skill whenever the user wants to set up, refactor, or maintain a GitHub Actions CI/CD pipeline for deploying Cloudflare Workers/Pages apps (e.g. Hono + TypeScript) with D1/R2, including tests, build, migrations, and multi-environment deploys.
python-modern-cli
Build professional Python CLIs with modern UX patterns using pyfiglet (ASCII banners), typer (commands), questionary (interactions), and rich (formatting). Use when creating command-line tools, automation scripts with user interaction, data processing pipelines with CLI interfaces, or upgrading existing Python scripts to professional CLIs. Ideal for ETL workflows, GIS tools, data analysis utilities, and civic tech projects requiring reproducible, scriptable interfaces.
effect-collections-datastructs
Value-based data structures (Data.struct, tuple, array) and high-performance collections (Chunk, HashSet). Use for safe comparisons and pipelines.
github-archive
Investigate GitHub security incidents using tamper-proof GitHub Archive data via BigQuery. Use when verifying repository activity claims, recovering deleted PRs/branches/tags/repos, attributing actions to actors, or reconstructing attack timelines. Provides immutable forensic evidence of all public GitHub events since 2011.
bloodbank-event-publisher
Complete guide for creating, publishing, and consuming events in the DeLoNET home network's 33GOD agentic developer pipeline. Built on RabbitMQ with strict type safety via Pydantic, async Python (aio-pika), FastAPI, and Redis-backed correlation tracking. This skill is REQUIRED for any work involving the home network event bus.
ontology-verifier-agent
Independent verification agent for the Ontology Builder Pipeline. Cross-validates outputs against inputs, checks traceability, validates constraints, and identifies issues. Use as a separate agent to verify pipeline outputs.
data-pipeline
Implement Scrapy pipeline patterns for data processing, validation, cleaning, and storage when processing scraped items. Automatically creates pipelines for common data workflows including CSV, JSON, database export, and data transformation.