Monitoring
153 skills in DevOps > Monitoring
workers-observability
Cloudflare Workers observability with logging, Analytics Engine, Tail Workers, metrics, and alerting. Use for monitoring, debugging, or tracing, or when encountering log parsing, metric aggregation, or alert configuration errors.
query-grafana-tempo
Query and explore distributed traces in Grafana Tempo. Use when a user asks to pull tracing data for analysis.
aws-cost-operations
This skill provides AWS cost optimization, monitoring, and operational best practices with integrated MCP servers for billing analysis, cost estimation, observability, and security assessment.
observability-monitoring
Structured logging, metrics, distributed tracing, and alerting strategies
senior-observability
Comprehensive observability skill for monitoring, logging, distributed tracing, alerting, and SLI/SLO implementation across distributed systems. Includes dashboard generation, alert rule creation, error budget calculation, and metrics analysis. Use when implementing monitoring stacks, designing alerting strategies, setting up distributed tracing, or defining SLO frameworks.
langfuse-observability
LLM observability with self-hosted Langfuse - tracing, evaluation, monitoring, prompt management, and cost tracking
logging-observability
Guidelines for structured logging, distributed tracing, and debugging patterns across languages. Covers logging best practices, observability, security considerations, and performance analysis.
clickhouse-grafana-monitoring
ClickHouse analytics and Grafana dashboard configuration for Vigil Guard v2.0.0 monitoring. Use when querying logs, analyzing 3-branch detection metrics, creating dashboards, investigating events, working with n8n_logs database, managing retention policies, or monitoring branch performance (branch_a_score, branch_b_score, branch_c_score).
observability-setup
Setting up Prometheus metrics, OpenTelemetry tracing, and health endpoints for Nais applications
backend-pino
High-performance structured JSON logging for Node.js. Use when building production APIs that need fast, structured logs for observability platforms (Datadog, ELK, CloudWatch). Provides request logging middleware, child loggers for context, and sensitive data redaction. Choose Pino over console.log for any production TypeScript backend.
langsmith-fetch
Debug LangChain and LangGraph agents by fetching execution traces from LangSmith Studio. Use when debugging agent behavior, investigating errors, analyzing tool calls, checking memory operations, or examining agent performance. Automatically fetches recent traces and analyzes execution patterns. Requires langsmith-fetch CLI installed.
python-observability-patterns
Observability patterns for Python applications. Triggers on: logging, metrics, tracing, opentelemetry, prometheus, observability, monitoring, structlog, correlation id.
tmux
Manage concurrent background processes using tmux. Use when spawning dev servers, running long-running tasks, monitoring multiple processes, or capturing output from background commands without blocking the main session.
event-driven-file-watching
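Specialized skill for file system watching centered on the Chokidar library. Designs and implements efficient file change detection via the Observer Pattern, cross-platform support, and a loosely coupled notification system built on EventEmitter. Anchors: • Node.js EventEmitter / Applies to: event-driven design / Purpose: loosely coupled notification mechanism • Chokidar Documentation / Applies to: file watching configuration / Purpose: cross-platform watching • Observer Pattern (GoF) / Applies to: event notification design / Purpose: separating change detection from notification. Trigger: Use when implementing file system watching, Chokidar configuration, file change detection, or event-based file monitoring systems. Keywords: file watching, chokidar, fs watch, file change, event emitter, observer pattern, hot reload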
deploying-monitoring-stacks
This skill deploys monitoring stacks, including Prometheus, Grafana, and Datadog. It is used when the user needs to set up or configure monitoring infrastructure for applications or systems. The skill generates production-ready configurations, implements best practices, and supports multi-platform deployments. Use this when the user explicitly requests to deploy a monitoring stack, or mentions Prometheus, Grafana, or Datadog in the context of infrastructure setup.
setting-up-distributed-tracing
This skill automates the setup of distributed tracing for microservices. It helps developers implement end-to-end request visibility by configuring context propagation, span creation, trace collection, and analysis. Use this skill when the user requests to set up distributed tracing, implement observability, or troubleshoot performance issues in a microservices architecture. The skill is triggered by phrases such as "setup tracing", "implement distributed tracing", "configure opentelemetry", or "add observability to microservices".
file-watcher-observability
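Skill for implementing observability for file watching systems based on the three pillars (Metrics, Logs, Traces). Supports SLA compliance measurement, performance monitoring, and failure root cause analysis through Prometheus/Grafana integration. Anchors: • Observability Engineering (Charity Majors) / Applies to: three-pillar design / Purpose: integrating metrics, logs, and traces • Google SRE Book / Applies to: golden signals / Purpose: SLI/SLO design • Prometheus Documentation / Applies to: metric naming conventions / Purpose: standards-compliant implementation. Trigger: Use when implementing observability for file watcher systems, setting up Prometheus/Grafana monitoring, designing SLI/SLO metrics, or analyzing production performance issues.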
observability
Distributed tracing with Jaeger, OpenTelemetry, and observability platforms for microservices insights
observability-pillars
Skill for the integrated design of the three pillars of observability (logs, metrics, traces). Enables linking via correlation IDs and bi-directional navigation (metrics → logs → traces). Anchors: • Observability Engineering (Charity Majors) / Applies to: three-pillar integration patterns / Purpose: high-cardinality observability • Google SRE Book / Applies to: metrics design and SLI/SLO / Purpose: reliability engineering • W3C Trace Context / Applies to: distributed tracing standard / Purpose: interoperable trace propagation. Trigger: Use when integrating logs, metrics, and traces with correlation IDs, designing bi-directional navigation between pillars, implementing OpenTelemetry, or setting up high-cardinality observability. Keywords: observability, three pillars, logs, metrics, traces, correlation ID, OpenTelemetry, tracing, distributed systems