How to Monitor n8n Workflows, Performance, or Metrics Using Grafana

Introduction

Automation has become a critical enabler for modern businesses, streamlining workflows, reducing manual tasks, and accelerating time-to-market. Among the many workflow automation platforms available, n8n stands out as a versatile, open-source solution that allows users to create complex automation pipelines by connecting hundreds of services with minimal coding.

As organizations scale their use of n8n, managing and maintaining these workflows efficiently becomes a challenge. Failures, slow executions, or resource bottlenecks can disrupt business processes, leading to delays and lost opportunities. To prevent this, monitoring your n8n workflows continuously for performance, uptime, error rates, and resource consumption is vital.

This is where Grafana comes into play. Grafana is a leading open-source visualization and alerting platform designed for monitoring and observability. Its rich ecosystem supports multiple data sources, including Prometheus, InfluxDB, Elasticsearch, and more. By integrating n8n with Grafana, you can visualize key workflow metrics in real time, create custom dashboards tailored to your needs, and set up alerts that notify your team when issues arise.

In this post, we will guide you through the entire process of monitoring n8n workflows using Grafana. We’ll cover:

  • What metrics matter when monitoring n8n.
  • How to expose these metrics for collection.
  • Setting up Prometheus and Grafana to gather and visualize data.
  • Building effective dashboards and alerts.
  • Advanced monitoring strategies to deepen insights.

Whether you are running n8n for personal projects or mission-critical business workflows, monitoring is an indispensable practice for maintaining stability and driving continuous improvement.

Prerequisites

Before we jump into the technical details, let’s clarify the tools and knowledge you’ll need:

1. A Running Instance of n8n

You can run n8n either on your local machine for testing or in a production environment. For production, n8n can be deployed on:

  • Cloud platforms like AWS, Azure, or DigitalOcean.
  • Containers orchestrated by Kubernetes.
  • Traditional servers.

Self-hosting gives you full control over customization, resource monitoring, and integration with internal tools.

2. A Prometheus Instance

Prometheus is an open-source monitoring system that collects and stores metrics as time series data. It offers a powerful query language (PromQL) and integrates seamlessly with Grafana.

If you prefer, you can use alternative time-series databases such as InfluxDB or TimescaleDB, both supported by Grafana.

3. A Grafana Instance

Grafana serves as the frontend visualization and alerting layer. You can install Grafana on-premises or use Grafana Cloud for a managed service experience.

4. Basic Understanding of Metrics and Dashboards

Concepts to be familiar with include:

  • Metric Types: Counters (monotonically increasing values), Gauges (instantaneous values), Histograms (distributions).
  • Time Series Data: Data points indexed over time.
  • Dashboards: Collections of visual panels like graphs, tables, and alerts.

This foundational knowledge will help you make the most of the monitoring stack.
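
To make the three metric types concrete, here is a minimal, dependency-free Python sketch. The class names and the bucket bounds are illustrative assumptions, not the API of any particular client library.

```python
import bisect


class Counter:
    """Monotonically increasing value, e.g. total executions."""

    def __init__(self) -> None:
        self.value = 0.0

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("counters can only go up")
        self.value += amount


class Gauge:
    """Instantaneous value that can go up or down, e.g. queue length."""

    def __init__(self) -> None:
        self.value = 0.0

    def set(self, value: float) -> None:
        self.value = value


class Histogram:
    """Distribution of observations in buckets, e.g. execution durations."""

    def __init__(self, bounds: list[float]) -> None:
        self.bounds = sorted(bounds)
        # One slot per finite upper bound plus a final +Inf slot.
        self.counts = [0] * (len(self.bounds) + 1)
        self.total = 0.0

    def observe(self, value: float) -> None:
        # Pick the first bucket whose upper bound is >= value.
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.total += value

    def cumulative(self) -> list[int]:
        # Prometheus exposes buckets cumulatively: each includes all smaller ones.
        running, out = 0, []
        for count in self.counts:
            running += count
            out.append(running)
        return out
```

A counter suits execution totals, a gauge suits queue length, and a histogram suits execution durations.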

Understanding What to Monitor in n8n

To monitor n8n workflows effectively, you need to decide what data is important and why.

Key Metrics to Track

  1. Number of Workflow Executions

    Tracking how many times workflows are executed gives insight into system usage patterns and workflow popularity.

  2. Execution Success/Failure Rate

    Knowing how often workflows fail is critical for operational reliability. Failures could indicate bugs, data issues, or external system problems.

  3. Execution Duration

    Tracking how long workflows take to run helps you spot performance degradation or inefficient workflow design.

  4. Resource Usage (CPU, Memory)

    Especially in self-hosted environments, monitoring resource consumption helps avoid system overloads or crashes.

  5. Queue Length and Job Throughput

    If you run n8n in queue mode (recommended for scaling), watching the job queue length and processing rate helps you catch backlogs before they build up.

Why These Metrics Matter

  • Debugging: When a workflow fails or runs slowly, you need data to identify the root cause quickly.
  • Optimization: Monitoring lets you find bottlenecks and optimize workflows for speed and resource efficiency.
  • Capacity Planning: Resource usage data informs decisions about scaling infrastructure.
  • Alerting: Early warnings of failures or performance issues enable proactive incident response.

Exposing Metrics from n8n

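Self-hosted n8n can serve a Prometheus-compatible /metrics endpoint when the N8N_METRICS environment variable is set to true. It is disabled by default and, depending on your version and configuration, may not include the per-workflow counters and durations used in this post. You can fill that gap with custom metrics through several practical approaches.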

Option 1: Custom Webhook Workflows

You can create dedicated n8n workflows that trigger after every execution or error event, gather relevant metrics, and push them to monitoring backends like Prometheus Pushgateway or InfluxDB.

How this works:

  • Use n8n’s Webhook Node or Execution Hooks to capture events.
  • Extract metadata: workflow ID, execution status, start/end times.
  • Format metrics in the expected format.
  • Send metrics via HTTP POST to a pushgateway or InfluxDB endpoint.

This method is highly customizable and requires no changes to n8n source code.
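
The last two steps can be sketched in Python with only the standard library. The event shape (`workflow`, `status`, `started_at`, `stopped_at`), the metric name `n8n_workflow_last_duration_seconds`, and the Pushgateway address are assumptions made for illustration. Inside n8n, the HTTP Request node would send the equivalent request.

```python
import urllib.parse
import urllib.request
from datetime import datetime

PUSHGATEWAY_URL = "http://localhost:9091"  # assumption: default Pushgateway port


def duration_seconds(started_at: str, stopped_at: str) -> float:
    """Execution duration from two ISO 8601 timestamps."""
    start = datetime.fromisoformat(started_at)
    stop = datetime.fromisoformat(stopped_at)
    return (stop - start).total_seconds()


def build_push_request(event: dict, base_url: str = PUSHGATEWAY_URL) -> urllib.request.Request:
    """Build (but do not send) a Pushgateway request for one execution event."""
    workflow = urllib.parse.quote(event["workflow"], safe="")
    # Grouping key in the URL path: /metrics/job/<job>/<label>/<value>
    url = f"{base_url}/metrics/job/n8n/workflow/{workflow}"
    seconds = duration_seconds(event["started_at"], event["stopped_at"])
    body = (
        "# TYPE n8n_workflow_last_duration_seconds gauge\n"
        f'n8n_workflow_last_duration_seconds{{status="{event["status"]}"}} {seconds}\n'
    )
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
```

To send it, pass the result to `urllib.request.urlopen(request, timeout=5)`. The Pushgateway keeps the last value pushed for each grouping key and does not add values up, which is why this sketch pushes a gauge rather than trying to increment a counter remotely.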

Option 2: Using Execution Hooks and Logging

n8n exposes hooks in the execution lifecycle where you can insert custom code snippets or external scripts to log metrics to your monitoring system.

This requires some coding and setup but allows fine-grained control.
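
If a hook writes one JSON object per execution to a log file, a small script can turn those lines into counts. The log format below (keys `workflow` and `status`) is hypothetical. It is an assumption for this sketch, not something n8n emits by default.

```python
import json
from collections import Counter


def tally_executions(log_lines: list[str]) -> Counter:
    """Count executions per (workflow, status) from JSON log lines."""
    counts: Counter = Counter()
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON, e.g. plain-text log output
        if not isinstance(record, dict):
            continue
        if "workflow" in record and "status" in record:
            counts[(record["workflow"], record["status"])] += 1
    return counts
```

The resulting counts can then be rendered in the Prometheus text format or written to another backend.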

Option 3: External Exporter or Agent

Create a small Node.js service using libraries like prom-client to collect metrics by querying n8n’s database or logs and expose them as a Prometheus endpoint.

This approach is more complex but can provide richer and more reliable metrics.
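
Option 3 can be sketched as follows. This version uses Python's standard library in place of Node.js and prom-client, and reads from a hypothetical `executions` table with `workflow` and `status` columns. n8n's real schema differs and varies by version, so treat the SQL as a placeholder. The port number is also an arbitrary choice.

```python
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer


def collect_metrics(conn: sqlite3.Connection) -> str:
    """Render execution counts per workflow and status as Prometheus text."""
    rows = conn.execute(
        "SELECT workflow, status, COUNT(*) FROM executions "
        "GROUP BY workflow, status ORDER BY workflow, status"
    ).fetchall()
    lines = [
        "# HELP n8n_workflow_executions_total Total number of workflow executions",
        "# TYPE n8n_workflow_executions_total counter",
    ]
    for workflow, status, count in rows:
        lines.append(
            f'n8n_workflow_executions_total{{workflow="{workflow}",status="{status}"}} {count}'
        )
    return "\n".join(lines) + "\n"


def make_handler(conn: sqlite3.Connection) -> type:
    """Create a request handler class bound to one database connection."""

    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path != "/metrics":
                self.send_error(404)
                return
            body = collect_metrics(conn).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return MetricsHandler


def serve(conn: sqlite3.Connection, port: int = 9464) -> None:
    """Block and serve /metrics; call this from your own entry point."""
    HTTPServer(("", port), make_handler(conn)).serve_forever()
```

One caveat with counting database rows: if old executions are pruned, the count drops, and Prometheus will interpret the drop as a counter reset.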

Example: Creating a Custom Webhook Workflow to Push Metrics to Prometheus Pushgateway

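  1. Create a new workflow in n8n that starts with a Webhook node, and call it as the final step of each workflow you want to track.
  2. Use the ‘Set’ node to assemble metric data, e.g., updated success and failure totals. The Pushgateway stores the last value pushed rather than adding to it, so the workflow has to keep the running counts itself.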
  3. Use the HTTP Request node to push formatted metrics to the Pushgateway URL.

Here’s an example of the Prometheus metrics format you might send:

Plaintext
# HELP n8n_workflow_executions_total Total number of workflow executions
# TYPE n8n_workflow_executions_total counter
n8n_workflow_executions_total{workflow="SendEmail",status="success"} 10
n8n_workflow_executions_total{workflow="SendEmail",status="failure"} 2

This simple technique lets you gather actionable metrics without modifying n8n’s core.
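
For reference, this is one way to generate that text from a dictionary of counts. The function names are made up for this sketch. The escaping rules (backslash, double quote, newline) follow the Prometheus text exposition format.

```python
def escape_label(value: str) -> str:
    """Escape backslash, double quote and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name: str, help_text: str, samples: dict[tuple[str, str], int]) -> str:
    """Render {(workflow, status): count} in the Prometheus text format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for (workflow, status), count in samples.items():
        labels = f'workflow="{escape_label(workflow)}",status="{escape_label(status)}"'
        lines.append(f"{name}{{{labels}}} {count}")
    # The format expects every line, including the last, to end with a newline.
    return "\n".join(lines) + "\n"
```

Calling `render_counter("n8n_workflow_executions_total", "Total number of workflow executions", {("SendEmail", "success"): 10, ("SendEmail", "failure"): 2})` reproduces the four lines shown above.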

Setting Up the Monitoring Stack

Once you have a way to expose metrics, the next step is setting up the infrastructure to collect and visualize them.

Prometheus Setup

Prometheus scrapes metrics endpoints at regular intervals.

Create or edit your prometheus.yml config file with a job that scrapes your n8n metrics endpoint or the Pushgateway:

YAML
scrape_configs:
  - job_name: 'n8n'
    metrics_path: '/metrics'
    honor_labels: true  # Keep the job and workflow labels pushed to the Pushgateway
    static_configs:
      - targets: ['localhost:9091']  # Replace with your Pushgateway or metrics endpoint

Restart Prometheus to load the new config.

Prometheus stores metrics data locally and provides a powerful query language, PromQL, for querying the data.
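
To confirm that the n8n metrics are actually arriving, you can run an instant query against the Prometheus HTTP API (`/api/v1/query`). The sketch below assumes Prometheus listens on localhost:9090 and that the series carry a `workflow` label.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # assumption: default Prometheus port


def instant_query_url(expr: str, base_url: str = PROMETHEUS_URL) -> str:
    """URL for an instant query against the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def parse_vector(payload: dict) -> dict[str, float]:
    """Map each series' workflow label to its current value."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    return {
        series["metric"].get("workflow", ""): float(series["value"][1])
        for series in payload["data"]["result"]
    }


def run_query(expr: str) -> dict[str, float]:
    """Send the query and parse the response (needs a running Prometheus)."""
    with urllib.request.urlopen(instant_query_url(expr), timeout=5) as resp:
        return parse_vector(json.load(resp))
```

For example, `run_query('sum(n8n_workflow_executions_total) by (workflow)')` returns one total per workflow once the first scrape has completed.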

Grafana Setup

With Prometheus running, set up Grafana to visualize the data:

  1. Access your Grafana instance via browser.
  2. Go to Configuration > Data Sources (Connections > Data sources in recent Grafana versions).
  3. Click Add data source, select Prometheus.
  4. Enter your Prometheus server URL (e.g., http://localhost:9090).
  5. Click Save & Test to verify connectivity.

Grafana can now query Prometheus for n8n metrics.
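
If you prefer to script this step, the same data source can be created through Grafana's HTTP API (`POST /api/datasources`). This is a sketch: the Grafana URL and the API token are placeholders you would supply, and the data source name is arbitrary.

```python
import json
import urllib.request


def build_datasource_request(grafana_url: str, token: str, prometheus_url: str) -> urllib.request.Request:
    """Build (but do not send) a request that registers Prometheus in Grafana."""
    payload = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # Grafana's backend talks to Prometheus, not the browser
        "isDefault": True,
    }
    return urllib.request.Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )
```

Send it with `urllib.request.urlopen(request, timeout=5)` from a machine that can reach Grafana.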

Building Dashboards in Grafana

Dashboards help translate raw data into insights. Below are examples of panels and queries you can create.

Panel 1: Workflow Success vs Failure Over Time

Use a time series graph to plot the number of successful and failed executions over time.

Example PromQL Query for Success:

Promql
sum(increase(n8n_workflow_executions_total{status="success"}[5m])) by (workflow)

For Failure:

Promql
sum(increase(n8n_workflow_executions_total{status="failure"}[5m])) by (workflow)

This panel helps identify trends or spikes in failures.
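
To see what `increase()` is measuring, here is a simplified Python version of the idea: sum the growth between consecutive counter samples and treat any drop as a restart from zero. Prometheus additionally extrapolates to the edges of the time window, so its results can be fractional. This sketch leaves that part out.

```python
def counter_increase(samples: list[float]) -> float:
    """Total increase across time-ordered counter samples, tolerating resets."""
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            # A drop means the counter restarted from zero.
            total += current
    return total
```

With samples `[10, 12, 15]` the increase is 5. With `[10, 12, 3, 4]` the counter reset after 12, so the increase is 2 + 3 + 1 = 6.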

Panel 2: Top Failing Workflows

A bar chart showing workflows with the highest failure counts.

Query:

Promql
topk(10, sum(n8n_workflow_executions_total{status="failure"}) by (workflow))

Panel 3: Execution Time Distribution

If you capture execution durations as histograms or summaries, visualize execution time with percentiles to see how performance varies.

Example:

Promql
histogram_quantile(0.95, sum(rate(n8n_workflow_execution_duration_seconds_bucket[5m])) by (le, workflow))

This shows the 95th percentile execution time, highlighting outliers.
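
The percentile is an estimate, not an exact value: `histogram_quantile` finds the bucket that contains the requested rank and interpolates linearly inside it. The Python sketch below mirrors that idea for classic cumulative buckets. It assumes positive bucket bounds (such as durations) and is a simplification, not Prometheus's actual implementation.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    The last bucket must be the +Inf bucket, as in Prometheus histograms.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must end with a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Index of the first bucket whose cumulative count reaches the rank.
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if index == len(buckets) - 1:
        # The rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0] if len(buckets) > 1 else math.nan
    upper, count = buckets[index]
    lower, below = (0.0, 0.0) if index == 0 else buckets[index - 1]
    in_bucket = count - below
    if in_bucket == 0:
        return lower
    # Assume observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - below) / in_bucket
```

Because of the interpolation, the accuracy of a percentile depends on how closely the bucket bounds match your real execution times.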

Panel 4: Resource Usage (CPU, Memory)

If you collect system metrics using node exporters or cAdvisor, display them with gauges or line graphs.

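Example (CPU utilization as a percentage, derived from the idle-time counter):

Promql
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)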
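
`node_cpu_seconds_total` is a counter of seconds spent in each CPU mode, so a utilization figure comes from how fast the idle counter grows, not from its raw value. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def cpu_busy_percent(idle_start: float, idle_end: float, elapsed_seconds: float, cores: int = 1) -> float:
    """Percentage of CPU time spent non-idle between two samples.

    idle_start and idle_end are idle-seconds counters summed over all cores.
    """
    if elapsed_seconds <= 0 or cores <= 0:
        raise ValueError("elapsed_seconds and cores must be positive")
    idle_fraction = (idle_end - idle_start) / (elapsed_seconds * cores)
    return 100.0 * (1.0 - idle_fraction)
```

For a single core that accumulated 45 idle seconds over a 60-second window, the function reports 25% busy.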

Tips for Dashboard Design

  • Use clear, descriptive panel titles.
  • Group related metrics on the same dashboard.
  • Apply thresholds for color-coded alerts on panels.
  • Use filters and variables to enable dynamic exploration by workflow or time range.
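
As a sketch of the last tip, the snippet below builds a minimal dashboard definition with a `workflow` variable and one panel that filters on it. The field set is trimmed to the essentials and may need adjusting for your Grafana version. The dashboard and panel titles are arbitrary.

```python
import json


def build_dashboard() -> dict:
    """Minimal dashboard model with a workflow variable and one panel."""
    return {
        "title": "n8n Workflows",
        "templating": {
            "list": [
                {
                    "name": "workflow",
                    "type": "query",
                    # Populate the dropdown from the metric's workflow label.
                    "query": "label_values(n8n_workflow_executions_total, workflow)",
                    "includeAll": True,
                    "multi": True,
                }
            ]
        },
        "panels": [
            {
                "title": "Failed executions per workflow (5m)",
                "type": "timeseries",
                "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
                "targets": [
                    {
                        "refId": "A",
                        "expr": (
                            "sum(increase(n8n_workflow_executions_total"
                            '{status="failure",workflow=~"$workflow"}[5m])) by (workflow)'
                        ),
                        "legendFormat": "{{workflow}}",
                    }
                ],
            }
        ],
    }


def dashboard_json() -> str:
    """Serialize the dashboard so it can be imported into Grafana."""
    return json.dumps(build_dashboard(), indent=2)
```

The output of `dashboard_json()` can be pasted into Grafana's dashboard import dialog and then refined in the UI.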

Setting Alerts and Notifications

Grafana’s alerting engine allows you to define conditions that trigger notifications when something goes wrong.

Examples of Useful Alerts:

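  • Workflow failure rate spike:

    Trigger if a workflow fails more than 5 times within 10 minutes.

    Query condition:

Promql
increase(n8n_workflow_executions_total{status="failure", workflow="YourWorkflow"}[10m]) > 5

  • Execution time spike:

    Alert if the average execution time over the last 5 minutes exceeds a threshold (10 seconds here). This assumes the duration is recorded as a histogram or summary, which exposes _sum and _count series.

    Query condition:

Promql
rate(n8n_workflow_execution_duration_seconds_sum{workflow="YourWorkflow"}[5m]) / rate(n8n_workflow_execution_duration_seconds_count{workflow="YourWorkflow"}[5m]) > 10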

Notification Channels

Set up Grafana to notify you via:

  • Slack (using Incoming Webhooks)
  • Email
  • PagerDuty or Opsgenie
  • Microsoft Teams
  • Custom Webhooks for integration with other systems

Alerts help your team react quickly, reducing downtime and customer impact.
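
Grafana's built-in Slack integration formats messages for you. If you route alerts through a custom webhook instead, the receiving service ultimately posts a small JSON body to a Slack Incoming Webhook. The sketch below shows that request. The message wording is arbitrary, and the webhook URL is a placeholder you would take from your own Slack app.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, workflow: str, failures: int, window: str) -> urllib.request.Request:
    """Build (but do not send) a Slack Incoming Webhook message."""
    text = f":warning: n8n workflow *{workflow}* failed {failures} times in the last {window}."
    return urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": text}).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

Sending the request with `urllib.request.urlopen(request, timeout=5)` delivers the message to the channel tied to that webhook.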

Advanced Monitoring Ideas

Once basic monitoring is running, consider these advanced techniques:

Correlate Logs and Metrics Using Loki

Grafana Loki is a log aggregation system designed to work with Grafana. Centralize your n8n logs here, then correlate log events with metrics on dashboards to troubleshoot complex issues faster.
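
Most setups ship logs with an agent, but Loki also accepts logs directly over HTTP at `/loki/api/v1/push`. The sketch below builds the JSON body for one log line. The label names (`app`, `workflow`) are illustrative choices. Reusing the same `workflow` label as your metrics makes it easier to jump between the two in Grafana.

```python
import json
import time
from typing import Optional


def build_loki_payload(line: str, labels: dict[str, str], timestamp_ns: Optional[int] = None) -> str:
    """JSON body for Loki's push API carrying a single log line."""
    if timestamp_ns is None:
        timestamp_ns = time.time_ns()
    payload = {
        "streams": [
            {
                "stream": labels,
                # Loki expects [<unix epoch in nanoseconds as a string>, <log line>].
                "values": [[str(timestamp_ns), line]],
            }
        ]
    }
    return json.dumps(payload)
```

POST the result to your Loki instance with a `Content-Type: application/json` header.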

Track Performance Per User or Client

If your n8n workflows are multi-tenant, label metrics by user ID, API key, or client identifier. This enables you to see which customers are impacted and prioritize support.

Combine with Tracing and Distributed Observability

Integrate with tracing tools like Jaeger or use OpenTelemetry to gain visibility into individual workflow steps and external API calls, giving a full picture of latency and failures.

Conclusion

Monitoring your n8n workflows is not just a nice-to-have; it’s essential for ensuring your automation runs smoothly, efficiently, and reliably. Through careful collection and visualization of key metrics, you can:

  • Detect and fix workflow errors quickly.
  • Optimize performance and resource usage.
  • Plan capacity and scaling.
  • Improve overall system reliability.

This post covered how to:

  • Identify important metrics in n8n.
  • Expose those metrics using webhooks, hooks, or exporters.
  • Set up Prometheus and Grafana to collect and visualize metrics.
  • Build meaningful dashboards and set actionable alerts.
  • Explore advanced observability techniques for deeper insights.

With this monitoring foundation, you’ll be well-positioned to maintain and scale your automation workflows as your business grows.