Server Metrics

Conductor uses spectator to collect the metrics.

To enable conductor serve to publish metrics, add this dependency to your build.gradle.
Create your own AbstractModule that overides configure function and registers the Spectator metrics registry.
Initialize the Registry and add it to the global registry via ((CompositeRegistry)Spectator.globalRegistry()).add(...).

The following metrics are published by the server. You can use these metrics to configure alerts for your workflows and tasks.

Name	Purpose	Tags
workflow_server_error	Rate at which server side error is happening	methodName
workflow_failure	Counter for failing workflows	workflowName, status
workflow_start_error	Counter for failing to start a workflow	workflowName
workflow_running	Counter for no. of running workflows	workflowName, version
workflow_execution	Timer for Workflow completion	workflowName, ownerApp
task_queue_wait	Time spent by a task in queue	taskType
task_execution	Time taken to execute a task	taskType, includeRetries, status
task_poll	Time taken to poll for a task	taskType
task_poll_count	Counter for number of times the task is being polled	taskType, domain
task_queue_depth	Pending tasks queue depth	taskType, ownerApp
task_rate_limited	Current number of tasks being rate limited	taskType
task_concurrent_execution_limited	Current number of tasks being limited by concurrent execution limit	taskType
task_timeout	Counter for timed out tasks	taskType
task_response_timeout	Counter for tasks timedout due to responseTimeout	taskType
task_update_conflict	Counter for task update conflicts. Eg: when the workflow is in terminal state	workflowName, taskType, taskStatus, workflowStatus
event_queue_messages_processed	Counter for number of messages fetched from an event queue	queueType, queueName
observable_queue_error	Counter for number of errors encountered when fetching messages from an event queue	queueType
event_queue_messages_handled	Counter for number of messages executed from an event queue	queueType, queueName
external_payload_storage_usage	Counter for number of times external payload storage was used	name, operation, payloadType

Collecting metrics with Log4j

One way of collecting metrics is to push them into the logging framework (log4j). Log4j supports various appenders that can print metrics into a console/file or even send them to remote metrics collectors over e.g. syslog channel.

Conductor provides optional modules that connect metrics registry with the logging framework. To enable these modules, configure following additional modules property in config.properties:

conductor.metrics-logger.enabled = true
conductor.metrics-logger.reportPeriodSeconds = 15

This will push all available metrics into log4j every 15 seconds.

By default, the metrics will be handled as a regular log message (just printed to console with default log4j.properties). In order to change that, you can use following log4j configuration that prints metrics into a dedicated file:

log4j.rootLogger=INFO,console,file

log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %5p [%t] (%C) - %m%n

log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/app/logs/conductor.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=10
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %5p [%t] (%C) - %m%n

# Dedicated file appender for metrics
log4j.appender.fileMetrics=org.apache.log4j.RollingFileAppender
log4j.appender.fileMetrics.File=/app/logs/metrics.log
log4j.appender.fileMetrics.MaxFileSize=10MB
log4j.appender.fileMetrics.MaxBackupIndex=10
log4j.appender.fileMetrics.layout=org.apache.log4j.PatternLayout
log4j.appender.fileMetrics.layout.ConversionPattern=%d{ISO8601} %5p [%t] (%C) - %m%n

log4j.logger.ConductorMetrics=INFO,console,fileMetrics
log4j.additivity.ConductorMetrics=false

This configuration is bundled with conductor-server in file: log4j-file-appender.properties and can be utilized by setting env var:

LOG4J_PROP=log4j-file-appender.properties

This variable is used by startup.sh script.

Integration with logstash using a log file

The metrics collected by log4j can be further processed and pushed into a central collector such as ElasticSearch. One way of achieving this is to use: log4j file appender -> logstash -> ElasticSearch.

Considering the above setup, you can deploy logstash to consume the contents of /app/logs/metrics.log file, process it and send further to elasticsearch.

Following configuration needs to be used in logstash to achieve it:

pipeline.yml:

- pipeline.id: conductor_metrics
  path.config: "/usr/share/logstash/pipeline/logstash_metrics.conf"
  pipeline.workers: 2

logstash_metrics.conf

input {

 file {
  path => ["/conductor-server-logs/metrics.log"]
  codec => multiline {
      pattern => "^%{TIMESTAMP_ISO8601} "
      negate => true
      what => previous
  }
 }
}

filter {
    kv {
        field_split => ", "
        include_keys => [ "name", "type", "count", "value" ]
    }
    mutate {
        convert => {
          "count" => "integer"
          "value" => "float"
        }
      }
}

output {
 elasticsearch {
  hosts => ["elasticsearch:9200"]
 }
}

Note: In addition to forwarding the metrics into ElasticSearch, logstash will extract following fields from each metric: name, type, count, value and set proper types

Integration with fluentd using a syslog channel

Another example of metrics collection uses: log4j syslog appender -> fluentd -> prometheus.

In this case, a specific log4j properties file needs to be used so that metrics are pushed into a syslog channel:

    log4j.rootLogger=INFO,console,file

    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %5p [%t] (%C) - %m%n

    log4j.appender.file=org.apache.log4j.RollingFileAppender
    log4j.appender.file.File=/app/logs/conductor.log
    log4j.appender.file.MaxFileSize=10MB
    log4j.appender.file.MaxBackupIndex=10
    log4j.appender.file.layout=org.apache.log4j.PatternLayout
    log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %5p [%t] (%C) - %m%n

    # Syslog based appender streaming metrics into fluentd
    log4j.appender.server=org.apache.log4j.net.SyslogAppender
    log4j.appender.server.syslogHost=fluentd:5170
    log4j.appender.server.facility=LOCAL1
    log4j.appender.server.layout=org.apache.log4j.PatternLayout
    log4j.appender.server.layout.ConversionPattern=%d{ISO8601} %5p [%t] (%C) - %m%n

    log4j.logger.ConductorMetrics=INFO,console,server
    log4j.additivity.ConductorMetrics=false

And on the fluentd side you need following configuration:

    <source>
      @type prometheus
    </source>

    <source>
      @type syslog
      port 5170
      bind 0.0.0.0
      tag conductor
        <parse>
         ; only allow TIMER metrics of workflow execution and extract tenant ID
          @type regexp
          expression /^.*type=TIMER, name=workflow_execution.class-WorkflowMonitor.+workflowName-(?<tenant>.*)_(?<workflow>.+), count=(?<count>\d+), min=(?<min>[\d.]+), max=(?<max>[\d.]+), mean=(?<mean>[\d.]+).*$/
          types count:integer,min:float,max:float,mean:float
        </parse>
    </source>

    <filter conductor.local1.info>
        @type prometheus
        <metric>
          name conductor_workflow_count
          type gauge
          desc The total number of executed workflows
          key count
          <labels>
            workflow ${workflow}
            tenant ${tenant}
            user ${email}
          </labels>
        </metric>
        <metric>
          name conductor_workflow_max_duration
          type gauge
          desc Max duration in millis for a workflow
          key max
          <labels>
            workflow ${workflow}
            tenant ${tenant}
            user ${email}
          </labels>
        </metric>
        <metric>
          name conductor_workflow_mean_duration
          type gauge
          desc Mean duration in millis for a workflow
          key mean
          <labels>
            workflow ${workflow}
            tenant ${tenant}
            user ${email}
          </labels>
        </metric>
    </filter>

    <match **>
      @type stdout
    </match>

With above configuration, fluentd will: - Listen to raw metrics on 0.0.0.0:5170 - Collect only workflow_execution TIMER metrics - Process the raw metrics and expose 3 prometheus specific metrics - Expose prometheus metrics on http://fluentd:24231/metrics

Collecting metrics with Prometheus

Another way to collect metrics is using Prometheus client to push them to Prometheus server.

Conductor provides optional modules that connect metrics registry with Prometheus. To enable these modules, configure following additional module property in config.properties:

conductor.metrics-prometheus.enabled = true

This will simply push these metrics via Prometheus collector. However, you need to configure your own Prometheus collector and expose the metrics via an endpoint.