Monitoring
Commands, metrics, APIs, and services for monitoring Validators and Proxies.
Monitoring Validators and Proxies
Logging
Several command line options control logging:
-
--verbosity
: Sets logging verbosity.3
outputs logs up toINFO
level and is recommended.4
outputs up toDEBUG
level;5
isTRACE
. -
--vmodule
: Overrides this verblosity in specific modules. For example, to configureTRACE
level logging of consensus activity, useconsensus/istanbul/*=5
. -
--consoleoutput
: Sends output to the given path, or tostdout
. -
(Deprecatedin v1.5)
--consoleformat
: Formats logs for easy viewing in a terminal (term
), or as structured JSON (json
). -
(Introduced in v1.5)
--log.json
: Formats logs as structured JSON (true
), or for easy viewing in a terminal (false
, default option).
Useful messages to record or set up log-based metrics on:
-
msg="Validator Election Results"
: When the last block of any epoch (number
) has been agreed,elected
shows whether the validator was selected in the validator election. -
msg="Elected but didn't sign block"
: This validator was elected but did not have its signature included in the block given bynumber
(in fact, in the child's parent seal). This block could count towards downtime if 12 successive blocks are missed.
Metrics
Celo Blockchain inherits go-ethereum's metrics system, but additional Celo-specific metrics have been added.
Metrics reporting is enabled with the --metrics
flag.
Pull-based metrics are available using the --pprof
flag. This enables the pprof
debugging HTTP server, by default on http://localhost:6060
. The --pprof.addr
and --pprof.port
options can be used to configure the interface and port respectively. If the node is running inside a Docker container, you will need to set --pprof.addr 0.0.0.0
, then on your Docker command line add -p 127.0.0.1:6060:6060
.
Be sure never to expose the pprof
service to the public internet.
Prometheus format metrics are available at http://localhost:6060/debug/metrics/prometheus
.
ExpVar format metrics are available at http://localhost:6060/debug/metrics
.
Support for pushing metrics to InfluxDB is available via --metrics.influxdb
and related flags. This works without the pprof
server.
Note that metric name separators differ between these endpoints.
All metrics are soft-state and are cleared when the process is restarted.
Memory metrics
Memory metrics derived from mstats:
system_memory_held
: Gauge of virtual address space allocated by the Celo Blockchain process, measured in bytes.system_memory_used
: Gauge of Memory in use by the Celo Blockchain process, measured as bytes of allocated heap objects.system_memory_allocs
: Counter for memory allocations made, measured in bytes. Consider monitoring the rate.system_memory_pauses
: Counter for stop-the-world Garbage Collection pauses, measured in nanoseconds. Consider monitoring the rate.
CPU metrics
system_cpu_sysload
: Gauge of load average for the system.system_cpu_syswait
: Gauge of IO wait time for the system.system_cpu_procload
: Gauge of load average for the Celo Blockchain process.
Network metrics
-
p2p_peers
: The number of connected peers. This should remain at exactly1
for a proxied validator (just its proxy). It should remain at a relatively steady level for proxy nodes. -
p2p_ingress
: Counter for total inbound traffic, measured in bytes. Consider monitoring the rate. -
p2p_egress
: Counter for total outbound traffic, measured in bytes. Consider monitoring the rate. -
p2p_dials
: Counter for outbound connection attempts. Consider monitoring the rate. -
p2p_serves
: Counter for accepted inbound connection attempts. Consider monitoring the rate.
Blockchain metrics
chain_inserts_count
: The count of insertions of new blocks into this node's chain. The rate of this metric should be close to constant at0.2
/second.
Validator health metrics
A number of metrics are tracked for the parent of the last sealed block received (i.e. this is always two fewer than the current consensus sequence):
-
consensus_istanbul_blocks_elected
: Counts the number of blocks for which this validator has been elected -
consensus_istanbul_blocks_signedbyus
: Counts the blocks for which this validator was elected and its signature was included in the seal. This means the validator completed consensus correctly, sent aCOMMIT
, its commit was received in time to make the seal of the parent received by the next proposer, or was received directly by the next proposer itself, and so the block will not count as downtime. Consider monitoring the rate. -
consensus_istanbul_blocks_missedbyus
: Counts the blocks for which this validator was elected but not included in the child's parent seal (this block could count towards downtime if 12 successive blocks are missed). Consider monitoring the rate. -
consensus_istanbul_blocks_missedbyusinarow
: (since 1.0.2) Counts the blocks for which this validator was elected but not included in the child's parent seal in a row. Consider monitoring the gauge. -
consensus_istanbul_blocks_proposedbyus
: (since 1.0.2) Counts the blocks for which this validator was elected and for which a block it proposed was succesfully included in the chain. Consider monitoring the rate. -
consensus_istanbul_blocks_downtimeevent
: (since 1.0.2) Counts the blocks for which this validator was elected and for blocks where it is considered down (occurs whenmissedbyusinarow
is >= 12). Consider monitoring the rate.
Consensus metrics
-
consensus_istanbul_core_desiredround
: Current desired round for this validator, i.e the round we are waiting to see a quorum of validators sendRoundChange
messages for. Usually this value should be0
. Desired rounds increment with each timeout, which backoff exponentially. A value of5
indicates consensus has stalled for more than 30 seconds. Values above that means the validator is unable to participate in quorum (either because it is disconnected, out of sync, etc, or because of network partition or failure of other validators). -
consensus_istanbul_core_round
: : Current consensus round for this validator, i.e the round for which this validator has received a quorum ofRoundChange
messages. Usually this value should be0
. If this value is less thanconsensus_istanbul_core_desiredround
the validator is not connected to a quorum of other validators that are also unable to participate (for instance, they did see a proposed block, but this validator did not). If it is equal, it means the validator remains connected to a quorum of other validators but cannot agree on a block. -
consensus_istanbul_core_sequence
: Current consensus sequence number, i.e the block number currently being proposed.
Network consensus health metrics
-
consensus_istanbul_blocks_totalsigs
: The number of validators whose signatures were included in the child's parent seal. This can be used to determine how many validators are up and contributing to consensus. If this number falls towards two thirds of validator set size, network block production is at risk. -
consensus_istanbul_blocks_missedrounds
: Sum of theround
included in theparentAggregatedSeal
for the blocks seen. That is, the cumulative number of consensus round changes these blocks needed to make to get to this agreed block. This metric is only incremented when a block is succesfully produced after consensus rounds fails, indicating down validators or network issues. -
consensus_istanbul_blocks_missedroundsasproposer
: (since 1.0.2) A meter noting when this validator was elected and could have proposed a block with their signature but did not. In some cases this could be required by the Istanbul BFT protocol. -
consensus_istanbul_blocks_validators
: (since 1.0.2) Total number of validators eligible to sign blocks. -
consensus_istanbul_core_consensus_count
: Count and timer for succesful completions of consensus (Usequantile
tag to find percentiles:0.5
,0.75
,0.95
,0.99
,0.999
)
Management APIs
Celo blockchain inherits and extends go-ethereum's Javascript console, exposing management APIs and web3 DApp APIs.
Connect a client using a variant of the attach
command line option:
geth attach --datadir DATADIR
geth attach ipc:PATH/TO/geth.ipc
geth attach http://localhost:8545
geth attach ws://localhost:8546
Community Monitoring Tools
Atalma Signature & Attestation Viewer (Celo Vido)
- Visualizer of current and historic data on validator signatures collected in each block on Mainnet and Baklava.
- Visualizer of current and historic attestation requests and completions, and attestation endpoint versions and status on Mainnet and Baklava.
Virtual Hive Celo Network Validator Exporter
Prometheus exporter that scrapes downtime and meta information for a specified validator signer address from the Celo blockchain. All data is collected from a blockchain node via RPC.