Validator Performance Tracking

Offchain Labs
Apr 19, 2023

Tl;dr: We explain in detail the different components that make up an Ethereum validator's reward. We describe common situations that make stakers miss some of these rewards, and show how to diagnose the (typically uncommon) scenarios in which there are actual problems with the software and action is required.

Twitter: @potuz1

1. Introduction

A common concern of node operators is "Is my node performing well?", or "Is my node performing at least as well as the average?". The essence of the question is really "Should I be changing anything in my setup or not?". It turns out that these questions do not have an easy answer. There have been many attempts to create a single score that measures a validator's performance, but they all fail, for different reasons, to answer the very basic questions above.

Instead of tracking a single indicator (or a ton of useless ones, worrying about every small variance in any of them), validators should analyze why their performance took a hit: Did I miss an attestation due to a late block, a reorg, or something else out of my control? Did I miss a proposal due to my EL being stressed, or was my block reorged because the network received it late?

In this document, we will try to help node operators identify warnings and their causes, and make informed decisions on when to tweak their setup. To do this, we first describe, in some level of technical detail, the different components of a validator's reward. We then analyze each one separately, as they are susceptible to different external factors. Finally, we describe how to exploit existing monitoring tools (Grafana, logs, beacon explorers, etc.) to diagnose causes for drops in rewards.

The organization of this article is as follows:

  • Section 2: The statistically expected variance in income amongst perfectly working validators.
  • Section 3: The components that make up attestation rewards, how much to expect in different scenarios of attestation inclusion, and how to diagnose a missing attestation.
  • Section 4: The analogous situation for proposals.
  • Section 5: An overview of sync committee rewards.
  • Section 6: Transaction tips.
  • Section 7: Tools to monitor performance available in Prysm.
  • Section 8: Getting support from the Prysm team.

2. Reward variance is evil

In this section, we describe the statistically expected variance in income of perfectly running validators, and we show that simply tracking income is a very poor indicator of performance.

A validator earns income (resp. is penalized) for performing different duties (resp. failing to perform them). These duties are:

  1. Attesting
  2. Proposing
  3. Signing sync committee messages

There are other duties that an honest validator must perform which translate neither into earnings when performed nor into penalties when not. These include aggregating attestations, voting for the right ETH1 data block, broadcasting slashable offenses, etc.

One might be tempted to simply take the earnings over a fixed period of time as a score. After all, this is really what node operators want to maximize. However, this is a very poor performance gauge, as we briefly explain now.

For a perfectly running validator, on a perfectly running Ethereum network, and disregarding execution tips (we will talk about those below), rewards would be split as follows: 27/32 or approximately 85% for attestations, 1/8 or approximately 12% for proposals, and 1/32 or approximately 3% for sync committee signatures.

But consider even a perfectly running Ethereum network, with a new block appearing every slot. The expected time (at the time of writing, with 500 000 active validators) to propose a block is a little above 69 days. To make things worse, the probability of not proposing a single block during 120 days is still fairly high, at almost 18%, while the probability of proposing 4 blocks or more is also high, at almost 10%. This feature of Poisson distributions with a very small chance per try creates a fairly large variance between the top-earning validators and the bottom ones, even if both are optimally configured and working in flawless network conditions.
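For the curious, these numbers follow from a quick Poisson approximation. A minimal sketch in Python, using only the assumptions stated above (500 000 active validators and 7200 slots per day):

import math

# a validator expects 7200/500000 proposals per day
rate = 7200 / 500_000
print(1 / rate)        # ~69.4 days of expected wait per proposal
lam = rate * 120       # expected number of proposals in 120 days (~1.73)
print(math.exp(-lam))  # ~0.18: probability of no proposal at all
print(1 - sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(4)))  # ~0.10: 4 or more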

Notice that the above analysis, while true, is critically flawed: an operator comparing themselves against others would be interested in the probability that some validator has proposed 4 blocks during the last 120 days, not a particular one. It is a nice exercise in high school mathematics to check that the probability that some validator will propose 4 or more blocks during 120 days is about 31%.

With the Altair fork, the spread got even worse. A single validator expects to get 3% of its earnings from sync committee participation. However, only 512 validators form such a committee at a time, and the committee remains unchanged for a little over 27 hours! The probability of not being part of any sync committee for two years is almost 52%. Of course, the probability that some validator will be in a sync committee at least once is 100%. Sync committees also made variance worse on short time scales. The expected time to become part of a sync committee is about 1111 days. That means that about 3% of the earnings over that whole time are earned during just 256 epochs (~27 hours) by a validator in a sync committee (and a lucky validator could be in the same committee multiple times!). So if a node operator compares their earnings over, say, a period of 1 day with those of other validators, they do not stand a chance against the top validators that sat in the sync committee during that day.
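Again, a quick back-of-the-envelope check of these probabilities in Python:

NUM_VALS, COMMITTEE_SIZE = 500_000, 512
p = COMMITTEE_SIZE / NUM_VALS  # selection chance per ~27.3-hour period
periods = 2 * 365 * 24 / 27.3  # ~642 sync committee periods in two years
print((1 - p) ** periods)      # ~0.52: never selected in two years
print((1 / p) * 27.3 / 24)     # ~1111 days of expected wait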

In addition to earnings on the beacon chain, validators also earn transaction tips on the execution layer. This is the mother of all variance: one lucky validator not only proposed the merge block but also received 45 ETH for it.

Given the above, it is clear that using raw rewards as an indicator of good performance or correct setup is a very poor choice. There have been attempts to isolate the high-variance components and count only attestation rewards.

But if the reader disregards the very misleading choice of axes in one such published chart (and the fact that it was a closed experiment that could not be reproduced) and focuses on the data, there is very little variation in this measure.

By far the most popular metric is Jim McDonald's attestation effectiveness. This is a clever concept that takes into account the time from when an attestation is produced until it is included in a block. It was more relevant during Phase 0 but lost relevance with Altair. It also does not take missed attestations into account, and it falls victim to the motif of this section: variance. Occasionally, a single attestation happens to be included very late, say 30 slots after it was produced. A perfectly running validator that happens to trigger this effect on a single attestation would see its effectiveness drop to 70%, while a validator that was offline half of the time, thus earning zero in the best of cases, could still show 100%.

So far, we have seen only one source of variance: the sheer number of active validators. There are many other external factors that contribute to variance between validator rewards: the timeliness of block proposals, the performance of other validators during the slot of our duty, the existence of reorgs close to our duty, and many more that we will talk about below. Adrian Sutton had the interesting idea of abstracting these factors away, proposing a single index that measures the percentage of what a validator could possibly have obtained. Such indices may be a good first approximation for telling whether validators are simply not performing well at all: most validators score in the high 90% range, so being below this threshold is a good indicator of bad performance.

3. Anatomy of attestation rewards

In this section, we describe all the different components that make up an attestation reward. We explain how to compute each one and give rough values for the expected rewards at the time of writing. We describe different scenarios in which some component of the attestation is not rewarded and/or is penalized, and how to diagnose the reasons for the missed reward.

An attestation consists of three parts: the source, target, and head. Two of them, the source and the target, are Checkpoints and are known as the FFG part of the attestation. The last one, the head, is a Blockroot. A Checkpoint is itself a pair of an Epoch and a Blockroot, and the two have to be compatible from the validator's point of view of the canonical chain. The rule is as follows: for a checkpoint of the form (Epoch: N, Blockroot: r), the validator looks up the block in the canonical chain that was proposed in the first slot of epoch N. If no canonical block was proposed during that slot, it then looks up the block proposed in the previous slot (that is, the last slot of epoch N-1), and continues until it finds a canonical block b. Then r must be the Blockroot of b. Now suppose your validator is scheduled to attest at a slot S, and write S = 32 * N + p, where N and p are unique under the condition 0 <= p < 32. Then S is a slot during epoch N.
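In pseudocode, the checkpoint rule reads as follows. This is an illustrative sketch, not client code; block_root_at_slot is a hypothetical lookup that returns the root of the canonical block proposed at a given slot, or None if the slot was skipped:

def expected_checkpoint_root(epoch, block_root_at_slot):
    # start at the first slot of the epoch and walk back over skipped slots
    slot = 32 * epoch
    while slot > 0 and block_root_at_slot(slot) is None:
        slot -= 1  # may walk into epoch N-1 and beyond
    return block_root_at_slot(slot)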

When constructing its attestation during S, the honest validator uses as source its last justified checkpoint. If you do not know about justification and finalization, we recommend reading (besides the Casper paper above) this great explanation by Ben Edgington. An attestation that has the wrong source checkpoint, from the point of view of the validators counting them, does not get any reward; it is not even considered, and therefore validators will not include such attestations in their blocks. If your attestation voted for the wrong source, there is something wrong with your validator. The most common culprits are downtime, bandwidth, or other networking issues. In the worst case, it would mean that the validator is following a bad fork of the network; this has only happened on testnets, as during the Medalla incident.

In Prysm, you can check the content of your attestations by searching the validator client logs. Each time the validator submits new attestations, it logs a message like the following:

Submitted new attestations   AggregatorIndices=[454743] AttesterIndices=[454743]   BeaconBlockRoot=0x2e1cf8ecf573 CommitteeIndex=61 Slot=5500683   SourceEpoch=171895 SourceRoot=0x4a44b3dba695 TargetEpoch=171896   TargetRoot=0x5bedf3d17c72 prefix=validator

You can check against a beacon explorer whether the source checkpoint was correct. In the above example, during slot 5500683, that is, at epoch 171896, the justified checkpoint (typically one epoch before) was at epoch 171895, and the first block of that epoch was proposed with Blockroot 0x4a44b3dba695..., which coincides with the above vote.
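If you prefer querying your own node instead of an explorer, the standard Beacon API exposes the justified checkpoint directly. A minimal sketch in Python, assuming Prysm's HTTP gateway on its default port 3500 (adjust the host and port to your setup):

import requests

resp = requests.get("http://localhost:3500/eth/v1/beacon/states/head/finality_checkpoints")
print(resp.json()["data"]["current_justified"])  # {'epoch': '...', 'root': '0x...'}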

Getting the source vote right is the very first thing a validator should do; otherwise, as noted, its attestation won't be included at all. The next component of the attestation is the target vote. This is another checkpoint: the checkpoint of the current epoch at the time of the attestation. In the above example, the first slot of epoch 171896 was proposed in this block. We see that the Blockroot matches, hence the attestation had the right target.

Let us consider a typical example of a bad target vote. This is an attestation of the best validator ever:

level=info msg="Submitted new attestations" AggregatorIndices=[] AttesterIndices=[7654] BeaconBlockRoot=0x6485a33e71ab CommitteeIndex=37 Slot=5610976 SourceEpoch=175342 SourceRoot=0x763c0a821a56 TargetEpoch=175343 TargetRoot=0x6485a33e71ab prefix=validator

The target epoch 175343 has a block proposed in its first slot, and this block is canonical. The Blockroot, however, is 0x0394845889f8... and does not match the voted Blockroot. This validator was penalized. But why did it vote wrong in this case? Was it its fault, or was it the block proposer or the network status? To find out, we need to dig more information out of our logs. The voted Blockroot corresponds to the Parent Root field of the checkpoint block. That is, the validator was supposed to vote for the block of slot 5610976 as the target checkpoint, but ended up voting for the previous block. The most common reason for this is that the block was received late (or took too long to process). To check, we dig into our logs, this time those of the beacon node, and find when we received that block.

level=debug msg="Received block" blockSlot=5610976 graffiti="" prefix=sync proposerIndex=207514 sinceSlotStartTime=4.263404416s

Advice: run with debug level logs to have access to this information!

We see that we received that block 4.2 seconds into the slot, right at the boundary when attestations are sent: the validator sent its attestation 4 seconds into the slot, about 200ms before the block arrived. This does not immediately mean that the block proposer is at fault and that our validating setup is working correctly. It may well be that we have peering issues. We do not know whether only we received that block late, or the whole network received it late because it was proposed late. To find out, we look at the explorer: in the votes pane, we see that only 321 validators voted for this block, while a typical block would get 15K votes. This is a good indication that most of the network didn't see this block in time. In fact, if we look at the block before, at slot 5610975, we find that it received about 30K votes! Both validators voting during that slot and during the next one (as shown in the example above) voted for it. (Incidentally, you can disregard the explorer saying that this block received 45K votes; this is a bug in their vote-counting algorithm that counts certain votes twice.) Armed with this information, we can be certain that the block proposer was late.

This block was proposed using the Flashbots relay; validators that care about the health of the network may want to consider this before trying to extract MEV at all costs. At any rate, this is an unfortunate situation: blocks that arrive at a node between 4 and 12 seconds into the slot are oftentimes not penalized, while the validators that were supposed to vote during that slot are, since they vote for the previous block. Luckily, this is short-lived: there are plans to have all proposers orphan and disregard late blocks. Lighthouse and Prysm have already deployed this and are successfully reorging late blocks on mainnet.

The last component of the attestation is the head vote. Honest validators attest with the Blockroot of the block that they see as the head of the chain at the time of their duty. This is the easiest one to get wrong, for the same reasons described above: if the block proposer is late, but not so late as to be missed by the next proposer, then chances are that the block is included and validators voted for its parent. The same analysis as above can be carried out to see if your attestation had a wrong head vote because the block it attested to was late. This is particularly frustrating for validators that need to attest during the first slot of an epoch (slots that are multiples of 32, colloquially referred to as slot zero). These blocks are more likely to be late, as the producing beacon node was stressed right before, carrying out all the computations for the epoch transition. In that scenario, validators get both the target and the head votes wrong. In the middle of the epoch, when attesting to a late block, validators would typically get source and target right and head wrong.

So far we have talked about what a correct vote is. But voting correctly is not necessarily enough to get the corresponding reward: validators' attestations also need to be included on-chain relatively quickly. Once the source vote is correct, the reward that the attestation receives depends on its timeliness, and timeliness is not entirely up to the validator sending it. A validator performs its duty at a given slot S, and this attestation can be included in blocks starting at S+1. It may be included multiple times (oftentimes this would be a bug in the validator client software), but only the earliest block in which it is included counts. Say this attestation was first included at slot S+k. Then we call k the inclusion distance. The beacon explorer shows the inclusion slot of each attestation in the Attestations pane.

We noted above that 27/32 of the total consensus layer rewards, the majority, come from attestations. To obtain these rewards, the attestation needs to have the right corresponding vote and be included in a timely manner. This means:

  • For the source vote: it is timely if it has an inclusion distance of 5 or less. The attestation receives 7/32 of the total reward or 7/27 ~ 26% of the attestation reward.
  • For the target vote: it is timely if it has an inclusion distance of 32 or less. Since attestations can only be included up to 32 slots after they are created, this is equivalent to just being correct and included. The reward in this case is 13/32 of the total reward or 13/27 ~ 48% of the attestation reward.
  • For the head vote to be timely, the attestation needs to have the minimal inclusion distance, which is 1, that is, be included in the block immediately after the slot in which the attestation was performed. This reward equals that for the source, which is 7/32 of the total reward or 7/27 ~ 26% of the attestation reward.

Penalty: Whenever the source or the target vote is not timely, then in addition to not receiving the reward, the validator is penalized by the corresponding value. This is not the case for the head vote, which is not penalized.
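This reward and penalty logic can be summarized in a short Python sketch, using the Altair participation weights (14 for source, 26 for target, 14 for head, out of a denominator of 64). This is an idealized approximation that ignores the scaling by the network participation rate discussed below:

def attestation_reward(base_reward, source_timely, target_timely, head_timely):
    # 14 + 26 + 14 = 54 out of 64, i.e. the 27/32 attestation share
    reward = 14 if source_timely else -14   # a missed source is penalized
    reward += 26 if target_timely else -26  # a missed target is penalized
    reward += 14 if head_timely else 0      # a missed head is not penalized
    return base_reward * reward / 64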

So far we have been talking about percentages of the total reward, but not about its absolute value. This is because the value of the reward depends not only on the attestation being correct and timely, but also on the number of active validators that were correct and timely during the previous epoch. The beacon node keeps track of the number of validators (or rather their effective balance) and computes the base_reward. This is the ideal reward that a validator would obtain by performing all of its duties correctly during the epoch. It decreases with the square root of the number of validators. At 500 000 validators, rounding off and assuming that all have 32 ETH of effective balance (that is, disregarding slashed or offline validators that have slightly less), this base_reward equals 16 190 Gwei.

Hint: in many of the computations below we will use this value of 16 190 Gwei for the base_reward. In case you are reading this document when the network has many more validators, the formula we have used is

base_reward = 64 * 32 * 10⁹ / sqrt(NUM_VALIDATORS * 32 * 10⁹)

where NUM_VALIDATORS is the number of active validators at the time of computing. This is only an approximation, disregarding slashings and similar, but it has been good enough throughout the last couple of years. The 10⁹ converts 32 ETH to Gwei, and the factor of 64 accounts for the BASE_REWARD_FACTOR preset constant.
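In Python, for instance:

import math

def base_reward_gwei(num_validators):
    # assumes every validator has the maximum effective balance of 32 ETH
    total_balance_gwei = num_validators * 32 * 10**9
    return 64 * 32 * 10**9 / math.sqrt(total_balance_gwei)

print(base_reward_gwei(500_000))  # ~16 190 Gwei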

Returning to our example, a perfectly working validator on a perfectly running network should expect about 27/32 * base_reward = 27/32 * 16 190 ≈ 13 662 Gwei per epoch. This matches what validators currently receive, because the beacon chain has been incredibly stable since its launch in December 2020 (a testament to the impossible engineering feat that the teams and the community at large have achieved). If the network weren't working correctly, that number would be multiplied by the percentage of validators that actually did perform their duties correctly. If enough validators do not perform their duties correctly, the network starts penalizing all validators (some a lot, some very little) until the badly behaving ones are ejected. For more information on these topics, we recommend Ben Edgington's text linked above. We will restrict our analysis to the realistic case of a smoothly working network with a bump here or there when a late block arrives or there is a short reorg.

To conclude this section on attestation rewards, let’s analyze a few typical scenarios:

  • The validator got all attestation rewards. This is the most common scenario, and it should happen in almost all epochs for a correctly set-up validator. It gets 100% of the attestation reward, currently at about 13 662 Gwei.
  • The validator gets the source and target but fails the head vote. This scenario is quite common: about 2% of votes miss the head reward. This is mostly due to the above case of late blocks, which, coincidentally, are about 2% of blocks on mainnet. In this scenario, the validator receives 10/16 of the total reward, or about 10 119 Gwei.
  • The validator gets the source vote but fails the target and head votes. This is the typical situation when attesting at slot zero and the block is late, as in the example above. In this case, the validator is penalized by 3/16 of the total reward, or about 3 036 Gwei.
  • The validator gets the target correctly but gets neither the head nor the source timely. This happens when the attestation is included in a very late block. The most common situation is that the validator attested correctly, but the first block that included this attestation was reorged out. When blocks are full, the next proposers may choose not to include this validator's attestation. This is particularly likely if the head vote was on a late block, for example, and this validator happened to see it early. In this case, the next proposer faces a choice when filling its block: either include an aggregated attestation with lots of validators voting for one head, or this lone validator's attestation (or possibly a small aggregate). Since proposers are paid by the number of attestations included, they choose not to include this reorged attestation. A later proposer, with space in its block, includes it, but more than 5 slots have passed, and that is why it is only rewarded for its target vote. In this case, the validator is rewarded 3/16 of the total reward, or about 3 036 Gwei.
  • Finally, the worst possible outcome for a validator is that its attestation is not included at all. We will discuss this scenario (and the previous one) in detail below. In this case, the validator is penalized 10/16 of the total reward, or about 10 119 Gwei.
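Reusing the attestation_reward sketch from above with base_reward = 16 190 Gwei reproduces these five scenarios (positive values are net rewards, negative values net penalties, small differences being rounding):

BASE_REWARD = 16_190  # Gwei, at 500 000 active validators
scenarios = {
    "all timely":       (True, True, True),    # ~ +13 660
    "missed head":      (True, True, False),   # ~ +10 119
    "source only":      (True, False, False),  # ~  -3 036
    "late target only": (False, True, False),  # ~  +3 036
    "not included":     (False, False, False), # ~ -10 119
}
for name, flags in scenarios.items():
    print(name, round(attestation_reward(BASE_REWARD, *flags)))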

4. Proposer rewards

In this section, we cover validator proposals from the consensus layer perspective, how much a validator should expect to receive from rewards, and the different components that make up this consensus-level reward.

Validators receive, in the long term, 1/8 of their total rewards from proposals. Proposers receive consensus layer rewards by including:

  • Attestations
  • Sync committee messages
  • Slashings

They receive a reward for each timely attestation that they include for the first time on chain. This is a case where the economic interests of attesting validators and proposing ones align: the attesters want to perform their duties as early as possible, to propagate them and make them visible to the next proposer, and the proposers want to include them as early as possible to obtain their part of the reward. For each correct and timely attestation, the proposer receives base_reward * 27/224, approximately 12% of the total reward, or about 1952 Gwei (assuming a perfect network with 500 000 active validators, as of today). In the perfect scenario where all attestations are timely and all reach the proposer on time, a block will include 500 000/32 such attestations. That is, the proposer receives approximately 30 493 391 Gwei from attestations alone. Of course, there are bumps here and there: a proposer that reorgs a block (or simply follows a skipped slot) can include attestations from the orphaned block and nearly double this gain, while a proposer that is not well connected will not have seen many new attestations to include. There is very little that an operator can do to control these scenarios. However, if the reward when proposing a block is not in line with the numbers mentioned above, and the operator's blocks include very few attestations, the likely culprit is having very few peers or a bad internet connection, so that the proposer does not get enough aggregators in its mesh. Another parameter that may influence the total reward from including attestations is whether many validators were offline or attesting poorly at the time of the proposal, as the proposer then gets less per attestation included.

Another component of the proposer reward comes from sync committee messages. Assuming again perfect participation (in practice, sync committee participation has not been entirely perfect, but it has been consistently above 96% since the Altair fork), the proposer gets

NUM_VALS * base_reward / (7 * 2**10)

which with 500 000 validators equals 1 129 384 Gwei.

Summing the perfect attestation plus sync committee inclusion rewards, a proposer gets

NUM_VALS * base_reward / 256

which at 500 000 validators totals 31 622 777 Gwei.
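A quick Python check of both proposer components under the same idealized assumptions (full participation, 500 000 validators):

NUM_VALS, BASE_REWARD = 500_000, 16_190
atts = NUM_VALS / 32 * BASE_REWARD * 27 / 224  # ~30.5M Gwei from attestations
sync = NUM_VALS * BASE_REWARD / (7 * 2**10)    # ~1.13M Gwei from the sync aggregate
print(atts + sync)                   # ~31.6M Gwei per block
print(NUM_VALS * BASE_REWARD / 256)  # the same total via the closed formula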

The last component of consensus layer rewards that a proposer can receive is whistleblower rewards from the inclusion of slashings. If your validator is lucky enough to include these in the event of an attack on the network, good for you for protecting it! However, these events are very rare, and in all known instances so far they were unintentionally caused by operator error, so you should not feel fortunate profiting from them. Slashings are so rare that it is pointless to include them in any performance evaluation.

5. Sync Committee Messages

The last reward that validators get for performing their consensus layer duties comes from submitting sync committee messages. We have already discussed how often a single validator expects to be in a sync committee, making this the highest-variance reward among the consensus layer components. Per signed message, a validator expects to get

NUM_VALS * base_reward / 2**19

which today is about 15 441 Gwei. Validators in the sync committee expect to get this every slot during 256 epochs (~27 hours); with perfect participation, this amounts to 126 491 106 Gwei, or about 0.13 ETH. But there is a caveat: if they fail to perform their duty, they are penalized by the same amount. So, if you plan to keep your validator offline for some time, make sure it is not in the current or the next sync committee (there is a Beacon API endpoint for this).
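A quick sketch of these numbers, plus an example of checking duties through the standard Beacon API sync duties endpoint (the host, port, epoch, and validator index below are placeholders; adjust them to your setup):

import requests

NUM_VALS, BASE_REWARD = 500_000, 16_190
per_msg = NUM_VALS * BASE_REWARD / 2**19  # ~15 440 Gwei per slot (the ~15 441 above, up to rounding)
print(per_msg * 32 * 256 / 10**9)         # ~0.126 ETH over the 256-epoch period

# is validator 454743 in the sync committee for epoch 171896?
resp = requests.post("http://localhost:3500/eth/v1/validator/duties/sync/171896",
                     json=["454743"])
print(resp.json())  # a non-empty duties list means it is in the committee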

6. Transaction Tips

We have covered consensus layer rewards above and what makes up a good attestation, proposal, and sync committee message. Another reward that validators get relatively regularly is transaction tips from the blocks they propose. These rewards vary highly from block to block and, while they may be affected by the CL client, they are mostly dictated by the status of the network at the time (was there an NFT drop at that time? was the network under heavy usage? were prices changing dramatically?). There isn't much that can be done to debug performance with regard to transaction tips. If you are using MEV-Boost or an external relayer, there is very little, if anything, that you can do to alter this metric. If you are using a local execution client and your blocks systematically use less than the gas target (unfortunately, you will have to run several validators to measure this), then something is most probably wrong with your EL: the first culprits would be bad peering (do you have enough peers on your EL?), a stressed CPU, or bad bandwidth. If those are under control and look healthy, and your blocks still underperform, please contact us so that we can debug the issue.

7. Monitoring tools

Prysm offers different monitoring tools to track the performance of your validators. The very first line of attack is your logs. As a rule, you should try to run your beacon node with debug-level logs; you can achieve this with the flag --verbosity=debug. We have already been through a detailed example of attestation analysis above: logs include information on when you missed an attestation reward, the block that you voted for, the arrival time of that block, the time it took to process it, and so on. They have much more information than you would think: if there was a reorg, what the weights of each tip were at the time; if the execution client was delayed for some reason, what the error message was; etc. In the validator client logs, you will find the exact time you submitted your attestation, how many peers you had at that time, which block you voted for, and more. Your first line of defense when analyzing performance should always be the logs. Whenever filing a bug report or requesting assistance to understand an issue, debug logs are oftentimes what will help you the most.

Besides logs, Prysm exposes plenty of metrics that can be scraped by a client like Prometheus and visualized with software like Grafana. If you are experiencing systematic performance degradation, then rather than looking at attestation data, the first set of metrics you want to check is whether there was higher memory consumption, CPU utilization, oscillating network bandwidth, etc. A drop in peers typically means a short network outage, and this directly affects the attestations that you are submitting at that time. A consistently low peer count affects how many aggregators you are directly peered with, which in turn affects your chances of getting your attestation included.

If, after collecting this general information, everything looks OK but you still see lower performance than you would expect, you can track attestation data on Grafana. It does not take much time to build tables where you can see if you missed a particular head or target vote, and then check your logs for the timeliness of the arriving blocks (although that is also exposed as a metric). You can track metrics like state_transition_processing_milliseconds_* that will tell you if there was a particular issue at the time you missed an attestation. Forkchoice information under doublylinkedtree_* is also helpful for understanding the status of the beacon chain at the time: you can tell whether there was an orphaned block that affected your vote, and similar.

Finally, if you really want very verbose logging of your validator performance, including for validators that you may not be running on that node, you can use our validator monitor (enabled with the --monitor-indices flag). It reports absolutely everything the node sees related to the given validator indices. The beacon node running the validator monitor does not need to be the one your validator client is connected to.

8. Getting support

If you have gone over the above recommendations and still believe that you are receiving fewer rewards than you should, or than a normally working validator would, you can contact us on our Discord server. Please post a description in the public channels, and DM one of our team members your full, unredacted logs, preferably debug logs from all three components: execution client, beacon node, and validator client. We will be happy to assist when provided with enough information.

Happy Staking!
