
Cardano has proven to be a robust network

Published 23.1.2023

The Cardano network experienced a transient anomaly that caused about 50% of its nodes to disconnect or automatically restart. The incident occurred between blocks 8300569 and 8300570. Both relay nodes and block producer nodes were affected. Some nodes disconnected from their peers. Others threw an exception and restarted automatically.

The Cardano client and the network consensus are designed to deal with this kind of event. Most of the affected nodes recovered automatically and resumed their work, so the network behaved as designed. The impact on block production was minimal and brief. The cause of the problem is unknown at the time of writing. One hypothesis is that a block appeared in the network that caused problems for the nodes that received it.

TLDR

  • There is probably a bug in the Cardano node that needs to be fixed.
  • The network automatically recovered itself without the coordinated efforts of the pool operators.
  • The incident did not affect network security.
  • We need more Cardano node implementations.

The power of decentralization

A decentralized network should be able to handle these types of incidents and keep providing services to users at all times. No one should lose assets. In this context, the incident is both bad and good. It is definitely bad that an unknown event caused nodes to disconnect from peers or restart. There is a bug in the client that needs to be found and fixed. Once the root cause is found, fixing it will very likely be an easy task for the developers.

On the positive side, the network behaved exactly as expected. In many cases, individual nodes automatically restarted and continued their work. The network was able to resynchronize. There was no need to restart the network in any coordinated (i.e. centralized) way. The whole event lasted only a few minutes.

In events like this, a node that has just become the slot leader may not be able to mint its block or propagate it to the network in time. This can theoretically happen to multiple nodes in a row. Users may notice that their transactions take longer to be confirmed.

In this event, some slot leaders very likely failed to mint a block. The affected pool operators will receive a smaller reward from the network. Cardano does not have slashing, so the network will not dramatically punish anyone for a mistake that was not the pool operator's fault.

This is not a serious problem for the network, as long as it is able to recover itself and the production of blocks is resumed within a short while. That is exactly what happened in this incident.

Decentralization is important because even if a significant portion of nodes is unable to provide a service, the remaining functional nodes can still do so. There may be temporary degradation, but not a complete shutdown of the network. If a node does not restart automatically, its pool operator can address the problem individually.

It is important to stress that it was not necessary to restart the network. The individual nodes were able to synchronize and continue to participate in the network consensus after the automatic restart. This is not common with other blockchains which may require a coordinated restart.

Network security was stable during the incident

The network was not at risk of a 51% attack during the incident. Slot leaders in the ongoing epoch are selected using the active stake, which is fixed by an earlier snapshot, so changes to the live stake have no effect on it. An attacker therefore had no opportunity to purchase ADA coins and register new pools in order to use them for an attack during the incident.

Changes to the distribution of ADA coins are accepted when a new snapshot is taken, and the new snapshot is applied with a delay of one epoch.
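The snapshot mechanism described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class `ToyStakeLedger`, its methods, and the pool names are hypothetical, and the one-epoch delay is taken from the description in this article rather than from the Cardano node's actual code.

```python
from typing import Dict

Stake = Dict[str, int]


class ToyStakeLedger:
    """Toy model of live vs. active stake (hypothetical, not the real ledger)."""

    def __init__(self, initial: Stake, delay_epochs: int = 1) -> None:
        # Assumption from the article: a snapshot is applied one epoch later.
        self.delay_epochs = delay_epochs
        self.epoch = 0
        self.live: Stake = dict(initial)
        # snapshots[e] is the live stake captured at the boundary starting epoch e.
        self.snapshots: Dict[int, Stake] = {0: dict(initial)}

    def delegate(self, pool: str, amount: int) -> None:
        """Change the live stake; this does not touch any existing snapshot."""
        self.live[pool] = self.live.get(pool, 0) + amount

    def next_epoch(self) -> None:
        """Cross an epoch boundary and capture a new snapshot of the live stake."""
        self.epoch += 1
        self.snapshots[self.epoch] = dict(self.live)

    def active_stake(self) -> Stake:
        """Stake used for leader selection in the current epoch."""
        source_epoch = max(0, self.epoch - self.delay_epochs)
        return self.snapshots[source_epoch]


ledger = ToyStakeLedger({"honest_pool": 100})
ledger.delegate("attacker_pool", 1000)   # stake bought mid-epoch
stake_during_incident = ledger.active_stake()

ledger.next_epoch()                      # snapshot taken, not yet applied
stake_one_epoch_later = ledger.active_stake()

ledger.next_epoch()                      # the delayed snapshot now applies
stake_two_epochs_later = ledger.active_stake()
```

In this model, stake acquired in the middle of an epoch only influences leader selection after a snapshot has been taken and the delay has passed, which is why a short incident gives an attacker no window to buy influence.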

User coins and tokens could not have been lost during the incident. These are protected by private keys. No one other than the owner of the private key is able to sign the transaction. Digital signatures are verified during network consensus. This is true even if the network's ability to produce blocks is somehow reduced.

We need more client implementations

The community should learn a lesson from the incident. We need more Cardano node implementations. If there were multiple Cardano node implementations with independent teams working on them, the network would be more robust. It would have a much better chance of withstanding these types of unexpected events, and even attacks.

All software contains bugs. Some bugs may never manifest themselves over the lifetime of the software. However, a hacker may discover a bug and exploit it for an attack. For example, imagine that a hacker finds a bug in the part of the code that handles transaction validation. They may be able to craft a transaction that causes the node to crash.

If the entire network were to use the same version of a single node implementation, there is a good chance that a single purposefully crafted transaction could cause a problem for the entire network. An attacker could send the same transaction over and over and would thus have a high chance of disrupting the network for a longer period of time.

This problem can be solved through a larger number of independent implementations, as there is very little chance that different implementations will contain the same bug.

For example, if there were 4 Cardano node implementations and operators used them evenly, an exploited bug would cripple only 25% of the nodes.
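The arithmetic behind this example can be sketched in a few lines. The function `affected_share` and the implementation names are hypothetical, and the sketch assumes that a bug affects every node running the buggy implementation and no node running any other.

```python
from typing import Dict


def affected_share(node_counts: Dict[str, int], buggy_implementation: str) -> float:
    """Fraction of all nodes that run the implementation containing the bug."""
    total_nodes = sum(node_counts.values())
    return node_counts[buggy_implementation] / total_nodes


# One implementation used by everyone: a single bug reaches the whole network.
single = {"impl_a": 3000}
single_share = affected_share(single, "impl_a")

# Four implementations used evenly: the same bug reaches a quarter of the nodes.
diverse = {"impl_a": 750, "impl_b": 750, "impl_c": 750, "impl_d": 750}
diverse_share = affected_share(diverse, "impl_a")
```

With uneven usage the result changes accordingly: if one implementation ran on 70% of the nodes, a bug in it would still affect 70% of the network, so diversity only helps when the alternatives are actually used.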

Most current blockchains tend to have one dominant client implementation. Only a few, such as Ethereum, have multiple alternative implementations that are used by a significant part of the network.

Cardano desperately needs more client implementations. However, it is important to mention that the implementation from IOG is of very high quality. The team proceeds very carefully and builds the software in the same way as mission-critical systems. The Cardano node was able to automatically restore its activity, which demonstrates high resilience to incidents of this kind. Even so, it is always smart to increase network robustness.

Conclusion

There is a theoretical possibility that the problem will recur. It is important to find the root cause and prepare a patch. Only then will we be able to consider this event over.

The incident can be interpreted differently and it is possible that someone will start spreading FUD. It is important to emphasize that a large number of nodes were restarted, but the network was not. Affected Cardano nodes were able to automatically resume their activity. The incident had minimal impact on the network. Cardano has proven to be a robust network that can autonomously deal with this kind of event. It's exactly what you'd expect from a mission-critical project.
