Nicolas Mohnblatt

ZODA: An Explainer

Alex Evans, Nicolas Mohnblatt, Guillermo Angeris, Sanaz Taheri, Nashqueue

This article is cross-posted from the Bain Capital Crypto Insights.


Data availability sampling (DAS) is critical to scaling blockchains while maintaining decentralization [ASB18,HASW23]. In our previous post, we informally introduced DAS and showed how to leverage interaction to reduce overhead for a promising construction based on the FRI protocol. Our newest work, ZODA, short for ‘Zero-Overhead Data Availability,’ takes these ideas a step further. In particular, we show that, with a small change, an extended data square construction, such as that used in Celestia, can be turned into a (validity) proof of its own correctness. Importantly, the change incurs negligible additional work for encoders and no incremental communication costs for the network. This post is an informal introduction to the ZODA protocol.

Basics of encoding. To explain ZODA, we briefly sketch how DAS protocols work today. In production, DAS protocols, including those contemplated by Ethereum and implemented by Celestia, block data is organized into a matrix $\tilde X$. This matrix is then encoded into a larger matrix $Z$, called the extended data square. For example, we may first encode by encoding its columns using a Reed–Solomon code matrix $G$. We can then encode the rows of the resulting matrix using another Reed–Solomon matrix $G’$ to obtain a matrix $Z = GXG’^T$ . (This encoding is called the tensor code resulting from encoding columns using $G$ and rows using $G’$.) The procedure is depicted below. In this example, we use a rate of 1/2 for each encoding, meaning the data square $Z$ is $4$ times larger than $\tilde X$.

Sampling. Light nodes then sample entries from this extended data square, $Z$. The example below illustrates this procedure for two light nodes sampling independently. The first node’s samples are represented in blue while the second’s are in red. Purple entries represent entries which both nodes sampled. In this example, each node samples 30 entries and land on the same entry 5 times. If light nodes collectively sample enough distinct entries from the rows or from the columns (for example, at least half of each row or column will suffice), they can get together and erasure decode $\tilde X$. In the below example, the majority of each row and each column was sampled. If the two nodes pool their samples, they can reconstruct the original data $\tilde X$ either from the rows or from the columns, using a standard erasure decoding algorithm for $G$ or $G’$.

Encoding correctness. However, if $Z$ not correctly encoded, nodes might not be able to decode $\tilde X$ (or anything at all). For DAS to work, we need some way of ensuring that $Z$ was constructed correctly. Existing protocols deal with this problem in one of two ways.

Fraud-proof construction. The first idea, initially proposed in [ASB18] and now used in the Celestia protocol, is for full nodes to download all of $\tilde X$ and compute $Z$ themselves as per the first figure. If the result doesn’t match $Z$ at any point, they can alert light nodes with a (compact) fraud proof. Either the row or the column containing the error could be sent to a light node that can check the error directly.

This approach assumes that each light node is connected to at least one honest full node that can furnish a fraud proof in the event of a bad encoding. While there is zero prover overhead, each full node needs to download the entire data and re-compute the extended data square $Z$ itself. Since we assume that at least one of the $N$ full nodes connected to each light node is honest, we ideally want $N$ to be reasonably large. This, in turn means that a lot of redundant data is downloaded (and compute is spent) in aggregate across the network. Making blocks too big also reduces the pool of full nodes that can participate in securing the network, as larger machines with more bandwidth are required as blocks get larger.

2D KZG construction. Another approach, initially proposed in [But20], is to commit to each row and column of $Z$ using the polynomial commitment scheme of KZG [KZG10]. We then compute KZG opening proofs for each entry of $Z$. Each entry of the matrix comes with an opening proof that verifies against the commitment for its associated row or column. The drawback of this approach is that it requires computing KZG commitments for each row and column of the matrix, as well as opening proofs for each individual entry. These proofs are concretely slow and expensive to compute, imposing cost and latency on the network. While it is possible to parallelize the process over many provers, significant network-wide proving overhead (and indeed, proof overhead, as every sampled element carries an associated proof) remains.

Drawbacks. While both approaches to proving the correctness of $Z$ are considered sufficiently practical, they have drawbacks in terms of both efficiency and trust. The fraud-proof-based construction requires redundant bandwidth and computation while assuming that each light node is connected to an honest full node. The KZG-based construction imposes significant proving overhead, assumes a trusted setup, and is not plausibly post-quantum secure.

Other systems. Recently, hashing-based proof systems have emerged as a potential alternative for DAS thanks to the works of [HASW23,HASW24]. These constructions feature low prover overhead, don’t require a trusted setup, and are plausibly post-quantum secure. However, they impose extremely high overheads on the network by requiring each light node to redundantly download a (large) non-interactive proof of proximity. In our previous post, we showed how to leverage interaction to reduce this overhead substantially. ZODA takes this idea to its logical extreme: what if we leveraged interaction to the maximum extent possible to reduce the proof overhead to zero?

ZODA. The main idea for ZODA is simple. In fact, ZODA looks almost identical to the fraud-proof-based construction we presented in the first figure, with one small tweak. As before, we start by encoding the columns of $\tilde X$. In ZODA, we commit to this intermediate encoding (for example, using a Merkle tree). We then use entropy from this commitment to generate a random vector, sampled from some (large enough) field. We then construct a diagonal matrix $D_{\bar r}$ with this vector along the diagonal and multiply $\tilde X$ with $D_{\bar r}$. (This is a linear-time operation which is concretely very fast relative to the encoding itself.) Finally, we then encode the rows again, as before. This amended construction is depicted below.

Surprisingly, this tweak makes the encoding provably correct! Specifically, nodes sample random rows and columns and check that these are valid codewords which are also consistent at their intersection. Repeating this procedure for enough rows and columns establishes two properties with high probability. First, the matrix $Z$ is close to an encoding of a (unique) $\tilde X$ and, second, that the sampled rows and columns whose verification passes are symbols of this unique codeword. In the ZODA paper, we prove these properties by adapting the proximity test of the Ligero [AHIV17] protocol. Intuitively, injecting randomness from the intermediate commitment prevents a malicious encoder from behaving adaptively. Importantly, these rows and columns are also valid samples as part of DAS and can be used to reconstruct the original square. In other words, if enough nodes verify the encoding was correctly constructed, they will, with high probability, also have gathered enough data to reconstruct $\tilde X$. There is zero overhead to the proof: nearly every bit of data that the node downloads as part of the correct-encoding proof is also useful for reconstruction. In the example below, nodes can decode if they collectively sample and validate either half the rows or half the columns.

Figure 1: ZODA verification. A light node verifies that a row and column are valid codewords and are consistent at their intersection (marked with a green checkmark).

Figure 1: ZODA verification. A light node verifies that a row and column are valid codewords and are consistent at their intersection (marked with a green checkmark).

Computational overhead. The construction also has essentially zero prover overhead. This may seem counter-intuitive: usually, when considering general computation, going from a fraud-proof to a validity-proof system (for example, switching from an optimistic to a ZK rollup) imposes a few orders of magnitude of prover overhead. Indeed, if we were to prove the construction of $Z$ in a general-purpose succinct proof, we would also incur similar overhead. However, in the case of ZODA, the validity-provable data square and the fraud-provable data square take about the same time to compute. Why is this? In DAS, encoding $\tilde X$ and committing to the result is already part of the protocol. Moreover, nodes already need to collectively download a large portion of the data square $Z$ in order to ensure that data is available. These features shift the trade-off space to heavily favor proof systems that natively rely on error correcting codes and hashing. In particular, notice that the fraud-provable data square construction of [ASB18] and Celestia already very closely resembles a Ligero prover. Injecting some randomness into the encoding procedure lets us leverage the results of Ligero directly to then ensure that the square is correctly encoded.

Conclusion. What are the implications in practice? The most obvious points relate to increasing scalability and lowering trust assumptions. It is possible to run ZODA with cryptographic security (soundness $2^{−80}$ or $2^{−100}$). Doing so requires downloading only a fraction of the block, while getting similar guarantees to Celestia full nodes today. It may be possible to entirely replace full nodes in a protocol like Celestia with ZODA nodes, saving on network-wide bandwidth and compute, especially for large blocks exceeding 1GB. In the setting of consensus, ZODA nodes can download even less data, enabling significantly higher data throughput as suggested in [Val24]. In the paper, we also discuss a few techniques for enabling resource-constrained light nodes to use ZODA directly to gain higher assurances about their samples. We hope to expand on these ideas in future posts, so stay tuned.

Acknowledgments

We’d like to deeply thank John Adler, Dev Ojha, and Mustafa Al-Bassam for their time and all of the helpful conversations regarding Celestia, data availability sampling, and applications of ZODA to consensus and scaling, which were invaluable in writing this paper. We would also like to thank Kobi Gurkan for helpful edits and suggestions to this post.

Citations

[AHIV17] Scott Ames, Carmit Hazay, Yuval Ishai, and Muthuramakrishnan Venkitasubramaniam. Ligero: Lightweight sublinear arguments without a trusted setup. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 2087–2104, Dallas Texas USA, October 2017. ACM.

[ASB18] Mustafa Al-Bassam, Alberto Sonnino, and Vitalik Buterin. Fraud proofs: Maximising light client security and scaling blockchains with dishonest majorities. CoRR, abs/1809.09044, 2018.

[But20] Vitalik Buterin. 2d data availability with kate commitments, 2020. Accessed: 14 September 2024.

[HASW23] Mathias Hall-Andersen, Mark Simkin, and Benedikt Wagner. Foundations of data availability sampling. Cryptology ePrint Archive, Paper 2023/1079, 2023.

[HASW24] Mathias Hall-Andersen, Mark Simkin, and Benedikt Wagner. FRIDA: Data availability sampling from FRI. Cryptology ePrint Archive, Paper 2024/248, 2024.

[KZG10] Aniket Kate, Gregory M. Zaverucha, and Ian Goldberg. Constant-size commitments to polynomials and their applications. In Masayuki Abe, editor, Advances in Cryptology - ASIACRYPT 2010, pages 177–194, Berlin, Heidelberg, 2010. Springer Berlin Heidelberg.

[Val24] ValarDragon. Use DAS to speedup consensus throughput, 2024. Accessed: 2024-11-09.

#BCC #Data Availability Sampling