miden-crypto/benches

Latest commit by Qyriad (b151773b0d): feat: implement concurrent Smt construction (#341)
* merkle: add parent() helper function on NodeIndex
* smt: add pairs_to_leaf() to trait
* smt: add sorted_pairs_to_leaves() and test for it
* smt: implement single subtree-8 hashing, w/ benchmarks & tests

This will be composed into depth-8-subtree-based computation of entire
sparse Merkle trees.

* merkle: add a benchmark for constructing 256-balanced trees

This is intended for comparison with the benchmarks from the previous
commit. This benchmark represents the theoretical perfect-efficiency
performance we could possibly (but impractically) get for computing
depth-8 sparse Merkle subtrees.

* smt: test that SparseMerkleTree::build_subtree() is composable

* smt: test that subtree logic can correctly construct an entire tree

This commit ensures that `SparseMerkleTree::build_subtree()` can
correctly compose into building an entire sparse Merkle tree, without
yet getting into potential complications concurrency introduces.

* smt: implement test for basic parallelized subtree computation w/ rayon

Building on the previous commit, this commit implements a test proving
that `SparseMerkleTree::build_subtree()` can be composed into itself not
just concurrently, but in parallel, without issue.

* smt: add from_raw_parts() to trait interface

This commit adds a new required method to the SparseMerkleTree trait,
to allow generic construction from pre-computed parts.

This will be used to add a generic version of `with_entries()` in a
later commit.
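A hypothetical sketch of what such a trait extension might look like (the trait name here reuses a simplified `SparseTree`, and the `ToyTree` type, node type, and exact signature are invented for illustration; the real `SparseMerkleTree` trait in the crate differs):

```rust
// Hypothetical shape of a trait with a from_raw_parts() constructor.
// Names and types are illustrative, not the actual miden-crypto API.
trait SparseTree: Sized {
    type Node;

    // Build the tree generically from pre-computed parts, skipping the
    // per-entry insertion work a normal constructor would do.
    fn from_raw_parts(inner_nodes: Vec<Self::Node>, root: Self::Node) -> Self;
}

// Minimal stand-in implementation to show how a concrete tree type
// would satisfy the trait.
struct ToyTree {
    nodes: Vec<u64>,
    root: u64,
}

impl SparseTree for ToyTree {
    type Node = u64;

    fn from_raw_parts(inner_nodes: Vec<u64>, root: u64) -> Self {
        ToyTree { nodes: inner_nodes, root }
    }
}

fn main() {
    // Pre-computed parts go straight into the tree, no insertions needed.
    let t = ToyTree::from_raw_parts(vec![1, 2, 3], 42);
    assert_eq!(t.root, 42);
    assert_eq!(t.nodes.len(), 3);
}
```

The point of such a method is that a parallel builder can compute all inner nodes up front and then hand them to the tree type in one step.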

* smt: add parallel constructors to Smt and SimpleSmt

What the previous few commits have been leading up to: SparseMerkleTree
now has a function to construct the tree from existing data in parallel.
This is significantly faster than the single-threaded equivalent.
Benchmarks incoming!
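The subtree composition described above can be sketched with plain `std::thread` and a toy merge function. This is only an illustration of the composability property, assuming power-of-two inputs: the crate uses RPO hashing and depth-8 subtrees, while `merge` and `build_subtree` here are invented stand-ins, not the crate's API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::thread;

// Toy stand-in for a cryptographic 2-to-1 merge (NOT the crate's RPO hash).
fn merge(a: u64, b: u64) -> u64 {
    let mut h = DefaultHasher::new();
    (a, b).hash(&mut h);
    h.finish()
}

// Fold a power-of-two slice of leaves up to a single subtree root,
// pairwise per level. Each call is independent of every other call,
// which is what makes subtrees hashable on separate threads.
fn build_subtree(mut level: Vec<u64>) -> u64 {
    while level.len() > 1 {
        level = level.chunks(2).map(|p| merge(p[0], p[1])).collect();
    }
    level[0]
}

fn main() {
    // 256 leaves split into 32 subtrees of 8 leaves each (depth-3
    // subtrees here, standing in for the depth-8 subtrees in the crate).
    let leaves: Vec<u64> = (0..256).collect();

    // Sequential: fold the whole tree in one go.
    let sequential_root = build_subtree(leaves.clone());

    // Parallel: hash each 8-leaf subtree on its own thread, then fold
    // the resulting subtree roots into the final root.
    let handles: Vec<_> = leaves
        .chunks(8)
        .map(|c| {
            let c = c.to_vec();
            thread::spawn(move || build_subtree(c))
        })
        .collect();
    let roots: Vec<u64> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    let parallel_root = build_subtree(roots);

    // Composability: both strategies must agree on the root.
    assert_eq!(sequential_root, parallel_root);
}
```

Because the chunk boundaries align with subtree boundaries, the parallel fold visits exactly the same tree as the sequential one, so the roots match.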

---------

Co-authored-by: krushimir <krushimir@reilabs.co>
Co-authored-by: krushimir <kresimir.grofelnik@reilabs.io>
2024-12-04 10:54:41 -08:00
hash.rs docs: update changelog and readme 2024-02-14 11:52:40 -08:00
merkle.rs feat: implement concurrent Smt construction (#341) 2024-12-04 10:54:41 -08:00
README.md docs: update changelog and readme 2024-02-14 11:52:40 -08:00
smt-subtree.rs feat: implement concurrent Smt construction (#341) 2024-12-04 10:54:41 -08:00
smt-with-entries.rs feat: implement concurrent Smt construction (#341) 2024-12-04 10:54:41 -08:00
smt.rs fix: clippy warnings (#280) 2024-02-21 20:55:02 -08:00
store.rs fix: clippy warnings (#280) 2024-02-21 20:55:02 -08:00

Miden VM Hash Functions

In the Miden VM, we make use of several different hash functions. Some of these are "traditional" hash functions, like BLAKE3, which are optimized for out-of-STARK performance, while others are algebraic hash functions, like Rescue Prime, which are optimized for performance inside the STARK. In what follows, we benchmark several such hash functions and compare them against constructions used by other proving systems. More precisely, we benchmark:

  • BLAKE3 as specified here and implemented here (with a wrapper exposed via this crate).
  • SHA3 as specified here and implemented here.
  • Poseidon as specified here and implemented here (but in pure Rust, without vectorized instructions).
  • Rescue Prime (RP) as specified here and implemented here.
  • Rescue Prime Optimized (RPO) as specified here and implemented in this crate.
  • Rescue Prime Extended (RPX), a variant of the xHash hash function, as implemented in this crate.

Comparison and Instructions

Comparison

We benchmark the above hash functions in two scenarios. The first is 2-to-1 hashing, (a, b) ↦ h(a, b), where a, b, and h(a, b) are digests of the corresponding hash function. The second is sequential hashing, where we take a sequence of 100 field elements and hash it into a single digest. The digests are 4 field elements in a prime field with modulus 2^{64} - 2^{32} + 1 (i.e., 32 bytes) for Poseidon, Rescue Prime, and RPO, and an array [u8; 32] for SHA3 and BLAKE3.
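The two scenarios can be sketched in Rust with a toy mixing function over the stated field. Only the modulus and the digest shape (4 field elements) come from the text above; `hash_2to1` and `hash_elements` are illustrative stand-ins, not RPO or any other real permutation.

```rust
// Prime modulus from the text above: p = 2^64 - 2^32 + 1.
const P: u64 = 0xffff_ffff_0000_0001;

// A digest is 4 field elements (4 * 8 bytes = 32 bytes).
type Digest = [u64; 4];

// Toy 2-to-1 mix (NOT a real algebraic hash): combines two digests
// into one, keeping every limb reduced mod p.
fn hash_2to1(a: Digest, b: Digest) -> Digest {
    let mut out = [0u64; 4];
    for i in 0..4 {
        // widen to u128 so sums and products cannot overflow before reduction
        let x = (a[i] as u128 + b[i] as u128) % P as u128;
        out[i] = ((x * 31 + i as u128 + 1) % P as u128) as u64;
    }
    out
}

// Scenario 2 shape: fold a stream of field elements into one digest
// by repeated 2-to-1 calls over 4-element blocks.
fn hash_elements(elems: &[u64]) -> Digest {
    let mut acc = [0u64; 4];
    for chunk in elems.chunks(4) {
        let mut block = [0u64; 4];
        block[..chunk.len()].copy_from_slice(chunk);
        acc = hash_2to1(acc, block);
    }
    acc
}

fn main() {
    // Scenario 1: (a, b) -> h(a, b) on two digests.
    let d = hash_2to1([1, 2, 3, 4], [5, 6, 7, 8]);
    assert!(d.iter().all(|&x| x < P));

    // Scenario 2: sequential hashing of 100 field elements.
    let seq: Vec<u64> = (0..100).collect();
    let digest = hash_elements(&seq);
    assert!(digest.iter().all(|&x| x < P));
}
```

The benchmarks below time exactly these two call shapes, with the toy mix replaced by each real hash function.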

Scenario 1: 2-to-1 hashing h(a,b)

| Function            | BLAKE3 | SHA3   | Poseidon | Rp64_256 | RPO_256 | RPX_256 |
| ------------------- | ------ | ------ | -------- | -------- | ------- | ------- |
| Apple M1 Pro        | 76 ns  | 245 ns | 1.5 µs   | 9.1 µs   | 5.2 µs  | 2.7 µs  |
| Apple M2 Max        | 71 ns  | 233 ns | 1.3 µs   | 7.9 µs   | 4.6 µs  | 2.4 µs  |
| Amazon Graviton 3   | 108 ns |        |          |          | 5.3 µs  | 3.1 µs  |
| AMD Ryzen 9 5950X   | 64 ns  | 273 ns | 1.2 µs   | 9.1 µs   | 5.5 µs  |         |
| AMD EPYC 9R14       | 83 ns  |        |          |          | 4.3 µs  | 2.4 µs  |
| Intel Core i5-8279U | 68 ns  | 536 ns | 2.0 µs   | 13.6 µs  | 8.5 µs  | 4.4 µs  |
| Intel Xeon 8375C    | 67 ns  |        |          |          | 8.2 µs  |         |

Scenario 2: Sequential hashing of 100 elements h([a_0,...,a_99])

| Function            | BLAKE3 | SHA3   | Poseidon | Rp64_256 | RPO_256 | RPX_256 |
| ------------------- | ------ | ------ | -------- | -------- | ------- | ------- |
| Apple M1 Pro        | 1.0 µs | 1.5 µs | 19.4 µs  | 118 µs   | 69 µs   | 35 µs   |
| Apple M2 Max        | 0.9 µs | 1.5 µs | 17.4 µs  | 103 µs   | 60 µs   | 31 µs   |
| Amazon Graviton 3   | 1.4 µs |        |          |          | 69 µs   | 41 µs   |
| AMD Ryzen 9 5950X   | 0.8 µs | 1.7 µs | 15.7 µs  | 120 µs   | 72 µs   |         |
| AMD EPYC 9R14       | 0.9 µs |        |          |          | 56 µs   | 32 µs   |
| Intel Core i5-8279U | 0.9 µs |        |          |          | 107 µs  | 56 µs   |
| Intel Xeon 8375C    | 0.8 µs |        |          |          | 110 µs  |         |

Notes:

  • On Graviton 3, RPO_256 and RPX_256 are run with SVE acceleration enabled.
  • On AMD EPYC 9R14, RPO_256 and RPX_256 are run with AVX2 acceleration enabled.

Instructions

Before you can run the benchmarks, you'll need to make sure you have Rust installed. After that, to run the benchmarks for RPO and BLAKE3, clone the current repository, and from the root directory of the repo run the following:

cargo bench hash

To run the benchmarks for Rescue Prime, Poseidon, and SHA3, clone the following repository as above, then check out the hash-functions-benches branch, and from the root directory of the repo run the following:

cargo bench hash