Benchmarks

Hash Functions

In the Miden VM, we use several hash functions. Some are "traditional" hash functions, like BLAKE3, which are optimized for performance outside the STARK. Others are algebraic hash functions, like Rescue Prime, which are optimized for performance inside the STARK. In what follows, we benchmark several such hash functions and compare them against constructions used by other proving systems. More precisely, we benchmark:

BLAKE3 as specified here and implemented here (with a wrapper exposed via this crate).
SHA3 as specified here and implemented here.
Poseidon as specified here and implemented here (but in pure Rust, without vectorized instructions).
Rescue Prime (RP) as specified here and implemented here.
Rescue Prime Optimized (RPO) as specified here and implemented in this crate.
Rescue Prime Extended (RPX), a variant of the xHash hash function, as implemented in this crate.

We benchmark the above hash functions in two scenarios. The first is 2-to-1 hashing, $(a,b) \mapsto h(a,b)$, where $a$, $b$, and $h(a,b)$ are all digests of the hash function being benchmarked. The second is sequential hashing, where we hash a sequence of 100 field elements into a single digest. For Poseidon, Rescue Prime, RPO, and RPX, a digest is 4 elements of the prime field with modulus $p = 2^{64} - 2^{32} + 1$ (i.e., 32 bytes); for SHA3 and BLAKE3, it is an array [u8; 32].
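To make the digest format concrete, here is a minimal sketch of arithmetic in the field with modulus $p = 2^{64} - 2^{32} + 1$, and of how a 4-element digest adds up to 32 bytes. It uses a naive `%` reduction on `u128`, not the optimized field implementation the crate relies on. The function names and the little-endian byte order are assumptions made for illustration.

```rust
/// Modulus of the 64-bit prime field: p = 2^64 - 2^32 + 1.
const P: u64 = 0xFFFF_FFFF_0000_0001;

/// Naive reduction of a 128-bit value modulo P (illustration only, not constant-time).
fn reduce(x: u128) -> u64 {
    (x % P as u128) as u64
}

/// Field addition: the sum of two canonical elements always fits in a u128.
fn add(a: u64, b: u64) -> u64 {
    reduce(a as u128 + b as u128)
}

/// Field multiplication via the full 128-bit product.
fn mul(a: u64, b: u64) -> u64 {
    reduce(a as u128 * b as u128)
}

/// A digest of four field elements.
type Digest = [u64; 4];

/// Serialize a digest as 4 x 8 = 32 bytes. Little-endian per element is an
/// assumption of this sketch, not taken from the crate.
fn digest_to_bytes(d: &Digest) -> [u8; 32] {
    let mut out = [0u8; 32];
    for (i, e) in d.iter().enumerate() {
        out[i * 8..(i + 1) * 8].copy_from_slice(&e.to_le_bytes());
    }
    out
}

fn main() {
    // 2^64 = p + (2^32 - 1), so 2^64 reduces to 2^32 - 1 in this field.
    println!("2^64 mod p = {}", reduce(1u128 << 64));
    // p - 1 acts as -1: (-1) * (-1) = 1 and (-1) + 1 = 0.
    println!("(p-1)^2 = {}, (p-1) + 1 = {}", mul(P - 1, P - 1), add(P - 1, 1));
    let digest: Digest = [1, 2, 3, P - 1];
    println!("digest size: {} bytes", digest_to_bytes(&digest).len());
}
```

The identity $2^{64} \equiv 2^{32} - 1 \pmod p$ is what makes this modulus attractive: a 128-bit product can be reduced with a few shifts and additions instead of a general division.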

Scenario 1: 2-to-1 hashing `h(a,b)`

Hardware	BLAKE3	SHA3	Poseidon	Rp64_256	RPO_256	RPX_256
Apple M1 Pro	76 ns	245 ns	1.5 µs	9.1 µs	5.2 µs	2.7 µs
Apple M2 Max	71 ns	233 ns	1.3 µs	7.9 µs	4.6 µs	2.4 µs
Amazon Graviton 3	108 ns	-	-	-	5.3 µs	3.1 µs
Amazon Graviton 4	96 ns	-	-	-	5.1 µs	2.8 µs
AMD Ryzen 9 5950X	64 ns	273 ns	1.2 µs	9.1 µs	5.5 µs	-
AMD EPYC 9R14	83 ns	-	-	-	4.3 µs	2.4 µs
Intel Core i5-8279U	68 ns	536 ns	2.0 µs	13.6 µs	8.5 µs	4.4 µs
Intel Xeon 8375C	67 ns	-	-	-	8.2 µs	-

Scenario 2: Sequential hashing of 100 elements `h([a_0,...,a_99])`

Hardware	BLAKE3	SHA3	Poseidon	Rp64_256	RPO_256	RPX_256
Apple M1 Pro	1.0 µs	1.5 µs	19.4 µs	118 µs	69 µs	35 µs
Apple M2 Max	0.9 µs	1.5 µs	17.4 µs	103 µs	60 µs	31 µs
Amazon Graviton 3	1.4 µs	-	-	-	69 µs	41 µs
Amazon Graviton 4	1.2 µs	-	-	-	67 µs	36 µs
AMD Ryzen 9 5950X	0.8 µs	1.7 µs	15.7 µs	120 µs	72 µs	-
AMD EPYC 9R14	0.9 µs	-	-	-	56 µs	32 µs
Intel Core i5-8279U	0.9 µs	-	-	-	107 µs	56 µs
Intel Xeon 8375C	0.8 µs	-	-	-	110 µs	-
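For the algebraic hashes, each Scenario 2 time is roughly 13 times the corresponding Scenario 1 time (for example, RPO_256 on Apple M1 Pro: 5.2 µs versus 69 µs). One plausible explanation, which these benchmarks do not measure directly, follows from sponge arithmetic. Assume each of these hashes absorbs 8 field elements per permutation, as RPO_256 does with its 12-element state and 4-element capacity. A 2-to-1 hash of two 4-element digests then takes one permutation, while 100 elements take ceil(100/8) = 13. The sketch below does that arithmetic. The rate of 8 for hashes other than RPO_256 is an assumption.

```rust
/// Sponge permutations needed to absorb `n` field elements at `rate` elements per
/// permutation, assuming a partial final block is padded into one extra block.
fn permutations(n: usize, rate: usize) -> usize {
    (n + rate - 1) / rate
}

fn main() {
    // Assumed rate: RPO_256 has a 12-element state with a 4-element capacity.
    const RATE: usize = 8;
    let two_to_one = permutations(8, RATE); // two 4-element digests
    let sequential = permutations(100, RATE); // Scenario 2 input
    println!("2-to-1: {two_to_one} permutation(s); 100 elements: {sequential}");

    // RPO_256 on Apple M1 Pro, Scenario 1: 5.2 µs per 2-to-1 hash.
    let per_permutation_us = 5.2 / two_to_one as f64;
    let predicted_us = per_permutation_us * sequential as f64;
    // Scenario 2 measured 69 µs for the same function and machine.
    println!("predicted: {predicted_us:.1} µs, measured: 69 µs");
}
```

The same ratio holds for RPX_256 (2.7 µs × 13 ≈ 35 µs), Poseidon, and Rp64_256 on the M1 Pro. This suggests that the permutation dominates the cost of these hashes, rather than absorption or padding.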

Notes:

On Graviton 3 and 4, RPO_256 and RPX_256 are run with SVE acceleration enabled.
On AMD EPYC 9R14, RPO_256 and RPX_256 are run with AVX2 acceleration enabled.
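The notes don't say how acceleration is enabled for the crate, for example which build flags or crate features are involved. The sketch below only checks whether a given machine supports these instruction sets, using the Rust standard library's feature-detection macros. It reports both runtime CPU support and whether the binary itself was compiled with the feature, which is what a flag such as `-C target-feature=+avx2` controls. The function name is hypothetical.

```rust
/// Report whether this machine supports the SIMD extensions named in the notes:
/// AVX2 on x86_64 and SVE on aarch64. Each entry is (feature, detected).
fn simd_support() -> Vec<(&'static str, bool)> {
    #[allow(unused_mut)] // stays empty on other architectures
    let mut found = Vec::new();
    #[cfg(target_arch = "x86_64")]
    {
        // Runtime check: does the CPU implement the instructions?
        found.push(("avx2 (runtime)", std::arch::is_x86_feature_detected!("avx2")));
        // Compile-time check: was this binary built with AVX2 enabled?
        found.push(("avx2 (compile time)", cfg!(target_feature = "avx2")));
    }
    #[cfg(target_arch = "aarch64")]
    {
        found.push(("sve (runtime)", std::arch::is_aarch64_feature_detected!("sve")));
        found.push(("sve (compile time)", cfg!(target_feature = "sve")));
    }
    found
}

fn main() {
    for (feature, detected) in simd_support() {
        println!("{feature}: {detected}");
    }
}
```

A CPU can support a feature at runtime while the binary was built without it. In that case, code gated on `cfg!(target_feature = ...)` will not use the accelerated path.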

Sparse Merkle Tree

This crate also builds cryptographic data structures on top of these hash functions. What follows are benchmarks of operations on sparse Merkle trees (SMTs) that use the RPO_256 hash function above. The operations are constructing a tree from 1,000,000 key-value pairs, and inserting or updating a batch of 1,000 values in a tree with 1,000,000 leaves. The benchmarks are run with the smt_hashmaps feature enabled, so that the hashbrown crate is used.
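Batching helps partly because changed leaves share ancestors near the root. The following is a minimal sketch, not the crate's algorithm. It counts how many internal nodes must be rehashed when a set of leaves changes, comparing one update at a time against a single batch in which each shared ancestor is recomputed only once. The tree depth of 64, the pseudo-random leaf indices, and the function names are assumptions for illustration.

```rust
use std::collections::HashSet;

/// Returns (hashes when updating leaves one at a time, hashes for one batch).
/// Leaves sit at `depth`; node (d, i) is the i-th node at depth d, root = (0, 0).
fn nodes_to_rehash(depth: u32, leaf_indices: &[u64]) -> (usize, usize) {
    // One at a time: every update recomputes all `depth` ancestors of its leaf.
    let one_at_a_time = leaf_indices.len() * depth as usize;
    // Batched: an ancestor shared by several changed leaves is recomputed once.
    let mut batched: HashSet<(u32, u64)> = HashSet::new();
    for &leaf in leaf_indices {
        for up in 1..=depth {
            // checked_shr avoids a shift-overflow panic when up == 64.
            let index = leaf.checked_shr(up).unwrap_or(0);
            batched.insert((depth - up, index));
        }
    }
    (one_at_a_time, batched.len())
}

/// Small deterministic xorshift generator for hypothetical leaf indices.
fn xorshift(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

fn main() {
    // Hypothetical parameters: a depth-64 tree and 1,000 random changed leaves.
    let mut seed = 0x9E37_79B9_7F4A_7C15u64;
    let leaves: Vec<u64> = (0..1_000).map(|_| xorshift(&mut seed)).collect();
    let (single, batch) = nodes_to_rehash(64, &leaves);
    println!("one at a time: {single} node hashes; batched: {batch}");
}
```

With random keys, the savings come mostly from the top levels of the tree, where nearly every changed leaf shares the same few ancestors.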

Scenario 1: SMT Construction (1M pairs)

Hardware	Sequential	Concurrent	Improvement
AMD Ryzen 9 7950X	196 sec	15 sec	13x
Apple M1 Air	352 sec	57 sec	6.2x
Apple M1 Pro	351 sec	37 sec	9.5x
Apple M4 Max	195 sec	15 sec	13x
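The Sequential and Concurrent columns compare single-threaded and multi-threaded runs. This README doesn't describe how the crate parallelizes construction. A common approach, sketched below, is to hash disjoint subtrees on separate threads and then combine their roots. In this sketch, `merge` is a non-cryptographic placeholder built on the standard library's `DefaultHasher`, not RPO_256. The dense power-of-two tree is also a simplification of a sparse Merkle tree, and all names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::thread;

/// Placeholder 2-to-1 compression. NOT cryptographic; it stands in for RPO_256.
fn merge(a: u64, b: u64) -> u64 {
    let mut h = DefaultHasher::new(); // fixed keys, so results are deterministic
    (a, b).hash(&mut h);
    h.finish()
}

/// Root of a perfect binary tree over `leaves` (length must be a power of two).
fn root(leaves: &[u64]) -> u64 {
    assert!(leaves.len().is_power_of_two());
    let mut level = leaves.to_vec();
    while level.len() > 1 {
        level = level.chunks(2).map(|p| merge(p[0], p[1])).collect();
    }
    level[0]
}

/// Same root, but the leaves are split into `parts` independent subtrees that
/// are hashed on separate threads; their roots are then combined.
fn root_concurrent(leaves: &[u64], parts: usize) -> u64 {
    assert!(parts.is_power_of_two() && leaves.len() % parts == 0);
    let chunk = leaves.len() / parts;
    let subtree_roots: Vec<u64> = thread::scope(|s| {
        let handles: Vec<_> = leaves
            .chunks(chunk)
            .map(|c| s.spawn(move || root(c)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    root(&subtree_roots)
}

fn main() {
    let leaves: Vec<u64> = (0..1u64 << 16).collect();
    let seq = root(&leaves);
    let par = root_concurrent(&leaves, 8);
    assert_eq!(seq, par);
    println!("root = {seq:#x} (sequential and concurrent agree)");
}
```

Because each subtree's hashing is independent until the top few levels, the work parallelizes well. The final combination step is small, which is consistent with, though not proof of, the near-linear speedups on machines with many cores.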

Scenario 2: SMT Batched Insertion (1k pairs, 1M leaves)

Hardware	Sequential	Concurrent	Improvement
AMD Ryzen 9 7950X	201 ms	19 ms	11x
Apple M1 Air	729 ms	406 ms	1.8x
Apple M1 Pro	623 ms	86 ms	7.2x
Apple M4 Max	212 ms	28 ms	7.6x

Scenario 3: SMT Batched Update (1k pairs, 1M leaves)

Hardware	Sequential	Concurrent	Improvement
AMD Ryzen 9 7950X	202 ms	19 ms	11x
Apple M1 Air	691 ms	307 ms	2.3x
Apple M1 Pro	419 ms	56 ms	7.5x
Apple M4 Max	218 ms	24 ms	9.1x

Notes:

On AMD Ryzen 9 7950X, benchmarks are run with AVX2 acceleration enabled.

Instructions

Before running the benchmarks, make sure you have Rust installed. Then, to run the benchmarks for RPO and BLAKE3, clone the current repository and run the following from its root directory:

cargo bench hash

To run the benchmarks for Rescue Prime, Poseidon, and SHA3, clone the following repository as above, check out the hash-functions-benches branch, and run the following from the root directory of the repo:

cargo bench hash

To run the benchmarks for SMT operations, run the binary target with the executable feature:

cargo run --features=executable

The concurrent feature enables the concurrent benchmark and is enabled by default. To run the sequential benchmark, disable the crate's default features:

cargo run --no-default-features --features=executable,smt_hashmaps

The benchmark parameters may also be customized with the -s/--size, -i/--insertions, and -u/--updates options.