Core#

Top-level utilities: the lazy-instantiation system that powers every config file, weight-init helpers, QK-norm and rotary embedding primitives, the QuACK-kernel capability probe, and testing helpers.

Lazy configuration#

The lazy-instantiation system lets configs declare `_target_`-shaped specs whose construction is deferred until `instantiate` is called. Every experiment config and most `modules/` constructors rely on it.

`LazyConfig`(target)	Deferred-instantiation config builder.
`instantiate`(config, *[, recursive_instantiate])	Instantiate an object from a LazyConfig or __target__ dict.
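
The deferral pattern can be illustrated with a minimal sketch. This is not the library's implementation: `toy_instantiate`, `locate`, and the `TARGET_KEY` spelling are hypothetical stand-ins, and only the standard library is used. It shows a spec dict that carries a dotted import path plus keyword arguments and is turned into an object only when the instantiate step runs.

```python
import importlib
from typing import Any

# Assumed key name for illustration; check the library for the exact spelling.
TARGET_KEY = "_target_"


def locate(dotted_path: str) -> Any:
    """Import `pkg.module.attr` and return the attribute."""
    module_name, _, attr_name = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr_name)


def toy_instantiate(spec: Any, recursive: bool = True) -> Any:
    """Build the object described by `spec`; non-spec values pass through."""
    if not (isinstance(spec, dict) and TARGET_KEY in spec):
        return spec
    kwargs = {k: v for k, v in spec.items() if k != TARGET_KEY}
    if recursive:
        # Nested specs are built first, so the outer target receives objects.
        kwargs = {k: toy_instantiate(v, recursive) for k, v in kwargs.items()}
    return locate(spec[TARGET_KEY])(**kwargs)


# Nothing is constructed while the spec is being declared...
half_spec = {TARGET_KEY: "fractions.Fraction", "numerator": 1, "denominator": 2}
outer_spec = {TARGET_KEY: "builtins.dict", "half": half_spec}

# ...only when it is instantiated.
built = toy_instantiate(outer_spec)
shallow = toy_instantiate(outer_spec, recursive=False)
```

With `recursive=False` the nested spec is handed to the outer target unchanged, which is one plausible reading of what a `recursive_instantiate` switch controls.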

Initialisation helpers#

Truncated-normal and Wang/SmallInit factories used by SIREN, MLP, and projection layers.

`trunc_normal_init`([std])	Truncated-normal initializer with fixed standard deviation.
`trunc_normal_init_factory`([std])	Factory that returns `fn(dim) -> fn(tensor)` for truncated-normal init.
`small_init`(dim)	Dim-dependent initializer from "Transformers without Tears" (Nguyen & Salazar, 2019).
`wang_init`(dim, num_layers)	Depth-scaled initializer (Wang et al.).
`partial_wang_init_fn_with_num_layers`(num_layers)	Factory that returns `partial(wang_init, num_layers=...)`.
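
As a rough illustration of the two-stage factory shape `fn(dim) -> fn(tensor)`, the sketch below uses plain Python lists in place of tensors. The standard deviations are the formulas commonly associated with these names (SmallInit: sqrt(2 / (5 * dim)); Wang: 2 / (num_layers * sqrt(dim))). Whether this library uses exactly these formulas, and the truncation bound of two standard deviations, are assumptions. All function names here are hypothetical.

```python
import math
import random
from functools import partial
from typing import Callable, List


def small_init_std(dim: int) -> float:
    """Dim-dependent std: sqrt(2 / (5 * dim))."""
    return math.sqrt(2.0 / (5.0 * dim))


def wang_init_std(dim: int, num_layers: int) -> float:
    """Depth-scaled std: 2 / (num_layers * sqrt(dim))."""
    return 2.0 / (num_layers * math.sqrt(dim))


def trunc_normal_fill(values: List[float], std: float,
                      bound: float = 2.0, seed: int = 0) -> List[float]:
    """Fill `values` in place with N(0, std^2) samples, resampling any draw
    that falls outside +/- bound * std (the bound is an assumption)."""
    rng = random.Random(seed)
    for i in range(len(values)):
        sample = rng.gauss(0.0, std)
        while abs(sample) > bound * std:
            sample = rng.gauss(0.0, std)
        values[i] = sample
    return values


def wang_factory(num_layers: int) -> Callable[[int], Callable[[List[float]], List[float]]]:
    """Fix num_layers now, dim later, and finally fill a buffer."""
    def for_dim(dim: int) -> Callable[[List[float]], List[float]]:
        return partial(trunc_normal_fill, std=wang_init_std(dim, num_layers))
    return for_dim


init_fn = wang_factory(num_layers=12)(512)  # fn(dim) -> fn(buffer)
weights = init_fn([0.0] * 1024)
```

Splitting the arguments this way lets a config fix the depth once, while each layer supplies its own width when it is built.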

QK normalization & rotary position embeddings#

Shared building blocks consumed by the attention and Hyena mixers.

`L2Norm`([dim, eps])	L2 normalisation layer with no learnable parameters; LazyConfig-friendly.
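
A minimal sketch of L2 normalisation as used for QK-norm, written with plain lists so it runs without dependencies. The `eps` handling (clamping the norm from below) and the function names are assumptions, not the library's code.

```python
import math
from typing import List


def l2_normalize(vec: List[float], eps: float = 1e-12) -> List[float]:
    """Scale `vec` to unit L2 norm; `eps` guards against division by zero."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


query = l2_normalize([3.0, 4.0])
key = l2_normalize([10.0, 0.0])

# After normalisation the attention logit is a cosine similarity in [-1, 1],
# regardless of how large the raw query and key were.
logit = dot(query, key)
```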

`apply_qk_norm`(query, key[, dim, eps])	L2-normalise query and key tensors along a given dimension.
`apply_rope_1d_bhl`(x, rope_1d_cache)	Apply 1D RoPE to a tensor laid out as [batch_size, hidden_dim, seq_len].
`apply_rope_2d_bhl`(x, rope_2d_cache)	Apply 2D RoPE to a tensor laid out as [batch_size, hidden_dim, H, W].
`apply_rope_3d_bhl`(x, rope_3d_cache)	Apply 3D RoPE to a tensor laid out as [batch_size, hidden_dim, D, H, W].
`apply_rope_1d_blh`(x, rope_1d_cache)	Apply 1D RoPE to a tensor laid out as [batch_size, seq_len, hidden_dim].
`apply_rope_2d_blh`(x, rope_2d_cache)	Apply 2D RoPE to a tensor laid out as [batch_size, H, W, hidden_dim].
`apply_rope_3d_blh`(x, rope_3d_cache)	Apply 3D RoPE to a tensor laid out as [batch_size, D, H, W, hidden_dim].
`construct_rope_1d_cache_bhl`(seq_len, dim, ...)	Construct the 1D RoPE cache for a given sequence length and hidden dimension.
`construct_rope_2d_cache_bhl`(height, width, ...)	Construct the 2D RoPE cache for a given (height, width) and per-axis dimension.
`construct_rope_3d_cache_bhl`(depth, height, ...)	Construct the 3D RoPE cache for a given (depth, height, width) and per-axis dimension.
`construct_rope_1d_cache_blh`(seq_len, dim, ...)	Construct the 1D RoPE cache for a given sequence length and hidden dimension.
`construct_rope_2d_cache_blh`(height, width, ...)	Construct the 2D RoPE cache for a given (height, width) and per-axis dimension.
`construct_rope_3d_cache_blh`(depth, height, ...)	Construct the 3D RoPE cache for a given (depth, height, width) and per-axis dimension.
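
The construct-then-apply split can be sketched for the 1D case on a single sequence shaped [seq_len, hidden_dim], without a batch axis. The pairing of adjacent channels, the base of 10000, and the names `build_rope_cache_1d` and `rotate_sequence` are assumptions for illustration; the library's cache layout may differ.

```python
import math
from typing import List, Tuple

Cache = List[List[Tuple[float, float]]]


def build_rope_cache_1d(seq_len: int, dim: int, base: float = 10000.0) -> Cache:
    """cache[pos][i] = (cos, sin) of the angle pos * base ** (-2 * i / dim)."""
    if dim % 2 != 0:
        raise ValueError("dim must be even so channels can be paired")
    freqs = [base ** (-2.0 * i / dim) for i in range(dim // 2)]
    return [[(math.cos(pos * f), math.sin(pos * f)) for f in freqs]
            for pos in range(seq_len)]


def rotate_sequence(x: List[List[float]], cache: Cache) -> List[List[float]]:
    """Rotate each channel pair (x[2i], x[2i+1]) by its position's angle."""
    out = []
    for row, angles in zip(x, cache):
        rotated: List[float] = []
        for i, (cos_t, sin_t) in enumerate(angles):
            a, b = row[2 * i], row[2 * i + 1]
            rotated += [a * cos_t - b * sin_t, a * sin_t + b * cos_t]
        out.append(rotated)
    return out


def dot(a: List[float], b: List[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


cache = build_rope_cache_1d(seq_len=8, dim=4)  # built once, reused per call
q_rot = rotate_sequence([[1.0, 2.0, 3.0, 4.0]] * 8, cache)
k_rot = rotate_sequence([[0.5, -1.0, 2.0, 0.25]] * 8, cache)

# The query-key score depends only on the position offset (2 in both cases).
score_a = dot(q_rot[1], k_rot[3])
score_b = dot(q_rot[4], k_rot[6])
```

Building the cache separately means the trigonometric terms are computed once per sequence shape and shared by every query and key tensor that uses it.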

QuACK capability probe#

`cuda_supports_quack`(device)	Return True if device supports QuACK fused kernels.
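
How the probe decides is not documented on this page. One common shape for such a check is a CUDA compute-capability comparison, sketched below. The minimum of (9, 0) is an assumption chosen for illustration, not a documented requirement, and both function names are hypothetical. PyTorch is imported lazily so the sketch still runs where it is not installed.

```python
from typing import Optional, Tuple


def capability_at_least(capability: Optional[Tuple[int, int]],
                        minimum: Tuple[int, int] = (9, 0)) -> bool:
    """True if a (major, minor) compute capability meets `minimum`.
    The default minimum is an assumed threshold, not a documented one."""
    return capability is not None and tuple(capability) >= tuple(minimum)


def query_cuda_capability(device_index: int = 0) -> Optional[Tuple[int, int]]:
    """Return the device's (major, minor) capability, or None without CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(device_index)


# Callers can branch on the result and fall back to unfused code paths.
use_fused_kernels = capability_at_least(query_cuda_capability())
```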

Testing helpers#

Small numerical-comparison helpers used by the test suite.

`compute_relative_error`(tensor1, tensor2)	Compute the relative error between two tensors: ||tensor1 - tensor2|| / ||tensor1||.
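
A worked instance of the formula, using flat Python lists in place of tensors. The choice of the L2 norm and the name `relative_error` are assumptions; the page does not say which norm the helper uses.

```python
import math
from typing import Sequence


def relative_error(t1: Sequence[float], t2: Sequence[float]) -> float:
    """||t1 - t2|| / ||t1|| with the L2 norm; t1 is treated as the reference."""
    diff_norm = math.sqrt(sum((a - b) ** 2 for a, b in zip(t1, t2)))
    ref_norm = math.sqrt(sum(a * a for a in t1))
    return diff_norm / ref_norm


reference = [3.0, 4.0]   # ||reference|| = 5
candidate = [3.0, 4.5]   # differs from the reference by 0.5 in one entry
err = relative_error(reference, candidate)  # 0.5 / 5 = 0.1
```

The measure is not symmetric: swapping the arguments changes the denominator, so the reference tensor should go first.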