Core

Top-level utilities: the lazy-instantiation system that powers every config file, weight-init helpers, QK-norm and rotary embedding primitives, the QuACK-kernel capability probe, and testing helpers.

Lazy configuration

The lazy-instantiation system lets configs declare _target_-shaped specs whose construction is deferred until instantiate is called. Every experiment config and most constructors under modules/ rely on it.

LazyConfig(target)

Deferred-instantiation config builder.

instantiate(config, *[, recursive_instantiate])

Instantiate an object from a LazyConfig or __target__ dict.
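The deferred-instantiation pattern can be illustrated with a toy re-implementation. ToyLazyConfig and toy_instantiate below are hypothetical stand-ins, not this library's classes. The sketch assumes a builder that is called with keyword arguments, similar to detectron2's LazyCall, and it does not reproduce the dict-based target form. It shows two behaviours. First, building a config never calls the target. Second, instantiate constructs nested configs inner-first when recursive_instantiate is true.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


class ToyLazyConfig:
    """Hypothetical stand-in for LazyConfig: records a target and its kwargs."""

    def __init__(self, target: Callable[..., Any], **kwargs: Any) -> None:
        self.target = target
        self.kwargs: Dict[str, Any] = dict(kwargs)

    def __call__(self, **kwargs: Any) -> "ToyLazyConfig":
        # "Calling" the builder only records arguments; the target is NOT invoked.
        return ToyLazyConfig(self.target, **{**self.kwargs, **kwargs})


def toy_instantiate(config: Any, recursive_instantiate: bool = True) -> Any:
    """Hypothetical stand-in for instantiate: build the object a config describes."""
    if isinstance(config, ToyLazyConfig):
        # Build nested configs first so the target receives real objects.
        kwargs = {
            key: toy_instantiate(value) if recursive_instantiate else value
            for key, value in config.kwargs.items()
        }
        return config.target(**kwargs)
    if recursive_instantiate and isinstance(config, (list, tuple)):
        return type(config)(toy_instantiate(item) for item in config)
    return config  # plain values pass through unchanged


@dataclass
class Encoder:
    width: int


@dataclass
class Model:
    encoder: Any
    depth: int


# Declaring the config constructs nothing yet.
cfg = ToyLazyConfig(Model)(encoder=ToyLazyConfig(Encoder)(width=64), depth=4)
model = toy_instantiate(cfg)  # Model(encoder=Encoder(width=64), depth=4)
```

Deferring construction this way lets a config file be imported, inspected, or overridden without allocating any modules.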

Initialisation helpers

Truncated-normal and Wang/SmallInit factories used by SIREN, MLP, and projection layers.

trunc_normal_init([std])

Truncated-normal initializer with a fixed standard deviation.

trunc_normal_init_factory([std])

Factory that returns fn(dim) -> fn(tensor) for truncated-normal init.

small_init(dim)

Dim-dependent initializer from "Transformers without Tears" (Nguyen & Salazar, 2019).

wang_init(dim, num_layers)

Depth-scaled initializer (Wang et al.).

partial_wang_init_fn_with_num_layers(num_layers)

Factory that returns partial(wang_init, num_layers=...).
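The factory shapes above can be sketched in plain Python. Lists stand in for torch tensors so the example runs without PyTorch. Every name here is hypothetical, and the constants are assumptions:

- The ±2·std truncation cut-off is assumed.
- The SmallInit std, sqrt(2 / (5·dim)), follows the GPT-NeoX convention.
- The Wang std, 2 / num_layers / sqrt(dim), also follows GPT-NeoX.

This library's helpers may use different values.

```python
import math
import random
from functools import partial
from typing import Callable, List

# Plain Python lists stand in for torch tensors so the sketch runs without PyTorch.
Tensor = List[float]


def fill_trunc_normal_(tensor: Tensor, std: float, rng: random.Random) -> Tensor:
    """Fill in place with N(0, std^2) samples truncated to [-2*std, 2*std] (cut-off assumed)."""
    for i in range(len(tensor)):
        sample = rng.gauss(0.0, std)
        while abs(sample) > 2.0 * std:  # rejection sampling: redraw out-of-range values
            sample = rng.gauss(0.0, std)
        tensor[i] = sample
    return tensor


def toy_trunc_normal_init_factory(std: float) -> Callable[[int], Callable[[Tensor], Tensor]]:
    """Hypothetical factory with the documented shape fn(dim) -> fn(tensor)."""

    def for_dim(dim: int) -> Callable[[Tensor], Tensor]:
        rng = random.Random(dim)  # seeded only so the sketch is reproducible
        # A fixed-std initializer ignores dim; taking it keeps the same
        # signature as the dim-dependent initializers below.
        return lambda tensor: fill_trunc_normal_(tensor, std, rng)

    return for_dim


def small_init_std(dim: int) -> float:
    """SmallInit std, assuming the GPT-NeoX convention sqrt(2 / (5 * dim))."""
    return math.sqrt(2.0 / (5.0 * dim))


def wang_init_std(dim: int, num_layers: int) -> float:
    """Depth-scaled std, assuming the GPT-NeoX convention 2 / num_layers / sqrt(dim)."""
    return 2.0 / num_layers / math.sqrt(dim)


# Same idea as partial_wang_init_fn_with_num_layers: bind the depth, leave dim free.
wang_std_12_layers = partial(wang_init_std, num_layers=12)

weights = toy_trunc_normal_init_factory(std=0.02)(512)([0.0] * 1024)
```

Binding num_layers with functools.partial produces a dim-only callable. That callable matches the fn(dim) shape the other factories return, so layers can accept any of these initializers interchangeably.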

QK normalization & rotary position embeddings

Shared building blocks consumed by the attention and Hyena mixers.

L2Norm([dim, eps])

L2 normalisation layer with no learnable parameters; LazyConfig-friendly.

apply_qk_norm(query, key[, dim, eps])

L2-normalise query and key tensors along a given dimension.

apply_rope_1d_bhl(x, rope_1d_cache)

Apply 1D RoPE to a tensor laid out as [batch_size, hidden_dim, seq_len].

apply_rope_2d_bhl(x, rope_2d_cache)

Apply 2D RoPE to a tensor laid out as [batch_size, hidden_dim, H, W].

apply_rope_3d_bhl(x, rope_3d_cache)

Apply 3D RoPE to a tensor laid out as [batch_size, hidden_dim, D, H, W].

apply_rope_1d_blh(x, rope_1d_cache)

Apply 1D RoPE to a tensor laid out as [batch_size, seq_len, hidden_dim].

apply_rope_2d_blh(x, rope_2d_cache)

Apply 2D RoPE to a tensor laid out as [batch_size, H, W, hidden_dim].

apply_rope_3d_blh(x, rope_3d_cache)

Apply 3D RoPE to a tensor laid out as [batch_size, D, H, W, hidden_dim].

construct_rope_1d_cache_bhl(seq_len, dim, ...)

Construct the 1D RoPE cache consumed by apply_rope_1d_bhl, for a given sequence length and hidden dimension.

construct_rope_2d_cache_bhl(height, width, ...)

Construct the 2D RoPE cache consumed by apply_rope_2d_bhl, for a given (height, width) and per-axis dimension.

construct_rope_3d_cache_bhl(depth, height, ...)

Construct the 3D RoPE cache consumed by apply_rope_3d_bhl, for a given (depth, height, width) and per-axis dimension.

construct_rope_1d_cache_blh(seq_len, dim, ...)

Construct the 1D RoPE cache consumed by apply_rope_1d_blh, for a given sequence length and hidden dimension.

construct_rope_2d_cache_blh(height, width, ...)

Construct the 2D RoPE cache consumed by apply_rope_2d_blh, for a given (height, width) and per-axis dimension.

construct_rope_3d_cache_blh(depth, height, ...)

Construct the 3D RoPE cache consumed by apply_rope_3d_blh, for a given (depth, height, width) and per-axis dimension.
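QK normalisation and 1D RoPE can be sketched in plain Python to show what the bhl and blh variants share. All names below are hypothetical, and the following details are assumptions rather than this library's choices:

- The frequency base is 10000.
- Channel pairs are interleaved as (x[2i], x[2i+1]); some implementations rotate the two halves of the channel dimension instead.
- The cache is a nested list of (cos, sin) pairs.
- eps is applied by clamping the norm from below.

The bhl variant is the same rotation applied after a layout transpose. The 2D and 3D variants are not shown. Judging by their "per-axis dimension" parameter, they presumably rotate separate channel groups by each axis coordinate.

```python
import math
from typing import List, Tuple

Vector = List[float]
Cache = List[List[Tuple[float, float]]]


def l2_normalise(v: Vector, eps: float = 1e-6) -> Vector:
    """Scale v to unit L2 norm; eps guards against division by zero."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def toy_rope_1d_cache(seq_len: int, dim: int, base: float = 10000.0) -> Cache:
    """cache[m][i] = (cos(m * theta_i), sin(m * theta_i)), theta_i = base ** (-2i / dim)."""
    if dim % 2:
        raise ValueError("RoPE rotates channel pairs, so dim must be even")
    thetas = [base ** (-2.0 * i / dim) for i in range(dim // 2)]
    return [[(math.cos(m * t), math.sin(m * t)) for t in thetas] for m in range(seq_len)]


def toy_rope_1d_blh(x: List[Vector], cache: Cache) -> List[Vector]:
    """x is [seq_len][hidden_dim] (channels-last); rotates pairs (x[2i], x[2i+1])."""
    out = []
    for m, row in enumerate(x):
        rotated: Vector = []
        for i, (cos_mt, sin_mt) in enumerate(cache[m]):
            a, b = row[2 * i], row[2 * i + 1]
            rotated += [a * cos_mt - b * sin_mt, a * sin_mt + b * cos_mt]
        out.append(rotated)
    return out


def transpose(x: List[Vector]) -> List[Vector]:
    return [list(col) for col in zip(*x)]


def toy_rope_1d_bhl(x: List[Vector], cache: Cache) -> List[Vector]:
    """x is [hidden_dim][seq_len] (channels-first): same rotation, different layout."""
    return transpose(toy_rope_1d_blh(transpose(x), cache))


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


cache = toy_rope_1d_cache(seq_len=8, dim=4)
q = l2_normalise([1.0, 2.0, 3.0, 4.0])
k = l2_normalise([0.5, -1.0, 2.0, 0.0])


def rotate_at(v: Vector, pos: int) -> Vector:
    # Rotate a single vector as if it sat at position `pos` in the sequence.
    return toy_rope_1d_blh([v], cache[pos:pos + 1])[0]


# Both pairs are 3 positions apart, so their attention scores match.
score_a = dot(rotate_at(q, 5), rotate_at(k, 2))
score_b = dot(rotate_at(q, 7), rotate_at(k, 4))
```

Because each channel pair is rotated by an angle proportional to its position, the q·k score depends only on the positional offset. L2-normalising q and k beforehand bounds that score to [-1, 1].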

QuACK capability probe

cuda_supports_quack(device)

Return True if the given device supports QuACK fused kernels.

Testing helpers

Small numerical-comparison helpers used by the test suite.

compute_relative_error(tensor1, tensor2)

Compute the relative error between two tensors: ||tensor1 - tensor2|| / ||tensor1||.
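The formula above can be checked with a minimal plain-Python sketch. toy_relative_error is hypothetical. It assumes the Euclidean (Frobenius) norm over flattened values, and it does not guard against a zero reference, which the real helper may handle differently. Since the denominator is ||tensor1||, the measure is asymmetric, so the trusted reference belongs in the first argument.

```python
import math
from typing import Sequence


def toy_relative_error(t1: Sequence[float], t2: Sequence[float]) -> float:
    """||t1 - t2|| / ||t1|| using the Euclidean norm of the flattened values."""
    if len(t1) != len(t2):
        raise ValueError("inputs must have the same number of elements")
    diff_norm = math.sqrt(sum((a - b) ** 2 for a, b in zip(t1, t2)))
    ref_norm = math.sqrt(sum(a * a for a in t1))
    return diff_norm / ref_norm  # raises ZeroDivisionError if t1 is all zeros


reference = [3.0, 4.0]  # ||reference|| = 5
candidate = [3.0, 5.0]  # differs by 1 in one entry
err = toy_relative_error(reference, candidate)  # 1 / 5
```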