BlockDiagonalLearnableOmegaSIRENKernelND

class BlockDiagonalLearnableOmegaSIRENKernelND(
out_dim,
data_dim,
mlp_hidden_dim,
num_layers,
embedding_dim,
L_cache,
use_bias,
num_blocks=8,
omega_0_min=1.0,
omega_0_max=12.0,
schedule='linear',
off_block_scale=0.1,
omega_0_per_block=None,
omega_0_scale_min=1e-2,
omega_0_scale_max=2.0,
hidden_omega_0=1.0,
apply_lr_scale=False,
film_cfg=None,
film_after_pos_embed=False,
)

Bases: LearnableOmegaSIRENKernelND

Block-diagonal learnable-ω₀ SIREN kernel.

Combines two ideas:

  1. Block-diagonal MLP init (from BlockDiagonalMultiOmegaSIRENKernelND): every hidden linear layer and the output linear layer has its weight multiplied by a block mask, with block-diagonal entries kept at 1.0 and off-block entries scaled by off_block_scale.

  2. Learnable per-row ω₀ schedule (from LearnableOmegaSIRENKernelND): the per-row first-layer scale is initialized to a per-block schedule. We absorb the largest block's ω₀ into the constant runtime factor omega_0_max and let the learnable scale carry the relative schedule, initialized to omega_0_per_block / omega_0_max, so that the effective per-row ω₀ at init equals the original block-diagonal schedule. The scale is clamped to [omega_0_scale_min, omega_0_scale_max] (default [1e-2, 2]), so every row's effective ω₀ stays between omega_0_scale_min · omega_0_max and omega_0_scale_max · omega_0_max: even the highest-frequency block can double its ω₀ during training, and no row can collapse to zero frequency.
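The schedule and clamping arithmetic in item 2 can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the class's implementation: the helper names (omega_schedule, init_row_scales, effective_omegas) are hypothetical, the "linear" and "log" schedules are assumed to space num_blocks values evenly in ω₀ or in log ω₀ with both endpoints included, and first-layer rows are assumed to be assigned to blocks in contiguous runs of embedding_dim / num_blocks.

```python
import math
from typing import List, Sequence, Tuple


def omega_schedule(num_blocks: int, omega_min: float, omega_max: float,
                   schedule: str = "linear") -> List[float]:
    """Hypothetical: num_blocks evenly spaced omega_0 values, endpoints included."""
    if num_blocks == 1:
        return [omega_max]
    ts = [i / (num_blocks - 1) for i in range(num_blocks)]
    if schedule == "linear":
        return [omega_min + t * (omega_max - omega_min) for t in ts]
    if schedule == "log":
        lo, hi = math.log(omega_min), math.log(omega_max)
        return [math.exp(lo + t * (hi - lo)) for t in ts]
    raise ValueError(f"unknown schedule: {schedule!r}")


def init_row_scales(per_block: Sequence[float], rows: int) -> Tuple[float, List[float]]:
    """Absorb max(per_block) into a constant; each row's scale is its block's
    omega_0 divided by that constant (rows assigned to blocks contiguously)."""
    if rows % len(per_block):
        raise ValueError("rows must be divisible by the number of blocks")
    omega_const = max(per_block)
    rows_per_block = rows // len(per_block)
    scales = [w / omega_const for w in per_block for _ in range(rows_per_block)]
    return omega_const, scales


def effective_omegas(omega_const: float, scales: Sequence[float],
                     scale_min: float = 1e-2, scale_max: float = 2.0) -> List[float]:
    """Effective per-row omega_0 = omega_const * clamp(scale, scale_min, scale_max)."""
    return [omega_const * min(max(s, scale_min), scale_max) for s in scales]


# Defaults from the signature above, with a 16-row first layer (2 rows per block).
per_block = omega_schedule(8, 1.0, 12.0, "linear")
omega_const, scales = init_row_scales(per_block, rows=16)
eff = effective_omegas(omega_const, scales)
# At init, each pair of rows reproduces its block's omega_0 from per_block.
print([round(w, 3) for w in eff[::2]])
```

In this sketch a row whose scale is driven to either clamp is pinned to 0.12 (= 1e-2 · 12) or 24.0 (= 2 · 12) with the defaults, which is the range the clamp description above implies.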

With omega_0_scale_init left at its default and the schedule built from (omega_0_min, omega_0_max, schedule), the kernel at init matches BlockDiagonalMultiOmegaSIRENKernelND, up to the intermediate fp32 cast in the positional embedding's forward pass.

embedding_dim, mlp_hidden_dim, and out_dim must all be divisible by num_blocks.
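The divisibility requirement follows from how a block mask tiles a weight matrix. Below is a minimal plain-Python sketch, assuming weights are laid out as (out_features, in_features) like torch.nn.Linear and that blocks are contiguous, equal-sized slices along both axes; block_mask and apply_mask are hypothetical helpers, not part of this class's API. If, as seems likely, the first hidden linear maps embedding_dim to mlp_hidden_dim and the output linear maps mlp_hidden_dim to out_dim, every one of those sizes appears as a masked axis, which is why all three must be divisible by num_blocks.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def block_mask(out_features: int, in_features: int, num_blocks: int,
               off_block_scale: float) -> Matrix:
    """Hypothetical helper: 1.0 on diagonal blocks, off_block_scale elsewhere.

    Assumes an (out_features, in_features) layout, as in torch.nn.Linear,
    with contiguous equal-sized blocks along both axes.
    """
    if out_features % num_blocks or in_features % num_blocks:
        raise ValueError("both dimensions must be divisible by num_blocks")
    out_blk = out_features // num_blocks
    in_blk = in_features // num_blocks
    return [
        [1.0 if i // out_blk == j // in_blk else off_block_scale
         for j in range(in_features)]
        for i in range(out_features)
    ]


def apply_mask(weight: Sequence[Sequence[float]], mask: Matrix) -> Matrix:
    """Elementwise product of a weight matrix and its mask (applied at init)."""
    return [[w * m for w, m in zip(w_row, m_row)]
            for w_row, m_row in zip(weight, mask)]


# 4x6 weight, 2 blocks: each 2x3 diagonal block keeps its weights,
# everything else is scaled by off_block_scale.
mask = block_mask(4, 6, num_blocks=2, off_block_scale=0.1)
for row in mask:
    print(row)
```

Setting off_block_scale=0.0 zeroes every off-block entry (strictly block-diagonal), while 1.0 produces an all-ones mask that leaves the weights unchanged.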

When apply_lr_scale=True, the first-layer weight gets _lr_scale = 1/(2π · omega_0_max), a single conservative scalar chosen to match the highest-frequency block. This is the SIREN-paper LR compensation. Under this scheme the lowest-ω₀ block trains relatively more slowly, but the gradient norm of every row is upper-bounded by that of the most aggressive row, which is the dimension that sets the largest update step size in AdamW.
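The scale itself is a single number (1/(24π) ≈ 0.0133 for the default omega_0_max=12.0); how a training loop consumes _lr_scale is not documented here. The sketch below assumes, hypothetically, that a training script groups parameters by an optional _lr_scale attribute and multiplies the base learning rate by it, emitting param groups in the list-of-dicts format that torch.optim optimizers accept. build_param_groups is a hypothetical helper, and SimpleNamespace objects stand in for parameters so the sketch runs without PyTorch.

```python
import math
from collections import defaultdict
from types import SimpleNamespace
from typing import Dict, Iterable, List


def first_layer_lr_scale(omega_0_max: float) -> float:
    """The value described above for _lr_scale: 1 / (2*pi*omega_0_max)."""
    return 1.0 / (2.0 * math.pi * omega_0_max)


def build_param_groups(params: Iterable, base_lr: float) -> List[Dict]:
    """Hypothetical: group parameters by _lr_scale (default 1.0) and give each
    group lr = base_lr * scale, as a list of {"params", "lr"} dicts."""
    groups = defaultdict(list)
    for p in params:
        groups[getattr(p, "_lr_scale", 1.0)].append(p)
    return [{"params": ps, "lr": base_lr * scale} for scale, ps in groups.items()]


# Stand-ins: the first-layer weight carries _lr_scale, a hidden weight does not.
first_layer_weight = SimpleNamespace(_lr_scale=first_layer_lr_scale(12.0))
hidden_weight = SimpleNamespace()
groups = build_param_groups([first_layer_weight, hidden_weight], base_lr=1e-3)
for g in groups:
    print(len(g["params"]), g["lr"])
```

With real PyTorch parameters the same grouping would apply, since a Python attribute can be set on a torch.nn.Parameter and the resulting list can be passed to an optimizer such as torch.optim.AdamW.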

Parameters:
  • num_blocks (int) – Number of ω₀ blocks; must divide embedding_dim, mlp_hidden_dim, and out_dim.

  • omega_0_min (float) – Lower endpoint of the schedule. Ignored if omega_0_per_block is supplied (schedule endpoints are then read from the supplied vector).

  • omega_0_max (float) – Upper endpoint of the schedule; also serves as the constant runtime ω₀ factor that is pulled out of the weight init.

  • schedule (str) – "linear" or "log".

  • off_block_scale (float) – Off-diagonal scaling for the hidden and output linear block masks. 0.0 → strict block-diagonal; 1.0 → no masking, i.e. dense hidden and output layers as in LearnableOmegaSIRENKernelND (the first-layer per-block ω₀ schedule still applies).

  • omega_0_per_block (Sequence[float] | Tensor | None) – Optional explicit ω₀ schedule of length num_blocks. Overrides omega_0_min/omega_0_max/schedule when supplied.

  • omega_0_scale_min (float) – Lower clamp on the per-row scale (default 1e-2). The strictly positive floor keeps every row's effective ω₀ at or above omega_0_scale_min · omega_0_max (1e-2 · omega_0_max by default), so no row's first-layer sine collapses to a constant.

  • omega_0_scale_max (float) – Upper clamp on the per-row scale (default 2).

  • apply_lr_scale (bool) – When True, attach _lr_scale = 1/(2π·omega_0_max) to the first-layer weight. Default False.

  • out_dim (int)

  • data_dim (int)

  • mlp_hidden_dim (int)

  • num_layers (int)

  • embedding_dim (int)

  • L_cache (int | Sequence[int])

  • use_bias (bool)

  • hidden_omega_0 (float)

  • film_cfg (LazyConfig | None)

  • film_after_pos_embed (bool)

All other constructor arguments (out_dim, data_dim, mlp_hidden_dim, num_layers, embedding_dim, L_cache, use_bias, hidden_omega_0, film_cfg, film_after_pos_embed) have the same meaning as in SIRENKernelND.

num_blocks

Number of frequency blocks.

Type:

int

off_block_scale

Off-diagonal weight scale applied at init.

Type:

float

omega_0_per_block

Non-persistent float32 buffer of shape [num_blocks] holding the per-block omega_0 schedule.

Type:

torch.Tensor

positional_embedding

First layer with a learnable per-row omega_0 scale: omega_0_const is set to max(omega_0_per_block), and each row's omega_0_scale is initialized to its block's omega_0_per_block entry divided by omega_0_const.

Type:

LearnableOmegaSIRENPositionalEmbeddingND
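A minimal sketch of what a first layer with a per-row learnable ω₀ scale computes, assuming the standard SIREN form sin(ω₀ · (W x + b)) with ω₀ replaced per row by omega_0_const · clamp(omega_0_scale, omega_0_scale_min, omega_0_scale_max). The function name is hypothetical; the sketch omits the fp32 cast and any positional-grid handling of the real LearnableOmegaSIRENPositionalEmbeddingND and uses plain Python lists instead of tensors.

```python
import math
from typing import List, Sequence


def learnable_omega_sine_layer(
    x: Sequence[float],
    weight: Sequence[Sequence[float]],
    bias: Sequence[float],
    omega_0_const: float,
    scale: Sequence[float],
    scale_min: float = 1e-2,
    scale_max: float = 2.0,
) -> List[float]:
    """Hypothetical first layer: row i computes
    sin(omega_0_const * clamp(scale[i]) * (weight[i] . x + bias[i]))."""
    out = []
    for w_row, b, s in zip(weight, bias, scale):
        s_clamped = min(max(s, scale_min), scale_max)  # per-row clamp
        pre = sum(w * xi for w, xi in zip(w_row, x)) + b  # W_i . x + b_i
        out.append(math.sin(omega_0_const * s_clamped * pre))
    return out


# Two rows with identical weights but different learnable scales:
# the second row oscillates at half the frequency of the first.
y = learnable_omega_sine_layer(
    x=[0.25], weight=[[1.0], [1.0]], bias=[0.0, 0.0],
    omega_0_const=2.0, scale=[1.0, 0.5],
)
print(y)
```

Because the clamp floor is strictly positive, a row whose raw scale drifts to 0 still evaluates sin(omega_0_const · scale_min · pre), not a constant.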

hidden_linears, out_linear, film_generator

Inherited from SIRENKernelND; see that class.

__init__(
out_dim,
data_dim,
mlp_hidden_dim,
num_layers,
embedding_dim,
L_cache,
use_bias,
num_blocks=8,
omega_0_min=1.0,
omega_0_max=12.0,
schedule='linear',
off_block_scale=0.1,
omega_0_per_block=None,
omega_0_scale_min=1e-2,
omega_0_scale_max=2.0,
hidden_omega_0=1.0,
apply_lr_scale=False,
film_cfg=None,
film_after_pos_embed=False,
)

Initialize the block-diagonal learnable-omega SIREN kernel; see the class docstring.

Parameters: