BlockDiagonalLearnableOmegaSIRENKernelND
- class BlockDiagonalLearnableOmegaSIRENKernelND(
- out_dim,
- data_dim,
- mlp_hidden_dim,
- num_layers,
- embedding_dim,
- L_cache,
- use_bias,
- num_blocks=8,
- omega_0_min=1.0,
- omega_0_max=12.0,
- schedule='linear',
- off_block_scale=0.1,
- omega_0_per_block=None,
- omega_0_scale_min=1e-2,
- omega_0_scale_max=2.0,
- hidden_omega_0=1.0,
- apply_lr_scale=False,
- film_cfg=None,
- film_after_pos_embed=False,
)
Bases: LearnableOmegaSIRENKernelND
Block-diagonal learnable-ω₀ SIREN kernel.
Combines two ideas:
- Block-diagonal MLP init (from BlockDiagonalMultiOmegaSIRENKernelND): every hidden linear and the output linear have their weights multiplied by a block mask, with block-diagonal entries kept at 1.0 and off-block entries scaled by off_block_scale.
- Learnable per-row ω₀ schedule (from LearnableOmegaSIRENKernelND): the first-layer scale is initialized to a per-block schedule. The largest block's ω₀ is absorbed into the constant 2π · omega_0_max runtime factor, and the learnable scale carries the relative schedule: it is initialized to omega_0_per_block / omega_0_max, so that the effective per-row ω₀ at init equals the original block-diagonal schedule. The scale is clamped to [omega_0_scale_min, omega_0_scale_max] (default [1e-2, 2]), giving every row room to at least double its effective ω₀ during training without any row ever collapsing to zero frequency.
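The block mask from the first item can be illustrated with a minimal sketch in plain Python. The helper name block_mask is hypothetical, and the sketch assumes blocks are contiguous, equally sized index ranges; the library applies its mask to the layer weights and may differ in detail.

```python
def block_mask(out_features, in_features, num_blocks, off_block_scale):
    """Return an out_features x in_features mask as nested lists.

    Entries whose row and column fall in the same block are 1.0;
    every other entry is off_block_scale.
    Assumption: blocks are contiguous, equally sized index ranges.
    """
    if out_features % num_blocks or in_features % num_blocks:
        raise ValueError("both dimensions must be divisible by num_blocks")
    rows_per_block = out_features // num_blocks
    cols_per_block = in_features // num_blocks
    return [
        [
            1.0 if (r // rows_per_block) == (c // cols_per_block) else off_block_scale
            for c in range(in_features)
        ]
        for r in range(out_features)
    ]


# A 4x4 weight split into 2 blocks, with off-block entries scaled by 0.1.
mask = block_mask(out_features=4, in_features=4, num_blocks=2, off_block_scale=0.1)
for row in mask:
    print(row)
```

Multiplying a freshly initialized weight elementwise by such a mask leaves the block-diagonal entries untouched and shrinks the rest. With off_block_scale=0.0 the mask is strictly block-diagonal, and with 1.0 it is all ones, which leaves the weight dense.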
With omega_0_scale_init left at its default and the schedule built from (omega_0_min, omega_0_max, schedule), the kernel at init matches BlockDiagonalMultiOmegaSIRENKernelND (modulo the fp32 mid-cast in the positional embedding's forward). embedding_dim, mlp_hidden_dim, and out_dim must all be divisible by num_blocks.
When apply_lr_scale=True, the first-layer weight gets _lr_scale = 1/(2π · omega_0_max), a single conservative scalar chosen to match the highest-frequency block. This is the SIREN-paper LR compensation. The lowest-ω₀ block trains relatively slower under this scheme, but the gradient norm of every row is upper-bounded by that of the most aggressive row, which is the dimension that sets the largest update step size in AdamW.
- Parameters:
num_blocks (int) – Number of ω₀ blocks; must divide embedding_dim, mlp_hidden_dim, and out_dim.
omega_0_min (float) – Lower endpoint of the schedule. Ignored if omega_0_per_block is supplied (schedule endpoints are then read from the supplied vector).
omega_0_max (float) – Upper endpoint of the schedule; also sets the constant runtime 2π · omega_0_max factor that is pulled out of the weight init.
schedule (str) – "linear" or "log".
off_block_scale (float) – Off-diagonal scaling for the hidden and output linear block masks. 0.0 → strict block-diagonal; 1.0 → equivalent to a dense LearnableOmegaSIRENKernelND.
omega_0_per_block (Sequence[float] | Tensor | None) – Optional explicit ω₀ schedule of length num_blocks. Overrides omega_0_min / omega_0_max / schedule when supplied.
omega_0_scale_min (float) – Lower clamp on the per-row scale (default 1e-2). The strictly positive floor keeps every row's effective ω₀ at or above 1e-2 · omega_0_max, so no row's first-layer sine collapses to a constant.
omega_0_scale_max (float) – Upper clamp on the per-row scale (default 2).
apply_lr_scale (bool) – When True, attach _lr_scale = 1/(2π · omega_0_max) to the first-layer weight. Default False.
out_dim (int)
data_dim (int)
mlp_hidden_dim (int)
num_layers (int)
embedding_dim (int)
use_bias (bool)
hidden_omega_0 (float)
film_cfg (LazyConfig | None)
film_after_pos_embed (bool)
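To make the schedule parameters concrete, here is a minimal sketch of how a per-block ω₀ vector could be built from (omega_0_min, omega_0_max, schedule). The helper build_schedule is hypothetical. It assumes both endpoints are included and that "log" means geometric spacing (linear interpolation in log space); the library's exact spacing is not specified here and may differ.

```python
import math


def build_schedule(num_blocks, omega_0_min, omega_0_max, schedule="linear"):
    """Per-block omega_0 values with inclusive endpoints (an assumption)."""
    if num_blocks == 1:
        # Assumption: a single block takes the upper endpoint.
        return [float(omega_0_max)]
    t = [i / (num_blocks - 1) for i in range(num_blocks)]
    if schedule == "linear":
        return [omega_0_min + (omega_0_max - omega_0_min) * ti for ti in t]
    if schedule == "log":
        # Geometric spacing: interpolate linearly between the log endpoints.
        lo, hi = math.log(omega_0_min), math.log(omega_0_max)
        return [math.exp(lo + (hi - lo) * ti) for ti in t]
    raise ValueError(f"unknown schedule: {schedule!r}")


linear = build_schedule(4, 1.0, 12.0, "linear")
log = build_schedule(4, 1.0, 12.0, "log")

# The LR compensation scalar documented for apply_lr_scale=True.
lr_scale = 1.0 / (2.0 * math.pi * 12.0)

print(linear)
print(log)
print(lr_scale)
```

Under these assumptions the log schedule places more blocks at low frequencies than the linear one for the same endpoints, while lr_scale depends only on omega_0_max.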
All other constructor arguments (out_dim, data_dim, mlp_hidden_dim, num_layers, embedding_dim, L_cache, use_bias, hidden_omega_0, film_cfg, film_after_pos_embed) have the same meaning as in SIRENKernelND.
- omega_0_per_block
Non-persistent float32 buffer of shape [num_blocks] holding the per-block omega_0 schedule.
- Type:
- positional_embedding
First layer with learnable per-row omega_0 scale; omega_0_const is set to max(omega_0_per_block) and omega_0_scale is initialized to omega_0_per_block / omega_0_const per row.
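The initialization described for positional_embedding can be sketched in plain Python as follows. The helper init_scale is hypothetical. It assumes each block owns a contiguous run of embedding_dim / num_blocks rows and applies the clamp at init for illustration; where the library applies the clamp is not stated here.

```python
import math


def init_scale(omega_0_per_block, embedding_dim, scale_min=1e-2, scale_max=2.0):
    """Return (omega_0_const, per-row scale) for the first layer.

    Assumption: rows are assigned to blocks in contiguous, equal-sized runs.
    """
    num_blocks = len(omega_0_per_block)
    if embedding_dim % num_blocks:
        raise ValueError("embedding_dim must be divisible by num_blocks")
    rows_per_block = embedding_dim // num_blocks
    omega_0_const = max(omega_0_per_block)
    scale = []
    for omega in omega_0_per_block:
        # Relative schedule, clamped to [scale_min, scale_max].
        s = min(max(omega / omega_0_const, scale_min), scale_max)
        scale.extend([s] * rows_per_block)
    return omega_0_const, scale


omega_0_const, scale = init_scale([1.0, 4.0, 8.0, 12.0], embedding_dim=8)

# Effective per-row omega_0 and the corresponding runtime factor.
effective = [omega_0_const * s for s in scale]
runtime_factor = [2.0 * math.pi * w for w in effective]

print(scale)
print(effective)
```

In this sketch the effective per-row ω₀ at init reproduces the per-block schedule, and any entry below omega_0_scale_min · omega_0_const is lifted to that floor.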
- hidden_linears, out_linear, film_generator
Inherited from SIRENKernelND; see that class.
- __init__(
- out_dim,
- data_dim,
- mlp_hidden_dim,
- num_layers,
- embedding_dim,
- L_cache,
- use_bias,
- num_blocks=8,
- omega_0_min=1.0,
- omega_0_max=12.0,
- schedule='linear',
- off_block_scale=0.1,
- omega_0_per_block=None,
- omega_0_scale_min=1e-2,
- omega_0_scale_max=2.0,
- hidden_omega_0=1.0,
- apply_lr_scale=False,
- film_cfg=None,
- film_after_pos_embed=False,
)
Initialize the block-diagonal learnable-omega SIREN kernel; see the class docstring.
- Parameters:
out_dim (int)
data_dim (int)
mlp_hidden_dim (int)
num_layers (int)
embedding_dim (int)
use_bias (bool)
num_blocks (int)
omega_0_min (float)
omega_0_max (float)
schedule (str)
off_block_scale (float)
omega_0_scale_min (float)
omega_0_scale_max (float)
hidden_omega_0 (float)
apply_lr_scale (bool)
film_cfg (LazyConfig | None)
film_after_pos_embed (bool)