KernelFiLMGenerator
- class KernelFiLMGenerator(
- cond_dim,
- kernel_hidden_dim,
- num_film_layers,
- film_hidden_dim=64,
- no_weight_decay=False,
- init_type='identity',
- init_std=1e-4,
Bases: Module
MLP that generates per-layer FiLM (γ, β) pairs from a conditioning vector.
Given a conditioning signal c ∈ ℝ^{cond_dim} (e.g. from register tokens processed by RegisterPooling or RegisterCompressConcat), this module produces one (γ_l, β_l) pair per SIREN hidden layer l (SIREN = Sinusoidal Representation Network, Sitzmann et al. 2020, arXiv:2006.09661; see nvsubquadratic.modules.kernels_nd):
h_l ← γ_l(c) ⊙ h_l + β_l(c)
The generator itself is a two-layer MLP with a GELU non-linearity:
- c → Linear(cond_dim, film_hidden_dim) → GELU
→ Linear(film_hidden_dim, num_film_layers × 2 × kernel_hidden_dim)
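The architecture above can be sketched in a few lines of PyTorch. This is a minimal illustration, assuming plain `nn.Linear` and `nn.GELU` layers; the sizes are made up for the example and the snippet is not the library's actual constructor.

```python
import torch
from torch import nn

# Illustrative sizes (assumptions for this sketch only).
cond_dim, film_hidden_dim = 32, 64
num_film_layers, kernel_hidden_dim = 3, 16

# One (gamma, beta) pair of width kernel_hidden_dim per modulated layer.
out_dim = num_film_layers * 2 * kernel_hidden_dim

mlp = nn.Sequential(
    nn.Linear(cond_dim, film_hidden_dim),
    nn.GELU(),
    nn.Linear(film_hidden_dim, out_dim),
)

c = torch.randn(4, cond_dim)    # batch of 4 conditioning vectors
flat = mlp(c)                   # flat FiLM parameters, one row per sample
flat_shape = tuple(flat.shape)  # (4, 96)
```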
The flat output is split into num_film_layers chunks; each chunk is further split in half to give γ_l, β_l ∈ ℝ^{kernel_hidden_dim}.
Initialization strategy: The output layer is initialized so that at the start of training γ_l = 1 and β_l = 0 for every layer, making FiLM an identity modulation. This prevents early instability when the conditioning signal is still uninformative. The "small_random" variant perturbs the output weights slightly to break weight symmetry while keeping the bias-induced identity.
Weight-decay handling: All biases are permanently excluded from weight decay (_no_weight_decay = True). Weight matrices can be excluded entirely (no_weight_decay=True) or assigned a custom decay value (no_weight_decay=<float>).
Feature dimension of each SIREN hidden layer.
- Type:
- mlp
Two-layer MLP mapping [*, cond_dim] → [*, num_film_layers × 2 × kernel_hidden_dim] via a film_hidden_dim-dimensional bottleneck (Linear → GELU → Linear).
- Type: nn.Sequential
- Parameters:
cond_dim (int) – Dimensionality of the conditioning input c.
kernel_hidden_dim (int) – Hidden dimension of the SIREN layers to modulate.
num_film_layers (int) – Number of (gamma, beta) pairs to produce (one per SIREN hidden layer). Must be ≥ 1.
film_hidden_dim (int) – Hidden dimension of the FiLM generator MLP (bottleneck).
no_weight_decay (bool | float) – Controls weight decay for FiLM weight parameters. All biases are always excluded from weight decay regardless of this setting.
- True: all parameters excluded from weight decay (_no_weight_decay=True).
- float: weight parameters placed in a dedicated optimizer group with this weight decay value (_weight_decay=<value>). Useful for mild regularization (e.g. 1e-3) without full WD.
- False (default): weight parameters use the global optimizer weight decay.
init_type (Literal['identity', 'small_random']) – How the output layer of the MLP is initialized:
- "identity": Output weights = 0, bias = (gamma=1, beta=0). Exact identity at init.
- "small_random": Same bias, but with output weights drawn from a zero-mean normal distribution with standard deviation init_std to break symmetry. Near-identity at init.
init_std (float) – Standard deviation for output-layer weight init when init_type="small_random". Ignored for "identity".
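The two `init_type` options can be illustrated with a short sketch. `init_film_output_layer` is a hypothetical helper written for this example, not part of the library, and it assumes that within each per-layer chunk the γ half comes before the β half.

```python
import torch
from torch import nn

def init_film_output_layer(linear, num_film_layers, kernel_hidden_dim,
                           init_type="identity", init_std=1e-4):
    """Hypothetical helper: make the output layer emit gamma=1, beta=0 at init."""
    with torch.no_grad():
        if init_type == "identity":
            linear.weight.zero_()  # output depends on the bias only
        elif init_type == "small_random":
            linear.weight.normal_(mean=0.0, std=init_std)  # tiny perturbation
        else:
            raise ValueError(f"unknown init_type: {init_type!r}")
        # Assumed bias layout: per layer, [gamma (H values), beta (H values)].
        bias = torch.zeros(num_film_layers, 2, kernel_hidden_dim)
        bias[:, 0, :] = 1.0  # gamma = 1; beta stays 0
        linear.bias.copy_(bias.reshape(-1))
    return linear

out_layer = init_film_output_layer(nn.Linear(64, 3 * 2 * 16), 3, 16)
flat = out_layer(torch.randn(5, 64))
# Recover per-layer gamma and beta, each of shape [5, 3, 16].
gamma, beta = flat.view(5, 3, 2, 16).unbind(dim=2)
```

With `init_type="identity"` the output does not depend on the input at initialization, so every sample receives exactly γ = 1 and β = 0; with `"small_random"` the values deviate from those by an amount on the order of `init_std`.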
- __init__(
- cond_dim,
- kernel_hidden_dim,
- num_film_layers,
- film_hidden_dim=64,
- no_weight_decay=False,
- init_type='identity',
- init_std=1e-4,
Initialize internal Module state, shared by both nn.Module and ScriptModule.
- flop_count()
Count FLOPs for the FiLM generator MLP (one sample).
The MLP maps a single conditioning vector [cond_dim] to FiLM parameters [num_film_layers * 2 * kernel_hidden_dim] via Linear(cond_dim, film_hidden_dim) -> GELU -> Linear(film_hidden_dim, out_dim).
FLOPs breakdown:
- First linear: 2 * cond_dim * film_hidden_dim (with cond_dim = self.mlp[0].in_features and film_hidden_dim = self.mlp[0].out_features).
- GELU activation: film_hidden_dim (elementwise).
- Second linear: 2 * film_hidden_dim * out_dim, with out_dim = num_film_layers * 2 * kernel_hidden_dim = self.mlp[2].out_features.
This runs once per sample per CKConvND layer that uses FiLM.
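The breakdown can be reproduced with plain arithmetic. The function below is a standalone sketch that follows the three terms as stated; its name and the example sizes are assumptions made for illustration, and it is not the method's actual implementation.

```python
def film_generator_flops(cond_dim, film_hidden_dim, num_film_layers, kernel_hidden_dim):
    """Per-sample FLOPs of Linear -> GELU -> Linear, following the stated breakdown."""
    out_dim = num_film_layers * 2 * kernel_hidden_dim
    first_linear = 2 * cond_dim * film_hidden_dim   # multiply + add per weight
    gelu = film_hidden_dim                          # one op per hidden unit
    second_linear = 2 * film_hidden_dim * out_dim
    return first_linear + gelu + second_linear

# Example: cond_dim=32, film_hidden_dim=64, 3 FiLM layers of width 16.
total = film_generator_flops(32, 64, 3, 16)  # 4096 + 64 + 12288 = 16448
```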
- Returns:
Total FLOPs as an integer.
- Return type:
- forward(conditioning)
Generate per-layer FiLM parameters from the conditioning vector.
Runs the two-layer MLP on conditioning and splits the flat output into num_film_layers (γ, β) pairs. Each pair should be applied by the SIREN caller as h_l ← γ_l ⊙ h_l + β_l.
- Parameters:
conditioning (Tensor) – Conditioning vector of shape [B, cond_dim]. Typically produced by RegisterPooling or RegisterCompressConcat.
- Returns:
A list of num_film_layers tuples (gamma, beta), where each tensor has shape [B, kernel_hidden_dim]. Index 0 corresponds to the first (shallowest) SIREN hidden layer and index num_film_layers - 1 to the deepest.
- Return type:
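
As an illustration of the split and of how a caller might apply one pair, here is a minimal sketch. `split_film_params` is a hypothetical helper, the γ-before-β order within each chunk is an assumption, and `h` is a random stand-in for a SIREN hidden activation of shape [B, kernel_hidden_dim].

```python
import torch

def split_film_params(flat, num_film_layers):
    """Hypothetical helper: split [B, L * 2 * H] into L (gamma, beta) pairs."""
    pairs = []
    for chunk in flat.chunk(num_film_layers, dim=-1):  # one [B, 2H] chunk per layer
        gamma, beta = chunk.chunk(2, dim=-1)           # assumed order: gamma, then beta
        pairs.append((gamma, beta))
    return pairs

B, L, H = 4, 3, 16
flat = torch.randn(B, L * 2 * H)  # stand-in for the generator MLP output
film = split_film_params(flat, L)

h = torch.randn(B, H)             # stand-in for the first hidden activation
gamma0, beta0 = film[0]
h_mod = gamma0 * h + beta0        # h_l <- gamma_l ⊙ h_l + beta_l
```

If the hidden activations carry extra dimensions (for example one per kernel position), γ and β would need to be reshaped so that they broadcast over those dimensions.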