fftconv2d_bhl

fftconv2d_bhl(x, kernel, shortcut=None)

2D FFT convolution using the subq_ops CUDA kernel, with tensors in BHL layout [B, H, X, Y].

Drop-in replacement for nvsubquadratic.ops.fftconv.fftconv2d_fp32_bhl(). Accepts any input dtype; internally casts to fp32 for the CUDA kernel and returns the output in the original dtype of x.

Parameters:
  • x (Tensor) – Input tensor [B, H, X, Y].

  • kernel (Tensor) – Kernel tensor [1|B, H, Kx, Ky].

  • shortcut (Tensor | None) – Optional per-channel scale [H].

Returns:

Output tensor [B, H, X, Y] in x.dtype.

Return type:

Tensor
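
Example:

The shape and dtype contract above can be illustrated with a minimal NumPy sketch. This is not the subq_ops kernel: fftconv2d_reference is a hypothetical name, NumPy arrays stand in for tensors, and two behaviours the description does not specify are assumed here, namely that the result is a zero-padded linear convolution cropped to the first X × Y outputs, and that shortcut adds a per-channel scaled copy of x to the result. Only the shapes, the fp32 internal cast, and the return in x.dtype come from the documentation.

```python
import numpy as np


def fftconv2d_reference(x, kernel, shortcut=None):
    """Hypothetical NumPy reference for the documented contract.

    x:        [B, H, X, Y]
    kernel:   [1|B, H, Kx, Ky]
    shortcut: optional [H]
    """
    orig_dtype = x.dtype

    # Compute in fp32 regardless of the input dtype, as documented.
    x32 = x.astype(np.float32)
    k32 = kernel.astype(np.float32)

    X, Y = x32.shape[-2:]
    Kx, Ky = k32.shape[-2:]

    # Zero-pad so the circular FFT product equals a linear convolution.
    fft_shape = (X + Kx - 1, Y + Ky - 1)
    xf = np.fft.rfft2(x32, s=fft_shape, axes=(-2, -1))
    kf = np.fft.rfft2(k32, s=fft_shape, axes=(-2, -1))

    # A leading kernel dimension of 1 broadcasts over the batch.
    y = np.fft.irfft2(xf * kf, s=fft_shape, axes=(-2, -1))

    # ASSUMPTION: keep the first X x Y outputs (cropping is not documented).
    y = y[..., :X, :Y]

    if shortcut is not None:
        # ASSUMPTION: the per-channel scale multiplies x and is added to y.
        s32 = shortcut.astype(np.float32)
        y = y + s32[None, :, None, None] * x32

    # Return in the original dtype of x, as documented.
    return y.astype(orig_dtype)


# Usage: fp16 input, kernel shared across the batch, per-channel shortcut.
x = np.ones((2, 3, 4, 5), dtype=np.float16)
kernel = np.zeros((1, 3, 2, 2), dtype=np.float16)
kernel[..., 0, 0] = 1  # delta kernel: the convolution returns x unchanged
out = fftconv2d_reference(x, kernel, shortcut=np.ones(3, dtype=np.float16))
print(out.shape, out.dtype)
```

Under the assumptions stated above, the output keeps the input's [B, H, X, Y] shape and dtype even though the arithmetic runs in fp32.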