nvSubquadratic Documentation

nvsubquadratic is a unified, PyTorch-native library of subquadratic alternatives to quadratic attention. It consolidates efforts from across NVIDIA Research teams (nvResearch, NeMo, BioNeMo) into a single, consistent API. The current release supports multi-dimensional (1D, 2D, 3D) Hyena operators backed by optimized CUDA kernels from subquadratic_ops_torch. For a sequence of length N, Hyena operators run in O(N log N) time, compared with O(N^2) for standard attention.
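The O(N log N) figure comes from evaluating long convolutions with the FFT rather than summing every pair of positions. The sketch below illustrates that idea in pure Python for the 1D causal case. It is not the library's implementation (the real operators run as CUDA kernels), and the function names `fft`, `fft_causal_conv`, and `direct_causal_conv` are hypothetical helpers written only for this example.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft_causal_conv(u, h):
    """Causal convolution y[t] = sum_{s<=t} h[s] * u[t-s] in O(N log N).

    Assumes len(h) <= len(u). Both inputs are zero-padded to a power of two
    that is at least 2 * len(u), so the circular convolution computed by the
    FFT matches the linear one on the first len(u) outputs.
    """
    n = len(u)
    size = 1
    while size < 2 * n:
        size *= 2
    U = fft(list(u) + [0.0] * (size - n))
    H = fft(list(h) + [0.0] * (size - len(h)))
    Y = fft([a * b for a, b in zip(U, H)], inverse=True)
    # The unnormalized inverse transform needs a 1/size scale factor.
    return [y.real / size for y in Y[:n]]


def direct_causal_conv(u, h):
    """Reference O(N^2) evaluation of the same causal convolution."""
    return [
        sum(h[s] * u[t - s] for s in range(min(t, len(h) - 1) + 1))
        for t in range(len(u))
    ]


u = [1.0, 2.0, 3.0, 4.0]
h = [1.0, 0.5, 0.25, 0.125]
y_fft = fft_causal_conv(u, h)
y_direct = direct_causal_conv(u, h)
```

Both functions produce the same output up to floating-point error. The direct version does work proportional to N^2, while the FFT version is dominated by three transforms of length at most 4N.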

Installation

Install the package from source, from the repository root:

pip install -e .

To enable the optional fused RMSNorm kernel on Hopper / Blackwell GPUs:

pip install -e ".[quack]"

Requirements

  • CUDA-compatible NVIDIA GPU (Ampere or newer)

  • CUDA Toolkit 12.0 or higher

  • Python 3.11 or higher
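As a rough pre-install check, the requirements above can be encoded as version comparisons. This is a sketch under stated assumptions: `meets_requirements` and `probe_environment` are hypothetical helpers, not part of nvsubquadratic. "Ampere or newer" is taken to mean CUDA compute capability 8.0 or higher, and `torch.version.cuda` reports the CUDA version PyTorch was built against, which is only a proxy for the installed CUDA Toolkit.

```python
import sys

MIN_PYTHON = (3, 11)
MIN_CUDA = (12, 0)
MIN_COMPUTE_CAPABILITY = (8, 0)  # Ampere GPUs start at compute capability 8.0


def meets_requirements(python_version, cuda_version, compute_capability):
    """Return True when all three (major, minor) tuples meet the minimums."""
    return (
        tuple(python_version[:2]) >= MIN_PYTHON
        and tuple(cuda_version[:2]) >= MIN_CUDA
        and tuple(compute_capability[:2]) >= MIN_COMPUTE_CAPABILITY
    )


def probe_environment():
    """Best-effort probe; returns None when PyTorch or a CUDA GPU is missing."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available() or torch.version.cuda is None:
        return None
    # Build-time CUDA version of PyTorch, used here as a stand-in for the
    # CUDA Toolkit version (an assumption of this sketch).
    cuda_version = tuple(int(p) for p in torch.version.cuda.split(".")[:2])
    capability = torch.cuda.get_device_capability(0)
    return sys.version_info[:2], cuda_version, capability
```

When `probe_environment()` returns a tuple, passing it as `meets_requirements(*env)` gives a quick yes/no answer. A `None` result means the check could not be performed on this machine.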

Where to go next

  • Getting Started — install, requirements, and a minimal “Hello, Hyena” forward pass.

  • Architecture — the three-layer nvSubquadratic / subquadratic-ops / megatron-core story and the BHL/BLH naming conventions.

  • Package Overview — bottom-up tour of what’s inside nvsubquadratic/ (ops / modules / networks / parallel / utils).

  • Examples — per-dataset training recipes under examples/.

  • Benchmarks — ViT-5-Small throughput tables and FLOP scaling.

  • Reports — long-form technical reports backed by reproducible scripts and figures.

  • Ops Overview — math primer and decision tree for the FFT convolution primitives.

  • API Reference — auto-generated reference for the curated public surface organised by package (ops, modules, networks, parallel, core, experiments).

Contributor docs

  • CONVENTIONS.md — Google-style docstring guide and PR checklist (lives at the repo root).

  • docs-tracker.md — documentation coverage status per file.