nvSubquadratic Documentation#

nvsubquadratic is a unified, PyTorch-native library of subquadratic alternatives to quadratic attention. It consolidates efforts from across NVIDIA Research teams (nvResearch, NeMo, BioNeMo) into a single, consistent API. The current release supports multi-dimensional (1D, 2D, 3D) Hyena operators backed by optimized CUDA kernels from subquadratic_ops_torch. For an input of length N, these operators have O(N log N) complexity, compared with O(N^2) for standard attention.
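To make the complexity claim concrete, here is a minimal pure-Python sketch of a causal long convolution computed two ways: directly, at O(N^2) cost, and through an FFT, at O(N log N) cost. It only illustrates the general FFT-convolution idea behind Hyena-style operators; it is not the nvsubquadratic API or its CUDA kernels, and the names `fft`, `fft_causal_conv`, and `direct_causal_conv` are hypothetical helpers introduced for this example. It assumes the kernel is no longer than the signal.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft_causal_conv(signal, kernel):
    """Causal convolution in O(N log N); assumes len(kernel) <= len(signal)."""
    n = len(signal)
    # Zero-pad to a power of two >= 2N so the circular convolution
    # computed by the FFT does not wrap around into the first N outputs.
    size = 1
    while size < 2 * n:
        size *= 2
    fs = fft(list(signal) + [0.0] * (size - n))
    fk = fft(list(kernel) + [0.0] * (size - len(kernel)))
    product = [a * b for a, b in zip(fs, fk)]
    out = fft(product, inverse=True)
    # The inverse transform above is unnormalized, so divide by size.
    return [v.real / size for v in out[:n]]


def direct_causal_conv(signal, kernel):
    """Reference O(N^2) causal convolution: y[t] = sum_j kernel[j] * signal[t - j]."""
    return [
        sum(kernel[j] * signal[t - j] for j in range(min(t, len(kernel) - 1) + 1))
        for t in range(len(signal))
    ]


signal = [1.0, 2.0, 3.0, 4.0]
kernel = [1.0, 0.5, 0.25, 0.125]
fast = fft_causal_conv(signal, kernel)
slow = direct_causal_conv(signal, kernel)
```

Both paths produce the same output up to floating-point error; only the cost differs as N grows.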

Installation#

Install the package from source, in editable mode, from the repository root:

pip install -e .

To enable the optional fused RMSNorm kernel on Hopper / Blackwell GPUs:

pip install -e ".[quack]"
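After installing, one quick sanity check is to confirm that the package is discoverable by the interpreter you intend to use. The sketch below relies only on the standard library; the helper name `is_installed` is hypothetical, and the module name `nvsubquadratic` is taken from the package directory mentioned in these docs.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Prints True once the editable install has succeeded in this environment.
print("nvsubquadratic importable:", is_installed("nvsubquadratic"))
```

Because `find_spec` does not execute the package, this check works even on a machine without a GPU.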

Requirements#

CUDA-compatible NVIDIA GPU (Ampere or newer)
CUDA Toolkit 12.0 or higher
Python 3.11 or higher
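The requirements above can be expressed as a small pre-flight check. This is a sketch under stated assumptions, not a utility shipped with the library: the function `unmet_requirements` and its constants are hypothetical, and "Ampere or newer" is mapped to CUDA compute capability 8.0 or higher.

```python
import sys

MIN_PYTHON = (3, 11)
MIN_CUDA = (12, 0)
MIN_COMPUTE_CAPABILITY = (8, 0)  # assumption: Ampere starts at capability 8.0


def unmet_requirements(python_version, cuda_version, compute_capability):
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append("Python 3.11 or higher is required")
    if tuple(cuda_version) < MIN_CUDA:
        problems.append("CUDA Toolkit 12.0 or higher is required")
    if tuple(compute_capability) < MIN_COMPUTE_CAPABILITY:
        problems.append("An Ampere-or-newer GPU (compute capability >= 8.0) is required")
    return problems


# Example: the running interpreter, CUDA 12.4, and a capability 9.0 GPU.
# The CUDA version and capability here are illustrative values, not detected ones.
for problem in unmet_requirements(sys.version_info, (12, 4), (9, 0)):
    print(problem)
```

On a machine with PyTorch installed, `torch.cuda.get_device_capability()` returns the `(major, minor)` tuple to pass in. `torch.version.cuda` reports the CUDA version PyTorch was built against, which may differ from the installed toolkit.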

Where to go next#

Getting Started — install, requirements, and a minimal “Hello, Hyena” forward pass.
Architecture — the three-layer nvSubquadratic / subquadratic-ops / megatron-core story and the BHL/BLH naming conventions.
Package Overview — bottom-up tour of what’s inside nvsubquadratic/ (ops / modules / networks / parallel / utils).
Examples — per-dataset training recipes under examples/.
Benchmarks — ViT-5-Small throughput tables and FLOP scaling.
Reports — long-form technical reports backed by reproducible scripts and figures.
Ops Overview — math primer and decision tree for the FFT convolution primitives.
API Reference — auto-generated reference for the curated public surface organised by package (ops, modules, networks, parallel, core, experiments).

Contributor docs#

CONVENTIONS.md — Google-style docstring guide and PR checklist (lives at the repo root).
docs-tracker.md — documentation coverage status per file.
