nvSubquadratic Documentation
nvsubquadratic is a unified PyTorch-native library for subquadratic
alternatives to quadratic attention. It consolidates efforts from across
NVIDIA Research teams (nvResearch, NeMo, BioNeMo) into a single, consistent
API. The current release supports multi-dimensional (1D, 2D, 3D) Hyena
operators backed by optimized CUDA kernels from
subquadratic_ops_torch. These operators scale as O(N log N) in the
sequence length N, compared with O(N^2) for standard attention.
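To give a feel for the gap between O(N log N) and O(N^2), the sketch below compares the leading-order terms only. The function names are illustrative and are not part of nvsubquadratic; constant factors, memory traffic, and kernel details are ignored, so the ratios are not throughput predictions.

```python
import math


def attention_cost(n: int) -> int:
    """Leading-order cost of full attention: one score per pair of positions."""
    return n * n


def fft_conv_cost(n: int) -> float:
    """Leading-order cost of an FFT-based long convolution (constants dropped)."""
    return n * math.log2(n)


# Compare the two growth rates at a few sequence lengths.
for n in (1_024, 8_192, 65_536):
    ratio = attention_cost(n) / fft_conv_cost(n)
    print(f"N={n:>6}: N^2 / (N log2 N) = {ratio:,.0f}x")
```

At N = 65,536 the ratio of the two terms is 4,096, which is why the asymptotic difference matters most for long sequences and high-resolution 2D or 3D inputs.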
Installation
Install the package from source in editable mode:
pip install -e .
To enable the optional fused RMSNorm kernel on Hopper / Blackwell GPUs:
pip install -e ".[quack]"
Requirements
CUDA-compatible NVIDIA GPU (Ampere or newer)
CUDA Toolkit 12.0 or higher
Python 3.11 or higher
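The requirements above can be checked before installing with a short script. This is a minimal sketch, not a tool shipped with the package: the helper names are hypothetical, "Ampere or newer" is taken to mean compute capability 8.0 or higher, and `torch.version.cuda` reports the CUDA version PyTorch was built against, which may differ from the system CUDA Toolkit.

```python
import sys

MIN_PYTHON = (3, 11)      # from the requirements list
MIN_CUDA = (12, 0)        # from the requirements list
MIN_CAPABILITY = (8, 0)   # assumption: Ampere corresponds to compute capability 8.0


def at_least(found: tuple, required: tuple) -> bool:
    """Compare version tuples, e.g. (12, 4) >= (12, 0)."""
    return tuple(found[: len(required)]) >= tuple(required)


def parse_version(text: str) -> tuple:
    """Turn a string such as '12.4' into (12, 4)."""
    return tuple(int(part) for part in text.split(".")[:2])


def check_environment() -> dict:
    """Return a name -> bool report; GPU checks are skipped without PyTorch."""
    report = {"python": at_least(tuple(sys.version_info[:2]), MIN_PYTHON)}
    try:
        import torch
    except ImportError:
        return report  # PyTorch not installed; only the Python check applies
    if torch.cuda.is_available():
        report["gpu"] = at_least(torch.cuda.get_device_capability(), MIN_CAPABILITY)
    else:
        report["gpu"] = False
    if torch.version.cuda is not None:
        # CUDA version PyTorch was built with, not the installed toolkit
        report["cuda"] = at_least(parse_version(torch.version.cuda), MIN_CUDA)
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'does not meet the requirement'}")
```

The script only reports what it finds; it does not verify the installed CUDA Toolkit itself, which can be inspected separately with `nvcc --version`.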
Where to go next
Getting Started — install, requirements, and a minimal “Hello, Hyena” forward pass.
Architecture — the three-layer nvSubquadratic / subquadratic-ops / megatron-core story and the BHL/BLH naming conventions.
Package Overview — bottom-up tour of what’s inside nvsubquadratic/ (ops / modules / networks / parallel / utils).
Examples — per-dataset training recipes under examples/.
Benchmarks — ViT-5-Small throughput tables and FLOP scaling.
Reports — long-form technical reports backed by reproducible scripts and figures.
Ops Overview — math primer and decision tree for the FFT convolution primitives.
API Reference — auto-generated reference for the curated public surface organised by package (ops, modules, networks, parallel, core, experiments).
Contributor docs
CONVENTIONS.md — Google-style docstring guide and PR checklist (lives at the repo root).
docs-tracker.md — documentation coverage status per file.