IterationSpeedCallback

class IterationSpeedCallback(*args, **kwargs)

Bases: Callback

Logs iteration throughput, a forward/backward time breakdown, and GPU memory usage to wandb.

Provides two families of metrics:

Wall-clock (perf/wc_*): cumulative counters that track true training throughput. The timer pauses during validation and resumes when training continues, so validation runs do not skew these numbers, however often they occur.
Windowed (perf/iter_per_sec, perf/fwd_ms, etc.): rolling averages over the last window_size batches. These respond faster to local changes but can be noisy. The first warmup_batches steps are excluded so that torch.compile warmup does not distort the averages.
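The difference between the two families can be illustrated with a minimal sketch. This is not the callback's actual implementation: `PausableTimer` and `SpeedTracker` are hypothetical names, and the sketch only assumes that the wall-clock family divides a cumulative batch count by time spent in training, while the windowed family averages the most recent batch durations.

```python
import time
from collections import deque


class PausableTimer:
    """Accumulates elapsed time only while running (hypothetical helper)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._elapsed = 0.0
        self._started_at = None  # None means the timer is paused

    def resume(self):
        if self._started_at is None:
            self._started_at = self._clock()

    def pause(self):
        if self._started_at is not None:
            self._elapsed += self._clock() - self._started_at
            self._started_at = None

    @property
    def elapsed(self):
        running = 0.0 if self._started_at is None else self._clock() - self._started_at
        return self._elapsed + running


class SpeedTracker:
    """Tracks a cumulative (wall-clock) rate and a rolling (windowed) rate."""

    def __init__(self, window_size=10, clock=time.perf_counter):
        self.timer = PausableTimer(clock)
        self.total_batches = 0
        # deque with maxlen drops the oldest entry automatically
        self.batch_times = deque(maxlen=window_size)

    def record_batch(self, batch_seconds):
        self.total_batches += 1
        self.batch_times.append(batch_seconds)

    def wall_clock_iter_per_sec(self):
        elapsed = self.timer.elapsed
        return self.total_batches / elapsed if elapsed > 0 else 0.0

    def windowed_iter_per_sec(self):
        total = sum(self.batch_times)
        return len(self.batch_times) / total if total > 0 else 0.0
```

In this sketch, a training loop would call `timer.pause()` when validation starts and `timer.resume()` when it ends, so time spent validating never enters the wall-clock denominator.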

GPU memory is sampled from torch.cuda.max_memory_allocated and torch.cuda.memory_allocated, and logged as perf/peak_gpu_mb and perf/current_gpu_mb respectively.
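As a rough sketch of how such sampling might look, the snippet below reads both counters and converts them to megabytes. The helper names are hypothetical, and the divisor of 1024 * 1024 bytes per MB is an assumption; the callback may use a different conversion.

```python
def bytes_to_mb(num_bytes):
    # Assumption: 1 MB = 1024 * 1024 bytes; the callback may use another divisor.
    return num_bytes / (1024 ** 2)


def sample_gpu_memory_mb():
    """Return peak and current allocated GPU memory in MB, or {} without CUDA."""
    try:
        import torch
    except ImportError:
        return {}
    if not torch.cuda.is_available():
        return {}
    return {
        # Peak since program start or the last reset of peak statistics
        "perf/peak_gpu_mb": bytes_to_mb(torch.cuda.max_memory_allocated()),
        # Memory currently occupied by tensors on the current device
        "perf/current_gpu_mb": bytes_to_mb(torch.cuda.memory_allocated()),
    }
```

Note that torch.cuda.max_memory_allocated reports the peak since the program started unless torch.cuda.reset_peak_memory_stats has been called, so the peak value is cumulative rather than per logging interval.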

Parameters:

log_every_n_steps (int) – Number of training steps between logging of speed metrics. Defaults to 10.
window_size (int | None) – Number of recent batch times to average over. If None, defaults to log_every_n_steps.
batch_size_per_gpu (int | None) – Batch size on each GPU, used to compute samples per second. If None, the callback attempts to read it from trainer.datamodule.
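The fallback for batch_size_per_gpu could work along the following lines. This is a sketch under stated assumptions, not the callback's code: the function names are hypothetical, the datamodule is assumed to expose a `batch_size` attribute, and samples per second is assumed to scale with the number of devices (`world_size`).

```python
from types import SimpleNamespace


def resolve_batch_size(batch_size_per_gpu, trainer):
    """Prefer the explicit value, else look on trainer.datamodule."""
    if batch_size_per_gpu is not None:
        return batch_size_per_gpu
    datamodule = getattr(trainer, "datamodule", None)
    # Assumption: the datamodule exposes a `batch_size` attribute.
    return getattr(datamodule, "batch_size", None)


def samples_per_sec(iter_per_sec, batch_size, world_size=1):
    """Convert iterations/sec to samples/sec; None if batch size is unknown."""
    if batch_size is None:
        return None
    # Assumption: every device processes `batch_size` samples per iteration.
    return iter_per_sec * batch_size * world_size


# Stand-in trainer object for illustration only
trainer = SimpleNamespace(datamodule=SimpleNamespace(batch_size=32))
rate = samples_per_sec(2.0, resolve_batch_size(None, trainer), world_size=4)
```

Passing batch_size_per_gpu explicitly avoids depending on how the datamodule stores its batch size.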

__init__(log_every_n_steps=10, window_size=None, batch_size_per_gpu=None)

on_train_batch_end(trainer, pl_module, outputs, batch, batch_idx)