IterationSpeedCallback
- class IterationSpeedCallback(*args, **kwargs)
Bases: Callback
Logs iteration throughput, fwd/bwd breakdown, and GPU memory to wandb.
Provides two families of metrics:
- Wall-clock (perf/wc_*): cumulative counters that track true training throughput. The timer pauses during validation and resumes when training continues, so these metrics are not skewed by how often validation runs.
- Windowed (perf/iter_per_sec, perf/fwd_ms, etc.): rolling averages over the last window_size batches. More responsive to local changes but can be noisy, especially during torch.compile warmup (the first warmup_batches steps are excluded).
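The windowed family can be pictured with a small sketch. This is not the callback's actual implementation: the class name WindowedRate, its method names, and the default values are hypothetical, chosen only to illustrate a rolling average that skips warmup steps.

```python
from collections import deque


class WindowedRate:
    """Rolling iterations-per-second over recent batches (illustrative only)."""

    def __init__(self, window_size=50, warmup_batches=5):
        # Both defaults are assumptions for this sketch, not the callback's values.
        self.durations = deque(maxlen=window_size)  # oldest entries drop off automatically
        self.warmup_batches = warmup_batches
        self.seen = 0

    def update(self, batch_seconds):
        """Record one batch duration, ignoring the first warmup_batches steps."""
        self.seen += 1
        if self.seen <= self.warmup_batches:
            return
        self.durations.append(batch_seconds)

    def iter_per_sec(self):
        """Return the windowed rate, or None until a post-warmup batch is recorded."""
        total = sum(self.durations)
        if not self.durations or total <= 0:
            return None
        return len(self.durations) / total
```

Because the deque is bounded, a single slow batch affects the reported rate only until it leaves the window, which is why this family reacts quickly but can be noisy.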
GPU memory is sampled from torch.cuda.max_memory_allocated and torch.cuda.memory_allocated and logged as perf/peak_gpu_mb and perf/current_gpu_mb, respectively.
- Parameters:
- __init__(log_every_n_steps=10, window_size=None, batch_size_per_gpu=None)
- on_train_batch_start(trainer, pl_module, batch, batch_idx)
- on_before_backward(trainer, pl_module, loss)
- on_after_backward(trainer, pl_module)
- on_train_batch_end(trainer, pl_module, outputs, batch, batch_idx)
- on_validation_start(trainer, pl_module)
- on_validation_end(trainer, pl_module)
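One plausible way these hooks could cooperate is sketched below. It is an assumption about the mechanism, not the callback's source: HookTimingSketch, its attributes, and the injectable clock argument are hypothetical, and the class deliberately does not subclass Callback so it runs with the standard library alone. Forward time is taken as the gap between on_train_batch_start and on_before_backward, backward time as the gap between on_before_backward and on_after_backward, and the wall-clock timer is paused between on_validation_start and on_validation_end.

```python
import time


class HookTimingSketch:
    """Stand-in with the same hook names; not a Lightning Callback subclass."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock           # injectable so the sketch can be tested
        self.train_seconds = 0.0     # banked training time, validation excluded
        self.batches = 0
        self.fwd_ms = None
        self.bwd_ms = None
        self._segment_start = None   # start of the current un-paused segment
        self._paused = False
        self._t_batch = None
        self._t_bwd = None

    def on_train_batch_start(self, trainer, pl_module, batch, batch_idx):
        now = self.clock()
        if self._segment_start is None and not self._paused:
            self._segment_start = now  # very first training batch
        self._t_batch = now

    def on_before_backward(self, trainer, pl_module, loss):
        self._t_bwd = self.clock()
        self.fwd_ms = (self._t_bwd - self._t_batch) * 1e3

    def on_after_backward(self, trainer, pl_module):
        self.bwd_ms = (self.clock() - self._t_bwd) * 1e3

    def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx):
        self.batches += 1

    def on_validation_start(self, trainer, pl_module):
        # Pause: bank the elapsed training time and close the segment.
        if self._segment_start is not None:
            self.train_seconds += self.clock() - self._segment_start
            self._segment_start = None
            self._paused = True

    def on_validation_end(self, trainer, pl_module):
        # Resume only if this validation run actually paused a running timer.
        if self._paused:
            self._segment_start = self.clock()
            self._paused = False

    def wc_iter_per_sec(self):
        """Cumulative batches per second of training time, or None if no time elapsed."""
        elapsed = self.train_seconds
        if self._segment_start is not None:
            elapsed += self.clock() - self._segment_start
        return self.batches / elapsed if elapsed > 0 else None
```

Note that CUDA kernels launch asynchronously, so host-side timestamps like these measure launch time unless the device is synchronized first; whether the real callback synchronizes is not stated here.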