ResumableSequentialLR

class ResumableSequentialLR(optimizer, schedulers, milestones, last_epoch=-1)

Bases: SequentialLR

SequentialLR with a corrected load_state_dict.

Bug (PyTorch <= 2.10, confirmed on 2.10.0+cu129):

SequentialLR.load_state_dict correctly deserializes its internal bookkeeping (_last_lr, sub-scheduler states, last_epoch) but never writes the restored learning rates back to optimizer.param_groups. As a result, after a checkpoint is loaded, the optimizer silently continues with the LR that the freshly constructed scheduler initialized (typically the warmup start value) rather than the LR that training had reached before the checkpoint was saved. In practice, every job resume puts the optimizer back at the start of the LR schedule.
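One way to check whether an installed PyTorch build shows this behavior is to compare the optimizer's LR with the scheduler's restored value right after a plain SequentialLR.load_state_dict. The sketch below is an illustration only, not part of the project: the helper names _build and lr_after_plain_resume, the toy parameter, and the LinearLR/ConstantLR schedule values are all assumptions chosen for brevity.

```python
import copy

import torch
from torch.optim.lr_scheduler import ConstantLR, LinearLR, SequentialLR


def _build():
    """Toy optimizer plus a warmup-then-constant schedule (illustrative values)."""
    param = torch.nn.Parameter(torch.zeros(1))
    optimizer = torch.optim.SGD([param], lr=0.1)
    warmup = LinearLR(optimizer, start_factor=0.1, total_iters=4)
    hold = ConstantLR(optimizer, factor=1.0, total_iters=1)
    scheduler = SequentialLR(optimizer, [warmup, hold], milestones=[4])
    return optimizer, scheduler


def lr_after_plain_resume(steps=3):
    """Return (optimizer LR, scheduler's restored LR) after a simulated resume."""
    optimizer, scheduler = _build()
    for _ in range(steps):
        optimizer.step()   # the parameter has no gradient, so nothing is updated
        scheduler.step()
    # deepcopy stands in for a torch.save / torch.load round trip.
    state = copy.deepcopy(scheduler.state_dict())

    fresh_optimizer, fresh_scheduler = _build()
    fresh_scheduler.load_state_dict(state)
    return fresh_optimizer.param_groups[0]["lr"], fresh_scheduler.get_last_lr()[0]


optimizer_lr, scheduler_lr = lr_after_plain_resume()
print(f"optimizer LR: {optimizer_lr:.4f}, scheduler LR: {scheduler_lr:.4f}")
```

If the two printed values differ, the installed build behaves as described above; if they match, the upstream behavior has presumably changed.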

Fix:

After the parent load_state_dict finishes, copy _last_lr into the optimizer’s param_groups so the next optimizer.step() uses the correct restored learning rate.
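The fix described above can be sketched in a few lines. This is a hedged re-creation based only on the description here, not the project's actual source; the class name SketchResumableSequentialLR is hypothetical, and the sketch assumes _last_lr holds one value per entry in optimizer.param_groups, in the same order.

```python
from torch.optim.lr_scheduler import SequentialLR


class SketchResumableSequentialLR(SequentialLR):
    """Hypothetical re-creation of the documented fix (not the project's code)."""

    def load_state_dict(self, state_dict):
        # Let the parent restore last_epoch, _last_lr and the sub-scheduler states.
        super().load_state_dict(state_dict)
        # Then push the restored LRs into the optimizer, one per param group.
        for group, lr in zip(self.optimizer.param_groups, self._last_lr):
            group["lr"] = lr
```

Overriding only load_state_dict keeps the rest of the scheduler's behavior identical to SequentialLR, so the subclass can be swapped in without touching the training loop.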

See tests/test_checkpoint_resume.py::TestResumableSequentialLR for round-trip verification and a sentinel test that confirms the upstream bug still exists.

Parameters:
  • optimizer (Optimizer)

  • schedulers (list[LRScheduler])

  • milestones (list[int])

  • last_epoch (int)

load_state_dict(state_dict)

Load scheduler state and propagate restored LRs to the optimizer.

Calls the parent SequentialLR.load_state_dict, then immediately copies each value in self._last_lr into the corresponding optimizer.param_groups[i]["lr"]. This ensures that the first optimizer.step() after a resume uses the learning rate that was active when the checkpoint was saved, rather than the freshly initialized (warmup start) value.

Parameters:

state_dict (dict) – Scheduler state dictionary as produced by state_dict(). Typically loaded from a checkpoint with torch.load and passed directly to this method.
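A resume flow using this method might look like the sketch below. Because the import path of ResumableSequentialLR is not given here, the example defines a stand-in named ResumableSequentialLRSketch (hypothetical) that follows the behavior described above. The toy model, the schedule values, the "scheduler" checkpoint key, and the in-memory buffer used instead of a checkpoint file are likewise assumptions.

```python
import io

import torch
from torch.optim.lr_scheduler import ConstantLR, LinearLR, SequentialLR


class ResumableSequentialLRSketch(SequentialLR):
    """Stand-in for the documented class; its real import path is not shown here."""

    def load_state_dict(self, state_dict):
        super().load_state_dict(state_dict)
        for group, lr in zip(self.optimizer.param_groups, self._last_lr):
            group["lr"] = lr


def build(model):
    """Create the optimizer and a warmup-then-constant schedule (toy values)."""
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    warmup = LinearLR(optimizer, start_factor=0.1, total_iters=4)
    hold = ConstantLR(optimizer, factor=1.0, total_iters=1)
    scheduler = ResumableSequentialLRSketch(optimizer, [warmup, hold], milestones=[4])
    return optimizer, scheduler


# First job: train a few steps, then checkpoint the scheduler state.
model = torch.nn.Linear(2, 1)
optimizer, scheduler = build(model)
for _ in range(3):
    optimizer.zero_grad()
    model(torch.ones(1, 2)).sum().backward()
    optimizer.step()
    scheduler.step()
lr_at_save = scheduler.get_last_lr()[0]

checkpoint = io.BytesIO()  # in-memory stand-in for a checkpoint file
torch.save({"scheduler": scheduler.state_dict()}, checkpoint)

# Resumed job: rebuild everything, then load the saved scheduler state.
checkpoint.seek(0)
state = torch.load(checkpoint)
resumed_optimizer, resumed_scheduler = build(torch.nn.Linear(2, 1))
resumed_scheduler.load_state_dict(state["scheduler"])

# The optimizer now carries the LR that was active at save time.
assert resumed_optimizer.param_groups[0]["lr"] == lr_at_save
```

A real checkpoint would normally also carry the model and optimizer state; they are left out here to keep the focus on the scheduler.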

Return type:

None