drop_path

drop_path(x, drop_prob, training)

Apply per-sample stochastic depth (functional form).

During training, each sample in the batch is independently kept with probability 1 - drop_prob or dropped (zeroed) with probability drop_prob. Kept samples are rescaled by 1 / (1 - drop_prob) so that the expected value of the output matches the input. At inference time the function is the identity.
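The rescaling described above can be checked with plain arithmetic: the expected output is (1 - drop_prob) * x / (1 - drop_prob) + drop_prob * 0 = x. The snippet below is a small simulation of that identity for a single scalar, using only the standard library. It is an illustration rather than the library's code, and all variable names in it are made up for this example.

```python
import random

drop_prob = 0.2
keep_prob = 1.0 - drop_prob
x = 1.0  # a single scalar activation, chosen for illustration

rng = random.Random(0)  # fixed seed so the run is repeatable
n_trials = 100_000

total = 0.0
for _ in range(n_trials):
    kept = rng.random() < keep_prob
    # Kept draws are scaled up by 1 / keep_prob; dropped draws contribute 0.
    total += x / keep_prob if kept else 0.0

# The average over many draws should be close to the original value x.
empirical_mean = total / n_trials
print(empirical_mean)
```

Without the 1 / (1 - drop_prob) factor, the average would instead approach keep_prob * x, which is why the rescaling is needed if inference uses the unscaled identity.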

Parameters:
  • x (Tensor) – Input tensor of shape [B, *], where B is the batch size and * is any number of trailing dimensions. The drop mask has shape (B, 1, …, 1) and broadcasts over all non-batch dimensions.

  • drop_prob (float) – Probability of dropping a sample’s contribution. 0.0 disables dropping; 1.0 zeros every sample. The latter is safe: the implementation guards against dividing by keep_prob (1 - drop_prob) when it is zero, so no inf/NaN is produced.

  • training (bool) – Whether the model is currently in training mode. Pass False to disable dropping. When the function is called from inside a module, this is typically self.training, which model.eval() sets to False.
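The parameter descriptions above can be summarized in a minimal PyTorch sketch. This is not the library's source: drop_path_sketch is a hypothetical stand-in written for this example, it assumes a floating-point input, and the default argument values are assumptions as well. It shows one way to build the (B, 1, …, 1) mask and to avoid dividing by zero when drop_prob is 1.0.

```python
import torch


def drop_path_sketch(x: torch.Tensor, drop_prob: float = 0.0,
                     training: bool = False) -> torch.Tensor:
    """Hypothetical stand-in for drop_path; assumes a floating-point x."""
    # Identity at inference time or when dropping is disabled.
    if not training or drop_prob == 0.0:
        return x
    keep_prob = 1.0 - drop_prob
    # One Bernoulli draw per sample, shaped (B, 1, ..., 1) so that it
    # broadcasts over every non-batch dimension.
    mask_shape = (x.shape[0],) + (1,) * (x.ndim - 1)
    mask = x.new_empty(mask_shape).bernoulli_(keep_prob)
    # Rescale kept samples. The division is skipped when keep_prob is zero,
    # so drop_prob = 1.0 yields zeros rather than inf/NaN.
    if keep_prob > 0.0:
        mask.div_(keep_prob)
    return x * mask


x = torch.ones(4, 3, 8, 8)
out = drop_path_sketch(x, drop_prob=0.5, training=True)
# Each sample in `out` is either all zeros or all 2.0 (= 1 / 0.5).
all_dropped = drop_path_sketch(x, drop_prob=1.0, training=True)
unchanged = drop_path_sketch(x, drop_prob=0.5, training=False)
```

Drawing a single value per sample, rather than per element, is what distinguishes this from ordinary dropout: a sample's whole contribution is either kept or removed.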

Returns:

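Tensor with the same shape and dtype as x. During training, each sample is zeroed with probability drop_prob (about drop_prob * B samples per batch on average) and the remaining samples are rescaled by 1 / (1 - drop_prob). During inference, x is returned unchanged.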

Return type:

torch.Tensor