small_init

small_init(dim)

Dim-dependent initializer from “Transformers without Tears” (Nguyen & Salazar, 2019). Computes std = sqrt(2 / (5 * dim)) and returns a normal initializer with that standard deviation.

Parameters:
    dim (int) – Layer width used to compute the standard deviation.

Returns:
    A callable fn(tensor) -> tensor that initializes the tensor in-place with normal_(mean=0, std=sqrt(2 / (5 * dim))).

Return type:
    Callable[[Tensor], Tensor]
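
Example:

The following is a minimal sketch of the behavior described above, not the library's actual source. The name small_init_sketch is hypothetical, and the width of 512 and the use of nn.Linear are arbitrary choices for illustration. It builds the initializer from the stated formula, applies it to a weight matrix, and compares the empirical standard deviation with the target.

```python
import math

import torch
from torch import nn


def small_init_sketch(dim):
    """Hypothetical stand-in for small_init, written from the documented formula."""
    std = math.sqrt(2.0 / (5.0 * dim))

    def init_(tensor):
        # nn.init.normal_ fills the tensor in-place (under no_grad) and returns it.
        return nn.init.normal_(tensor, mean=0.0, std=std)

    return init_


dim = 512  # assumed layer width, chosen only for this example
init_fn = small_init_sketch(dim)

layer = nn.Linear(dim, dim)
result = init_fn(layer.weight)  # layer.weight is modified in-place

expected_std = math.sqrt(2.0 / (5.0 * dim))
print(f"target std:    {expected_std:.4f}")
print(f"empirical std: {layer.weight.std().item():.4f}")
```

Because the standard deviation scales as 1 / sqrt(dim), wider layers receive proportionally smaller initial weights. The returned callable only needs the tensor, so it can be passed wherever an initializer of the form fn(tensor) is expected.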