small_init

small_init(dim)

Dimension-dependent initializer from “Transformers without Tears” (Nguyen & Salazar, 2019).

Computes std = sqrt(2 / (5 * dim)) and returns a zero-mean normal initializer with that standard deviation. The standard deviation therefore shrinks as the layer width grows.
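To make the formula concrete, the arithmetic can be reproduced with the standard library alone. The helper name small_init_std below is hypothetical and is not part of this API; it only evaluates the expression stated above. For dim = 10, for instance, the result is sqrt(0.04) = 0.2.

```python
import math


def small_init_std(dim: int) -> float:
    """Hypothetical helper: evaluate std = sqrt(2 / (5 * dim))."""
    if dim <= 0:
        # Assumption: a layer width must be a positive integer.
        raise ValueError("dim must be a positive integer")
    return math.sqrt(2 / (5 * dim))


# Print the standard deviation for a few illustrative layer widths.
for dim in (10, 512, 4096):
    print(f"dim={dim}: std={small_init_std(dim):.6f}")
```

Wider layers get a smaller standard deviation, since std scales as 1 / sqrt(dim).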

Parameters:

dim (int) – Layer width used to compute the standard deviation.

Returns:

A callable fn(tensor) -> tensor that initializes the tensor in-place with normal_(mean=0, std=sqrt(2 / (5 * dim))).

Return type:

Callable[[Tensor], Tensor]
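The following is a minimal sketch of the contract described above, assuming a PyTorch environment. It is not the library's implementation: small_init_sketch is a hypothetical stand-in written only to show how a returned callable of this shape might be applied to a layer's weight.

```python
import math

import torch
from torch import nn


def small_init_sketch(dim: int):
    """Hypothetical stand-in mirroring the documented behavior of small_init."""
    std = math.sqrt(2 / (5 * dim))

    def init_(tensor: torch.Tensor) -> torch.Tensor:
        # Fill the tensor in place from N(0, std^2) and return it.
        return nn.init.normal_(tensor, mean=0.0, std=std)

    return init_


dim = 64
layer = nn.Linear(dim, dim)

# Build the initializer once, then apply it to the weight matrix in place.
init_fn = small_init_sketch(dim)
init_fn(layer.weight)

print("target std:", math.sqrt(2 / (5 * dim)))
print("sample std:", layer.weight.std().item())
```

Because the callable works in place and also returns the tensor, it can be used either as a statement, as here, or inside a larger expression.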