wang_init
- wang_init(dim, num_layers)
Depth-scaled initializer (Wang et al.).
Computes std = 2 / (num_layers * sqrt(dim)) and returns a normal initializer with that standard deviation.
- Parameters:
- Returns:
A callable fn(tensor) -> tensor that initializes the tensor in-place with normal_(mean=0, std=2 / (num_layers * sqrt(dim))).
- Return type:
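
Example (illustrative only): the standard-deviation formula documented for wang_init can be checked by hand with a small sketch. The names wang_std and wang_init_sketch are hypothetical and are not part of this library; the sketch substitutes a plain Python list for the tensor and random.gauss for normal_ so that it runs without a tensor library. It is a rough model of the documented behavior, not the actual implementation.

```python
import math
import random


def wang_std(dim, num_layers):
    """Standard deviation from the documented formula: 2 / (num_layers * sqrt(dim))."""
    return 2.0 / (num_layers * math.sqrt(dim))


def wang_init_sketch(dim, num_layers):
    """Hypothetical stand-in for wang_init: returns fn(values) -> values.

    Assumption: a list of floats replaces the tensor, and random.gauss
    replaces normal_, so no tensor library is needed.
    """
    std = wang_std(dim, num_layers)

    def fn(values):
        # Overwrite every element in place with a draw from N(0, std^2).
        for i in range(len(values)):
            values[i] = random.gauss(0.0, std)
        return values

    return fn


# Worked example: dim=1024, num_layers=8.
# sqrt(1024) = 32, so std = 2 / (8 * 32) = 2 / 256.
std = wang_std(1024, 8)
print(std)

random.seed(0)
weights = [0.0] * 10000
out = wang_init_sketch(1024, 8)(weights)
```

Because num_layers appears in the denominator, doubling the depth halves the standard deviation, and quadrupling dim also halves it.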