2025-01-04 04:27
I have a creeping intuition that residual connections, which carry the signal through the network without degradation or impediment, are somehow holding back modern large transformer architectures. ResNet was a breakthrough, but I wonder if there's another approach that encourages better internal representations and more specialization.
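To make the "no degradation" point concrete, here is a sketch of the standard unrolling of a residual stack. It assumes each block computes x_{l+1} = x_l + F_l(x_l), where F_l is the attention or MLP sublayer with any normalization folded inside it (as in pre-LN transformers; post-LN breaks this clean form). Every layer's output is simply added to the stream, and the Jacobian from any layer to the top always contains an untouched identity term:

```latex
x_L = x_0 + \sum_{l=0}^{L-1} F_l(x_l)
\qquad
\frac{\partial x_L}{\partial x_l} = I + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} F_i(x_i)
```

That identity term is what lets the signal and its gradients pass through unimpeded, and it is also why every layer reads from and writes to the same shared stream.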