Neha Verma (JHU) – “Merging Feed-Forward Sublayers for Compressed Transformers”
Abstract: With the rise and ubiquity of larger deep learning models, the need for high-quality compression techniques has been growing so that these models can be deployed widely. The sheer parameter count of some models makes[…]