Neha Verma (JHU) – “Merging Feed-Forward Sublayers for Compressed Transformers”

When:
October 28, 2024 @ 12:00 pm – 1:15 pm
Where:
Hackerman Hall B17
3400 N. Charles St., Baltimore, MD 21218
Cost:
Free

Abstract

With the rise and ubiquity of larger deep learning models, the need for high-quality compression techniques is growing in order to deploy these models widely. The sheer parameter count of many models makes it difficult to fit them within the memory constraints of deployment hardware. In this work, we present a novel approach to model compression: merging similar parameter groups within a model, rather than pruning away less important parameters. Specifically, we propose a straightforward method for selecting, aligning, and merging separate feed-forward sublayers in Transformer models, and we test our method on language modeling, image classification, and machine translation. With our method, we demonstrate performance comparable to the original models across our three diverse tasks while combining more than a third of the model's feed-forward sublayers. For instance, we can remove over 21% of the total parameters from a Vision Transformer while maintaining 99% of its original performance. Additionally, we observe that some feed-forward sublayers exhibit regions of high similarity between their activations, which may help explain their surprising mergeability.
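To make the "align and merge" idea concrete, below is a minimal sketch of combining two Transformer feed-forward sublayers into one shared sublayer. The abstract does not specify the selection or alignment criterion, so the details here are illustrative assumptions: hidden units of the two sublayers are matched by cosine similarity of their input weights using the Hungarian algorithm, the matched parameters are averaged, and the resulting sublayer would presumably be tied across the original positions to realize the parameter savings. The function name `merge_ffn_sublayers` is hypothetical and not from the talk.

```python
# Illustrative sketch (not the authors' exact method): merge two FFN
# sublayers of the form  x -> W2 @ act(W1 @ x + b1)  by permutation-aligning
# their hidden units and averaging the aligned weights.
import torch
from scipy.optimize import linear_sum_assignment


def merge_ffn_sublayers(W1_a, b1_a, W2_a, W1_b, b1_b, W2_b):
    """Merge two feed-forward sublayers into a single averaged sublayer.

    Shapes: W1 is (hidden, d_model), b1 is (hidden,), W2 is (d_model, hidden).
    Hidden unit i corresponds to row i of W1 and column i of W2, so any
    permutation of hidden units must be applied consistently to all three.
    """
    # Assumed alignment criterion: cosine similarity between the input
    # weight vectors of sublayer A's and sublayer B's hidden units.
    sim = torch.nn.functional.normalize(W1_a, dim=1) @ \
          torch.nn.functional.normalize(W1_b, dim=1).T

    # Hungarian matching to maximize total similarity (minimize -sim).
    _, col_idx = linear_sum_assignment(-sim.detach().numpy())
    perm = torch.as_tensor(col_idx)

    # Permute sublayer B's hidden units to line up with A's, then average.
    W1_m = 0.5 * (W1_a + W1_b[perm])
    b1_m = 0.5 * (b1_a + b1_b[perm])
    W2_m = 0.5 * (W2_a + W2_b[:, perm])
    return W1_m, b1_m, W2_m


if __name__ == "__main__":
    d_model, hidden = 64, 256
    args = [torch.randn(hidden, d_model), torch.randn(hidden),
            torch.randn(d_model, hidden)] * 2
    W1_m, b1_m, W2_m = merge_ffn_sublayers(*args)
    print(W1_m.shape, b1_m.shape, W2_m.shape)
```

If the merged sublayer replaces both originals (weight tying), two sublayers' worth of FFN parameters collapse into one, which is how merging a third of the feed-forward sublayers can translate into the roughly 20% overall parameter reduction mentioned in the abstract, given that FFN blocks hold a large share of a Transformer's parameters.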

Center for Language and Speech Processing