Jack Zhang (JHU) “Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements”
Abstract: The current paradigm for safety alignment of large language models (LLMs) follows a one-size-fits-all approach: the model refuses to interact with any content deemed unsafe by the model provider. This approach lacks flexibility in[…]