An Interpretable N-gram Perplexity Threat Model for Large Language Model Jailbreaks

Authors: Boreiko, Valentyn; Panfilov, Alexander; Voracek, Vaclav; Hein, Matthias; Geiping, Jonas
Tübingen author(s):
Boreiko, Valentyn
Voracek, Vaclav
Hein, Matthias
Geiping, Jonas
Issue date: 2025-05-30
Publisher: arXiv
Language: English
Full text: https://doi.org/10.48550/arXiv.2410.16222
DDC classification: 004 - Data processing and computer science
Document type: Preprint