LLM Security
LLM security aims to protect LLMs from vulnerabilities and misuse.
Types of attacks:
- Jailbreaks: Circumventing alignment, e.g. sending a harmful prompt in base64
- Prompt injection: Embedding harmful prompt in unexpected places
- Data poisoning/Backdoor: Poisoning (finetuning) dataset, e.g. with a code word
- Adversarial inputs
- Insecure output handling
- Data extraction & privacy
- Data reconstruction
- Denial of service
- Escalation
- Watermarking & evasion
- Model theft