Improving instruction hierarchy in frontier LLMs
IH-Challenge trains models to prioritize trusted instructions, improving instruction hierarchy, safety steerability, and resistance to prompt injection attacks.
What's Your Reaction?
Like
0
Dislike
0
Love
0
Funny
0
Wow
0
Sad
0
Angry
0
Comments (0)