Estimating worst case frontier risks of open weight LLMs
In this paper, we study the worst-case frontier risks of releasing gpt-oss. We introduce malicious fine-tuning (MFT), where we attempt to elicit maximum capabilities by fine-tuning gpt-oss to be as capable as possible in two domains: biology and cybersecurity.
What's Your Reaction?
Like
0
Dislike
0
Love
0
Funny
0
Wow
0
Sad
0
Angry
0
Comments (0)