Caliper Bench Leaderboard

A creative-writing benchmark for LLMs · prose craft, style, willingness
35 models · generated 2026-05-25 23:05 UTC
Creative Writing · Role Playing
↑ higher is better · ↓ lower is better
C0–C3: refusal rate · C2/C4: engagement rate · EngD: harm density on engaged refusable runs