Building better AI benchmarks: How many raters are enough?

Algorithms & Theory
