LC
Leshem Choshen
@lchoshen.bsky.social
148 followers107 following317 posts
Currently, benchmark comparisons are often ad hoc and inconsistent, making results untrustworthy and benchmark choice 🤮 BenchBench & our findings (www.alphaxiv.org/abs/2407.13696) offer standard, transparent comparisons that reduce variance and increase confidence in your evaluations!🎉
No need to gather and compare benchmark data by hand! BenchBench provides a centralized platform with a curated database and a standardized methodology for effortless benchmark agreement testing. You can also run these tests yourself with our package: github.com/IBM/BenchBench
GitHub - IBM/benchbench: A package dedicated for running benchmark agreement testing
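The post doesn't spell out how agreement is computed. As a minimal sketch, assuming "benchmark agreement" means rank correlation (here Kendall's tau) between two benchmarks' scores on the models they share, the core comparison could look like this. This is not the BenchBench package API; the function name `kendall_tau`, the benchmark dicts, and the model names and scores are all hypothetical.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two benchmarks' scores on shared models.

    Hypothetical helper, not part of BenchBench.
    scores_a, scores_b: dicts mapping model name -> score.
    Only models present in both benchmarks are compared.
    """
    shared = sorted(set(scores_a) & set(scores_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared models")
    concordant = discordant = 0
    for m1, m2 in combinations(shared, 2):
        # Same sign: both benchmarks order this pair of models the same way.
        product = (scores_a[m1] - scores_a[m2]) * (scores_b[m1] - scores_b[m2])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # Ties (product == 0) count toward neither.
    n_pairs = len(shared) * (len(shared) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical scores for four models on two benchmarks with different scales.
bench_x = {"model-a": 71.2, "model-b": 65.0, "model-c": 58.4, "model-d": 80.1}
bench_y = {"model-a": 0.62, "model-b": 0.60, "model-c": 0.41, "model-d": 0.55}

tau = kendall_tau(bench_x, bench_y)
print(f"Kendall tau: {tau:.3f}")
```

Here 4 of the 6 model pairs are ordered the same way and 2 are flipped, so tau is (4 - 2) / 6 ≈ 0.33. Because only rankings matter, benchmarks on different score scales can be compared directly; which models end up in `shared` can change the result, which is one source of the variance the post mentions.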