Leshem Choshen
@lchoshen.bsky.social
🥇 #NLProc researcher 🥈 Opinionatedly Summarizing #ML & #NLP papers 🥉 Good science #scientivism
148 followers, 107 following, 317 posts
lchoshen.bsky.social

No need to manually gather and compare benchmark data! BenchBench provides a centralized platform with a curated database and standardized methodology for effortless benchmark agreement testing. You can also use them with our package here: github.com/IBM/BenchBench

GitHub - IBM/benchbench: A package dedicated for running benchmark agreement testing

A package dedicated for running benchmark agreement testing - IBM/benchbench
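
To make "benchmark agreement testing" concrete: one common way to quantify agreement is the rank correlation between the scores two benchmarks assign to the same models. The sketch below is a minimal, dependency-free illustration under that assumption. It does not use the BenchBench API, and the function name, model names, and scores are all made up for the example.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two benchmarks over the models they share.

    Both arguments map a model name to its score on one benchmark.
    Returns a value in [-1, 1]; 1 means the two benchmarks rank the
    shared models identically.
    """
    shared = sorted(set(scores_a) & set(scores_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared models")

    concordant = discordant = 0
    for m1, m2 in combinations(shared, 2):
        # Positive product: both benchmarks order this pair the same way.
        product = (scores_a[m1] - scores_a[m2]) * (scores_b[m1] - scores_b[m2])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1

    n_pairs = len(shared) * (len(shared) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical scores, for illustration only (not real benchmark results).
bench_a = {"model_w": 0.81, "model_x": 0.74, "model_y": 0.66, "model_z": 0.52}
bench_b = {"model_w": 71.0, "model_x": 73.5, "model_y": 60.2, "model_z": 48.9}

tau = kendall_tau(bench_a, bench_b)
print(f"Kendall tau between the two benchmarks: {tau:.3f}")
```

Only the models present in both benchmarks are compared, so the result depends on which models are chosen. That choice is one source of the variance a standardized procedure is meant to reduce.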

lchoshen.bsky.social

Currently, benchmark comparisons are often ad hoc and inconsistent, making results untrustworthy and benchmark choice 🤮 BenchBench & our findings: www.alphaxiv.org/abs/2407.13696 offer standard and transparent comparisons to reduce variance and increase confidence in your evaluations! 🎉

lchoshen.bsky.social

People claim to only trust Chatbot Arena! 🤖 This is impossible though, cheaper datasets give the same scores... So, what if you don't have an army of annotators at your disposal? 🤔 www.alphaxiv.org/pdf/2407.13696 huggingface.co/spaces/ibm/b... #ML #evaluation #llm #llms #machinelearning #data #data

BenchBench Leaderboad - a Hugging Face Space by ibm

Discover amazing ML apps made by the community

lchoshen.bsky.social

Yes, we totally agree here (well you put it extremely, but in essence)

lchoshen.bsky.social

The feedback from all models will be open and collected in one pool, helping not only the specialized models created but also future research and general improvement.

lchoshen.bsky.social

We believe a successful ecosystem must center around feedback loops where anyone can spin up a community model, for storytelling, Bengali, or anything else. Others can use it, give feedback, and benefit from a model that keeps improving with the contributions.

lchoshen.bsky.social

Then, we home in on 6 crucial areas for developing open human feedback ecosystems: incentives to contribute, reducing contribution effort, getting expert and diverse feedback, ongoing dynamic feedback, privacy and legal issues.

lchoshen.bsky.social

In our paper, we first learn from peer production efforts like wiki and Stack Overflow. These case studies show how important it is to align the incentives of the different bodies involved, let the community dictate the policies, etc.

lchoshen.bsky.social

We define 5 axes of openness:
Methodology (how it's collected)
Access (who can use it)
Models (one/many)
Contributors (as diverse as its uses?)
Time (does it keep updating? Closed models improve over several feedback iterations, and of course, models change)
Is current feedback open? 🥶
