Leshem Choshen
@lchoshen.bsky.social
148 followers, 107 following, 317 posts
Why evaluate on huge datasets when a fast check would get you most of the way? arxiv.org/abs/2402.14992 arxiv.org/abs/2308.11696 E.g. (recent) evaluate on multiple prompts