Leshem Choshen
@lchoshen.bsky.social
🥇 #NLProc researcher 🥈 Opinionatedly Summarizing #ML & #NLP papers 🥉 Good science #scientivism
148 followers · 107 following · 317 posts
lchoshen.bsky.social

Pruning gets worse with overparametrization. Testing their combinatorial method, Zhang & Papayan (@stats285) find that when you add (unneeded) parameters, you end up with more parameters in absolute terms for the same performance. #ICML2024
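A toy back-of-the-envelope illustration of that effect (numbers are hypothetical, not from the paper): the overparametrized net prunes down to a smaller fraction, yet keeps more parameters in absolute terms.

```python
# Hypothetical numbers, only to illustrate the effect; not taken from the paper.
small_model = 10_000_000   # parameters in the baseline network
big_model = 40_000_000     # same task, overparametrized network

kept_small = 0.30          # fraction surviving pruning at equal performance (made up)
kept_big = 0.12            # the bigger net prunes to a smaller fraction (made up)...

print(small_model * kept_small)  # 3,000,000 parameters remain
print(big_model * kept_big)      # ...but 4,800,000 remain in absolute terms
```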

lchoshen.bsky.social

We are sample inefficient. Well, "we" are not, but our models are. What are we missing: non-text grounding? Architecture? Curriculum? babylm.github.io Join the BabyLM challenge and pretrain with 100M tokens.
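A minimal sketch (my own, not from the challenge repo) of what the data constraint means in practice: cap the pretraining corpus at a 100M-token budget. The GPT-2 tokenizer here is just a placeholder choice.

```python
from transformers import AutoTokenizer  # pip install transformers

TOKEN_BUDGET = 100_000_000  # the BabyLM pretraining budget
tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder tokenizer

def take_within_budget(texts, budget=TOKEN_BUDGET):
    """Yield documents until the cumulative token count hits the budget."""
    used = 0
    for text in texts:
        n = len(tokenizer(text)["input_ids"])
        if used + n > budget:
            break
        used += n
        yield text
```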

lchoshen.bsky.social

Want to use the ~3M chats currently out in the open? Or perhaps give back: two clicks and you share your chats: sharelm.github.io More on open(!) feedback soon.

lchoshen.bsky.social

One thing I have been especially fascinated by recently is feedback: models that improve through interaction. We recently showed that chats already contain feedback and we can mine it (170K): huggingface.co/datasets/sha...

shachardon/ShareLM · Datasets at Hugging Face

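A minimal sketch of pulling the chats from the Hub (the dataset id comes from the card above; the split name is an assumption, check the dataset viewer for the actual fields):

```python
from datasets import load_dataset  # pip install datasets

ds = load_dataset("shachardon/ShareLM", split="train")  # split name is an assumption
print(ds)     # see which fields each shared conversation carries
print(ds[0])  # peek at one record
```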

lchoshen.bsky.social

If models are aligned we can combine them; you probably know TIES. Coming soon: an approach to deal with differences between LoRAs when merging. Until then, do you have ideas on how to recycle models? Select models? Merge them? Join the NeurIPS competition: llm-merging.github.io
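Not TIES itself, just the simplest possible baseline to make "combining aligned models" concrete: uniform parameter averaging of fine-tunes that share an architecture (a sketch, names are mine).

```python
import torch

def average_merge(state_dicts):
    """Uniformly average several models' weights, key by key.
    Assumes all state_dicts come from the same architecture."""
    merged = {}
    for key in state_dicts[0]:
        merged[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return merged

# usage (hypothetical): merged = average_merge([model_a.state_dict(), model_b.state_dict()])
# then: model_a.load_state_dict(merged)  # casting back to the original dtypes is omitted here
```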

lchoshen.bsky.social

You got this far? Great! Share some of your thoughts and questions, or like 🥰

lchoshen.bsky.social

Can we better understand LoRAs? Apparently you don't need to train A (but you do need B): arxiv.org/abs/2402.16842 We compress lots of LoRAs (lol 😅) and show you can serve 1,000 of them at a fraction of the cost, thanks to their weight similarities.

Asymmetry in Low-Rank Adapters of Foundation Models

Parameter-efficient fine-tuning optimizes large, pre-trained foundation models by updating a subset of parameters; in this class, Low-Rank Adaptation (LoRA) is particularly effective. Inspired by...
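A minimal sketch of what the asymmetry claim looks like in code: a LoRA-style layer Wx + BAx where A stays frozen at its random initialization and only B is trained (my own illustration, not the paper's implementation).

```python
import torch
import torch.nn as nn

class FrozenALoRALinear(nn.Module):
    """Linear layer with a LoRA update B @ A where only B is trained."""
    def __init__(self, in_dim, out_dim, rank=8):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim)  # stands in for the frozen pretrained weight
        for p in self.base.parameters():
            p.requires_grad_(False)
        # A: random projection, never updated (the asymmetry claim)
        self.A = nn.Parameter(torch.randn(rank, in_dim) / in_dim ** 0.5, requires_grad=False)
        # B: initialized to zero, the only trained part of the adapter
        self.B = nn.Parameter(torch.zeros(out_dim, rank))

    def forward(self, x):
        return self.base(x) + x @ self.A.T @ self.B.T
```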

lchoshen.bsky.social

Why evaluate on huge datasets when a fast check would get you most of the way? arxiv.org/abs/2402.14992 arxiv.org/abs/2308.11696 e.g. (recently) evaluating on multiple prompts
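Not the papers' estimators, just the underlying intuition: score on a small random subset instead of the full benchmark and accept a little variance (function and field names here are mine).

```python
import random

def cheap_accuracy_estimate(examples, predict_fn, n=100, seed=0):
    """Estimate accuracy from n randomly sampled examples instead of the full set."""
    random.seed(seed)
    sample = random.sample(examples, min(n, len(examples)))
    correct = sum(predict_fn(ex["input"]) == ex["label"] for ex in sample)
    return correct / len(sample)
```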

lchoshen.bsky.social

ICML FOMO? I'll share papers from here. At #ICML2024? Talk to me, e.g. about: tinyBenchmarks 🐭, LoRA weight characteristics (asymmetry) ☯️, model merging ♻️, open human feedback 🗣️, BabyLM 👼. Details (or highlights of recent research): 🤖
