We are sample inefficient. Well, "we" are not, but our models are. What are we missing: the use of non-text grounding? Architecture? Curriculum? Join the babyLM challenge and pretrain with 100M tokens: babylm.github.io
Want to use the ~3M chats currently in the open? Or perhaps give back? Two clicks and you share your chats: sharelm.github.io. More on open(!) feedback soon.
One thing I am especially fascinated by recently is feedback: models that improve through interaction. We recently showed that chats already contain feedback and we can mine it (170K instances): huggingface.co/datasets/sha...
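To give the flavor, here is a toy sketch of feedback mining with entirely hypothetical heuristics; the real extraction pipeline is the one described with the dataset, not this:

```python
# Toy sketch: mine naive feedback signals from chat logs.
# The pattern lists below are illustrative assumptions, not the actual method.

POSITIVE = ("thanks", "perfect", "exactly what i", "works now")
NEGATIVE = ("that's wrong", "doesn't work", "not what i asked", "try again")

def mine_feedback(chat):
    """chat: list of {"role": ..., "content": ...} turns.
    Yields (model_reply, user_reaction, label) triples."""
    for prev, turn in zip(chat, chat[1:]):
        # Look for a user message that reacts to the assistant's reply.
        if prev["role"] != "assistant" or turn["role"] != "user":
            continue
        text = turn["content"].lower()
        if any(p in text for p in POSITIVE):
            yield prev["content"], turn["content"], "positive"
        elif any(n in text for n in NEGATIVE):
            yield prev["content"], turn["content"], "negative"
```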
If models are aligned, we can combine them; you probably know TIES. Coming soon: an approach to deal with differences between LoRAs in merging. Until then, do you have ideas on how to recycle models? Select them? Merge them? Join the NeurIPS competition: llm-merging.github.io
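For reference, a minimal sketch of TIES-style merging (trim low-magnitude changes, elect a majority sign per parameter, average only the agreeing entries). A rough reimplementation, not the reference code; `density=0.2` is an illustrative choice:

```python
import torch

def ties_merge(task_vectors, density=0.2):
    """Sketch of TIES-Merging over a list of task vectors
    (finetuned weights minus base weights), one tensor each."""
    trimmed = []
    for tv in task_vectors:
        # 1) Trim: keep only the top `density` fraction by magnitude.
        k = max(1, int(tv.numel() * density))
        thresh = tv.abs().flatten().kthvalue(tv.numel() - k + 1).values
        trimmed.append(torch.where(tv.abs() >= thresh, tv, torch.zeros_like(tv)))
    stacked = torch.stack(trimmed)               # (n_models, ...)
    # 2) Elect: majority sign per parameter across models.
    elected = torch.sign(stacked.sum(dim=0))
    # 3) Disjoint merge: mean over entries whose sign agrees with the elected one.
    agrees = torch.sign(stacked) == elected
    counts = agrees.sum(dim=0).clamp(min=1)
    return (stacked * agrees).sum(dim=0) / counts
```

Merging is then `base + ties_merge([ft1 - base, ft2 - base])`, applied per weight tensor.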
You got this far? Great, share some of your thoughts and questions, or like 🥰
Following up, we show that LoRAs are not parameter efficient. Take a LoRA ➡️ throw away 80% of the parameters ➡️ make it binary ➡️ improve the result 🤯 arxiv.org/abs/2311.13171 github.com/zipnn/zipnn
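In that spirit, a hedged sketch of the compression step, assuming magnitude-based sparsification and a single per-tensor scale; the paper's exact recipe may differ:

```python
import torch

def compress_delta(delta, density=0.2):
    """Sketch: keep the top `density` fraction of entries by magnitude,
    then replace each survivor with its sign times one shared scale."""
    k = max(1, int(delta.numel() * density))
    thresh = delta.abs().flatten().kthvalue(delta.numel() - k + 1).values
    mask = delta.abs() >= thresh                 # drop ~80% of entries
    scale = delta[mask].abs().mean()             # one scalar per tensor (assumption)
    return torch.sign(delta) * mask * scale      # sparse, binary (± scale) update
```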
Can we better understand LoRAs? Apparently you don't need to train A (but you do need B): arxiv.org/abs/2402.16842. We also compress Lots of LoRAs (LoL 😅) and show you can serve a thousand of them at a fraction of the cost, thanks to their weight similarities.
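The asymmetry is easy to picture in code. A minimal sketch where A stays at its random init and only B trains; the class name and hyperparameters are mine, not the paper's:

```python
import torch
import torch.nn as nn

class LoRALinearFrozenA(nn.Module):
    """LoRA layer where A is a frozen random projection and only B is trained."""
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base.requires_grad_(False)     # frozen pretrained weight
        self.A = nn.Parameter(torch.randn(r, base.in_features) / r**0.5,
                              requires_grad=False)  # frozen random A
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # trained B
        self.scale = alpha / r

    def forward(self, x):
        # Low-rank update: x -> A -> B, added to the base output.
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T
```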
Why evaluate on huge datasets when a fast check would get you most of the way? arxiv.org/abs/2402.14992 arxiv.org/abs/2308.11696 e.g. (recently) evaluating on multiple prompts
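As a flavor of the idea, here is a hypothetical quick check (not the methods from the papers above): score a small random sample under several prompt templates, rather than the full benchmark under one.

```python
import random

def quick_eval(model_fn, dataset, prompt_templates, n=100, seed=0):
    """Cheap sanity check: accuracy on a small sample, per prompt template.
    `model_fn` maps a prompt string to a predicted answer (assumed interface)."""
    rng = random.Random(seed)
    sample = rng.sample(dataset, min(n, len(dataset)))
    scores = {}
    for template in prompt_templates:
        correct = sum(model_fn(template.format(**ex)) == ex["answer"]
                      for ex in sample)
        scores[template] = correct / len(sample)
    # Mean and spread across templates give a rough robustness read at low cost.
    return scores
```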