We are sample inefficient. Well, "we" are not, but our models are. What are we missing: the use of non-text grounding? Architecture? Curriculum? Join the babyLM challenge and pretrain with 100M tokens: babylm.github.io
Want to use the ~3M chats currently in the open? Or perhaps give back? Two clicks and you share your chats: sharelm.github.io. More on open(!) feedback soon.
One thing I am especially fascinated by recently is feedback: models that improve through interaction. We recently showed that chats already contain feedback and we can mine it (170K instances): huggingface.co/datasets/sha...
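To give the flavor, here is a toy sketch of feedback mining with entirely hypothetical heuristics; the real extraction pipeline is the one described with the dataset, not this:

```python
# Toy sketch: mine naive feedback signals from chat logs.
# The pattern lists below are illustrative assumptions, not the actual method.

POSITIVE = ("thanks", "perfect", "exactly what i", "works now")
NEGATIVE = ("that's wrong", "doesn't work", "not what i asked", "try again")

def mine_feedback(chat):
    """chat: list of {"role": ..., "content": ...} turns.
    Yields (model_reply, user_reaction, label) triples."""
    for prev, turn in zip(chat, chat[1:]):
        # Look for a user message that reacts to the assistant's reply.
        if prev["role"] != "assistant" or turn["role"] != "user":
            continue
        text = turn["content"].lower()
        if any(p in text for p in POSITIVE):
            yield prev["content"], turn["content"], "positive"
        elif any(n in text for n in NEGATIVE):
            yield prev["content"], turn["content"], "negative"
```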
If models are aligned, we can combine them; you probably know TIES. Coming soon: an approach to deal with differences between LoRAs in merging. Until then, do you have ideas on how to recycle models? Select them? Merge them? Join the NeurIPS competition: llm-merging.github.io
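For reference, a minimal sketch of TIES-style merging (trim low-magnitude changes, elect a majority sign per parameter, average only the agreeing entries). A rough reimplementation, not the reference code; `density=0.2` is an illustrative choice:

```python
import torch

def ties_merge(task_vectors, density=0.2):
    """Sketch of TIES-Merging over a list of task vectors
    (finetuned weights minus base weights), one tensor each."""
    trimmed = []
    for tv in task_vectors:
        # 1) Trim: keep only the top `density` fraction by magnitude.
        k = max(1, int(tv.numel() * density))
        thresh = tv.abs().flatten().kthvalue(tv.numel() - k + 1).values
        trimmed.append(torch.where(tv.abs() >= thresh, tv, torch.zeros_like(tv)))
    stacked = torch.stack(trimmed)               # (n_models, ...)
    # 2) Elect: majority sign per parameter across models.
    elected = torch.sign(stacked.sum(dim=0))
    # 3) Disjoint merge: mean over entries whose sign agrees with the elected one.
    agrees = torch.sign(stacked) == elected
    counts = agrees.sum(dim=0).clamp(min=1)
    return (stacked * agrees).sum(dim=0) / counts
```

Merging is then `base + ties_merge([ft1 - base, ft2 - base])`, applied per weight tensor.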
You got this far? Great, share some of your thoughts and questions, or like 🥰
Following up, we show that LoRAs are not parameter efficient. Take a LoRA ➡️ throw away 80% of the parameters ➡️ make it binary ➡️ improve the result 🤯 arxiv.org/abs/2311.13171 github.com/zipnn/zipnn
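In that spirit, a hedged sketch of the compression step, assuming magnitude-based sparsification and a single per-tensor scale; the paper's exact recipe may differ:

```python
import torch

def compress_delta(delta, density=0.2):
    """Sketch: keep the top `density` fraction of entries by magnitude,
    then replace each survivor with its sign times one shared scale."""
    k = max(1, int(delta.numel() * density))
    thresh = delta.abs().flatten().kthvalue(delta.numel() - k + 1).values
    mask = delta.abs() >= thresh                 # drop ~80% of entries
    scale = delta[mask].abs().mean()             # one scalar per tensor (assumption)
    return torch.sign(delta) * mask * scale      # sparse, binary (± scale) update
```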
Can we better understand LoRAs? Apparently you don't need to train A (but you do need B): arxiv.org/abs/2402.16842. We also compress Lots of LoRAs (LoL 😅) and show you can serve a thousand of them at a fraction of the cost, thanks to their weight similarities.
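The asymmetry is easy to picture in code. A minimal sketch where A stays at its random init and only B trains; the class name and hyperparameters are mine, not the paper's:

```python
import torch
import torch.nn as nn

class LoRALinearFrozenA(nn.Module):
    """LoRA layer where A is a frozen random projection and only B is trained."""
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base.requires_grad_(False)     # frozen pretrained weight
        self.A = nn.Parameter(torch.randn(r, base.in_features) / r**0.5,
                              requires_grad=False)  # frozen random A
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # trained B
        self.scale = alpha / r

    def forward(self, x):
        # Low-rank update: x -> A -> B, added to the base output.
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T
```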
Why evaluate on huge datasets when a fast check would get you most of the way? arxiv.org/abs/2402.14992 arxiv.org/abs/2308.11696 e.g. (recently) evaluating on multiple prompts
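As a flavor of the idea, here is a hypothetical quick check (not the methods from the papers above): score a small random sample under several prompt templates, rather than the full benchmark under one.

```python
import random

def quick_eval(model_fn, dataset, prompt_templates, n=100, seed=0):
    """Cheap sanity check: accuracy on a small sample, per prompt template.
    `model_fn` maps a prompt string to a predicted answer (assumed interface)."""
    rng = random.Random(seed)
    sample = rng.sample(dataset, min(n, len(dataset)))
    scores = {}
    for template in prompt_templates:
        correct = sum(model_fn(template.format(**ex)) == ex["answer"]
                      for ex in sample)
        scores[template] = correct / len(sample)
    # Mean and spread across templates give a rough robustness read at low cost.
    return scores
```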