
Stella Biderman

@stellaathena.bsky.social

I study large language models

193 followers · 11 following · 10 posts

stellaathena.bsky.social · Dec 8, 2023 6:51pm

100%. It's also noteworthy that the decision not to release any details about the training data or model architecture allows them to avoid citing the work in that space, which has been disproportionately done by non-profit and academic researchers.

mmitchell.bsky.social · Dec 7, 2023 5:55pm

The recent Google "Gemini" work doesn't cite model cards or datasheets where they are clearly relevant. Allow me to take this moment to talk about how to fix patterns of exclusion in tech, which disproportionately affect (e.g.) women: 1/

stellaathena.bsky.social · Nov 8, 2023 12:31am

Apropos of "Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models," this meme has been making the rounds in the EleutherAI Discord. arxiv.org/abs/2311.00871

stellaathena.bsky.social · Oct 21, 2023 1:30am

It's really wild when people say stuff like "academia doesn't matter any more, only the big labs with the most money do." Recent inventions by non-profit researchers have brought massive improvements in large-scale models:
- ALiBi
- Scaled RoPE
- FlashAttention
- Parallel attention and MLP layers

stellaathena.bsky.social · Sep 29, 2023 2:28pm

This is your daily reminder that only three orgs have ever trained an LLM and released the model and full data: EleutherAI, BigScience (non-OS license), and Together Computer. Small orgs like these make science possible in the face of industry power.
