Leshem Choshen
@lchoshen.bsky.social
🥇 #NLProc researcher 🥈 Opinionatedly Summarizing #ML & #NLP papers 🥉 Good science #scientivism
148 followers, 107 following, 317 posts
lchoshen.bsky.social

It's not a lower bound, because if you are willing to spend enough compute you can do better than SGD for training models; it's inefficient but possible. The paper claims what the layers do is very similar to Newton's method (based on 2nd-order information, i.e. the gradient of the gradient).
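For intuition only (my toy sketch, not the paper's setup): on a quadratic least-squares problem, plain gradient descent needs many steps, while a single Newton step, which rescales the gradient by the inverse Hessian, lands on the exact solution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task: recover w_true from (x, y) pairs.
d, n = 5, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true

def grad(w):
    # Gradient of the quadratic loss L(w) = 0.5 * ||X @ w - y||^2 / n
    return X.T @ (X @ w - y) / n

def hess():
    # Hessian of the same loss (constant, since L is quadratic)
    return X.T @ X / n

# First-order: many small gradient steps (full-batch GD standing in for SGD).
w_gd = np.zeros(d)
for _ in range(100):
    w_gd -= 0.1 * grad(w_gd)

# Second-order: one Newton step, i.e. the gradient rescaled by the inverse Hessian.
# On a quadratic loss this lands on the exact minimizer immediately.
w_newton = np.zeros(d) - np.linalg.solve(hess(), grad(np.zeros(d)))

print("GD error after 100 steps :", np.linalg.norm(w_gd - w_true))
print("Newton error after 1 step:", np.linalg.norm(w_newton - w_true))
```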

lchoshen.bsky.social

So, why do we care? Because we don't know any algorithm that computes only the gradient and converges that fast. Transformers learn, across their layers, an algorithm that is better. The paper has more theoretical and empirical measures comparing it to Newton's method, but that's the gist, IMO.

lchoshen.bsky.social

What they see is that if you treat each layer as comparable to a step of an algorithm, the layers succeed much faster than SGD. Note that the curves even bend downward, not upward like SGD's (superlinearity to the rescue 🦸‍♀️).
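Roughly, that downward bend is the difference between linear and quadratic convergence; these are the standard textbook rates, not the paper's exact bounds:

```latex
% Linear convergence (first-order, SGD-like): a fixed fraction of the error
% remains after each step, a straight line on a log-scale plot.
\|w_{k+1} - w^\ast\| \;\le\; \rho \,\|w_k - w^\ast\|, \qquad 0 < \rho < 1
% Quadratic (superlinear) convergence (Newton-like, near the optimum):
% the error is squared each step, so the log-scale curve bends downward.
\|w_{k+1} - w^\ast\| \;\le\; C \,\|w_k - w^\ast\|^2
```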

lchoshen.bsky.social

So how do you compare transformers to classic algorithms? You train them to do ICL on simple, closed problems that we understand and know how to solve in other ways (e.g., SGD, or second-order methods like Newton's iterations). For example, computing a linear function from the examples seen.
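A minimal sketch of that setup (my illustration, not the paper's code): sample a hidden linear function, build a prompt of (x, y) example pairs plus a query x, and score predictions against the closed-form least-squares answer.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_examples = 5, 10  # toy sizes, chosen only for illustration

def sample_icl_task():
    # One in-context task: a hidden linear function plus labeled example pairs.
    w = rng.normal(size=d)
    xs = rng.normal(size=(n_examples, d))
    ys = xs @ w
    x_query = rng.normal(size=d)
    return xs, ys, x_query, x_query @ w  # examples, labels, query, target

def least_squares_baseline(xs, ys, x_query):
    # The classic, closed-form answer the transformer's prediction is scored against.
    w_hat, *_ = np.linalg.lstsq(xs, ys, rcond=None)
    return x_query @ w_hat

xs, ys, x_query, y_target = sample_icl_task()
print("baseline prediction:", least_squares_baseline(xs, ys, x_query))
print("target value:       ", y_target)
```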

lchoshen.bsky.social

In alphaxiv.org/pdf/2310.17086 they discuss what is internally learned when you give a model examples and it manages to figure out how to act upon them, also known as ICL.

lchoshen.bsky.social

It was claimed that in-context learning (ICL) is doing SGD inside the transformer layers. A new finding shows this is not possible: they must be doing something BETTER. In fact, exponentially better than SGD, so second-order methods? 🧵 🤖
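For reference, in standard textbook form (not the paper's notation), the two candidate inner algorithms differ only in how the gradient step is scaled:

```latex
% First-order (SGD-like) update: step along the raw gradient.
w_{k+1} = w_k - \eta \,\nabla L(w_k)
% Second-order (Newton-like) update: precondition the gradient with the inverse Hessian.
w_{k+1} = w_k - \big(\nabla^2 L(w_k)\big)^{-1} \nabla L(w_k)
```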

lchoshen.bsky.social

I wish; these days there are many days when I don't get to read it, especially during busy times.

lchoshen.bsky.social

We are all used by now to "delving" into topics full of LLM influence. But did you know we started speaking like that too? In other words, LLMs drive language change, our language, not only the copy-pasted text. alphaxiv.org/abs/2409.13686 #LLM #LLMs #ML #machinelearning #nlproc #language #change 🤖
