
Leshem Choshen

@lchoshen.bsky.social

🥇 #NLProc researcher 🥈 Opinionatedly Summarizing #ML & #NLP papers 🥉 Good science #scientivism

148 followers, 107 following, 317 posts

lchoshen.bsky.social, Oct 2, 2024 10:43pm

It's not a lower bound, because if you are willing to spend enough compute you can do better than SGD for training models. It's inefficient but possible. The paper claims it is very similar to Newton's method (which is second-order, i.e. based on the gradient of the gradient).
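To make the "more compute per step can beat gradient descent" point concrete, here is a minimal sketch on a toy problem. Everything in it is an assumption for illustration: the quadratic objective, the matrix `A`, the vector `b`, the learning rate and the step counts are hypothetical and are not taken from the paper under discussion. It compares plain (full-batch) gradient descent with Newton's method, which also uses the Hessian (the gradient of the gradient).

```python
import numpy as np

# Hypothetical ill-conditioned quadratic: f(x) = 0.5 * x^T A x - b^T x
A = np.array([[10.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 1.0])


def grad(x):
    # First-order information: the gradient of f
    return A @ x - b


def hessian(x):
    # Second-order information: the gradient of the gradient (constant here)
    return A


def gradient_descent(x, lr=0.05, steps=50):
    # Cheap steps that use only the gradient
    for _ in range(steps):
        x = x - lr * grad(x)
    return x


def newton(x, steps=1):
    # Costlier steps: each one solves a linear system with the Hessian
    for _ in range(steps):
        x = x - np.linalg.solve(hessian(x), grad(x))
    return x


x0 = np.zeros(2)
x_star = np.linalg.solve(A, b)  # exact minimizer of the quadratic

gd_error = np.linalg.norm(gradient_descent(x0) - x_star)
newton_error = np.linalg.norm(newton(x0) - x_star)
print(f"gradient descent, 50 steps: error = {gd_error:.2e}")
print(f"Newton's method, 1 step:    error = {newton_error:.2e}")
```

On a quadratic like this, a single Newton step lands on the minimizer, while 50 gradient steps are still visibly off. The catch is the cost: forming and solving with the Hessian scales badly with the number of parameters, which is the "inefficient but possible" part.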
