BLUE
Profile banner
LC
Leshem Choshen
@lchoshen.bsky.social
🥇 #NLProc researcher 🥈 Opinionatedly Summarizing #ML & #NLP papers 🥉 Good science #scientivism
148 followers107 following317 posts
LClchoshen.bsky.social

It's not a lower bound because if you are willing to do enough compute you can do better than SGD for training models its inefficient but possible. The paper claims it is very similar to Newton's method (based on 2nd order so gradient of gradient)

0

Profile banner
LC
Leshem Choshen
@lchoshen.bsky.social
🥇 #NLProc researcher 🥈 Opinionatedly Summarizing #ML & #NLP papers 🥉 Good science #scientivism
148 followers107 following317 posts