LC
Leshem Choshen
@lchoshen.bsky.social
148 followers · 107 following · 317 posts
It's not a lower bound, because if you are willing to spend enough compute you can do better than SGD for training models. It's inefficient, but possible. The paper claims it is very similar to Newton's method (a second-order method, so it uses the gradient of the gradient).
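To make the "more compute per step, better than plain gradient steps" point concrete, here is a rough toy sketch. It is not the paper's method: the quadratic loss, the starting point, the learning rate, and all function names are assumptions chosen only for illustration, and it uses full-batch gradient descent rather than true SGD.

```python
# Toy loss: f(x, y) = 0.5 * (x^2 + 100 * y^2), an ill-conditioned quadratic.
# Everything here is a hypothetical illustration, not the paper's setup.

HESSIAN_DIAG = (1.0, 100.0)  # second derivatives ("gradient of the gradient")


def loss(p):
    x, y = p
    return 0.5 * (x * x + 100.0 * y * y)


def grad(p):
    x, y = p
    return (x, 100.0 * y)


def gd_step(p, lr=0.01):
    # First-order update: move against the gradient with one shared step size.
    g = grad(p)
    return (p[0] - lr * g[0], p[1] - lr * g[1])


def newton_step(p):
    # Second-order update: rescale each gradient component by the inverse
    # Hessian (diagonal here, so the inverse is an elementwise division).
    g = grad(p)
    return (p[0] - g[0] / HESSIAN_DIAG[0], p[1] - g[1] / HESSIAN_DIAG[1])


start = (1.0, 1.0)

p = start
for _ in range(100):
    p = gd_step(p)
gd_loss = loss(p)

newton_loss = loss(newton_step(start))

print("loss after 100 gradient steps:", gd_loss)
print("loss after 1 Newton step:", newton_loss)
```

On a quadratic, a single Newton step lands on the minimum, while gradient descent is still crawling along the flat direction after 100 steps. The catch is that for a real model the Hessian is far too large to form and invert cheaply, which is the "inefficient but possible" part.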