[D] Is there an advantage in learning when taking the average gradient compared to the gradient of just one point?
Submitted by CPOOCPOS on November 9, 2022 at 2:52 PM in MachineLearning (35 comments)
onedertainer wrote on November 9, 2022 at 4:31 PM:
This sounds like it might be mini-batch gradient descent, where you use the average (or sum) of the gradients over a batch of points, or perhaps something like Adam, which keeps moving averages of the gradient over previous update steps to give gradient descent some "momentum".
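Here is a minimal sketch of both ideas in plain Python. It assumes a hypothetical toy problem that is not from the thread: fitting y = w·x with a squared loss. The data, learning rate, batch sizes, and function names are all illustrative choices. `grad_spread` compares how much a single-point gradient fluctuates with how much a mini-batch average fluctuates. `adam_step` shows that Adam's averages are exponential moving averages taken over update steps.

```python
import random
import statistics

# Hypothetical toy data: y = 3.0 * x + Gaussian noise (illustrative only).
random.seed(0)
DATA = []
for _ in range(1000):
    x = random.uniform(-1.0, 1.0)
    DATA.append((x, 3.0 * x + random.gauss(0.0, 1.0)))

def point_grad(w, x, y):
    """Gradient of the per-point loss 0.5 * (w*x - y)**2 with respect to w."""
    return (w * x - y) * x

def minibatch_grad(w, batch):
    """Mini-batch gradient: the average of the per-point gradients."""
    return sum(point_grad(w, x, y) for x, y in batch) / len(batch)

def grad_spread(w, batch_size, trials=500, rng=None):
    """Standard deviation of the gradient estimate across random batches."""
    if rng is None:
        rng = random.Random(1)
    estimates = [minibatch_grad(w, rng.sample(DATA, batch_size))
                 for _ in range(trials)]
    return statistics.stdev(estimates)

def adam_step(w, g, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; m and v are moving averages over previous steps."""
    m = beta1 * m + (1 - beta1) * g          # first moment (momentum-like)
    v = beta2 * v + (1 - beta2) * g * g      # second moment (scale)
    m_hat = m / (1 - beta1 ** t)             # bias correction for step t
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (v_hat ** 0.5 + eps)
    return w, m, v

def train_adam(batch_size, steps=500, seed=2):
    """Run Adam on mini-batch gradients, starting from w = 0."""
    rng = random.Random(seed)
    w, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = minibatch_grad(w, rng.sample(DATA, batch_size))
        w, m, v = adam_step(w, g, m, v, t)
    return w
```

For example, `print(grad_spread(0.0, 1), grad_spread(0.0, 32))` should show a noticeably smaller spread for the batch of 32. The variance of an average of n independent samples is roughly 1/n of the single-sample variance. The averaged gradient has the same expected value as a single-point gradient, but it is a less noisy estimate of the full-data gradient.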