Adam revisited: a weighted past gradients perspective