Reinforcement Learning – A Reference(jakubhalmes.substack.com) |
Problem: SARSA pushes Q-values towards the values of the current policy (it is on-policy), but ideally we'd want the optimal values. Solution: use the best next action in the TD-target calculation -> Q-learning
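For anyone skimming, the difference in that quote comes down to one line in the TD target. Here is a rough tabular sketch, not code from the article: the function names, the list-of-lists Q-table, and the default `gamma` and `alpha` values are my own assumptions for illustration.

```python
def sarsa_target(q, reward, next_state, next_action, gamma=0.99):
    # On-policy: bootstrap from the action the current policy actually took next.
    return reward + gamma * q[next_state][next_action]


def q_learning_target(q, reward, next_state, gamma=0.99):
    # Off-policy: bootstrap from the best next action, whatever the policy did.
    return reward + gamma * max(q[next_state])


def td_update(q, state, action, target, alpha=0.1):
    # Shared TD step: move Q(s, a) a fraction alpha towards the target.
    q[state][action] += alpha * (target - q[state][action])


# Hypothetical 2-state, 2-action Q-table (rows are states, columns are actions).
q = [[0.0, 0.0], [1.0, 3.0]]
sarsa = sarsa_target(q, reward=0.0, next_state=1, next_action=0, gamma=0.5)
qlearn = q_learning_target(q, reward=0.0, next_state=1, gamma=0.5)
td_update(q, state=0, action=0, target=qlearn)
```

If the exploratory policy picks the worse action in the next state, the SARSA target reflects that choice, while the Q-learning target ignores it and uses the maximum.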
Perhaps someone else will find it helpful!
Only wish you publicised it before the exam haha :-)