Reinforcement Learning as a fine-tuning paradigm (ankeshanand.com)
"The website has been blocked as per order of Ministry of Electronics and Information Technology under IT Act, 2000."
I don't see why this is important.
> It should have (and has shown to have) better scaling laws
is a claim based on two anecdotes, and I don't see a compelling reason why it should hold in general.
Active learning approaches are not mentioned, even though they allow human feedback to be incorporated during fine-tuning, and they work with a purely supervised approach.
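To make the active learning point concrete, here is a minimal sketch of pool-based uncertainty sampling, which is one common form of active learning. It is not taken from the article. The toy 1-D logistic regression, the `oracle` callback standing in for a human labeler, and all function names are assumptions chosen for illustration. The training step is ordinary supervised learning; the human only enters the loop when the model asks for a label.

```python
import math

def train_logistic(xs, ys, steps=500, lr=0.5):
    """Fit p(y=1|x) = sigmoid(w*x + b) by plain gradient descent (toy model)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += (p - y)
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict_proba(model, x):
    """Probability of the positive class under the fitted model."""
    w, b = model
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def active_learning(pool, oracle, seed_idx, rounds):
    """Pool-based active learning with uncertainty sampling.

    `oracle` is a hypothetical stand-in for a human annotator: it is only
    called on the examples the current model is least sure about.
    """
    labeled = {i: oracle(pool[i]) for i in seed_idx}
    for _ in range(rounds):
        # Supervised step: fit on everything labeled so far.
        model = train_logistic([pool[i] for i in labeled],
                               [labeled[i] for i in labeled])
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        # Query step: pick the example whose prediction is closest to 0.5.
        query = min(unlabeled,
                    key=lambda i: abs(predict_proba(model, pool[i]) - 0.5))
        labeled[query] = oracle(pool[query])  # human feedback enters here
    # Final supervised fit on all collected labels.
    model = train_logistic([pool[i] for i in labeled],
                           [labeled[i] for i in labeled])
    return model, labeled
```

For a language model the same loop would presumably use a different uncertainty measure and a far more expensive training step, but the structure (train, pick uncertain examples, ask a human, retrain) stays supervised throughout.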
IMO the last point is the only compelling one: having, for example, agents that can browse the web during learning could open up a lot of possibilities. It would have been interesting to develop this point further: what are the current difficulties in training such agents?