A while ago, Yann LeCun made a widely discussed point about AI: "most of human and animal learning is unsupervised learning. If intelligence was a cake, unsupervised learning would be the cake, supervised learning would be the icing on the cake, and reinforcement learning would be the cherry on the cake. We know how to make the icing and the cherry, but we don't know how to make the cake. We need to solve the unsupervised learning problem before we can even think of getting to true AI. And that's just an obstacle we know about. What about all the ones we don't know about?"
This remark did not sit well with many reinforcement learning fans. After reading the two passages below, though, I find his point fairly convincing.
It is hard to tell if he meant “cherry on the cake” as the Oxford Dictionary defines it: “A desirable feature perceived as the finishing touch to something that is already very good”, or he was just making a point that unsupervised learning is THE cake, and everything else is an add-on. Now, I can see why Reinforcement Learning would be a great add-on to solve intelligence. We can teach computers through supervised learning, and we can somewhat let them learn by themselves using (what we have of) unsupervised learning. Solving those two alone would allow us to create super intelligent agents, but we would still have to tell them what to learn, what to solve, and so on. Solving Reinforcement Learning allows us to “unleash” these smart agents to find out their own desires, follow their own dreams, pursue their own happiness. But hey, we are far from it. There is a lot of work ahead.
If we only use the reinforcement signal to guide training, then I agree with Yann LeCun that it is the cherry on the cake. Even worse: when using a global reinforcement signal that is not a known differentiable function of the representations (which is typically the case), there is a serious scaling problem in terms of the number of hidden units (or action dimensions) that can be trained with respect to that signal. The number of examples, random samples or trials of actions may have to grow at least linearly with the number of units in order to provide credit assignment of quality comparable to that obtained with back-propagation. If the action space is large, this is problematic. However, as Demis Hassabis said when Yann talked about the cake and cherry analogy, we should *also* do unsupervised learning, along with reinforcement learning. Then it becomes more credible that it can work on a large scale.

Reading some of DeepMind's latest papers, there is indeed a trend toward combining reinforcement learning with unsupervised learning. I used to believe that reinforcement learning could learn good representations from the reward signal alone; now I am no longer so sure.
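To make the scaling argument in the second passage concrete, here is a minimal sketch under toy assumptions of my own (a Gaussian "policy" over `dim` action units and a quadratic "reward"; none of this comes from the quoted text). It compares the per-unit gradient variance when the reward can be back-propagated through each unit against the case where only the global scalar reward is available, REINFORCE-style:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_variance(dim, n_samples=2000, sigma=0.1):
    """Per-parameter variance of two estimators of d/dtheta E[R(a)],
    with actions a ~ N(theta, sigma^2 I) and reward R(a) = -||a - 1||^2."""
    theta = np.zeros(dim)
    target = np.ones(dim)
    eps = rng.normal(size=(n_samples, dim))
    a = theta + sigma * eps                 # sampled "actions"
    r = -np.sum((a - target) ** 2, axis=1)  # one global scalar reward per trial

    # Backprop-style (pathwise) estimator: differentiate R through each unit.
    g_backprop = -2.0 * (a - target)

    # REINFORCE-style (score-function) estimator: only the scalar r is used.
    g_reinforce = r[:, None] * eps / sigma  # r * d(log pi)/d(theta)

    return g_backprop.var(axis=0).mean(), g_reinforce.var(axis=0).mean()

for dim in (1, 10, 100, 1000):
    v_bp, v_rf = grad_variance(dim)
    print(f"dim={dim:5d}  backprop var={v_bp:7.4f}  REINFORCE var={v_rf:12.1f}")
```

The backprop variance stays flat as `dim` grows, while the score-function variance blows up with the number of units, so the number of trials must grow with the number of units just to keep per-unit credit assignment quality fixed, which is exactly the "at least linearly" problem described above.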
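As for what "combining" looks like in those papers (e.g., UNREAL-style auxiliary tasks), the common pattern is to share a representation between the policy and an unsupervised head, so every hidden unit gets a dense, differentiable training signal even when the reward is a sparse global scalar. A minimal sketch, with all shapes and names being illustrative assumptions of mine:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy shared representation: a linear encoder/decoder pair (illustrative only).
W_enc = rng.normal(scale=0.1, size=(16, 64))   # obs_dim=64 -> latent_dim=16
W_dec = rng.normal(scale=0.1, size=(64, 16))

def combined_loss(obs, rl_loss, beta=0.1):
    """RL loss plus an unsupervised reconstruction loss on the shared
    features; the reconstruction term trains every encoder unit by
    back-propagation, no matter how sparse the reward is."""
    z = W_enc @ obs                  # shared features the policy would also use
    recon = W_dec @ z                # unsupervised auxiliary head
    recon_loss = np.mean((recon - obs) ** 2)
    return rl_loss + beta * recon_loss

obs = rng.normal(size=64)
print(combined_loss(obs, rl_loss=1.7))  # rl_loss would come from e.g. REINFORCE
```

Whether the reward signal alone could also carve out the representation, without that auxiliary term, is exactly the point I am now less sure about.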