Video games have become a proving ground for AIs and Uber has shown how its new type of reinforcement learning has succeeded where others have failed.
Some of humankind’s most complex games, such as Go, have already fallen to AIs from the likes of DeepMind. Reinforcement learning trains algorithms by running scenarios repeatedly, with a ‘reward’ given for successes, often a score increase.
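For readers curious what that reward loop looks like in practice, here is a minimal sketch of reinforcement learning on a toy problem. The tiny environment, state count and reward values are illustrative assumptions, not taken from any of the games mentioned in this article.

```python
# A minimal sketch of the reward-driven loop behind reinforcement learning:
# run the scenario repeatedly, and nudge the agent towards actions that
# eventually lead to a reward (here, reaching the end of a short chain).
import random

N_STATES = 5          # toy chain: start at state 0, reward only at state 4
ACTIONS = [-1, +1]    # move left or right
q_table = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Advance the toy environment; the score increase is the 'reward'."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

for episode in range(500):                    # run the scenario repeatedly
    state = 0
    for _ in range(20):
        # mostly act on current estimates, occasionally act at random
        if random.random() < 0.1:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q_table[(state, a)])
        next_state, reward = step(state, action)
        # Q-learning update: move the value estimate towards reward + future value
        best_next = max(q_table[(next_state, a)] for a in ACTIONS)
        q_table[(state, action)] += 0.1 * (reward + 0.9 * best_next - q_table[(state, action)])
        state = next_state

# After training, the learned policy should head right towards the reward.
print({s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES)})
```

With frequent rewards like this, the loop works well; the problem the article describes is what happens when rewards are so sparse that random behaviour never stumbles onto one.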
Two classic games from the 80s – Montezuma’s Revenge and Pitfall! – have thus far been immune to a traditional reinforcement learning approach. This is because they have little in the way of notable rewards until later in the games.
Applying traditional reinforcement learning typically results in a failure to progress out of the first room in Montezuma’s Revenge, while in Pitfall! it fails to score at all.
One way researchers have attempted to provide the necessary rewards is to give the AI a bonus for exploration itself, an approach called ‘intrinsic motivation’. However, it has shortcomings.
“We hypothesize that a major weakness of current intrinsic motivation algorithms is detachment,” wrote Uber’s researchers, “wherein the algorithms forget about promising areas they have visited, meaning they do not return to them to see if they lead to new states.”
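One common form of intrinsic motivation is a count-based novelty bonus, sketched below. The bonus formula and its scale are illustrative assumptions on my part, not Uber’s method; the point is simply that rarely visited states look more rewarding than familiar ones.

```python
# A sketch of count-based intrinsic motivation: the agent earns a bonus that
# shrinks each time it revisits a state, so novel states appear more rewarding
# even when the game itself hands out no points.
from collections import defaultdict
import math

visit_counts = defaultdict(int)

def shaped_reward(state, game_reward, bonus_scale=0.1):
    """Combine the game's own score with an exploration bonus."""
    visit_counts[state] += 1
    exploration_bonus = bonus_scale / math.sqrt(visit_counts[state])
    return game_reward + exploration_bonus

# Example: the first visit to a state earns a larger bonus than the tenth.
print(shaped_reward("room_1", 0.0))   # ~0.1
for _ in range(9):
    shaped_reward("room_1", 0.0)
print(shaped_reward("room_1", 0.0))   # noticeably smaller
```

The detachment problem the researchers describe follows from this: once an area stops paying a novelty bonus, the algorithm has no reason to go back, even if unexplored territory lies just beyond it.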
Uber’s AI research team in San Francisco developed a new type of reinforcement learning to overcome the challenge.
The researchers call their approach ‘Go-Explore’: the AI remembers promising states it has already reached, deliberately returns to them, and explores onward from there to see whether they lead to better results. Supplementing the algorithm with human knowledge to guide it towards notable areas sped up its progress dramatically.
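A simplified sketch of that ‘return, then explore’ idea is below. The toy environment, the definition of a ‘cell’, and the selection rule are illustrative assumptions, not Uber’s implementation, but they show the basic loop: keep an archive of states already reached, go back to one, and explore from it.

```python
# A simplified sketch of the Go-Explore loop: maintain an archive of
# interesting states, return to one of them, then explore from there and
# record anything new that is discovered.
import random

def toy_step(state, action):
    """Toy stand-in for an emulator step: state is just an (x, y) position."""
    x, y = state
    dx, dy = action
    return (x + dx, y + dy)

ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]
start = (0, 0)
# Archive maps a 'cell' (here, simply the state) to how often it was chosen.
archive = {start: 0}

for iteration in range(1000):
    # 1. Go: pick an archived cell, favouring ones selected less often,
    #    and 'return' to it (here by restoring the saved state directly).
    cell = min(archive, key=lambda c: (archive[c], random.random()))
    archive[cell] += 1
    state = cell
    # 2. Explore: take a few random actions from that cell and add any
    #    newly reached cells to the archive for future returns.
    for _ in range(10):
        state = toy_step(state, random.choice(ACTIONS))
        if state not in archive:
            archive[state] = 0

print(f"cells discovered: {len(archive)}")
```

Because the agent always resumes from the frontier of what it has already found rather than starting from scratch, it never ‘detaches’ from promising areas the way a purely intrinsic-motivation agent can.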
If nothing else, the research provides some comfort that we feeble humans are not yet fully redundant, and that the best results will be attained by working hand-in-binary with our virtual overlords.