Reinforcement learning (RL) is poised to do great things in the near future, whether that means creating new tools or building intelligent agents trained to outperform humans (at least in an economic sense) on a wide range of tasks.
But the idea is nothing new; in fact, it has been around for more than a century. It was first observed in nature by psychologist Edward Thorndike over a hundred years ago. He conducted an experiment in which he placed cats inside boxes that offered only one way to escape: pressing a lever.
At first the cats meowed and paced, but eventually they stepped on the lever by chance. Once they learned to associate this behavior with the desired outcome, they escaped quickly. This trial-and-error learning resembles some of the earliest research into artificial intelligence (AI).
At the time, researchers believed this kind of learning could be replicated by machines, and they went on to build machines to do it. Over 65 years ago, one of the founding fathers of AI, Marvin Minsky (then a student at Harvard), built a machine that used this same simple form of reinforcement learning to mimic rats navigating a maze.
This was a simple machine called the Stochastic Neural Analogy Reinforcement Computer (SNARC), which was made up of the following:
- Dozens of tubes
- Clutches that could simulate the behavior of 40 synapses and neurons
As the simulated rat ran through the virtual maze, the strength of certain synaptic connections was increased, reinforcing the underlying behavior. But soon after this, work in the area slowed down significantly for several decades.
Fast forward to 1992: Gerald Tesauro, a researcher at IBM, developed a program, TD-Gammon, that used the same technique to play backgammon. In time, it became good enough to rival some of the best human players.
Although this was a landmark event in AI, RL proved difficult to scale to more complex problems. Progress in the field stalled yet again, but this time it didn't take as long to get going.
In 2016, Alphabet's (Google's) DeepMind unveiled AlphaGo, a program trained with RL to play the board game Go. AlphaGo got so good at the game that it decisively defeated one of the best Go players of all time.
This turn of events generated an enormous amount of interest in RL and AI, because it is more or less impossible to build a Go-playing program that beats the best human player using traditional programming methods. The game is so complex that even accomplished players struggle to articulate their strategies, so you can't exactly write rules for it by hand.
RL works because researchers have figured out how to get a computer to calculate the value that should be assigned to each decision. In the maze example, the computer can keep track of whether each turn the rat makes brings it closer to the exit. Each decision, right or wrong, is stored as a value in a large table that is updated as the computer learns. This approach was once impractical for big, complicated tasks, but deep learning (DL) has made it feasible: DL is a highly efficient way to identify patterns in data, whether the input is a maze or a Go board.
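The table-based learning described above can be sketched in a few lines. The following toy example is illustrative only (the maze layout, hyperparameters, and names are my assumptions, not from the source); it uses tabular Q-learning, one standard way to update such a value table:

```python
import random

# Illustrative 1-D "maze": states 0..4, where state 4 is the exit.
N_STATES, EXIT = 5, 4
ACTIONS = [0, 1]                 # 0 = move left, 1 = move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.3  # assumed hyperparameters

# The "large table" from the text: one learned value per (state, action) pair.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Move one cell; the only reward is 1.0 for reaching the exit."""
    nxt = max(0, state - 1) if action == 0 else min(EXIT, state + 1)
    return nxt, (1.0 if nxt == EXIT else 0.0)

random.seed(0)
for _ in range(2000):            # training episodes
    s = 0
    for _ in range(200):         # cap episode length
        # epsilon-greedy: mostly exploit the table, sometimes explore
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        nxt, r = step(s, a)
        best_next = max(Q[(nxt, b)] for b in ACTIONS)
        # core update: nudge the stored value toward reward + discounted future value
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = nxt
        if s == EXIT:
            break

# The table now encodes the behavior: pick the highest-valued action in each state.
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES)}
```

In deep RL, this explicit table is replaced by a neural network that approximates the same values, which is what makes much larger state spaces tractable.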
What are the Challenges Faced by Deep Reinforcement Learning?
The main challenge here is the sheer volume of possible states and actions. We might imagine an agent choosing among a handful of actions, or maybe ten at most, but real-life applications can involve millions of possible actions, and an agent must navigate this large action space before it ever receives a reward.
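To make the scale problem concrete, here is a hypothetical back-of-the-envelope calculation (the numbers are made up for illustration): an agent choosing one of k actions at each of n steps faces k**n distinct action sequences before it ever sees a reward.

```python
# Hypothetical illustration of action-space blow-up (numbers are illustrative):
# an agent picking one of k actions per step, over n steps, faces k**n
# distinct action sequences before any reward arrives.
def num_sequences(k: int, n: int) -> int:
    return k ** n

toy = num_sequences(2, 10)          # tiny problem: 1,024 sequences
large = num_sequences(100, 10)      # larger action set: 10**20 sequences
```

Exhaustively exploring the larger space is hopeless, which is why scalable training methods are an open problem.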
This is a major scientific and engineering problem that needs to be solved before we see scalable methods for training multi-purpose agents to do almost anything imaginable, from building a car to sorting your mail. It could also be the technology that plans your retirement or your next vacation overseas. But getting it to handle complex manufacturing processes, and perfecting it, will remain difficult because of their hidden intricacies.
One solution could be to let AI figure out for itself the best way to build complex things. Once this issue is resolved, it should have huge commercial implications.
The Emergence of a Unified Interface
For us humans, our interface to the world is our arms, legs, eyes, and ears. The brain learns to interpret signals from the body and to control it, and this works in much the same way for every human being.
The same rules apply to a unified interface for intelligent agents: if there are many of them, a shared interface makes it much easier for them to interact with the world and with each other. Building a separate interface for every automated task may not be the best idea; a unified one, by contrast, could act as a multiplier for machine learning (ML) research (where progress has slowed as a result of technical obstacles).
Further, connected RL agents equipped with computer vision systems will be able to quickly extract information from the real world, much the way we already use mobile devices to learn from real-world experience. This means you could essentially connect millions of smartphones, letting people teach machines overnight through human actions repeated over and over again. Right now, though, the available technology isn't advanced enough to achieve this.
A unified interface won't actually be a single protocol or a single library. Instead, it will be a set of specifications that must be met to allow an agent to integrate seamlessly with existing APIs (much as OpenAI has done with its Gym toolkit).
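As an illustration only (no such specification exists yet; every name below is hypothetical), such a set of specifications might resemble the reset/step convention popularized by OpenAI's Gym: any environment that satisfies the contract can be driven by any agent.

```python
import random
from typing import Any, Protocol, Tuple

class Environment(Protocol):
    """Hypothetical unified agent-environment interface (names are illustrative)."""
    def reset(self) -> Any: ...
    def step(self, action: Any) -> Tuple[Any, float, bool]: ...

class CoinGuess:
    """A toy environment satisfying the interface: guess 0 or 1 for a reward."""
    def reset(self) -> Any:
        self._truth = random.randint(0, 1)
        return None  # nothing to observe before guessing

    def step(self, action: Any) -> Tuple[Any, float, bool]:
        reward = 1.0 if action == self._truth else 0.0
        return None, reward, True  # episode ends after one guess

def run_episode(env: Environment, action: Any) -> float:
    """Agent code only ever talks to the interface, never the environment internals."""
    env.reset()
    _, reward, done = env.step(action)
    return reward
```

The point of the sketch is that `run_episode` works unchanged for any environment meeting the specification, which is exactly the interoperability the unified interface is meant to deliver.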
RL is gaining a lot of attention lately because it works well for applications such as autonomous driving, but it will be a while before it shows up at industrial scale. Even as the idea gains traction and the industry goes through a period of acceleration, there is still an enormous amount of work to be done.