Will Transformers Take Over Artificial Intelligence?
Imagine going to your local hardware store and seeing a new kind of hammer on the shelf. You’ve heard about this hammer: It pounds faster and more accurately than others, and in the last few years it’s rendered many other hammers obsolete, at least for most uses. And there’s more! With a few tweaks — an attachment here, a twist there — the tool changes into a saw that can cut at least as fast and as accurately as any other option out there. In fact, some experts at the frontiers of tool development say this hammer might just herald the convergence of all tools into a single device.
A similar story is playing out among the tools of artificial intelligence. That versatile new hammer is a kind of artificial neural network — a network of nodes that “learn” how to do some task by training on existing data — called a transformer. It was originally designed to handle language, but has recently begun impacting other AI domains.
The transformer first appeared in 2017 in a paper that cryptically declared that “Attention Is All You Need.” In other approaches to AI, the system would first focus on local patches of input data and then build up to the whole. In a language model, for example, nearby words would first get grouped together. The transformer, by contrast, runs processes so that every element in the input data connects, or pays attention, to every other element. Researchers refer to this as “self-attention.” This means that as soon as it starts training, the transformer can see traces of the entire data set.
Before transformers came along, progress on AI language tasks largely lagged behind developments in other areas. “In this deep learning revolution that happened in the past 10 years or so, natural language processing was sort of a latecomer,” said the computer scientist Anna Rumshisky of the University of Massachusetts, Lowell. “So NLP was, in a sense, behind computer vision. Transformers changed that.”
Transformers quickly became the front-runner for applications like word recognition that focus on analyzing and predicting text. It led to a wave of tools, like OpenAI’s Generative Pre-trained Transformer 3 (GPT-3), which trains on hundreds of billions of words and generates consistent new text to an unsettling degree.
The success of transformers prompted the AI crowd to ask what else they could do. The answer is unfolding now, as researchers report that transformers are proving surprisingly versatile. In some vision tasks, like image classification, neural nets that use transformers have become faster and more accurate than those that don’t. Emerging work in other AI areas — like processing multiple kinds of input at once, or planning tasks — suggests transformers can handle even more.
“Transformers seem to really be quite transformational across many problems in machine learning, including computer vision,” said Vladimir Haltakov, who works on computer vision related to self-driving cars at BMW in Munich.
Just 10 years ago, disparate subfields of AI had little to say to each other. But the arrival of transformers suggests the possibility of a convergence. “I think the transformer is so popular because it implies the potential to become universal,” said the computer scientist Atlas Wang of the University of Texas, Austin. “We have good reason to want to try transformers for the entire spectrum” of AI tasks.