Multitasking in artificial intelligence: what has changed?
Given the diversity of data modalities and the multitude of new tasks defined every day, deep learning developed in a highly fragmented way until recently. This fragmentation may have led to an inefficient use of resources, even as it maximized the number of experiments carried out in the field. In this article, however, I would like to leave the past aside and talk about the change that is taking place today.
The single-task approach, in which a model is painstakingly optimized to accomplish one task, still dominates deep learning. Such models have to contend with problems such as overfitting, insufficient data, overly narrow problem definitions and high sensitivity to changing conditions. That is why they struggle, surprisingly, with many tasks that we humans do easily. I wonder: in trying to simplify a problem, are we hiding data from AI models that may be important for solving it?
In 2017, the Google Brain team, one of Google's major AI research groups, released the code for MultiModel as part of Tensor2Tensor, and their work led to an important finding. MultiModel is a single model, with small modality-specific sub-networks, trained jointly on eight different tasks whose data come from very different modalities such as images, text and audio. Many expected that learning tasks as different as speech recognition, image classification and translation at the same time would hurt performance. Instead, joint training did not degrade the model as feared, and the tasks with little training data in particular benefited from it. The approach did not make much noise at the time because it was not easily extensible or adaptable, but I believe it greatly encouraged researchers working on multitasking. Multitasking became more popular with BERT, released in 2018, which can easily be fine-tuned for numerous natural language processing tasks, and with the dozens of BERT-like models that followed. BERT and similar models solved the extensibility problem, but the adaptability problem remained largely unsolved.
In 2020, the T5 model made it possible to cast every task as a text-to-text task. Its clever prefix mechanism makes multitasking, to borrow an English expression, "embarrassingly simple." On a TRT Radio 1 program I took part in, I said that what we should be excited about is T5, not GPT-3. Indeed, the Multitask Unified Model (MUM), introduced in the keynote of this year's Google I/O, builds on the T5 architecture, extends it to 75 languages and can process visual data, and it is said that it may also handle video and audio data in the future. By combining multilingualism, multimodality and multitasking, MUM can complete a task by transferring knowledge available in one language or modality to another. This hypothesis, validated by MultiModel in 2017, will find practical application in MUM, helping users who need complex information to find answers with fewer Google searches. However, models like MUM, which we have not yet found a way to equip with common sense, need to be tested extensively for societal stereotypes, discriminatory statements and any other bias.
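To make T5's prefix mechanism more concrete, here is a minimal Python sketch of how different tasks can be cast into the same text-to-text format. The prefix strings follow the style used for T5 ("translate English to German: ", "summarize: ", "cola sentence: "), but the `PREFIXES` table, the `to_text_to_text` helper and the example sentences are hypothetical illustrations. The sketch only formats data; it does not call any real T5 model or library.

```python
from typing import Dict, List, Tuple

# Hypothetical prefix table in the style of T5: the prefix alone tells
# the model which task it should perform on the input text.
PREFIXES: Dict[str, str] = {
    "translate_en_de": "translate English to German: ",
    "summarize": "summarize: ",
    "cola": "cola sentence: ",  # grammatical acceptability; label is text
}


def to_text_to_text(task: str, source: str, target: str) -> Tuple[str, str]:
    """Cast one example into an (input_text, target_text) pair of strings."""
    return PREFIXES[task] + source, target


# Hypothetical examples from three unrelated tasks. Even the
# classification label ("unacceptable") is represented as plain text.
examples: List[Tuple[str, str, str]] = [
    ("translate_en_de", "That is good.", "Das ist gut."),
    ("cola", "The course is jumping well.", "unacceptable"),
    ("summarize", "Heavy rain flooded several streets in the city on Monday, "
                  "and officials closed two schools.", "Rain floods city streets."),
]

# All tasks end up in one mixed dataset with a single shared format,
# so one model, one loss and one decoding procedure can serve them all.
mixed: List[Tuple[str, str]] = [to_text_to_text(t, s, y) for t, s, y in examples]

for input_text, target_text in mixed:
    print(f"{input_text!r} -> {target_text!r}")
```

Because inputs and outputs are always plain strings, adding a new task in this scheme amounts to choosing a new prefix and supplying examples; nothing about the model architecture has to change.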
Google has not yet put MUM into use, pending such bias testing. The size of the model also raises questions about the cost and carbon footprint of inference, given the computational power it requires. In light of these concerns, it may never be put into use at all. Whatever the outcome, it is clear that much remains to be done before AI models are not only multilingual, multimodal and multitasking but also capable of common sense.
See you in our next article.