Automated Machine Learning (AutoML) has become essential in data-driven decision-making, allowing domain experts to use machine learning without requiring deep statistical expertise. Nevertheless, a major obstacle for many current AutoML systems is the efficient and correct handling of multimodal data. The main barriers to multimodal AutoML are the lack of systematic comparisons between different information-fusion approaches and the absence of a generalized framework for multi-modality processing, while the heavy resource consumption of multimodal Neural Architecture Search (NAS) further hinders effective pipeline construction.
Addressing this challenge, researchers from Eindhoven University of Technology have introduced a method that leverages pre-trained Transformer models, which have proven successful in domains such as Computer Vision and Natural Language Processing. This approach holds promise for advancing Automated Machine Learning.
This study addresses two problems in AutoML’s handling of multimodal data: integrating pre-trained Transformer models effectively and minimizing reliance on costly NAS approaches. The proposed method improves AutoML for complex combinations of modalities, including tabular-text, text-vision, and vision-text-tabular configurations, while keeping multimodal ML pipelines efficient and adaptable. A flexible pipeline search space for multimodal data is designed, pre-trained models are strategically incorporated into the pipeline topologies, and warm-starting of SMAC (Sequential Model-based Algorithm Configuration) using metadata from previous evaluations is implemented.
The researchers aimed to enable AutoML across unimodal and multimodal data by integrating pre-trained (Transformer) models into AutoML systems. Multimodal data processing is framed as a CASH problem (Combined Algorithm Selection and Hyperparameter optimization): jointly selecting a learning algorithm from a set that contains both classical and pre-trained deep models and tuning its hyperparameters. Solving this problem is what allows the AutoML system to remain efficient and adaptable across different data modalities.
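For readers unfamiliar with CASH, the standard formulation (popularized by Auto-WEKA) jointly selects an algorithm and its hyperparameters by minimizing average validation loss; the notation below follows that conventional formulation rather than quoting this paper:

```latex
% CASH: choose algorithm A^(j) and hyperparameters \lambda that minimize
% the average validation loss over k cross-validation folds.
A^{*}, \lambda^{*} \in \operatorname*{arg\,min}_{A^{(j)} \in \mathcal{A},\; \lambda \in \Lambda^{(j)}}
\frac{1}{k} \sum_{i=1}^{k} \mathcal{L}\!\left(A^{(j)}_{\lambda},\, D_{\mathrm{train}}^{(i)},\, D_{\mathrm{valid}}^{(i)}\right)
```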
Using datasets from the tabular-text, text-vision, and tabular-text-vision modalities, task-specific variations of multimodal pipeline designs built on a common pipeline structure are assessed. The researchers also tested these pipeline designs on tasks such as Visual Question Answering (VQA), Image-Text Matching (ITM), regression, and classification. Three distinct pipeline variants, tailored to different modalities and tasks, make up the framework.
A meta-dataset is built by recording scalar performances for each of the three pipeline variants across a set of tasks, including classification, regression, ITM, and VQA. This collection was assembled after the pipeline variants had been designed. In its simplest form, the meta-dataset is a nested Python dictionary: its keys are the names of hyperparameters or algorithms, and its values are the numerical or categorical values recorded in the experiments. The meta-dataset also stores, as strings, the names of the classical ML models and the pre-trained model used in the pipeline.
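A minimal sketch of what such a nested dictionary might look like (the task name, component names, hyperparameters, and scores below are illustrative placeholders, not entries from the paper’s actual meta-dataset):

```python
# Illustrative meta-dataset entry: one evaluated pipeline configuration and its
# scalar performance on a given task. All keys/values are hypothetical examples.
meta_dataset = {
    "tabular_text_classification": [
        {
            "pretrained_model": "bert-base-uncased",  # stored as a string
            "classical_model": "random_forest",       # stored as a string
            "hyperparameters": {
                "n_estimators": 300,
                "max_depth": 12,
                "text_pooling": "cls",                # categorical value
            },
            "score": 0.87,                            # recorded scalar performance
        },
        # ... one entry per evaluated configuration
    ],
}
```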
An AutoML system can only synthesize effective machine-learning pipelines after the configuration space has been constructed. The Sequential Model-Based Optimization (SMBO) approach uses this configuration space as its search space. It contains hierarchically structured components, including pre-trained models, feature processors, and classical ML models.
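As a rough illustration, a hierarchical configuration space of this kind can be expressed with the ConfigSpace library that SMAC consumes; the component names and value ranges here are placeholder assumptions, not the paper’s actual search space:

```python
from ConfigSpace import ConfigurationSpace
from ConfigSpace.conditions import EqualsCondition
from ConfigSpace.hyperparameters import (
    CategoricalHyperparameter,
    UniformFloatHyperparameter,
    UniformIntegerHyperparameter,
)

cs = ConfigurationSpace()

# Top-level choices: which pre-trained encoder and which classical estimator to use.
encoder = CategoricalHyperparameter("text_encoder", ["bert-base-uncased", "roberta-base"])
estimator = CategoricalHyperparameter("estimator", ["random_forest", "xgboost", "mlp"])

# Estimator-specific hyperparameters, active only when their parent value is selected.
n_estimators = UniformIntegerHyperparameter("rf_n_estimators", 50, 500)
learning_rate = UniformFloatHyperparameter("mlp_learning_rate", 1e-4, 1e-1, log=True)

cs.add_hyperparameters([encoder, estimator, n_estimators, learning_rate])
cs.add_condition(EqualsCondition(n_estimators, estimator, "random_forest"))
cs.add_condition(EqualsCondition(learning_rate, estimator, "mlp"))

print(cs.sample_configuration())  # one candidate pipeline configuration
```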
The findings on text-vision tasks using datasets such as Flickr30k and SBU Image Captioning show that the framework quickly converges to strong configurations across different modalities. Results on 23 datasets show that the proposed methodology consistently produces high-quality multimodal pipeline designs while staying within computational limits, as evidenced by high NAUC and ALC scores. Under time constraints, comparisons with classic NAS methods show that the new framework is more efficient, highlighting the strengths of warm-starting and partial reliance on NAS as well as areas that could be improved. Given the framework’s success in resource-limited settings, further research is needed to validate it in more varied environments.
The team acknowledges their work’s limits and addresses them in the proposed AutoML framework by keeping the pre-trained models’ weights frozen and using a warm-start technique. Compared to cold-starting, which initializes the optimization with random configurations, warm-starting initializes AutoML’s CASH optimization with informed configurations derived from prior knowledge. The term ‘warm-starting’ refers to using previous results or data from related tasks to speed up the current optimization job, reducing the time and computing resources needed to find a good solution. In this context, freezing the weights means that any change in performance during optimization is due to hyperparameter adjustments rather than changes to the (pre-trained) model’s weights, ensuring that the model’s learned representations are not lost during optimization.
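A minimal sketch of the warm-starting idea, assuming the nested meta-dataset layout sketched earlier (the helper name and the simple top-k selection rule are illustrative, not the paper’s exact procedure):

```python
def warm_start_configs(meta_dataset, task, k=5):
    """Pick the k best previously evaluated configurations for a task.

    These configurations can seed the optimizer's initial design instead of
    random cold-start configurations.
    """
    history = meta_dataset.get(task, [])
    ranked = sorted(history, key=lambda entry: entry["score"], reverse=True)
    return [
        {
            "pretrained_model": e["pretrained_model"],
            "classical_model": e["classical_model"],
            **e["hyperparameters"],
        }
        for e in ranked[:k]
    ]

# The resulting flat dictionaries give the CASH search informed starting points.
seeds = warm_start_configs(meta_dataset, "tabular_text_classification")
```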
The researchers investigate the effect of these hyperparameters on the performance of a static, pre-trained model. Instead of tweaking the weights of the pre-trained models, they evaluate how different hyperparameter settings exploit the models to create and use latent representations of data in vision, text, or mixed modalities. This strategy isolates performance changes to hyperparameter effects, guaranteeing clear attribution of the findings.
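To make the frozen-backbone setup concrete, the sketch below extracts fixed text embeddings from a frozen Transformer and tunes only the downstream classifier’s hyperparameters; the model name, classifier choice, and toy data are illustrative assumptions, not the paper’s configuration:

```python
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

# Frozen pre-trained encoder: its weights are never updated during the search.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.eval()
for p in encoder.parameters():
    p.requires_grad_(False)

def embed(texts):
    """Return fixed [CLS] embeddings for a list of texts."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = encoder(**batch)
    return out.last_hidden_state[:, 0, :].numpy()

# Only the downstream model's hyperparameters (e.g. C) are varied by the optimizer,
# so any performance difference is attributable to them, not to the encoder weights.
X_train = embed(["a short example sentence", "another training sentence"])
clf = LogisticRegression(C=1.0).fit(X_train, [0, 1])
```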
To keep pace with the evolving needs of AutoML solutions, future work will extend the framework’s capabilities and broaden its application to additional scenarios, such as parameter-space sampling.
Check out the Paper. All credit for this research goes to the researchers of this project.