Browsing: AI News

AI News July 26, 2024

MINT-1T Dataset Released: A Multimodal Dataset with One Trillion Tokens to Build Large Multimodal Models

Artificial intelligence, particularly in training large multimodal models (LMMs), relies heavily on vast datasets that include sequences of images and text. These datasets enable the development…

AI News July 25, 2024

SF-LLaVA: A Training-Free Video LLM that is Built Upon LLaVA-NeXT and Requires No Additional Fine-Tuning to Work Effectively for Various Video Tasks

Video large language models (LLMs) have emerged as powerful tools for processing video inputs and generating contextually relevant responses to user commands. However, these models face…

AI News July 24, 2024

Visual Haystacks Benchmark: The First “Visual-Centric” Needle-In-A-Haystack (NIAH) Benchmark to Assess LMMs’ Capability in Long-Context Visual Retrieval and Reasoning

A significant challenge in the field of visual question answering (VQA) is the task of Multi-Image Visual Question Answering (MIQA). This involves generating relevant and grounded…

AI News July 23, 2024

ProcTag: A Data-Oriented AI Method that Assesses the Efficacy of Document Instruction Data

Effectively evaluating document instruction data for training large language models (LLMs) and multimodal large language models (MLLMs) in document visual question answering (VQA) presents a significant…

AI News July 21, 2024

DiT-MoE: A New Version of the DiT Architecture for Image Generation

Recently, diffusion models have become powerful tools in various fields, like image and 3D object generation. Their success comes from their ability to handle denoising tasks…

AI News July 19, 2024

From Diagrams to Solutions: MAVIS’s Three-Stage Framework for Mathematical AI

Large Language Models (LLMs) and their multi-modal counterparts (MLLMs) have made significant strides in advancing artificial general intelligence (AGI) across various domains. However, these models face…

AI News July 19, 2024

MMLongBench-Doc: A Comprehensive Benchmark for Evaluating Long-Context Document Understanding in Large Vision-Language Models

Document understanding (DU) focuses on the automatic interpretation and processing of documents, encompassing complex layout structures and multi-modal elements such as text, tables, charts, and images.…

AI News July 16, 2024

Exploring Robustness: Large Kernel ConvNets in Comparison to Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs)

Robustness is crucial for deploying deep learning models in real-world applications. Vision Transformers (ViTs) have shown strong robustness and state-of-the-art performance in various vision tasks since…

AI News July 15, 2024

RTMW: A Series of High-Performance AI Models for 2D/3D Whole-Body Pose Estimation

Whole-body pose estimation is a key component for improving the capabilities of human-centric AI systems. It is useful in human-computer interaction, virtual avatar animation, and the…

AI News July 14, 2024

A Decade of Transformation: How Deep Learning Redefined Stereo Matching in the Twenties

A fundamental topic in computer vision for nearly half a century, stereo matching involves calculating dense disparity maps from two rectified images. It plays a critical…

What's Hot

Radical Simplicity in Data Engineering | by Cai Parry-Jones | Jul, 2024

Why the Newest LLMs use a MoE (Mixture of Experts) Architecture

MINT-1T Dataset Released: A Multimodal Dataset with One Trillion Tokens to Build Large Multimodal Models

How ML AI Can Help Businesses Reduce Overhead Costs

How the AI Surge May Help Current WFH Employees

The ultimate contact center automation guide

Top 5 AI Development Companies To Transform Your Business | by Amyra Sheldon

Unlocking the Power of Hugging Face for NLP Tasks | by Ravjot Singh | Jul, 2024
