Artificial intelligence (AI) has transformed traditional research, propelling it to unprecedented heights, yet significant gaps remain in other areas of its application. A critical issue in AI is training models to perform causal reasoning. Traditional methods depend heavily on large datasets in which causal relationships are explicitly labeled, and such data is often expensive and difficult to obtain. Researchers are therefore looking for ways to train AI models to understand and apply causal reasoning using more accessible data sources. The problem is pivotal because it directly affects how efficiently and accurately AI systems can reason about cause-and-effect relationships across applications.
Existing AI models typically rely on vast datasets in which causal relationships are explicitly indicated or inferred through statistical patterns. Large language models (LLMs) such as GPT-4 have demonstrated some capability in causal reasoning, but they often struggle with unseen or complex causal structures. Current approaches include training on direct intervention data or pre-training models on datasets rich in causal information. Despite these efforts, significant limitations remain, especially in the models' ability to generalize across different causal scenarios.
Researchers from Microsoft Research, IIT Hyderabad, and MIT have introduced a method called axiomatic training to tackle these challenges. The approach trains models on many demonstrations of causal axioms, or rules, rather than relying solely on inductive biases or inferred data values. By exposing models to varied examples of these axioms, the researchers aim to improve their ability to generalize causal reasoning to new and more complex scenarios. The method is innovative in that it shifts the focus from data-intensive training to a more principle-based approach.
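To make the idea concrete, here is a minimal Python sketch of what a single axiomatic demonstration of the transitivity axiom might look like when expressed as text. The prompt format and field names here are assumptions for illustration, not the paper's actual template.

```python
# A minimal sketch of one axiomatic demonstration expressed as text:
# state a causal premise, pose a cause-effect query, and attach the label
# implied by the transitivity axiom. The exact format is an assumption.

def transitivity_demo(a: str, b: str, c: str) -> dict:
    """Build one (premise, question, answer) demonstration of transitivity."""
    premise = f"{a} causes {b}. {b} causes {c}."
    question = f"Does {a} cause {c}?"
    answer = "Yes"  # follows directly from the transitivity axiom
    return {"premise": premise, "question": question, "answer": answer}

print(transitivity_demo("A", "B", "C"))
# {'premise': 'A causes B. B causes C.', 'question': 'Does A cause C?', 'answer': 'Yes'}
```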
The axiomatic training approach devised by the research team involves generating diverse training data containing many demonstrations of a causal axiom. The central example is the transitivity axiom: if A causes B and B causes C, then A causes C. To improve generalization, the models were trained on linear causal chains with variations, including noise and reversed edge orders, so that learned axioms could be applied to larger and more intricate causal graphs not encountered during training. The researchers designed several evaluation sets to probe these abilities: causal sequences longer than those in the training data, sequences with shuffled orders to test structural understanding, and applications of the transitivity axiom to more complex networks.
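The following simplified Python sketch illustrates the kind of data generation described above: random linear chains whose edge statements may be reversed or shuffled, optional noise edges, and yes/no causal queries labeled by ancestry in the chain. The sampling probabilities, variable names, and surface phrasing are assumptions; the authors' actual generator may differ.

```python
# Illustrative data generator in the spirit of axiomatic training:
# build linear causal chains, vary how edges are stated (reversed wording,
# shuffled order, noise), and label node pairs by whether one is an
# ancestor of the other. Details here are assumptions, not the paper's recipe.

import random

def make_chain_example(min_len=3, max_len=6, flip_prob=0.2, noise_prob=0.3):
    n = random.randint(min_len, max_len)
    nodes = [f"X{i}" for i in range(n)]

    # Edge statements for the chain X0 -> X1 -> ... -> X(n-1).
    statements = []
    for cause, effect in zip(nodes, nodes[1:]):
        if random.random() < flip_prob:
            # State the same edge with reversed surface order.
            statements.append(f"{effect} is caused by {cause}.")
        else:
            statements.append(f"{cause} causes {effect}.")

    # Optionally add a noise edge involving a node outside the chain.
    if random.random() < noise_prob:
        statements.append(f"{nodes[-1]} causes Z.")

    random.shuffle(statements)  # shuffled order tests structural understanding

    # Positive query: ancestor -> descendant; negative query: the reverse.
    i, j = sorted(random.sample(range(n), 2))
    if random.random() < 0.5:
        question, answer = f"Does {nodes[i]} cause {nodes[j]}?", "Yes"
    else:
        question, answer = f"Does {nodes[j]} cause {nodes[i]}?", "No"

    return {"premise": " ".join(statements), "question": question, "answer": answer}

random.seed(0)
print(make_chain_example())
```

At evaluation time, the same kind of generator can simply be run with longer chains or branching graphs than anything seen in training, which is what makes the generalization tests described above possible.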
The results are striking. A 67-million-parameter transformer model trained on axiomatic demonstrations showed strong generalization: it extended its understanding to longer causal chains, reversed sequences, and complex branching structures, outperforming larger models such as GPT-4 and Gemini Pro on specific tests. For instance, the model achieved an accuracy of 0.85 on standard chains and 0.78 on randomly flipped chains of lengths 14-15, highlighting its ability to handle unseen scenarios. It also performed competitively with GPT-4 on causal chains of lengths 7-13, surpassing other LLMs such as Gemini Pro and Phi-3 across various tasks.
To conclude, the research emphasizes the potential of axiomatic training in enhancing AI models’ causal reasoning abilities. By training models on fundamental causal axioms, researchers demonstrated that AI could effectively navigate complex causal structures. This method offers a more efficient and scalable approach to teaching causal reasoning, potentially transforming how AI systems are trained for causal inference tasks. The success of this method indicates a promising direction for future research and applications in AI, highlighting the importance of principle-based training over traditional data-intensive methods.
Check out the Paper. All credit for this research goes to the researchers of this project.
Nikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in materials science, he is exploring new advancements and opportunities to contribute.