With the rise of Large Language Models (LLMs) in recent years, generative AI has made significant strides in the field of language processing, showcasing impressive abilities in a wide array of tasks. Given their potential in solving complex tasks, researchers have made quite a number of attempts to apply these models in the field of drug discovery to optimize the task. However, molecule optimization is one critical aspect of drug discovery that the LLMs have failed to affect significantly.
The existing methods generally focus on the patterns in the chemical structure provided by the data instead of leveraging the expert’s feedback and experience. This poses a problem as the drug discovery pipeline involves incorporating feedback from domain experts to refine the process further. In this work, the authors have tried to address the gaps in previous works by focusing on human-machine interaction and leveraging the interactivity and generalizability of powerful LLMs.
Researchers from Tencent AI Lab and Department of Computer Science, Hunan University released MolOpt-Instructions, which is a large instruction-based dataset for fine-tuning LLMs on molecule optimization tasks. This dataset has an adequate amount of data covering tasks associated with molecule optimization and ensures similarity constraints and a substantial difference in properties between molecules. Additionally, they have also proposed DrugAssist, a Llama-2-7B-Chat-based molecule optimization model capable of performing optimization interactively through human-machine dialogue. Through the dialogues, experts can further guide the model and optimize the initially generated results.
For evaluation, the researchers compared DrugAssist with two previous molecule optimization models and with three LLMs on metrics like solubility and BP and success rate and validity, respectively. As per the results, DrugAssist constantly achieved promising results in multi-property optimization and maintained optimized molecular property values within a given range.
Furthermore, the researchers demonstrated the exceptional capabilities of DrugAssist through a case study as well. Under the zero-shot setting, the model was asked to increase the values of two properties, BP and QED, by at least 0.1 simultaneously, and the model was successfully able to achieve the task even when it was exposed to the data during training only.
Additionally, DrugAssist also successfully increased the logP value of a given molecule by 0.1, even though this property was not included in the training data. This showcases the good transferability of the model under zero-shot and few-shot settings, giving the users an option to combine individual properties and optimize them simultaneously. Lastly, in one of the interactions, the model generated a wrong answer by providing a molecule that did not meet the requirements. However, it corrected its mistake and provided a correct response based on human feedback.
In conclusion, DrugAssist is a molecule optimization model based on the Llama-2-7B-Chat model and is capable of interacting with humans in real time. It demonstrated exceptional results in single as well as multi-property optimizations and showed great transferability and iterative optimization capabilities. Lastly, the authors have aimed to improve the capabilities of the model further through multimodal data handling, which will significantly enhance and optimize the process of drug discovery.
Check out the Paper and Github. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and Google News. Join our 36k+ ML SubReddit, 41k+ Facebook Community, Discord Channel, and LinkedIn Group.
If you like our work, you will love our newsletter..
Don’t Forget to join our Telegram Channel
Asif Razzaq is the CEO of Marktechpost Media Inc.. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts of over 2 million monthly views, illustrating its popularity among audiences.