Communication through natural language is crucial to machine intelligence [9]. The recent progress in computational language models (LMs) has enabled strong performance on tasks with limited interaction, like question-answering and procedural text understanding [10]. Recognizing that interactivity is an essential aspect of communication, the community has turned its attention towards training and evaluating agents in interactive fiction (IF) environments, like text-based games, which provide a unique testing ground for investigating the reasoning abilities of LMs and the potential for Artificial Intelligence (AI) agents to perform multi-step real-world tasks in a constrained environment. For instance, in Figure 1, an agent must pick a fruit in the living room and place it in a blue box in the kitchen. In these games, agents navigate complex environments using text-based inputs, which demands a sophisticated understanding of natural language and strategic decision-making from AI agents. To succeed in these games, agents must manage their knowledge, reason, and generate language-based actions that produce desired and predictable changes in the game world.
Prior work has shown that Reinforcement Learning- and Language Model-based agents struggle to reason about or explain science concepts in IF environments [1], which raises questions about these models’ ability to generalize to unseen situations beyond what has been observed during training [2]. For example, while tasks such as ‘retrieving a known substance’s melting (or boiling) point’ may be relatively simple, ‘determining an unknown substance’s melting (or boiling) point in a specific environment’ can be challenging for these models. To improve generalization, it may be effective to incorporate world knowledge, e.g., about object affordances, yet no prior work has investigated this direction. In addition, existing models struggle to learn effectively from environmental feedback. For instance, when examining the conductivity of a specific substance, the agent must recognize that it has already obtained the necessary wires and the substance in question, so that it can then proceed to locate a power source. Therefore, there is a need for a framework that can analyze and evaluate the effectiveness of different types of knowledge and knowledge-injection methods for text-based game agents.
Our paper, “Knowledge-enhanced Agents for Interactive Text Games,” introduces a novel framework to enhance AI agents’ performance in these IF environments.
Published Version: https://dl.acm.org/doi/10.1145/3587259.3627561
We are proud to announce that our paper has been awarded the Best Student Paper at the KCAP 2023 Conference, a testament to our team’s innovative research and dedication. 🏆🏆🏆
Our work introduces a unique framework to augment AI agents with specific knowledge. The framework comprises two key components:
- Memory of Correct Actions (MCA): This feature enables AI agents to remember and leverage past correct actions. By maintaining a memory of what has worked before, the agent can formulate more effective strategies and avoid repeating mistakes. The MCA is determined by the environment's feedback: if an action yields a reward, it is considered correct. Correct actions are therefore not given to the agent upfront; instead, they are stored in memory as the agent progresses through the (train/test-time) episode.
- Affordance Knowledge (Aff): Understanding the potential interactions with objects in the game world is crucial. We expect that affordances help models learn better by listing the possible interactions with the objects around them. Unlike the action history, affordances are not provided by the environment and need to be retrieved from external sources. For this purpose, we use ConceptNet and obtain its capableOf and usedFor relations for the objects in a given IF game episode (a retrieval sketch follows this list).
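As a concrete illustration of these two components, here is a minimal Python sketch, assuming the public ConceptNet REST API (api.conceptnet.io) and an environment that returns a per-action reward; the class and function names are illustrative and are not the paper's implementation.

```python
import requests

CONCEPTNET_URL = "https://api.conceptnet.io/query"      # public ConceptNet REST API
AFFORDANCE_RELATIONS = ["/r/CapableOf", "/r/UsedFor"]   # affordance-style relations

def get_affordances(obj, limit=5):
    """Retrieve affordance triples for an object, e.g. (apple, UsedFor, eating)."""
    triples = []
    for rel in AFFORDANCE_RELATIONS:
        resp = requests.get(
            CONCEPTNET_URL,
            params={"start": f"/c/en/{obj.replace(' ', '_')}", "rel": rel, "limit": limit},
            timeout=10,
        )
        for edge in resp.json().get("edges", []):
            triples.append((obj, rel.split("/")[-1], edge["end"]["label"]))
    return triples

class CorrectActionMemory:
    """Memory of Correct Actions (MCA): stores actions that yielded a reward."""
    def __init__(self):
        self.actions = []

    def update(self, action, reward):
        if reward > 0:            # the environment's feedback decides correctness
            self.actions.append(action)
```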
We implemented this framework in two AI agent architectures:
- Online Policy Optimization through Rewards (RL Methods)
- Single-step Offline Prediction (LM Methods)
Pure RL-based Model — DRRN [3] (Fig. 2)
The baseline DRRN model uses only the inputs of observation, inventory, and task description to compute Q-values for each action. To enhance the DRRN baseline, we have injected external knowledge into the model and created three new variations of DRRN:
- aff: Using a distinct GRU encoding layer, we introduce the affordances of the objects present in the inputs to the baseline model.
- mca: A separate GRU encoding layer is utilized in this model to pass all previously correct actions to the baseline model.
- aff ⊕ mca: The encoding in this architecture combines both the agent’s previous correct actions and the affordances as distinct components (see the sketch below).
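To make the wiring concrete, below is a minimal PyTorch sketch of such a knowledge-enhanced DRRN variant, with one GRU encoder per input stream and extra encoders for affordances and the MCA; the hyperparameters and module names are illustrative rather than the exact architecture used in the paper.

```python
import torch
import torch.nn as nn

class KnowledgeDRRN(nn.Module):
    """DRRN-style Q-network with optional GRU encoders for affordances (aff)
    and the memory of correct actions (mca)."""

    def __init__(self, vocab_size, embed_dim=128, hidden=128, use_aff=True, use_mca=True):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # one GRU per input stream, as in the baseline DRRN
        self.obs_enc = nn.GRU(embed_dim, hidden, batch_first=True)
        self.inv_enc = nn.GRU(embed_dim, hidden, batch_first=True)
        self.desc_enc = nn.GRU(embed_dim, hidden, batch_first=True)
        self.act_enc = nn.GRU(embed_dim, hidden, batch_first=True)
        # distinct encoding layers for the injected knowledge
        self.aff_enc = nn.GRU(embed_dim, hidden, batch_first=True) if use_aff else None
        self.mca_enc = nn.GRU(embed_dim, hidden, batch_first=True) if use_mca else None
        n_streams = 4 + int(use_aff) + int(use_mca)
        self.q_head = nn.Sequential(
            nn.Linear(n_streams * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def _encode(self, enc, tokens):
        _, h = enc(self.embed(tokens))     # final hidden state as the representation
        return h.squeeze(0)

    def forward(self, obs, inv, desc, action, aff=None, mca=None):
        parts = [self._encode(self.obs_enc, obs),
                 self._encode(self.inv_enc, inv),
                 self._encode(self.desc_enc, desc),
                 self._encode(self.act_enc, action)]
        if self.aff_enc is not None:
            parts.append(self._encode(self.aff_enc, aff))
        if self.mca_enc is not None:
            parts.append(self._encode(self.mca_enc, mca))
        return self.q_head(torch.cat(parts, dim=-1))   # Q(state, action)
```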
RL-enhanced KG Model — KG-A2C [4] (Fig. 3)
As a baseline, we use a modified version of KG-A2C in which we use the single golden action sequence provided by the environment as the target, even though multiple golden sequences may exist. We found this target to perform better than the original objective of predicting any valid action. We devise the following knowledge-injection strategies to incorporate the memory of correct actions and affordance knowledge into KG-A2C:
- mca: On top of the baseline, we incorporate all previously correct actions by using a separate GRU encoding layer and concatenating the output vector with the other output representations.
- aff: The KG component of KG-A2C provides a convenient way to add more knowledge. In particular, we add the affordance knowledge directly into the KG as additional triples on top of the baseline model. For example, given the existing relation (living room, hasA, apple), we can add the affordance relation (apple, usedFor, eating). In this way, the KG encoding network can produce a more meaningful representation of the game state and potentially guide the model to produce better actions. In our experiments, we compare this approach to adding affordance knowledge through a separate GRU encoding layer, similar to the DRRN case (a sketch of the KG augmentation follows this list).
- aff ⊕ mca: We include both the affordances in the KG and the memory of all previous correct actions via a separate GRU encoding layer.
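The aff variant's KG augmentation can be sketched as a simple operation over triples; the helper below is illustrative and assumes an affordance retrieval function such as the ConceptNet sketch shown earlier.

```python
def augment_kg_with_affordances(kg_triples, affordance_fn):
    """Add affordance triples for every object already present in the game KG.

    kg_triples   : set of (subject, relation, object) triples built by KG-A2C,
                   e.g. ("living room", "hasA", "apple")
    affordance_fn: callable returning affordance triples for an object,
                   e.g. the get_affordances() sketch above
    """
    objects = {obj for (_, _, obj) in kg_triples}
    augmented = set(kg_triples)
    for obj in objects:
        for triple in affordance_fn(obj):
            augmented.add(triple)          # e.g. ("apple", "usedFor", "eating")
    return augmented
```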
Pre-trained LM — RoBERTa [5] (Fig. 4)
Here we frame the task as multiple-choice QA. At each step, the current game state is treated as the question, and the model must predict the next action from a set of candidates. As with the RL agents, the model is given the environment observation (𝑜𝑏𝑣), inventory (𝑖𝑛𝑣), and task description (𝑑𝑒𝑠𝑐) at every step. We concatenate this state with each candidate action and let the LM select the action with the highest score. Given the large set of possible actions, we randomly select only 𝑛=4 distractor actions during training to reduce the computational burden, and the LM is trained with a cross-entropy loss to select the correct action. At inference time, the model assigns scores to all valid actions, and we use top-p sampling for action selection to prevent the agent from getting stuck in an action loop. We formalize three knowledge-injection strategies for the baseline RoBERTa model (a sketch follows the list):
- mca: Here, we make the LM aware of its past correct actions by incorporating an MCA that lists them as a string appended to the original input. Due to RoBERTa's input-length limit, we use a sliding window of size 𝐴=5, i.e., at each step, the model sees at most the past 𝐴 correct actions.
- aff: We inject affordance knowledge into the LM by first adapting it on a subset of the Commonsense Knowledge Graph containing object utilities. We adapt the model via an auxiliary QA task, following prior knowledge-injection work [6]. We use pre-training rather than simple input concatenation because the volume of affordance triples is too large to fit within RoBERTa's limited input length; pre-training on affordances through an auxiliary QA task alleviates this challenge while still enabling the model to learn the relevant knowledge. We then fine-tune our task model on top of the utility-enhanced model, as described in the baseline.
- aff ⊕ mca: This variation simply combines mca and aff.
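The multiple-choice formulation and the top-p action selection can be sketched with the Hugging Face transformers API, as shown below; the roberta-base checkpoint, the prompt format, and the helper names are assumptions for illustration, and the training loop with 𝑛=4 distractors is omitted.

```python
import torch
from transformers import AutoModelForMultipleChoice, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMultipleChoice.from_pretrained("roberta-base")  # fine-tune before use

def build_state(obs, inv, desc, mca, window=5):
    """Game state as the 'question': observation, inventory, task description,
    plus a sliding window over the last `window` correct actions."""
    history = " ".join(mca[-window:])
    return f"{desc} {inv} {obs} Previous correct actions: {history}"

def score_actions(state, candidate_actions):
    """Score each (state, action) pair; the LM prefers the highest-scoring action."""
    enc = tokenizer([state] * len(candidate_actions), candidate_actions,
                    truncation=True, padding=True, return_tensors="pt")
    enc = {k: v.unsqueeze(0) for k, v in enc.items()}    # (1, num_choices, seq_len)
    with torch.no_grad():
        return model(**enc).logits.squeeze(0)            # one score per candidate

def sample_action(state, valid_actions, top_p=0.9):
    """Top-p sampling over action scores to avoid getting stuck in action loops."""
    probs = torch.softmax(score_actions(state, valid_actions), dim=-1)
    sorted_p, idx = torch.sort(probs, descending=True)
    keep = torch.cumsum(sorted_p, dim=-1) <= top_p
    keep[0] = True                                       # always keep the best action
    pick = torch.multinomial(sorted_p[keep] / sorted_p[keep].sum(), 1)
    return valid_actions[idx[keep][pick].item()]
```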
Instruction-tuned LM — Flan T5 [7][8] (Fig. 5)
The Swift model natively incorporates the history of the preceding ten actions. Note that, in contrast to the three previously described models, which consider only the history of past correct actions, Swift's original design includes the full history of the last ten actions regardless of correctness. To establish a baseline comparable to the preceding three architectures, we omit the action history from the Swift model. The unaltered variant of Swift is thus denoted as the mca version. Adding affordances to the baseline yields the aff model, and integrating affordances into the mca version yields the aff ⊕ mca model. The affordances are inserted into the main input sequence immediately after the inventory data and before the information about visited rooms (see the sketch below).
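A minimal sketch of how such an input sequence might be assembled is given below; the field labels and their exact wording are illustrative and do not reproduce Swift's prompt format, only the insertion point of the affordances.

```python
def build_swift_input(task_desc, obs, inventory, affordances, visited_rooms,
                      action_history=None):
    """Assemble the input sequence for the Flan-T5-based Swift agent.
    Affordances go right after the inventory and before the visited-room info;
    passing an action history corresponds to the mca / original Swift setting."""
    parts = [f"Task: {task_desc}",
             f"Observation: {obs}",
             f"Inventory: {inventory}",
             f"Affordances: {'; '.join(affordances)}",      # aff injection point
             f"Visited rooms: {', '.join(visited_rooms)}"]
    if action_history is not None:
        parts.append(f"Previous actions: {'; '.join(action_history[-10:])}")
    return " ".join(parts)
```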
Environment: We have used ScienceWorld [1], a complex text-based virtual world presented in English. It features 10 interconnected locations and houses 218 unique objects, including various items from instruments and electrical components to plants, animals, and everyday objects like furniture and books. The game offers a rich array of interactions, with 25 high-level actions and up to 200,000 possible combinations per step, though only a few are practically valid. ScienceWorld has 10 tasks with a total set of 30 sub-tasks. Due to the diversity within ScienceWorld, each task functions as an individual benchmark with distinct reasoning abilities, knowledge requirements, and varying numbers of actions needed to achieve the goal state. Moreover, each sub-task has a set of mandatory objectives that need to be met by any agent (such as focusing on a non-living object and putting it in a red box in the kitchen). For experimentation purposes, we selected a single representative sub-task from each of the 10 tasks. Task details are mentioned in Appendix (at the end of this article).
Rewards and Scoring System: The reward system in ScienceWorld is designed to guide the agent towards preferred solutions. The environment provides a numeric score and a boolean indicator of task completion for every action performed. An agent can take up to 100 steps (actions) in each episode. The final score, ranging between 0 and 100, reflects how well the agent achieves the episode goal and its sub-goals. An episode concludes, and the cumulative score is calculated when the agent completes the task or reaches the 100-step limit.
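The evaluation protocol boils down to a short episode loop; the sketch below assumes a gym-style wrapper around ScienceWorld (reset/step returning a reward, a done flag, and the cumulative score in info), which is an assumption for illustration rather than the exact ScienceWorld API.

```python
def run_episode(env, agent, max_steps=100):
    """Run one episode: it ends on task completion or after 100 steps,
    and the final score (0-100) reflects how well the goal was achieved."""
    obs, info = env.reset()
    memory = []                          # memory of correct actions (MCA)
    for _ in range(max_steps):
        action = agent.act(obs, info, memory)
        obs, reward, done, info = env.step(action)
        if reward > 0:                   # positive feedback marks a correct action
            memory.append(action)
        if done:                         # task completed early
            break
    return info["score"]                 # cumulative episode score
```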
- Knowledge injection helps agents in text-based games — In 34 out of 40 cases, our knowledge injection strategies improve over the baseline models.
- Affordance knowledge is more beneficial than the memory of correct actions — Affordance-only models obtain the best results in 15 cases, followed by models that include the MCA (8 cases). Including both knowledge types together leads to the best results in 11 cases.
- In terms of the overall impact across tasks, the LM variants, RoBERTa and Swift, benefit the most on average from including affordances, with relative increases of 48% and 8%, respectively, over their baselines. An example is illustrated in Fig. 6, where the LM models benefit greatly from the added affordances.
- The effect varies across tasks, depending on the task relevance of the injected knowledge — The variable effect across tasks was frequently due to how relevant the injected knowledge was to the task at hand, with certain tasks (e.g., electricity) benefiting more from the injection.
- Injecting affordances is most effective via KGs; incorporating them as raw inputs increased the learning complexity for the models — We explore multiple variations of injecting affordance knowledge into KG-A2C (Fig. 7): adding it as raw input to the observation, inventory, and description; creating a separate GRU encoding layer for affordances; and adding the affordances to the KG itself. We evaluate the performance of each method on three sub-tasks: easy, medium, and hard.
Our research represents a significant stride toward more sophisticated AI agents. By equipping them with the ability to learn from past actions and to understand their environment more deeply, we pave the way for AI that goes beyond playing games to interact intelligently and intuitively in many aspects of our lives. The framework can be extended to other AI applications, such as virtual assistants or educational tools, where understanding and interacting with the environment is crucial.
Few-shot prompting of large LMs has recently shown promise on reasoning tasks, as well as clear benefits from interactive communication and input clarification. Exploring their role in interactive tasks, either as solutions that require less training data or as components that can generate synthetic data for knowledge distillation to smaller models, is a promising future direction.
If you like our work, please cite it 😁
@inproceedings{chhikara,
author = {Chhikara, Prateek and Zhang, Jiarui and Ilievski, Filip and Francis, Jonathan and Ma, Kaixin},
title = {Knowledge-Enhanced Agents for Interactive Text Games},
year = {2023},
doi = {10.1145/3587259.3627561},
booktitle = {Proceedings of the 12th Knowledge Capture Conference 2023},
pages = {157–165},
numpages = {9},
series = {K-CAP '23}
}
[1] Ruoyao Wang, Peter Alexander Jansen, Marc-Alexandre Côté, and Prithviraj Ammanabrolu. 2022. ScienceWorld: Is your Agent Smarter than a 5th Grader? EMNLP (2022).
[2] Peter Jansen, Kelly J. Smith, Dan Moreno, and Huitzilin Ortiz. 2021. On the Challenges of Evaluating Compositional Explanations in Multi-Hop Inference: Relevance, Completeness, and Expert Ratings. In Proceedings of EMNLP.
[3] Ji He, Jianshu Chen, Xiaodong He, Jianfeng Gao, Lihong Li, Li Deng, and Mari Ostendorf. 2016. Deep Reinforcement Learning with a Natural Language Action Space. In Proceedings of ACL.
[4] Prithviraj Ammanabrolu and Matthew Hausknecht. 2020. Graph Constrained Reinforcement Learning for Natural Language Action Spaces. In ICLR.
[5] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. (2019).
[6] Filip Ilievski, Alessandro Oltramari, Kaixin Ma, Bin Zhang, Deborah L McGuinness, and Pedro Szekely. 2021. Dimensions of commonsense knowledge. Knowledge-Based Systems 229 (2021), 107347.
[7] Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, et al. 2022. Scaling Instruction-Finetuned Language Models.
[8] Bill Yuchen Lin, Yicheng Fu, Karina Yang, Prithviraj Ammanabrolu, Faeze Brahman, Shiyu Huang, Chandra Bhagavatula, Yejin Choi, and Xiang Ren. 2023. SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks.
[9] Noam Chomsky. 2014. Aspects of the Theory of Syntax. Vol. 11. MIT Press.
[10] Yifan Jiang, Filip Ilievski, and Kaixin Ma. 2023. Transferring Procedural Knowledge across Commonsense Tasks. In ECAI.
Task Descriptions
- Task 1 — Matter: Your task is to freeze water. First, focus on the substance. Then, take actions that will cause it to change its state of matter.
- Task 2 — Measurement: Your task is to measure the melting point of chocolate, which is located around the kitchen. First, focus on the thermometer. Next, focus on the chocolate. If the melting point of chocolate is above -10.0 degrees, focus on the blue box. If the melting point of chocolate is below -10.0 degrees, focus on the orange box. The boxes are located around the kitchen.
- Task 3 — Electricity: Your task is to turn on the red light bulb by powering it using a renewable power source. First, focus on the red light bulb. Then, create an electrical circuit that powers it on.
- Task 4 — Classification: Your task is to find a(n) non-living thing. First, focus on the thing. Then, move it to the red box in the kitchen.
- Task 5 — Biology I: Your task is to grow a apple plant from seed. Seeds can be found in the kitchen. First, focus on a seed. Then, make changes to the environment that grow the plant until it reaches the reproduction life stage.
- Task 6 — Chemistry: Your task is to use chemistry to create the substance ‘salt water’. A recipe and some of the ingredients might be found near the kitchen. When you are done, focus on the salt water.
- Task 7 — Biology II: Your task is to find the animal with the longest life span, then the shortest life span. First, focus on the animal with the longest life span. Then, focus on the animal with the shortest life span. The animals are in the ’outside’ location.
- Task 8 — Biology III: Your task is to focus on the 4 life stages of the turtle, starting from earliest to latest.
- Task 9 — Forces: Your task is to determine which of the two inclined planes (unknown material C, unknown material H) has the most friction. After completing your experiment, focus on the inclined plane with the most friction.
- Task 10 — Biology IV: Your task is to determine whether blue seed color is a dominant or recessive trait in the unknown E plant. If the trait is dominant, focus on the red box. If the trait is recessive, focus on the green box.
ScienceWorld Gameplay Example
Task: 4 (find a non-living thing)
Variation: 239 (DRRN baseline)
Description: Your task is to find a(n) non-living thing. First, focus on the thing. Then, move it to the purple box in the workshop.