@misc{oruganti2025harmoniccognitivecontrolcollaboration,
title={HARMONIC: Cognitive and Control Collaboration in Human-Robotic Teams},
author={Sanjay Oruganti and Sergei Nirenburg and Marjorie McShane and Jesse English and Michael K. Roberts and Christian Arndt and Sahithi Kamireddy},
year={2025},
eprint={2409.18047},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2409.18047},
}
Key Contributions
HARMONIC Architecture
A cognitive-robotic architecture that enables flexible integration of OntoAgent, a cognitive architecture, with low-level robot planning and control using Behavior Trees.
Multi-Robot Planning
A cognitive strategy for multi-robot planning and execution with metacognition, communication, and explainability. This includes the robots' natural language understanding and generation, their reasoning about plans, goals, and attitudes, and their ability to explain the reasons for their own and others' actions.
Simulation Evaluation
Simulation experiments on a search task carried out by a robotic team, which showcase the human-robot team's interactions and ability to handle complex, real-world scenarios.
Overview
We introduce HARMONIC (Human-AI Robotic Team Member Operating with Natural Intelligence and Communication), a cognitive-robotic architecture that integrates the OntoAgent cognitive framework with general-purpose robot control systems applied to human-robot teaming (HRT). We also present a cognitive strategy for robots that incorporates metacognition, natural language communication, and explainability capabilities required for collaborative partnerships in HRT. Through simulation experiments involving a joint search task performed by a heterogeneous team of a UGV, a drone, and a human operator, we demonstrate the system's ability to coordinate actions between robots with heterogeneous capabilities, adapt to complex scenarios, and facilitate natural human-robot communication. Evaluation results show that robots using the OntoAgent architecture within the HARMONIC framework can reason about plans, goals, and team member attitudes while providing clear explanations for their decisions, which are essential prerequisites for realistic human-robot teaming.
An overview of the HARMONIC framework, showing the Strategic and Tactical components representing the high-level planning (System 2) and low-level execution (System 1), respectively.
OntoAgent in Strategic Layer (System 2)
OntoAgent is a content-centric cognitive architecture designed for socially intelligent agents built through computational cognitive modeling. It dynamically acquires, maintains, and expands large-scale knowledge bases essential for perception, reasoning, and action.
Its memory structure has three components: a Situation Model for active concept instances, Long-Term Semantic Memory, and Episodic Memory. Goal-directed behavior is managed via a goal agenda and prioritizer that selects stored plans or initiates first-principles reasoning when needed.
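The agenda-and-prioritizer mechanism can be illustrated with a minimal sketch. The class and method names below are hypothetical, chosen for illustration only; they are not OntoAgent's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Goal:
    priority: float            # urgency assigned by the agent's reasoning
    name: str
    plan: list = field(default_factory=list)  # stored plan steps, if any

class GoalAgenda:
    """Holds the agent's active goals; the prioritizer selects the most
    urgent one for execution (illustrative, not OntoAgent's real API)."""
    def __init__(self):
        self.goals = []

    def add(self, goal: Goal):
        self.goals.append(goal)

    def select(self):
        # Prioritizer: pick the highest-priority goal currently on the agenda.
        return max(self.goals, key=lambda g: g.priority) if self.goals else None

agenda = GoalAgenda()
agenda.add(Goal(0.4, "recharge-battery", plan=["navigate-to-dock", "dock"]))
agenda.add(Goal(0.9, "search-area", plan=["plan-coverage-path", "execute-path"]))
print(agenda.select().name)  # search-area
```

When no stored plan matches the selected goal, the architecture falls back on first-principles reasoning, which the sketch omits.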
Built on a service-based infrastructure, it integrates native and external capabilities such as robotic vision and text-to-speech. The pivotal element is OntoGraph, a knowledge base API that unifies knowledge representation through inheritance, flexible organization into "spaces," and efficient querying via a graph database view. This underpins robust natural language understanding and meaning-based text generation using a semantic lexicon to resolve complex linguistic phenomena, as well as ontological interpretation of visual inputs.
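OntoGraph's organization of frames into "spaces" with inheritance-based property lookup might be sketched as follows. This is a toy illustration under assumed names; the real OntoGraph API and its graph-database view are not shown:

```python
class ConceptGraph:
    """Toy knowledge store: property frames organized into named spaces,
    with lookup that follows an IS-A inheritance chain (illustrative only)."""
    def __init__(self):
        self.spaces = {}  # space name -> {frame name -> {property: value}}

    def add(self, space, frame, **props):
        self.spaces.setdefault(space, {})[frame] = props

    def get(self, space, frame, prop):
        frames = self.spaces.get(space, {})
        while frame is not None:
            props = frames.get(frame, {})
            if prop in props:
                return props[prop]
            frame = props.get("is_a")  # climb the inheritance chain
        return None

kb = ConceptGraph()
kb.add("ontology", "ROBOT", is_a=None, animate=False)
kb.add("ontology", "UGV", is_a="ROBOT", locomotion="wheeled")
print(kb.get("ontology", "UGV", "animate"))  # False (inherited from ROBOT)
```

The point of the sketch is the lookup discipline: a property not stored on a concept instance is resolved through its ancestors, which is what lets a large ontology stay compact.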
Individual Capabilities
Attention management
Perception interpretation
Utility-based and analogical decision-making, enhanced by metacognitive abilities
All supported by the OntoAgent cognitive architecture
Prioritizing goals and plans
Executing and monitoring plans
Team-Oriented Capabilities
Natural language communication
Explaining decisions
Assessing decision confidence
Evaluating the trustworthiness of teammates
Behavior Trees in Tactical Layer (System 1)
The tactical layer implements a blackboard through the State Manager, which maintains condition and state variables. Sensory inputs are continuously written into the State Manager, where the attention module filters and packages perception data for the strategic layer. While the strategic layer runs asynchronously, real-time control is achieved by LiveBT accessing data from the State Manager in parallel and selecting appropriate policies and actions stored in the Action Manager. This structure enables robots to maintain responsive behavior while executing higher-level commands.
Visual demonstration of the tactical component, showing low-level planning and execution in the HARMONIC framework.
Capabilities
Low-level planners and control algorithms for decision-making.
Manages skills and policies.
Processes sensor inputs.
Plans motor actions to execute high-level commands received from the strategic layer (e.g., "pick up that screwdriver").
Handles reactive responses, such as collision avoidance.
Attends to the robot's internal needs.
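The reactive-before-deliberative arbitration in the list above is the standard behavior-tree pattern. A minimal sketch follows, with hypothetical node and action names; this is not LiveBT's implementation:

```python
SUCCESS, FAILURE = "SUCCESS", "FAILURE"

class Sequence:
    """Ticks children in order; fails on the first child that fails."""
    def __init__(self, *children): self.children = children
    def tick(self, ctx):
        for child in self.children:
            if child.tick(ctx) == FAILURE:
                return FAILURE
        return SUCCESS

class Fallback:
    """Ticks children in order; succeeds on the first child that succeeds."""
    def __init__(self, *children): self.children = children
    def tick(self, ctx):
        for child in self.children:
            if child.tick(ctx) == SUCCESS:
                return SUCCESS
        return FAILURE

class Action:
    def __init__(self, name, fn): self.name, self.fn = name, fn
    def tick(self, ctx): return self.fn(ctx)

def evade(ctx):
    ctx["evading"] = True  # stand-in for an evasive maneuver
    return True

# The reactive branch (collision avoidance) is ticked before the task branch,
# so an obstacle preempts the pickup sequence on every tick.
tree = Fallback(
    Action("avoid_collision",
           lambda ctx: SUCCESS if ctx["obstacle"] and evade(ctx) else FAILURE),
    Sequence(Action("approach", lambda ctx: SUCCESS),
             Action("grasp", lambda ctx: SUCCESS)),
)

ctx = {"obstacle": False}
print(tree.tick(ctx))  # SUCCESS (no obstacle, so the pickup sequence runs)
```

Ticking the tree repeatedly is what gives the tactical layer its responsiveness: priority is re-evaluated from the root on every cycle rather than being committed to once.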
Evaluation Results
For further details on OntoAgent, please refer to the following books:
Linguistics for the Age of AI (MIT Press, 2021)
Agents in the Long Game of AI: Computational Cognitive Modeling for Trustworthy, Hybrid AI (MIT Press, 2024)
Acknowledgments
This research was supported in part by grant #N00014-23-1-2060 from the U.S. Office of Naval Research. Any opinions or findings expressed in this material are those of the authors and do not necessarily reflect the views of the Office of Naval Research.
References
N. Sharma, J. K. Pandey, and S. Mondal, "A review of mobile robots: Applications and future prospect," International Journal of Precision Engineering and Manufacturing, vol. 24, no. 9, pp. 1695–1706, 2023.
Y. Rivero-Moreno, S. Echevarria, C. Vidal-Valderrama, L. Pianetti, J. Cordova-Guilarte, J. Navarro-Gonzalez, J. Acevedo-Rodríguez, G. Dorado-Avila, L. Osorio-Romero, C. Chavez-Campos et al., "Robotic surgery: a comprehensive review of the literature and current trends," Cureus, vol. 15, no. 7, 2023.
E. M. Wetzel, J. Liu, T. Leathem, and A. Sattineni, "The use of Boston Dynamics Spot in support of LiDAR scanning on active construction sites," in ISARC. Proceedings of the International Symposium on Automation and Robotics in Construction, vol. 39. IAARC Publications, 2022, pp. 86–92.
Y. Tong, H. Liu, and Z. Zhang, "Advancements in humanoid robots: A comprehensive review and future prospects," IEEE/CAA Journal of Automatica Sinica, vol. 11, no. 2, pp. 301–328, 2024.
A. Torreno, E. Onaindia, A. Komenda, and M. Štolba, "Cooperative multi-agent planning: A survey," ACM Computing Surveys (CSUR), vol. 50, no. 6, pp. 1–32, 2017.
L. Antonyshyn, J. Silveira, S. Givigi, and J. Marshall, "Multiple mobile robot task and motion planning: A survey," ACM Computing Surveys, vol. 55, no. 10, pp. 1–35, 2023.
M. Natarajan, E. Seraj, B. Altundas, R. Paleja, S. Ye, L. Chen, R. Jensen, K. C. Chang, and M. Gombolay, "Human-robot teaming: grand challenges," Current Robotics Reports, vol. 4, no. 3, pp. 81–100, 2023.
S. Nirenburg, T. Ferguson, and M. McShane, "Mutual trust in human-AI teams relies on metacognition," in Metacognitive Artificial Intelligence, H. Wei and P. Shakarian, Eds. Cambridge University Press, 2024.
M. McShane, S. Nirenburg, and J. English, Agents in the Long Game of AI: Computational Cognitive Modeling for Trustworthy, Hybrid AI. MIT Press, 2024.
S. Nirenburg, M. McShane, K. W. Goodman, and S. Oruganti, "Explaining explaining," arXiv preprint arXiv:2409.18052, 2024.
J. English and S. Nirenburg, "OntoAgent: implementing content-centric cognitive models," in Proceedings of the Annual Conference on Advances in Cognitive Systems, 2020.
M. McShane and S. Nirenburg, Linguistics for the Age of AI. MIT Press, 2021.
M. Colledanchise and P. Ögren, Behavior trees in robotics and AI: An introduction. CRC Press, 2018.
J. E. Laird, K. R. Kinkade, S. Mohan, and J. Z. Xu, "Cognitive robotics using the Soar cognitive architecture," in CogRob@AAAI, 2012.
F. E. Ritter, F. Tehranchi, and J. D. Oury, "ACT-R: A cognitive architecture for modeling cognition," Wiley Interdisciplinary Reviews: Cognitive Science, vol. 10, no. 3, p. e1488, 2019.
M. Scheutz, J. Harris, and P. Schermerhorn, "Systematic integration of cognitive and robotic architectures," Advances in Cognitive Systems, vol. 2, pp. 277–296, 2013.
P. W. Schermerhorn, J. F. Kramer, C. Middendorff, and M. Scheutz, "DIARC: a testbed for natural human-robot interaction." in AAAI, vol. 6, 2006, pp. 1972–1973.
B. Horling and V. Lesser, "A survey of multi-agent organizational paradigms," The Knowledge Engineering Review, vol. 19, no. 4, pp. 281–316, 2004.
R. I. Brafman and C. Domshlak, "From one to many: Planning for loosely coupled multi-agent systems." in ICAPS, vol. 8, 2008, pp. 28–35.
C. Amato, G. Konidaris, G. Cruz, C. A. Maynor, J. P. How, and L. P. Kaelbling, "Planning for decentralized control of multiple robots under uncertainty," in 2015 IEEE international conference on robotics and automation (ICRA). IEEE, 2015, pp. 1241–1248.
C. Le Pape, "A combination of centralized and distributed methods for multi-agent planning and scheduling," in Proceedings of the IEEE International Conference on Robotics and Automation. IEEE, 1990, pp. 488–493.
E. Schneider, E. I. Sklar, S. Parsons, and A. T. Özgelen, "Auction-based task allocation for multi-robot teams in dynamic environments," in Towards Autonomous Robotic Systems: 16th Annual Conference, TAROS 2015, Liverpool, UK, September 8-10, 2015, Proceedings 16. Springer, 2015, pp. 246–257.
C. Boutilier and R. I. Brafman, "Partial-order planning with concurrent interacting actions," Journal of Artificial Intelligence Research, vol. 14, pp. 105–136, 2001.
G. K. Soon, C. K. On, P. Anthony, and A. R. Hamdan, "A Review on Agent Communication Language," Lecture Notes in Electrical Engineering, vol. 481, pp. 481–491, 2019.
O. Lemon, "Conversational AI for multi-agent communication in natural language," AI Communications, vol. 35, no. 4, pp. 295–308, 2022.
H. H. Clark, Using Language. Cambridge University Press, 1996.
G. Klein, P. J. Feltovich, J. M. Bradshaw, and D. D. Woods, "Common ground and coordination in joint activity," Organizational simulation, vol. 53, pp. 139–184, 2005.
J. Allen, G. Ferguson, and A. Stent, "An architecture for more realistic conversational systems," in Proceedings of the 6th international conference on Intelligent user interfaces, 2001, pp. 1–8.
D. Roy, "Grounding words in perception and action: computational insights," Trends in cognitive sciences, vol. 9, no. 8, pp. 389–396, 2005.
P. Lindes, A. Mininger, J. R. Kirk, and J. E. Laird, "Grounding language for interactive task learning," in Proceedings of the First Workshop on Language Grounding for Robotics, 2017, pp. 1–9.
S. Nirenburg and V. Lesser, "Providing intelligent assistance in distributed office environments," ACM SIGOIS Bulletin, vol. 7, no. 2-3, pp. 104–112, 1986.
L. A. Dennis, M. Fisher, N. K. Lincoln, A. Lisitsa, and S. M. Veres, "Practical verification of decision-making in agent-based autonomous systems," Automated Software Engineering, vol. 23, pp. 305–359, 2016.
M. Iannotta, D. C. Domínguez, J. A. Stork, E. Schaffernicht, and T. Stoyanov, "Heterogeneous full-body control of a mobile manipulator with behavior trees," arXiv preprint arXiv:2210.08600, 2022.
M. Olsson, "Behavior trees for decision-making in autonomous driving," 2016.
S. Oruganti, R. Parasuraman, and R. Pidaparti, "KT-BT: a framework for knowledge transfer through behavior trees in multirobot systems," IEEE Transactions on Robotics, 2023.
S. Oruganti, R. Parasuraman, and R. Pidaparti, "IKT-BT: indirect knowledge transfer behavior tree framework for multi-robot systems through communication eavesdropping," arXiv preprint arXiv:2312.11802, 2023.
M. Colledanchise and L. Natale, "On the implementation of behavior trees in robotics," IEEE Robotics and Automation Letters, vol. 6, no. 3, pp. 5929–5936, 2021.
D. Kahneman, Thinking, fast and slow. Macmillan, 2011.
S. Oruganti, R. Parasuraman, and R. Pidaparti, "Impact of heterogeneity in multi-robot systems on collective behaviors studied using a search and rescue problem," in 2020 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR). IEEE, 2020, pp. 290–297.