KR 2024: Proceedings of the 21st International Conference on Principles of Knowledge Representation and Reasoning

Hanoi, Vietnam. November 2-8, 2024.

ISSN: 2334-1033
ISBN: 978-1-956792-05-8

Copyright © 2024 International Joint Conferences on Artificial Intelligence Organization

Learning Robust Reward Machines from Noisy Labels

  1. Roko Parać (Imperial College London)
  2. Lorenzo Nodari (University of Brescia)
  3. Leo Ardon (Imperial College London)
  4. Daniel Furelos-Blanco (Imperial College London)
  5. Federico Cerutti (University of Brescia, Cardiff University)
  6. Alessandra Russo (Imperial College London)

Keywords

  1. Symbolic Reinforcement Learning
  2. Neuro-Symbolic AI
  3. Autonomous Decision Making
  4. Combination of different Methods (probabilistic, temporal, qualitative, ...)

Abstract

This paper presents PROB-IRM, an approach that learns robust reward machines (RMs) for reinforcement learning (RL) agents from noisy execution traces. The key aspect of RM-driven RL is the exploitation of a finite-state machine that decomposes the agent's task into different subtasks. PROB-IRM uses a state-of-the-art inductive logic programming framework robust to noisy examples to learn RMs from noisy traces using the Bayesian posterior degrees of belief, thus ensuring robustness against inconsistencies. Pivotal for the results is the interleaving between RM learning and policy learning: a new RM is learned whenever the RL agent generates a trace that is believed not to be accepted by the current RM. To speed up the training of the RL agent, PROB-IRM employs a probabilistic formulation of reward shaping that uses the posterior Bayesian beliefs derived from the traces. Our experimental analysis shows that PROB-IRM can learn (potentially imperfect) RMs from noisy traces and exploit them to train an RL agent to solve its tasks successfully. Despite the complexity of learning the RM from noisy traces, agents trained with PROB-IRM perform comparably to agents provided with handcrafted RMs.
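To make the interleaving concrete, the following minimal Python sketch shows an RM as a finite-state machine over propositional labels, together with the trigger described in the abstract: a new RM is learned whenever a trace's observed outcome disagrees with what the current RM accepts. All names here (RewardMachine, maybe_relearn, the toy labels) are illustrative assumptions; the sketch does not reproduce PROB-IRM's ILP-based learner or its Bayesian posterior computation.

from dataclasses import dataclass

@dataclass
class RewardMachine:
    """A finite-state machine that decomposes a task into subtasks."""
    initial: str
    accepting: frozenset   # RM states in which the task is complete
    delta: dict            # (state, label) -> next RM state
    rewards: dict          # (state, label) -> scalar reward

    def step(self, state, label):
        """Advance the RM on one propositional label; self-loop if no edge."""
        nxt = self.delta.get((state, label), state)
        return nxt, self.rewards.get((state, label), 0.0)

    def accepts(self, trace):
        """Replay a trace of labels and test for an accepting RM state."""
        state = self.initial
        for label in trace:
            state, _ = self.step(state, label)
        return state in self.accepting

def maybe_relearn(rm, trace, task_succeeded, relearn):
    """Interleaving trigger (sketch): if the trace's observed outcome
    disagrees with the current RM, learn a new RM from the traces."""
    if rm.accepts(trace) != task_succeeded:
        return relearn(trace)
    return rm

# Toy two-subtask RM: reach "coffee", then "office".
rm = RewardMachine(
    initial="u0",
    accepting=frozenset({"u2"}),
    delta={("u0", "coffee"): "u1", ("u1", "office"): "u2"},
    rewards={("u1", "office"): 1.0},
)
assert rm.accepts(["coffee", "office"])
assert not rm.accepts(["office", "coffee"])

In PROB-IRM itself, the acceptance test is probabilistic rather than the deterministic check above: traces are noisy, so relearning is triggered when a trace is believed not to be accepted, based on the posterior Bayesian beliefs derived from the traces.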