Hanoi, Vietnam. November 2-8, 2024.
ISSN: 2334-1033
ISBN: 978-1-956792-05-8
Copyright © 2024 International Joint Conferences on Artificial Intelligence Organization
This paper presents PROB-IRM, an approach that learns robust reward machines (RMs) for reinforcement learning (RL) agents from noisy execution traces. The key aspect of RM-driven RL is the exploitation of a finite-state machine that decomposes the agent's task into different subtasks. PROB-IRM uses a state-of-the-art inductive logic programming framework robust to noisy examples to learn RMs from noisy traces using Bayesian posterior degrees of belief, thus ensuring robustness against inconsistencies. Pivotal for the results is the interleaving between RM learning and policy learning: a new RM is learned whenever the RL agent generates a trace that is believed not to be accepted by the current RM. To speed up the training of the RL agent, PROB-IRM employs a probabilistic formulation of reward shaping that uses the posterior Bayesian beliefs derived from the traces. Our experimental analysis shows that PROB-IRM can learn (potentially imperfect) RMs from noisy traces and exploit them to train an RL agent to solve its tasks successfully. Despite the complexity of learning the RM from noisy traces, agents trained with PROB-IRM perform comparably to agents provided with handcrafted RMs.
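
As a rough illustration of two ingredients named in the abstract, the Python sketch below shows a reward machine as a label-triggered finite-state machine and a schematic interleaving of RM learning and policy learning. All names here (RewardMachine, learn_rm_from_traces, policy_update, the coffee/office task) are illustrative assumptions for this sketch, not the paper's implementation, which learns RMs with a noise-tolerant inductive logic programming system.

    from dataclasses import dataclass, field

    @dataclass
    class RewardMachine:
        # A reward machine: a finite-state machine whose transitions fire on
        # sets of propositional labels and emit scalar rewards.
        initial_state: str
        transitions: dict = field(default_factory=dict)  # (state, frozenset(labels)) -> (next_state, reward)

        def step(self, state, labels):
            # Unmatched label sets leave the RM state unchanged with zero reward.
            return self.transitions.get((state, frozenset(labels)), (state, 0.0))

        def accepts(self, trace):
            # A labelled trace is accepted if following it yields the task-completion reward.
            state, total = self.initial_state, 0.0
            for labels in trace:
                state, reward = self.step(state, labels)
                total += reward
            return total >= 1.0

    # Hypothetical "get coffee, then reach the office" task decomposed into two subtasks.
    rm = RewardMachine(
        initial_state="u0",
        transitions={
            ("u0", frozenset({"coffee"})): ("u1", 0.0),
            ("u1", frozenset({"office"})): ("u2", 1.0),  # task completed
        },
    )

    def interleaved_training(episodes, learn_rm_from_traces, policy_update):
        # Schematic interleaving of RM learning and policy learning: relearn the RM
        # whenever a trace's outcome disagrees with what the current RM predicts.
        current_rm, traces = rm, []
        for trace, solved in episodes:          # noisy labelled trace + task outcome
            traces.append((trace, solved))
            if solved != current_rm.accepts(trace):
                current_rm = learn_rm_from_traces(traces)   # e.g. a noise-tolerant ILP call
            policy_update(current_rm, trace)    # e.g. Q-learning on the (environment x RM) product
        return current_rm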
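The probabilistic reward shaping mentioned above can be pictured as potential-based shaping computed in expectation over a belief distribution on RM states. The following toy sketch, with hypothetical shaped_reward and update_belief helpers, is only one way such a formulation could look and is not taken from the paper.

    import numpy as np

    def shaped_reward(env_reward, belief, next_belief, potential, gamma=0.99):
        # Potential-based shaping in expectation over the belief on RM states:
        # r' = r + gamma * E_{u'~b'}[phi(u')] - E_{u~b}[phi(u)].
        return env_reward + gamma * float(next_belief @ potential) - float(belief @ potential)

    def update_belief(belief, transition_matrix):
        # Propagate the posterior over RM states with a transition matrix whose
        # entries already average over the probability of each noisy label.
        new_belief = belief @ transition_matrix
        s = new_belief.sum()
        return new_belief / s if s > 0 else belief

    # Toy usage: three RM states, the last one accepting.
    belief = np.array([1.0, 0.0, 0.0])
    potential = np.array([0.0, 0.5, 1.0])            # higher potential closer to acceptance
    label_is_true = 0.8                              # posterior belief that "coffee" was observed
    transition = np.array([[1 - label_is_true, label_is_true, 0.0],
                           [0.0, 1.0, 0.0],
                           [0.0, 0.0, 1.0]])
    next_belief = update_belief(belief, transition)
    print(shaped_reward(0.0, belief, next_belief, potential))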