KR 2024: Proceedings of the 21st International Conference on Principles of Knowledge Representation and Reasoning

Hanoi, Vietnam. November 2-8, 2024.

ISSN: 2334-1033
ISBN: 978-1-956792-05-8

Copyright © 2024 International Joint Conferences on Artificial Intelligence Organization

Learning Robust Reward Machines from Noisy Labels

  1. Roko Parać (Imperial College London)
  2. Lorenzo Nodari (University of Brescia)
  3. Leo Ardon (Imperial College London)
  4. Daniel Furelos-Blanco (Imperial College London)
  5. Federico Cerutti (University of Brescia, Cardiff University)
  6. Alessandra Russo (Imperial College London)

Keywords

  1. Symbolic Reinforcement Learning
  2. Neuro-Symbolic AI
  3. Autonomous Decision Making
  4. Combination of different Methods (probabilistic, temporal, qualitative, ...)

Abstract

This paper presents PROB-IRM, an approach that learns robust reward machines (RMs) for reinforcement learning (RL) agents from noisy execution traces. The key aspect of RM-driven RL is the exploitation of a finite-state machine that decomposes the agent's task into different subtasks. PROB-IRM uses a state-of-the-art inductive logic programming framework robust to noisy examples to learn RMs from noisy traces using the Bayesian posterior degrees of belief, thus ensuring robustness against inconsistencies. Pivotal for the results is the interleaving between RM learning and policy learning: a new RM is learned whenever the RL agent generates a trace that is believed not to be accepted by the current RM. To speed up the training of the RL agent, PROB-IRM employs a probabilistic formulation of reward shaping that uses the posterior Bayesian beliefs derived from the traces. Our experimental analysis shows that PROB-IRM can learn (potentially imperfect) RMs from noisy traces and exploit them to train an RL agent to solve its tasks successfully. Despite the complexity of learning the RM from noisy traces, agents trained with PROB-IRM perform comparably to agents provided with handcrafted RMs.
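To make the interleaving concrete, the following minimal Python sketch shows an RM as a finite-state machine over propositional labels, together with the trigger described in the abstract: a new RM is learned whenever a trace's observed outcome disagrees with what the current RM accepts. All names here (RewardMachine, maybe_relearn, the toy labels) are illustrative assumptions; the sketch does not reproduce PROB-IRM's ILP-based learner or its Bayesian posterior computation.

from dataclasses import dataclass

@dataclass
class RewardMachine:
    """A finite-state machine that decomposes a task into subtasks."""
    initial: str
    accepting: frozenset   # RM states in which the task is complete
    delta: dict            # (state, label) -> next RM state
    rewards: dict          # (state, label) -> scalar reward

    def step(self, state, label):
        """Advance the RM on one propositional label; self-loop if no edge."""
        nxt = self.delta.get((state, label), state)
        return nxt, self.rewards.get((state, label), 0.0)

    def accepts(self, trace):
        """Replay a trace of labels and test for an accepting RM state."""
        state = self.initial
        for label in trace:
            state, _ = self.step(state, label)
        return state in self.accepting

def maybe_relearn(rm, trace, task_succeeded, relearn):
    """Interleaving trigger (sketch): if the trace's observed outcome
    disagrees with the current RM, learn a new RM from the traces."""
    if rm.accepts(trace) != task_succeeded:
        return relearn(trace)
    return rm

# Toy two-subtask RM: reach "coffee", then "office".
rm = RewardMachine(
    initial="u0",
    accepting=frozenset({"u2"}),
    delta={("u0", "coffee"): "u1", ("u1", "office"): "u2"},
    rewards={("u1", "office"): 1.0},
)
assert rm.accepts(["coffee", "office"])
assert not rm.accepts(["office", "coffee"])

In PROB-IRM itself, the acceptance test is probabilistic rather than the deterministic check above: traces are noisy, so relearning is triggered when a trace is believed not to be accepted, based on the posterior Bayesian beliefs derived from the traces.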