Deep reinforcement learning systems are approaching or surpassing human-level performance in specific domains, from games to decision support to continuous control, albeit in non-critical environments. Most of these systems rely on random exploration and on exploitation guided by learned state-action values. However, in high-stakes real-life domains, such as medical decision support or patient rehabilitation, every decision or action must be fully justified, and certainly not random.
We propose to develop neural networks that learn causal models of the environment, relating actions to their effects, initially using offline data. These models will then be interfaced with reinforcement learning and decision support networks, so that every action taken online can be explained and justified by its expected effect. The causal model can then be refined iteratively, enabling better prediction of the cascading future effects of any action chain. The resulting system, termed CausalXRL, will only propose actions that can be justified by their beneficial effects. When the immediate benefit is uncertain, the system will propose explorative actions expected to generate the most probable future benefit. CausalXRL thus supports the user in choosing actions based on specific expected outcomes, rather than as prescribed by a black box.
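The selection principle described above can be illustrated with a minimal sketch. Everything here is hypothetical: the linear effect model stands in for the learned neural causal model, and the benefit threshold and random explorative fallback are placeholder heuristics, not the proposed method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a learned causal model: the effect of action a
# in state s is modelled as a linear map M[a] @ s.
n_actions, dim = 3, 4
M = rng.normal(size=(n_actions, dim, dim))

def predicted_effect(state, action):
    """Predicted change in state if `action` is taken: the model's justification."""
    return M[action] @ state

def benefit(effect, goal):
    """Scalar benefit: alignment of the predicted effect with a goal direction."""
    return float(effect @ goal)

def choose_action(state, goal, threshold=0.1):
    """Propose the action with the highest predicted benefit, together with its
    justification; if no action clears the threshold (benefit uncertain),
    fall back to an explorative action (here simply a random one)."""
    benefits = [benefit(predicted_effect(state, a), goal) for a in range(n_actions)]
    best = int(np.argmax(benefits))
    if benefits[best] < threshold:
        return int(rng.integers(n_actions)), "explore: no clearly beneficial effect predicted"
    return best, f"justified: predicted benefit {benefits[best]:.2f}"
```

Every proposed action thus comes paired with a human-readable reason, which is the property the abstract contrasts with black-box action selection.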
We will validate CausalXRL on publicly available offline datasets from realistic environments, e.g., hospital intensive care datasets (MIMIC-III, eICU). Further, we will apply CausalXRL to closed-loop post-stroke neuro-rehabilitation via non-invasive brain stimulation. We will also adapt CausalXRL to biologically plausible spiking neural networks, which are mechanistically close to the systems being modelled, thus enhancing explainability, and which are suitable for implementation on low-power neuromorphic devices for portable decision support and rehabilitation.
We are an interdisciplinary team of early-career and established researchers. First, drawing on the expertise of Moritz Grosse-Wentrup in causal inference and of Aditya Gilra in dynamical model learning, we will develop the theory and the neural network implementations that learn causal models of the environment from offline data. Further, with the expertise of Philippe Preux in reinforcement learning and explainable decision making, and of Eleni Vasilaki in brain-like RL and sparse representations, we will interface these models with reinforcement learning decision-making systems, testing them in simple virtual environments. Finally, the validated CausalXRL model will be applied to closed-loop brain stimulation, as established in the lab of Moritz Grosse-Wentrup.
Duration: 36 months
Funding support: 708 084 €