Abstract

Artificial deep neural networks are a powerful tool, able to extract information from large datasets and, using this acquired knowledge, make accurate predictions on previously unseen data. As a result, they are being applied in a wide variety of domains, ranging from genomics to autonomous driving and from speech recognition to gaming. Many areas where neural network-based solutions can be applied require validation, or at least some explanation, of how the system makes its decisions. This is especially true in the medical domain, where such decisions can contribute to the survival or death of a patient. Unfortunately, the very large number of parameters in deep neural networks is extremely difficult for explanation methods to cope with, and these networks remain for the most part black boxes. This demonstrates a real need for accurate explanation methods that scale with this large number of parameters and provide useful information to a potential user. Our research aims to provide tools and methods that improve the interpretability of deep neural networks.

In this context, we developed a method that allows a user to interrogate a trained neural network and reproduce its internal representations at various depths within the network. This allows for the discovery of biases that might have been overlooked in the training dataset and enables the user to verify, and potentially discover, features that the network has captured from the data.
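
The snippet below is a minimal sketch of how internal representations at several depths of a trained network could be captured for inspection; it assumes a PyTorch model, and the `resnet18` stand-in, layer names, and probe input are illustrative rather than the method described in the abstract.

```python
# Sketch: capture internal representations at chosen depths of a trained
# network using forward hooks (PyTorch). Model and layer names are assumptions.
import torch
import torchvision.models as models

model = models.resnet18(weights=None)  # stand-in for the trained network
model.eval()

activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Register hooks at several depths of interest.
model.layer1.register_forward_hook(save_activation("layer1"))
model.layer3.register_forward_hook(save_activation("layer3"))

with torch.no_grad():
    x = torch.randn(1, 3, 224, 224)   # a probe image (random here)
    model(x)

for name, act in activations.items():
    print(name, tuple(act.shape))     # inspect the captured representations
```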

Another tool, based on rule extraction, is a method that emphasizes the regions of an image that are relevant to a certain class, through a local approximation of the neural network. This method is of particular interest when detecting a certain feature or characteristic is especially complex and artificial neural networks exceed human performance, as is the case in some medical diagnosis tasks.
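
As a rough illustration of the idea of locally approximating a network to highlight class-relevant regions, the sketch below perturbs a grid of image regions, queries the black-box model, and fits a simple linear surrogate whose weights indicate region relevance. This is a generic local-surrogate sketch in the spirit of the description above, not the thesis method itself; `predict_proba`, the grid size, and the masking scheme are assumptions.

```python
# Sketch of a local surrogate explanation: mask regions, query the network,
# fit a linear model whose coefficients rank region relevance for one class.
import numpy as np
from sklearn.linear_model import Ridge

def region_relevance(image, predict_proba, target_class, grid=4, n_samples=200, rng=None):
    rng = rng or np.random.default_rng(0)
    h, w = image.shape[:2]
    gh, gw = h // grid, w // grid

    masks = rng.integers(0, 2, size=(n_samples, grid * grid))  # which regions stay on
    scores = np.empty(n_samples)
    for i, m in enumerate(masks):
        perturbed = image.copy()
        for r in range(grid):
            for c in range(grid):
                if m[r * grid + c] == 0:
                    perturbed[r * gh:(r + 1) * gh, c * gw:(c + 1) * gw] = 0  # mask out region
        scores[i] = predict_proba(perturbed)[target_class]

    surrogate = Ridge(alpha=1.0).fit(masks, scores)    # local linear approximation
    return surrogate.coef_.reshape(grid, grid)         # higher weight = more relevant region
```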

To understand how features extracted by the network are combined to produce specific predictions, a third approach aims at extracting logical rules that reflect the behavior of the network's fully connected layers. This approach consists of (1) using a trained network to extract features from a set of images, (2) training a Random Forest on those features to create a set of rules that behave in the same manner as the network, and (3) ranking those rules according to their contribution to the prediction. An analyst can then select the top-N rules to build an interpretation.
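
The sketch below follows the three steps just described under assumed names: `extract_features` stands in for the trained network's feature extractor, the forest is fit to the network's predictions, and rules are taken as root-to-leaf paths. The coverage-based ranking is a simple heuristic for illustration, not necessarily the ranking used in this work.

```python
# Sketch of the feature -> Random Forest -> ranked rules pipeline.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import _tree

def forest_rules(images, network_predictions, extract_features, n_trees=10):
    # (1) Features computed by the trained network for each image.
    X = np.stack([extract_features(img) for img in images])

    # (2) A Random Forest trained to reproduce the network's predictions.
    forest = RandomForestClassifier(n_estimators=n_trees, max_depth=4, random_state=0)
    forest.fit(X, network_predictions)

    # (3) Turn each root-to-leaf path into a rule and rank rules by the
    #     number of training samples they cover (a simple heuristic).
    rules = []
    for est in forest.estimators_:
        tree = est.tree_
        def recurse(node, conditions):
            if tree.feature[node] == _tree.TREE_UNDEFINED:   # leaf node
                rules.append((conditions, int(tree.n_node_samples[node])))
                return
            f, t = tree.feature[node], tree.threshold[node]
            recurse(tree.children_left[node], conditions + [f"f{f} <= {t:.3f}"])
            recurse(tree.children_right[node], conditions + [f"f{f} > {t:.3f}"])
        recurse(0, [])

    rules.sort(key=lambda r: r[1], reverse=True)
    return rules          # an analyst keeps the top-N rules for interpretation
```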