NeuralSolver
We propose NeuralSolver, a novel architecture that consistently extrapolates to larger observations in both same-size and different-size tasks. The results speak for themselves.
Extrapolation accuracy in same-size tasks
Extrapolation accuracy in different-size tasks
Average return on the Minigrid Doorkey environment for different environment sizes at execution time.
Looking inside the recurrent state
To understand the extrapolation process in NeuralSolver, and how information propagates through the model, we analyse the values of the internal recurrent state (in the recurrent convolutional module) as a function of the number of iterations.
Notice how the hidden state values propagate in a way that seems to process the dead ends first. The model becomes certain of the correct action once the propagating information passes the green point, which represents the agent in the maze.
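To make the idea of "information propagating over iterations" concrete, here is a minimal hand-written sketch in NumPy. It is not the learned NeuralSolver and does not reproduce its weights or architecture: it applies a fixed local update (a boolean analogue of a small convolution) repeatedly, stores the state after every iteration, and reports the first iteration at which the signal emitted at the goal reaches the agent. The maze layout and the names propagate and first_reached are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical toy maze: '#' wall, '.' free cell, 'A' agent, 'G' goal.
MAZE = [
    "#######",
    "#A..#.#",
    "#.#.#.#",
    "#.#...#",
    "#.###.#",
    "#....G#",
    "#######",
]

def propagate(walls, goal, n_iters):
    """Apply a fixed local update n_iters times; return the state after each step."""
    state = np.zeros(walls.shape, dtype=bool)
    state[goal] = True  # the goal cell emits the signal
    history = [state.copy()]
    for _ in range(n_iters):
        padded = np.pad(state, 1, constant_values=False)
        # A cell becomes active if it or one of its 4 neighbours is active.
        spread = (padded[1:-1, 1:-1] | padded[:-2, 1:-1] | padded[2:, 1:-1]
                  | padded[1:-1, :-2] | padded[1:-1, 2:])
        state = spread & ~walls  # walls never carry the signal
        history.append(state.copy())
    return np.stack(history)  # shape: (n_iters + 1, H, W)

def first_reached(history, cell):
    """First iteration index at which `cell` is active, or None if never."""
    hits = np.flatnonzero(history[:, cell[0], cell[1]])
    return int(hits[0]) if hits.size else None

grid = np.array([list(row) for row in MAZE])
walls = grid == "#"
agent = tuple(int(i) for i in np.argwhere(grid == "A")[0])
goal = tuple(int(i) for i in np.argwhere(grid == "G")[0])

history = propagate(walls, goal, n_iters=12)
print("signal reaches the agent at iteration", first_reached(history, agent))
```

In this toy setting the number of iterations needed grows with the path length between goal and agent, which is one intuition for why a recurrent model may need more iterations on larger mazes. Plotting history[k] for increasing k gives a picture loosely comparable to the recurrent-state visualizations, under the assumptions stated above.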
The model chooses the up action by default. Over the iterations, a signal is sent from the goal position to the positions above it, propagating in a circular or oval shape. The model thus appears to learn an algorithm that signals to the player when it should go down. When the player is instead to the left or right of the goal, a horizontal line of slightly different contrast extends from the goal position and appears to communicate along that line that the player should go left or right to reach the goal.
The model appears to learn an algorithm similar to the one learned in the GoTo task. It starts by predicting that the paddle should move right. The prediction changes to the left action once the propagating information reaches the paddle.
These visualizations are harder to interpret than the others. One possible reason is that, to solve this task, the agent needs to follow a complex sequence of tasks, which requires multiple algorithms to locate the objects and to decide which action to take next.
Affiliations
1 INESC-ID, 2 Instituto Superior Técnico, 3 KTH Royal Institute of Technology
Citation
BibTeX citation:
@inproceedings{
esteves2024NeuralSolver,
title={NeuralSolver: Learning Algorithms For Consistent and Efficient Extrapolation Across General Tasks},
author={Bernardo Esteves and Miguel Vasco and Francisco S. Melo},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024},
url={https://openreview.net/forum?id=IxRf7Q3s5e}
}
Acknowledgements
This work was partially supported by national funds through Fundação para a Ciência e a Tecnologia (FCT) with ref. UIDB/50021/2020 and the project RELEvaNT, ref. PTDC/CCI-COM/5060/2021. The first author acknowledges the FCT PhD grant 2023.02298.BD. This work has also been supported by the Swedish Research Council, Knut and Alice Wallenberg Foundation and the European Research Council (ERC-BIRD).