NeuralSolver: Learning Algorithms For Consistent and Efficient Extrapolation Across General Tasks

Bernardo Esteves1,2, Miguel Vasco3, Francisco S. Melo1,2

Abstract
We contribute NeuralSolver, a novel recurrent solver that can efficiently and consistently extrapolate, i.e., learn algorithms from smaller problems (in terms of observation size) and execute those algorithms in larger problems. Contrary to previous recurrent solvers, NeuralSolver applies naturally both to same-size problems, where the input and output sizes coincide, and to different-size problems, where they differ. To allow for this versatility, we design NeuralSolver with three main components: a recurrent module that iteratively processes input information at different scales, a processing module that aggregates the previously processed information, and a curriculum-based training scheme that improves the extrapolation performance of the method. To evaluate our method, we introduce a set of novel different-size tasks and show that NeuralSolver consistently outperforms prior state-of-the-art recurrent solvers in extrapolating to larger problems, while training on smaller problems and requiring fewer parameters than other approaches.
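The curriculum-based training scheme can be illustrated with a short sketch: train on the smallest problems first and progressively increase the problem size. The sizes, schedule, and the make_batch helper below are illustrative assumptions, not the exact scheme used in the paper.

def curriculum_train(model, make_batch, optimizer, loss_fn,
                     sizes=(8, 12, 16), steps_per_size=1000):
    # Curriculum over problem sizes (illustrative values): the model sees
    # the smallest observations first and larger ones as training proceeds.
    for size in sizes:
        for _ in range(steps_per_size):
            x, y = make_batch(size)  # batch of size-by-size observations
            logits = model(x)
            loss = loss_fn(logits, y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()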

NeuralSolver

We propose NeuralSolver, a novel architecture that consistently extrapolates to larger observations in both same-size and different-size tasks. The results below speak for themselves.
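At a high level, the design can be sketched in a few lines of PyTorch. The module names, layer sizes, and the input-recall detail below are our illustrative assumptions, not the exact implementation.

import torch
import torch.nn as nn

class RecurrentSolver(nn.Module):
    def __init__(self, in_channels=3, hidden_channels=32, num_actions=4):
        super().__init__()
        # Recurrent convolutional module: the same weights are applied at
        # every iteration, so the model can simply run for more iterations
        # on larger inputs.
        self.encoder = nn.Conv2d(in_channels, hidden_channels, 3, padding=1)
        self.recurrent = nn.Sequential(
            nn.Conv2d(hidden_channels + in_channels, hidden_channels, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_channels, hidden_channels, 3, padding=1),
            nn.ReLU(),
        )
        # Processing module: aggregates the recurrent state into a
        # fixed-size output, which is what allows different-size tasks
        # (e.g., a 64x64 observation mapped to a handful of action logits).
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(hidden_channels, num_actions),
        )

    def forward(self, x, num_iterations=30):
        h = torch.relu(self.encoder(x))
        states = []
        for _ in range(num_iterations):
            # Re-inject the input at every iteration (recall), as in prior
            # recurrent solvers, so the state cannot drift away from the task.
            h = self.recurrent(torch.cat([h, x], dim=1))
            states.append(h)
        return self.head(h), states

Because the recurrent weights are shared across iterations and the head is size-agnostic, a model trained on, say, 16x16 observations can be executed unchanged on 64x64 ones by increasing num_iterations.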

Extrapolation accuracy in same-size tasks

Extrapolation accuracy in different-size tasks

Average reward on the Minigrid Doorkey environment, for different environment sizes during execution.

Example trajectories of the different methods when extrapolating to a Minigrid Doorkey environment with an image observation of size 64×64. Only NeuralSolver is able to consistently extrapolate to larger observations without loss of performance.

Looking inside the recurrent state

To understand the extrapolation process in NeuralSolver, and how information propagates along the model, we analyse the value of the internal recurrent state (in the recurrent convolutional module) as a function of the number of iterations.
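Concretely, the heat maps below show, for each iteration t, the per-pixel distance between the recurrent state h_t and the final state h_T. A sketch of this analysis, assuming a model that exposes its per-iteration states as in the sketch above:

import torch

@torch.no_grad()
def state_differences(model, x, num_iterations=200):
    # states: list of recurrent states, one (B, C, H, W) tensor per iteration.
    _, states = model(x, num_iterations=num_iterations)
    final = states[-1]
    # L2 distance over channels -> one (H, W) heat map per iteration;
    # positions that have converged to their final value go to zero.
    return [((h - final) ** 2).sum(dim=1).sqrt()[0] for h in states]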

Maze. Top: difference between the current iteration and the last iteration of the recurrent state of NeuralSolver in a maze-like environment, where the goal is for the agent (red) to reach the goal position (green). Larger differences are shown in dark blue. Bottom: predicted action probabilities output by the model at different iterations, where the agent can move right (R), down (D), left (L), or up (U).

Notice how the hidden-state values propagate in a way that seems to process the dead ends first. The model becomes certain of the correct action once the propagated information passes the green point, which marks the goal position in the maze.

Visualization, in dark blue, of the difference between the current iteration and the last iteration (iteration 200, not shown) of the recurrent state of NeuralSolver, on the GoTo task of size 64×64. Black pixels mark positions of the recurrent state that have converged to their final value. Below the figure are the predicted action probabilities output by the model at the respective iteration: Left (L), Up (U), Right (R), and Down (D).

Notice that the model chooses the up action by default, while across iterations a signal propagates from the goal position to the positions above it, spreading in a circular/oval shape. The model thus appears to learn an algorithm that signals the player whether it should go down. When the player is instead to the left or to the right of the goal, a horizontal line of slightly different contrast extending from the goal position appears to communicate, along that line, that the player should go left or right to reach the goal.

Visualization, in dark blue, of the difference between the current iteration and the last iteration (iteration 200, not shown) of the recurrent state of NeuralSolver, on the Pong task of size 64×64. Black pixels mark positions of the recurrent state that have converged to their final value. Below the figure are the predicted action probabilities output by the model at the respective iteration: Left (L), Stay (S), and Right (R).

The model appears to learn an algorithm similar to the one in the GoTo task: it starts by predicting that the paddle should move right, and the prediction switches to the left action once the propagated information reaches the paddle.

Visualization, in dark blue, of the difference between the current iteration and the last iteration (iteration 200) of the recurrent state of NeuralSolver, on the Doorkey task of size 64×64. Black pixels mark positions of the recurrent state that have converged to their final value. Below the grid are the predicted action probabilities output by the model at the respective iteration: Forward (F), Rotate Right (R), Pickup (P), and Toggle (T).

These visualizations are harder to interpret than the previous ones. This may be because solving this task requires the agent to complete a complex sequence of subtasks (picking up the key, opening the door, and reaching the goal), and thus multiple algorithms to locate the objects and decide which action to take next.

Affiliations

1INESC-ID 2Instituto Superior Técnico 3KTH Royal Institute of Technology

Citation

BibTeX citation:

@inproceedings{
  esteves2024NeuralSolver,
  title={NeuralSolver: Learning Algorithms For Consistent and Efficient Extrapolation Across General Tasks},
  author={Bernardo Esteves and Miguel Vasco and Francisco S. Melo},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024},
  url={https://openreview.net/forum?id=IxRf7Q3s5e}
}

Acknowledgements

This work was partially supported by national funds through Fundação para a Ciência e a Tecnologia (FCT) with ref. UIDB/50021/2020 and the project RELEvaNT, ref. PTDC/CCI-COM/5060/2021. The first author acknowledges the FCT PhD grant 2023.02298.BD. This work has also been supported by the Swedish Research Council, Knut and Alice Wallenberg Foundation and the European Research Council (ERC-BIRD).