connectionist world models

This site contains three simple demonstrations of a connectionist world model (CWM). For details, please see
M. Toussaint (2004): Learning a world model and planning with a self-organizing dynamic neural system [ps.gz]. In Advances in Neural Information Processing Systems 17 (NIPS 2003), 929-936, MIT Press, Cambridge. For a draft see nlin.AO/0306015.

The following three movies visualize experiments with the CWM on a maze problem. The movies play at the actual speed of the experiments, which were run online on a 2 GHz Pentium (the code, however, is not optimized for speed).

self-organization [avi]
The movie displays the growth of the CWM during self-organization. On the left, the agent explores the maze via a random walk. On the right, the central layer of the CWM is displayed; the color of each connection corresponds to its weight (red=1 and blue=0).
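The growth process can be sketched roughly as follows. This is an illustrative toy, not the paper's algorithm: a new unit is inserted whenever the current observation is far from every existing unit, and the units active on consecutive steps are connected, so the layer grows into a graph that mirrors the maze's topology. The distance threshold and the corridor world are assumptions.

```python
import math, random

NEW_UNIT_DIST = 1.5  # distance threshold for inserting a new unit (assumed)

def nearest(units, x):
    """Index of the unit closest to observation x."""
    return min(range(len(units)), key=lambda i: math.dist(units[i], x))

def grow(walk):
    """Grow a unit graph from a sequence of observations."""
    units, edges = [walk[0]], set()
    prev = 0
    for x in walk[1:]:
        # insert a unit if x is far from all existing units
        if min(math.dist(u, x) for u in units) > NEW_UNIT_DIST:
            units.append(x)
        cur = nearest(units, x)
        if cur != prev:
            # connect consecutively active units (a "red" connection)
            edges.add((min(prev, cur), max(prev, cur)))
        prev = cur
    return units, edges

random.seed(0)
# random walk along a corridor of length 10 (illustrative)
pos, walk = 0, [(0.0, 0.0)]
for _ in range(200):
    pos = max(0, min(10, pos + random.choice([-1, 1])))
    walk.append((float(pos), 0.0))
units, edges = grow(walk)
# the layer has grown a chain of units along the corridor
```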


planning [avi]
The movie visualizes the planning process with the CWM. On the left, you find the maze; the current goal is marked by a red spot. The agent (the white spot) moves straight to the goal. On the right, the value field on the central layer is visualized (red=1 and blue=0). Whenever the agent reaches the goal, the goal is moved to a new random position. The value field rearranges quickly and relaxes to its fixed point (which corresponds to the Bellman equation). Given this stationary value field, the agent chooses actions that lead "uphill" towards the goal.
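The relaxation-and-climb idea can be illustrated with a minimal sketch. This is not the paper's code: it assumes all learned transition weights equal 1, uses an assumed discount factor, and replaces the maze with a five-state chain. The value field is iterated to its Bellman fixed point, and the agent then greedily follows the value gradient.

```python
GAMMA = 0.9  # discount factor (assumed)

def relax_value_field(neighbors, goal, iters=50):
    """Iterate V(s) = max_{s'} gamma * V(s') with V(goal) = 1
    until the field reaches its (Bellman) fixed point."""
    V = {s: 0.0 for s in neighbors}
    V[goal] = 1.0
    for _ in range(iters):
        V = {s: 1.0 if s == goal else
                max(GAMMA * V[t] for t in neighbors[s])
             for s in neighbors}
    return V

def uphill_step(neighbors, V, s):
    """Greedy action: move to the neighbor with the highest value."""
    return max(neighbors[s], key=lambda t: V[t])

# Tiny stand-in "maze": states 0-4 in a chain, goal at state 4.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
V = relax_value_field(chain, goal=4)
pos, path = 0, [0]
while pos != 4:
    pos = uphill_step(chain, V, pos)
    path.append(pos)
# path climbs the value gradient straight to the goal: [0, 1, 2, 3, 4]
```

Moving the goal simply restarts the relaxation with a different state clamped to 1, which is why the field in the movie rearranges quickly after each goal change.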

learning [avi]
The movie displays how the CWM is capable of learning changes in the world. On the right, the color of the connections visualizes their weights (red=1 and blue=0). As before, the goal is set to a random position within the maze and changes whenever the agent reaches it. At some point, however, a passage in the upper left part of the maze is shut. When the agent tries to move through this passage, it re-adapts its world model (note how some connections become blue!); and, once the readaptation of these weights induces a sufficient change of the relaxed value field, the agent moves around the blockade to the goal. Thereafter, the agent also has to learn that the passage is blocked when approaching from the left. In the lower right part of the maze, another passage is blocked and the agent learns this blockade analogously (note how all connections in this region turn blue). Once the two blockades have been learned, the agent never explores them again.
move 1 move 2 move 3 move 4 move 5
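The readaptation can be sketched as a simple weight update. This is a hypothetical illustration, not the paper's learning rule: a transition weight is pushed toward 1 when the predicted move succeeds and toward 0 when it is blocked, so a shut passage "turns blue" (weight near 0) after a few failed attempts. The learning rate and the threshold below which the value field reroutes are assumptions.

```python
ETA = 0.5  # learning rate (assumed)

def update_weight(w, succeeded):
    """Push the transition weight toward 1 on success, toward 0 on failure."""
    target = 1.0 if succeeded else 0.0
    return w + ETA * (target - w)

w = 1.0  # the passage was originally learned as open ("red")
for _ in range(4):            # the agent bumps into the new blockade
    w = update_weight(w, succeeded=False)
# after four failures the weight has decayed to 0.0625 -- low enough,
# in this toy, for the relaxed value field to route around the blockade
```

Because the relaxed value field no longer assigns value across the near-zero connection, the blocked passage stops attracting the agent, matching the movie's observation that learned blockades are never explored again.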