Neural Networks: Reservoir Computing

23 important questions on Neural Networks: Reservoir Computing

What are properties of recurrent neural networks?

  • Activation is not determined solely by the input; the network can have self-sustained temporal activation. (It can be seen as a dynamical system rather than a function.)
  • Can have dynamical memory.
  • Are used for different purposes: modelling the biological brain, or as an engineering/machine learning tool.

What are the two main classes of recurrent neural networks?

  • Unsupervised approaches (e.g. Hopfield networks, Boltzmann machines)
  • Supervised learning of (non-linear) time series.

What is the formalisation of a recurrent neural network?

K input units u(n) are connected to N internal units x(n) through the input weights Win (an NxK matrix); the internal units are connected to each other through W (an NxN matrix). The input and internal units are connected to the L output units y(n) through Wout (an Lx(K+N) matrix). The outputs can be fed back into the network through Wback (an NxL matrix).

How is the activation x(n) of an RNN determined?

x(n+1) = f(Win u(n+1) + W x(n) + Wback y(n))

Where f can be a sigmoid (or sometimes linear).
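
A minimal numpy sketch of one such update step, assuming tanh as f (the sizes K, N, L and the random weights are illustrative placeholders):

```python
import numpy as np

K, N, L = 3, 100, 1                    # example sizes: inputs, internal units, outputs
rng = np.random.default_rng(0)

W_in = rng.uniform(-1, 1, (N, K))      # input weights Win (NxK)
W = rng.uniform(-1, 1, (N, N))         # internal weights W (NxN)
W_back = rng.uniform(-1, 1, (N, L))    # output feedback weights Wback (NxL)

x = np.zeros(N)                        # current state x(n)
y = np.zeros(L)                        # current output y(n)
u_next = rng.uniform(-1, 1, K)         # next input u(n+1)

# x(n+1) = f(Win u(n+1) + W x(n) + Wback y(n)), here with f = tanh
x_next = np.tanh(W_in @ u_next + W @ x + W_back @ y)
```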

How does back propagation work through time?

  • Unroll the recurrent network over time to create a feed-forward NN.
  • Apply the same backpropagation scheme, combining (summing/averaging) the updates of the copies of each weight over the different time points (see the sketch below).
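
A minimal toy sketch of this idea for a single-unit tanh RNN (an illustrative example, not from the original text): the same recurrent weight w appears in every unrolled time step, and its gradient contributions from all steps are summed.

```python
import numpy as np

w = 0.5                                  # shared recurrent weight
u = [0.3, -0.2, 0.8]                     # input sequence
d = 1.0                                  # desired output at the final step

# Forward pass: unroll the recurrence h_t = tanh(w * h_{t-1} + u_t)
h = [0.0]
for u_t in u:
    h.append(np.tanh(w * h[-1] + u_t))

# Backward pass: propagate the error through the unrolled copies and
# sum the gradient contributions for w over all time points.
delta = h[-1] - d                        # dLoss/dh_T for loss = 0.5 * (h_T - d)^2
grad_w = 0.0
for t in range(len(u), 0, -1):
    dpre = delta * (1 - h[t] ** 2)       # back through the tanh nonlinearity
    grad_w += dpre * h[t - 1]            # contribution of w at time step t
    delta = dpre * w                     # propagate the error to h_{t-1}

w -= 0.1 * grad_w                        # one gradient-descent update of the shared weight
```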

Does back propagation work for RNN?

Not really:
  • Convergence can typically not be guaranteed.
  • Long training times.
  • Long-term memory is hard to learn (the back-propagated error dissolves over time).
  • More advanced algorithms require a lot of global parameters to be set.

Due to the bad properties of back propagation for RNN, a new type of RNN emerged. What is this?

Reservoir computing.

What is reservoir computing?

  • Create a random RNN (random weights, nodes) called the reservoir. This provides a non-linear transformation of input history.
  • The desired output signal is a linear combination of the neuron signals from the reservoir (the readout equation is sketched below).
  • Only the mapping between the reservoir and the output is adapted.
  • As a result, only Wout (an Lx(K+N) matrix) has to be learned.
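
Consistent with the matrix dimensions above, the readout can be written as y(n) = Wout [u(n); x(n)], where [u(n); x(n)] denotes the concatenation of the current input and reservoir state; training then reduces to a linear regression problem for Wout. (This is a sketch; some formulations add a non-linear output function or feed the previous output back in as well.)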

What are two different flavours for reservoir computing?

Echo State Networks (ESN):
  • More machine learning and mathematically oriented.

Liquid State Machines (LSM):
  • Computational Neuroscience.
  • Reservoir often biologically plausible and with more complex neuron models (e.g. spiking neurons).

What is the important echo state property of the network?

The effect of a previous state x(n) and a previous input u(n) on a future state x(n+k) should vanish gradually as time passes (k→∞) and not persist or even get amplified.
This depends on the training data and the setup of the weight matrices Win, W and Wback.

How can an echo state be established?

There is no known general condition that decides whether a network has the echo state property. However, Jaeger showed in 2001: Assume an untrained network (Win, W, Wback) with state update according to the rule above and with transfer function tanh. Let W have a spectral radius |lambda_max| > 1, where lambda_max is the eigenvalue of W with the largest absolute value (this spectral radius is also written rho(W)). Then the network has no echo states with respect to any input/output interval UxD containing the zero input/output (0,0).

Does it help what Jaeger proposed in 2001?

In practice: when using rho(W) < 1, we almost always obtain a network that satisfies the echo state property.

What is the procedure for initialising W?

1. Randomly initialise an internal weight matrix W0.
2. Normalise W0 to a matrix W1 with unit spectral radius by putting W1 = (1/rho(W0)) W0.
3. Scale W1 to W = alpha W1, where alpha < 1, so that rho(W) = alpha.
4. Then W is the weight matrix of a network with the echo state property (in practice it 'has always been found to be' one).

A lower alpha gives faster dynamics of the reservoir, and vice versa.
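
A minimal numpy sketch of this initialisation (N and alpha are illustrative choices):

```python
import numpy as np

N, alpha = 100, 0.8                        # reservoir size and scaling factor alpha < 1
rng = np.random.default_rng(0)

W0 = rng.uniform(-1, 1, (N, N))            # 1. random internal weight matrix W0
rho_W0 = max(abs(np.linalg.eigvals(W0)))   #    spectral radius rho(W0)
W1 = W0 / rho_W0                           # 2. normalise to unit spectral radius
W = alpha * W1                             # 3. scale, so that rho(W) = alpha
```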

What should be taken into account in the training procedure for reservoir computing?

We do not optimise the performance (the error y(n) - d(n)) for the first time points; we only evaluate it after a washout time T0, so that the influence of the arbitrary initial reservoir state has died out. Ultimately we should of course use a test set. Note: this is one example of a training procedure; many more exist (a minimal sketch follows below).
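
A minimal sketch of such a training run, assuming a reservoir without output feedback, a linear readout, and randomly generated stand-in data for u(n) and d(n):

```python
import numpy as np

K, N, L, T, T0 = 2, 100, 1, 1000, 100       # sizes, sequence length, washout time T0
rng = np.random.default_rng(0)

u = rng.uniform(-1, 1, (T, K))              # stand-in input sequence u(n)
d = rng.uniform(-1, 1, (T, L))              # stand-in teacher sequence d(n)

W_in = rng.uniform(-1, 1, (N, K))
W0 = rng.uniform(-1, 1, (N, N))
W = 0.8 * W0 / max(abs(np.linalg.eigvals(W0)))   # echo-state scaling as in the recipe above

# Run the reservoir once over the input and collect its states
x = np.zeros(N)
states = np.zeros((T, N))
for n in range(T):
    x = np.tanh(W_in @ u[n] + W @ x)
    states[n] = x

# Discard the first T0 time points (washout), then fit Wout by least squares
X = np.hstack([u[T0:], states[T0:]])                  # (T - T0) x (K + N) design matrix
W_out = np.linalg.lstsq(X, d[T0:], rcond=None)[0].T   # L x (K + N) readout weights

y = np.hstack([u, states]) @ W_out.T                  # network output y(n) for all n
```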

What are the properties of reservoir computing?

Modeling accuracy
  • Has significantly outperformed other methods of nonlinear systems identification.

Modeling capacity
  • RC is computationally universal for continuous-time, continuous-value real time systems modeled with bounded resources.

Biological plausibility
  • Can be used to explain human information processing.

Extensibility and parsimony
  • New output units can simply be added (without retraining the reservoir).

What is some research done on reservoir computing?

Research has mainly been devoted to improving the reservoir: a random reservoir is not optimal.
You can, however, not improve accuracy for all possible problems (no free lunch).
Approaches to generating good reservoirs:
  • Generic (task independent).
  • Unsupervised (only use the input u(n)).
  • Supervised (use both u(n) and d(n)).

What are some examples of generic reservoir recipes?

The approach we have discussed before, resulting in a:
  • big (many different characteristics),
  • sparse (loosely coupled),
  • randomly connected (different) reservoir.

Modularity can be introduced (different reservoirs with inhibitory connections).

What are some examples of unsupervised reservoir adaptation?

  • Adapt the connections in the reservoir based on Hebbian learning (local method).
  • Predict the next state of the reservoir and feed that back to the reservoir.

What can you do to pre-train a reservoir in a supervised way?

  • Use evolution to pick the best reservoir.
  • 'Greedy pruning'.
  • Reinforcement learning.

What are extreme learning machines?

A variant where the network is a feedforward network with a single hidden layer whose weights are chosen at random; it comes with nice theoretical properties (see the sketch after this list):
  • Learns very fast (only one layer needs to be learned, using least squares).
  • Reaches the smallest training error.
  • Reaches smallest norm of output weights (i.e. generalises well).
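
A minimal sketch of such a machine, with random hidden weights and a least-squares output layer (a sigmoid hidden activation and random stand-in data are assumed for illustration):

```python
import numpy as np

N, d, L, m = 200, 5, 50, 1                  # training examples, input dim, hidden neurons, outputs
rng = np.random.default_rng(0)

X = rng.uniform(-1, 1, (N, d))              # stand-in training inputs
T = rng.uniform(-1, 1, (N, m))              # stand-in training targets

A = rng.uniform(-1, 1, (d, L))              # random (never trained) hidden weights a_i
b = rng.uniform(-1, 1, L)                   # random hidden biases b_i

H = 1.0 / (1.0 + np.exp(-(X @ A + b)))      # N x L hidden-layer output matrix H
beta = np.linalg.pinv(H) @ T                # L x m output weights: minimum-norm least squares

Y = H @ beta                                # predictions on the training set
```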

What notation is assumed in the theorems about extreme learning machines?

  • N training examples and L hidden neurons and m output neurons.
  • Assume an NxL matrix H, where entry H_ji = G(a_i, b_i, x_j) represents the output of hidden neuron i for training example j.
  • Assume an Lxm weight matrix beta, where beta_jo expresses the weight from hidden neuron j to output o.
  • Assume an Nxm matrix T representing the targets for the N training examples, where t_no represents the desired value of output o for example n.
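
With this notation, learning the output layer amounts to solving the linear system H beta = T for beta in the least-squares sense; the minimum-norm solution is beta = H⁺ T, where H⁺ is the Moore-Penrose pseudoinverse of H (this is what the sketch above computes with np.linalg.pinv).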

What has been proven of extreme learning machines, given any activation function?

Given any activation function g: R → R which is infinitely differentiable in any interval, N arbitrary distinct samples (x_j, t_j) ∈ R^d x R^m, and any small epsilon > 0, there exists L ≤ N such that for any {(a_i, b_i)}, i = 1..L, randomly generated from any intervals of R^d x R according to any continuous probability distribution, with probability one the training error satisfies ||H beta - T|| < epsilon.

What else has been proven of extreme learning machines?

  • Extreme learning machines with fixed network architectures, where the output parameters are determined by ordinary least squares, can work as universal approximators, provided some conditions on the function g and on the span of the values of g.
  • Given certain constraints, the norm of the output weights is minimised.
