Neural Networks: Reservoir Computing
23 important questions on Neural Networks: Reservoir Computing
What are properties of recurrent neural networks?
- Activation is not determined solely by the input; the network can show self-sustained temporal activation. (An RNN can be seen as a dynamical system rather than a function.)
- Can have dynamical memory.
- Are used for different purposes: modelling the biological brain, or as an engineering/machine learning tool.
What are the two main classes of recurrent neural networks?
- Unsupervised approaches (e.g. Hopfield networks, Boltzmann machines)
- Supervised learning of (non-linear) time series.
What is the formalisation of a recurrent neural network?
- A network with K input units u(n), N internal (reservoir) units x(n) and L output units y(n).
- The units are connected by weight matrices Win (N x K, input to reservoir), W (N x N, internal), Wout (L x (K+N), reservoir and input to output) and optionally Wback (N x L, output feedback).
How is the activation x of an RNN determined?
x(n+1) = f(Win u(n+1) + W x(n) + Wback y(n))
Where f can be a sigmoid (or sometimes linear). (See the sketch below.)
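A minimal one-step sketch of this update in numpy, assuming tanh as f; the sizes K (inputs), N (internal units), L (outputs) and all matrices are illustrative placeholders, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
K, N, L = 2, 10, 1                      # inputs, internal units, outputs (illustrative)
Win = rng.uniform(-1, 1, (N, K))        # input weights
W = rng.uniform(-1, 1, (N, N))          # internal (recurrent) weights
Wback = rng.uniform(-1, 1, (N, L))      # output feedback weights

x = np.zeros(N)                         # current state x(n)
y = np.zeros(L)                         # current output y(n)
u_next = rng.uniform(-1, 1, K)          # next input u(n+1)

# x(n+1) = f(Win u(n+1) + W x(n) + Wback y(n)), here with f = tanh
x_next = np.tanh(Win @ u_next + W @ x + Wback @ y)
```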
How does back propagation work through time?
- Unfold the network over time to create a feed forward NN (each time step becomes one layer with shared weights).
- Apply the standard back propagation scheme to the unfolded network and average the weight updates over the different time points (see the sketch below).
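A minimal sketch of the idea, assuming a single-unit RNN h[t] = tanh(w*h[t-1] + v*u[t]) and a squared-error loss (the sequence, loss and variable names are illustrative, not from the text); the shared weights get one gradient contribution per unrolled time step, and the contributions are then averaged:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 20
u = rng.standard_normal(T)          # input sequence
d = np.sin(np.arange(T) / 3.0)      # desired output sequence
w, v = 0.5, 0.3                     # recurrent and input weights (shared over all time steps)

# Forward pass: unroll the recurrence h[t] = tanh(w*h[t-1] + v*u[t])
h = np.zeros(T + 1)
for t in range(1, T + 1):
    h[t] = np.tanh(w * h[t - 1] + v * u[t - 1])

# Backward pass: propagate the error back through the unrolled network.
# The same weights w, v appear at every time step, so their gradients are
# the sum (here: average) of the per-step contributions.
grad_w, grad_v = 0.0, 0.0
delta = 0.0                          # dL/dh[t] carried backwards in time
for t in range(T, 0, -1):
    delta += h[t] - d[t - 1]         # local error of the output at step t
    pre = delta * (1.0 - h[t] ** 2)  # backprop through tanh
    grad_w += pre * h[t - 1]
    grad_v += pre * u[t - 1]
    delta = pre * w                  # pass error on to the previous time step

grad_w /= T                          # average the contributions over time points
grad_v /= T
print(grad_w, grad_v)
```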
Does back propagation work for RNN?
- Convergence can typically not be guaranteed.
- Long training times.
- Long term memory is hard to learn (the error gradient vanishes as it is propagated back through time).
- More advanced algorithms require a lot of global parameters to be set.
Due to the bad properties of back propagation for RNNs, a new type of RNN emerged. What is this?
- Reservoir computing.
What is reservoir computing?
- Create a random RNN (random weights, nodes) called the reservoir. This provides a non-linear transformation of input history.
- The desired output signal is a linear combination of the neuron signals from the reservoir.
- Only the mapping between the reservoir and the output is adapted.
- As a result, only Wout (an L x (K+N) matrix) has to be learned (see the sketch below).
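A minimal echo state network sketch in numpy, assuming a tanh reservoir without output feedback (Wback omitted for simplicity) and a toy input/target sequence; only Wout, the L x (K+N) readout, is learned, here by ordinary least squares:

```python
import numpy as np

rng = np.random.default_rng(42)
K, N, L, T = 1, 100, 1, 500                 # inputs, reservoir units, outputs, sequence length

u = rng.uniform(-0.5, 0.5, size=(T, K))     # toy input sequence
d = np.sin(np.cumsum(u, axis=0))            # toy target: a nonlinear function of the input history

# Fixed random reservoir: these weights are never trained
Win = rng.uniform(-1, 1, size=(N, K))
W0 = rng.uniform(-1, 1, size=(N, N))
W = 0.8 * W0 / max(abs(np.linalg.eigvals(W0)))  # scale spectral radius below 1 (see the recipe later on)

# Run the reservoir over the input and collect [input, state] at every step
X = np.zeros((T, K + N))
x = np.zeros(N)
for n in range(T):
    x = np.tanh(Win @ u[n] + W @ x)
    X[n] = np.concatenate([u[n], x])

# Discard a washout period, then fit Wout (L x (K+N)) by ordinary least squares
washout = 50
sol, *_ = np.linalg.lstsq(X[washout:], d[washout:], rcond=None)
Wout = sol.T

y = X @ Wout.T                              # readout: linear combination of reservoir signals
print("training MSE:", np.mean((y[washout:] - d[washout:]) ** 2))
```

In practice the readout is often fit with ridge regression instead of plain least squares to regularise Wout, but the principle is the same.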
What are two different flavours for reservoir computing?
Echo State Networks (ESN):
- More machine learning and mathematically oriented.
Liquid State Machines (LSM):
- Computational Neuroscience.
- Reservoir often biologically more plausible, built from more complex units (e.g. spiking neurons).
What is an important property for the network (the echo state property)?
- The reservoir state should be an 'echo' of the input history: the influence of the initial state dies out, so the state is asymptotically determined by the input sequence alone.
- Whether this holds depends on the training data and on the setup of the weight matrices Win, W and Wback.
How can an echo state be established?
Does what Jaeger proposed in 2001 help?
What is the procedure for initialising W?
1. Generate a random (typically sparse) internal weight matrix W0.
2. Normalise W0 to a matrix W1 with unit spectral radius by putting W1 = (1/rho(W0)) W0, where rho(W0) is the spectral radius of W0.
3. Scale W1 to W = alpha W1 with alpha < 1, so that rho(W) = alpha.
4. Then W is the weight matrix of a network with the echo state property (in practice it 'has always been found to be').
A lower alpha gives 'faster' dynamics (shorter memory) in the reservoir, and vice versa. (See the sketch below.)
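The steps above as a minimal numpy sketch; the size N, the sparsity and alpha are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
N, sparsity, alpha = 200, 0.1, 0.8

# 1. Random (sparse) matrix W0
W0 = rng.uniform(-1, 1, size=(N, N)) * (rng.random((N, N)) < sparsity)

# 2. Normalise to unit spectral radius: W1 = (1/rho(W0)) W0
rho0 = max(abs(np.linalg.eigvals(W0)))
W1 = W0 / rho0

# 3. Scale: W = alpha * W1, so that rho(W) = alpha < 1
W = alpha * W1
print("rho(W) =", max(abs(np.linalg.eigvals(W))))   # ~ alpha
```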
What should be taken into account in the training procedure for reservoir computing?
What are the properties of reservoir computing?
Modeling accuracy
- Has significantly outperformed other methods of nonlinear system identification.
Modeling capacity
- RC is computationally universal for continuous-time, continuous-value real time systems modeled with bounded resources.
Biological plausibility
- Can be used to explain human information processing.
Extensibility and parsimony
- Output units can just be added.
What is some research done on reservoir computing?
Random is not optimal.
You can, however, not improve accuracy for all possible problems (no free lunch).
Approaches to generating good reservoirs?
- Generic (task independent).
- Unsupervised (only use the input u(n)).
- Supervised (use u(n) and d(n))
What are some examples of generic reservoir recipes?
- big (many different characteristics),
- sparse (loosely coupled),
- randomly connected (so the units differ) reservoir.
Modularity can be introduced (different reservoirs with inhibitory connections).
What are some examples of unsupervised reservoir adaptation?
- Adapt the connections in the reservoir based on Hebbian learning (a local method; see the sketch after this list).
- Predict the next state of the reservoir and feed that back to the reservoir.
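A minimal sketch of the Hebbian idea from the first bullet, assuming a simple outer-product update and a spectral-radius rescaling to keep the reservoir stable; the learning rate and the rescaling constant are illustrative choices, not from the text:

```python
import numpy as np

rng = np.random.default_rng(1)
N, eta = 50, 1e-3
W = rng.uniform(-1, 1, (N, N))              # reservoir weights
x_prev = rng.uniform(-1, 1, N)              # previous reservoir state
x = np.tanh(W @ x_prev)                     # current reservoir state

# Hebbian (local) update: strengthen W[i, j] when post-synaptic x[i]
# and pre-synaptic x_prev[j] are active together.
W += eta * np.outer(x, x_prev)

# Keep the dynamics bounded by rescaling the spectral radius below 1.
W *= 0.9 / max(abs(np.linalg.eigvals(W)))
```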
What can you do to pre-train a reservoir in a supervised way?
- Use evolution to pick the best reservoir.
- 'Greedy pruning'.
- Reinforcement learning.
What are extreme learning machines?
- Feed-forward networks with a single hidden layer whose hidden weights are chosen randomly and kept fixed.
- Learn very fast (only the output layer needs to be learned, using least squares).
- Reaches the smallest training error.
- Reaches smallest norm of output weights (i.e. generalises well).
What setup is assumed in the extreme learning machine theorems?
- N training examples, L hidden neurons and m output neurons.
- An NxL matrix H where H(n,j) = G(a_j, b_j, x_n) is the output of hidden neuron j for training example n.
- An Lxm weight matrix beta where beta_jo is the weight from hidden neuron j to output o.
- An Nxm target matrix where t_no is the desired value of output o for training example n.
(See the sketch below.)
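A minimal sketch that instantiates these matrices with a sigmoid hidden-unit function G and toy data (all sizes and data are illustrative); beta is the minimum-norm least-squares solution via the Moore-Penrose pseudoinverse, matching the least-squares and minimum-norm properties mentioned in the next answers:

```python
import numpy as np

rng = np.random.default_rng(0)
N, n_in, L, m = 200, 3, 50, 1               # examples, input dims, hidden neurons, outputs

X = rng.standard_normal((N, n_in))          # training inputs x_n
T = np.sin(X.sum(axis=1, keepdims=True))    # N x m target matrix (toy targets t_no)

# Random, fixed hidden-layer parameters (a_j, b_j); they are never trained.
A = rng.standard_normal((n_in, L))
b = rng.standard_normal(L)

# N x L matrix H with H[n, j] = G(a_j, b_j, x_n), here G = sigmoid
H = 1.0 / (1.0 + np.exp(-(X @ A + b)))

# L x m output weights beta by ordinary least squares (minimum-norm solution)
beta = np.linalg.pinv(H) @ T

print("training MSE:", np.mean((H @ beta - T) ** 2))
```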
What has been proven of extreme learning machines, given any activation function?
What else has been proven of extreme learning machines?
- Extreme learning machines with a fixed network architecture, where the output parameters are determined by ordinary least squares, can work as universal approximators, provided some conditions on the hidden-unit function g and the span of its values hold.
- Given certain constraints, the norm of the output weights is minimised.