
Predictive Modeling

3 important questions on Predictive Modeling

Imagine that we have collected datasets around three people, identified by qs1, qs2, and qs3.

(4 pt) Imagine we want to apply predictive modeling on a population level for unseen data of known users, and we assume a temporal ordering in the dataset. Specify what data would go into our training set and what data would go into our test set (you can assume a 60/40 split). Argue how you came to your answer.

We are interested in predicting unseen data of known users. This means that we will train on part of the data of each user. Given that the dataset has a temporal ordering, we take the first 60% (in time order) of each user's data as training data, combining these 60% chunks over all users into one training set, and use the remaining 40% of each user's data as test data. This way the model is always tested on data that comes later in time than the data it was trained on.
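The split described in this answer can be sketched as follows. This is a minimal illustration, not part of the original answer: the function name `temporal_split_per_user`, the dictionary layout, and the toy integer observations are all assumptions made for the example, and each user's rows are assumed to be already sorted by time.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[str, int]  # (user id, observation); toy types for illustration only


def temporal_split_per_user(
    data_by_user: Dict[str, Sequence[int]], train_fraction: float = 0.6
) -> Tuple[List[Row], List[Row]]:
    """Put the earliest `train_fraction` of each user's rows in the training set
    and the rest in the test set. Rows are assumed to be in temporal order."""
    train: List[Row] = []
    test: List[Row] = []
    for user, rows in data_by_user.items():
        cut = int(len(rows) * train_fraction)  # index of the first test row
        train.extend((user, r) for r in rows[:cut])
        test.extend((user, r) for r in rows[cut:])
    return train, test


# Hypothetical, made-up data: integers stand in for time-ordered measurements.
data = {"qs1": list(range(10)), "qs2": list(range(5)), "qs3": list(range(20))}
train, test = temporal_split_per_user(data)
print(len(train), len(test))
```

No shuffling is applied before the cut, since shuffling would leak later observations into the training set and break the temporal ordering the question assumes.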

(4 pt) Explain the concept of PAC learnability and how it relates to the VC dimension.

PAC learnability stands for Probably Approximately Correct learnability.

A hypothesis set is said to be PAC learnable when it can be shown that for any values of δ and ε (both greater than 0) there is a sample size n such that, with probability at least 1 − δ, the difference between the in-sample and out-of-sample error is less than ε. The VC dimension of a hypothesis set is the largest number of input vectors that the set can shatter, i.e. label in every possible way. It can be shown that any hypothesis set with a finite VC dimension is also PAC learnable.
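To make "shattering" concrete, here is a small sketch using one-dimensional threshold classifiers (h_a(x) = +1 if x > a, else −1). This hypothesis set is chosen only for illustration and does not appear in the original answer, and the function names are hypothetical. The code enumerates every labeling the thresholds can produce on a set of distinct points and checks whether all 2^n labelings are reached.

```python
from typing import Sequence, Set, Tuple


def threshold_dichotomies(points: Sequence[float]) -> Set[Tuple[int, ...]]:
    """All labelings of `points` realizable by h_a(x) = +1 if x > a else -1.
    Points are assumed to be distinct."""
    xs = sorted(points)
    # One threshold below all points, one between each adjacent pair, one above all.
    thresholds = (
        [xs[0] - 1.0]
        + [(left + right) / 2 for left, right in zip(xs, xs[1:])]
        + [xs[-1] + 1.0]
    )
    return {tuple(1 if x > a else -1 for x in points) for a in thresholds}


def is_shattered(points: Sequence[float]) -> bool:
    """True if every one of the 2^n possible labelings is realizable."""
    return len(threshold_dichotomies(points)) == 2 ** len(points)


print(is_shattered([0.0]))       # a single point can receive both labels
print(is_shattered([0.0, 1.0]))  # the labeling (+1, -1) is unreachable
```

One point can be shattered but two cannot, so this hypothesis set has VC dimension 1. Since that is finite, the statement above implies the set is PAC learnable.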

(4 pt) When we do parameter tuning on recurrent neural networks and echo state networks, which of the two networks would you expect to require the largest number of neurons? Argue why.

The echo state network would require more neurons, as the connections between the neurons in the reservoir are randomly initialized and then left fixed (rather than learned, as in recurrent neural networks). Because only the output weights are trained, more neurons are needed to make sure that the reservoir produces the right signal to learn the problem properly.
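A rough way to see this argument is to count how many weights each network can actually adjust during training. The sketch below is illustrative only: it assumes a plain single-layer recurrent network with bias terms and an echo state network whose readout sees only the reservoir state; the function name and the example sizes are hypothetical.

```python
from typing import Tuple


def trainable_parameters(n_inputs: int, n_neurons: int, n_outputs: int) -> Tuple[int, int]:
    """Return (rnn_count, esn_count) of trainable weights under the stated assumptions."""
    readout = n_outputs * n_neurons + n_outputs  # output weights plus output bias
    rnn = (
        n_neurons * n_inputs     # input-to-hidden weights (learned)
        + n_neurons * n_neurons  # hidden-to-hidden weights (learned)
        + n_neurons              # hidden bias (learned)
        + readout
    )
    esn = readout  # input and reservoir weights stay at their random initial values
    return rnn, esn


rnn_count, esn_count = trainable_parameters(n_inputs=3, n_neurons=100, n_outputs=1)
print(rnn_count, esn_count)
```

With the same number of neurons, the echo state network has far fewer weights it can fit to the data, which is one way to motivate why it typically needs a larger reservoir to reach comparable performance.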

The questions on this page originate from the summary of the following study material:

Resit 2018
