Reinforcement Learning
4 important questions on Reinforcement Learning
We are going to apply reinforcement learning to support a user in becoming more active. We measure the activity level and activity type of a person and want to provide suggestions to that person based on their measured state (examples of advice could be: do activity x, stop activity y, etc.).
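As a hypothetical illustration of this scenario (not part of the original exercise), the state and action space could be encoded as below; the names `ActivityType`, `State`, and `SUGGESTIONS` are assumptions made for the sketch only.

```python
from dataclasses import dataclass
from enum import Enum


class ActivityType(Enum):
    """Hypothetical activity categories measured for the user."""
    SITTING = 0
    WALKING = 1
    RUNNING = 2


@dataclass(frozen=True)
class State:
    """Measured state: current activity level plus activity type."""
    activity_level: float          # e.g. a normalised sensor reading in [0, 1]
    activity_type: ActivityType


# Possible suggestions (actions) the system can send to the person.
SUGGESTIONS = ["do_activity_x", "stop_activity_y", "send_no_message"]
```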
(3 pt) Explain what the Markov Property means (you can relate your explanation to this specific example, or explain it in general terms).
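For reference, the property is commonly written in standard MDP notation (states s_t and actions a_t, which are not part of the original question text) as:

```latex
P(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \dots, s_0, a_0) = P(s_{t+1} \mid s_t, a_t)
```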
(4 pt) Explain how the one-step Q-learning algorithm works.
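As a study aid, here is a minimal tabular sketch of the standard one-step Q-learning update; the hyperparameter values, action names, and function names are illustrative assumptions rather than part of the exercise.

```python
import random
from collections import defaultdict

# Illustrative hyperparameters and action set (assumed values, not given in the exercise).
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
ACTIONS = ["do_activity_x", "stop_activity_y", "send_no_message"]
Q = defaultdict(float)  # maps (state, action) -> estimated action value


def choose_action(state):
    """Epsilon-greedy behaviour policy over the current Q estimates."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])


def q_learning_update(state, action, reward, next_state):
    """One-step Q-learning: bootstrap on the best action in the next state."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    td_target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (td_target - Q[(state, action)])
```

In an episode loop, `choose_action` and `q_learning_update` would be called once per measured transition.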
(4 pt) Some of the measurements we perform are continuous (specifically, the activity level). Would this be a problem for SARSA or Q-learning? Argue why (not).
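For illustration only, the sketch below shows one common way to handle a continuous state variable such as the activity level with a tabular method: discretising it into bins. The bin edges are assumed values, not part of the exercise.

```python
import numpy as np

# Assumed bin edges: ten equal-width bins on a normalised activity level in [0, 1].
BIN_EDGES = np.linspace(0.0, 1.0, num=11)


def discretise_activity_level(level: float) -> int:
    """Map a continuous activity level onto a discrete bin index (0-9)."""
    return int(np.digitize(level, BIN_EDGES[1:-1]))
```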
(4 pt) We can apply either an ε-greedy approach or a softmax approach to select actions. We know that the person we are supporting does not change at all in terms of responses to messages. Which of the two approaches would be most suitable to use? Argue your choice.
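For reference, the two action-selection rules mentioned in the question are sketched below; the parameter values (EPSILON, TAU) and the `q_values` dictionary interface are assumptions made for the sketch.

```python
import math
import random

EPSILON = 0.1  # exploration probability for epsilon-greedy (assumed value)
TAU = 0.5      # softmax temperature: lower values make selection greedier (assumed value)


def epsilon_greedy(q_values: dict) -> str:
    """With probability EPSILON pick a random action, otherwise the greedy one."""
    if random.random() < EPSILON:
        return random.choice(list(q_values))
    return max(q_values, key=q_values.get)


def softmax_selection(q_values: dict) -> str:
    """Sample an action with probability proportional to exp(Q(a) / TAU)."""
    actions = list(q_values)
    weights = [math.exp(q_values[a] / TAU) for a in actions]
    return random.choices(actions, weights=weights, k=1)[0]
```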