Classical analysis of item scores
16 important questions on Classical analysis of item scores
When you plot the distribution of the total sample's scores on a given item with a scaled answer format, which concepts can be used to describe this distribution?
- location
- dispersion
- shape
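As a rough illustration of these three concepts, the sketch below summarizes one item's scores by its mean (location), standard deviation (dispersion), and skewness (shape). The choice of these particular statistics, the function name `describe_item`, and the sample data are assumptions made for the example, not part of the study material.

```python
from statistics import mean, pstdev

def describe_item(scores):
    """Summarize item scores by location, dispersion, and shape."""
    m = mean(scores)        # location: where the distribution is centered
    sd = pstdev(scores)     # dispersion: how widely the scores scatter
    # Shape as skewness: the average cubed z-score (0 for a symmetric distribution).
    if sd > 0:
        skew = sum(((x - m) / sd) ** 3 for x in scores) / len(scores)
    else:
        skew = 0.0
    return {"location": m, "dispersion": sd, "shape": skew}

# Hypothetical responses of 8 people to one 5-point item.
item_scores = [1, 2, 2, 3, 3, 3, 4, 5]
summary = describe_item(item_scores)
print(summary)
```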
What do we call the place on the scale where the distribution of the item scores is centered?
What do we call the scatter of the item scores in a distribution?
What can be determined using the location of the item score distribution?
- the classical item difficulty, in maximum performance tests, or
- the classical item attractiveness, in typical performance tests
How do you calculate the mean of a dichotomously scored item, and how is this expressed?
- sum all the scores and divide by the sample size
- p = the proportion of test takers who answered the question correctly
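A minimal sketch of this calculation, assuming correct answers are coded 1 and incorrect answers 0 (the function name and the data are made up for illustration):

```python
def item_p_value(scores):
    """Mean of a dichotomously scored item (1 = correct, 0 = incorrect)."""
    return sum(scores) / len(scores)

# Hypothetical answers of 10 test takers to one item.
answers = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
p = item_p_value(answers)  # 7 correct out of 10
print(p)
```

Because the scores are only 0 and 1, the mean is the same number as the proportion of correct answers.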
What are the steps of item analysis?
- lower bound of the test reliability is determined
- items are rewritten or removed
- stop when a criterion is reached, for instance:
  - a certain reliability (e.g., 0.8)
  - a certain number of items (e.g., 25)
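The loop described by these steps could look roughly like the sketch below. It rests on several assumptions that the study material does not state: Cronbach's alpha is used as the lower-bound estimate of reliability, items are only removed (never rewritten), and the item removed at each step is the one whose deletion gives the highest alpha. The names `cronbach_alpha`, `prune_items`, and the data are hypothetical.

```python
from statistics import pvariance

def cronbach_alpha(data):
    """data: list of respondents, each a list of item scores (at least 2 items)."""
    k = len(data[0])
    item_vars = [pvariance([row[j] for row in data]) for j in range(k)]
    total_var = pvariance([sum(row) for row in data])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def prune_items(data, target_alpha=0.8, min_items=2):
    """Drop items one by one until alpha >= target_alpha or min_items remain."""
    keep = list(range(len(data[0])))
    while True:
        # Step 1: determine the lower bound of the reliability.
        alpha = cronbach_alpha([[row[j] for j in keep] for row in data])
        # Step 3: stop when a criterion is reached.
        if alpha >= target_alpha or len(keep) <= max(min_items, 2):
            return keep, alpha

        # Step 2 (assumed rule): remove the item whose deletion helps alpha most.
        def alpha_without(j):
            rest = [[row[i] for i in keep if i != j] for row in data]
            return cronbach_alpha(rest)

        keep.remove(max(keep, key=alpha_without))

# Hypothetical 0/1 scores of 6 respondents on 4 items; the last item is noisy.
responses = [
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 0, 1],
]
kept, alpha = prune_items(responses, target_alpha=0.8, min_items=2)
print(kept, round(alpha, 3))
```

With this toy data the noisy fourth item is dropped, after which alpha exceeds the 0.8 criterion and the loop stops.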
In a maximum performance test, what do we call items that many test takers answer incorrectly?
In a typical performance test, what do we call items to which many test takers give a low response?
How do we know the item difficulty/attractiveness?
When creating a test, there are two sets of guidelines. Describe them.
- General measurement instrument:
  - "something for everyone"
  - that is, include a proportional number of easy/difficult or attractive/unattractive items
- Instrument for cut-off decisions (e.g., hiring a new employee):
  - only consider items with a difficulty/attractiveness that is relevant for the required decision
  - e.g., if selecting a manager, don't ask simple arithmetic questions
  - e.g., if selecting highly depressed subjects, don't ask overly attractive questions ("I sometimes feel sad")
What results in high test reliability?
- Large item correlations result in high reliability
- Items with larger variances contribute more to the reliability
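One way to see why both points hold is to write out a reliability coefficient explicitly. The formula below assumes that reliability is estimated with Cronbach's alpha, which the study material does not name; $k$ is the number of items, $\sigma_i^2$ the variance of item $i$, $r_{ij}$ the correlation between items $i$ and $j$, and $\sigma_X^2$ the variance of the total score.

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^{2}}{\sigma_X^{2}}\right),
\qquad
\sigma_X^{2} = \sum_{i=1}^{k}\sigma_i^{2} + 2\sum_{i<j} r_{ij}\,\sigma_i\,\sigma_j
```

Larger correlations $r_{ij}$ increase the covariance part of $\sigma_X^2$, so the ratio inside the parentheses shrinks and $\alpha$ rises. For positively correlated items, a larger standard deviation $\sigma_i$ also multiplies into every covariance term that involves item $i$, which is why high-variance items carry more weight.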
What do we call the concept that describes how well a given item can distinguish between people who differ on the underlying construct, that is, how well a given item can predict the construct?
What does it mean when items discriminate well?
When using the item-test correlation, why is the correlation biased upwards?
- because the item is itself part of the test score, you are also correlating the item with itself, so even bad items show some correlation.
How can you correct the upward bias of the item-test correlation?
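A common correction is the item-rest correlation: the item is correlated with the total score of the remaining items only. The sketch below compares both correlations for one weak item; the helper names and the data are hypothetical.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def item_test_and_item_rest(data, j):
    """Correlate item j with the total score and with the total minus item j."""
    item = [row[j] for row in data]
    total = [sum(row) for row in data]
    rest = [t - i for t, i in zip(total, item)]  # test score without the item itself
    return pearson(item, total), pearson(item, rest)

# Hypothetical 0/1 scores of 6 respondents on 4 items; the last item is noisy.
responses = [
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 0, 1],
]
r_test, r_rest = item_test_and_item_rest(responses, 3)
print(round(r_test, 3), round(r_rest, 3))
```

For this noisy item the item-test correlation is positive only because the item is counted in the total; once the item is left out of the total, the correlation turns negative.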
Why are items with a high variance more useful than items with a low variance?