Principles of Psychometric Testing — What is test reliability and why is it so important?

Test Reliability
Reliability, in the context of psychometric instruments, refers to the degree to which a test consistently measures the subject matter it is designed to measure. The greater the reliability of an instrument, the more confidence one has that the scores obtained from a particular administration are essentially the same scores that would be obtained if the test were re-administered to the same test takers.
Conversely, an unreliable instrument provides inconsistent information about performance, and its scores would be expected to fluctuate unpredictably from one administration to the next. In other words, if scores on the instrument vary with the occasion or situation in the absence of any permanent change in the person due to growth, learning, disease, or injury, the test is deemed unreliable for explaining or predicting an individual’s behavior.
Strictly speaking, reliability is a property not of the test itself but of the scores obtained when the test is administered to a particular group of people on a specific occasion and under certain conditions. It is also best conceptualized as a continuum, ranging from minimal consistency of measurement to near-perfect repeatability.
Reliability is not the same as stability. In determining reliability, it is assumed that the instrument is measuring a relatively stable characteristic. Instability refers to genuine change in the characteristic itself; unreliability, by contrast, is the result of measurement errors produced by temporary internal states, such as low motivation or indisposition, or by external conditions, such as a distracting or uncomfortable testing environment.
The reliability of a set of test scores is expressed numerically as a reliability coefficient: a positive decimal number ranging from .00, indicating total unreliability of measurement, to 1.00, indicating perfect reliability. A high reliability coefficient indicates that individuals’ scores closely reflect their true status with respect to the variable being measured and that the effect of errors of measurement is small.
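The relationship between "true status" and "errors of measurement" can be written out using the standard decomposition from classical test theory. The symbols below (X for an observed score, T for the true score, E for the error, and r_XX' for the reliability coefficient) do not appear in the original text; they are conventional notation introduced here for illustration, and the variance identity assumes that errors are uncorrelated with true scores.

```latex
X = T + E, \qquad
\sigma_X^2 = \sigma_T^2 + \sigma_E^2, \qquad
r_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}
```

Under these assumptions, a coefficient of 1.00 corresponds to zero error variance, and a coefficient of .00 corresponds to observed scores that consist entirely of error.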
Because the variance of true scores cannot be computed directly, reliability is usually estimated by analyzing the effects of variations in conditions of administration and test content on observed scores. Reliability is therefore not influenced by systematic changes in scores that have a similar effect on all examinees, but only by unsystematic changes that have different effects on different examinees. Such unsystematic factors influence the error variance of the test and hence the reliability of the scores.
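The difference between systematic and unsystematic changes can be illustrated with a small simulation. This is only a sketch under stated assumptions: the true scores are simulated (in practice they are unobservable), the population values (mean 100, SD 15), the error SDs, the 10-point shift, and the helper names `pearson` and `administer` are all hypothetical choices made for this example, and the correlation between two administrations is used as the reliability estimate.

```python
import random

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (ss_x * ss_y) ** 0.5

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical true scores; real true scores can never be observed directly.
true_scores = [rng.gauss(100, 15) for _ in range(2000)]

def administer(error_sd):
    """Simulate one administration: true score plus unsystematic error."""
    return [t + rng.gauss(0, error_sd) for t in true_scores]

first = administer(error_sd=5)
second = administer(error_sd=5)

# Systematic change: every examinee gains the same 10 points.
second_shifted = [score + 10 for score in second]

# Unsystematic change: larger random error that differs per examinee.
noisy_first = administer(error_sd=15)
noisy_second = administer(error_sd=15)

r_base = pearson(first, second)
r_shifted = pearson(first, second_shifted)
r_noisy = pearson(noisy_first, noisy_second)

print(f"baseline:          {r_base:.3f}")
print(f"uniform +10 shift: {r_shifted:.3f}")
print(f"larger error:      {r_noisy:.3f}")
```

In this sketch the uniform shift leaves the correlation unchanged, because it moves every examinee by the same amount, while the larger random error lowers it substantially, because it reorders examinees between administrations.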
Each of the various methods of estimating reliability (test-retest, parallel forms, internal consistency) takes into account somewhat different conditions that can produce unsystematic changes in observed scores. Each method can therefore yield a different error variance and reliability coefficient when the test is administered under certain conditions to a specific group of people.
Validity indicates whether a test measures what it is intended to measure, while reliability indicates the consistency of the scores it produces. Both are important for judging the suitability of an instrument. A valid test is always reliable, but a reliable test is not always valid. In other words, if a test is measuring what it is supposed to measure, it will be reliable, but a reliable test can consistently measure the “wrong” thing, which makes it invalid.