How to Construct a Psychological Test
Here are the basic steps to constructing a useful psychological test:
1) Determine the trait, ability, emotional state, disorder, interest, or attitude that you want to assess. Psychological tests can be created to measure:
Abilities, such as musical skill, writing skill, intelligence, or reading comprehension.
Personality Traits, such as extroversion, creativity, or deviousness.
Disorders, such as anxiety, depression, or psychotic thought disorder.
Emotions, such as happiness and anger.
Attitudes, such as authoritarianism or prejudice.
Interests, such as career-related interests.
2) Decide how you want to measure the construct you selected. In general, the best measures sample the behavior of interest. For instance, if you want to determine how aggressive a person is, the most direct measure would be to present a frustrating situation and see whether the person reacts aggressively. It’s not always practical or ethical to measure a construct directly, so many tests rely instead on a person’s self-report of their behavior.
A number of other factors need to be considered. Should the test be written, or should it be administered orally? Should the responses be discrete (a rating scale, or Yes/No answers), or should it allow open-ended answers that can be reliably rated? Should the responses be oral, written, or nonverbal?
3) Decide whether the construct that you want to measure has only one dimension, or whether it can be broken down into several dimensions. For instance, intelligence is usually considered multi-dimensional, consisting of several different verbal and nonverbal abilities. A multi-dimensional construct generally calls for a separate subscale for each dimension.
4) Once you’ve made decisions about the factors above, you can begin creating your test items. If the items measure a particular area of knowledge, review textbooks or consult subject-matter experts in that area. If you are measuring a personality trait or emotional state, the items should be consistent with a theory or an agreed-upon description of what you are measuring. It’s generally best for several experts to generate items.
5) After generating items, it often makes sense to have experts rate the quality of the items, and to retain only the items with the highest ratings. The experts can also suggest revisions. If your items measure depression, the experts should be mental health professionals. If your items measure business skill, your experts should be business executives and managers.
6) Your test is then ready to be tried out on a sample of people. Your sample should be a good cross-section of the population against which you will want to compare test-takers. After you administer your test to the sample:
-Determine the correlation between each item and the sum of the other items. If your test has subscales, do this separately for each subscale. Eliminate items that do not correlate well with the rest of their scale.
-Eliminate items that are too easy or too hard. If almost everyone, or almost no one, agrees with an item or gets the correct answer, the item does little to distinguish one test-taker from another and is not useful.
-These steps will increase the test’s internal consistency, one measure of reliability. You should calculate coefficient alpha. This statistic reflects how strongly the items in a scale relate to one another, and therefore the degree to which they are all measuring the same ability or trait. Alpha has a theoretical maximum of +1.00. By a common rule of thumb, a good test alpha is greater than .70.
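The item analysis described in step 6 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the small table of yes/no responses are made up for the example, and it uses the standard textbook formula for coefficient alpha, (k / (k - 1)) * (1 - sum of item variances / variance of total scores), where k is the number of items.

```python
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable never varies."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    if ssx == 0 or ssy == 0:
        return 0.0
    return cov / (ssx * ssy) ** 0.5


def item_rest_correlations(responses):
    """Correlate each item with the sum of the OTHER items on the scale."""
    n_items = len(responses[0])
    result = []
    for i in range(n_items):
        item = [row[i] for row in responses]
        rest = [sum(row) - row[i] for row in responses]
        result.append(pearson(item, rest))
    return result


def endorsement_rates(responses):
    """Proportion of people who agreed with (or passed) each 0/1 item."""
    n_items = len(responses[0])
    return [mean(row[i] for row in responses) for i in range(n_items)]


def coefficient_alpha(responses):
    """Coefficient alpha; needs at least 2 items and some spread in totals."""
    k = len(responses[0])
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: one row per person, one column per item (1 = yes/correct).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
]

print("Item-rest correlations:", item_rest_correlations(responses))
print("Endorsement rates:", endorsement_rates(responses))
print("Coefficient alpha:", coefficient_alpha(responses))
```

In this made-up sample, everyone endorses the first item, so it has no variance and cannot correlate with the rest of the scale; it would be a candidate for removal. For a test with subscales, the same functions would be run on each subscale's columns separately.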
7) The final test should be cross-validated on a new sample. During cross-validation, you can demonstrate test validity:
-You should be able to show that your test scores correlate with what they are supposed to correlate with. For instance, a test of math skill should yield higher scores for students with higher math grades. A test of depression should yield higher scores for people who have been diagnosed with Major Depression.
-Factor analysis can be used to demonstrate that the test subscales group together (inter-correlate) in the way that theory would predict.
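As a rough illustration of the first validity check above, the sketch below correlates test scores with an outside criterion. All of the numbers are invented for the example, and the `pearson` helper is a plain implementation of the Pearson correlation rather than part of any particular testing package. When the criterion is a yes/no diagnosis coded 1/0, the same calculation gives what is usually called a point-biserial correlation. Factor analysis is not shown here, since it normally requires a dedicated statistics package.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable never varies."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    if ssx == 0 or ssy == 0:
        return 0.0
    return cov / (ssx * ssy) ** 0.5


# Hypothetical math-skill test scores and the same students' math grades.
math_test_scores = [12, 15, 9, 20, 17, 11]
math_grades = [70, 78, 62, 95, 85, 69]
r_math = pearson(math_test_scores, math_grades)

# Hypothetical depression-scale scores and diagnosis (1 = diagnosed, 0 = not).
depression_scores = [5, 8, 21, 18, 7, 25]
diagnosed = [0, 0, 1, 1, 0, 1]
r_depression = pearson(depression_scores, diagnosed)

print("Math test vs. grades:", round(r_math, 2))
print("Depression scale vs. diagnosis:", round(r_depression, 2))
```

A clearly positive correlation in the cross-validation sample is evidence that the test relates to what it is supposed to relate to; a value near zero suggests that it does not.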
8) When the test is cross-validated, you can also calculate normative data. You can calculate the mean (average) score for test-takers, and calculate the standard deviation to determine how spread out the scores are around the mean. These statistics are extremely useful, because now any individual’s score can be compared to the scores of people in general.
If your test has subscales, you will find the mean and standard deviation for each subscale. It is also often useful to find separate normative data for different groups of potential test takers. Many tests have norms according to gender, ethnic group, and age.
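One way to put normative data to work is sketched below: the mean and standard deviation of the norm sample are used to turn an individual's raw score into a z-score (the number of standard deviations above or below the mean) and a percentile rank. The function names and the sample scores are assumptions made for this example, not a standard interface.

```python
from statistics import mean, stdev


def build_norms(scores):
    """Summarize a norm sample by its mean, standard deviation, and size."""
    return {"mean": mean(scores), "sd": stdev(scores), "n": len(scores)}


def z_score(raw, norms):
    """How many standard deviations a raw score lies from the norm mean."""
    return (raw - norms["mean"]) / norms["sd"]


def percentile_rank(raw, scores):
    """Percent of the norm sample scoring below raw, counting ties as half."""
    below = sum(s < raw for s in scores)
    equal = sum(s == raw for s in scores)
    return 100 * (below + 0.5 * equal) / len(scores)


# Hypothetical total scores from a cross-validation sample.
norm_sample = [14, 18, 21, 22, 25, 25, 27, 30, 33, 35]
norms = build_norms(norm_sample)

raw = 30  # one individual's raw score
print("Norms:", norms)
print("z-score:", round(z_score(raw, norms), 2))
print("Percentile rank:", percentile_rank(raw, norm_sample))
```

To produce separate norms by subscale, gender, or age group, the same functions would simply be applied to each group's scores on its own.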
