| |
|
| Measurement |
Measurement
is the process of systematically assigning numbers to objects
or persons for the purpose of indicating differences among them
in the degree to which they possess the characteristic being measured.
The result of a measurement is a number - by definition.
[BACK] |
| |
|
| Nominal
Measurement |
Nominal
measurement consists of assigning items to groups or categories.
No quantitative information is conveyed and no ordering of the
items is implied. Nominal scales are therefore qualitative rather
than quantitative. Religious preference, race, and sex are all
examples of nominal scales. Frequency distributions are usually
used to analyze data measured on a nominal scale. The main statistic
computed is the mode. Variables measured on a nominal scale are
often referred to as categorical or qualitative variables.
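As a minimal sketch of the frequency distribution and mode mentioned above, the following Python snippet tallies nominal codes. The code-to-label mapping follows the transportation coding used in this entry (automobiles = 1, airplanes = 2, boats = 3); the list of observations is made-up illustration data.

```python
from collections import Counter

# Nominal codes are labels only; the numbers carry no quantity.
codes = {1: "automobile", 2: "airplane", 3: "boat"}

# Hypothetical observations (illustration data, not from any study).
observations = [1, 3, 1, 2, 1, 3]

# Frequency distribution: how often each category occurs.
frequencies = Counter(observations)

# The mode is the most frequent category.
mode_code, mode_count = frequencies.most_common(1)[0]

for code, count in sorted(frequencies.items()):
    print(f"{codes[code]}: {count}")
print(f"mode: {codes[mode_code]} ({mode_count} observations)")
```

Counting and finding the mode never add or average the codes themselves, which is why they remain meaningful for nominal data.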
The
numbers in nominal measurement are assigned as labels and have
no specific numerical value or meaning. For example, in referring
to methods of transportation we might code automobiles = 1, airplanes
= 2, boats = 3. This does not mean that three automobiles equal one boat.
No arithmetic may meaningfully be performed on the code numbers of
nominal measures; only the frequency of each category can be counted. [BACK] |
| |
|
| Ordinal
Measurement |
Measurements with ordinal scales are ordered in the sense that
higher numbers represent higher values. However, the intervals
between the numbers are not necessarily equal. For example, on
a five-point rating scale measuring attitudes toward gun control,
the difference between a rating of 2 and a rating of 3 may not
represent the same difference as the difference between a rating
of 4 and a rating of 5. There is no "true" zero point
for ordinal scales since the zero point is chosen arbitrarily.
The lowest point on the rating scale in the example was arbitrarily
chosen to be 1. It could just as well have been 0 or -5.
No
form of mathematical computation may be done with numbers representing
ordinal measures. All that can be done with such measures is to
make "greater than" or "less than" comparisons.
[BACK] |
| |
|
| Interval
Measurement |
On interval measurement scales, one unit on the scale represents
the same magnitude on the trait or characteristic being measured
across the whole range of the scale. For example, if anxiety were
measured on an interval scale, then a difference between a score
of 10 and a score of 11 would represent the same difference in
anxiety as would a difference between a score of 50 and a score
of 51. Interval scales do not have a "true" zero point,
however, and therefore it is not possible to make statements about
how many times higher one score is than another. For the anxiety
scale, it would not be valid to say that a person with a score
of 30 was twice as anxious as a person with a score of 15. True
interval measurement is somewhere between rare and nonexistent
in the behavioral sciences. No interval-level scale of anxiety
such as the one described in the example actually exists. A good
example of an interval scale is the Fahrenheit scale for temperature.
Equal differences on this scale represent equal differences in
temperature, but a temperature of 30 degrees is not twice as warm
as one of 15 degrees.
Interval
measures may be added or subtracted - but may not be used in any
computation requiring multiplication or division. The reason for
this is that an interval measurement scale does not have a true "zero"
value. On the Fahrenheit scale, for instance, "zero"
degrees does not mean there is no heat at all. [BACK] |
| |
|
| Ratio
Measurement |
Numbers are assigned that have all the attributes of nominal,
ordinal, and interval measures and, in addition, are based on a true "zero"
point. Ratio scales are like interval scales except they have
true zero points. A good example is the Kelvin scale of temperature.
This scale has an absolute zero. Thus, a temperature of 300 Kelvin
is twice as high as a temperature of 150 Kelvin.
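To illustrate why ratios are meaningful only with a true zero, the sketch below compares the Fahrenheit example from the Interval Measurement entry with the Kelvin example here. The function name fahrenheit_to_kelvin is an illustrative choice; the conversion it uses is the standard one, K = (F - 32) * 5/9 + 273.15.

```python
def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to Kelvin (standard conversion)."""
    return (f - 32.0) * 5.0 / 9.0 + 273.15

# On the Fahrenheit (interval) scale the raw numbers suggest "twice as warm".
fahrenheit_ratio = 30.0 / 15.0

# Expressed on the Kelvin (ratio) scale, the same two temperatures
# differ by only about 3 percent.
kelvin_ratio_of_same_temps = fahrenheit_to_kelvin(30.0) / fahrenheit_to_kelvin(15.0)

# On the Kelvin scale itself, 300 K really is twice 150 K.
kelvin_ratio = 300.0 / 150.0

print(fahrenheit_ratio)
print(round(kelvin_ratio_of_same_temps, 3))
print(kelvin_ratio)
```

The apparent ratio of 2 on the Fahrenheit scale is an artifact of where its zero happens to sit; it does not survive conversion to a scale with an absolute zero.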
A
"zero" value in a ratio measurement means there is a
complete absence of the variable being measured. Any form of mathematical
computation may be carried out on ratio measures. [BACK] |
| |
|
| Evaluation |
Evaluation
is the process of interpreting a measure (or aggregate of measures)
by means of a specific value (or set of values) to determine whether
the measure(s) represent a desirable or undesirable condition.
The result of an evaluation is a judgment. [BACK] |
| |
|
| Reliability |
The
degree to which an instrument is measuring whatever it is measuring,
consistently. A reliable instrument will provide consistent measures
of an object or person as long as there is no change in the object
or person on the dimension or characteristic being measured. For
example, suppose I get on my bathroom scale and the scale reads
(unfortunately) 196 lbs. If I then get off, wait a minute, get
back on, and it reads 196 lbs again, the bathroom scale may be said
to be reliable. Reliability values can range from -1.00
through 0 to +1.00. [BACK] |
| |
|
| Split
Halves Reliability |
To
determine the reliability of an instrument using the Split Halves
method, the instrument is administered (measures are taken using the
instrument). The instrument is then divided into two sets of items
(top half and bottom half, or some random distribution of items
into two groups) and the responses from the two groups are correlated.
The correlation between the responses on the top (first) half and
the bottom (second) half is the split-halves reliability index of
the instrument. A split-half reliability of -1.00 means
that those that scored highly on the first half scored poorly on
the second half and vice versa. A split-half reliability of 0.00
means that there was no correlation between the first and second
halves. A split-half reliability of 1.00 means that there was a
perfect positive correlation between the first and second halves.
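The procedure above can be sketched in Python as follows. This is an illustration under stated assumptions: the correlation is taken to be the Pearson correlation, each half is scored by summing its items, and the response data are made up. The function names pearson and split_half_reliability are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

def split_half_reliability(responses):
    """Correlate first-half totals with second-half totals.

    responses: one list of item scores per respondent.
    """
    half = len(responses[0]) // 2
    first_half = [sum(row[:half]) for row in responses]
    second_half = [sum(row[half:]) for row in responses]
    return pearson(first_half, second_half)

# Hypothetical data: 5 respondents, 4 items rated 1 to 5.
responses = [
    [5, 4, 5, 4],
    [4, 4, 3, 4],
    [3, 3, 3, 2],
    [2, 1, 2, 2],
    [1, 2, 1, 1],
]

reliability = split_half_reliability(responses)
print(round(reliability, 2))  # about 0.95 for this made-up data
```

The same pearson function applied to scores from two administrations of one instrument would give a test-retest index instead.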
[BACK] |
| |
|
| Test-Retest
Reliability |
To
determine the reliability of an instrument using the Test-Retest
method, the instrument is administered to a specific group of individuals.
Then, at a later time, the instrument is administered to the same
group, making certain that nothing has happened to the group to affect
the characteristic or dimension being measured. The data from the
first administration of the instrument are correlated with the data
from the second administration. This correlation is the Test-Retest
reliability index of the instrument. A test-retest reliability
of -1.00 means that those that scored highly on the first administration
scored poorly on the second administration and vice versa. A test-retest
reliability of 0.00 means that there was no correlation
between the first and second administrations. A test-retest
reliability of 1.00 means that there was a perfect
positive correlation between the first and second administrations
of the instrument. [BACK] |
| |
|
| Kuder-Richardson
Reliability |
In
computing the reliability using the Kuder-Richardson approach, each
and every item is considered an individual 'measure'. Then every
possible pair of individual measures (items) is considered and
the correlations are computed. The average of all such correlations
is the reliability index of the instrument as computed by the appropriate
Kuder-Richardson formula. [BACK] |
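The entry does not state a formula. As a hedged sketch, the formula commonly cited as KR-20 applies to dichotomously scored items (1 = correct or agree, 0 = otherwise) and works from item proportions and the variance of total scores rather than from explicit pairwise correlations: KR-20 = (k / (k - 1)) * (1 - sum(p * q) / variance of totals). The sketch assumes the population variance of the total scores (some texts use the sample variance), and the function name kr20 and the data are illustrative.

```python
from statistics import pvariance

def kr20(responses):
    """KR-20 for dichotomous (0/1) item scores.

    responses: one list of 0/1 item scores per respondent.
    Assumes population variance of total scores (an assumption;
    some texts use the sample variance instead).
    """
    n = len(responses)           # number of respondents
    k = len(responses[0])        # number of items
    totals = [sum(row) for row in responses]
    total_variance = pvariance(totals)

    # Sum of p * q over items, where p is the proportion scoring 1.
    pq_sum = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / n
        pq_sum += p * (1 - p)

    return (k / (k - 1)) * (1 - pq_sum / total_variance)

# Hypothetical data: 5 respondents, 4 items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print(round(kr20(responses), 2))  # 0.8 for this made-up data
```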
| |
|
| Validity |
A
valid measurement instrument is one that, in fact, measures (reliably
and with an acceptable degree of accuracy) what it was designed
to measure. NOTE: A measurement instrument may be reliable and accurate
but NOT valid. That is, it may be measuring SOMETHING reliably and
accurately, but not the thing that was intended. For example, a
bathroom scale may be a very reliable and accurate measure - but
of WEIGHT not HEIGHT or IQ. [BACK] |
| |
|
| Face
Validity |
The
degree to which an instrument appears to measure
what it is intended to measure. Face validity is usually determined
by providing the instrument to 1) experts in the issue to be measured
and 2) a sample of people of the type who will be completing the
instrument, and asking their judgment as to whether the instrument
appears to measure the issue or characteristic of interest. [BACK] |
| |
|
| Criterion-Related
Validity (predictive validity) |
The
degree to which the results of the instrument correlate with another
measure that is an unquestioned measure of the issue or characteristic
of interest. For example, the Stanford Binet Intelligence Test is
a long, exhaustive measure requiring a high degree of training to
administer and score and is a long-accepted measure of intelligence.
If we develop a short, easy-to-score IQ test we can determine its
predictive or criterion-related validity by administering it to
people who have already taken the Stanford Binet and then determining
whether our instrument is highly related to, or would have predicted,
their Stanford Binet scores. [BACK] |
| |
|
| Construct
Validity |
The
degree to which the results of the instrument follow a pattern predicted
by a model or theory. The degree to which a measure relates to other
variables as expected within a system of theoretical relationships.
[BACK] |
| |
|
| Content
Validity |
The
degree to which a measure covers the range of meanings included
within a concept. (Is every aspect or dimension that defines the
concept being measured by some item or set of items?) [BACK] |
| |
|
| Variable |
A variable is any measured characteristic or attribute that differs
for different subjects. For example, if the height of 30 trees were
measured, then height would be a variable. [BACK] |
| |
|
| Continuous
Variable |
A continuous variable is one for which any value is possible within
the limits of the variable's range. For example, a person's height is
a continuous variable ("height" exists anywhere along
the range of values possible). [BACK] |
| |
|
| Discrete
Variable |
A
discrete variable is one that cannot take on all values within the
limits of the variable. For example, responses to a five-point rating
scale can only take on the values 1, 2, 3, 4, and 5. The variable
cannot have the value 1.7. [BACK] |
| |
|
| Quantitative
Variable |
Quantitative
variables are measured on an ordinal, interval, or ratio scale.
If fifty-five-year-old subjects were asked to name their favorite
actress, then the variable would be qualitative. If the time it
took them to respond were measured, then the variable would be quantitative.
[BACK] |
| |
|
| Qualitative
Variable |
Qualitative
variables are measured on a nominal scale. If fifty-five-year-old
subjects were asked to name their favorite actress, then the variable
would be qualitative. If the time it took them to respond were measured,
then the variable would be quantitative. [BACK] |
| |
|
| Independent
Variable |
Variables
that are manipulated in research studies are referred to as independent
variables. The dependent variable is the one that you expect to
change as a function of an independent (or intervention) variable.
For example,
in a study examining whether using a hand calculator improves
learning statistics, student performance (as measured by a statistics
test) would be the dependent variable, and using (or not using)
a hand calculator would be the independent variable.
In a research study, a variable that changes as a function of
some intervention (independent variable) is the dependent variable.
[BACK]
|
| |
|
| Dependent
Variable |
When
an experiment is conducted, some variables are manipulated by the
experimenter and others are measured from the subjects. The former
variables are called independent variables or factors; the latter
are called dependent variables or dependent
measures. (See the Independent Variable discussion
above.) [BACK] |
| |
|
| SCALES:
General Discussion |
This
section is excerpted from Mehrens and Lehmann's "Measurement
and Evaluation in Education and Psychology" 3rd edition,
1984, published by Holt, Rinehart & Winston.
Scales
designed to measure attitudes (beliefs, perceptions, philosophical
positions, orientation, prejudices, etc.) are classified in terms
of their method of construction. There are three major procedures or
techniques for constructing attitude scales: summated ratings
such as the Minnesota Scale for the Survey of Public Opinion
(Likert type); equal-appearing interval scales such as the Thurstone
and Remmers scales (Thurstone type); and cumulative scales (Guttman
type). In addition, the Semantic Differential, though not a type
of scale construction, is also used.
These
techniques differ primarily in their format: in the positioning
of the statements or adjectives along a continuum versus only
at the extremes; and in whether or not the statements are cumulative
(such as the Bogardus social distance scale). There are advantages
and disadvantages associated with each of these techniques. For
example, the Thurstone method places a premium on logic and empiricism
in its construction, but unfortunately such an instrument is somewhat
laborious to develop.
In
the Likert, Thurstone, and Guttman methods, statements are written
and assembled into a scale and the subject responds (either positively
or negatively) to each statement. On the basis of the subject's
responses, an inference is made about the respondent's attitude
toward some object(s). In the Semantic Differential, the subject
rates a particular attitude object(s) on a series of bipolar semantic
scales such as good-bad, sweet-sour, strong-weak. Each of these
approaches to constructing a scale is different. Each has its
own advantages and limitations. Each of the techniques makes different
assumptions about the kind of test items used and the information
provided, even though there are some assumptions that are basic
and common regardless of the method used. For example, each method
assumes that subjective attitudes can be measured quantitatively,
thereby permitting a numerical representation (score) of a person's
attitude. Each method assumes that a particular test item has
the same meaning for all respondents, and therefore a
given score to a particular item will connote the same attitude.
"Such assumptions may not always be justified
but as yet, no measurement technique has been developed which
does not include them." [BACK] |
| |
|
| Thurstone
Scale |
A
way of measuring people's attitudes along a single dimension by
asking them to indicate whether they agree or disagree with each
of a large set of statements (e.g. 100) that are about that attitude.
The statements are designed to be parallel in construction, with
some falling toward one end of the scale and some toward the other end,
and each trying to indicate the attitude in a slightly different
way.
This can be contrasted with a Likert scale, which
asks someone to indicate their degree of agreement or disagreement
with a single statement. For example, a Likert item would be "Please
rate on a scale of 1 (Strongly Disagree) to 4 (Strongly Agree)
the statement:
This software
was easy to use."
The corresponding
Thurstone scale would state this question in
multiple ways, e.g.:
* I had trouble finding what I wanted.
* I liked how easy the software was.
* The software has many convenient features.
* The software was confusing.
* etc.
Finally, to choose the statements people respond to, you need
to validate them. For instance, you'd have expert judges (or pre-testing
subjects) rate each of the statements according to the extent to which
it reflects either extreme of the attitude being measured. [BACK]
|
| |
|
| Likert
Scale |
A
rating scale measuring the strength of agreement with a clear
statement. Often administered in the form of a questionnaire used
to gauge attitudes or reactions.
For example:
Question: "I found the software easy to use..."
1 Strongly Disagree
2 Disagree
3 Agree
4 Strongly Agree
[BACK]
|
| |
|
| Semantic
Differential |
A
type of survey question where respondents are asked to rate their
opinion on a linear scale between two endpoints, typically with
7 levels. For example:
Please rate this software on the following dimensions:
easy to use 1 2 3 4 5 6 7 hard to use
-or-
easy to use 3 2 1 0 1 2 3 hard to use
[BACK] |
| |
|
| Guttman
Scale |
The
Guttman scale is a comparative scaling technique developed by
researcher Louis Guttman in 1944.
In a Guttman scale, a unidimensional set of items is ranked in
order, much like a Likert scale; items range from least extreme
to most extreme position. It is implicit that those who agree
with a more extreme position also agree with the less extreme
positions preceding it. The rating is scaled by summing all responses
until the first negative response in the list.
The Guttman scale has become less popular in recent years, although
it is still used occasionally.
Here is a hypothetical (extreme) example of the scale:
- Some children
occasionally require physical restraint when unruly. (Least
extreme)
- Slapping
a child's hand is an effective discipline technique.
- Spanking
is sometimes necessary to control children.
- Sometimes
children require firm discipline with a belt or whip.
- Some children
need a regular vigorous beating to keep them in line. (Most
extreme)
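The scoring rule described above (summing responses until the first negative response) can be sketched as follows. The function name guttman_score and the sample responses are illustrative assumptions; responses are listed from the least extreme item to the most extreme.

```python
def guttman_score(responses):
    """Count agreements up to, but not including, the first disagreement.

    responses: booleans ordered from least to most extreme item,
    True meaning the respondent agreed with the item.
    """
    score = 0
    for agreed in responses:
        if not agreed:
            break  # stop at the first negative response
        score += 1
    return score

# A respondent who agrees with the first three items only.
print(guttman_score([True, True, True, False, False]))  # 3

# Agreement after an earlier disagreement is not counted.
print(guttman_score([True, False, True, False, False]))  # 1
```

The second call shows a response pattern that violates the cumulative assumption; under this rule the later agreement simply does not contribute to the score.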
[BACK] |
| |
|
| Bogardus
Social Distance Scale |
A
Bogardus Social Distance Scale consists of a set of questions
that increase in terms of closeness of contact that the respondent
may or may not want with members of another racial or ethnic group.
The differences in intensity of contact presume that if the respondent
is willing to accept a given kind of association, he or she would
be willing to accept all those preceding it in the list of questions
– those with lesser intensities. For example, the person
willing to permit members of a different race or ethnicity to
live in the neighborhood will surely accept them in the community
or nation but may or may not accept them as next-door neighbors
or relatives. There is a logical structure of intensity inherent
in the set of questions.
[BACK] |
| |
|