Validity of Your Survey Results

Validity is important because it guides which survey questions to use and helps ensure that researchers are using questions that truly measure the issues of importance.  The validity of a survey is the degree to which it measures what it claims to measure.

Several different types of validity must be considered when designing and deploying survey research instruments.

Construct validity refers to the extent to which operationalizations of a construct (i.e., practical surveys developed from a theory) actually measure what the theory says they do.  For example, to what extent is an IQ questionnaire actually measuring “intelligence”?

Construct validity evidence involves the empirical and theoretical support for the interpretation of the construct.  Such lines of evidence include statistical analyses of the internal structure of the survey including the relationships between responses to different survey items.  They also include relationships between the survey and measures of other constructs.  Construct validity is not distinct from the support for the substantive theory of the construct that the survey is designed to measure.  As such, experiments designed to reveal aspects of the causal role of the construct also contribute to construct validity evidence.
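As a rough illustration of the internal-structure analysis described above, the following sketch computes pairwise correlations between responses to different survey items.  The item names and response data are hypothetical, and a real analysis would use a far larger sample and usually a dedicated statistics package.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical responses: one list per item, one entry per respondent (1-5 scale).
items = {
    "q1": [1, 2, 3, 4, 5, 4],
    "q2": [2, 2, 3, 5, 5, 4],
    "q3": [5, 4, 3, 1, 1, 2],  # hypothetical reverse-worded item
}

# Correlate every pair of items to inspect the internal structure.
inter_item = {
    (a, b): pearson(items[a], items[b]) for a, b in combinations(items, 2)
}
for pair, r in inter_item.items():
    print(pair, round(r, 2))
```

Items intended to measure the same construct would be expected to correlate strongly with one another, with reverse-worded items correlating negatively.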

Convergent validity refers to the degree to which a measure is correlated with other measures that it is theoretically predicted to correlate with.

Content validity is a non-statistical type of validity that involves “the systematic examination of the survey content to determine whether it covers a representative sample of the behavior domain to be measured” (Anastasi & Urbina, 1997, p. 114).  For example, does an IQ questionnaire have items covering all areas of intelligence discussed in the scientific literature?

Content validity evidence involves the degree to which the content of the survey matches a content domain associated with the construct.  For example, a survey of the ability to add two numbers should include a range of combinations of digits.  A survey with only one-digit numbers, or only even numbers, would not have good coverage of the content domain.  Content-related evidence typically involves subject matter experts (SMEs) evaluating survey items against the survey specifications.
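The addition example can be made concrete with a small coverage check.  This is only a sketch: the item pool and the four coverage categories are hypothetical, and in practice the categories would come from the survey specification drawn up by subject matter experts.

```python
# Hypothetical item pool: each item is a pair of numbers to add.
items = [(2, 4), (6, 8), (4, 4), (8, 2)]

def coverage_report(items):
    """Summarize which parts of the content domain the items touch."""
    operands = [n for pair in items for n in pair]
    return {
        "has_odd_operands": any(n % 2 == 1 for n in operands),
        "has_even_operands": any(n % 2 == 0 for n in operands),
        "has_multi_digit_operands": any(abs(n) >= 10 for n in operands),
        "has_single_digit_operands": any(abs(n) < 10 for n in operands),
    }

report = coverage_report(items)

# Categories that no item covers are gaps in content coverage.
gaps = [name for name, covered in report.items() if not covered]
print(gaps)
```

Here the pool contains only even, one-digit numbers, so the check reports that odd and multi-digit operands are missing from the content domain.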

A survey has content validity built into it by careful selection of which items to include (Anastasi & Urbina, 1997).  Items are chosen so that they comply with the survey specification, which is drawn up through a thorough examination of the subject domain.  Foxcroft, Paterson, le Roux & Herbst (2004, p. 49) note that the content validity of a survey can be improved by using a panel of experts to review the survey specifications and the selection of items.  The experts can review the items and comment on whether they cover a representative sample of the behavior domain.

Representation validity, also known as translation validity, is about the extent to which an abstract theoretical construct can be turned into a specific practical survey.

Face validity is an estimate of whether a survey appears to measure a certain criterion; it does not guarantee that the survey actually measures phenomena in that domain.  A measure may have high validity, but if the survey does not appear to measure what it actually measures, it has low face validity.  Indeed, when a survey is subject to faking (malingering), low face validity might make the survey more valid.  Because respondents may answer more honestly when face validity is lower, it is sometimes important to make a survey appear to have low face validity while the measures are being administered.

Face validity is very closely related to content validity.  Content validity depends on a theoretical basis for judging whether a survey assesses all domains of a certain criterion.  (For example, does assessing addition skills yield a good measure of mathematical skills?  To answer this, you have to know what different kinds of arithmetic skills mathematical skills include.)  Face validity, by contrast, relates to whether a survey appears to be a good measure or not.  This judgment is made on the “face” of the survey, so it can also be made by an amateur.

Face validity is a starting point, but a survey should NEVER be assumed to be valid for any given purpose on that basis alone, as the “experts” have been wrong before.  The Malleus Maleficarum (Hammer of Witches) had no support for its conclusions other than the self-imagined competence of two “experts” in “witchcraft detection,” yet it was used as a “survey” to condemn and burn at the stake tens of thousands of women as “witches.”

Criterion validity evidence involves the correlation between the survey and a criterion variable (or variables) taken as representative of the construct.  In other words, it compares the survey with other measures or outcomes (the criteria) already held to be valid.  For example, employee selection surveys are often validated against measures of job performance (the criterion), and IQ surveys are often validated against measures of academic performance (the criterion).

If the survey data and criterion data are collected at the same time, this is referred to as concurrent validity evidence.  If the survey data are collected first in order to predict criterion data collected at a later point in time, then this is referred to as predictive validity evidence.

Concurrent validity refers to the degree to which the operationalization correlates with other measures of the same construct that are measured at the same time.  When the measure is compared to another measure of the same type, they will be related (or correlated).  Returning to the selection survey example, this would mean that the surveys are administered to current employees and then correlated with their scores on performance reviews.

Predictive validity refers to the degree to which the operationalization can predict (or correlate with) other measures of the same construct that are measured at some time in the future.  Again, with the selection survey example, this would mean that the surveys are administered to applicants, all applicants are hired, their performance is reviewed at a later time, and then their scores on the two measures are correlated.

Predictive validity also applies when your measurement predicts a relationship between what you are measuring and something else, that is, whether or not the other thing will happen in the future.  This type of validity is important from a public view standpoint; is this going to look acceptable to the public or not?
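The concurrent and predictive designs described above differ only in when the criterion is collected, so both can be sketched as a correlation between survey scores and a criterion.  All numbers below are hypothetical, and the variable names are illustrative assumptions rather than part of any standard procedure.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical selection survey scores for six people.
survey_scores = [55, 60, 65, 70, 75, 80]

# Concurrent design: criterion (performance review) collected at the same time.
reviews_now = [2.9, 3.1, 3.0, 3.6, 3.8, 4.1]

# Predictive design: criterion collected at a later point in time.
reviews_later = [3.0, 2.8, 3.4, 3.3, 3.9, 4.0]

concurrent_r = pearson(survey_scores, reviews_now)
predictive_r = pearson(survey_scores, reviews_later)
print(round(concurrent_r, 2), round(predictive_r, 2))
```

A higher correlation indicates stronger criterion validity evidence, although with a sample this small the estimate would be far too unstable to rely on.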

The validity of the design of experimental research studies is a fundamental part of the scientific method, and a concern of research ethics.  Without a valid design, valid conclusions cannot be drawn.

Statistical conclusion validity is the degree to which conclusions about the relationship among variables based on the data are correct or ‘reasonable’.  Originally it concerned solely whether the statistical conclusion about the relationship between the variables was correct, but there is now a movement towards ‘reasonable’ conclusions that draw on quantitative, statistical, and qualitative data.

Statistical conclusion validity involves ensuring the use of adequate sampling procedures, appropriate statistical tests, and reliable measurement procedures.  As this type of validity is concerned solely with the relationship that is found among variables, the relationship may be solely a correlation.

Internal validity is an inductive estimate of the degree to which conclusions about causal relationships can be made (e.g. cause and effect), based on the measures used, the research setting, and the whole research design.  Good experimental techniques, in which the effect of an independent variable on a dependent variable is studied under highly controlled conditions, usually allow for higher degrees of internal validity than, for example, single-case designs.

Eight kinds of confounding variables can interfere with internal validity (i.e. with the attempt to isolate causal relationships):

  1. History, the specific events occurring between the first and second measurements in addition to the experimental variables;
  2. Maturation, processes within the participants as a function of the passage of time (not specific to particular events), e.g., growing older, hungrier, more tired, and so on;
  3. Surveying, the effects of taking a survey upon the scores of a second surveying;
  4. Instrumentation, changes in calibration of a measurement tool or changes in the observers or scorers may produce changes in the obtained measurements;
  5. Statistical regression, operating where groups have been selected on the basis of their extreme scores;
  6. Selection, biases resulting from differential selection of respondents for the comparison groups;
  7. Experimental mortality, or differential loss of respondents from the comparison groups;
  8. Selection-maturation interaction and similar interactions, e.g., in multiple-group quasi-experimental designs.
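Of the threats listed above, statistical regression is perhaps the least intuitive, and a small simulation can make it visible.  The sketch below assumes a simple model in which each observed score is a stable true score plus independent random noise; the means, spreads, and 10% cutoff are arbitrary choices for illustration.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is repeatable

# Each simulated respondent has a stable true score plus independent
# measurement noise on each of two administrations.
true_scores = [random.gauss(50, 10) for _ in range(10_000)]
first = [t + random.gauss(0, 10) for t in true_scores]
second = [t + random.gauss(0, 10) for t in true_scores]

# Select the group with extreme (top 10%) scores on the first measurement.
cutoff = sorted(first)[int(0.9 * len(first))]
selected = [i for i, score in enumerate(first) if score >= cutoff]

# Compare the selected group's average on both measurements.
mean_first = mean([first[i] for i in selected])
mean_second = mean([second[i] for i in selected])
print(round(mean_first, 1), round(mean_second, 1))
```

The selected group scores lower on the second measurement even though nothing was done to it, so a change of this kind should not be read as an effect of the experimental variable.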

External validity concerns the extent to which the (internally valid) results of a study can be held to be true for other cases, for example for different people, places, or times.  In other words, it is about whether findings can be validly generalized.  If the same research study were conducted in those other cases, would it get the same results?

A major factor in this is whether the study sample (e.g. the research participants) is representative of the general population along relevant dimensions.  Other factors jeopardizing external validity are:

  1. Reactive or interaction effect of surveying, e.g., a pre-survey might increase the scores on a post-survey;
  2. Interaction effects of selection biases and the experimental variable;
  3. Reactive effects of experimental arrangements, which would preclude generalization about the effect of the experimental variable upon persons being exposed to it in non-experimental settings;
  4. Multiple-treatment interference, where effects of earlier treatments are not erasable.
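Whether a sample is representative along a relevant dimension can be checked roughly by comparing its composition with known population figures.  In the sketch below, the age groups, population shares, sample, and 10-point tolerance are all hypothetical values chosen for illustration.

```python
from collections import Counter

# Hypothetical population shares for one relevant dimension (age group).
population_share = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}

# Hypothetical sample: one age group label per respondent.
sample = ["18-34"] * 60 + ["35-54"] * 30 + ["55+"] * 10

def representation_gaps(sample, population_share):
    """Return sample share minus population share for each group."""
    counts = Counter(sample)
    n = len(sample)
    return {
        group: counts[group] / n - share
        for group, share in population_share.items()
    }

gaps = representation_gaps(sample, population_share)

# Flag groups whose share differs from the population by more than
# 10 percentage points (an arbitrary tolerance for this sketch).
flagged = [group for group, gap in gaps.items() if abs(gap) > 0.10]
print(flagged)
```

Groups that are flagged are over- or under-represented, which suggests caution before generalizing the findings to the wider population.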

Ecological validity is the extent to which research results can be applied to real life situations outside of research settings.  This issue is closely related to external validity but covers the question of to what degree experimental findings mirror what can be observed in the real world (ecology = the science of interaction between organism and its environment).  To be ecologically valid, the methods, materials, and setting of a study must approximate the real-life situation that is under investigation.

Ecological validity is partly related to the issue of experiment versus observation.  Typically in science, there are two domains of research: observational (passive) and experimental (active).  The purpose of experimental designs is to test causality, so that you can infer A causes B or B causes A.  But sometimes, ethical and/or methodological restrictions prevent you from conducting an experiment (e.g. how does isolation influence a child’s cognitive functioning?).  Then you can still do research, but it’s not causal, it’s correlational.  You can only conclude that A occurs together with B.  Both techniques have their strengths and weaknesses.

At first glance, internal and external validity seem to contradict each other – to get an experimental design you have to control for all interfering variables.  That’s why you often conduct your experiment in a laboratory setting.  While gaining internal validity (excluding interfering variables by keeping them constant) you lose ecological or external validity because you establish an artificial laboratory setting.  On the other hand, with observational research you can measure in the natural (ecological) environment, at the place where behavior normally occurs, but you can’t control for interfering variables, so you sacrifice internal validity.

The apparent contradiction between internal validity and external validity is, however, only superficial.  The question of whether results from a particular study generalize to other people, places, or times arises only when one follows an inductivist research strategy.  If the goal of a study is to deductively test a theory, one is only concerned with factors which might undermine the rigor of the study, i.e. threats to internal validity.