Validity of Your Survey Results

Validity matters because it guides which survey questions to use and helps ensure that those questions truly measure the issues of interest.  The validity of a survey is the degree to which it measures what it claims to measure.

Several different types of validity must be considered when designing and deploying survey research instruments.

Construct Validity

This refers to the extent to which surveys developed from a theory actually measure what the theory says they do.  For example, to what extent is an IQ questionnaire actually measuring “intelligence”?

Construct validity evidence involves the empirical and theoretical support for the interpretation of the construct.  Such lines of evidence include statistical analyses of the internal structure of the survey including the relationships between responses to different survey items.
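One common way to look at the internal structure of a survey is to correlate responses to each pair of items. The sketch below is only an illustration of that idea: the item names, the response data, and the helper functions `pearson_r` and `inter_item_correlations` are all invented for this example and do not come from any particular survey or library.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / sqrt(var_x * var_y)

def inter_item_correlations(items):
    """Return {(item_a, item_b): r} for every pair of survey items."""
    names = sorted(items)
    return {
        (a, b): pearson_r(items[a], items[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

# Hypothetical responses (1-5 scale) from six respondents, invented for illustration.
responses = {
    "q1": [1, 2, 3, 4, 5, 5],
    "q2": [2, 2, 3, 5, 4, 5],
    "q3": [5, 4, 3, 2, 1, 1],  # a reverse-worded item, expected to correlate negatively
}

for pair, r in inter_item_correlations(responses).items():
    print(pair, round(r, 2))
```

Items meant to tap the same construct would be expected to correlate with one another (negatively, for reverse-worded items). Such a table is one line of evidence only, and says nothing by itself about whether the construct has been interpreted correctly.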

Convergent Validity

This refers to the degree to which a measure correlates with other measures that it is theoretically predicted to correlate with.

Content Validity

This is a non-statistical type of validity that involves the systematic examination of the survey content to determine whether it covers a representative sample of the behavior domain to be measured. For example, does an IQ questionnaire have items covering all areas of intelligence?

Content validity evidence involves the degree to which the content of the survey matches a content domain associated with the construct.  For example, a survey of the ability to add two numbers should include a range of combinations of digits. A survey with only one-digit numbers, or only even numbers, would not have good coverage of the content domain.
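Content validity is judged rather than computed, but the addition example can be made concrete with a small coverage check. The following is a rough sketch under stated assumptions: the function `coverage_gaps`, the four categories it checks, and both item sets are hypothetical choices made for illustration, not an established procedure.

```python
def coverage_gaps(items):
    """Report which parts of a simple addition domain a set of items misses.

    Each item is a pair of non-negative integers (a, b) for the question "a + b".
    The categories checked here are an illustrative choice, not a standard.
    """
    operands = [n for pair in items for n in pair]
    checks = {
        "one-digit operands": any(n < 10 for n in operands),
        "multi-digit operands": any(n >= 10 for n in operands),
        "even operands": any(n % 2 == 0 for n in operands),
        "odd operands": any(n % 2 == 1 for n in operands),
    }
    return [name for name, covered in checks.items() if not covered]

# Hypothetical item sets, invented for illustration.
narrow_survey = [(2, 4), (6, 8), (0, 2)]              # one-digit, even numbers only
broader_survey = [(2, 4), (7, 15), (23, 9), (40, 31)]

print(coverage_gaps(narrow_survey))   # ['multi-digit operands', 'odd operands']
print(coverage_gaps(broader_survey))  # []
```

An empty list means only that none of the listed categories is missing. Deciding which categories make up the content domain in the first place remains a matter of expert judgment.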

Representation Validity

This is also known as translation validity and is about the extent to which an abstract theoretical construct can be turned into a specific practical survey.

Face Validity

This is an estimate of whether a survey appears to measure a certain criterion.  A measure may have high validity, yet if the survey does not appear to measure what it actually measures, it has low face validity.  Because respondents may answer more honestly when face validity is lower, it is sometimes useful to make a survey appear to have low face validity while administering the measures.

Face validity is very closely related to content validity.  While content validity depends on a theoretical basis that a survey is assessing all domains of a certain criterion (e.g. does assessing addition skills yield a good measure for mathematical skills?), face validity relates to whether a survey appears to be a good measure or not.  This judgment is made on the “face” of the survey.

Criterion Validity

This involves the correlation between the survey and a criterion variable (or variables) taken as representative of the construct.  In other words, it compares the survey with other measures or outcomes (the criteria) already held to be valid.  For example, employee selection surveys are often validated against measures of job performance (the criterion), and IQ surveys are often validated against measures of academic performance (the criterion).

Concurrent Validity

This refers to the degree to which the operationalization correlates with other measures of the same construct that are measured at the same time.  When the measure is compared to another measure of the same type, they will be related (or correlated).  Returning to the selection survey example, this would mean that the surveys are administered to current employees and then correlated with their scores on performance reviews.
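For the selection survey example, the correlation in question can be sketched as follows. This is a minimal illustration, assuming made-up scores for eight current employees; the variable names and the `pearson_r` helper are hypothetical and written here only for the example.

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length, at least 2 values")
    sd_x, sd_y = pstdev(x), pstdev(y)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined when a list is constant")
    mean_x, mean_y = mean(x), mean(y)
    covariance = mean([(a - mean_x) * (b - mean_y) for a, b in zip(x, y)])
    return covariance / (sd_x * sd_y)

# Hypothetical data, invented for illustration: both measures are collected
# from the same current employees at roughly the same time.
survey_scores = [52, 61, 70, 48, 85, 77, 66, 90]
review_scores = [3.1, 3.4, 3.9, 2.8, 4.5, 4.0, 3.6, 4.7]

validity_coefficient = pearson_r(survey_scores, review_scores)
print(round(validity_coefficient, 2))
```

The same calculation applies whenever a survey is correlated with a criterion. What makes the evidence concurrent is that the survey and the criterion are measured at the same time.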

Predictive Validity

This refers to the degree to which the operationalization can predict (or correlate with) other measures of the same construct that are measured at some time in the future.  Again, with the selection survey example, this would mean that the surveys are administered to applicants, all applicants are hired, their performance is reviewed at a later time, and then their scores on the two measures are correlated.

Put differently, predictive validity holds when your measurement predicts whether some other outcome will occur in the future.

Statistical Conclusion Validity

This is the degree to which conclusions about the relationship among variables based on the data are correct or “reasonable”.  Statistical conclusion validity involves ensuring the use of adequate sampling procedures, appropriate statistical tests, and reliable measurement procedures.  As this type of validity is concerned solely with the relationship that is found among variables, the relationship may be solely a correlation.

Internal Validity

This is an inductive estimate of the degree to which conclusions about causal relationships can be made (e.g. cause and effect), based on the measures used, the research setting, and the whole research design.  Good experimental techniques, in which the effect of an independent variable on a dependent variable is studied under highly controlled conditions, usually allow for higher degrees of internal validity than, for example, single-case designs.

Eight kinds of confounding variables can interfere with internal validity (i.e. with the attempt to isolate causal relationships):

  1. History, the specific events occurring between the first and second measurements in addition to the experimental variables
  2. Maturation, processes within the participants as a function of the passage of time (not specific to particular events), e.g., growing older, hungrier, more tired, and so on
  3. Testing, the effects of taking a survey once upon the scores obtained when it is taken a second time
  4. Instrumentation, changes in calibration of a measurement tool or changes in the observers or scorers may produce changes in the obtained measurements
  5. Statistical regression, operating where groups have been selected on the basis of their extreme scores
  6. Selection, biases resulting from differential selection of respondents for the comparison groups
  7. Experimental mortality, or differential loss of respondents from the comparison groups
  8. Selection-maturation interaction, e.g., in multiple-group quasi-experimental designs.
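
Statistical regression (item 5 above) can be hard to picture, so here is a small simulation sketch. Everything in it is an assumption made for illustration: observed scores are modeled as a stable true score plus independent random noise, and the function name, score scale, and parameters are invented.

```python
import random

def simulate_regression_to_mean(n=10_000, top_fraction=0.1, seed=0):
    """Select the highest scorers on a first survey and re-measure them.

    No treatment is applied between the two measurements, so any change in
    the selected group's mean comes from measurement noise alone.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    true_scores = [rng.gauss(100, 10) for _ in range(n)]
    first = [t + rng.gauss(0, 10) for t in true_scores]
    second = [t + rng.gauss(0, 10) for t in true_scores]

    group_size = int(n * top_fraction)
    # Indices of respondents with the most extreme (highest) first scores.
    selected = sorted(range(n), key=lambda i: first[i], reverse=True)[:group_size]

    mean_first = sum(first[i] for i in selected) / group_size
    mean_second = sum(second[i] for i in selected) / group_size
    return mean_first, mean_second

mean_first, mean_second = simulate_regression_to_mean()
print(f"selected group, first survey:  {mean_first:.1f}")
print(f"selected group, second survey: {mean_second:.1f}")
```

Under these assumptions the selected group scores lower on the second survey even though nothing was done to it, which is why a change in a group chosen for its extreme scores cannot be attributed to the experimental variable without a comparison group.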

External Validity

This concerns the extent to which the (internally valid) results of a study can be held to be true for other cases, for example for different people, places or times.  If the same research study were conducted in those other cases, would it get the same results?

A major factor in this is whether the study sample (e.g. the research participants) is representative of the general population along relevant dimensions.  Other factors jeopardizing external validity are:

  1. Reactive or interaction effect of testing, where taking a pre-survey might increase the scores on a post-survey
  2. Interaction effects of selection biases and the experimental variable
  3. Reactive effects of experimental arrangements, which would preclude generalization about the effect of the experimental variable upon persons being exposed to it in non-experimental settings
  4. Multiple-treatment interference, where effects of earlier treatments are not erasable.

Ecological Validity

This is the extent to which research results can be applied to real life situations outside of research settings.  This issue is closely related to external validity but covers the question of to what degree experimental findings mirror what can be observed in the real world (ecology = the science of interaction between organism and its environment).  To be ecologically valid, the methods, materials, and setting of a study must approximate the real-life situation that is under investigation.

Ecological validity is partly related to the issue of experiment versus observation.  Typically in science, there are two domains of research: observational (passive) and experimental (active).  The purpose of experimental designs is to test causality, so that you can infer that A causes B or that B causes A.  But sometimes, ethical and/or methodological restrictions prevent you from conducting an experiment (e.g. how does isolation influence a child’s cognitive functioning?).  Then you can still do research, but it is correlational rather than causal.  You can only conclude that A occurs together with B.  Both techniques have their strengths and weaknesses.

At first glance, internal and external validity seem to contradict each other: to get an experimental design you have to control for all interfering variables.  That is why you often conduct your experiment in a laboratory setting.  While gaining internal validity (excluding interfering variables by keeping them constant), you lose ecological or external validity because you establish an artificial laboratory setting.  With observational research, on the other hand, you can measure in the natural (ecological) environment, at the place where behavior normally occurs, but you cannot control for interfering variables, so internal validity is low.

The apparent contradiction of internal validity and external validity is, however, only superficial.  The question of whether results from a particular study generalize to other people, places, or times arises only when one follows an inductivist research strategy.  If the goal of a study is to deductively test a theory, one is only concerned with factors which might undermine the rigor of the study, i.e. threats to internal validity.