Survey Validity

Measuring survey validity

Several different types of validity must be considered when designing and deploying survey research instruments.

Construct Validity

Construct validity is the extent to which surveys developed from a theory actually measure what the theory says they do. For example, to what degree is an IQ questionnaire actually measuring “intelligence”?

Construct validity evidence involves the empirical and theoretical support for the interpretation of a construct. Such evidence includes statistical analyses of a survey’s internal structure, such as the relationships between responses to different survey items.
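One simple way to look at a survey's internal structure is to correlate every pair of items with each other. The sketch below is only an illustration: the item names (q1 to q3), the 1 to 5 response scale, and the response data are all hypothetical, and a real analysis would normally use far more respondents and a dedicated statistics package.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical responses: one list per item, one entry per respondent (1-5 scale).
items = {
    "q1": [4, 5, 2, 3, 5, 1],
    "q2": [4, 4, 2, 3, 5, 2],
    "q3": [2, 1, 4, 4, 1, 5],  # assumed to be a reverse-worded item
}

def inter_item_correlations(items):
    """Return {(item_a, item_b): r} for every distinct pair of items."""
    names = sorted(items)
    return {
        (a, b): pearson(items[a], items[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

correlations = inter_item_correlations(items)
for pair, r in correlations.items():
    print(pair, round(r, 2))
```

Items intended to tap the same construct would be expected to correlate substantially, with reverse-worded items correlating negatively until they are recoded.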

Convergent Validity

Convergent validity refers to the degree to which two measures that theoretically should be related are in fact related. For instance, to show the convergent validity of a test of mathematics skills, scores on the test can be correlated with scores on other tests that are also designed to measure basic mathematics ability. High correlations between the test scores would be evidence of convergent validity.
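The correlation described above can be sketched in a few lines. The two score lists are made-up data for eight hypothetical test takers, and the variable names are assumptions chosen for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same eight people on two mathematics tests.
new_test_scores = [55, 62, 70, 48, 90, 81, 66, 73]
established_test_scores = [58, 60, 75, 50, 88, 79, 70, 71]

r = pearson(new_test_scores, established_test_scores)
print(f"correlation between the two tests: {r:.2f}")
```

A value close to 1 would count as evidence of convergent validity. How high is high enough is a judgment call that depends on the field and on how similar the two tests are meant to be.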

Content Validity

This is a non-statistical type of validity that involves the systematic examination of the survey content to determine whether it covers a representative sample of the behavior domain to be measured. For example, does an IQ questionnaire contain items covering all areas of intelligence?

Content validity evidence involves the degree to which the content of the survey matches a content domain associated with the construct. For instance, a survey of the ability to add two numbers should include a range of combinations of digits. A survey with only one-digit numbers, or only even numbers, would not have good coverage of the content domain.
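Content validity is judged by inspection rather than by statistics, but a quick script can flag the coverage gaps mentioned in the addition example. This is a minimal sketch: the item bank and the four coverage checks are hypothetical, and a real content review would rely on expert judgment of the whole domain.

```python
# Hypothetical item bank: each item is a pair of numbers to be added.
items = [(3, 4), (12, 7), (8, 8), (25, 16), (6, 9), (40, 13)]

def coverage_report(items):
    """Report which parts of the (assumed) content domain the items touch."""
    numbers = [n for pair in items for n in pair]
    return {
        "has_one_digit": any(n < 10 for n in numbers),
        "has_multi_digit": any(n >= 10 for n in numbers),
        "has_even": any(n % 2 == 0 for n in numbers),
        "has_odd": any(n % 2 == 1 for n in numbers),
    }

def coverage_gaps(items):
    """List the checks that no item satisfies."""
    return [name for name, covered in coverage_report(items).items() if not covered]

print(coverage_gaps(items))             # the full bank above
print(coverage_gaps([(2, 4), (6, 8)]))  # only one-digit, only even numbers
```

The second call represents the poorly covering survey from the example: it contains only one-digit, even numbers, so the multi-digit and odd checks are reported as gaps.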

Face Validity

Face validity is an estimate of whether a survey appears to measure a certain criterion. A measure might have high validity overall, yet still have low face validity if it does not appear to measure what it claims to measure.

Because respondents may answer more honestly when the purpose of a survey is not obvious, it is sometimes best to administer measures so that they appear to have low face validity.

Face validity is very closely related to content validity. While content validity depends on a theoretical basis that a survey is assessing all domains of a certain criterion (for example, does assessing addition skills yield a good measure for mathematical skills?), face validity relates to whether a survey appears to be a good measure or not. This judgment is made on the “face” of the survey.

Criterion Validity

Criterion validity involves the correlation between the survey and a criterion variable (or variables) taken as representative of the construct. In other words, criterion validity compares the survey with other measures or outcomes (the criteria) already held to be valid.

For example, employee selection surveys are often validated against measures of job performance (the criterion), and IQ surveys are often validated against measures of academic performance (the criterion).

Concurrent Validity

Concurrent validity refers to the degree to which the operationalization correlates with other measures of the same construct that are measured at the same time.

When the measure is compared with another measure of the same construct taken at the same time, the two should be related (correlated). For example, employee selection surveys could be administered to current employees and the results correlated with those employees’ scores on performance reviews.

Predictive Validity

Predictive validity refers to the degree to which the operationalization can predict (or correlate with) other measures of the same construct that are measured at some time in the future. For example, employee selection surveys are administered to applicants, all applicants are hired, their performance is reviewed at a later time, and their scores on the two measures are then correlated.
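The employee selection example can be sketched as a least-squares fit of later performance on the earlier survey score. All numbers below are invented, and the variable names are assumptions; the point is only to show the shape of the calculation.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(x, y, slope, intercept):
    """Share of the variance in y accounted for by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot

# Hypothetical data: survey score at hiring, performance rating one year later.
selection_scores = [50, 60, 65, 70, 80, 90]
later_ratings = [2.8, 3.1, 3.0, 3.6, 4.1, 4.4]

slope, intercept = fit_line(selection_scores, later_ratings)
r2 = r_squared(selection_scores, later_ratings, slope, intercept)
print(f"slope={slope:.3f}, intercept={intercept:.2f}, r^2={r2:.2f}")
```

A positive slope with a large share of variance explained would support predictive validity. Hiring every applicant, as in the example, matters here: if only high scorers were hired, the restricted range of scores would tend to weaken the observed relationship.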

Predictive validity also covers cases where a measurement is used to forecast whether some other outcome will occur in the future. This type of validity also matters from a public standpoint: will the measure look acceptable to the public or not?

Statistical Conclusion Validity

This is the degree to which conclusions about the relationship among variables based on the data are correct or “reasonable.” Statistical conclusion validity involves ensuring the use of adequate sampling procedures, appropriate statistical tests, and reliable measurement procedures.

Because this type of validity is concerned only with whether a relationship exists among variables, the relationship in question may be purely correlational rather than causal.
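As one illustration of checking whether an observed relationship is "reasonable", the sketch below runs a permutation test on a correlation. The permutation test is a technique chosen here for illustration and is not prescribed by the text; the data, the number of permutations, and the seed are all assumptions.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for the correlation between x and y."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real pairing between x and y
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)

# Hypothetical paired measurements from ten respondents.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]

p_value = permutation_p_value(x, y)
print(f"r={pearson(x, y):.2f}, permutation p-value={p_value:.4f}")
```

A small p-value suggests the correlation is unlikely to be a chance artifact of this sample. It says nothing about causation, which is a question for internal validity.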

Internal Validity

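Internal validity is the degree to which a study supports the conclusion that an observed effect was caused by the variable under study rather than by other factors. Eight kinds of confounding variables can hinder internal validity: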

History: the specific events occurring between the first and second measurements in addition to the experimental variables
Maturation: processes within the participants as a function of the passage of time (not specific to particular events), such as growing older, hungrier, or more tired
Testing: the effect that taking a survey has on the scores of a second administration
Instrumentation: changes in the calibration of a measurement tool, or in the observers or scorers, that might produce changes in the obtained measurements
Statistical regression: operates where groups have been selected on the basis of their extreme scores
Selection: biases resulting from differential selection of respondents for the comparison groups
Experimental mortality: differential loss of respondents from the comparison groups
Selection-maturation interaction: arises, for example, in multiple-group quasi-experimental designs
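Of the threats listed above, statistical regression is the easiest to demonstrate with a simulation. The sketch below assumes a simple model in which each observed score is a stable true score plus independent measurement noise. The population mean of 100, the standard deviations, and the seed are arbitrary choices.

```python
import random

def simulate_regression_to_mean(n=5000, top_fraction=0.1, seed=42):
    """Select the top scorers on a first measurement and re-measure them.

    Returns (mean first score, mean second score) for the selected group.
    No treatment is applied between the two measurements.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(100, 10) for _ in range(n)]
    first = [t + rng.gauss(0, 10) for t in true_scores]
    second = [t + rng.gauss(0, 10) for t in true_scores]

    # Select respondents on the basis of extreme first-measurement scores.
    cutoff = sorted(first)[int(n * (1 - top_fraction))]
    selected = [i for i in range(n) if first[i] >= cutoff]

    mean_first = sum(first[i] for i in selected) / len(selected)
    mean_second = sum(second[i] for i in selected) / len(selected)
    return mean_first, mean_second

mean_first, mean_second = simulate_regression_to_mean()
print(f"selected group, first measurement:  {mean_first:.1f}")
print(f"selected group, second measurement: {mean_second:.1f}")
```

Under these assumptions the selected group scores lower on the second measurement even though nothing was done to it, because part of its extreme first score was noise. A study that selects extreme scorers and then credits an intervention for the change is exposed to this threat.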

External Validity

External validity refers to the extent to which the (internally valid) results of a study hold for other cases, such as different people, places, or times. If the same research study were conducted in those other cases, would it produce the same results?

A major concern is whether the study sample (that is, the research participants) is representative of the general population along relevant dimensions. Other factors that can jeopardize external validity are:

Reactive or interaction effect of testing: a pre-survey might increase the scores on a post-survey
Interaction effects of selection biases and the experimental variable
Reactive effects of experimental arrangements, which would preclude generalization about the effect of the experimental variable upon persons being exposed to it in non-experimental settings
Multiple-treatment interference: where effects of earlier treatments are not erasable
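The representativeness concern raised above can be checked roughly by comparing the sample's composition with known population shares. The sketch below computes a chi-square goodness-of-fit statistic by hand. The age groups, population shares, and sample counts are hypothetical, and age is only one of many dimensions that might be relevant.

```python
def chi_square_gof(sample_counts, population_shares):
    """Chi-square goodness-of-fit statistic for a sample against known shares."""
    total = sum(sample_counts.values())
    statistic = 0.0
    for group, share in population_shares.items():
        expected = total * share
        observed = sample_counts.get(group, 0)
        statistic += (observed - expected) ** 2 / expected
    return statistic

# Hypothetical population shares and sample counts by age group.
population_shares = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}
sample_counts = {"18-34": 90, "35-54": 60, "55+": 50}

# Chi-square critical value for alpha = 0.05 with 2 degrees of freedom
# (number of groups minus one).
CRITICAL_VALUE = 5.991

statistic = chi_square_gof(sample_counts, population_shares)
differs = statistic > CRITICAL_VALUE
print(f"chi-square={statistic:.2f}, differs from population: {differs}")
```

A statistic above the critical value suggests the sample's age mix differs from the population's, which would be a reason for caution before generalizing the results or a prompt to consider weighting.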

Ecological Validity

Ecological validity is closely related to external validity, but it questions to what degree research results can be applied to real-life situations (outside of research settings) and how much experimental findings mirror what can be observed in the real world. To be ecologically valid, the methods, materials, and setting of a study must approximate the real-life situation that is under investigation.

Ecological validity is partly related to the issue of experiment versus observation. In science, there are typically two domains of research: observational (passive) and experimental (active).

The purpose of experimental designs is to test causality so that you can infer that A causes B or that B causes A. Sometimes, however, ethical or methodological restrictions prevent you from conducting an experiment (for example, how does isolation influence a child’s cognitive functioning?). You can still do research in that case, but it is correlational rather than causal: you can only conclude that A occurs together with B. Both techniques have their strengths and weaknesses.

Internal vs External Validity

Internal and external validity seem to contradict each other at first glance. To get an experimental design, you have to control for all interfering variables. That’s why you often conduct your experiment in a laboratory setting. While gaining internal validity (excluding interfering variables by keeping them constant), you lose ecological or external validity because you establish an artificial laboratory setting.

On the other hand, with observational research, you can measure in the natural (ecological) environment where the behavior normally occurs, but you can’t control for interfering variables, so you sacrifice internal validity.

The apparent contradiction between internal validity and external validity is, however, only superficial. If the goal of a study is to deductively test a theory, the only concern is factors that might undermine the rigor of the study, such as threats to internal validity.