The current revision of the Standards for Educational and Psychological Testing is the third version of the Standards. Like its predecessors, it is the collaborative effort of three prominent national associations interested in educational and psychological tests: the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME). The first edition of the Standards appeared in 1974. It replaced Standards for Educational and Psychological Tests and Manuals, a document published by APA in 1966 and prepared by a committee representing APA, AERA, and NCME. The 1974 Standards was revised in 1985.

To identify needed revisions to the Standards, a rigorous effort was made to obtain input from the broad measurement community. In 1993, the presidents of APA, AERA, and NCME appointed a 15-member Joint Committee whose members had diverse backgrounds in testing and assessment across a variety of settings. The Joint Committee completed the revision over 6 years. Three extended periods of public review and comment provided the committee with more than 8,000 pages of comments from more than 200 organizations and individuals. The Joint Committee considered all of this input in developing a draft document. An extensive legal review of the draft was then conducted to explore potential liability issues and to ensure compliance with existing federal law. The revised Standards represents a consensus of the Joint Committee and has the endorsement of each of its three sponsoring organizations.

Purpose of the Standards

The intent of the third edition of the Standards is to promote the sound and ethical use of tests and to provide a basis for evaluating the quality of testing practices, offering a frame of reference to ensure that all relevant issues are addressed. Like its predecessors, the third edition attempts to reflect professional consensus regarding expectations for the development, validation, and use of educational and psychological tests. The Standards is intended to speak broadly about tests and testing to individuals (e.g., students, parents, teachers, administrators, job applicants, employees, clients, patients, supervisors, executives, and evaluators, among others), institutions (e.g., schools, colleges, businesses, industry, clinics, and government agencies), and society as a whole. Test publishers can use the Standards to help decide how to develop, validate, and present tests. Test users (those who administer tests) can also use the Standards to select, use, and evaluate tests. The Standards does not attempt to provide psychometric answers to public policy issues that involve testing. Instead, it encourages making relevant technical information about tests and testing available so that those involved in policy development can be fully informed.

Organization of the Standards

The current revision of the Standards contains three parts. Part I addresses test construction, evaluation, and documentation; Part II discusses fairness in testing; and Part III covers testing applications. The standards that apply to the development of tests and those that are of interest to test publishers appear primarily in Part I. The standards identified in Part II and Part III apply primarily, but not exclusively, to test users.

Part I includes the following chapters:

  1. Validity
  2. Reliability and Errors of Measurement
  3. Test Development and Revision
  4. Scales, Norms, and Score Comparability
  5. Test Administration, Scoring, and Reporting
  6. Supporting Documentation for Tests

Part II includes the following chapters:

  7. Fairness in Testing and Test Use
  8. The Rights and Responsibilities of Test Takers
  9. Testing Individuals of Diverse Linguistic Backgrounds
  10. Testing Individuals with Disabilities

Part III includes the following chapters:

  11. The Responsibilities of Test Users
  12. Psychological Testing and Assessment
  13. Educational Testing and Assessment
  14. Testing in Employment and Credentialing
  15. Testing in Program Evaluation and Public Policy

Each chapter begins with contextual background intended to facilitate interpretation and application of the standards in that chapter. An index and a glossary that defines terms as they are used in the Standards are also provided.

Major Differences Between the Second and Third Editions of the Standards

The overall number of standards has increased since the 1985 edition for three reasons. First, new types of tests and new uses for tests evolved after the 1985 revision. Several of the new standards apply only to these new developments, whereas many of the continuing standards are broadly applicable. Second, some standards are repeated, with context-relevant wording changes, in multiple chapters to accommodate users who refer only to the chapters directly relevant to their particular setting or purpose for testing. The wording changes align each repeated standard with the content of its chapter. Third, standards addressing issues such as conflict of interest and equitable treatment of test takers have been added. According to the Standards, "The increase in the number of standards does not per se signal an increase in the obligations placed on test developers and test users."

The 1985 Standards categorized each standard as primary (to be met by all tests before they are used), secondary (desirable but not feasible in all situations), or conditional (importance varies with application). This distinction was eliminated for the third edition because it was recognized that the various standards are not absolutes. The applicability of the standards can vary in relevance and importance based on the intended use(s) of a test and the role of the person(s) participating in the testing process (e.g., test taker, test administrator, test developer, test marketer, and those who make decisions based on test results). The third edition also clarifies that some standards are broad and encompassing in their applicability, whereas others are narrower in scope. Therefore, the standards should not be applied in a literal, rigid, or cookbook fashion. Instead, whether a test or use is judged acceptable may depend on several interrelated and interacting factors. These can include (a) the extent to which the test developer has met relevant standards, (b) the degree to which the test user has met relevant standards, (c) the availability of alternative, equally promising measures, (d) the extent of empirical research that supports the intended use of the measure, and (e) professional judgment. The Standards advises that before a test is operationally used for a particular purpose, "each standard should be carefully considered to determine its applicability to the testing context under consideration."

Each chapter in the third edition contains more introductory material than did the second edition. The purpose of this additional information is to provide background for the standards specific to the chapter so that users can more easily interpret and apply the standards. The language, although prescriptive at times, "should not be interpreted as imposing additional standards."

The third edition defines and clarifies several important terms. For example, the term test is defined as an "evaluative device or procedure in which a sample of an examinee's behavior in a specified domain is obtained and subsequently evaluated and scored using a standardized process." The term test is thus broad: it includes both instruments whose responses are evaluated for quality or correctness and instruments that measure attitudes, interests, traits, and dispositions, which are often referred to as scales or inventories rather than tests. Assessment is considered a broader concept than testing, and testing is one part of assessment. Assessment refers to a process that integrates test information with background and contextual information, whereas testing refers to the results obtained from a specific instrument or instruments.

The new Standards broadens the meaning of the term construct. In previous editions, construct referred to unobservable characteristics that must be inferred from multiple, related observations, a definition that proved confusing and controversial. The third edition broadens the term to mean the "concept or characteristic that a test is designed to measure." This change requires test professionals to specify the interpretation of the construct that will be made on the basis of a score or pattern of scores. It also reflects a shift in the third edition from discussing types of validity to discussing the various lines of validity evidence that support the interpretation of a score relative to the construct the test is designed to measure.

Lines of Validity Evidence

Formerly, the Standards described three types of validity that a test may demonstrate: content, criterion-related (predictive and concurrent), and construct. The new Standards considers validity to be the degree to which multiple lines of evidence and theory "support the interpretations of test scores entailed by the proposed uses of tests." Under the new Standards, then, validity depends on how well theory and empirical evidence support the assumption that a test score reflects the construct the test purports to measure. The Standards identifies five sources of validity evidence: test content, response processes, internal structure, relations to other variables, and consequences of testing.

Three of these five sources of validity evidence reflect historical conceptions of validity that appeared in previous editions of the Standards: test content (equivalent to content validity), internal structure (equivalent to construct validity), and relations to other variables (equivalent to criterion-related validity). The two new sources of validity evidence are response processes and consequences of testing. Evidence based on response processes indicates whether the test taker used the processes intended by the test developer, rather than an unintended process, to respond to a problem. For example, in responding to a mathematics problem, did the test taker apply the intended cognitive-mathematical process or an alternative process, such as guessing? Validity evidence related to the consequences of testing emerges when the outcomes and interpretations of the test fulfill the claims made for it. For example, if a measure purports to predict who will benefit from a particular psychological treatment, how well does it actually do so?

A number of advances and developments have occurred in testing since the Standards was released in 1999. Thus, this edition of the Standards should be considered a work in progress. AERA, APA, and NCME have already begun the revision process for the fourth edition. Although an exact publication date cannot be determined, it is expected to come early in the next decade.

Thomas Kubiszyn

