
Test Development

A well-designed, properly documented, and thoroughly evaluated testing program will not only yield more reliable results but also protect you in the event of a legal challenge or litigation. Alpine Testing Solutions' psychometricians, test development consultants, and technology experts work closely with you and your team to create tests and testing systems that maximize validity, fairness, reliability, legal defensibility, and security.

Our services fall into five major categories: Program Design, Test Design, Item/Performance-Task Development, Psychometrics and Standard Setting, and Test Maintenance. We offer à la carte services within all five categories, as well as customized, end-to-end solutions that allow you to defend the validity of the interpretation and use of test scores, maximize test reliability, and minimize bias.

Our methodologies and solutions are grounded in industry standards and best practices. We have helped numerous clients seeking accreditation from ANSI, NCCA, and other accrediting bodies.

Contact us today to learn how we can help with your testing program.

Program Design

If you are creating a new testing program or expanding an existing program, Alpine Testing Solutions can guide you through the program design process.

Through program definition focus groups, surveys, and one-on-one conversations, our experts will work with you, your team, and your stakeholders to ensure that the program meets your business goals and the needs of your audience and stakeholders.

Test Design

If you are creating a new test or revising an existing test, we will work with you, your team, and your subject-matter experts (SMEs) to define the purpose and use of the test, delineate the target domain(s), define the practice and competencies of the domain(s), and prepare the test specifications and blueprint.

Test Validation Plan

The test validation process is the accumulation of evidence to support the inferences and interpretations made from test scores. The validation plan begins with an explicit statement of the proposed interpretation of test scores and a rationale for the proposed use. The validity of test score interpretations relates to the underlying construct that the test measures. This construct could be, for example, competence in a profession, mathematics achievement, or reading ability. The validation plan then lists the inferences required to move from the construct to the scores, and from the scores to the intended use of the test.

Test Definition

The test definition documents information pertinent to the intended use and interpretation of test scores, the intended audiences and stakeholders, and critical functional parameters.

Job-task/Practice/Domain Analysis

These three related services are all systematic methods for collecting and then defining job responsibilities and their associated tasks, competencies, and knowledge. There are several variants of this type of work. In consultation with you, our psychometricians and test development professionals select and apply the variant that best fits your particular situation.

Performance Task Clusters and Work Models

Job tasks that are represented in realistic, functional contexts are termed performance task clusters or work models. A performance task cluster or work model is defined as a representation of the essential, characteristic, and integrated performance situations in a domain of expertise that well-qualified individuals are able to perform to a high standard. These performance task clusters or work models serve to define, focus, and manage assessment development activities. Performance work models are critical when creating performance-oriented tests.

Blueprint and Functional Specifications

The test blueprint can be likened to the blueprint of a house. The test blueprint indicates the proposed structure and definition of the test. The test blueprint should indicate the content or topic areas that will be included on the exam, any required levels of cognitive demand or processing level for each topic area, the number of items that should be included by topic and cognitive level, and any relevant item specifications by topic and cognitive level. The item specifications illustrate sample types of test questions that will be included on the exam and the associated scoring and reporting procedures.
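Because a blueprint fixes the number of items by topic and cognitive level, it can be treated as a simple lookup table and a draft exam form checked against it. The Python sketch below is illustrative only: the content areas ("Installation", "Troubleshooting"), cognitive levels, item counts, and the function name `blueprint_gaps` are hypothetical and do not describe any particular program.

```python
from collections import Counter

# Hypothetical blueprint: (content area, cognitive level) -> number of items required.
BLUEPRINT = {
    ("Installation", "Recall"): 4,
    ("Installation", "Application"): 6,
    ("Troubleshooting", "Application"): 5,
    ("Troubleshooting", "Analysis"): 5,
}


def blueprint_gaps(form_items, blueprint):
    """Return {cell: (required, actual)} for every cell where the form misses the blueprint.

    form_items holds one (content area, cognitive level) tag per item on the form.
    Cells that appear on the form but not in the blueprint are reported with required=0.
    """
    actual = Counter(form_items)
    cells = set(blueprint) | set(actual)
    return {
        cell: (blueprint.get(cell, 0), actual.get(cell, 0))
        for cell in sorted(cells)
        if blueprint.get(cell, 0) != actual.get(cell, 0)
    }


# A hypothetical draft form that is one Installation/Application item short.
form = (
    [("Installation", "Recall")] * 4
    + [("Installation", "Application")] * 5
    + [("Troubleshooting", "Application")] * 5
    + [("Troubleshooting", "Analysis")] * 5
)
print(blueprint_gaps(form, BLUEPRINT))  # {('Installation', 'Application'): (6, 5)}
```

An empty result means the form matches the blueprint cell for cell; any entry shows exactly where items must be added or removed.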

Functional specifications are specific to computer-based tests and define the display screens and delivery functions (e.g., skipping items, marking items for review, and the review screen).

Item Development

Once a sound blueprint is in place, it is time to develop the test items and/or performance tasks. Our professionals will train your subject matter experts to convert their experience and knowledge into well-designed, straightforward test items and/or performance tasks that match the test purpose and content specifications.

Item writing training and performance task creation

We specialize in training subject matter experts (SMEs) to create fair and defensible items and performance tasks. Our psychometricians and test development professionals use a unique systematic approach that enables SMEs to write test items and performance tasks that meet the test blueprint requirements, the item specification requirements, sound psychometric principles of item structure, and defined scoring rules. Training is customized to fit specific requirements such as in-person or distributed delivery, item types, performance tasks, and bank size requirements.

Item editing

After the items and/or performance tasks have been drafted, we perform a psychometric and language edit. Our trained professionals will edit the items for grammar, sensitivity, and compliance with psychometric and style guidelines.

Congruence, Accuracy, Alignment, and Bias Reviews

We will facilitate a panel of your SMEs to review the items for technical accuracy, scoring accuracy, and congruence with the test blueprint. The panel review can be conducted in person at a location of your choosing or distributed via an online session.

In alignment studies, we facilitate a panel of your SMEs who take a pool of items and match each one back to the test blueprint. These panels can likewise meet in person at a location of your choosing or online.
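One simple way to summarize an alignment panel's work is the proportion of panelists who matched each item to the blueprint area it was written for. The Python sketch below assumes that summary; the item IDs, content areas, ratings, function names, and the 0.5 cut-off are all hypothetical, and an actual study may use different statistics and criteria.

```python
def alignment_agreement(intended, ratings):
    """Proportion of panelists who matched each item to its intended blueprint area.

    intended: {item_id: blueprint area the item was written for}
    ratings:  {item_id: [blueprint area chosen by each panelist]}
    """
    return {
        item: sum(area == intended[item] for area in chosen) / len(chosen)
        for item, chosen in ratings.items()
    }


def flag_items(agreement, threshold=0.5):
    # The threshold is a hypothetical cut-off; each program sets its own criterion.
    return sorted(item for item, p in agreement.items() if p < threshold)


# Hypothetical data: four panelists classify three items.
intended = {"Q1": "Installation", "Q2": "Troubleshooting", "Q3": "Installation"}
ratings = {
    "Q1": ["Installation", "Installation", "Installation", "Troubleshooting"],
    "Q2": ["Troubleshooting"] * 4,
    "Q3": ["Troubleshooting", "Troubleshooting", "Installation", "Troubleshooting"],
}
agreement = alignment_agreement(intended, ratings)
print(agreement)              # {'Q1': 0.75, 'Q2': 1.0, 'Q3': 0.25}
print(flag_items(agreement))  # ['Q3']
```

Flagged items are candidates for revision or reclassification rather than automatic removal; the panel discussion decides what happens to them.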

In all reviews, Alpine Testing Solutions' trained professionals review the items for potential bias, with input from SMEs. After all other reviews are complete, our trained professionals perform a final edit for grammar and style, taking great care not to change the technical meaning of any item or task.

Performance task development

Performance tests are designed to measure an examinee's ability to perform real-world tasks. A performance test typically uses a testing environment that is representative of the actual job environment, or that at least allows the examinee to perform a task in a way that is representative of the actual job environment.

There is more to building a performance test than simply creating and delivering a performance environment. While performance tests can provide great face validity, due diligence must be applied to all aspects of test design, validation, and psychometrics, or the test sponsor will likely not reap the benefits promised by performance testing.

Alpine Testing Solutions has the expertise to help your organization make good decisions around the design, development, and maintenance of a performance test.

Some examples of the value that Alpine Testing Solutions can provide when developing a performance test include:

Task Clusters. Task clusters (also known as work models) are an important output of the job/domain analysis. A task cluster is a synthesis of knowledge, skills, and judgments that are applied holistically within a relevant, real-world context. Task clusters are the foundation and driving force behind the objectives, content weighting, performance scenario specifications, and scoring rubrics. If a performance test is designed from the decomposed knowledge, skills, and judgments that are the normal outputs of a job/domain analysis, rather than from task clusters, there is a good chance that the test will provide no more information about the test-taker than a traditional, and much less expensive, multiple-choice test could.