Alpine staff introduced a new collusion detection method at the 8th Annual Conference on Test Security, held October 16–18, 2019, in Miami, Florida. Drs. Brett Foley and Russell Smith, Alpine’s Director of Professional Credentialing and Vice President of Assessment Services, respectively, presented the method in two complementary sessions.

Dr. Foley gave a presentation entitled “Collusion Detection Using an Extension of Yen’s Q3 Statistic.” The presentation began with a description of Foley’s logical and theoretical development of the new statistic, known as B3. The statistic can be used in concert with traditional estimates of candidate ability and item difficulty (e.g., those obtained from the Rasch model). He noted, “Collusion is a security problem where examinees work together to gain an unfair advantage, either by copying from one another or sharing exam content in advance. The B3 statistic works to detect collusion by identifying situations when two or more examinees perform more similarly than one would expect after accounting for their ability and the difficulty.”
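The general idea of comparing examinees after accounting for ability and difficulty can be illustrated with a short sketch. This is not the B3 formula, which this article does not give. It is only a person-pair analogue of Yen's Q3 (a correlation of model residuals), written under the assumption that Rasch ability and difficulty estimates are already in hand. The function names `rasch_prob` and `person_residual_correlations` are hypothetical and chosen for illustration.

```python
import numpy as np

def rasch_prob(theta, b):
    """Rasch probability of a correct response for every examinee-item pair.

    theta: 1-D array of examinee abilities (assumed already estimated).
    b:     1-D array of item difficulties (assumed already estimated).
    Returns an array of shape (n_examinees, n_items).
    """
    theta = np.asarray(theta, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))

def person_residual_correlations(responses, theta, b):
    """Correlate residuals between every pair of examinees across items.

    Illustrative only: a Q3-style residual correlation computed over
    examinee pairs, NOT the B3 statistic described in the presentation.
    responses: 0/1 array of shape (n_examinees, n_items).
    """
    # Residual = observed score minus the score the model expects.
    residuals = np.asarray(responses, dtype=float) - rasch_prob(theta, b)
    # np.corrcoef treats each row (examinee) as one variable.
    return np.corrcoef(residuals)

# Toy data: examinees 0 and 1 give identical answers; examinee 2 does not.
theta = np.array([0.0, 0.0, 1.0])
b = np.array([-1.0, 0.0, 1.0, 0.5])
responses = np.array([
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
])
corr = person_residual_correlations(responses, theta, b)
print(np.round(corr, 2))
```

In a sketch like this, a pair whose residuals are strongly positively correlated is a pair that performed more similarly than the model predicts. What value should count as unusual is a separate question that the sketch does not address.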

Dr. Foley went on to discuss a preliminary analysis where the B3 statistic was used to independently identify known cheaters in a set of real credentialing exam data. Results of this analysis indicated that B3 shows promise as a new option for psychometricians running forensic data analyses.

Dr. Smith continued the discussion of the efficacy of the new method in a presentation titled, “Comparing B3 to Answer Similarity Index [ASI] for Detecting Collusion.” Several analyses of real and simulated candidate response data were used to evaluate how B3 functions in practice, with a special emphasis on how it compares to an existing collusion detection metric: the ASI. Smith noted, “B3 appears to go beyond ASI, in that it takes into account unusual performance on specific items, rather than just a simple count of matches across items.”

Together, the presenters found that B3 has several appealing features. Specifically, B3

  • Is based on a well-known and widely recognized statistic.
  • Accounts for both examinee ability and item difficulty in its calculation.
  • Is easily interpretable.
  • Works with both dichotomous and polytomous items (or any mix thereof).
  • Is already available in commercial IRT software (e.g., WINSTEPS).

Both presenters noted that while the method appears to hold promise, more work is needed to identify flagging thresholds. That is, further research is needed to determine the specific values of B3 that would warrant flagging a candidate, taking action against a candidate, or both.

For more information about these presentations, including copies of the slides, or to learn about how forensic security analyses can contribute to the validity of your testing program, please contact us.