Jae-Chun Ban, ACT, Inc.
Bradley A. Hanson, ACT, Inc.
Tianyou Wang, ACT, Inc.
Qing Yi, ACT, Inc.
Deborah J. Harris, ACT, Inc.
Paper presented at the Annual Meeting of the American Educational Research Association (New Orleans, April 2000)
Revised: September 22, 2000
Abstract: The purpose of this study was to compare and evaluate five online pretest item calibration/scaling methods in computerized adaptive testing (CAT): the marginal maximum likelihood estimate with one EM cycle (OEM) method, the marginal maximum likelihood estimate with multiple EM cycles (MEM) method, Stocking's Method A, Stocking's Method B, and the BILOG/Prior method. The five methods were evaluated in terms of item parameter recovery under three sample size conditions (300, 1,000, and 3,000). The MEM method appears to be the best choice among the methods studied because it produced the smallest parameter estimation errors under all sample size conditions. Stocking's Method B also worked very well, but it requires anchor items, which would lengthen the test. The BILOG/Prior method did not work well with small samples; until more appropriate ways of handling the sparse data that arise in CAT are devised for BILOG, it may not be a reasonable choice. Because Stocking's Method A produced the largest weighted total error and has a theoretical weakness (it treats estimated abilities as true abilities), there appears to be little reason to use it. Although the OEM and MEM methods are mathematically similar, the OEM method produced larger errors than the MEM method; the MEM method should therefore be preferred unless the time required for iterative computation is a serious concern.
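To make the OEM/MEM distinction concrete, below is a minimal sketch (not the authors' code) of marginal maximum likelihood calibration of a single 2PL pretest item. It assumes each examinee's posterior ability weights over a quadrature grid, computed from the operational CAT items, are already available; all names (post_op, calibrate, and so on) are illustrative, and the real MEM method would fold in every pretest item an examinee saw, not just the one being calibrated. Calling calibrate with n_cycles=1 corresponds to the OEM method (a single EM cycle using the operational-item posteriors only), while the default iterates EM cycles, re-updating the posteriors with the pretest responses, as in the MEM method.

    # Hypothetical sketch of online pretest item calibration for one
    # 2PL pretest item, contrasting OEM (one EM cycle) with MEM
    # (iterated EM cycles). Not the authors' implementation.
    import numpy as np
    from scipy.optimize import minimize

    QUAD = np.linspace(-4.0, 4.0, 31)       # ability quadrature points

    def p2pl(a, b, theta):
        """2PL probability of a correct response."""
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    def m_step(u, post):
        """Maximize the expected complete-data log-likelihood.
        u:    (J,) 0/1 responses to the pretest item
        post: (J, K) posterior weights over QUAD (each row sums to 1)
        """
        n_k = post.sum(axis=0)               # expected examinees per node
        r_k = post[u == 1].sum(axis=0)       # expected corrects per node

        def neg_ll(params):
            a, b = params
            p = np.clip(p2pl(a, b, QUAD), 1e-9, 1 - 1e-9)
            return -np.sum(r_k * np.log(p) + (n_k - r_k) * np.log(1 - p))

        res = minimize(neg_ll, x0=np.array([1.0, 0.0]),
                       bounds=[(0.1, 4.0), (-4.0, 4.0)])
        return res.x

    def calibrate(u, post_op, n_cycles=50, tol=1e-4):
        """n_cycles=1 gives the OEM estimate (posteriors from the
        operational items only); iterating to convergence, with the
        pretest response folded into each posterior, gives MEM."""
        a, b = m_step(u, post_op)
        for _ in range(n_cycles - 1):
            # E-step: update posteriors with the pretest response
            p = p2pl(a, b, QUAD)
            like = np.where(u[:, None] == 1, p, 1 - p)
            post = post_op * like
            post /= post.sum(axis=1, keepdims=True)
            a_new, b_new = m_step(u, post)
            converged = max(abs(a_new - a), abs(b_new - b)) < tol
            a, b = a_new, b_new
            if converged:
                break
        return a, b

Under this setup, calibrate(u, post_op, n_cycles=1) and calibrate(u, post_op) would yield the OEM and MEM estimates for the same response data, which is the comparison the study's error results speak to.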
Download paper in PDF format (92 KB).