Pre-Pilot Phase
Data for the pre-pilot phase were collected from November to December 2013. The sample (N = 278) was selected to approximate the U.S. population with regard to gender, geographic region, race/ethnicity, and parental education level (PEL). Examinees ranged in age from 2 years 6 months to 22 years 11 months and were grouped by age. Only native English speakers were eligible to participate in the pre-pilot study. Based on detailed demographic information provided by the examinees (or their parents/guardians), the following criteria were used to determine whether an individual was a native English speaker: (a) the country of birth was an English-speaking country (e.g., U.S., Canada, Australia, South Africa); (b) the parents’ country of birth was also an English-speaking country; (c) the first language was English; and (d) the language spoken at home was English.
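The eligibility screen can be expressed as a single rule. The sketch below is illustrative only: it assumes that an examinee had to meet all four criteria, and the class, field, and function names (`Demographics`, `is_native_english_speaker`) and the country set are hypothetical. The manual lists those four countries only as examples, so a real screen would need the full list of countries treated as English-speaking.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical, non-exhaustive set: the text names these four countries only as examples.
ENGLISH_SPEAKING_COUNTRIES = {"U.S.", "Canada", "Australia", "South Africa"}


@dataclass
class Demographics:
    birth_country: str
    parent_birth_countries: Tuple[str, ...]  # one entry per parent reported
    first_language: str
    home_language: str


def is_native_english_speaker(d: Demographics) -> bool:
    """Apply criteria (a) through (d); assumes an examinee must meet all four."""
    return (
        d.birth_country in ENGLISH_SPEAKING_COUNTRIES                  # (a)
        and len(d.parent_birth_countries) > 0                          # (b) requires data
        and all(c in ENGLISH_SPEAKING_COUNTRIES
                for c in d.parent_birth_countries)                     # (b)
        and d.first_language == "English"                              # (c)
        and d.home_language == "English"                               # (d)
    )


if __name__ == "__main__":
    eligible = Demographics("Canada", ("Canada", "U.S."), "English", "English")
    excluded = Demographics("U.S.", ("U.S.", "Mexico"), "English", "English")
    print(is_native_english_speaker(eligible))  # True
    print(is_native_english_speaker(excluded))  # False
```

Treating a missing parental birth country as ineligible is also an assumption; the source does not say how incomplete records were handled.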
Examiners oversaw the testing process to ensure standardized presentation of the instructions. They also helped examinees with the technology involved (e.g., entering responses via mouse or touchscreen, navigating the screen, using the Speaker button, or replaying audio stimuli), but they did not provide assistance with item content. Eight very easy practice items were created to ensure that examinees understood the task instructions. During the test portion, the audio stimulus for each item played, and the examinee selected the image that best matched the word heard. Responses were captured automatically by the software.

To maximize the number of responses to each item during the pre-pilot phase, all examinees aged 12 years 11 months and younger started at the easiest item (Test Item 1). To prevent test fatigue, examinees aged 13 years and older began at Test Item 82, skipping the first 81 items (about 17% of the 474 items). A generous ceiling rule determined the stopping point: the test terminated automatically after 20 consecutive incorrect responses. However, examinees in the oldest age group (17 years 0 months to 22 years 11 months) completed every item from Test Item 82 through the last item on the test (Test Item 474), regardless of their performance; this ensured that sufficient data were collected on the challenging items near the end of the pre-pilot version of the assessment.
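The start and stop rules above amount to a small administration loop. The following is a minimal sketch under stated assumptions: age is given in total months, and `respond` is a hypothetical stand-in for the examinee's answer to each item. It is not the test software's actual implementation.

```python
from typing import Callable, List, Tuple

LAST_ITEM = 474          # last item of the pre-pilot form
OLDER_START_ITEM = 82    # start point for examinees aged 13:0 and older
CEILING_RUN = 20         # consecutive incorrect responses that end the test


def months(years: int, mos: int = 0) -> int:
    """Convert an age in years and months to total months."""
    return 12 * years + mos


def administer(age_months: int, respond: Callable[[int], bool]) -> List[Tuple[int, bool]]:
    """Return (item, correct) pairs under the pre-pilot start and stop rules.

    `respond` is a hypothetical stand-in for the examinee: it receives an item
    number and returns True for a correct response.
    """
    start = 1 if age_months < months(13) else OLDER_START_ITEM
    # The oldest group (17:0 to 22:11) took every item regardless of performance.
    use_ceiling = age_months < months(17)
    results: List[Tuple[int, bool]] = []
    run = 0  # current streak of consecutive incorrect responses
    for item in range(start, LAST_ITEM + 1):
        correct = respond(item)
        results.append((item, correct))
        run = 0 if correct else run + 1
        if use_ceiling and run >= CEILING_RUN:
            break
    return results


if __name__ == "__main__":
    # A 10-year-old who answers items 1-100 correctly and everything after incorrectly
    log = administer(months(10), lambda item: item <= 100)
    print(len(log), log[-1])  # 120 (120, False)
```

Under these rules a 15-year-old who misses every item stops after Test Item 101 (20 items), whereas an 18-year-old with the same responses still sees all 393 items from 82 through 474.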
The main purposes of the pre-pilot analyses were to examine item functioning, refine the item pool, and empirically order the items. Each item was evaluated against the following criteria: Classical Test Theory (CTT) statistics (the proportion of correct responses by age group, gender, PEL, and region; the frequency with which each distractor was chosen; and item-total correlations), response time, and qualitative feedback from test examiners. For example, distractors that most examinees ignored were noted, because they were likely too obviously incorrect for most examinees regardless of ability level. Conversely, distractors chosen as often as or more often than the target image may have been too similar to the target, making the item more difficult than intended.
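The CTT statistics and the two distractor checks described above can be sketched as follows. Several details are assumptions rather than the manual's procedure: the option labels (`"ABCD"`), the 5% cutoff for an "ignored" distractor, and the use of the corrected item-total correlation (the item removed from the total score), since the source says only "item-total correlations". Response time and examiner feedback are not modeled. Proportion correct by age group, gender, PEL, or region can be obtained by passing only that subgroup's rows.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns nan if either variable has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else float("nan")


def item_analysis(choices: List[List[str]], keys: List[str],
                  options: str = "ABCD", ignore_rate: float = 0.05) -> List[Dict]:
    """Per-item CTT statistics from raw option choices.

    choices[e][i]: option examinee e selected on item i
    keys[i]:       target (correct) option for item i
    options:       hypothetical option labels shared by every item
    ignore_rate:   hypothetical cutoff below which a distractor counts as ignored
    """
    n = len(choices)
    scores = [[int(c == k) for c, k in zip(row, keys)] for row in choices]
    totals = [sum(row) for row in scores]
    report = []
    for i, key in enumerate(keys):
        item = [row[i] for row in scores]
        rest = [t - s for t, s in zip(totals, item)]  # total score excluding item i
        counts = Counter(row[i] for row in choices)   # missing options count as 0
        distractors = [o for o in options if o != key]
        report.append({
            "p": sum(item) / n,                    # proportion correct
            "r_it": pearson(item, rest),           # corrected item-total correlation
            "ignored": [o for o in distractors if counts[o] / n < ignore_rate],
            "rivals_target": [o for o in distractors if counts[o] >= counts[key]],
        })
    return report


if __name__ == "__main__":
    keys = ["A", "B"]
    choices = [["A", "B"], ["A", "C"], ["B", "C"], ["A", "B"]]
    for stats in item_analysis(choices, keys):
        print(stats)
```

In this toy data, distractor C on the second item is chosen as often as the target, so it would be flagged as possibly too similar to the target image.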
In addition, a panel of subject-matter experts with extensive knowledge of cultural and linguistic issues (including two speech-language pathology professors, two school psychology professors, and one school psychologist; see Acknowledgments) reviewed the items for clarity of depiction, potential for bias, and appropriateness of the intended age range and difficulty.
Of the 474 items that entered this phase, 78 were flagged for psychometric or qualitative concerns and removed from the item pool. Fifteen items answered correctly by a very high proportion of the sample were reserved as potential Screener items (to quickly gauge an examinee’s ability), and the remaining 381 test items were retained for the pilot phase, either as initially designed or with minor revisions to distractor images. Item difficulty, together with the proportion of correct responses for each item by age group, was used to refine the order of the 381 retained test items from easiest to most difficult, with the 15 Screener items placed at the very beginning of the test.
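The manual does not specify how item difficulty and age-group proportions were combined, so the sketch below uses one hypothetical rule purely for illustration: rank items by overall proportion correct (highest first), break ties with the proportion correct in the youngest age group, and treat the easiest `n_screener` items as Screener items. The function name `order_items` and its inputs are assumptions, and the input is assumed to contain only items that survived the psychometric and panel review.

```python
from typing import Dict, List, Tuple


def order_items(p_overall: Dict[int, float], p_by_age: Dict[int, List[float]],
                n_screener: int = 15) -> Tuple[List[int], List[int]]:
    """Split items into Screener items and an easiest-to-hardest test order.

    p_overall[item]: proportion correct in the full sample
    p_by_age[item]:  proportion correct per age group, youngest group first
    Hypothetical rule: rank by overall p (highest first), break ties with the
    youngest group's p, then by item ID; the n_screener easiest become Screeners.
    """
    ranked = sorted(p_overall,
                    key=lambda it: (-p_overall[it], -p_by_age[it][0], it))
    return ranked[:n_screener], ranked[n_screener:]


if __name__ == "__main__":
    p_overall = {1: 0.90, 2: 0.50, 3: 0.90, 4: 0.20}
    p_by_age = {1: [0.70, 0.95], 2: [0.10, 0.80], 3: [0.80, 0.95], 4: [0.00, 0.45]}
    screener, test_order = order_items(p_overall, p_by_age, n_screener=1)
    print(screener, test_order)  # [3] [1, 2, 4]

    # Disposition counts reported for the pre-pilot item pool
    removed, screeners, retained = 78, 15, 381
    assert removed + screeners + retained == 474
```

Items 1 and 3 tie on overall proportion correct; item 3 ranks as easier because more of the youngest examinees answered it correctly.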