Pilot Phase
Data for the pilot phase were collected from September 2014 to January 2015. General population data were collected for the English Speaker (N = 861) and English Learner (N = 513) samples. The individuals in these samples ranged in age from 2 years 6 months to 22 years 11 months and were split equally by gender. The English Speaker sample was further stratified by geographic region, PEL, and race/ethnicity, with proportions structured to closely match those observed in the 2010 Census (United States Census Bureau, 2010). The English Learner sample was stratified by geographic region, PEL, language spoken (corresponding to proportions observed in the 2010 Census for individuals who did not indicate they spoke only English at home), and length of exposure to English. Individuals were included in the English Speaker sample if both they and their parents were born in an English-speaking country and if they had only ever been exposed to English (e.g., first language was English, primary language of instruction was always English, and parents and siblings always spoke English to the examinee). Individuals with exposure to a language other than English, either at home or at school, were included in the English Learner sample.
A sample of individuals with a clinical diagnosis (N = 89) was also collected during this phase to assess the test’s ability to differentiate between various clinical groups, given that some of the collected groups were expected to show impairment related to receptive vocabulary. Data were gathered from individuals diagnosed with a Language Disorder (n = 16), Language Delay (n = 39), Speech-Sound Communication Disorder (n = 10), Autism Spectrum Disorder (ASD; n = 6), and other disorders (including Intellectual Disability [ID], Attention-Deficit/Hyperactivity Disorder [ADHD], and learning disabilities; n = 18), with diagnostic information verified by a registered clinician.
The pilot version consisted of 396 items (15 Screener items and 381 test items). Items were divided into sets of approximately 25 items each. Examinees began at an age-appropriate set of test items and proceeded through the test, attempting full sets of items of increasing difficulty until they reached their ceiling point (i.e., failed to correctly answer at least 50% of the items within a set).
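The set-based administration described above can be illustrated with a minimal sketch. It is not the test's actual administration software: the function name, its parameters, and the toy item sets are hypothetical, and the ceiling rule is read here as "proportion correct within a set strictly below 50%", which is an assumption about how the boundary case is handled.

```python
from typing import Callable, List, Sequence, Tuple


def administer_sets(
    item_sets: Sequence[Sequence[str]],
    start_index: int,
    is_correct: Callable[[str], bool],
    ceiling_threshold: float = 0.50,
) -> List[Tuple[int, float]]:
    """Administer full item sets from start_index until the ceiling set.

    Returns (set index, proportion correct) for every set administered.
    Assumption: the ceiling is reached when the proportion correct in a
    set falls strictly below ceiling_threshold.
    """
    administered = []
    for set_index in range(start_index, len(item_sets)):
        item_set = item_sets[set_index]
        # Every item in the set is attempted before the rule is checked.
        n_correct = sum(1 for item in item_set if is_correct(item))
        proportion = n_correct / len(item_set)
        administered.append((set_index, proportion))
        if proportion < ceiling_threshold:
            break  # ceiling point reached; stop testing
    return administered


# Toy example: four 2-item sets ordered by difficulty (real sets hold ~25 items).
toy_sets = [["a", "b"], ["c", "d"], ["e", "f"], ["g", "h"]]
known_words = {"a", "b", "c", "d", "e"}
result = administer_sets(toy_sets, 1, lambda item: item in known_words)
print(result)  # [(1, 1.0), (2, 0.5), (3, 0.0)]
```

In this toy run the examinee starts at the second set, continues past the set with exactly 50% correct, and stops at the final set.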
Items were analyzed with data from the English Speaker, English Learner, and clinical samples. The following criteria, based on CTT and Item Response Theory (IRT) analyses, were examined for each item: the proportion of correct responses (by age group, gender, region, and PEL, and by race/ethnicity for English speakers or language spoken for English learners); response distribution and distractor analysis; mean group differences; discrimination and difficulty parameters from the IRT two-parameter logistic (2PL) model (in which the third parameter, guessing, was fixed at 0.20); and response time. Qualitative feedback from test examiners about visual or audio stimuli was also considered.
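To make the role of the fixed guessing parameter concrete, the sketch below evaluates a logistic item response function with discrimination a, difficulty b, and guessing c held at 0.20. The functional form is the standard logistic one and is an assumption here: the text does not state the exact parameterization, whether a scaling constant was applied, or which software was used for estimation. The function and variable names are hypothetical.

```python
import math

GUESSING = 0.20  # fixed guessing value stated in the text


def prob_correct(theta: float, a: float, b: float, c: float = GUESSING) -> float:
    """Probability of a correct response under a logistic IRT model.

    theta: examinee ability
    a:     item discrimination (estimated)
    b:     item difficulty (estimated)
    c:     guessing parameter (held fixed rather than estimated)
    Assumption: standard logistic form with no scaling constant.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# When ability equals item difficulty, the probability is halfway
# between the guessing floor and 1: 0.20 + 0.80 * 0.5 = 0.60.
p_at_difficulty = prob_correct(theta=0.0, a=1.2, b=0.0)
print(round(p_at_difficulty, 2))  # 0.6
```

Under this form, an examinee of very low ability still has a probability of about 0.20 of answering correctly, which is why only discrimination and difficulty vary across items.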
Based on these results, 13 items required revision to a distractor image or target word audio recording. Twenty-three new items were developed to capture the lowest range of word difficulty (e.g., adding more very easy target words such as “milk”) and to ensure balanced item content (e.g., inclusion of more verbs). The 15 items retained from the pre-pilot phase as potential Screener items, intended to serve as a preliminary performance check, were analyzed separately and showed a very high proportion of correct responses. Five more extant items with similar properties were added to this set to be tested as Screener items for the standardization version. Fifty-seven items were flagged for psychometric or qualitative concerns and were removed from the item pool. In total, 367 items (comprising 347 test items and 20 Screener items) were selected for the standardization phase. Difficulty parameters from the IRT analyses, in conjunction with an examination of the proportion of correct responses by age group, informed the adjusted item order for the standardization version of the test.
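The item counts reported above can be reconciled with a short tally. This is one reading that reproduces the reported totals, not a documented accounting: it assumes that all 57 removed items came from the pilot test items and that the 5 added Screener items were not drawn from those 381 test items. The function name is hypothetical.

```python
def standardization_counts() -> dict:
    """Tally the item pool carried into the standardization phase."""
    pilot_test_items = 381
    pilot_screener_items = 15
    new_test_items = 23        # newly developed easy items and verbs
    removed_items = 57         # assumed to be removed from test items only
    added_screener_items = 5   # assumed to come from outside the 381 test items

    test_items = pilot_test_items + new_test_items - removed_items
    screener_items = pilot_screener_items + added_screener_items
    return {
        "test_items": test_items,
        "screener_items": screener_items,
        "total": test_items + screener_items,
    }


counts = standardization_counts()
print(counts)  # {'test_items': 347, 'screener_items': 20, 'total': 367}
```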