227 / 2018-04-14 08:37:01
Improving language scales through alignment studies: the case of China’s Standards of English
China's Standards of English, validation, standard setting
The field of language testing has reached a consensus that test validation is an ongoing process. Validation of a language scale should likewise be a long-term, continuous process. With the imminent publication of China’s Standards of English (CSE), there is concern that the implementation of this national framework of reference for English language education, the first of its kind in China, will raise doubts about its authority and meet resistance at the macro- and micro-political levels (Jin, Wu, Alderson, & Song, 2017). Validation studies have therefore been planned to improve stakeholders’ understanding of the CSE and, more importantly, to identify problems encountered in the use of the scales. One such effort is to empirically validate and improve the CSE through alignment studies, that is, by linking curricula and assessments to its scales. In this presentation, we report a study linking the speaking test of the CET Band 4 (CET-SET4), a high-stakes English language test, to the Speaking Scales of the CSE (CSE-SS).
Familiarization tasks and the alignment procedure were developed and piloted with three expert CET-SET4 raters. The first phase of the main study involved collecting evidence through expert judgment. The panel consisted of 15 members, each an experienced English teacher, CSE developer, or CET-SET4 rater. The test-centered, compound cumulative method was used in the first standardization session, in which judges assigned each speaking task to a specific CSE-SS level. Operational test data (e.g., task difficulty) were incorporated in the process to facilitate judges’ understanding of the test. The examinee-centered, contrasting-groups method was then employed in the second standardization session, in which judges rated examinees’ performances and linked them to the CSE-SS levels. Cut-off scores were determined and discussed with the panel. Following each session, we reflected on the standard-setting process, analyzed the data, and checked the quality of the linking. In the second phase of the study, to triangulate the cut-off scores derived in the first phase, further evidence was collected through the judgments of examinees (n=30), who were trained to use CSE descriptors to rate their own speaking performances. The self-assessment data were analyzed and calibrated onto the same latent ability scale.
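As an illustration of how a contrasting-groups cut-off score can be obtained, the sketch below fits a one-predictor logistic regression of the judged level (lower vs. upper) on the test score and takes the score at which the two levels are equally likely. This is a minimal sketch under stated assumptions, not the study's actual analysis: the scores and judgments are invented toy data, and `fit_logistic` and `cut_score` are hypothetical helper names.

```python
import math

def fit_logistic(xs, ys, iterations=25):
    """Fit P(y=1 | x) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson.

    xs: test scores; ys: 1 if judges placed the examinee in the upper
    level, 0 if in the lower level. Scores are centered for stability.
    """
    mean_x = sum(xs) / len(xs)
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            xc = x - mean_x
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xc)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient w.r.t. intercept
            g1 += (y - p) * xc     # gradient w.r.t. slope
            h00 += w               # entries of X^T W X
            h01 += w * xc
            h11 += w * xc * xc
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Convert the intercept back to the uncentered score scale.
    return b0 - b1 * mean_x, b1

def cut_score(b0, b1):
    """Score at which P(upper level) = 0.5, i.e. b0 + b1*x = 0."""
    return -b0 / b1

# Hypothetical toy data (NOT the study's data): the two groups overlap
# at scores 13 and 14, so the model is estimable.
scores = [10, 11, 12, 12, 13, 13, 14, 14, 15, 15, 16, 17]
upper = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

b0, b1 = fit_logistic(scores, upper)
cut = cut_score(b0, b1)
```

Because the toy data are constructed to be symmetric around 13.5, the fitted cut-off lands at that value. With real judgments the cut-off depends on how much the two groups overlap, and a perfectly separated data set would need a different approach, since the logistic fit does not converge in that case.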
The results of the alignment study showed that exact agreement among judges on the level of CSE-SS descriptors was moderate (0.57), but adjacent-level agreement was very high (0.99). Six descriptors on which the judges had difficulty agreeing were identified and flagged for further improvement. The standard-setting results indicated that the CET-SET4 measures speaking performance at CSE-SS levels 4 to 6. The logistic regression analysis yielded two cut-off scores: 13.56 for the CSE-SS level 4/5 boundary and 17.35 for the level 5/6 boundary. The self-assessment data showed that the majority of examinees were able to use the descriptors to assess their speaking proficiency, but 20% over- or under-rated themselves, mostly examinees at the bottom or top levels. The study concluded that the salient features of the CSE-SS levels need to be further highlighted, and that fine-tuning of some descriptors is necessary to improve the accessibility and applicability of the CSE.
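For readers unfamiliar with the two agreement indices reported above, the following sketch shows one way they can be computed. It assumes, as an interpretation not stated in the abstract, that each judge's level assignment is compared with the intended level of the descriptor; the data are invented and `agreement_rates` is a hypothetical helper name.

```python
def agreement_rates(assigned, intended):
    """Return (exact, adjacent) agreement proportions.

    assigned: one list of levels per judge, aligned with `intended`.
    intended: the intended level of each descriptor.
    Exact agreement counts identical levels; adjacent agreement also
    counts assignments that are off by at most one level.
    """
    pairs = [(a, t) for judge in assigned for a, t in zip(judge, intended)]
    exact = sum(a == t for a, t in pairs) / len(pairs)
    adjacent = sum(abs(a - t) <= 1 for a, t in pairs) / len(pairs)
    return exact, adjacent

# Hypothetical toy data (NOT the study's data): six descriptors, three judges.
intended_levels = [4, 4, 5, 5, 6, 6]
judge_levels = [
    [4, 5, 5, 5, 6, 5],
    [4, 4, 5, 6, 6, 6],
    [3, 4, 5, 5, 4, 6],
]

exact, adjacent = agreement_rates(judge_levels, intended_levels)
```

In this toy example 13 of the 18 judgments match the intended level exactly and 17 of 18 fall within one level, which mirrors the pattern in the study of moderate exact agreement alongside very high adjacent agreement.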