Pictures are favored for providing context in both language classroom instruction and assessment (Field, 2013). However, when pictorial tasks are designed to track students’ progress, parallel tasks are recommended (Stansfield & Ross, 1988; Brown et al., 1991; Weir & Wu, 2006), because repeated use of the same task risks item leakage, construct-irrelevant variance (e.g., Raymond et al., 2007), and practice effects (Cliffordson, 2004; Hausknecht et al., 2007). Although some studies have examined parallel picture sets used as writing prompts on the basis of students’ written products (e.g., Bae, 2000; Bar & Lee, 2011; Pena et al., 2006), few studies on the equivalence of picture-based listening tasks are currently available.
This study addresses that gap. It documents evidence of the equivalence of two picture-based listening tasks developed in alignment with Level 4 of the National English Curriculum Standards in China. The purpose is twofold: to ensure that the two tasks measure the same construct and yield equivalent results and score interpretations, and to establish a sound process for validating the equivalence of picture-based listening tasks used across years to track Chinese eighth graders’ progress.
The study employed a stakeholder-oriented approach, asking test writers, Grade 8 teachers from different language-discipline backgrounds, and test takers at different language proficiency levels (high, medium, and low) to evaluate two listening tasks themed “farm” and “park”. In the first phase, nine stakeholders (three from each group) reviewed the test content, rating the tasks’ characteristics on a six-point Likert scale (1-6). In the second phase, a second group of nine stakeholders sat the tests, and stimulated retrospective recall data on their test-taking processes were collected. Results showed that the different stakeholder groups attended to different aspects of the task characteristics. The in-depth comparison of cognitive processes showed that participants at similar levels of expertise or proficiency went through comprehensive layers of thinking, with similar stages. This study demonstrates the importance of involving different groups of stakeholders when examining the task equivalence of picture sets in standards-based listening assessment. Evaluating task equivalence also requires collecting evidence that taps into these comprehensive layers of cognitive processing. More broadly, the approach contributes to establishing methods for examining parallel task forms.