Assignment 8
SUMMARY: BEYOND TESTS: ALTERNATIVES IN ASSESSMENT
In the public eye, tests have acquired an aura of infallibility in our culture of mass producing everything, including the education of school children. Everyone wants a test for everything, especially if the test is cheap, quickly administered, and scored instantaneously. But we saw in Chapter 4 that while the standardized test industry has become a powerful juggernaut of influence on decisions about people's lives, it also has come under severe criticism from the public (Kahn, 2000). A more balanced viewpoint is offered by Bailey (1998, p. 204): “One of the disturbing things about tests is the extent to which many people accept the results uncritically, while others believe that all testing is invidious. But tests are simply measurement tools: It is the use to which we put their results that can be appropriate or inappropriate.” It is clear by now that tests are one of a number of possible types of assessment.
Assessment connotes a much broader concept in that most of the time when teachers are teaching, they are also assessing. Assessment includes all occasions from informal impromptu observations and comments up to and including tests. Early in the decade of the 1990s, in a culture of rebellion against the notion that all people and all skills could be measured by traditional tests, a novel concept emerged that began to be labeled “alternative” assessment. As teachers and students were becoming aware of the shortcomings of standardized tests, “an alternative to standardized testing and all the problems found with such testing” (Huerta-Macias, 1995, p. 8) was proposed.
The defining characteristics of the various alternatives in assessment that have been commonly used across the profession were aptly summed up by Brown and Hudson (1998, pp. 654-655). Alternatives in assessment
1. require students to perform, create, produce, or do something.
2. use real-world contexts or simulations.
3. are nonintrusive in that they extend the day-to-day classroom activities.
4. allow students to be assessed on what they normally do in class every day.
5. use tasks that represent meaningful instructional activities.
6. focus on processes as well as products.
7. tap into higher-level thinking and problem-solving skills.
8. provide information about both the strengths and weaknesses of students.
9. are multiculturally sensitive when properly administered.
10. ensure that people, not machines, do the scoring, using human judgment.
11. encourage open disclosure of standards and rating criteria, and
12. call upon teachers to perform new instructional and assessment roles.
THE DILEMMA OF MAXIMIZING BOTH PRACTICALITY AND WASHBACK
The principal purpose of this chapter is to examine some of the alternatives in assessment that are markedly different from formal tests. Tests, especially large-scale standardized tests, tend to be one-shot performances that are timed, multiple-choice, decontextualized, norm-referenced, and that foster extrinsic motivation. On the other hand, tasks like portfolios, journals, and self-assessment are
- open-ended in their time orientation and format,
- contextualized to a curriculum,
- referenced to the criteria (objectives) of that curriculum, and
- likely to build intrinsic motivation.
One way of looking at this contrast poses a challenge to you as a teacher and test designer. Formal standardized tests are almost by definition highly practical, reliable instruments. They are designed to minimize time and money on the part of test designer and test-taker, and to be painstakingly accurate in their scoring. Alternatives such as portfolios, or conferencing with students on drafts of written work, or observations of learners over time all require considerable time and effort on the part of the teacher and the student. Even more time must be spent if the teacher hopes to offer a reliable evaluation within students across time, as well as across students (taking care not to favor one student or group of students). But the alternative techniques also offer markedly greater washback, are superior formative measures, and, because of their authenticity, usually carry greater face validity.
PERFORMANCE-BASED ASSESSMENT
Before proceeding to a direct consideration of types of alternatives in assessment, a word about performance-based assessment is in order. There has been a great deal of press in recent years about performance-based assessment, sometimes merely called performance assessment (Shohamy, 1995; Norris et al., 1998). Performance-based assessment implies productive, observable skills, such as speaking and writing, of content-valid tasks. Such performance usually, but not always, brings with it an air of authenticity: real-world tasks that students have had time to develop. It often implies an integration of language skills, perhaps all four skills in the case of project work. Because the tasks that students perform are consistent with course goals and curriculum, students and teachers are likely to be more motivated to perform them, as opposed to a set of multiple-choice questions about facts and figures regarding the solar system.
O’Malley and Valdez Pierce (1996) considered performance-based assessment to be a subset of authentic assessment. In other words, not all authentic assessment is performance based. One could infer that reading, listening, and thinking have many authentic manifestations, but since they are not directly observable in and of themselves, they are not performance based. According to O’Malley and Valdez Pierce (p. 5), the following are characteristics of performance assessment:
1. Students make a constructed response.
2. They engage in higher-order thinking, with open-ended tasks.
3. Tasks are meaningful, engaging, and authentic.
4. Tasks call for the integration of language skills.
5. Both process and product are assessed.
6. Depth of a student's mastery is emphasized over breadth.
Performance-based assessment needs to be approached with caution. It is tempting for teachers to assume that if a student is doing something, then the process has fulfilled its own goal and the evaluator needs only to make a mark in the grade book that says “accomplished” next to a particular competency. In reality, performances as assessment procedures need to be treated with the same rigor as traditional tests. This implies that teachers should
- state the overall goal of the performance,
- specify the objectives (criteria) of the performance in detail,
- prepare students for performance in stepwise progressions,
- use a reliable evaluation form, checklist, or rating sheet,
- treat performances as opportunities for giving feedback and provide that feedback systematically, and
- if possible, utilize self- and peer-assessments judiciously.
PORTFOLIOS
One of the most popular alternatives in assessment, especially within a framework of communicative language teaching, is portfolio development. According to Genesee and Upshur (1996), a portfolio is "a purposeful collection of students' work that demonstrates...their efforts, progress, and achievements in given areas" (p. 99). Portfolios include materials such as
- essays and compositions in draft and final forms;
- reports, project outlines;
- poetry and creative prose;
- artwork, photos, newspaper or magazine clippings;
- audio and/or video recordings of presentations, demonstrations, etc.;
- journals, diaries, and other personal reflections;
- tests, test scores, and written homework exercises;
- notes on lectures; and
- self- and peer-assessments: comments, evaluations, and checklists.
Gottlieb (1995) suggested a developmental scheme for considering the nature and purpose of portfolios, using the acronym CRADLE to designate six possible attributes of a portfolio:
Collecting
Reflecting
Assessing
Documenting
Linking
Evaluating
The advantages of engaging students in portfolio development have been extolled in a number of sources (Genesee & Upshur, 1996; O'Malley & Valdez Pierce, 1996; Brown & Hudson, 1998; Weigle, 2002). A synthesis of those characteristics gives us a number of potential benefits. Portfolios
- foster intrinsic motivation, responsibility, and ownership,
- promote student-teacher interaction with the teacher as facilitator,
- individualize learning and celebrate the uniqueness of each student,
- provide tangible evidence of a student's work,
- facilitate critical thinking, self-assessment, and revision processes,
- offer opportunities for collaborative work with peers, and
- permit assessment of multiple dimensions of language learning.
JOURNALS
A journal is a log (or “account”) of one's thoughts, feelings, reactions, assessments, ideas, or progress toward goals, usually written with little attention to structure, form, or correctness. Learners can articulate their thoughts without the threat of those thoughts being judged later (usually by the teacher). Sometimes journals are rambling sets of verbiage that represent a stream of consciousness with no particular point, purpose, or audience. Fortunately, models of journal use in educational practice have sought to tighten up this style of journal in order to give it some focus (Staton et al., 1987). The result is the emergence of a number of overlapping categories or purposes in journal writing, such as the following:
- language-learning logs
- grammar journals
- responses to readings
- strategies-based learning logs
- self-assessment reflections
- diaries of attitudes, feelings, and other affective factors
- acculturation logs
Most classroom-oriented journals are what have now come to be known as dialogue journals. They imply an interaction between a reader (the teacher) and the student through dialogues or responses. For the best results, those responses should be dispersed across a course at regular intervals, perhaps weekly or biweekly. One of the principal objectives in a student's dialogue journal is to carry on a conversation with the teacher. Through dialogue journals, teachers can become better acquainted with their students, in terms of both their learning progress and their affective states, and thus become better equipped to meet students' individual needs.
It is important to turn the advantages and potential drawbacks of journals into positive general steps and guidelines for using journals as assessment instruments. The following steps are not coincidentally parallel to those cited above for portfolio development:
1. Sensitively introduce students to the concept of journal writing. For many students, especially those from educational systems that play down the notion of teacher-student dialogue and collaboration, journal writing will be difficult at first. University-level students, who have passed through a dozen years of product writing, will have particular difficulty with the concept of writing without fear of a teacher's scrutinizing every grammatical or spelling error. With modeling, assurance, and purpose, however, students can make a remarkable transition into the potentially liberating process of journal writing. Students who are shown examples of journal entries and are given specific topics and schedules for writing will become comfortable with the process.
2. State the objective(s) of the journal. Integrate journal writing into the objectives of the curriculum in some way, especially if journal entries become topics of class discussion. The list of types of journals at the beginning of this section may coincide with the following examples of some purposes of journals:
- Language-learning logs.
- Grammar journals.
- Responses to readings.
- Strategies-based learning logs.
- Self-assessment reflections.
- Diaries of attitudes, feelings, and other affective factors.
- Acculturation logs.
3. Give guidelines on what kinds of topics to include. Once the purpose or type of journal is clear, students will benefit from models or suggestions on what kinds of topics to incorporate into their journals.
4. Carefully specify the criteria for assessing or grading journals. Students need to understand the freewriting involved in journals, but at the same time, they need to know assessment criteria. Once you have clarified that journals will not be evaluated for grammatical correctness and rhetorical conventions, state how they will be evaluated. Usually the purpose of the journal will dictate the major assessment criterion. Effort as exhibited in the thoroughness of students' entries will no doubt be important.
5. Provide optimal feedback in your responses. McNamara (1998, p. 39) recommended three different kinds of feedback to journals:
- cheerleading feedback, in which you celebrate successes with the students or encourage them to persevere through difficulties,
- instructional feedback, in which you suggest strategies or materials, suggest ways to fine-tune strategy use, or instruct students in their writing, and
- reality-check feedback, in which you help the students set more realistic expectations for their language abilities.
6. Designate appropriate time frames and schedules for review. Journals, like portfolios, need to be esteemed by students as integral parts of a course. Therefore, it is essential to budget enough time within a curriculum for both writing journals and for your written responses. Set schedules for submitting journal entries periodically; return them in short order.
7. Provide formative, washback-giving final comments. Journals, perhaps even more than portfolios, are the most formative of all the alternatives in assessment. They are day-by-day (or at least weekly) chronicles of progress whose purpose is to provide a thread of continuous assessment and reassessment, to recognize mid-stream direction changes, and/or to refocus on goals. Should you reduce a final assessment of such a procedure to a grade or a score? Some say yes, some say no (Peyton & Reed, 1990), but it appears to be in keeping with the formative nature of journals not to do so.
CONFERENCES AND INTERVIEWS
Reference was made to conferencing as a standard part of the process approach to teaching writing, in which the teacher, in a conversation about a draft, facilitates the improvement of the written work. Such interaction has the advantage of one-on-one interaction between teacher and student, and the teacher's being able to direct feedback toward a student's specific needs. Conferences are not limited to drafts of written work. Including portfolios and journals discussed above, the list of possible functions and subject matter for conferencing is substantial:
- commenting on drafts of essays and reports
- reviewing portfolios
- responding to journals
- advising on a student's plan for an oral presentation
- assessing a proposal for a project
- giving feedback on the results of performance on a test
- clarifying understanding of a reading
- exploring strategies-based options for enhancement or compensation
- focusing on aspects of oral production
- checking a student's self-assessment of a performance
- setting personal goals for the near future
- assessing general progress in a course
Because interviews have multiple objectives, as noted above, it is difficult to generalize principles for conducting them, but the following guidelines may help to frame the questions efficiently:
1. Offer an initial atmosphere of warmth and anxiety-lowering (warm-up).
2. Begin with relatively simple questions.
3. Continue with level-check and probe questions, but adapt to the interviewee as needed.
4. Frame questions simply and directly.
5. Focus on only one factor for each question. Do not combine several objectives in the same question.
6. Be prepared to repeat or reframe questions that are not understood.
7. Wind down with friendly and reassuring closing comments.
How do conferences and interviews score in terms of principles of assessment? Their practicality, as is true for many of the alternatives to assessment, is low because they are time-consuming. Reliability will vary between conferences and interviews. In the case of conferences, it may not be important to have rater reliability because the whole purpose is to offer individualized attention, which will vary greatly from student to student. For interviews, a relatively high level of reliability should be maintained with careful attention to objectives and procedures. Face validity for both can be maintained at a high level due to their individualized nature. Washback potential and authenticity are high for conferences, but possibly only moderate for interviews unless the results of the interview are clearly folded into subsequent learning.
OBSERVATIONS
All teachers, whether they are aware of it or not, observe their students in the classroom almost constantly. Virtually every question, every response, and almost every nonverbal behavior is, at some level of perception, noticed. All those intuitive perceptions are stored as little bits and pieces of information about students that can form a composite impression of a student's ability. Without ever administering a test or a quiz, teachers know a lot about their students. In fact, experienced teachers are so good at this almost subliminal process of assessment that their estimates of a student's competence are often highly correlated with actual independently administered test scores.
The list could be even more specific to suit the characteristics of students, the focus of a lesson or module, the objectives of a curriculum, and other factors. The list might expand, as well, to include other possible observed performance. In order to carry out classroom observation, it is of course important to take the following steps:
1. Determine the specific objectives of the observation.
2. Decide how many students will be observed at one time.
3. Set up the logistics for making unnoticed observations.
4. Design a system for recording observed performances.
5. Do not overestimate the number of different elements you can observe at one time; keep them very limited.
6. Plan how many observations you will make.
7. Determine specifically how you will use the results.
Designing a system for observing is no simple task. Recording your observations can take the form of anecdotal records, checklists, or rating scales. Anecdotal records should be as specific as possible in focusing on the objective of the observation, but they are so varied in form that to suggest a format here would be counterproductive. Their very purpose is more note-taking than record-keeping. Checklists are a viable alternative for recording observation results. Some checklists of student classroom performance, such as the COLT observation scheme devised by Spada and Frohlich (1995), are elaborate grids referring to such variables as
- whole-class, group, and individual participation,
- content of the topic,
- linguistic competence (form, function, discourse, sociolinguistic),
- materials being used, and
- skill (listening, speaking, reading, writing).
SELF- AND PEER-ASSESSMENTS
A conventional view of language assessment might consider the notion of self- and peer-assessment as an absurd reversal of politically correct power relationships. Self-assessment, however, derives its theoretical justification from a number of well-established principles of second language acquisition. The principle of autonomy stands out as one of the primary foundation stones of successful learning. The ability to set one's own goals both within and beyond the structure of a classroom curriculum, to pursue them without the presence of an external prod, and to independently monitor that pursuit are all keys to success. Developing intrinsic motivation that comes from a self-propelled desire to excel is at the top of the list of successful acquisition of any set of skills.
Peer-assessment appeals to similar principles, the most obvious of which is cooperative learning. Many people go through a whole regimen of education from kindergarten up through a graduate degree and never come to appreciate the value of collaboration in learning: the benefit of a community of learners capable of teaching each other something. Peer-assessment is simply one arm of a plethora of tasks and procedures within the domain of learner-centered and collaborative education.
Researchers (such as Brown & Hudson, 1998) agree that the above theoretical underpinnings of self- and peer-assessment offer certain benefits: direct involvement of students in their own destiny, the encouragement of autonomy, and increased motivation because of their self-involvement. Subjectivity, however, is a primary obstacle to overcome. Students may be either too harsh on themselves or too self-flattering, or they may not have the necessary tools to make an accurate assessment. Also, especially in the case of direct assessments of performance (see below), they may not be able to discern their own errors. In contrast, Bailey (1998) conducted a study in which learners showed moderately high correlations (between .58 and .64) between self-rated oral production ability and scores on the OPI, which suggests that in the assessment of general competence, learners' self-assessments may be more accurate than one might suppose.
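To make the idea of a correlation between self-ratings and interview scores concrete, the sketch below computes a Pearson correlation coefficient for two lists of scores. It is only an illustration: the summary does not say which statistic or data Bailey used, and the function name `pearson_r` and both score lists are hypothetical, invented for this example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each list.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one list has no variation")
    return cov / sqrt(ss_x * ss_y)


# Hypothetical data (NOT from Bailey's study): eight learners' self-ratings
# of oral ability and their scores on an oral interview, both on a 1-5 scale.
self_ratings = [2, 3, 3, 4, 5, 1, 4, 2]
interview_scores = [1, 3, 2, 4, 4, 2, 5, 3]

r = pearson_r(self_ratings, interview_scores)
print(f"r = {r:.2f}")
```

A value near 1 would mean that learners who rate themselves higher also tend to score higher on the interview; a value near 0 would mean that self-ratings say little about interview performance.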
Types of Self- and Peer-Assessment
1. Assessment of (a specific) performance.
In this category, a student typically monitors himself or herself in either oral or written production and renders some kind of evaluation of performance. The evaluation takes place immediately or very soon after the performance. Thus, having made an oral presentation, the student (or a peer) fills out a checklist that rates performance on a defined scale. Or perhaps the student views a video-recorded lecture and completes a self-corrected comprehension quiz. A journal may serve as a tool for such self-assessment. Peer editing is an excellent example of direct assessment of a specific performance.
On Internet sites, a learner may access a grammar or vocabulary quiz and then self-score the result, which may be followed by comparing with a partner. Television and film media also offer convenient resources for self- and peer-assessment. Gardner (1996) recommended that students in non-English-speaking countries access bilingual news, films, and television programs and then self-assess their comprehension ability. He also noted that video versions of movies with subtitles can be viewed first without the subtitles, then with them, as another form of self- and/or peer-assessment.
2. Indirect assessment of (general) competence.
Indirect self- or peer-assessment targets larger slices of time with a view to rendering an evaluation of general ability, as opposed to one specific, relatively time-constrained performance. The distinction between direct and indirect assessments is the classic competence-performance distinction. Self- and peer-assessments of performance are limited in time and focus to a relatively short performance.
3. Metacognitive assessment (for setting goals).
Some kinds of evaluation are more strategic in nature, with the purpose not just of viewing past performance or competence but of setting goals and maintaining an eye on the process of their pursuit. Personal goal-setting has the advantage of fostering intrinsic motivation and of providing learners with that extra-special impetus from having set and accomplished one's own goals. Strategic planning and self-monitoring can take the form of journal entries, choices from a list of possibilities, questionnaires, or cooperative (oral) pair or group planning.
4. Socioaffective assessment.
Yet another type of self- and peer-assessment comes in the form of methods of examining affective factors in learning. Such assessment is quite different from looking at and planning linguistic aspects of acquisition. It requires looking at oneself through a psychological lens and may not differ greatly from self-assessment across a number of subject-matter areas or for any set of personal skills. When learners resolve to assess and improve motivation, to gauge and lower their own anxiety, to find mental or emotional obstacles to learning and then plan to overcome those barriers, an all-important socioaffective domain is invoked.
5. Student-generated tests
A final type of assessment that is not usually classified strictly as self- or peer-assessment is the technique of engaging students in the process of constructing tests themselves. The traditional view of what a test is would never allow students to engage in test construction, but student-generated tests can be productive, intrinsically motivating, autonomy-building processes.
Gorsuch (1998) found that student-generated quiz items transformed routine weekly quizzes into a collaborative and fulfilling experience. Students in small groups were directed to create content questions on their reading passages and to collectively...choose six vocabulary items for inclusion on the quiz. The process of creating questions and choosing lexical items served as a more powerful reinforcement of the reading than any teacher-designed quiz could ever be.
Murphey (1995), another champion of self- and peer-generated tests, successfully employed the technique of directing students to generate their own lists of words, grammatical concepts, and content that they think are important over the course of a unit. The list is synthesized by Murphey into a list for review, and all items on the test come from the list. Students thereby have a voice in determining the content of tests. On other occasions, Murphey has used what he calls “interactive pair tests” in which students assess each other using a set of quiz items.
Guidelines for Self- and Peer-Assessment
Self- and peer-assessment are among the best possible formative types of assessment and possibly the most rewarding, but they must be carefully designed and administered to reach their potential. Four guidelines will help teachers bring this intrinsically motivating task into the classroom successfully.
- Tell students the purpose of the assessment.
- Define the task(s) clearly.
- Encourage impartial evaluation of performance or ability.
- Ensure beneficial washback through follow-up tasks.
A Taxonomy of Self- and Peer-Assessment Tasks
To sum up the possibilities for self- and peer-assessment, it is helpful to consider a variety of tasks within each of the four skills.
Self- and peer-assessment tasks
Listening Tasks
- listening to TV or radio broadcasts and checking comprehension with a partner
- listening to bilingual versions of a broadcast and checking comprehension
- asking when you don't understand something in pair or group work
- listening to an academic lecture and checking yourself on a “quiz” of the content
- setting goals for creating/increasing opportunities for listening
Speaking Tasks
- filling out student self-checklists and questionnaires
- using peer checklists and questionnaires
- rating someone's oral presentation (holistically)
- detecting pronunciation or grammar errors on a self-recording
- asking others for confirmation checks in conversational settings
- setting goals for creating/increasing opportunities for speaking
Reading Tasks
- reading passages with self-check comprehension questions following
- reading and checking comprehension with a partner
- taking vocabulary quizzes
- taking grammar and vocabulary quizzes on the Internet
- conducting self-assessment of reading habits
- setting goals for creating/increasing opportunities for reading
Writing Tasks
- revising written work on your own
- revising written work with a peer (peer editing)
- proofreading
- using journal writing for reflection, assessment, and goal-setting
- setting goals for creating/increasing opportunities for writing
An evaluation of self- and peer-assessment according to our classic principles of assessment yields a pattern that is quite consistent with other alternatives to assessment that have been analyzed in this chapter. Practicality can achieve a moderate level with such procedures as checklists and questionnaires, while reliability risks remaining at a low level, given the variation within and across learners. Once students accept the notion that they can legitimately assess themselves, then face validity can be raised from what might otherwise be a low level. Adherence to course objectives will maintain a high degree of content validity. Authenticity and washback both have very high potential because students are centering on their own linguistic needs and are receiving useful feedback.
Reference
Brown, H. Douglas. 2004. Language Assessment: Principles and Classroom Practices. New York: Pearson Education.