

Feedback

5 min read

It's Tuesday as I write this, and as I happen to be doing a workshop on feedback tomorrow, I thought I'd be lazy and share some of the key content as my Wednesday post on assessment. I've organised my session around the three categories of Why? - How? - What? (inspired by Shove, Pantzar and Watson's SPT (social practice theory) framework), before we give it a try as a class. The aim is to give effective feedback as efficiently as possible; as we all know, it's tiring and time-consuming work, and sometimes it feels like our efforts just disappear into a black hole!

 

Why feedback?

Feedback is integral to formative assessment, which, as we already know from Black & Wiliam, can result in significant learning gains, helps low achievers in particular, and can cultivate active and collaborative learners. It therefore supports self-directed learning and 21st century competencies.

 

How can we give effective feedback?

Here's a great image based on this article.

5 research-based tips for providing students with meaningful feedback

This work by rebe_zuniga is licensed under a Creative Commons Attribution 2.0 Generic Licence.

More tips I've gathered from various articles (including some tweeted by Dr Carless):

  1. Build trust: make learners feel safe to fail, so that they take risks, and allow us to see what help and feedback is needed
  2. Promote a growth mindset: as per Carol Dweck -- as Dylan Wiliam says 'smart is not something you are, smart is something you get'
  3. Develop a dialogue: instead of writing mini-essays learners might never read in earnest, engage our learners in a dialogue
  4. Forget the sandwich: the feedback sandwich can seem condescending or manipulative; be honest and constructive instead
  5. Focus on task, not ego: we don't need the sandwich to protect the learner's fragile ego if we focus on the task rather than the person
  6. Eliminate grades/marks: or delay releasing them if we can't -- research shows learners tend to ignore feedback if both are given
  7. Assess one criterion per task: we risk overwhelming the learner if we try to assess everything at once -- focus on one thing at a time, and let the learner know in advance so that they know where to direct their efforts
  8. Feed it forward: what next? how can the learner apply this feedback in future work?
  9. Make it actionable: can it be applied? or is it beyond the ability of the learner?
  10. Work less than the learner: resist correcting everything for the learner -- we want to encourage them to take responsibility and ownership, and to develop self-directed learning capabilities
  11. Cultivate feedback literacy: why is feedback important, and how do we use feedback to improve what we do?
  12. Activate peers: peer feedback can be more effective than ours, and learners learn twice when they give feedback, helping them internalise the qualities of a good performance and self-assess
  13. Share range of feedback: learners improve their awareness when they see what others have done well or poorly
  14. Incorporate regular reflection: reflection helps learners develop themselves as self-assessors and self-directed learners, and helps us better understand the kind of feedback our learners need

 

What can we use?

I've thought of 10 tools but maybe you have more to suggest.

  1. Analytic rubrics/scoring: this is usually in the form of a grid, and breaks performance down into criteria
  2. Marking symbols: commonly used in assessing writing (e.g. SP = spelling error) 
  3. Master list of comments: keeping a list of frequent comments that we can 'recycle' by copying and pasting; this can include links to resources such as YouTube content
  4. Google Drive: the Swiss Army knife of digital feedback tools; easily build a feedback dialogue -- check out Doctopus which turbocharges what is already a powerful tool
  5. Voice recordings: can result in better uptake; easy on Google Docs with Kaizena (not so easy on Word)
  6. Google Forms: great for eyeballing answers collated onto a spreadsheet and quick individual comments as feedback; allows learners to see the range of answers and feedback
  7. Spreadsheets: as part of a Google Form or on their own; help us be consistent with both feedback and comments; easily mail merge feedback to learners (see the sketch after this list)
  8. Screenshot annotations: sometimes we need to show, not tell; I really like Awesome Screenshot because it plays well with Google Drive
  9. Screencasting: sometimes we need to show and tell; Screencastify is one of many options out there (free and works with Chromebooks)
  10. YouTube: with a webcam, we can easily video ourselves giving feedback and upload it immediately as a public or private video for sharing
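
Tools 3 and 7 can be combined even outside Google's ecosystem. Purely as an illustration, here is a minimal Python sketch (all the file names, column headings, comment codes and email addresses are hypothetical, so adapt them to your own setup): it reads learner details and comment codes from a CSV export of a marking spreadsheet, expands the codes against a master list of comments, and emails each learner their personalised feedback.

```python
import csv
import smtplib
from email.message import EmailMessage

# Hypothetical master list of recyclable comments, keyed by the shorthand
# codes a marker types into the spreadsheet (e.g. "ORG1, SP2, FF1").
MASTER_COMMENTS = {
    "ORG1": "Your paragraphs need clearer topic sentences.",
    "SP2": "Check the spelling of subject-specific terms.",
    "FF1": "Next time, try outlining your ideas before you draft.",
}


def expand(codes: str) -> str:
    """Turn a comma-separated string of comment codes into full sentences."""
    return "\n".join(
        MASTER_COMMENTS.get(code.strip(), "") for code in codes.split(",") if code.strip()
    )


def send_feedback(csv_path: str, smtp_host: str, sender: str) -> None:
    """Mail-merge feedback from a CSV with the columns: name, email, codes."""
    with open(csv_path, newline="", encoding="utf-8") as f, smtplib.SMTP(smtp_host) as smtp:
        for row in csv.DictReader(f):
            msg = EmailMessage()
            msg["Subject"] = "Feedback on your latest draft"
            msg["From"] = sender
            msg["To"] = row["email"]
            msg.set_content(f"Dear {row['name']},\n\n{expand(row['codes'])}\n")
            smtp.send_message(msg)


if __name__ == "__main__":
    # Hypothetical paths and addresses; a real setup would also need authentication.
    send_feedback("feedback.csv", "smtp.example.com", "teacher@example.com")
```

Whether something like this beats a mail-merge add-on depends on your setup; the point is simply that a master comment list plus a spreadsheet gives consistent, reusable feedback at scale.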

 

I can't profess to be a model of a good 'feedbacker', but I do consider feedback on my feedback seriously and reflect on my own practices (even as I write this). Have you got other tips or strategies to share? What has worked and not worked for you?

 

Alphabet soup: AfL, AaL, LOA

2 min read

Last week, my post on formative assessment (and a subsequent tweet asking for suggestions) sparked a short conversation on Twitter with @ashley about Assessment for Learning and Assessment as Learning, as well as Learning Oriented Assessment. I'm still looking for suggestions for this blog (let me know!); in the meantime, here's my attempt at sorting out these concepts.

Assessment for Learning (AfL) is for all intents and purposes formative assessment. It's useful here to revisit Dylan Wiliam @dylanwiliam's table:


Assessment as Learning was originally proposed by Lorna Earl @lmearl. While AaL is often differentiated from AfL, if we accept Wiliam's definition, it is more accurately a subset of AfL:


Learning Oriented Assessment is the 'new' kid on the assessment block:

Figure from Carless (2007)

Originally proposed by David Carless @carlessdavid and his colleagues, the concept should ring a bell for those of you who are familiar with the backward design approach to curriculum. This approach includes Understanding by Design (Wiggins @grantwiggins & McTighe @jaymctighe), popular in K-12:


(taken from here; original source unknown)

And also Biggs's Constructive Alignment (well-known in HE):


Diagram by UCD Teaching & Learning

I see LOA as a model that not only employs backward design, but does it in a way that foregrounds formative assessment (including AaL). It also deemphasises the distinction between summative and formative assessment in a way that might actually be constructive -- the key is to make summative assessment perform a learning-oriented service, in addition to institutional purposes. I say constructive because seeing the two assessments as a dichotomy (mutually exclusive) could put teachers and learners in a bind -- we can't do away with summative assessments because of institutional demands, and positioning them as the 'bad guys' doesn't necessarily eliminate washback. IMO, the distinction between formative and summative is still important, but the gap can be narrowed, and an assessment could be thoughtfully designed to serve both purposes, perhaps especially if it is an 'alternative' assessment rather than a traditional timed test. By aligning all assessments with the LOs, we can ideally ensure that both kinds -- summative and formative -- are pulling stakeholders in the same direction rather than opposing ones, and promote positive washback.

I've really only just started thinking about these concepts (and what they mean in relation to my own research), so any thoughts you might have on this are very welcome :)

Formative assessment

3 min read

What is assessment? While we often use “test” and “assessment” interchangeably, it’s important to differentiate the two. A test is an assessment, but an assessment isn’t necessarily a test. Tests are usually timed and result in marks or grades. Assessments can take many other forms, however.

Hill and McNamara (2012) talk about assessment opportunities, which they define as ‘any actions, interactions or artifacts... which have the potential to provide information on the qualities of a learner’s... performance’. It’s important to note that these can be unplanned, unconscious and embedded, and therefore can take place anytime in class, and these days, out of class as well.

Assessment opportunities are particularly useful for formative assessment. Black and Wiliam, who have written extensively on this topic, say that assessment is formative only if the evidence about student achievement obtained is actually used to make decisions about the next steps in instruction.



Formative assessment is often known as Assessment for Learning. The Assessment Reform Group came up with this diagram (above) to illustrate the importance of formative assessment. I think it shows the different dimensions of formative assessment very well. I particularly like the point about developing the capacity for self-assessment, which is critical to the development of self-directed learners. In their definition of AfL, the 3 aims are to find out where the learners are, where they need to go, and how best to get there.



Wiliam usefully unpacks formative assessment in the chart above, which shows us the respective roles of teacher, peer and learner in achieving the 3 aims I’ve just mentioned. As you can see, formative assessment, done right, ought to cultivate active and collaborative learners.

So what’s the difference between formative assessment and its opposite, summative assessment? In a nutshell, they have different functions and result in different things. Summative assessment is used to rank or certify, and for accountability purposes, while formative assessment is actually used to meet learner needs. Summative assessment typically ends with grades or marks, while formative assessment produces feedback for the learner instead.

Black and Wiliam have noted that when students are given both, they tend to ignore the feedback and focus solely on their grades or marks. This is a habit that's hard to break, and it makes marks and grades doubly unhelpful for learners.

What are some other reasons formative assessment is important? Black and Wiliam have reported significant learning gains as a result, noting that it helps low achievers in particular.

So often, however, teachers think of formative assessments as little tests that result in marks or grades, which tell neither teachers nor students much about the learning that's going on, or what to do next.

Formative assessment can be embedded into our class activities. Take a look at this page by the Northwest Evaluation Association for some ideas.

What formative assessment activities do you use? How do you and your students use them to inform teaching and learning? Please share with us on Twitter.

Designing tests

1 min read

I'm cheating a bit this week by posting a set of slides adapted from the one I used for my class. (This cycle will be a bit different if designing alternative and/or formative assessments.)

Washback

2 min read

Even if you are not familiar with the term, you are probably familiar with the concept of washback (commonly called backwash in educational assessment). It refers to the effects of assessment on teaching and learning, and anyone who's studied in an exam-oriented system would have experienced this.

We tend to think poorly of washback because we often think of negative washback, e.g. ignoring what's in the syllabus in favour of what will be in the exam, even if we think that the syllabus has more worthy learning outcomes. While washback can be very problematic, I think we do need to consider two things.

First, as long as high-stakes exams determine a person's educational prospects, it's pretty unfair to blame teachers (and parents and learners) for their preoccupation with preparing students for exams. I don't mean that teachers should willingly let exams lead them by the nose, and I applaud those who can look beyond exams to think and act with true education in mind. However, we would be doing our students a disservice if we didn't prepare them adequately for exams (think face validity and student-related reliability). The point is not to obsess over exams or let them overrun the curriculum.

Second, washback can be positive, and we should try to leverage this. While national exams are not within our control (though we may be able to exert some subtle influence), classroom assessments are -- make sure these are aligned with our intended learning outcomes. I believe that real learning will serve students well in their exams, and that obsessive exam prepping is unnecessary.

How do you deal with washback? Let us know on Twitter.

Authenticity

1 min read

Authenticity is about the closeness of your assessment task to a real-world task. This seems quite straightforward, until you consider that in the real world, few tasks demand only language proficiency, and not also non-language related knowledge and competencies.

So how authentic can a language test get? Brown and Abeywickrama (2010) list a few qualities:

  • language that is as natural as possible
  • contextualised items
  • meaningful, relevant, interesting topics (although it's worth considering that meaningful, relevant, interesting to us may not be meaningful, relevant, interesting to students)
  • some thematic organisation to items, e.g. through a storyline
  • 'real-world' tasks (which could also be questionable -- do language teachers necessarily have an accurate sense of the authenticity of tasks?) 


It's possible that what we need for optimal authenticity are 'integrated' assessment tasks that combine different subjects in the curriculum, instead of language on its own.

What do you think? What sort of authentic assessment tasks do you use? Let me know on Twitter.

Practicality

2 min read

If you've followed this series so far, you might be thinking: wow, it's hard to make a test reliable and valid -- too hard!

Well actually the first principle of language assessment discussed in Brown and Abeywickrama (2010) is 'practicality'. You could design the most reliable and valid test in the world, but if it's not practical to carry out, you know it isn't going to happen the way you planned it. My take on this is that we can try our best to be reliable and valid in our assessment, but also be realistic about what is achievable given limited resources.

For instance, an elaborate rubric might be more reliable to mark with, but if it's too complex to use easily and you have a big group of students, you might not use it the way it's intended, because it just takes too much time to mark one script. As a result, reliability suffers, because different teachers end up handling the complexity of the rubric in different ways.

Another example: we know that double marking is more reliable, but we also recognise that double marking every script of every test is just not feasible. In such a case, we have to make other efforts at maximising reliability.

Having said this, I think we can sometimes think of creative ways to maximise reliability and validity while still being realistic about what is doable. Take for instance standardisation meetings, which can be a drag because they take up so much valuable time. As I mentioned before, markers can be given the scripts prior to the meeting to mark at home, or they might even discuss the scripts online (e.g. by annotating them on shared Google Docs). I believe that technology can offer ways to make test administration more reliable and valid in more effective and efficient ways, and we should not therefore immediately discard a possible measure because of its perceived impracticality.

Have you got tips and strategies to maximise reliability and validity more efficiently? Please share on Twitter!

Face validity

1 min read

So far we haven't considered the test-taker's point of view. Face validity refers to exactly this: does the test look right and fair to the student?

Of course, one might argue that students are not usually the best judge of validity. But their opinion, however flawed, can affect their performance. You want students to be confident and low in anxiety when taking a test, because you want to maximise student-related reliability, as mentioned in an earlier post.

Brown and Abeywickrama (2010) advise teachers to use:

  • a well-constructed, expected format with familiar tasks
  • tasks that can be accomplished within an allotted time limit
  • items that are clear and uncomplicated
  • directions that are crystal clear
  • tasks that have been rehearsed in their previous course work
  • tasks that relate to their course work (content validity)
  • a difficulty level that presents a reasonable challenge

(p. 35)

As always, please share your thoughts on Twitter.

Construct validity

3 min read

This post is a bit challenging to write, partly because the concept of 'construct' is hard to explain (for me), and partly because construct validity is so central to discussions of validity in the literature.

When I started blogging about validity, I wrote that we can take the concept to mean asking the question 'does the test measure what it's supposed to measure?' We can now think a bit further as to what is actually being measured by tests. A test can only measure things that can be observed.

Say we are attempting to figure out a student's writing ability (maybe your typical school composition kind of writing). We can't actually measure the construct directly -- that mysterious, abstract ability called 'writing' -- but we do have an idea of what it looks like. To assess it as fully as we can, we might look at all the things that make up the ability we know as 'writing'. These are the kinds of things you will find in your marking rubric (they are there because we think they are signs that a person is good or bad at writing): organisation, grammar, vocabulary, punctuation, spelling, etc.

So we look at what we are measuring when we assess writing, and ask ourselves if these things do indeed comprehensively make up the ability we know as 'writing'. Is anything missing (construct underrepresentation)? Is there anything there that shouldn't be there because it has nothing to do with writing per se (construct irrelevance)? Imagine a writing test that didn't include marking for 'grammar', or one that required you to do a lot of difficult reading before you write. Certainly you can test writing in either of these ways, but you'd need to be clear as to what your construct is, how it differs from the more commonly understood construct of 'writing' and why. You could argue for a construct of reading+writing based on research findings, for example.

What I've written above is probably a gross over-simplification (maybe more so than usual). If you'd like a more technical explanation, I recommend JD Brown's article for JALT. It isn't long, and I love that it includes, even if briefly, Messick's model of validity. This model is so important to our understanding of testing that I'm going to include here McNamara and Roever's (2006) interpretation of the model, in the hope that it might give you some food for thought over the long LNY weekend ;-)


Source: McNamara (2010)

Criterion validity

3 min read

This work by frankleleon is licensed under a Creative Commons Attribution 2.0 Generic Licence

Okay, so you've designed a test and you've decided that if the students reach a certain mark or grade (or meet certain criteria), they have achieved the learning outcomes you're after. But are you really sure? How can you know? This is essentially the question we aim to answer when we consider criterion validity.

We can consider two aspects of criterion validity: concurrent validity and predictive validity.

To establish concurrent validity, we assess students in another way for the same outcomes, to see for example if those who performed well in the first assessment really have that level of proficiency. In my previous post on content validity, I gave the example of an MCQ grammar test vs an oral interview speaking test, to measure grammatical accuracy in speaking. To check the concurrent validity of the MCQ test, you could administer both tests to the same group of students, and see how well the two sets of scores correlate. (This does assume you are confident of the validity of the speaking test!) In a low stakes classroom testing situation, you might not have the time to administer another test, but you could for instance call up a few students for a short talk, and check their grammatical accuracy that way. You might pick the students who are borderline passes -- this could show you whether your pass mark is justified.
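
If you like to see the arithmetic behind 'how well the two sets of scores correlate', here is a minimal sketch with invented numbers (purely for illustration, not real data): it computes the Pearson correlation between the MCQ grammar scores and the oral interview ratings for the same ten students.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented scores for the same ten students on the two assessments.
mcq_scores = [14, 18, 11, 20, 9, 16, 13, 17, 10, 15]               # MCQ grammar test, out of 20
oral_scores = [3.0, 4.5, 2.5, 5.0, 2.0, 4.0, 3.5, 4.0, 2.5, 3.5]   # interview rating, out of 5

r = pearson(mcq_scores, oral_scores)
print(f"Correlation between the two sets of scores: r = {r:.2f}")
```

There is no magic threshold, but a strong positive r suggests the MCQ test ranks students much as the interview does, while a weak one would call its concurrent validity into question.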

As for predictive validity, this is really more important when the test scores determine the placement of the student. Singapore schools typically practise streaming and/or banding to place students with others of the same level. If the test we use to determine their placement does not have predictive validity, that means there is a good chance the student would not be successful in that group. Which kind of defeats the purpose of streaming/banding! We can't predict the future, but we can compare past and future performances. We could for instance compare the test scores of students a few months into their new placement with the test scores we used to determine their placement. If there are students who perform much better or poorer than you would reasonably expect, it's time to re-examine the original test, and probably move the students to a more suitable class too.

That's about it for criterion validity. As always, tweet your comments and questions.