


If you've followed this series so far, you might be thinking: wow, it's hard to make a test reliable and valid. Too hard!

Well, actually, the first principle of language assessment discussed in Brown and Abeywickrama (2010) is 'practicality'. You could design the most reliable and valid test in the world, but if it's not practical to carry out, it isn't going to happen the way you planned it. My take on this is that we can try our best to be reliable and valid in our assessment, while being realistic about what is achievable with limited resources.

For instance, an elaborate rubric might be more reliable to mark with, but if it's too complex to use easily and you have a big group of students, you might not use it the way it's intended, because it just takes too much time to mark one script. As a result, reliability suffers, because different teachers end up handling the complexity of the rubric in different ways.

Another example: we know that double marking is more reliable, but we also recognise that double marking every script of every test is just not feasible. In such a case, we have to find other ways to maximise reliability.

Having said this, I think we can sometimes find creative ways to maximise reliability and validity while still being realistic about what is doable. Take, for instance, standardisation meetings, which can be a drag because they take up so much valuable time. As I mentioned before, markers can be given the scripts prior to the meeting to mark at home, or they might even discuss the scripts online (e.g. by annotating them on shared Google Docs). I believe that technology can help us make test administration more reliable and valid without costing more time and effort, so we should not immediately discard a possible measure just because it seems impractical.

Have you got tips and strategies to maximise reliability and validity more efficiently? Please share!