Test reliability

This post is for those of you who set your own assessments, which I guess we all have to do sooner or later!

You can read about all sorts of 'best practices' for test reliability in large-scale, 'standardised' tests (I use 'standardised' here in the true sense, i.e. not exclusively multiple-choice questions, or MCQs). As usual, though, I will concentrate on what is practical to do within the context of the classroom.

I want to start with MCQs, in fact, because we usually think of them, like any other sort of dichotomously scored item (T/F, matching, etc.), as being the most reliable. However, if you've read my post on rater reliability, you'll recall that the validity of such items can be questionable (i.e. do they really test what we want to test?). They can also be unreliable in unexpected ways. For instance, it's quite common to find MCQ items with more than one correct answer, so students who choose the 'unofficial' correct answer end up being marked wrong. This might not always be apparent to us as test writers, so it's a good idea to get colleagues to check. Having been test takers ourselves, we also know how tempting it is to tikam (i.e. guess) when we don't know the right answer to an MCQ. So your student might get the right answer by sheer luck. Sometimes they guess the right answer because of irrelevant clues (e.g. it's longer or shorter than the other options).
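To put a rough number on the 'sheer luck' problem, here is a minimal sketch in Python. The quiz length (10 items), the number of options per item (4) and the cut-off of 5 correct are all made-up figures for illustration, and the calculation assumes every guess is completely blind and independent, which real students' guesses rarely are.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability of k or more correct answers out of n blind guesses,
    where each guess has chance p of being right (binomial model)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical quiz: these numbers are assumptions, not from any real test.
n_items = 10       # number of MCQ items
n_options = 4      # options per item, exactly one of them correct
p_guess = 1 / n_options

# Average score a student would get by guessing every item.
expected_score = n_items * p_guess

print(f"Expected score from guessing alone: {expected_score:.1f} / {n_items}")
print(f"Chance of 5 or more correct by guessing: {prob_at_least(5, n_items, p_guess):.3f}")
```

Under these assumptions a student who guesses everything averages 2.5 out of 10 and has roughly an 8% chance of getting at least half the items right. Fewer options per item, or irrelevant clues that let students rule options out, push that chance up.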

MCQs aren't necessarily bad items, but they do require a lot of time and effort to design well, and should be avoided unless you are willing to invest both. That investment may be worthwhile if, say, you are designing a large-scale test that you want to be able to mark quickly, and will build up a bank of recyclable test items over time. There is lots of good advice out there for MCQ test designers.


This work by gulia.forsythe is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.0 Generic Licence

If we go with subjectively scored items such as essays, there is likely to be rater unreliability. We already know that this can be minimised, though, and I tend to think that, in most classroom language assessment situations, time is better spent on this than on designing good MCQs. Such items can be badly designed too. They can be worded so ambiguously that a student who knows their stuff doesn't actually give you what you thought you were asking for. Again, getting colleagues to check the items is useful.

Brown and Abeywickrama (2010) offer some other tips for enhancing test reliability. Don't make the test too long: while tests can be too short to measure proficiency reliably, they can also be so long that they cause fatigue in test takers. They also point out that some people (like me!) don't cope well with the stress of timed tests.

I'll stop here but if you have something to say about test reliability, please tweet it with .