This post is a bit challenging to write, partly because the concept of 'construct' is hard to explain (for me), and partly because construct validity is so central to discussions of validity in the literature.
When I started blogging about validity, I wrote that we can take the concept to mean asking the question 'does the test measure what it's supposed to measure?' We can now think a bit further about what tests actually measure. A test can only measure things that can be observed.
Say we are attempting to figure out a student's writing ability (maybe your typical school composition kind of writing). We can't actually measure the construct directly -- that mysterious, abstract ability called 'writing' -- but we do have an idea of what it looks like. To try to assess it fully, we might look at all the things that make up the ability we know as 'writing'. These are the kinds of things you will find in your marking rubric (they are there because we think they are signs that a person is good or bad at writing): organisation, grammar, vocabulary, punctuation, spelling, etc.
So we look at what we are measuring when we assess writing, and ask ourselves whether these things do indeed comprehensively make up the ability we know as 'writing'. Is anything missing (construct underrepresentation)? Is there anything there that shouldn't be, because it has nothing to do with writing per se (construct irrelevance)? Imagine a writing test that didn't include marking for 'grammar', or one that required you to do a lot of difficult reading before you write. Certainly you can test writing in either of these ways, but you'd need to be clear about what your construct is, how it differs from the more commonly understood construct of 'writing', and why. You could argue for a construct of reading+writing based on research findings, for example.
What I've written above is probably a gross over-simplification (maybe more so than usual). If you'd like a more technical explanation, I recommend JD Brown's article for JALT. It isn't long, and I love that it includes, even if briefly, Messick's model of validity. This model is so important to our understanding of testing that I'm going to include here McNamara and Roever's (2006) interpretation of the model, in the hope that it might give you some food for thought over the long LNY weekend ;-)
Source: McNamara (2010)