When you talk to test designers and test providers, you hear the word validity a lot. It’s clearly important ‒ everyone wants a test to be valid. But what is validity? And how do you tell if a test is valid or not?
Let’s take the example of a driving test. Do we just ask people to do a classroom test to check they know the rules of the road? No, we obviously want to find out whether they can actually drive a car. Do we take them to an empty car park and observe them driving round and parking in a bay? No, we want to see whether they can drive in the real world. So we design a test in which candidates have to drive in streets with other traffic and pedestrians, and show they can cope with real-life, everyday scenarios. The purpose of the test is to differentiate between those who are ready to drive safely on their own and those who aren’t. If the test can achieve that and can do it reliably (i.e. regardless of where and when someone takes it and who the examiner is), then it is valid.
How does this translate to language testing? A valid English-language test is one that demonstrates whether a test-taker can use English to meet their real-world needs. For a new migrant, that means the ability to carry out everyday tasks in English; for a university student, it means the ability to study an academic subject in English; for an IT support call agent, it means the ability to communicate with customers about technical queries, by phone or by email. In each case, the range of English-language skills needed is different, so we can’t use the same test to assess test-takers in all those areas. That is a key point about validity ‒ a test isn’t valid in itself; it’s valid for a particular use. When testers analyse the validity of a test, the question they aim to answer is: Does the test tell us whether the test-taker can use English for the particular target use?
In the early days of testing, testers relied mainly on measuring reliability: i.e. does the test produce consistent scores, regardless of when and where it is taken and who marks it? But as testing grew, testers realised that reliability is not enough on its own; statistical consistency doesn’t tell you whether the test is evaluating the right skills. So they developed a multifaceted view of validity, focusing on a range of issues, such as:
- Construct: Do the questions test the real-world skills required for a particular use? Are the questions representative of the full range of skills?
- Consequences: Does the nature and content of the test encourage teachers and learners to develop relevant and useful skills or just focus on exam technique?
- Usefulness: Does the test give stakeholders useful results? In other words, does success in the test mean people actually succeed when they go into the real-world environment which the test was designed for?
- Fairness: Is the test equally accessible to everyone, regardless of background and personal characteristics? Does it give a fair chance to all?
English-language tests have real consequences for a test-taker. The result can be the difference, for example, between receiving and not receiving a visa to stay in an English-speaking country, or it can mean being placed in the wrong class, with negative effects on both student and teacher. That’s why validity matters, and why testing organisations put so much effort into ensuring that their tests are valid for the purpose for which they were designed.