
15.3. Basic notions of language testing

So far you have come to terms with the historical changes of the concept of language proficiency, and some important teaching and testing methods. In this section you will learn about some of the major issues in modern language testing. You will read about the characteristics of good tests, test types, the washback effect of tests, and the main steps that should be followed to develop a good test.

15.3.1. Characteristics of good tests

As far as testing is concerned, several issues must be taken into account. One of these is that in a language classroom students’ language knowledge can be assessed from many angles: their listening, speaking, reading and writing skills, their vocabulary and their grammar can all be tested. Naturally, all these skills have to be assessed if we are interested in somebody’s general language proficiency. This is the issue of VALIDITY, which is of key importance in language testing. To put it differently, a valid test measures what it intends to measure. For example, imagine the following: in grammar school you learnt English and you got a grade after each semester for your general English proficiency. However, you had to write only grammar and vocabulary tests and never had to give any oral presentations. Your final grade was based on the test scores in grammar and vocabulary, but knowing a language does not only mean being good at grammar and knowing a lot of words; it also means being able to read, write, understand and speak that language. So the final grade does not reflect a complete picture of the student’s real language knowledge; in other words, the test does not measure what it should.

Another example of a validity problem is assessing somebody’s speaking ability with only one task: reciting a text which s/he learnt by heart. Even an extraordinary performance on this task does not necessarily mean that spontaneous conversation would be just as easy for the student. Therefore, we can state that this test did not measure what it wanted to: the student’s speaking skills in general.

The third example is from the realm of writing: if you have to write a composition entitled "Compare the advantages and disadvantages of nuclear and water power stations, and state clearly which you prefer", this task is not valid if it intends to measure students' composition skills, as those who have some knowledge of the topic will probably write a better-organized text than those who do not know much about it. The reason for this is that the topic is too specific.

Apart from validity, RELIABILITY is another key factor in testing, and it covers two things. One is the extent to which test scores are consistent: if candidates took the test again tomorrow after taking it today, would they get the same result? If yes, the scores are reliable; if not, there must be some problem with the test. The other meaning of reliability refers to the examiners’ work: if two examiners, independently of each other, give the same score to the same student essay, the INTER-RATER RELIABILITY of that test is high. If their scores are very different, the inter-rater reliability is low. INTRA-RATER RELIABILITY is high if a rater gives a certain score to an essay on one day and, some days later, gives the same score to the same essay.

It is common knowledge among language testers that a test can be reliable without being valid, but it cannot be valid without being reliable (see “Points to Ponder” at the end of this chapter).

15.3.2. Test types

Based on what is tested, we can talk about ACHIEVEMENT TESTS, which are based on a syllabus or a textbook and intend to test how successfully students managed to learn the material covered in that textbook over a week, a term, or a year. PROGRESS TESTS are very similar to achievement tests, as they intend to measure progress during a course. For instance, a 15-unit textbook can have three progress tests altogether, one after every fifth unit, and one achievement test at the end of the book. We can also talk about PROFICIENCY TESTS, which are not related to any SYLLABUS, as they are intended to test students’ general level of proficiency. Any language examination system accredited in Hungary (“érettségi”, társalKODÓ, ECL, etc.) belongs to this category.

Based on why we test language knowledge, there are DIAGNOSTIC TESTS, intended for diagnosing candidates' strengths and weaknesses. The aim of PLACEMENT TESTS is to measure proficiency in relation to levels or groups, for example, to find out in which language level group somebody should start their studies in a language school. A FILTER TEST aims to filter out candidates whose proficiency is below a certain level. A university entrance test, which in today’s Hungary is the advanced-level final examination (“érettségi”), is a typical example of this test type.

As discussed earlier, based on how we test we can distinguish discrete-point and integrative tests. DISCRETE-POINT TESTS measure how well students know separate elements of a language, one element at a time and always at sentence level. Multiple-choice grammar tests belong to this category. For example:

On the first day, we came across a young couple and their (1) ________ son.

(A. seven-year-old B. seven years old C. seven-years-old D. seven year old)
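Part of the appeal of discrete-point items is that they can be scored mechanically and therefore very reliably. A minimal sketch of how the sample item above might be represented and scored (the class and field names are invented for this illustration, not taken from the text):

```python
from dataclasses import dataclass

@dataclass
class DiscretePointItem:
    # Illustrative structure for a multiple-choice discrete-point item.
    stem: str
    options: dict[str, str]
    key: str  # letter of the correct option

    def score(self, answer: str) -> int:
        # Objective scoring: 1 if the chosen letter matches the key, else 0.
        return int(answer.strip().upper() == self.key)

item = DiscretePointItem(
    stem="On the first day, we came across a young couple and their ____ son.",
    options={"A": "seven-year-old", "B": "seven years old",
             "C": "seven-years-old", "D": "seven year old"},
    key="A",
)

print(item.score("A"))  # -> 1
print(item.score("c"))  # -> 0
```

Because the score depends only on matching a key, two markers (or the same marker on different days) cannot disagree, which is exactly why such items have high inter- and intra-rater reliability even though, on their own, they say little about overall proficiency.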

INTEGRATIVE TESTS test two or more skills together. For instance, you have to read a letter and reply to it. In this task your reading and writing skills are tested at the same time, because you have to understand the letter in order to reply to it.

Of course, a test can have several purposes at the same time, for example, a proficiency test can be a filter test at the same time.


15.3.3. Washback effect of tests

A test can have a positive or a negative WASHBACK EFFECT on teaching and learning. It is positive if the aims of the course and of the testing are the same: similar tasks can be found both in the exam and during the language course. It is negative, however, if it is the testing that determines the content of the course. This happens if tasks different from the ones in the exam are not dealt with during the course at all. In this case, students do not learn the language; rather, they prepare for the exam. If you learn a foreign language according to the communicative approach, you should not change your learning methods just because you have to take an examination.

15.3.4. A standard model of developing a good test

Think of the “érettségi” test you had to complete not long ago. Both the content and the layout seemed quite professional, did they not? No wonder, as the testers must have worked hard to make the test look, and be, so masterly. A test, especially a high-stakes test like the “érettségi”, has to be developed in a rigorous way, which, in a nutshell, proceeds as follows:

First, the test questions (TEST ITEMS) have to be written by somebody who knows how to write good items (TESTERS). After that, this item writer has to show other item writers what s/he wrote, and they have to discuss how acceptable those items are. If there are any flawed items, they must be changed. In the next phase the test itself must be tried out (the PILOT PHASE): some students complete the proposed test, and these students cannot be the ones we eventually want to test. The results have to be statistically analysed, and any items which do not measure students’ competence well enough have to be changed. After that the test has to be formatted, and students can complete it. Then the tests have to be evaluated. Also, the pass mark has to be set, which means stating the minimum score students have to achieve so as not to fail the test.
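The statistical analysis in the pilot phase can be illustrated with two classic item statistics: facility (what proportion of pilot students got an item right) and discrimination (whether strong students outperform weak students on it). A minimal sketch with an invented response matrix; the flagging thresholds (0.3–0.7 facility, 0.3 discrimination) are common rules of thumb, not prescribed by the text:

```python
# Each row: one pilot student's answers (1 = correct, 0 = wrong) on 4 items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]

n_students = len(responses)
n_items = len(responses[0])

# Facility value: proportion of pilot students answering an item correctly.
facility = [sum(row[i] for row in responses) / n_students
            for i in range(n_items)]

# Discrimination: compare the top third and bottom third by total score.
ranked = sorted(responses, key=sum, reverse=True)
k = n_students // 3
top, bottom = ranked[:k], ranked[-k:]
discrimination = [
    (sum(r[i] for r in top) - sum(r[i] for r in bottom)) / k
    for i in range(n_items)
]

for i in range(n_items):
    ok = 0.3 <= facility[i] <= 0.7 and discrimination[i] >= 0.3
    print(f"item {i + 1}: facility={facility[i]:.2f} "
          f"discrimination={discrimination[i]:.2f} -> "
          f"{'ok' if ok else 'revise'}")
```

In this invented data, item 1 would be flagged as too easy and item 4 as non-discriminating (weak students do as well on it as strong ones), so both would go back to the item writers before the live test is assembled.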