Numerical Marking: General Principles

This policy should be read in conjunction with: 

The additional information below provides further clarification in the context of the Common Awards programmes. 

Marking is a matter of qualitative academic judgment, guided by formal criteria. 

Qualitative academic judgments about student work cannot be reduced to formulae, or made a matter of ‘ticking boxes’. We do, nevertheless, provide detailed guidance to help markers translate their qualitative judgments into numerical marks, and to express those judgments in consistent language. 

Our detailed assessment criteria are intended to be a helpful guide, not a straitjacket. 

The University provides, in its Core Regulations, ‘Generic Assessment Criteria’ for degree-level work. These provided a benchmark for the creation of more detailed criteria specific to Common Awards. See our pages on Marking Criteria for more details. These more detailed criteria were produced by the Common Awards Finished Product Group (a group set up by the Ministry Development Team to finalise the Common Awards paperwork) and have been revised by the Continuing Implementation Group. In other words, they were generated by representatives of the TEIs working together with the Ministry Development Team. So they are not a regulation imposed by the University, but rather an attempt by the wider Common Awards community to provide helpful guidance that will support consistent marking practice across the TEIs. Feedback from TEIs on how helpful they are proving in practice will help us to refine them further. 

Using the detailed assessment criteria is more art than science. 

These detailed marking criteria are not designed to be used mechanically. Looking at the table of criteria in relation to a given piece of work, a marker may find that some rows are more applicable than others, that the implied classification of that piece of work is different in different rows, and that several different descriptions in some rows could plausibly be applied. No formula for combining all these factors into a single mark can substitute for good academic judgment, in the light of the learning outcomes for the module, the nature of the task being assessed, the kind of guidance that was given to students, the materials available to them, and so on. Nevertheless, the tables can assist a marker in calibrating their judgments against those of other markers, and can help them find language to express their judgments clearly to the student. 

Numerical marking 

Our marking criteria provide a way of translating qualitative judgments into numerical marks. 

The Common Awards marking criteria are qualitative, not quantitative. The vast majority of our marking reflects that. We do not decide that one essay is exactly 2.3 times as good as another essay, nor that a student has made 24% fewer errors than another in a given essay. 

We use a numerical scale that is widely used in further and higher education, but the numbers themselves are purely conventional. We choose, for instance, to assign the boundary between upper second quality work and first class quality work the number 70. We could have assigned it the number 3, the number 270, or the number 3.8 × 10^67. The number 70 has no direct meaning: it does not mean that 70% of the learning outcomes were met; it does not mean that first class work is at least 70/40 or one and three quarter times as good as a bare pass. 

The translation into numerical marks is intended to model our intuitive judgments about how qualitative judgments combine. 

We have, however, picked these otherwise arbitrary numbers so that students who gain fairly straightforward profiles of marks over the course of their studies will normally end up with the overall classification that we intuitively deem they should. Here is a student with one high 2.2 mark, two low 2.1 marks, and one high 2.1 mark; turn those into numbers, take the average as required by our degree classification rules, and we’re looking at a 2.1 overall, which ‘seems about right’. ‘Seeming about right’ in the kinds of cases where we find it fairly easy to agree in our judgments is the only real test of whether the otherwise arbitrary mathematical rules that we have put in place for determining classifications are appropriate ones. 
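
The arithmetic behind that example can be sketched in a few lines of Python. This is an illustration only, not the official classification rules: the four marks (58, 62, 63, 68) are hypothetical stand-ins for ‘one high 2.2, two low 2.1s and one high 2.1’, the average is taken as a simple unweighted mean (the actual rules may weight modules differently), and every band boundary other than the 70 mentioned above is an assumed conventional value.

```python
from statistics import mean

# Assumed band boundaries. Only the 70 boundary (2.1 / first) is stated in
# the text; 60, 50 and 40 are conventional values used here for illustration.
BANDS = [(70, "First"), (60, "2.1"), (50, "2.2"), (40, "Third")]


def classify(mark):
    """Return the label of the highest band whose lower boundary is met."""
    for lower, label in BANDS:
        if mark >= lower:
            return label
    return "Fail"


# Hypothetical marks: one high 2.2, two low 2.1s, one high 2.1.
marks = [58, 62, 63, 68]

average = mean(marks)        # simple unweighted mean (an assumption)
overall = classify(average)

print(average, overall)      # 62.75 2.1
```

Under these assumptions the average falls comfortably inside the 2.1 band, which matches the intuitive judgment described above.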

The system is also designed to extend to cases that are more difficult, where our intuition gives out. 

In the case of a student whose marks are all over the place, for instance, we may well not have any agreed sense of what ‘seems about right’. So we trust the numbers and rules that have worked in more straightforward cases.

Using the full range of marks 

We do not ‘mark student work out of 100’, and the regular call to ‘use the full range of marks’ does not mean ‘marks should go all the way up to 100’. 

There is no useful sense in which our marks are ‘percentages’. A piece of work that gets a mark of 60 has not got 60 things out of 100 right, or achieved six tenths of perfect clarity. We could decide that the highest possible mark was 76 or 80 or 92, and the mere fact that there are numbers between that highest mark and the number 100 would be, in itself, a completely uninteresting fact – and the common call to ‘use the full range of marks’ is meaningless if this is all that it rests on. 

There are, however, good reasons for ensuring that we don’t confine our first-class marks to the low 70s. 

It is appropriate to ask whether the range of marks that we do use for good first class work models our intuitive judgments well. Our rules for award classifications happen to give a prominent role to numerical averages, and this does mean that the width of the range of first class marks matters. If, for instance, all our first class marks are clustered into the 70-75 band while our second class marks remain spread over the 20 marks between 50 and 70, it becomes very easy for a second class mark to pull a student’s average down below 70, and very hard for a first class mark to pull it up. In such a case, a student might well get a whole range of the very highest first class marks we are prepared to give, and yet be pulled down by a few marks that are low-ish 2.1. Does that ‘seem about right’? If not – and successive Boards of Examiners in multiple universities have tended to say that it does not – then we need to use a wider range of first class marks in order to make our quantitative model of our qualitative judgments work better. 
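
A small worked example may make the effect of compressed first class marks concrete. The profile below is hypothetical: eight equally weighted pieces of work, five judged to be among the very best first class work and three judged to be low 2.1 (marked at 62). The only thing that changes between the two calculations is the number used for the best first class work: 74 if firsts are confined to the low 70s, 82 if a wider range is used. The simple unweighted mean is an assumption, not a statement of the actual classification rules.

```python
from statistics import mean

FIRST_BOUNDARY = 70  # boundary between 2.1 and first, as stated in the text


def average_with_firsts(first_mark, n_firsts=5, low_mark=62, n_low=3):
    """Unweighted mean of a hypothetical profile: n_firsts pieces of work
    marked at first_mark plus n_low pieces marked at low_mark."""
    return mean([first_mark] * n_firsts + [low_mark] * n_low)


# Same student, same work; only the number used for top firsts differs.
compressed = average_with_firsts(74)   # firsts confined to the low 70s
wider = average_with_firsts(82)        # firsts spread over a wider range

print(compressed, compressed >= FIRST_BOUNDARY)   # 69.5 False
print(wider, wider >= FIRST_BOUNDARY)             # 74.5 True
```

With compressed marks, five of the highest available firsts are outweighed by three low 2.1 marks and the average falls just below 70. With the wider range the same judgments produce an average of 74.5.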

Our marking scheme includes a band from 86 to 100 to recognise extraordinary work. 

In line with many other universities, our assessment criteria contain guidance on marks all the way up to the 86–100 range. That has nothing to do with the false idea that we are ‘marking out of 100’, and therefore need to make sense of the numbers all the way up to the ‘top’. It is instead a way of allowing and encouraging us to recognise truly extraordinary work on the occasions when we meet it, and to give it a mark that will make a serious difference to the student’s overall grade. Marks in this range will be rare. The detailed Level 6 criteria for this band for ‘Essays and Other Written Assignment’, for instance, indicate that all such work will typically demonstrate ‘complete mastery’ of the question set, ‘extremely powerful, original argument’, ‘outstanding analysis’ and more. 

And marks in this range will get rarer the higher you go. Work marked at 86 will be work where we judge that, on balance, yes, these amazing things can actually be said about this essay, though only just. Work much higher up the band will be work that clamours to be acknowledged in these terms, and could still be described in this way even if it were significantly worse. As a result, marks in the 90s tend to be very rare indeed, and it would be no surprise to attend several Boards of Examiners and not see any examples.