Out of the Ashes: were levels in primary really that bad?


First, a little about me. My name is Jon Brunskill and I started teaching in 2013, on the Teach First programme. I began teaching in year 3, during the final statutory year of assessing using ‘levels’; I then moved to year 6, during which time levels were granted a stay of execution. This year I’ve been wrestling with the interim assessment framework as head of year 2 at an all-through school in London. I’m also currently a member of Dame Reena Keeble’s review of effective practice in primary teaching, commissioned by the DfE and led by the Teaching Schools Alliance.

So I trained during the period in which levels were abolished, which in retrospect was an interesting formative experience. Back then, I remember being confused, and my tutors and mentors only exacerbated this feeling. Whenever I asked a question about assessment, they shuffled uncomfortably and replied that ‘we just don’t know’.

So I was being trained to use a system of assessment that was on the conveyor belt to the scrap heap, but nobody was quite able to articulate to me just why it was being abandoned. After reading more about the nonsense of, say, levelling a single piece of work, it felt disingenuous to then sit down and do exactly that in order to fulfil my data entry obligations. This only got worse the following year (last year), when I taught year 6 and we had to do the same thing again. Everyone else in the school had jumped ship, and we were left undertaking what we knew to be poor practice.

Since then, much has been written and said about the flaws of national curriculum levels. These problems have been shown to lie both in their design and (undoubtedly much more damaging) in the manner in which they were implemented in schools over their lifetime. You’ve heard today from Dame Alison Peacock, and others, compelling reasons to move beyond levels and consign them to the dustbin of history.

With this being said, you may be wondering why the title of this talk seems to be an apology for levels. I’d like to make clear from the start that there were serious, perhaps even fatal, problems with levels and the way that they were implemented. But I’ve been struck by a sort of smug clambering onto the ‘Of course I never liked levels anyway’ bandwagon. It has become heresy to admit in staff meetings (or certainly on social media) that you actually quite like the APP grids and find them useful.

So what I think is necessary is a closer examination of exactly what went wrong with levels, and why, and whether any aspect of them can be salvaged from the wreck. Were they really that bad? I’m not so sure, and I don’t think that I’m alone.

Show Slide Two

This is a poll conducted by Michael Tidd, a superb commentator on primary education and assessment. I like Michael because he’s a practitioner, and with that comes a grounded, practical approach to these problems. It is absolutely right that independent experts weigh in on the systems we use in education to assess students, but I fear that along with this comes a sort of unrealistic, unattainable Platonic form of assessment. School leaders and teachers need to be comfortable with a certain ‘this is the best that we’ve got so far’ mentality. Education is messy, and we are working with children whose learning we know does not conform to the neat trajectories that would make our lives so much easier. A dollop of cynicism and a reflective attitude are healthy and productive in schools, but too much cynicism can be paralysing and ultimately damaging.

Anyway, when Michael polled his followers (and caveats here: this is a small, unverified sample, which is almost certainly not representative of the profession in any demographic that we would be interested in), he found that over 70% of more than 1,600 primary respondents would prefer to return to levels now if given the chance. So what conclusions can we draw from that? Do all of these teachers just not understand assessment? Are they victims of Stockholm syndrome? Are they so unimpressed with the alternatives that they’d rather stick with the devil they know? What are your thoughts? Why do so many teachers reject the impassioned and well-reasoned argument of this conference that we should discard levels for good?

Give delegates 30 seconds to discuss, then share responses.

It is my contention here today that these respondents are not idiots. It is my contention that they understand the flaws of levels, and that they do not take them to be a perfect system. I’d like to argue that the level system has much to offer. Whilst it was warped, bastardised and taken hostage by a high-stakes monitoring mechanism (thereby nullifying its use), it is still a very useful tool for supporting professional dialogue and for understanding what children can do, where this broadly places them on a progressive spectrum of learning, and what they should master next to improve their knowledge and skills further.

Show Slide Three

In order to do this, I’d like to spend this session doing five things. I’m going to rush through these five areas because I’m acutely aware that I may well be monumentally wrong here, and so I’m interested in your thoughts on what I propose. Having heard the case for the defence, I think there will be real value in tossing the issues around as a group of informed professionals (incidentally, this kind of shared discussion is what I think is powerful about the levels progress grids).

So first, I’d like to give a very brief history of levels, including their inception.

Secondly, I’d like to review what went wrong, from both an academic and a practitioner point of view. And I want to make it clear right now – I do think that they went wrong. This is not an unconditional apology for levels. I’m merely trying to separate the baby from the bath water.

So, after reviewing what went wrong (and I’ll make clear that it was teachers and leaders, under pressure from the inspectorate, who ruined a useful system of assessment), I’d like to undertake a short assessment exercise with you.

After this, we’ll take a quick look at what has replaced levels as the statutory assessment system – focusing in on key stage one (my key stage).

This will lead us to the ‘other approaches’ that have replaced levels, and I’ll argue why I don’t find them convincing and don’t consider them a particularly useful replacement for the admittedly flawed boat that we all jumped from so happily.

To be clear, my argument is that we have jumped out of the frying pan and into the fire with regard to primary assessment. I’d like to consider today that it may be fruitful to revisit what was compelling and helpful about the levels approach, and to try to agree on aspects that we might salvage from the wreckage. So we will end with an open discussion on assessment, in which I’ll be interested to hear all of your views and will attempt to facilitate a discussion on how we best move forward.

Show Slide Four

1987 was the dawn of perhaps the most significant age of educational reform in this country’s history. Key stages and the National Curriculum were introduced. Alongside this new National Curriculum, an assessment system was devised by Paul Black, influenced by Carol Dweck’s early work on mindset. Previously, there had been children moving through school who were always the same ‘grade’. In year 1 they were grade D, and in year 5 they were still grade D. As far as parents and children were concerned, they had made no progress in their learning, despite the fact that they would clearly have been able to do much more in year 5 than in year 1. So the notion of levels, a systematic series of key achievements showing incremental improvement, allowed both parents and children to see that they were ‘getting better’: they were making progress, improving in what they knew and could do.

Dylan Wiliam, who sat on the Expert Panel that advised the abandonment of levels, was unequivocal in his defence of levels in their original formulation, stating:

“Let me be clear. I was a huge fan of the system of 10 (later eight) levels that Paul Black’s Task Group on Assessment and Testing recommended to Kenneth Baker (then Secretary of State for Education) in December 1987, not least because it was based on the work that Margaret Brown and I had done on levels of achievement in graded assessment schemes, and Carol Dweck’s early work on mindset.”

So it’s clear: this was not a system that was flawed from the start. Something happened. Something went wrong. The question now is: what?


Teachers. That’s the short answer. Teachers is what went wrong. If you were being a little more sympathetic, you might modify that to teachers under the direction of senior leaders, and a still more charitable reading would be teachers under the direction of senior management facing implicit instructions from a powerful inspectorate. But that is another story. Let’s turn now to exactly how we bastardised the system of assessment of which Dylan Wiliam was a ‘huge fan’.


Tim Oates, Group Director of Assessment, Research and Development at Cambridge Assessment, and Chair of the Expert Panel which informed the review of the national curriculum from 2010 to 2013, sheds some light on just what went wrong.

Show Slide Five

There were three key problems:

First, children began to label themselves. They were referring to themselves (and each other) as ‘level three’ in a pejorative manner.

Second was undue pace. We were incentivised to constantly push children on to the next level descriptor, instead of ensuring that children had deep understanding and had mastered the basics, especially the key, core foundational ideas. It was go go go, progress progress progress.

Third was the invalidity of levels. Was a level based on a test score? On a child being ‘just in’ the level? On a ‘best fit’ judgement? These were not comparable.

What did this look like from a practitioner’s point of view?

Show Slide Six

We took a system that was originally supposed to inform teacher judgements of overall performance at ages 7 and 11, and broke it down into a system used to judge children at the end of each year. Since only two levels of progress were expected over four years (from level 2 at the end of key stage 1 to level 4 at the end of key stage 2), it was difficult for teachers to ‘demonstrate progress’, or give evidence that children were doing better than when they arrived. It was perfectly possible, expected in fact, that a child would enter year 3 at level 2 and leave with the same number, despite having made substantial progress.

Show Slides Seven and Eight

So we broke them down into sub-levels. Now a child could enter at 2c and leave at 2a: progress demonstrated. And to show that we were committed to rapid and sustained progress, we assessed children against these sub-levels not at the end of each key stage, as intended, not at the end of each year, not even at the end of each term, but EVERY half-term.

Show Slide Nine

Worse still, because we were so determined to ensure that children ‘made progress’, we took the measures that were supposed to be used to judge performance retrospectively and turned them into targets for children. So the level descriptors became learning objectives and success criteria. And as we all know from Charles Goodhart, ‘When a measure becomes a target, it ceases to be a good measure.’

So here is the crux: all of these problems stem from a formative framework for assessment being hijacked as a summative measure and used for monitoring purposes. If you get rid of the numbers (and I never told children what their number was; all they need to know is what they need to do next to improve further), then all of a sudden this looks very reasonable.

Show Slide Ten

Because there is better and worse writing. Caught up in the smug abandonment of levels is an implicit notion that there is no such thing as a fairly linear progression in writing skill. There is, and although vague level descriptors aren’t very helpful if you are making a high-stakes, fine-grained judgement, they work just fine for day-to-day teaching and as a common language between professionals. Again, it is not the levels that were the problem, but the way that we used them.

So let’s take a test case.

Show Slide Eleven

Here are two children, child A and child B. Which writing is better?

Clearly child A. We’re actually really good at comparing children’s writing like this and very quickly judging which is better. There’s a session next door on comparative judgement, a system of assessment that I’m looking forward to following closely. But we don’t just want to know which writing is better (although we do want to know that). It’s necessary but not sufficient. Because what happens when child B says to you, ‘I want to write as well as child A’? What do you say to her?

Take a few minutes and make a list of what feedback you would give to this child, based on what they have currently produced.

Make a list on flipchart.

What is this starting to look like? An APP grid. We could introduce a better piece of writing and do the same thing again. And before we knew it we’d have a rough progression of descriptors. Of course there is better and worse writing, and we’re broadly able to explain what makes writing better and worse, and the sorts of things that children should do to improve their writing.

Especially as a new teacher, I found the APP grids useful as a document that gave me a brief overview of the journey that young writers go through, from letter formation to fluency.

Here’s a confession: I actually still find an APP grid a useful thing to have on the desk when assessing children’s writing at the end of a term. I think it frames and focuses professional discussion. If you take the numbers and letters away, and don’t use it to report to managers and government, then this is a good, useful document and framework to help all children get better at writing.

And perhaps this is just a primary thing. Perhaps it works only because the skills are quite basic, the next steps relatively clear. I understand that this may be less helpful as skills and next steps become more complex and nuanced. I’d be interested in any secondary colleagues’ thoughts on this.

The problem was never levels. It was our conflation of formative and summative systems. We need to properly separate formative and summative assessment. It’s madness to expect teachers to mark their own homework whilst putting huge pressure on schools to deliver better and better results, and not expect those numbers to get gamed and become empty and worthless. Teacher assessment is useful in showing the teacher what a child can do, and what they may need to do next, but hopeless as a monitoring tool from above: as Goodhart’s law predicts, it immediately loses all value. But, and I stress this, an APP grid, and the spirit of levels, of a clear progression that all children can move through, was a useful internal classroom structure. It becomes both meaningless and damaging if you also try to use it as a monitoring tool.

But there should be summative assessment systems used to monitor schools from above. To suggest anything else would be childish. The government absolutely has a responsibility to gain a comprehensive picture of how schools and children are doing, challenging poor performance and gaps in attainment, especially the seemingly intractable gap in outcomes between the rich and the poor.

Of course, high-stakes testing drove a lot of the mistakes that primary schools made with assessment. So I’d like to end by arguing that we should actually assess children summatively more frequently, and Tim Oates may agree here; he says ‘there is too little assessment (of the right kind)’. I think that if we summatively assessed children through an annual standardised test, then teachers’ workload would be dramatically reduced, we would gain a more accurate picture of children’s learning and progression, and, perhaps most importantly, the stakes would be organically lowered. No longer would schools get one throw of the dice at the end of year 6. This would leave teachers free to focus on what children can do currently, and what they need to do next to improve.

