Bluffer’s Guide to Surviving an Ofsted Inspection

This article originally appeared in Teach Primary, a wonderful and intelligent publication to which you can subscribe here.

Finally, you fully understand what it would feel like to be a character in one of those asteroid apocalypse movies. Bruce Willis can’t help you now though; the death-line has already been crossed. It starts with the headteacher charging into your classroom just before midday, wild-eyed and sweating, manically gesturing a phone-hand-signal before blurting out, “We’ve had the call, everything is okay. EVERYTHING IS OKAY!”

Everything is quite clearly not okay. Everything is very far from okay. But fear not. You have 18 hours until impact, and armed with this trusty guide, you can bluff your way through your Ofsted inspection…

Pile additional pressure on teachers whilst insisting there’s nothing to worry about.

Briefly consider delivering an amended version of the “We will not go silently into the night!” speech from Independence Day, complete with dramatic music. Immediately disregard that plan and tell nobody you even considered it. Instead, call a huge, panicked staff meeting like one of the town hall scenes from The Simpsons. Explain that we’ve all been expecting this and that all we need to do is show them what we do on a day-to-day basis. Then don’t let anyone leave until every facet of the school is unrecognisable from its usual self.

Pile additional pressure on children whilst insisting that there’s nothing to worry about.

Primary-aged children have an annoying tendency to be completely honest when asked questions by adults, and this is perhaps your biggest threat over the next few days. Embrace your inner Malcolm Tucker and ensure that they are all briefed within an inch of their lives. Drill them with the literacy targets that you half-heartedly introduced two months ago, and ‘remind’ them of how they know how to explain their learning objective and success criteria. Prepare them for the fact that you will be wearing a thing called a ‘tie’ tomorrow, and that this is completely normal. Sternly explain that our ‘special visitors’ will be watching their behaviour very closely so it’s very important that they, you know, behave. For once.

Play your best team 

Whilst little Patrick’s refusal to do anything except shout “You smelly head” at the top of his voice has become endearing to the staff of your school, inspectors may not be so understanding. It’s lucky that Patrick has been looking a little peaky recently. In fact, come to think of it, a lot of the children with more ‘lively’ approaches to learning suddenly look a bit under the weather. Seek out these characters and hold a compassionate hand to their brow, before asking them how they’re feeling. When they reply, puzzled, that they’re absolutely fine, send them to the medical room immediately. Suggest to their parents that they stay off for the next 48 hours. Better to be safe than sorry, right?

Buy Red Bull.

Coffee is not going to cut it. Red Bull comes in crates.

The books. My god the books.

Triage, my friend. You cannot mark all of those books in the time available to you. It’s just not possible. You need three piles. First, your ‘show’ books; the trusty high attainers with neat handwriting – strategically place these in areas most likely to be perused by unwelcome hands. Now take the books of the middle attainers and randomly highlight in an array of colours. Then have the children ‘edit’ their work in a variety of coloured pencils whilst shouting repeatedly, “You’re responding to feedback, just like always. What are you doing!? Responding to feedback, that’s right. Just like always.” Finally, there are the ‘hopeless cases’, the books that a thousand half terms couldn’t save. These will need to be lost in a series of unfortunate incidents including, but not limited to: accidentally being thrown out with their books from last year; being ruined by spilt tea; insisting the child took it home; and “I’m sure I’ve got it here somewhere” before hiding in the toilet until they go away.

Deprive yourself of everything that makes you an effective teacher.

Since you’ll be in school until midnight, takeaway pizza is the only viable option, but be sure to supplement this with biscuits, sweets and stockpiled generic junk food. If you absolutely must, you may sleep a total of two hours, but assert loudly the next morning that you didn’t sleep at all. Any sort of recreational activity that brings you joy and well-being is strictly banned. Surely that goes without saying.


Of course, you could disregard all of this advice and just do what you normally do, like some sort of maverick from an 80s cop show. In which case, you’ve only got yourself to blame.


Bluffer’s Guide to Assessment

This article originally appeared in Teach Primary, a wonderful and intelligent publication to which you can subscribe here.


Okay, so assessment in primary is a mess, but let’s get something straight: when we complained that the government should stop interfering and let teachers get on with assessing children however we liked, we didn’t actually expect them to let us do it. So we’ve learnt two things. First, it turns out that the only thing worse than having a bad system is having no system at all, and second: the only thing worse than being told what to do is not being told what to do.

But since we moaned for so long about creating our own systems, we need to muddle something together, and preferably before the end of the academic year. To help guide you through the chaos, I present the bluffer’s guide to assessment:

Re-invent levels (but change the name)

So maybe levels were ‘unhelpfully vague’, ‘statistically flawed’ and ‘encouraged children to progress before ready’, but better the devil you know, am I right? When creating new assessment grids, you should just copy and paste the old level descriptors from APP but, and here’s the clever part, change the name at the top. Something like ‘progress indicators’ or ‘mastery thresholds’ will do. Preferably, you should pay a shady educational company thousands of pounds to do this for you, but instructing your work-wearied Deputy Head to bodge it together over a weekend is also acceptable.

Use the word ‘mastery’. A lot. (Don’t worry about what it means.)

Apparently, we should all now be doing mastery. Don’t worry, nobody really understands what it means, so you can just declare authoritatively that you are definitely doing mastery and shrug if anybody questions you. You can evidence mastery by adding a box to your planning pro forma that says ‘mastery’ and sticking in whatever you’ve decided your more-able students will be doing. If a student gets all of their work correct then smile and say things like “Well done Michelle. You’re doing mastery now.” Using apostrophes correctly is a good example of mastery, I think, but there are others too.

Generate as much data as possible.

Teachers have grown accustomed to spending large portions of their evenings and weekends pointlessly entering data into spreadsheets that nobody looks at. It would be both confusing and dangerously liberating for you to remove this requirement so, whatever you do, make sure that your new policy necessitates a bewilderingly over-burdensome data-entry process. If teachers question the system, respond with a sigh and the timeless cop-out of “I know, but Ofsted require it.” If teachers counter with the new Ofsted guidance, tighten your lips and proclaim that “they don’t really mean that”.

Continue to be terrified by Ofsted.

But, like, really really now.

If a parent asks about assessment, remember the three Cs.

Given the national coverage of the changes to assessment, some parents have unfortunately cottoned on to the fact that nobody knows what the hell we’re doing. They may approach you and say irritating things like “I don’t understand this new assessment system,” or “My child used to be on track but now you’re saying they’re behind,” or “What on earth is any of this supposed to mean? And why do you look so panicked? Hey, where are you going?”

Although such complaints are well-grounded, coherent and reasonable, it’s very important that we maintain the illusion that we know what we’re doing. They must never learn the truth. The government’s guidance on how to respond to a terrorist attack (run, hide, call for help) happily doubles as solid advice for dealing with parents who have questions about assessment.

For the trickier customers, remember the three Cs: counter, confuse, confabulate. Start by explaining with conviction that everything is fine and that your new system is robust and reliable. Then throw every piece of educational jargon at your disposal at them, using at least a dozen acronyms, for example: “The APS for EAL and SEN was never comparable to our FMS, and IEPs further complicated matters”. Nod sagely as you do this. Finally, end by thanking them for their interest and allowing you to clear everything up. Promise to email them some documents, but never do.


And finally, keep your head down and wait for the whole thing to blow over.

Following this guide should protect you from ever being called out for not knowing what you’re doing but, more importantly, it will mean that the DfE never call our bluff and actually let us do anything for ourselves again. At which point we can safely go back to moaning about not being listened to.


Educational effectiveness research: I want to believe.

The following constitutes the introduction to an essay recently submitted for a master’s that I’m currently undertaking in educational research. In particular, I’m interested in the relationship between the body of knowledge known as educational effectiveness research (empirically validated approaches that ‘work’) and practitioners’ use (or not) of this knowledge to drive improvement at the classroom and school level.


The dynamic approach to school improvement (DASI) claims to provide educational researchers with a “theoretical framework for establishing a theory-driven and evidence-based approach to school improvement” (Creemers & Kyriakides, 2012, p.4). The approach draws on the knowledge base provided by Educational Effectiveness Research (EER), and proposes a dynamic model which can be applied by practitioners and policy makers to improve schools. Although great progress has been made within the field over its thirty-year history, there remain several challenges.

One of the more contemporary difficulties rests in assumptions about the aims of education. One of the three main assumptions of the dynamic model is that ‘student outcomes’ are defined more broadly than the achievement of basic skills in core subjects such as language and mathematics. More specifically, ‘whole school curriculum aims (cognitive, psychomotor, metacognitive and affective)’ are specified as the important goals by which we now measure ‘effective schools’.

In their state-of-the-art-review on educational effectiveness research, Reynolds et al. (2014) concluded that “At the level of practice, it would… be difficult to find evidence of substantial take-up of the insights of EER at practitioner level in many countries”. Hallinger and Heck (2011) also lament the disconnect between EER (what we know about what makes schools effective) and school improvement (using this knowledge to make lasting improvements to schools and children’s outcomes).

To illustrate the point, the authors concede that they were unable to answer a school principal when asked

“Given what you know about leadership for learning, where would you advise me to put my effort as a school leader in order to gain the greatest improvement in learning for students at my school?” (Hallinger and Heck, 2011, pp. 1-2).

This is particularly disappointing given that the International Congress for School Effectiveness and Improvement (ICSEI), set up almost 30 years ago in 1988, aims to bridge the gap between researchers and practitioners.

There are many reasons which could explain the fraught relationship between EER and school improvement, several of which are proposed by Reynolds et al. (2014, pp. 217-218): the quantitative orientation of EER; lack of an underlying theory; the static nature of EER analyses; and neglecting core concerns of practitioners. Many of these problems stem from the historic purpose of EER, to refute the charge that “schools make no difference” (Bernstein, 1968). Such an assertion followed from the seminal report Equality of Educational Opportunity in which sociologist James Coleman concluded:

Schools bring little influence to bear on a child’s achievement that is independent of his [sic] background and general social context; and that this very lack of an independent effect means that the inequalities imposed on children by their home, neighbourhood and peer environment are carried along to become the inequalities with which they confront adult life at the end of school. (Coleman et al., 1966, p. 325)

This is not to say that the sub-field of EER has not made positive contributions to the field of education. Perhaps most important has been its refutation of Bernstein’s contention. Reynolds et al. (2012, p. 1) conclude that the “field of EER has had some success in improving the prospects of the world’s children over the last three decades – in combating the pessimistic belief that ‘schools make no difference’.” The authors go on to suggest that EER is even beginning to generate a “reliable knowledge base about ‘what works’ for practitioners to use and develop, and in influencing educational practices and policies positively in many countries”.

Such an assertion is set within a contemporary national context in which there is evidence of a growing appetite for research evidence to inform practice. A prominent example is the Education Endowment Foundation (EEF), an independent charity funded by the government to commission research evaluating the impact of different projects and initiatives in education. In 2013 the EEF was designated a ‘What Works centre’, with the hope that it would provide for state education what the National Institute for Health and Clinical Excellence offers the NHS. In its 2014-2015 annual report (EEF, 2016) the EEF cites a survey conducted by the National Audit Office, which suggests that 64% of school leaders had accessed its research ‘toolkit’, suggesting that practitioners are increasingly seeking ‘evidence based’ approaches to teaching and learning.

Given the above, it seems reasonable to assert that we are currently on the precipice of a genuine awakening within education in which practitioners and researchers work symbiotically to help children across the country receive highly effective teaching. Thus, the broad enquiry that I wish to turn my focus toward during the course of my studies and throughout my thesis project is how, given the many criticisms and difficulties alluded to[1], the knowledge base accumulated from EER can be used to help schools improve the outcomes of pupils.

[1] There are a great many more criticisms, which I do not have space to explore fully here. Chapter 1 of Improving Quality in Education: Dynamic Approaches to School Improvement (Bert Creemers and Leonidas Kyriakides, 2012) gives a fairly comprehensive overview of the more difficult historic and present challenges facing the field.



Reynolds, D., Sammons, P., De Fraine, B., Van Damme, J., Townsend, T., Teddlie, C. & Stringfield, S. (2014) ‘Educational effectiveness research (EER): a state-of-the-art review’, School Effectiveness and School Improvement: An International Journal of Research, Policy and Practice, 25:2, 197–230.

Reynolds, D., Chapman, C., Kelly, A., Muijs, D. & Sammons, P. (2012) ‘Educational effectiveness: the development of the discipline, the critiques, the defence, and the present debate’, Effective Education, 3:2, 1–19.

Hallinger, P. & Heck, R. H. (2011) ‘Exploring the journey of school improvement: classifying and analyzing patterns of change in school improvement processes and learning outcomes’, School Effectiveness and School Improvement, 22:1, 1–27.

Kyriakides, L., Creemers, B. P. M. & Antoniou, P. (2009) ‘The effects of teacher factors on different outcomes: two studies testing the validity of the dynamic model’, Effective Education, 1:1, 12–23.

Creemers, B. P. M. & Kyriakides, L. (2012) Improving Quality in Education: Dynamic Approaches to School Improvement. New York: Routledge.

This year’s primary assessment data will be useless because everyone is going to just game it.

This is not going to be a popular post. It is not an attack on primary teachers, although it might seem that way. It is simply outlining the consequence of a poorly conceived and badly implemented assessment framework and procedure. We had no choice in this being foisted upon us, but we do have a choice in how we respond to it. And I think that for us to stand any chance of being taken seriously as a profession, we should stand together and honestly explain how we are going to ignore the guidance to ensure that as many children as possible make the expected standard in KS1 and KS2.

The new assessment criteria

Levels were abolished last year for a multitude of reasons. They have not been replaced, but an interim assessment framework has been published by the government, which includes a number of ‘can do’ statements for reading, writing, maths and science. For children to be assessed as ‘working at the expected standard’, they must be ticked off against ALL of the statements, with evidence from a ‘broad range’ of work.

Alongside the statements, there is clear guidance on how children should be assessed. Most importantly:

  • Children should only be assessed once they have completed the key stage.
  • The statements should not be used for ‘tracking’ progress part-way through the year.
  • Single pieces of work should not be assessed against the framework.

Failing to learn from past mistakes

There are very good reasons for this accompanying guidance. It guards against what made levels such a poor measure of what a child can do. As I’ve previously mentioned, assessment criteria easily fall victim to Goodhart’s Law: when a measure becomes a target, it ceases to be a good measure.

What does this mean? During a school year (or key stage) an entire curriculum is supposed to be delivered; taught and learnt. At the end of it, we are interested in what children are able to do, so we apply measures to assess them. Clearly you can’t assess everything so you choose a small number of measures (i.e. a sample) which ostensibly allow you to make inferences about the entire curriculum (i.e. domain).

This should be done retrospectively, of course, but since people have the measures beforehand, and since they are under such pressure to deliver top results, they are hugely incentivised to teach to the test, or make the small number of measures targets for children to check off. This can be done quite quickly. If a measure is ‘uses varied and ambitious vocabulary’ then you simply make your learning objective ‘To use varied and ambitious vocabulary’, teach a tight lesson and hey presto, one of the measures can be ticked off.

Clearly this makes the ‘measures’ useless in any meaningful sense, since the children had very little opportunity to be successful. The hard work was done for them.

We can now see more clearly why the three constraints above were put in place. They prevent this gaming by explicitly stating that children should only be assessed at the end of the key stage (year 2 or year 6), that lots of work must be assessed against each standard, and that no tracking (checking which statements have and haven’t been met) should take place during the year.

How we’ll all ignore the guidance and game the data

Already, many teachers have admitted to shirking this guidance, happily falling into the same trap as with levels. In many cases, teachers do not even realise they are gaming their data. They believe that assessing children and then teaching to the gaps (i.e. turning the measures into explicit targets) is good teaching. Furthermore, it is seen as the only way to guarantee that as many children as possible make the expected standard.

So here is what is (or will be) happening. Teachers will take each child and RAG them against each of the statements for reading, writing, maths and science. Any child with red will be prioritised (another flaw carried over from levels) whereas ‘green’ children will be left alone. Lessons will be adjusted to ensure that all children can easily demonstrate the ‘can do’ statements (for example, turning one of the statements into a learning objective). Children will be reassessed, and red will turn to green (this is the very definition of the ‘tracking’ which we aren’t supposed to do because it invalidates the measures).

There will be some poor teachers and schools out there who follow the guidance to the letter, and only sit down to assess their children against the standards at the end of May or beginning of June with a broad range of work for each child. They will find themselves in a pickle because it will take them between 80 and 400 hours to make the sound judgements necessary. Most teachers have realised this and so have started assessing children now (even though the key stage hasn’t been completed, thereby ignoring the guidance). In making the assessments now, they will have another few months to ‘fill the gaps’ or teach to the test, thereby narrowing the curriculum and making the data useless.

Secondary school teachers will get a bunch of kids through their door, almost all ‘at the expected standard’ which tells them nothing more than that a kid copied down some adjectives when they were very explicitly told to.

Suffocating young writers.

This article originally appeared in the Times Educational Supplement, a magnificent publication which you can subscribe to here.


It’s hard to describe to those who don’t work in primary schools the intense satisfaction and pride that travels alongside watching young children transform from novice writers, unable to transcribe a simple sentence, to authors who pen an entire story, woven and plucked from their imaginations. Our role as teachers in this process is critical in nourishing two key aspects of accomplished writing.

The first is the mechanical process, which includes letter formation, correct punctuation and sentence construction. But alongside this, children must acquire a creative flair that allows them to come up with ideas, innovate, organise their thoughts and arrange them on a page for some imagined audience. In short, writing is a tremendously complex and difficult process, consisting of dozens of sub-processes making painful demands on the pitifully limited working-memory that we are all stuck with.

When trying to assist children in balancing all these spinning plates and coming out the other side as independent and confident writers, marking can get in the way. In fact, I’d argue marking can have unintended consequences that end up doing more harm than good.

Since it is so easy for those who do not teach a class full time to forget, it is worth laying out just what a rigorous marking policy can cost in terms of time. An average primary school teacher will deliver at least one maths, one literacy and one topic lesson per day. With a class of 30, that totals 90 books to be marked after the kids go home. If only two minutes is given to read and respond to each book, it will take three hours to clear the pile. Three hours of marking. Every day.
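For the sceptical, the arithmetic above can be checked in a few lines. This is purely an illustrative sketch: the figures (three marked lessons a day, a class of 30, two minutes per book) are the ones quoted in the paragraph, not a claim about any particular school.

```python
# Marking-load arithmetic from the paragraph above (illustrative figures).
LESSONS_PER_DAY = 3      # maths, literacy, topic
PUPILS = 30              # one book per pupil, per lesson
MINUTES_PER_BOOK = 2     # time to read and respond to each book

books_per_day = LESSONS_PER_DAY * PUPILS              # 90 books in the pile
minutes_per_day = books_per_day * MINUTES_PER_BOOK    # 180 minutes
hours_per_day = minutes_per_day / 60                  # 3 hours

print(f"{books_per_day} books = {hours_per_day:.0f} hours of marking, every day")
```

Even at a brisk two minutes per book, the pile swallows three hours of every evening before planning or anything else begins.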

While these overly-burdensome marking policies are written by the senior leadership teams of individual schools, the high stakes nature of Ofsted inspections has caused many, especially those carrying the cross of a ‘requires improvement’ status, to forget exactly who marking is for, and why we do it.

I have found myself guilty of this. After noticing a child wasn’t starting a new line for a new speaker while writing dialogue, I explained this convention and told her to follow it for the rest of her piece. Then, without thought, I scribbled in the margin exactly what I had just told her. “What are you doing?” the child asked, confused.

“Don’t worry, that’s not for you,” I replied.

Well who the hell is it for, then? I later asked myself.

Although there have recently been some very promising reforms and clarifications, spearheaded by National Director Sean Harford, Ofsted must bear some responsibility for the chaos of marking and the potentially damaging effect it is having on students. Schools, desperate to secure a magic ‘good’ or ‘outstanding’ rating from Ofsted, scour recent reports for clues of how to impress their impending visitors. A simple search of the Watchsted website returns comments from recent inspection reports like:

Teachers’ written comments in marking do not always give pupils precise enough guidance on how to improve their work -108800

Pupils do not have a clear understanding of what they have achieved or how they can improve their work because marking is not always thorough – 115606

It is not difficult to see how these comments, both published in the last twelve months, could mould and supercharge a school’s marking policy. If this were only crushing teachers’ work/life balance with unreasonable demands, it would be bad enough, but this baseless ‘advice’ means children aren’t able to stretch their wings as writers and take ownership of their work. They follow a checklist laid out by an expert who does all of the hard work for them. Writing is reduced to a dull, generic formula of fronted adverbials and developed noun phrases.

When caught in the maelstrom of life in the classroom, with duties freewheeling down a never-ending to-do list, it is so easy for us to lose sight of the bigger picture; the end goal. In short, we want our students to become proficient and motivated writers. The work of Professor Robert Bjork shows how mechanistic feedback from teachers can get in the way of the former.

He has demonstrated that ‘desirable difficulties’ – i.e. not spoon-feeding the student – improve students’ retrieval of important information in the longer term. After all, why would a child remember to start their sentences with a capital letter if they know that you’ll circle them all for them later that evening?

“They are desirable because they enhance long-term retention and transfer,” he explains, “they’re difficulties because they pose challenges. They slow down the rate that your own performance is improving as a student. And the consequence of that is that they can be easily unappreciated.”

But what of motivation? Surely marking children’s work at least ensures that they are driven to write?

It seems plausible enough, but rests on a mistaken belief of what motivates us. By rewarding every piece of work with a gold star or written praise, we are laying extrinsic motivators on top of an activity that should be inherently desirable. In Punished by Rewards, Alfie Kohn makes the point more clearly, summarising one of the most robust findings from social psychology: “the more you reward someone for doing something, the less interest that person will tend to have in whatever he or she was rewarded to do”.

A more natural way to motivate children to write is to provide an authentic purpose for doing so. So, asking children to write letters to public figures or organisations (and sending them), publishing articles for a school newspaper, or creating recipes to feature in a cookbook that is sent home to parents will provide a much greater drive than writing ‘so I can improve my level’ (a genuine response from a child when I asked why we were practising writing letters).

Perhaps one of the most promising uses of technology over the last five years in primary school is the Pobble website (previously Lend Me Your Literacy), which allows pupils to publish their work and have it read and commented on by teachers and students across the globe. Forgetting capital letters, then, becomes something that a student is compelled to check not because their teacher scribed it as a ‘wish’, or because it features on their target sheet in the front of their book, but because it’s embarrassing to miss them when you have a real audience.

If we abandon the received wisdom that every piece of writing must be post-scripted with obligatory Praise and Next Steps which require no reflection or thought, then we might just start to see children emerge as real writers, unafraid to make messy drafts but meticulous in their final versions. They may well actively seek feedback and be compelled to implement it, rather than have it enforced upon them. They may well begin to flourish and light a fire within themselves that we couldn’t put out if we tried.

Do primary schools need research?

This is a transcript of my opening address during a panel discussion at ResearchEd 2016.

Primary school teachers have a lot to be proud of. Last year, the outgoing HMCI Sir Michael Wilshaw praised primaries in England for their huge improvement and went so far as to say that “the rigour …at primary stage is not often developed sufficiently at secondary”.

This is not to ignore any problems in the system, though. Around one in five children fail to reach what used to be level 4 in reading, writing and maths, and though we can argue about whether this constitutes functional literacy, it is unlikely to set children up for success at secondary and beyond. It will be no surprise to anyone here that children from the most disadvantaged backgrounds are least likely to leave primary school with expected levels of attainment.

We know that this inequality takes hold in primary, and that by the time our children leave us in year six the die is more or less cast.

So although, in primary, we have much to be proud of, there is still a lot of work to do to improve our schools and strive towards the goal of all children leaving compulsory education with the knowledge and skills that allow them to lead lives of choice and opportunity.

Whilst politicians tinker with structural reform, we all know that, ultimately, achievement is dependent on the teacher standing at the front of the classroom. As Michael Fullan, Professor Emeritus at the Ontario Institute for Studies in Education, puts it: “School Improvement and Pupil Improvement depend on what teachers do and think. It is as simple and as complex as that.”

What can research offer the primary teacher? With Dylan Wiliam stating at this conference that the problem with educational research is that ‘when the teacher goes to the research cupboard, they find it bare’, you’d be forgiven for treating any talk of evidence-based education with a healthy dollop of cynicism. After all, as Wiliam says, ‘everything works somewhere, nothing works everywhere’.

So I’m not arguing that educational research offers a silver bullet for ending educational inequality, nor a magic cure for poor teaching practice where it exists. Professional judgement, experience and the well-honed craft of excellent teaching will always have a critical role in the classroom.

Having said this, I think research, from many different fields, can help teachers and leaders in primary schools in three key ways:

Accelerate development of new teachers

Many trainee teachers experience a sort of sink-or-swim approach in their first year. So much is dependent on your mentor, as you’re likely to learn much of your craft from them. If they are too busy or hostile, no real development can take place. Alternatively, your mentor could be committed to faulty ideas which don’t promote good student outcomes, and bad habits can be learnt.

Some think that there is no quick way to learn how to be a great teacher: you must weather slings and arrows for years before eventually possessing a bank of ‘how best to teach this to them’.

This is troubling for many reasons. “I’ve had 20 years’ experience” doesn’t necessarily trump a newbie when it comes to planning a lesson. After all, it’s possible that rather than having twenty years’ experience, you actually have one year’s experience, twenty times.

How people (including children) learn

Whilst teacher training is still dominated by Bloom’s taxonomy, Vygotsky’s ZPDs and Piaget’s…whatever Piaget talked about, there has been a boom in the field of cognitive psychology, the science of how we think and learn, and how this might be applied to education. Dan Willingham tells us that students are far more alike than different in how they learn, and so broad principles of how best to teach can be derived from this research. Bjork’s notion of desirable difficulties and the idea of embedding retrieval practice into lessons are both examples of findings from research that can help us plan and teach every lesson, every day. Many of these principles are helpfully summarised in the Deans for Impact ‘Science of Learning’ report.

Best bets

One of the difficulties with having debates around education is the annoying fact that basically everything works. This means that you can feel like there is no need for you to be preached to by some academic or nosey management figure, because you know that the children are learning! In fact, we are constantly making choices about what and how we teach, and it is very possible that although you’ve made a good choice, there is a better choice out there that you could have made instead. Although a blunt instrument, educational research, especially the EEF Toolkit and reports like the Sutton Trust’s ‘What Makes Great Teaching?’, provides us with best bets: on balance, how should I approach x or y?

For these three reasons, thoughtful engagement with academic research in primary should be welcomed.

Out of the Ashes: were levels in primary really that bad?


First, a little about me. My name is Jon Brunskill and I started teaching in 2013, on the Teach First programme. I began teaching in year 3, during the final statutory year of assessing using ‘levels’; but then I moved to year 6, during which time levels were allowed a stay of execution. This year I’ve been wrestling with the interim assessment framework as the head of year 2 at an all-through school in London. I’m also currently a member of Dame Reena Keeble’s review of effective practice in primary teaching, commissioned by the DfE and led by the Teaching Schools Alliance.

So I trained in the period when levels became no more, which in retrospect was an interesting formative experience. Back then, I remember being confused, and my tutors and mentors only exacerbated this feeling. Whenever I asked a question about assessment they shuffled uncomfortably and replied that ‘we just don’t know’.

So I was being trained to use a system of assessment that was on the conveyor belt to the scrap heap, but nobody was quite able to articulate to me just why it was being abandoned. After reading more about the nonsense of, say, levelling a single piece of work, it felt disingenuous to then sit down and do exactly that in order to fulfil my data entry obligations. This only got worse the following year, last year, when I taught year six and we had to do the same thing again. Everyone else in the school had jumped ship, and we were left undertaking what we knew to be poor practice.

Since then, there has been much written and said about the flaws of national curriculum levels. These problems have been shown to be both inherent within their design, and (what was undoubtedly much more damaging) the manner in which they were implemented in schools over their lifetime. You’ve heard today from Dame Alison Peacock, and others, compelling reasons to move beyond levels and consign them to the dustbin of history.

With this being said, you may be wondering why the title of this talk seems to be an apology for levels. And I’d like to make clear from the start that there were serious, perhaps even fatal, problems with levels and the way that they were implemented. But I’ve been struck by a sort of smug clambering onto the ‘Of course I never liked levels anyway’ bandwagon. It has become heresy to admit during staff meetings (or certainly on social media) that you actually quite like the APP grids and find them useful.

So what I think is necessary is a closer examination of exactly what went wrong with levels, and why, and whether there is any aspect that can be salvaged from the wreck. Were they really that bad? I’m not so sure, and I don’t think that I’m alone.

Show Slide Two

This is a poll conducted by Michael Tidd, a superb commentator on primary education and assessment. I like Michael because he’s a practitioner, and what comes along with that is a sort of grounded, practical approach to these problems. It is absolutely right that independent experts weigh in on the systems that we use in education to assess students, but I fear that along with this comes a sort of unrealistic, unattainable Platonic form of assessment. It is necessary for school leaders and teachers to be comfortable with a certain level of ‘this is the best that we’ve got to work with so far’ mentality. Education is messy, and we are working with children whose learning we know does not conform to the neat trajectories that would make our lives so much easier. A dollop of cynicism and a reflective attitude are healthy and productive in schools, but too much can be paralysing and ultimately damaging.

Anyway, when Michael polled his followers (and caveats here: this is a small, unconfirmed sample, which is almost certainly not representative of the profession in any demographic that we would be interested in), all of that notwithstanding, Michael found that out of over 1,600 respondents in primary, over 70% would prefer to return to levels now if given the chance. So what conclusions can we draw from that? Do all of these teachers just not understand assessment? Are they the victims of Stockholm syndrome? Are they so unimpressed with the alternative that they’d prefer to stick with the devil that they know? What are your thoughts? Why do so many teachers reject the impassioned and well-reasoned argument of this conference that we should discard levels for good?

Give delegates 30 seconds to discuss, then share responses.

It is my contention here today that these respondents are not idiots. It is my contention that they understand the flaws of levels, and that they do not take them to be a perfect system. I’d like to argue that the level system has much to offer and, whilst it was warped and bastardised and taken hostage by a high-stakes monitoring mechanism (thereby nullifying its use), it is still a very useful tool in supporting professional dialogue and understanding what children can do, where this broadly places them on a progressive spectrum of learning, and what they should master next to improve their knowledge and skills further.

Show Slide Three

In order to do this, I’d like to spend this session doing five things. I’m going to rush through these five areas because I’m acutely aware that I may well be monumentally wrong here, and so I’m interested in your thoughts on what I propose. Having heard the case for the defence, I think there will be real value in tossing around the issues as a group of informed professionals (incidentally, this is what I think is powerful about the levels progress grids).

So first, I’d like to give a very brief history of levels, including their inception.

Secondly, I’d like to review what went wrong, from both an academic and a practitioner point of view. And I want to make it clear right now – I do think that they went wrong. This is not an unconditional apology for levels. I’m merely trying to separate the baby from the bath water.

So after reviewing what went wrong (and I’ll make clear that it was teachers and leaders, under pressure from the inspectorate, who ruined a useful system of assessment), I’d like to undertake a short assessment exercise with you.

After this, we’ll take a quick look at what has replaced levels as the statutory assessment system – focusing in on key stage one (my key stage).

This will lead on to ‘other approaches’ to levels, and I’ll argue why I don’t find them convincing and don’t consider them to be a particularly useful replacement for the admittedly flawed boat that we all jumped from so happily.

To be clear, my argument is that we have jumped out of the frying pan and into the fire with regards to primary assessment. I’d like to consider today that it may be fruitful to revisit what was compelling and helpful about the levels approach, and to try and agree on aspects that we might salvage from the wreckage. So we will end with an open discussion on assessment in which I’ll be interested to hear all of your views and will attempt to facilitate a discussion on how we best move forward.

Show Slide Four

1987 was the dawn of perhaps the most significant age of educational reform in this country’s history. Key stages and the National Curriculum were introduced. Alongside this new national curriculum, an assessment system was devised by Paul Black, influenced by Carol Dweck’s early work on mindset. Previously, children moved through school always at the same ‘grade’: in year 1 they were grade D, and in year 5 they were still grade D. As far as parents and children were concerned, they had made no progress in their learning, despite the fact that they would clearly have been able to do much more in year 5 compared to year 1. So the notion of levels, a systematic sequence of key achievements showing incremental improvement, allowed both parents and children to see that they were ‘getting better’: they were making progress, they were improving in what they knew and could do.

Dylan Wiliam, who sat on the Expert Panel that advised the abandonment of levels, was unequivocal in his defence of levels in their original formulation, stating:

“Let me be clear. I was a huge fan of the system of 10 (later eight) levels that Paul Black’s Task Group on Assessment and Testing recommended to Kenneth Baker (then Secretary of State for Education) in December 1987, not least because it was based on the work that Margaret Brown and I had done on levels of achievement in graded assessment schemes, and Carol Dweck’s early work on mindset.”

So it’s clear. This was not a system that was flawed from the start. Something happened. Something went wrong. Our question now, is what?


Teachers. That’s the short answer. Teachers are what went wrong. If you were being a little more sympathetic, you might modify that to teachers under the direction of senior leaders, and a still more charitable reading would be teachers under the direction of senior management facing implicit instructions from a powerful inspectorate. But that is another story. Let’s turn now to exactly how we bastardised the system of assessment of which Dylan Wiliam was a ‘huge fan’.

Tim Oates, Group director of Assessment, Research and Development at Cambridge Assessment, and Chair of the Expert Panel which informed the review of national curriculum from 2010 to 2013, sheds some light on just what went wrong.

Show Slide Five

There were three key problems:

First, children began to self-label: they were referring to themselves (and each other) as ‘level three’ in a pejorative manner.

Second was undue pace. We were incentivised to constantly push children on to the next level descriptor, instead of ensuring that children had a deep understanding and had mastered the basics, especially those key, core foundational ideas. It was go go go, progress progress progress.

Third was the invalidity of levels: a test score? A ‘just in’ judgment? A ‘best fit’ judgment? These were not comparable.

What did this look like from a practitioner’s perspective?

Show Slide Six

We took a system that was originally supposed to be used to help teacher judgments of overall performance at ages 7 and 11, and broke it down into a system that would be used to judge children at the end of each year. Since only two levels of progress were expected over four years, it was difficult for teachers to ‘demonstrate progress’, or give evidence that the children were doing better than when they arrived. It would be perfectly possible, expected in fact, that a child would enter year three at level 2 and leave with the same number, despite having made substantial progress.

Show Slide Seven and Eight

So we broke them down into sub-levels. Now a child could enter at 2c, and leave at 2a – progress demonstrated. And to show that we were committed to rapid and sustained progress, we assessed children with these sublevels not at the end of each key stage, as intended, not at the end of each year, not even at the end of each term, but EVERY half term.

Show Slide Nine

Worse still, because we were so determined to ensure children ‘made progress’, we started taking the measures that were supposed to be used to retrospectively judge performance, and turned them into targets for children. So the level descriptors became learning objectives and success criteria. And as we all know from Charles Goodhart, ‘When a measure becomes a target, it ceases to be a good measure.’

So here is the crux: all of these problems are associated with a formative framework for assessment being hijacked as a summative measure and used for monitoring purposes. If you get rid of the numbers – and I never told children what their number was; all they needed to know was what they needed to do next to improve further – then all of a sudden this looks very reasonable.

Show Slide Ten

Because there is better and worse writing. Caught up in the smug abandonment of levels there is almost a notion that there is no such thing as a fairly linear progression in improving writing skills. There is, and although vague level descriptors aren’t very helpful if you are making a high-stakes, fine-grained judgement, they work just fine for day-to-day teaching and as a common language between professionals. Again, it is not the levels that were the problem, but the way that we used them.

So let’s take a test case.

Show Slide Eleven

Here are two children, child A and child B. Which writing is better?

Clearly child A. We’re actually really good at comparing children’s writing like this, and very quickly making a judgment on which is better. There’s a session next door on comparative judgment, which is a system of assessment that I’m looking forward to following closely. But we don’t just want to know which writing is better (although we do want to know that); it’s necessary but not sufficient. Because what happens when child B says to you, ‘I want to write as well as child A’? What do you say to her?

Take a few minutes and make a list of what feedback you would give to this child, based on what they have currently produced.

Make a list on flipchart.

What is this starting to look like? An APP grid. We could introduce a better piece of writing and do the same thing again, and before we knew it we’d have a rough progression of descriptors. Of course there is better and worse writing, and we’re broadly able to explain what makes writing better or worse, and the sorts of things that children should do to improve their writing.

Especially as a new teacher, I found the APP grids useful as a document that gave me a brief overview of the journey that young writers go through, from letter formation to fluency.

Here’s a confession: I actually still find an APP grid a useful thing to have on the desk when assessing children’s writing at the end of a term. I think it frames and focuses professional discussion. If you take the numbers and letters away, and don’t use it to try and report to managers and government, then this is a good, useful document and framework to help all children get better at writing.

And perhaps this is just a primary thing. Perhaps it works only because the skills are quite basic, the next steps relatively clear. I understand that this may be less helpful as skills and next steps become more complex and nuanced. I’d be interested in any secondary colleagues’ thoughts on this.

The problem was never levels. It was our conflation of formative and summative systems. We need to properly separate formative and summative assessment. It’s madness to expect teachers to mark their own homework, whilst putting huge pressure on schools to deliver better and better results, and not expect those numbers to get gamed and become empty and worthless. Teacher assessment is useful in showing the teacher what a child can do, and what they may need to do next, but hopeless as a monitoring tool from up above: it immediately loses all value, violating Goodhart’s law. But, and I stress this, an APP grid, and the spirit of levels – of a clear progression that all children can move through – was a useful internal classroom structure. It becomes both meaningless and damaging if you try to also use it as a monitoring tool.

But there should be summative assessment systems used to monitor schools from above. To suggest anything else would be childish. The government absolutely has a responsibility to gain a comprehensive picture of how schools and children are doing, challenging poor performance and gaps in attainment, especially the seemingly intractable gap in outcomes between the rich and the poor.

Of course, high-stakes testing drove a lot of the mistakes that primary schools made with assessment. So I’d like to end by arguing that we should actually summatively assess children more frequently – and Tim Oates may agree here; he says ‘there is too little assessment (of the right kind)’. I think that if you summatively assessed children through an annual standardised test, then teachers’ workload would be dramatically reduced, we would gain a more accurate picture of children’s learning and progression, and, perhaps most importantly, the stakes would be organically lowered. No longer would schools be getting one throw of the dice at the end of year six. This would leave teachers free to focus on what children can do currently, and what they need to do next to improve.