Friday, 31 July 2015

Assessment Commission report: my top 5 points

The cat is out of the bag. This week the frustratingly overdue report from the Assessment Commission into assessment without levels was leaked via Warwick Mansell and it nearly broke Twitter, such was the excitement over its contents. Keen to head the charge, I skimmed through it and tweeted some key points. But Harry Fletcher-Wood was already on the case. He carried out a more thorough dissection and hit us with a barrage of tweets followed by a ridiculously quickly written and excellent summary. He's already covered all the key aspects, which just leaves me to count down my top 5 points from the report and enjoy the warm glow of satisfaction derived from the knowledge that many of its recommendations match what I've been tweeting, blogging and banging on about for the past year.

So, here are my top 5:

5) "Levels also used a ‘best fit’ model, which meant that a pupil could have serious gaps in their knowledge and understanding, but still be placed within the level." (p8)

Yet many if not most schools are implementing systems that place pupils into best-fit bands, which have little to do with teaching and learning and everything to do with accountability. Yeah, I'm looking at you, Emerging, Developing, Secure. It's time to take an honest, objective look at these systems and ask the question: "Is this really assessment without levels?"

4) "The word mastery is increasingly appearing in assessment systems and in discussions about assessment. Unfortunately, it is used in a number of different ways and there is a risk of confusion if it is not clear which meaning is intended." (p11).

Call me old-fashioned, but I reckon it probably is best to work out what mastery means before we attempt to assess it.

3) "Progress became synonymous with moving on to the next level, but progress can involve developing deeper or wider understanding, not just moving on to work of greater difficulty. Sometimes progress is simply about consolidation." (p7).

Just that: sometimes progress is simply about consolidation. Progress is neither a race nor is it linear, and we need to stop devising systems that treat it as such. 

2) "The starting point of any assessment policy should be the school’s principles of assessment." (p20)

It does not start with the tracking system!

1) "More frequent collection of assessment data may not only be a waste of time, but could actually be damaging if actions are taken based on spurious or unreliable interpretations. It could also encourage a rapid-but-superficial approach to learning." (p26).

Yes! We need assessment for learning, not assessment of learning. If we adopt systems of assessment that involve the collection of data every few weeks we'll continue to repeat the mistakes of the past whereby a) teachers may be tempted to fabricate data in order to 'prove' progress, and b) pupils may be pushed on before consolidating their knowledge. Ultimately no one wins. Maybe, just maybe, progress measures themselves are at the heart of the problem.

So, that's the key points I've taken from the report. I really recommend you read it, digest it, and look at your own systems through the prism of its guidance. Hopefully by this time next year we'll actually start assessing without levels.

Happy holidays!

Friday, 24 July 2015

The Progress Paradox

There is a radical concept in urban design known as shared space. It involves the removal of kerbs, street furniture, and painted lines in order to blur the boundaries between traffic and pedestrians. The idea is that if you merge the various zones of use in the urban environment - pavements, cycle lanes and roads - people become more aware of other users and more conscientious towards their fellow citizens as a result. And it works! Removing all the features that are designed to keep us safe actually makes us safer.

I promise there is a point to this and I'll get back to it later.

I have blogged before about the highly dubious and misguided approaches we take to measuring progress. That we seek to distill learning down to a baseless numerical value not for the benefit of teaching and learning - for teachers and pupils - but for the purposes of accountability and performance management. Levels - perhaps once fit for purpose - were hacked up into an ill-defined system of sublevels and points, and bundled into neat packages of 'expected' progress in order to quantify learning and satisfy the demands of external agencies.

Points became the currency of scrutiny.

And so, these measures are now part of the common language of assessment and so integral to the daily running of a school that it is hard to imagine a world without them. They have come to define the contours of learning. It is perhaps inevitable that when levels were removed we set about recreating them. We needed the numbers to 'prove' the progress even though we knew deep down that the numbers meant nothing. The cage was opened but we quietly closed the door and stayed put.

But we have to measure progress, right? Surely we need to quantify it in some way?

Don't we?

One of the key reasons for the removal of levels was that they often caused pupils to be rushed through content before they were ready. Pupils that were deemed to be 'broadly level 4' therefore reached the end of the key stage with significant gaps in their learning.

But if that was a key issue with levels, isn't it a problem with any progress measure? If we are driven by steps, bands and points then isn't there a big temptation to tick the box and move the pupil on? Aren't we just chasing meaningless numbers? Has anything really changed?

This brings me back to the concept of shared space. Perhaps if we remove all the points and expected rates of progress - the street furniture of assessment - we would concentrate more on the learning; on identifying pupils' weaknesses and addressing the gaps. Assessment would then be returned to its proper state: about what is right for the child, not what is right for the bottom line; and ultimately both the child and the school would benefit.

So, maybe progress measures are a distraction and if we concentrate on embedding learning - on consolidation, cognition, gaps, and next steps - then the progress will take care of itself. Perhaps, ironically, pupils would make better progress in a world without progress measures, where teachers are not chained to expected rates linked to linear scales that tempt them to push pupils on before they are ready. We must avoid repeating past mistakes, shoehorning pupils into 'best-fit' bands and expecting uniform progression through the curriculum. Instead let's focus on the detail - track the objectives that the pupil has achieved and assess their depth of understanding. The progress will be evident in pupils' work and we don't need arbitrary numbers to tell us that.

Essentially it all comes down to one irrefutable truth:

If you teach them, they will learn.

Thursday, 16 July 2015

The Gift

We're all knackered. You've all been teaching forever and I've visited approximately 1000 schools a week since I became self-employed last November. What I want to do right now is talk to my family, watch The Big Bang Theory, drink some beer and then sod off to France in a couple of weeks and go climbing. The last thing I wanted to do this evening was write a blog.

But then the DfE published this research into the reception baseline.

I skipped the first document (55 pages), speed-read the next one, and wasn't going to bother with the third. It basically sounded like one of those police officers at the scene of an accident: "Nothing to see here. Move along." But I thought I should make the effort. It's only 12 pages long after all.

And I'm very glad I did. In amongst the flannel and whitewash was this:

The research noted the difference between the scores of the two groups - the teaching & learning group and the accountability group - with the latter having lower scores, suggesting that perhaps when tests are administered for purposes of establishing a baseline for measuring progress (i.e for accountability reasons) lower scores are given.

Then they appear to have let their guard down.

Read paragraph 3 in the screenshot above:

"The overall result would be statistically significant at the 95% level if the data were from an independent random sample."

Hang on! What?

Is the data significant? Or isn't it?

It would appear that the use of a 95% confidence interval is not appropriate in this case because the data is not from an independent random sample. So the result would be significant at the 95% level, but that test is not used because of the nature of the sample. Quite rightly, they employ a more appropriate test.

But significance tests in RAISE are carried out using a 95% confidence interval. Either this means that cohorts of pupils are independent random samples or the wrong test is used in RAISE.
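To make the point concrete, here is a minimal sketch of the kind of two-sided test that a 95% confidence interval implies. The exact calculation used in RAISE isn't reproduced here, and all the numbers below are invented for illustration; the key line is the docstring, because the whole procedure rests on the assumption that the cohort is an independent random sample from the national population.

```python
import math

def z_test_95(cohort_mean, national_mean, national_sd, n):
    """Two-sided z-test at the 95% level.

    Valid ONLY if the n pupils can be treated as an independent
    random sample from the national population -- the very
    assumption being questioned here.
    """
    standard_error = national_sd / math.sqrt(n)
    z = (cohort_mean - national_mean) / standard_error
    # 1.96 is the two-sided 95% critical value of the normal distribution
    return z, abs(z) > 1.96

# Hypothetical cohort of 30 pupils; illustrative numbers only
z, significant = z_test_95(cohort_mean=29.5, national_mean=28.0,
                           national_sd=4.0, n=30)
```

With these made-up figures the cohort would be flagged as significantly above the national mean, yet if the 30 pupils are not an independent random sample the 1.96 threshold has no statistical justification, which is precisely the DfE's own caveat.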

This is something that Jack Marwood, myself and others have been trying to get across for a while - that there isn't a cohort of pupils in England (or maybe anywhere for that matter) that can be considered to be an independent random sample.

Not one.

So if the DfE decides to use a different test for significance in this research on the grounds that the samples are not independent and random, then shouldn't they do the same in RAISE?

Until cohorts of children are truly independent, random samples, does this mean we can discount every blue (and green) box in our RAISE reports?

Well, perhaps not - that would be rather foolhardy. In an email exchange with Dave Thomson of FFT today, he stated that the tests used in RAISE are useful in that they indicate where there is a large deviation from the national mean and significant data should be treated as the starting point for a conversation. He did then point out that no cause can be inferred; that statistical significance is not evidence of 'school effects' and that it should not be treated as a judgement.

So, there is some disagreement over the significance of the sentence (pun intended), but I'm still left wondering why a test that is not appropriate here is deemed appropriate for other data that is neither random nor independent.

That sentence may not change everything as I rather excitedly claimed last night, but it does pose big questions about the validity of the tests used in RAISE. This reads like an admission that statistical significance tests applied to groups of pupils are flawed and should be treated with extreme caution. Considering how much faith and importance is invested in the results of these tests by those that use them to judge school performance, perhaps we need to have a proper conversation about their use and appropriateness. It is certainly imperative that users understand the limitations of these data.

So, thank you DfE, in one sentence you've helped vindicate my concerns about the application of statistical significance tests in RAISEonline. An unexpected end of year gift. 

Have a great summer!