Wednesday, 20 July 2016

Content of the new RAISE report: The Good, the Bad and the Ugly

The DfE have now released details of the content and format of the new RAISE summary reports, to be published in the Autumn term. As expected, they are going to look considerably different to previous versions; and it's not just the obvious stuff that's changed. Of course, there will no longer be pages filled with percentages achieving level 4 and level 5, and progress matrices are consigned to history, but there are also big changes in the way data is put into context, most notably with new national comparators and prior attainment groups for both KS1 and KS2. The new KS2 progress measure uses a similar methodology to existing VA but scores will be in a new format; and colour coding - perhaps the thing we are most fixated on - has changed and now comes with a subtle twist. Some of these changes I like, some I'm not so sure about and some really bother me. Here's a rundown of the good, the bad and the ugly.

The Good

There appears to be a big change to the way performance of groups is compared. Previously, this has been a mess, with results of key groups compared against national 'averages' for all pupils, for the same group, or for the 'opposite' group with no indication as to which is most useful or relevant. The analysis of the performance of disadvantaged pupils was particularly confused, with comparisons made against national comparators for disadvantaged pupils and all pupils in some parts of the report but, most critically, against non-disadvantaged pupils in the closing the gap section; a section that was - somewhat bizarrely considering its importance - tacked on the end of the report. This mess seems to have been addressed in the new reports so we should now be more aware of the relevant benchmarks for each group. For example: boys, girls, no SEN, non-disadvantaged and EAL groups will be compared against the same group nationally; disadvantaged, FSM and CLA pupils will be compared against the national figures for non-disadvantaged pupils (as per closing the gap); and the SEN support and SEN with statement or EHC plan groups will be compared against national figures for all pupils. OK, I don't quite get the rationale behind the last bit. Either SEN pupils should be compared against SEN pupils nationally, or not compared against anything. At least in the interactive reports users will be able to switch the comparator to 'same' for all groups, allowing a like-for-like comparison.

Prior attainment
Big changes here. One of my main criticisms of RAISE is the lack of progress data for KS1. Previously, schools were judged on attainment alone. The new RAISE reports, whilst not providing VA scores for KS1 (one of the many things I like about FFT dashboards), will put the KS1 results into context by splitting cohorts into prior attainment groups based on EYFS outcomes. So, the KS1 results for those pupils that achieved a good level of development in the foundation stage will be shown, presumably alongside a national comparator. There will also be a further breakdown of KS1 results for those pupils that were emerging, at expected or exceeding in the early learning goals for reading, writing, and maths. I know there are many who will disagree with this approach but too many schools are forced into producing this data themselves when KS1 results are low. The lack of KS1 progress data in RAISE has also resulted in a major dilemma for primary schools: do they go high to ensure 'good' KS1 results, or go low and gain more progress at KS2? Hopefully, with KS1 results now placed in the context of prior attainment, this pressure will ease somewhat.

We will also see new subject specific prior attainment groups for progress measures at KS2. For example, the progress in maths will be shown for pupils with low prior attainment in reading, or middle prior attainment in writing. I assume the definitions of these bands are simply low = W or L1, middle = L2, high = L3, which differs from the point thresholds used for overall prior attainment groups based on APS at KS1. This new approach is welcome as it goes a long way to addressing concerns about the new VA measure outlined here. Whilst the main VA model is based on KS1 APS and will therefore result in pupils with contrasting prior attainment in English and maths being grouped together, these new prior attainment groups will allow us to unpick the progress data and isolate the issues.

Subject data on a page
All information about each subject (i.e. progress score, average score, %EXS+, % high score, for the cohort and key groups) will be shown in one table, which is great news, because up until now it's been all over the place. Previously, we've had to scroll up and down through about 30 pages of attainment and progress data to get the headlines for KS2, forcing us to create our own templates to compile the key figures. Hopefully now we'll just need to refer to a handful of key pages, which will be very welcome.

A shorter report?
Reading between the lines here, I'm hoping we'll have a slimmed down RAISE report this autumn. 60 pages was too much. How about 20 pages? How about just 10 and ditch the dashboard? Please let that happen.

The Bad

Comparing results of SEN pupils against those of all pupils nationally is certainly not great. That should be changed to a like-for-like comparison by default, rather than the onus being on the school to do this via the interactive reports in RAISEonline. Also, writing VA continues to worry me, but that doesn't look like it'll be changing anytime soon. I look forward to seeing the methodology but I'd rather it was ditched from progress measures. My other bugbear is percentages for small groups and it looks like that farce is set to continue. I don't think there are many primary schools where percentages make any sense when you drill down to group level, even when the gap from national 'average' is expressed as a number of pupils. I would prefer analysis of group data to focus on average scores, but even that is flawed in that it can be easily skewed by anomalies. The presentation of data for small cohorts and groups needs some serious thought.

The Ugly

Sometimes we should be careful what we wish for. I have major concerns with the application and interpretation of significance indicators in RAISE and have called for a more nuanced approach. And now we've got one. The first thing to note is that red replaces blue as the colour of 'bad'. Many evidently aren't happy about this but the writing was on the wall once red dots were used in the inspection dashboard and closing the gap section of the RAISE report. Red is also used to indicate data that is significantly below average in FFT reports. The second thing to note is that the double meaning of the colour coding, introduced in the inspection dashboard, continues. Red can either mean data that is significantly below average or signify a gap from national average that equates to one or more pupils in percentage terms. The third thing to note is that we now have shades of red and green defined as follows:

Pale red: indicates that data is significantly below the national average but not in the bottom 10%; or denotes a negative percentage point difference equivalent to a 'small' number of pupils.

Bright red: indicates that data is significantly below the national average and in the bottom 10%; or denotes a negative percentage point difference equivalent to a 'larger' number of pupils.

Pale green: indicates that data is significantly above the national average but not in the top 10%; or denotes a positive percentage point difference equivalent to a 'small' number of pupils.

Bright green: indicates that data is significantly above the national average and in the top 10%; or denotes a positive percentage point difference equivalent to a 'larger' number of pupils.
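For what it's worth, the shade rules for the significance-tested measures can be sketched as a simple function (this is just my reading of the definitions above; the percentage point colouring used for gaps applies the same shades via its own arbitrary pupil-number thresholds, which is exactly the problem):

```python
def raise_shade(sig_below, bottom_decile, sig_above, top_decile):
    """Sketch of the RAISE colour coding for significance-tested measures.

    sig_below / sig_above: outcome of the significance test against the
    national average. bottom_decile / top_decile: whether the school
    falls in the bottom or top 10% nationally.
    Returns None where no shading applies.
    """
    if sig_below:
        return "bright red" if bottom_decile else "pale red"
    if sig_above:
        return "bright green" if top_decile else "pale green"
    return None
```

Note that the bright/pale split hinges on the decile check, not on any further significance test - which is the blurring discussed below.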

There is some serious blurring of the rules going on here. A significance test is a threshold and a school's results are either significant or they're not. Yet this approach will no doubt result in language such as 'very significant' and 'quite significant' entering the 'school improvement' lexicon, despite the bright red or bright green boxes actually being defined by a decile threshold rather than being the result of an additional significance test (e.g. a 99% confidence interval).  It's bad enough that people might talk in terms of degrees of significance; it's even worse that people will apply the term to data on which no significance test has been applied. Inevitably, we will hear disadvantaged gaps being described as 'very' or 'quite' significant because they are assigned a bright or pale red or green box, which in other parts of the report indicates statistical significance. Here, however, they relate to a gap equivalent to a certain number of pupils, and the thresholds used are entirely arbitrary; they are not the result of a statistical test. So we have colours meaning different things in different sections of the report - some denoting significance and others not - and shades of those colours defined by arbitrary thresholds. There is too much scope for confusion and misinterpretation, and schools will be forced to waste precious time compiling evidence to counter a narrative based on dodgy data.

No change there then.

Thursday, 7 July 2016

The progress measures you shouldn't attempt (but you're going to anyway)

The SATS results are out and many schools are feeling utterly dejected. Our tracking systems told us everything would be OK, and we confidently gave governors, staff, LAs and others predictions of results that would at least be above floor. But then Tuesday happened, and not even the country as a whole came close to the floor standard. Almost half of pupils nationally did not meet the expected standard and schools are left reeling. Almost as soon as the scores were downloaded from NCA Tools, teachers were on Twitter discussing progress measures:

1) How will progress be measured this year?

2) What will the floor standards for progress be?

3) How can we measure progress now?

The answer to the first question is: it will be a VA measure calculated like this - pretty much identical (in concept) to VA calculations of previous years.

The answer to the second question is: we don't know. They will certainly be negative; and now it seems they'll be very negative if the DfE really only wants a one percentage point increase in the number of schools below floor.

The answer to the third question is: we can't. We have to wait until the autumn term for the VA scores to come out in RAISE. But that's not going to stop schools and others (I'm looking at you LAs and Academy chains) from having a go now. 

So, here are the three things that schools are going to attempt in advance of the real data being published. Three things that are pointless, that will most likely be at odds with the official VA data, and that are likely to cause further confusion and pain down the line - but that schools will do anyway.

1) Attempt to calculate VA

Surely we all understand how VA works by now. It involves comparing each pupil's KS2 score against the national average score for similar pupils (similar in terms of prior attainment, based on KS1 APS). So, if you want to have a decent stab at it, you will first need to know the national average scaled score for each of these start points:

3, 4.5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10, 10.5, 11, 11.5, 12, 12.5, 13, 13.5, 14, 14.5, 15, 15.5, 16, 16.5, 17, 17.5, 18, 18.5, 19, 19.5, 20, 20.5, 21

According to my calculations, using the new KS1 APS formula, those are the discrete APS outcomes derived from all possible combinations of W, L1, 2c, 2b, 2a, L3 for reading, writing and maths at KS1. We need to know the national average KS2 scaled score outcomes for each of those 34 prior attainment groups (more if we throw L4 into the KS1 mix). No one knows these figures at the moment. And even if we did, there are shrinkage factors to take account of school size, and no doubt other coefficients to consider as well, none of which we know right now. So what might some try instead?
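To make the mechanics concrete, here's a minimal sketch of what a real VA calculation needs. Note the loud caveat: the per-group national averages below are invented placeholders - nobody has the real figures until RAISE is published, which is the whole point.

```python
# HYPOTHETICAL national average KS2 scaled scores for a few KS1 APS
# prior attainment groups. These numbers are placeholders purely to
# show the mechanics; the real DfE figures are unknown at this point.
national_avg_by_aps = {12.0: 98.5, 15.0: 101.2, 17.5: 104.8}

# Made-up pupil-level data: (KS1 APS, actual KS2 scaled score)
pupils = [(15.0, 103), (12.0, 96), (17.5, 106)]

# VA for each pupil = actual score minus the national average score
# for pupils with the same prior attainment...
pupil_va = [score - national_avg_by_aps[aps] for aps, score in pupils]

# ...and the school's score is the mean of the pupil-level VA scores
school_va = sum(pupil_va) / len(pupil_va)
```

Even this sketch leaves out the shrinkage factors and other coefficients mentioned above, which we don't know either.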

First, they might just compare each pupil's KS2 score to the overall national average score of 103. That is not VA; that is relative attainment. It is not a progress measure. You are not comparing pupils against an appropriate benchmark linked to start point. It is meaningless. A non-starter. Don't do this.

Second, you might work out the 'in-school' average score for each start point and compare each pupil to that. Well, nice try, and on the right track (it at least demonstrates awareness of VA methodology), but there is one major flaw: you will always get the same outcome: 0. I know people have tried this in the past (I've had that conversation) and a headteacher actually emailed me some LA guidance this week that suggested schools do this. Seriously, don't.
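It's easy to see why the in-school benchmark always nets out to zero: each pupil's difference from their own group's mean cancels out within the group, so the school-level result is 0 by construction. A quick sketch with made-up scores:

```python
from collections import defaultdict

# Made-up pupil data: (KS1 start point, KS2 scaled score)
pupils = [("2b", 107), ("2b", 99), ("2b", 103), ("2c", 95), ("2c", 101)]

# 'In-school' average score for each start point
groups = defaultdict(list)
for start, score in pupils:
    groups[start].append(score)
group_avg = {s: sum(v) / len(v) for s, v in groups.items()}

# Each pupil compared against their own school's group average...
diffs = [score - group_avg[start] for start, score in pupils]

# ...and the school-level result is exactly zero, every time
school_result = sum(diffs) / len(diffs)
print(school_result)  # 0.0
```

Swap in any scores you like; the answer is always 0.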

2) Subtract KS1 APS from KS2 score
As certain as death and taxes, give people a start figure and an end figure and they'll subtract the former from the latter. I bet people are doing this right now: subtracting KS1 APS from KS2 scaled score, and then inventing arbitrary thresholds to define 'expected' and 'more than expected' progress. Do you have a 2b pupil who achieved 107 in their maths test? Simply subtract 15 from 107. They've made 92 points of progress. That's excellent! Another only made 88 and fell short of the 90 point good progress threshold. Hmm, less than expected.

That's Numberwang! (credit: @simonraz :-)

3) Invent a progress matrix
We all love a progress matrix. I know I do (seriously, I do). So let's invent one now. First we start with the assumption that all 2b pupils should achieve scores of 100+. But what about the others? The L1, 2c, 2a and L3 pupils? Time for some arbitrary thresholds I reckon. To save you all the hassle of creating a progress matrix yourself, I've made one up for you:

It's so simple to use and easy to understand. What could possibly go wrong? Feel free to copy it and fill it out to present to SLT, governors, SIPs, LA officers, Ofsted inspectors and that chap from academy head office. I guarantee they'll love it.

But I guarantee one other thing, too: none of this will bear any relation to the real VA data when it's published.

So why bother?

Wednesday, 6 July 2016

Calculating %EXS in RWM combined

The KS2 results are out (joy!) and we know the national RWM combined figure is 53% so obviously we now want to know how we compare. This is where the problems start. There are two mistakes people commonly make when working out this figure and they are the same mistakes that were made when we had L4 RWM:

1) take the lowest of the three percentages for individual subjects.

This is the more forgivable of the two mistakes. If the percentages meeting the expected standard in reading, writing and maths are 69%, 72% and 77% then you take the lowest one (69%) as the combined figure. I get why people do this and sometimes this is correct but don't assume it is. It is certainly the case that your combined figure cannot be higher than the lowest of the three, but it can be a lot lower.

2) take the average of the three percentages for individual subjects.

This is a common mistake and one that needs to stop. You don't average percentages. Well, there are occasions when it could work but this really isn't one of them. 

Essentially, the combined figure is about Venn diagrams: the pupils that plot in the intersection of the three circles. Imagine you had 9 pupils and 3 pupils (33%) achieved the expected standard in each subject. Using the above methods your combined figure would be 33% RWM. But it could be 0% if discrete groups achieved the expected standard in each subject: the pupils that achieved it in maths did not achieve it in reading and writing; those that achieved it in reading did not do so in maths and writing; and so on. No pupils achieved all three and so none plot in the intersection of the Venn diagram.
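The 9-pupil example is easy to check in code (hypothetical pupils, numbered 1 to 9 and deliberately arranged so the three subject groups don't overlap):

```python
# Nine hypothetical pupils; three different pupils meet the expected
# standard in each subject, with no overlap between subjects.
reading = {1, 2, 3}
writing = {4, 5, 6}
maths = {7, 8, 9}
cohort = 9

# Each subject individually: 3 out of 9 pupils, i.e. 33%
pct_reading = 100 * len(reading) / cohort

# Combined RWM is the intersection of the three sets - here, nobody
combined = reading & writing & maths
pct_rwm = 100 * len(combined) / cohort
```

Taking the lowest subject percentage, or averaging the three, would give 33%; the true combined figure is 0%.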

Imagine you are standing out on the playground on a winter's day and you note that some children are wearing hats, some are wearing gloves and some are wearing scarves. Some are wearing just one of the items, some are wearing two, and some are wearing all three. You would not calculate the percentage wearing all three items - hats, gloves and scarves - by working out the percentage wearing each individual item and averaging them. That would be nuts. You would simply count how many pupils were wearing all three items together. You could even draw a Venn diagram on the playground and get pupils to stand in the right parts of the circles.

That is no different to our approach to calculating %EXS in RWM. You need the pupil level data: a spreadsheet with the names of the pupils and 3 columns for reading, writing and maths. Then just enter a Y or N according to whether or not they met the expected standard in each subject. When you're done, count the number of pupils that have a Y in all 3 columns and divide that figure by the total number of pupils to get the correct percentage. 
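The same Y/N counting, sketched in Python for anyone who prefers it to a spreadsheet (pupil names and results here are made up):

```python
# Pupil-level results: Y/N for reading, writing, maths (made-up data)
results = {
    "Amy": ("Y", "Y", "Y"),
    "Ben": ("Y", "N", "Y"),
    "Cara": ("Y", "Y", "Y"),
    "Dev": ("N", "Y", "Y"),
}

# Count pupils with a Y in all three columns...
combined = sum(1 for r in results.values() if all(x == "Y" for x in r))

# ...then divide by the total number of pupils for the correct figure
pct_rwm = 100 * combined / len(results)
print(pct_rwm)  # 50.0
```

Here two of the four pupils (Amy and Cara) met the expected standard in all three subjects, so %EXS in RWM combined is 50%, even though each individual subject is 75%.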

Or if you're feeling flash, write a clever =IF formula in Excel to do it all for you. I do love an =IF formula.

Hopefully that all makes some sort of sense.