Wednesday, 14 December 2016

10 things I hate about data

I seem to spend a lot of time ranting these days. Recently I've been trying to rein it in a bit, be less preachy. It's counterproductive to wind people up - I need to get them on side - but the problem is there are just so many opportunities to get annoyed these days. I'm turning into a data analyst who hates data. Well, a data analyst who hates bad data (as any decent data analyst should). And let's face it, there's a lot of bad data out there to get annoyed about. So, a few weeks ago I gave a conference talk entitled '10 things I hate about data' (it could have been much longer, believe me, but 10 is a good number).

Here's a summary of that talk.

1) Primary floor standards
We are now in a crazy world where the attainment floor standard is way above national 'average'. England fell below its own minimum expectation. How can that happen? On 1st September 2016, the floor standard ceased to be a floor standard and became an aspirational target. But the DfE had already committed to having no more than 6% of schools below floor, which meant they had to set the progress thresholds so low that they just captured a handful of schools. I find it hard to apply the phrase 'sufficient progress' to scores of -5 and -7 and keep a straight face. So primary schools have four floor standards: one linked to attainment, which is way too high, and three relating to progress, which are way too low. If the school is below 65% EXS in reading, writing and maths, and below just one of the progress measures, it is below floor. Unless that one happens to be writing, in which case chances are it'll be overlooked because writing data is junk. Oh, and if you are below just one progress floor then it has to be significantly below to be deemed below floor, which is ridiculous because it's not actually possible to have scores that low and for them not to be significantly below. Meanwhile, secondary schools, with all the complexity of GCSE and equivalent data, have one single measure, Progress 8, which captures the average progress made by pupils in up to 8 subjects. The floor standard at KS4 is half a grade below average. Simple. Why can't primary schools have a similar single, combined-subject, progress-based floor measure?
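For anyone who wants the floor rule spelled out, here is a rough sketch of the logic as described above. It is my own illustration, not the DfE's code. The post only quotes thresholds of -5 and -7; pinning -7 to writing and -5 to reading and maths is an assumption, as are the function name and the idea that failing two or more progress floors needs no significance test.

```python
# Assumed thresholds: the text mentions -5 and -7 but not which subject gets which.
PROGRESS_FLOORS = {"reading": -5.0, "writing": -7.0, "maths": -5.0}
ATTAINMENT_FLOOR = 65.0  # % of pupils at EXS in reading, writing and maths combined


def is_below_floor(pct_exs_rwm, progress, sig_below):
    """Return True if a school is below floor under the rule sketched above.

    pct_exs_rwm: % of pupils reaching EXS in reading, writing and maths.
    progress:    dict of subject -> school progress score.
    sig_below:   dict of subject -> True if that score is significantly below average.
    """
    # Meeting the attainment element keeps the school above floor.
    if pct_exs_rwm >= ATTAINMENT_FLOOR:
        return False

    failed = [s for s, floor in PROGRESS_FLOORS.items() if progress[s] < floor]
    if not failed:
        return False
    if len(failed) == 1:
        # Below just one progress floor: it must also be significantly below.
        return sig_below[failed[0]]
    # Assumption: two or more failed subjects need no significance check.
    return True
```

Four conditions and a significance test, against the single Progress 8 threshold at KS4.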

2) Coasting
I hate this measure. I get what they're trying to do - identify schools with high attainment and low progress - but this has been so badly executed. Why 85%? What does that mean? How does 85% level 4 in previous years link to 85% achieving expected standards this year? Why are they using levels of progress medians for 2014 and 2015 when they could have used VA, which would make the progress broadly comparable with 2016? And why have they just halved the progress floor measures? (Smacks of what my Dad would describe as a 'Friday afternoon job'.) Remember those quadrant plots in RAISE? The ones that plotted relative attainment (which compared the school's average score against the national average score) against VA? Schools that plot significantly in the bottom right-hand quadrant 3 years running - that would be a better definition of coasting. Unless they are junior schools, in which case forget it. Actually, until we have some robust data with accurate baselines, perhaps forget the whole thing.

3) The use of teacher assessment in high stakes accountability measures
The issue of KS2 writing has been discussed plenty already. We know it's inconsistent, we know it's unreliable, we know it's probably junk. Will it improve? No. Not until teacher assessment is removed from the floor standards at least. I'm not saying that writing shouldn't be teacher assessed, and that teacher assessment shouldn't be collected, but we can't be surprised that data becomes corrupted when the stakes are so high. The DfE evidently already understands this - they decided a year ago not to use writing teacher assessment in the Progress 8 baseline from 2017 onward (the first cohort to have writing teacher assessed at KS2). It's not just a KS2 issue either. KS1 assessments form the baseline for progress measures so primary schools have a vested interest in erring on the side of caution there; and now that the DfE are using EYFSP outcomes to devise prior attainment groups for KS1, who knows what the impact will be on the quality of that data. All this gaming is undermining the status of teacher assessment. It needs a rethink.

4) The writing progress measure
Oh boy! This is a whopper. If you were doubting my assertion above that writing teacher assessment should be removed from floor standards, this should change your mind. Probably best to read this but I'll attempt to summarise here. Essentially, VA involves comparing a pupil's test score against the national average score for pupils with the same start point. A pupil might score 97 in the test and the national average score for their prior attainment group is 94, so that pupil has a progress score of +3. This is fine in reading and maths (and FFT have calculated VA for SPaG) but it doesn't work for writing because there are no test scores. Instead, pupils are assigned a 'nominal score' according to their teacher assessment: WTS = 91, EXS = 103, GDS = 113, which is then compared against an unachievable fine-graded benchmark. So, a pupil in prior attainment group 12 (KS1 APS of 15 i.e. 2b in reading, writing and maths) has to achieve 100.75 in writing, which they can't. If they are assessed as meeting expected standard (nominal score of 103) their progress score will be +2.25; if they are assessed as working towards (nominal score of 91) their progress score will be -9.75. Huge swings in progress scores are therefore common because most pupils can't get close to their benchmarks due to the limitations of the scoring system. And I haven't got space here to discuss the heinousness of the nominal scoring system applied to pre-key stage pupils except to say that it is pretty much impossible for pupils below the level of the test to achieve a positive progress score. So much for the claim in the primary accountability document that the progress measures would reflect the progress made by ALL pupils. Hmmm.
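To make the arithmetic concrete, here is a minimal sketch of the writing calculation using the nominal scores and the group 12 benchmark quoted above. The function and variable names are mine, purely for illustration; this is not the DfE's implementation.

```python
# Nominal writing scores by teacher assessment outcome, as quoted in the text.
NOMINAL_WRITING_SCORES = {"WTS": 91, "EXS": 103, "GDS": 113}

# Benchmark for prior attainment group 12 (KS1 APS of 15), as quoted in the text.
GROUP_12_BENCHMARK = 100.75


def writing_progress(teacher_assessment, benchmark):
    """Progress score = nominal score for the assessment minus the group benchmark."""
    return NOMINAL_WRITING_SCORES[teacher_assessment] - benchmark


# Only three outcomes are possible for this pupil, and none is close to zero.
for outcome in NOMINAL_WRITING_SCORES:
    print(outcome, writing_progress(outcome, GROUP_12_BENCHMARK))
```

With only three nominal scores available, a pupil in this group can land on -9.75, +2.25 or +12.25 and nothing in between, which is where the huge swings come from.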

5) The death of CVA
In 2011, the DfE stated that 'Contextual Value Added (CVA) goes further than simply measuring progress based on prior attainment [i.e. VA] by making adjustments to account for the impact of other factors outside of the school’s control which are known to have had an impact on the progress of individual pupils e.g. levels of deprivation. This means that CVA gives a much fairer statistical measure of the effectiveness of a school and provides a solid basis for comparisons.' Within a year, they'd scrapped it. But some form of CVA is needed now more than ever. Currently, pupils are grouped and compared on the basis of their prior attainment, without any account taken of special needs, length of time in school, number of school moves or deprivation. This is a particular issue for low prior attainment groups, which commonly comprise two distinct types of pupils: SEN and EAL. Currently, no distinction is made and these pupils are therefore treated the same in the progress measures, which means they are compared against the same end of key stage benchmarks. These benchmarks represent national average scores for all pupils in the particular prior attainment group, and are heavily influenced by the high attainment of the EAL pupils in that group, rendering them out of reach for many SEN pupils. Schools with high percentages of SEN are therefore disadvantaged by the current VA measure and are likely to end up with negative progress scores. The opposite is the case for schools with a large proportion of EAL pupils. This could be solved by either introducing a form of CVA or by removing SEN pupils from headline measures. This of course could lead to more gaming of the system in terms of registering pupils as SEN or not registering as EAL, but the current system is unfair and needs some serious consideration.

6) The progress loophole of despair
This is nuts! Basically, pupils that are assessed as pre-key stage are included in progress measures (they are assigned a nominal score as mentioned above), whereas those assessed as HNM (in reading and maths) that fail to achieve a scaled score (i.e. do not achieve enough raw marks on the test) are excluded from progress measures, which avoids huge negative progress scores. I have seen a number of cases this year of HNM pupils achieving 3 marks on the test and achieving a scaled score of 80. Typically they end up with progress deficits of -12 or worse (sometimes much worse), which has a huge impact on overall school progress. Removing such pupils often makes the difference between being significantly below and in line with average. And the really mad thing is that if those pupils had achieved one less mark on the test, they wouldn't have achieved a scaled score and therefore would not have been included in the progress measures (unlike the pre-key stage pupils). Recipe for tactical assessment if ever I saw one.

7) The one about getting rid of expected progress measures
The primary accountability document states that 'the [expected progress] measure has been replaced by a value-added measure. There is no ‘target’ for the amount of progress an individual pupil is expected to make.' Yeah, pull the other one. Have you seen those transition matrices in RAISE (for low/middle/high start points) and in the RAISE library (for the 21 prior attainment groups)? How many would really like to see those broken down into KS1 sublevel start points? Be careful what you wish for. Before we know it, crude expectations will be put in place, which will be at odds with value added, and we're back to square one. Most worrying are the measures at KS1 involving specific early learning goals linked to end of KS1 outcomes, and the plethora of associated weaknesses splashed all over page 1 of the dashboard. Teachers are already referring to pupils not making 'expected progress' from EYFS to KS1 on the basis of this data. Expected progress and VA are also commonly conflated, with estimates viewed as minimum targets. In every training session I've run recently, a headteacher has recounted a visit by some school improvement type who has shown up brandishing a copy of the table from the accountability guidance, and told them what scores each pupil is expected to get this year. Expected implies that it is prescribed in advance, and yet VA involves comparison against the current year's averages for each prior attainment group and we don't know what these are yet. Furthermore, because it is based on the current year's averages, half the pupils nationally will fall below the estimates and half will hit or exceed them. That's just how it is. Expected progress is the opposite of VA and my response to anyone confusing the two is: tell me what the 2017 averages are for each of the 21 prior attainment groups, and I'll see what I can do. I spoofed this subject here, by the way.

8) Progress measures in 2020
Again, this. Put simply, the basis of the current VA measure is a pupil's APS at KS1. How are we going to do this for the current Y3? How do I work out the average for EXS, WTS, EXS? Will the teacher assessments be assigned a nominal value? How many prior attainment groups will we have in 2020 when this cohort reaches the end of KS2? Currently we have 21 but surely we'll have fewer considering there are now fewer possible outcomes at KS1, which means we'll have more pupils crammed into a smaller number of broader groups. Such a lack of refinement doesn't exactly bode well for future progress measures. Remember that all pupils in a particular prior attainment group will have the same estimates at the end of KS2, so all your EXS pupils will be lumped into a group with all other EXS pupils nationally and given the same line to cross. This could have been avoided if the KS1 test scores were collected and used as part of the baseline, but they weren't so here we are. 2020 is going to be interesting.

9) Colour coding used in RAISE
Here is a scene from the script I'm working on for my new play, 'RAISE (a tragedy)'.

HT: "blue is significantly below, green is significantly above, right?"
DfE: "No. It's red and green now"
HT: "right, so red is significantly below, and green is significantly above. Got it"
DfE: "well, unless it's prior attainment"
HT: "sorry?"
DfE: "blue is significantly below if we're dealing with prior attainment"
HT: "so blue is significantly below for prior attainment but red is significantly below for other stuff, and green is significantly above regardless. And that's it?"
DfE: "Yes"
HT: "You sure? You don't look sure."
DfE: "Well...."
HT: "well what?"
DfE: "well, it depends on the shade?"
HT: "what shade? what do you mean, shade?"
DfE: "shade of green"
HT: "shade of green?"
DfE: "or shade of red"
HT: "Is there a camera hidden here somewhere?"
DfE: "No. Look, it's perfectly simple really. Dark red means significantly below and in the bottom 10% nationally, light red means significantly below but not in the bottom 10%; dark green is significantly above and in the top 10%, light green is significantly above but not in the top 10%. See?"
HT: "Erm....right so shades of red and green indicating how significant my data is. Got it."
DfE: "Oh no. We never say 'how significant'. That's not appropriate, statistically speaking"
HT: "but, the shades...."
DfE: "well, yes"
HT: *sighs* "OK, shades of red and green that show data is significantly below or above and possibly in the bottom or top 10%. Right, got it"
DfE: "but only for progress"
HT: "Sorry, what?"
DfE: "we only do that for progress"
HT: "but I have dark and light green and red boxes for attainment, too. Look, here on pages 9 and 11 and 12. See?"
DfE: "Yes, but that's different"
HT: "How is it different? HOW?"
DfE: "for a start, it's not a solid box, it's an outline"
HT: "Is this a joke?"
DfE: "No"
HT: "So, what the hell do these mean then?"
DfE: "well those show the size of the gap as a number of pupils"
HT: "are you serious?"
DfE: "Yes. So work out the gap from national average, then work out the percentage value of a pupil by dividing 100 by the number of pupils in that group. Then see how many pupils you can shoehorn into the gap"
HT: "and the colours?"
DfE: "well, if you are 2 or more pupils below that's a dark red box, and one pupil below is a light red box, and 1 pupil above that's a light green box, and you get a dark green box if you are 2 or more pupils above national average"
HT: "and what does that tell us?"
DfE: "I'm not sure, but -2 or lower is well below, and +2 or higher is well above. You may have seen the weaknesses on your dashboard"
HT: "So let me get this straight. We have dark and light shades of red and green to indicate data that is either statistically below or above, and in or not in the top or bottom 10%, or gaps that equate to 1 or 2 or more pupils below or above national average. Am I there now?"
DfE: "Yes, well unless we're talking about prior attainment"
HT: "Oh, **** off!"

Green and red lights flash on and off. Sound of rain. A dog barks.
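For the record, the 'gap as a number of pupils' sum that the DfE character describes can be sketched as below. This is my own rough reading of the dialogue, not RAISE's actual method: the function names are made up, and truncating towards zero is my interpretation of 'how many pupils you can shoehorn into the gap'.

```python
import math


def gap_in_pupils(school_pct, national_pct, n_pupils):
    """Express the gap from national average as a whole number of pupils.

    Each pupil is worth 100 / n_pupils percentage points, so the gap in pupils
    is gap / (100 / n_pupils), truncated towards zero (an assumption).
    """
    return math.trunc((school_pct - national_pct) * n_pupils / 100)


def box_colour(pupil_gap):
    """Map a pupil gap to the outline box colour described in the dialogue."""
    if pupil_gap <= -2:
        return "dark red"
    if pupil_gap == -1:
        return "light red"
    if pupil_gap == 1:
        return "light green"
    if pupil_gap >= 2:
        return "dark green"
    return None  # less than one pupil either way: no box


# A group of 20 pupils: each pupil is worth 5 percentage points.
print(box_colour(gap_in_pupils(60, 70, 20)))
```

In a group of 20, two pupils are enough to turn the box dark red.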

10) Recreating levels
We've been talking about this for nearly 2 years now and yet I'm still trying to convince people that those steps and bands commonly used in tracking systems - usually emerging, developing, secure - are essentially levels by another name. Instead of describing the pupil's competence in what has been taught so far - in which case a pupil could be 'secure' all year - they relate to how much of the year's curriculum has been achieved, and so 'secure' is something that happens after Easter. Despite finishing the previous year as 'secure', pupils start the next year as 'emerging' again (as does everyone else). Pupils that have achieved between, say, 34% and 66% of the year's curriculum objectives are developing, yet a pupil that has achieved 67% or more is secure. Remember those reasons for getting rid of levels? How they were best-fit and told us nothing about what pupils could or couldn't do; how pupils at either side of a level boundary could have more in common than those placed within a level; how pupils could be placed within a level despite having serious gaps in their learning. Consider these reasons. Now look at the example above, consider your own approach, and ask yourself: is it really any different? And why have we done this? We've done it so we can have a neat approximation of learning; arbitrary steps we can fix a point score to so we can count progress even if it's at odds with a curriculum 'where depth and breadth of understanding are of equal value to linear progression'. Then we discover that once pupils have caught up, they can only make 'expected progress' because they don't move on to the next year's content. So we shoehorn in an extra band called mastery or exceeding or above, with a nominal bonus point so we can show better than expected progress for the most able. These approaches have nothing to do with real learning; they've got everything to do with having a progress measure to keep certain visitors happy.
It's all nonsense and we need to stop it.

Merry Christmas!




2 comments:

  1. Absolutely superb summary and made me feel so much better about my red and green random fairy lights scattered RAISE.

  2. TBH
    Why do we need targets or tracking at all?
    Arthur Dent dared to question the assumption that we need by-passes.
    Good on him.