Monday, 5 February 2018

The Progress Obsession

Despite my best efforts to convince people of the futility of the exercise, probably the most common question I get asked is:

"How do I show progress?" 

Why is this futile? Because what they are really asking is: "How do I use data to 'prove' that pupils have made 'good' progress?"

The reason for the inverted commas is that data does not really 'prove' anything - especially when it's based on something as subjective as teacher assessment - and what constitutes 'good' progress varies from pupil to pupil. What is regarded as 'good' for one pupil may not be enough for the next. One pupil's gentle stroll is another pupil's mountain to climb. Progress is a multi-faceted thing. It is catching up, filling gaps, deepening understanding, and overcoming those difficult barriers to learning. It can be accelerating through curriculum content, or it can be consolidating what has been learnt; it can mean no longer needing support with fundamental concepts, or it can be about mastering complex skills. Different pupils progress at different rates and get to their destination in different ways.

Progress is not simple, neat or linear - there is no one-size-fits-all pathway - and yet all too often we assume it is for the sake of a convenient metric. We are so desperate for neat numbers - for numerical proxies of learning - that we are all too willing to overlook the fact that they contradict reality, and in some cases may even shoot us in the foot by presenting an average line that no actual pupil follows. Rather than a line that fits the pupil, we make pupils fit the line.

Basically, we want two numbers that supposedly represent pupils' learning at different points in time. We then subtract the first number from the later one and, if the numbers go up - as they invariably do - this is somehow seen as evidence of the progress that pupils have made. Perhaps if they have gone up by a certain amount then this is defined as 'expected', and if they've gone up by more than that it's 'above expected'. We can now RAG rate our pupils, place them into one of three convenient boxes, ready for when Ofsted or the LA advisor pays a visit. Some pupils are always red, and that frustrates us because it doesn't truly reflect the fantastic progress those children have actually made, but what can we do? That's the way the system works. We have to do this because we have to show progress.
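To see how crude this is, here's a minimal sketch of the sort of point-score arithmetic I'm describing. The bands, point values and thresholds are invented for illustration, not taken from any particular tracking system:

```python
# Illustrative only: the sort of point-score 'progress' arithmetic described above.
# The point values and thresholds are hypothetical, not from any real system.

POINTS_PER_TERM_EXPECTED = 1  # assumed 'expected' rate of one point per term

def rag_rate(start_points: int, end_points: int, terms: int = 3) -> str:
    """RAG rate a pupil by subtracting one point score from another."""
    gain = end_points - start_points
    expected = POINTS_PER_TERM_EXPECTED * terms
    if gain < expected:
        return "red"    # 'below expected' progress
    if gain == expected:
        return "amber"  # 'expected' progress
    return "green"      # 'above expected' progress

# A pupil who has overcome huge barriers to gain 2 points is still 'red'
print(rag_rate(start_points=7, end_points=9))  # red
```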


First, let's get one thing straight: data in a tracking system just proves that someone entered some data in a tracking system. It proves nothing about learning - it could be entirely made up. The more onerous the tracking process - remember that 30 objectives for 30 pupils is 900 assessments - the more likely teachers are to leave it all to the last minute and block fill. The cracks in the system are already beginning to show. If we then assign pupils to some sort of best-fit category based on how many objectives have been ticked as achieved (count the green ones!) we have recreated levels. These categories are inevitably separated by arbitrary thresholds, which can encourage teachers to give the benefit of the doubt and tick the objectives that push pupils into the next box (depending on the time of year of course - we don't want to show too much progress too early). Those cracks are getting wider. And finally, each category has a score attached, which now becomes the main focus. The entire curriculum is portioned into equal units of equal value and progress through it is seen as linear. Those cracks have now become an oceanic rift with the data on one side and the classroom on the other.

Assessment is detached from learning.

This rift can be healed but only if we a) wean ourselves off our obsession with measuring progress, and b) sever the link between teacher assessment and accountability. Teacher assessment should be ring-fenced: it should be used for formative purposes alone. Once we introduce an element of accountability into the process, the game is lost and data will almost inevitably become distorted. Besides, it's not possible to use teacher assessment to measure progress without recreating some form of levels, with all their inherent flaws and risks.

Having a progress measure is desirable but does our desire for data outweigh the need for accuracy and meaning? Do our progress measures promote pace at the expense of depth? Can they influence the curriculum that pupils experience? And can such measures lead to the distortion of data, rendering it useless? It is somewhat ironic that measures put in place for the purposes of school improvement may actually be a risk to children's learning.

It's worth thinking about.

Saturday, 13 January 2018

Coasting? You’re taking the p*ss!

I recently read an Ofsted report for a school judged inadequate and placed into special measures. The report contained the following statement in the ‘outcomes for pupils’ section: 

The school met the government’s definition of a coasting school in 2016 and looks likely to do so again in 2017. 

Although the school has not been officially identified as ‘coasting’ - it would need to fall below the so-called ‘coasting’ thresholds three years in a row to receive that label - the statement is fairly damning, and anyone reading the report will certainly draw that conclusion.

And this really winds me up.

I know this school. I know that it is in an area of very high deprivation - probably the highest in its LA. Many of its pupils have SEND, percentages of pupils on free school meals are well above average, and pupils on entry to the school are well below ‘typical’. Yes, results at key stage 2 are low, but how appropriate is it to describe such a school as coasting? 

In my interpretation of the word, this particular school is as far from coasting as you can get. Coasting suggests it’s all a bit of a doss. But this school is not some kind of country club, where teachers are sitting around relaxing whilst able, well supported, high prior attaining children just get on with their work and pop out the other end with decent results. This is a school with very high levels of disadvantage; with many social challenges. A school where teachers have to sprint to stand still. In short, describing this school as coasting is, quite frankly, taking the piss.

The coasting measure is massively flawed. If anything, it should be identifying schools with above average results but low progress; those schools that benefit from the high attainment of their intakes without trying too hard (if such schools exist). Instead, we have another measure that singles out the lowest performing schools, usually in areas of high deprivation with the most disadvantaged pupils.

Or junior schools, of course. Let’s not forget those junior schools, which are disproportionately represented amongst the ranks of coasting schools.

Essentially, the coasting measure is an additional floor standard seemingly designed to catch those schools that managed to scrape through the first round. I still haven’t decided whether it’s ‘Floor Plus’ (because the thresholds are higher) or ‘Floor Lite’ (because it’s over three years), but whatever it is, it’s not doing what it should be doing. 

If the government really wants to identify so-called coasting schools, this measure needs a complete rethink. Remember the quadrant plots in RAISE, where VA was plotted against relative attainment? That would provide a far better method. Schools plotting ‘significantly’ in the bottom right quadrant (i.e. those with above average attainment and well below average progress) three years running are the coasting schools. And perhaps those schools plotting ‘significantly’ in the bottom left quadrant (well below average attainment and progress) three years running are those deemed to be below floor. This would provide a clearer distinction between the two types of schools. Certainly clearer and more logical than the current, confused (and confusing) approach. 
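The logic is simple enough to sketch. The function and field names below are my own, and statistical 'significance' is reduced to ready-made flags, so treat it as an illustration rather than a working definition:

```python
# A rough sketch of the quadrant-based logic suggested above.
# Field names and the significance flags are hypothetical simplifications.

def classify(years: list[dict]) -> str:
    """Classify a school from three years of attainment/progress summaries.

    Each year is a dict like:
    {"attainment_above_average": True, "progress_sig_below": True,
     "attainment_sig_below": False}
    """
    coasting = all(y["attainment_above_average"] and y["progress_sig_below"]
                   for y in years)
    below_floor = all(y["attainment_sig_below"] and y["progress_sig_below"]
                      for y in years)
    if coasting:
        return "coasting (bottom-right quadrant three years running)"
    if below_floor:
        return "below floor (bottom-left quadrant three years running)"
    return "neither"
```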

Or, better still, just scrap the whole damn measure. 

Sunday, 10 December 2017

How to find national figures for groups in FFT Aspire

One of the frustrations of ASP is the use of different national comparators for the various pupil groups. The comparator is shown in the national column in the data tables, accessed by clicking the 'explore data in detail' link anywhere in the ASP system, and the comparator type can be found by clicking on the question mark beside each group. There are three categories: 'same', 'other', and 'all', which are defined as follows:
  • Same: the group is compared to the national figure for the same group e.g. boys in the cohort compared to boys nationally
  • Other: the group is compared to the national figure for the opposite group e.g. disadvantaged pupils in the cohort are compared to 'other' (i.e. non-disadvantaged) pupils nationally.
  • All: the group is compared to the overall national figure for all pupils e.g. SEN pupils are compared to overall national figures.
I understand the thinking behind comparing disadvantaged pupils to non-disadvantaged pupils (i.e. closing the gap; I still can't quite bring myself to use the phrase 'diminishing the difference') but knowing the national figures for disadvantaged pupils is useful, especially if you fall in that grey zone between the two results. As for comparing SEN pupils to overall national figures for all pupils, I really can't get my head round this. Clearly there is a need to know the national figures for SEN.

So, how do we find this data? We can download it from the DfE Statistics site, but we have to wait several months after results come out for the release that breaks them down by pupil characteristics. The KS2 data is due on 14th December and KS4 will not be out until 25th January. Fortunately, if you use FFT Aspire, you can access the data much earlier. Here's how.

1) Log in to FFT Aspire, click on 'self evaluation' and click 'attainment and progress'

2) Select the indicator that you are interested in, e.g. % Expected standard reading. To do this, click on indicators, uncheck one of the existing selections, and select the desired indicator.

3) Now select the group you are interested in, e.g. SEN Support, by clicking on filters and selecting the desired group

4) Note the change in the national figure beneath the school result (i.e. under the main indicator graphic in large font on the left)

Congratulations! You have now found the comparable national figure in FFT.

Note that in this example, 37% of SEN support pupils achieved the expected standard in reading in 2017. In ASP, SEN Support pupils are compared against the overall national figure of 71%. A huge difference.

That's why it's definitely worth knowing how to find this data.

Friday, 24 November 2017

IDSR+FFT Summary report template

As promised, here is my template that attempts to summarise IDSR and FFT data into 3-4 pages. Obviously you'll need your IDSR and FFT dashboards, and probably a spare couple of hours. Rather than write a lengthy blog on how to complete it, I've supplied an example (see links below).

Difference no. pupils is provided by the IDSR in some cases but where it isn't, it's calculated in the usual way:

Work out the % gap between the result and the national figure e.g.

School = 56%, National = 72%, gap = -16%

Convert that to a decimal i.e. -0.16

Multiply that by the number of pupils in the group or cohort (e.g. 28)

28 x -0.16 = -4.48

Therefore the gap equates to 4 pupils (in this case 4 pupils below national).
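If you'd rather let a script do the sum, a minimal sketch (the function name is mine) looks like this:

```python
def diff_no_pupils(school_pct: float, national_pct: float, group_size: int) -> float:
    """Convert a percentage gap into an (approximate) number of pupils."""
    gap = (school_pct - national_pct) / 100  # e.g. (56 - 72) / 100 = -0.16
    return gap * group_size                  # e.g. -0.16 x 28 = -4.48

print(round(diff_no_pupils(56, 72, 28)))  # -4, i.e. roughly 4 pupils below national
```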

See notes below the tables for explanations. And tweet me if you get stuck.

Link to blank template is here

Link to completed example is here

Tuesday, 21 November 2017

Using standardised scores in progress matrices

Schools are always looking for ways to measure and present progress. Most primary schools have tracking systems that offer some sort of progress measure, but these are almost always based on teacher assessment and involve some sort of level substitute: a best-fit band linked to coverage of the curriculum with a point score attached. Increasingly, schools are looking beyond these methods in search of something more robust, and this has led them to standardised tests.

One of the benefits of standardised tests is that they are – as the name suggests – standardised, so schools can be confident that they are comparing the performance of their pupils against a large sample of pupils nationally. Another benefit is that schools will be less reliant on teacher assessment for monitoring of standards - one of the key points made in the final report of the Commission on Assessment without Levels was that teacher assessment is easily distorted when it’s used for multiple purposes (i.e. accountability as well as learning). Standardised tests can also help inform teacher assessment so we can have more confidence when we describe a pupil as ‘meeting expectations’ or ‘on track’.

And finally, standardised tests can provide a more reliable measure of progress across a year, key stage or longer. However, schools often struggle to present the data in a useful and meaningful way. Scatter plots – plotting previous test scores against latest - are useful because they enable us to identify outliers. A line graph could also be used to plot change in average score over time, or show the change in gap between key groups such as pupil premium and others. But here I want to concentrate on the humble progress matrix, which plots pupil names into cells on a grid based on certain criteria. These are easily understood by all, enable us to spot pupils that are making good progress and those that are falling behind, and they do not fall into the trap of trying to quantify the distance travelled. They can also help validate teacher assessment and compare outcomes in one subject against another. In fact, referring to them as progress matrices is doing them a disservice because they are far more versatile than that.

But before we can transfer our data into a matrix, we first need to group pupils together on the basis of their standardised scores. Commonly we see pupils defined as below, average and above using the 85 and 115 thresholds (i.e. one standard deviation from the mean) but this does not provide a great deal of refinement and means that the average band contains 68% of pupils nationally. It therefore makes sense to further subdivide the data and I think the following thresholds are useful:

<70: well below average
70-84: below average
85-99: low average
100-115: high average
116-130: above average
>130: well above average
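If your scores sit in a spreadsheet export or a script, a small helper like this - a sketch of my own using the thresholds above - will do the banding:

```python
def band(score: int) -> str:
    """Band a standardised score using the thresholds listed above."""
    if score < 70:
        return "well below average"
    if score <= 84:
        return "below average"
    if score <= 99:
        return "low average"
    if score <= 115:
        return "high average"
    if score <= 130:
        return "above average"
    return "well above average"

print(band(112))  # high average
```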

By banding pupils using the above thresholds, we can then use the data in the following ways:

1)      To show progress
Plot pupils’ current category (see above) against the category they were in previously. The start point could be based on a previous standardised test taken, say, at the end of last year; or on the key stage 1 result, or an on-entry teacher assessment. Pupils’ names will plot in cells and it is easy to spot anomalies.
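As an illustration, here's a sketch using pandas; the pupil names and bands are invented, and your tracking system or spreadsheet may well do this for you already:

```python
import pandas as pd

# Hypothetical pupils with a previous and a current band (see thresholds above)
pupils = pd.DataFrame({
    "name":     ["Amy", "Ben", "Cara", "Dev"],
    "previous": ["low average", "below average", "high average", "low average"],
    "current":  ["high average", "low average", "high average", "below average"],
})

# Progress matrix: each cell lists the pupils who moved from the row band
# (previous assessment) to the column band (current assessment)
matrix = pupils.pivot_table(index="previous", columns="current", values="name",
                            aggfunc=lambda names: ", ".join(names), fill_value="")
print(matrix)
```

The same pivot works for 2) and 3) below - just swap the index and columns for whichever pair of assessments you want to compare.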

2)      To compare subjects
As above but here we are plotting the pupils’ category (again, based on the thresholds described above) in one subject against another. We can then quickly spot those pupils that are high attaining in one subject and low in another.

3)      To validate and inform teacher assessment
By plotting pupils’ score category against the latest teacher assessment in the same subject, we can spot anomalies – those cases where pupils are low in one assessment but high in the other. Often there are good reasons for these anomalies but if it’s happening en masse – i.e. pupils are assessed low by the teachers but have high test scores – then this may suggest teachers are being too harsh in their assessments. It is worth noting that this only really works if schools are using what is becoming known as a ‘point in time’ assessment, where the teacher’s assessment reflects the pupil’s security in what has been taught so far rather than how much of the year’s content they’ve covered and secured. In a point in time assessment, pupils may be ‘secure’ or ‘above’ at any point during the year, not just in the summer term.

But what will Ofsted think?

The myth-busting section of the Ofsted handbook has this to say about tracking pupil progress:

Ofsted does not expect performance and pupil-tracking information to be presented in a particular format. Such information should be provided to inspectors in the format that the school would ordinarily use to monitor the progress of pupils in that school.

Matrices provide a neat and simple solution: they are easily understood by all, and they allow us to effectively monitor pupil progress without resorting to measuring it.

Definitely worth considering. 

Tuesday, 7 November 2017

Analyse School Performance summary template (primary)

Many of you will have downloaded this already but I thought it'd be useful to put it on my blog. For those who don't already have it, it's a rather low-tech and unexciting Word document designed to guide you through ASP and pull out the useful data. The aim is to summarise the system in a few pages.

You can download it here

The file should open in Word Online. Please click on the 3 dots top right to access the download option. Please don't attempt to edit online (it should be view only anyway). Also, chances are it will be blocked by school computers (schools always block my stuff).

A couple of points about the template:

1) Making sense of confidence intervals
Only worry about this if progress is significantly below average, or if data is in line but high and close to being significantly above.

If your data is significantly below average, take the upper limit of the confidence interval (it will be negative, e.g. -0.25). This shows how much each pupil's score needs to increase by for your data to be in line (0.25 points per pupil, or 1 point for every 4th pupil). Tip: multiply this figure by the number of pupils in the cohort (e.g. -0.25 x 20 pupils = -5). If you have a pupil - one for whom you have a solid case study - with a progress score at least as negative as that result (i.e. -5 in this case), removing that pupil from the data should bring your data in line with the national average.

If your data is in line and you are interested to know how far it would need to shift to be significantly above, note the lower limit of the confidence interval (it will be negative, e.g. -0.19). This again shows how much your data needs to shift up by, but in this case to be significantly above. Here, each child's score needs to increase by 0.2 points for the overall progress to be significantly above (we need to get the lower limit of the confidence interval above 0, so it needs to rise by slightly more than the lower confidence limit). Obviously pupils cannot increase their scores by 0.2, so it's best to think of it as 1 point for every 5th child. Or, as above, multiply the lower confidence limit by the number of pupils in the cohort (e.g. -0.2 x 30 pupils = -6). If you have a pupil with a progress score at least as negative as this result (i.e. -6), removing them from the data should make the data significantly above average.
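In both cases the arithmetic is simply a confidence limit multiplied by the cohort size, which can be sketched as follows (the function name is my own):

```python
def score_equivalent(confidence_limit: float, cohort_size: int) -> float:
    """Express a confidence limit as a single pupil's progress score by
    multiplying it by the cohort size."""
    return confidence_limit * cohort_size

print(score_equivalent(-0.25, 20))  # -5.0: the 'significantly below' example above
print(score_equivalent(-0.2, 30))   # -6.0: the 'in line' example (shift by slightly more)
```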

The easiest thing to do is model it using the VA calculator, which you can download from my blog (see August) or use the online version.

2) Difference no. pupils
This has caused some confusion. It's the same concept as applied in last year's RAISE and dashboards. Simply take the percentage gap between your result and the national average (e.g. -12%), turn it into a decimal (e.g. -0.12) and multiply that by the number of pupils in the cohort (e.g. 30). In this case we work out 'diff no. pupils' as follows: -0.12 x 30 = -3.6. This means the school's result equates to 3 pupils below average. If the school result is above national then it works in the same way; it's just that the decimal multiplier is positive.

If you are calculating this for key groups, then multiply by the number in the group, not the cohort. For example, 80% of the group achieved the result against a national group result of 62%, which means the group's result is 18% above national. There are 15 pupils in the group so we calculate 'diff no. pupils' as follows: 0.18 x 15 = 2.7. The group result therefore equates to 2 pupils above national.

I hope that all makes sense.

Happy analysing.

Wednesday, 25 October 2017

MATs: monitoring standards and comparing schools

A primary school I work with has been on the same journey through assessment land as many other schools up and down the country. Around two years ago they began to have doubts about the tracking system they were using - it was complex and inflexible, and the data it generated had little or no impact on learning. After much deliberation, they ditched it and bought in a simpler, customisable tool that could be set up and adapted to suit their needs. A year later, they have an effective system that teachers value, that provides all staff with useful information, and that is set up to reflect their curriculum. A step forward.

Then they joined a MAT.

The organisation they are now part of is leaning on them heavily to scrap what they are doing and adopt a new system that will put them back at square one. It's one of those best-fit systems in which all pupils are 'emerging' (or 'beginning') in autumn, mastery is a thing that magically happens after Easter, and everyone is 'expected' to make one point per term. In other words, it's going back to levels with all their inherent flaws, risks and illusions. The school tries to resist the change in a bid to keep their system but the MAT sends data requests in their desired format, and it is only a matter of time before the school gives in.

It is, of course, important to point out that not all MATs are taking such a remote, top-down, accountability-driven approach, but some are still stuck in a world of (pseudo-)levels and are labouring under the illusion that you can use teacher assessment to monitor standards and compare schools, which is why I recently tweeted that, if you want to compare schools, you should use standardised tests rather than teacher assessment.

This resulted in a lengthy discussion about the reliability of various tests, and the intentions driving data collection in MATs. Many stated that assessment should only be used to identify areas of need in schools, in order to direct support to the pupils that need it; data should not be used to rank and punish. Of course I completely agree, and this should be a strength of the MAT system - they can share and target resources. But whatever the reasons for collecting data - and let's hope it's done for positive rather than punitive reasons - let's face it: MATs are going to monitor and compare schools, and usually this involves data. This brings me back to the tweet: if you want to compare schools, don't use teacher assessment; use standardised tests. Yes, there may be concerns about the validity of some tests on the market - and it is vital that schools thoroughly investigate the various products on offer and choose the one that is most robust, best aligned with their curriculum, and will provide them with the most useful information - but surely a standardised test will afford greater comparability than teacher assessment.

I am not saying that teacher assessment is always unreliable; I am saying that teacher assessment can be seriously distorted when it is used for multiple purposes (as stated in the final report of the Commission on Assessment without Levels). We need only look at the issues with writing at key stage 2, and the use of key stage 1 assessments in the baseline for progress measures to understand how warped things can get. And the distortion effect of high stakes accountability on teacher assessment is not restricted to statutory assessment; it is clearly an issue in schools' tracking systems when that data is not only used for formative purposes, but also to report to governors, LAs, Ofsted, RSCs, and senior managers in MATs. Teacher assessment is even used to set and monitor teachers' performance management targets, which is not only worrying but utterly bizarre.

Essentially, using teacher assessment to monitor standards is counterproductive. It is likely to result in unreliable data, which then hides the very things that these procedures were put in place to reveal. And even if no one is deliberately massaging the numbers, there is still the issue of subjectivity: one teacher's 'secure' is another teacher's 'greater depth'. We could have two schools with very different in-year data: school A has 53% of pupils working 'at expected' whereas school B has 73%. Is this because school B has higher attaining pupils than school A? Or is it because school A has a far more rigorous definition of 'expected'?

MATs - and other organisations - have a choice: either use standardised assessment to compare schools or don't compare schools. In short, if you really want to compare things, make sure the things you're comparing are comparable.