Apr 11, 2023
Education is often touted as data- or evidence-driven. But in this discussion, John Dues contends that educational data is often fiction, given how easy it is to distort, both by distorting the system that produces the data and by manipulating the numbers themselves.
0:00:02.6 Andrew Stotz: My name is Andrew Stotz and I'll be your host as we continue our journey into the teachings of Dr. W. Edwards Deming. Today, I'm continuing my discussion with John Dues, who is part of the new generation of the educators striving to apply Dr. Deming's principles to unleash student joy in learning. The topic for today is Data is Meaningless Without Context. John, take it away.
0:00:28.3 John Dues: Yeah, thanks for having me back, Andrew. I'm thinking a lot about educational data, and I think about how it's often presented, and I think so often, what we're actually doing with our educational data is what I call writing fiction, which is taking a lot of liberties with the data that comes into our system, whether it's state testing data or some other type of data that gets reported out to the public, and we often sort of manipulate that data or distort that data in a way that paints our organization or our school system or our state in a positive light, and I think we do that sometimes at the detriment of actually working to improve those organizations or those systems because we spend so much time trying to paint this positive picture instead of just putting the data out there.
0:01:26.3 AS: And it's interesting you talked... One of the interesting things about what you're saying is that it could be accurate and good data, but it's just the context or the structure of how it's presented makes it meaningless.
0:01:39.9 JD: Yeah, we try so hard to sort of paint it in this positive light to make it look like we're doing a good job. Everybody wants to do a good job, but I think we often do that at the detriment of our systems.
0:01:54.7 AS: One of the things that made me think about it, in the financial world, we have a code of ethics, and that is basically that... Particularly for CFA charter holders, financial analysts, that you have to present a complete picture of your performance. So if you have 10 customers that you're managing their money and one of them, you really bombed out and you decide you're gonna do the average of the nine that you did well on and then go out to your clients and say, "This is my performance," that's a very... You have an obligation to accurately represent your performance. And when I think about it in all the charts and graphs that people are making in education all around, I would say that most people probably are just, I would call it CYA, cover your ass type of charts [laughter] of, "How do we make this look good?"
0:02:43.9 JD: Yeah, I think... I read somewhere that there's sort of three ways you can respond to your data. You can actually work to improve the system, which would be a positive, and then the other two ways are two forms of a negative: one is you could distort the system itself, or you could actually distort the data. And a lot of times, there's not sort of a nefarious motivation underneath that distortion, but there's sort of, again, this desire to paint your organization or your system in this positive light. So sometimes there's straight up unethical behavior or cheating, but most of the time, that's not what I'm seeing and that's actually not what I'm talking about here today. It's more of this sort of taking liberties, writing fiction. "Okay, we declined from two years ago, but it's up from last year." Those types of sort of distortions of the data that I think are fairly common in the education sector, probably all sectors too.
0:03:49.9 JD: I think... Maybe I'll share my screen for the folks that have video and I'll talk through it for the listeners that don't have video, but one of the things I often think of and focus on is state testing data, because so many people are looking at that data all the way from State Departments of Education, the school system, the individual schools, the individual teachers and classrooms with their students, and then of course, families get these state testing reports as well.
0:04:23.8 JD: And a handful of years ago, I was looking at one of these reports from the Ohio Department of Education, and sort of picture this fancy, glossy, colorful PDF. It's got this big headline on it, it says, "Ohio students continue to show improved achievement in academic content areas." Then it's got a table with all the state tests, all the different grade levels, and three columns for three different years of data. And then in the last column, there's these green up arrows for where there's been improvement year over year and then these red down arrows for where there's been a decline, and I was thinking to myself, "Well, in some of these areas, some of the percentage changes are so small that, in that realm, they're sort of meaningless." Like fifth grade science goes from 68.3% in one year and it goes up to 68.5% in another year. That's essentially a rounding error when you're talking about 100,000 or so students that are taking the test. I think calling that improvement is a stretch at best.
0:05:39.0 JD: And then I was focusing on third grade reading specifically because that's such a critical area. In Ohio, there's actually a third grade reading guarantee, so if you don't pass the test, there's the potential there that you could get held back in third grade, so there's a lot of focus on that data. So I was reading on in that state education department document. It said, "Well, third grade did see this decrease this year, but when you look back two years, it actually had... Third graders actually had an increase of proficiency." So again, you actually have a decline from this previous school year to the more recent school year in this document, and they're still making this claim because if you go back two years versus this most recent year, you do see improvement, and so you start to think to yourself, "Well, what is improvement? Do we have a definition of improvement? And if so, what has to be present?"
0:06:43.4 JD: And a few years ago, I came across this definition in sort of a seminal work in our area called The Improvement Guide, and the author sort of outlined a definition for improvement, and it sort of has these three components, and this made a lot of sense to me. If you're gonna claim improvement, you have to, one, alter how work or activity is done or the makeup of a tool. So you had to change something. Basically change something about the work you're doing. That change had to produce these visible positive differences in results relative to historical norms, and then the third thing is it had to have a lasting impact. And so when I go back and I think about that state testing data or really any type of data, you start to ask this question, Is this really improvement, or again, is this writing fiction? Is this not really improvement, but we're twisting the numbers to sort of fit our narrative?
0:07:45.0 JD: So when we think about that state testing data, do we have knowledge of how work or activity has been altered systematically? And if I can't point to that, then how am I gonna take the so-called improvement and bring it to other places in the state that may not have had those same improvements? Do I have these visible positive differences in results going back and comparing to historical norms, not just last year or even two years ago, but five or six or eight or 10 years' worth of data? And then have I been able to sustain that improvement? Has there been a lasting impact? Have I been able to hold the gains? And if I haven't been able to do those three things, point to what we changed, compare to historical norms, and then sustain that improvement, I would argue that we haven't really brought about improvement. We can't claim that we've improved our system.
0:08:46.9 AS: It's interesting. Before we go on the numbers that you were showing, roughly, the average there is something like 60%. What's the 40? That 60% is what? And that means 40% is not that.
0:09:07.7 JD: Yeah, I'll go back. So when you're thinking about state test scores, most states have some type of threshold, like we have this goal that X percent of our students are gonna be considered proficient on any given test. So in Ohio, that threshold is 80%. So the state says, in order to meet the benchmark, any given school needs to have 80% of its students on, let's say, the third grade reading test meet this proficiency standard. And so what we saw in this particular data is that in the 2015-16 school year, 54.9% of the kids met that proficiency threshold. The following year in '16-17, 63.8% of the kids met that threshold, and then in the most recent year in this particular testing document, '17-18, 61.2% of the kids were proficient. So just about 40...
0:10:04.8 AS: So even if it was a sizable increase, not just a statistically insignificant one, it's still roughly 40% of the students who aren't proficient. No matter what the government says about the minimum standard, it would be hard to really argue too much about improvement when you're so low. [chuckle]
0:10:32.8 JD: That's right, yeah. And that's what you often see in these types of documents. So 40%, a significant minority of students, are not proficient on the third grade reading test, and 60% are, and there's these incremental increases and decreases depending on the year that you're looking at.
0:10:54.6 AS: It's like the Titanic heading for an iceberg and you say, "I've turned the ship one degree, but we're still gonna hit the iceberg."
0:11:01.9 JD: But we're still gonna hit, yep, yep.
0:11:04.3 AS: Alright, keep going.
0:11:06.0 JD: Yeah. So I think what's really important thinking about data in context, when you start actually stepping back and saying, "Okay, let's look at third grade reading over the course of 16 or 17 years versus three years," this very different story emerges. Part of that story is that context, so what has changed about Ohio's third grade state reading system over the course of those years? So if you go back all the way back to the 2003-2004 school year, you see Ohio is giving a particular test called the Ohio Achievement test, and you see as that's administered each year for six or seven or eight years, the results are sort of bouncing around this average, somewhere in the neighborhood of 77-78%. Then you have a change in about the 2011-12 school year. Now, we're given this test called the Ohio Achievement Assessment, but it's pretty similar, just the name has changed, the test itself is still the same, and you see basically these very similar results. And then all of a sudden, you sort of fast forward to the 14-15 school year. Anybody that's an educator from back in that time period, they'll sort of recognize that now we're getting these new common core standards, these more rigorous college and career-aligned standards, we start giving these new tests.
0:12:38.7 JD: So Ohio switches to the PARCC Test for the '14-15 school year for one year, and even then, the test itself changed pretty significantly in terms of format, but you still see pretty similar results that you've seen for the past 11, 12, 13 years. Then all of a sudden, that next school year, that 2015-16 school year, so that's the first year from that testing document, you see the results drop off a cliff and you start thinking, "Well, what happened to third graders?"
0:13:11.7 AS: Right. From, let's just say about 77 down to the next data point is 55.
0:13:18.6 JD: Yeah, just under 55% now. So you have just about a 20 or 22-point drop in one school year. Now, the test did change again. Now it's called the Ohio State Test; it was called the PARCC Test. But the test itself, the format itself, isn't probably what brought about that precipitous drop. Instead, what's happened is the legislature in Ohio has changed what it means to be proficient on the test. So basically, each sort of proficiency level has a cut score, and the cut score has increased for an individual child to be considered proficient. So the kids are no different in '14-15 than the new crop of third graders are in '15-16, but what has changed is what you need to do to be called proficient, and so because of that change, you see this huge drop in test scores along with this new test. And then over the course of the next few years, you sort of see an increase in test scores, and then a decrease, and then an increase, and then a decrease. And the Department of Education is claiming that there's improvement happening, but really what's happened is a whole new system has been created. You really changed that third grade reading state testing system into this brand new system. Whereas the average had been bouncing around 77% or so, now you sort of have this new average bouncing around that 60% mark.
0:14:56.9 JD: And again, the kids are no different from those previous years, it's just the test and what you need to do to be considered proficient has changed. And the problem is, is that if you don't look at data like this, if you don't sort of...
0:15:11.5 AS: As a run chart or as a continuum of genuine information that's coming out of the system as measured by some measurement style.
0:15:21.0 JD: Yeah, and annotate it: point to the year that the new test goes into effect, point to the year that the definition of proficiency changed, point to the year that schools had to switch from paper-and-pencil tests to computer-based tests. Because just a year or two or three after, those sorts of memories become really fuzzy, that context becomes very fuzzy, and you start to forget, "What year did we switch to computer tests? What year did the standards switch? What year did the proficiency cut score switch?" And so if you don't have that sort of running record, the data gets completely disconnected from the context, and then you're likely not to make sound decisions because of that lack of context.
0:16:09.2 AS: And maybe I'll raise a few points here about the chart that we're looking at, and this chart is fascinating to me. The first thing that I think about, as a financial analyst in the stock market, basically, if anything is wrong in my chart and in my data and then I put my money down on that, it's gonna get taken from me in the stock market. And I have to really be very rigorous in how I'm looking at data.
0:16:39.1 AS: And when I look at this, I just think this is just so full of so many different ways that could go wrong in the way that things are measured, the way people are incentivized, those types of things. And the other thing that you realize is what you're showing here is that it's a description of the system. It's trying to describe things that are going on, and you're trying to describe certain points, which you can't do in charts that are... Bar charts and things like that. A line chart or a continuous point chart or a run chart really illustrates that. But also I think... I just realized that so much of almost every bit of charting is meaningless or just... Or is even giving you a wrong signal. There's so many things that I think about that and I'm just curious, 'cause you also said something before to me about how maybe people just don't pay much attention to it and then they just accept it for what it says and they don't go and look at the data, think about it and go into more detail. Those are some of the things that come out of my head as I'm looking at this, but what else do you want us to take away from this?
0:17:58.0 JD: Yeah, I think one thing, without the context and the annotations on a line chart or a run chart, data shown over time, you do forget. That's one thing. That's just human nature. You're gonna forget. I'm not gonna remember what happened 10 years ago in my testing system. I'm probably not gonna remember what happened five or even three years ago. The second thing I would say is that the vast majority of data that gets presented is in a table or a spreadsheet, and that data is usually what I would call limited comparison, so this year's data compared to last year's data, or this month's data compared to the same month last year. And so we're usually trying to draw conclusions with just two or maybe three data points, and that gets even worse when we sort of layer on a color-coded stoplight type system where we label certain data red and certain data yellow and certain data green and then look for the red and the green data, even though the scales that we use to assign those colors are often arbitrary and meaningless.
0:19:11.0 AS: One last thing I would add to it, and I think you're gonna show us a good, an example of a good use of data, but also you have to ask the question, Are the people who are preparing this data incentivized to produce a particular outcome, and when you understand the incentives involved, it helps you also understand where it could go wrong.
0:19:33.2 JD: Well, I think that's exactly right. I think what happens oftentimes is the state testing data is part of an accountability system, and the point of an accountability system is to sort the good from the bad and to issue sanctions and rewards, and we sort of point to that data and say, "Well, your scores are low, you need to improve." And so we sort of conflate this idea of accountability data or accountability goals and improvement goals, and those are really two different things. And so you brought up sort of this idea of CYA or cover-your-ass type stuff, and when we point towards accountability data, that's what people are gonna do: because they are being held accountable for this data, they're gonna cover their ass. If we're truly using data for improvement, there's a completely different mindset. For one, the data tends to be local and well-known to the people that are using it, and if there's not sort of sanctions tied to it, then there is this ability to be more honest and candid about that data, because we're using it for improvement purposes rather than using it for accountability purposes.
0:20:51.9 AS: Okay, that's great, great description. Alright, keep going. What do you wanna show us next?
0:20:56.8 JD: Yeah, I think this last chart. So for the listeners, I've taken the five most recent years of third grade reading test data and put it in an actual process behavior chart, or some people call them control charts. And the advantage here is, like the run chart, we're seeing data over time, we're seeing the variation, we're seeing the data move up and down over time, but with the process behavior chart, we're adding these upper and lower natural process limits, or some people call them control limits, and these sort of predict the future of what's gonna happen in this particular system, as long as things move along at the current steady state. And remember, in that third grade reading data, they sort of said, "Well, we improved, and then we did decline this most recent year, but if you look back two years, it's actually an improvement." But actually what you see is, if you play that out over five years, you see the data increase and then decrease, and then increase and then decrease, and that's a very sort of common occurrence with this type of data where there's this natural variation. It becomes obvious when you plot the dots over time, and you really see what is happening with this data: it's just sort of moving about an average, about a 60% proficiency rate.
0:22:35.1 JD: Some years it's a little lower than that, some years it's a little above that, but it's all within the limits of the system. So that tells us that all that's present is common cause variation, just sort of this everyday, expected up and down in the data. Nothing special has happened, to use Deming's terminology; there are no special causes present. So there's these signals we can look for based on patterns in the data, but that's not to say that we're satisfied with this third grade reading system. So to your point earlier, that average proficiency rate for the state: we look at all of the third graders in the state of Ohio, and they took this test over the course of five years, and about 60% of the kids were proficient in any given year. So that means 40%, two of every five kids that are taking this test, are not proficient. So we have a stable but unsatisfactory system. But because there are no special causes, no special events to study or point to, what we need to do is improve that third grade reading system across the state.
0:23:47.0 JD: And so that's a completely different mindset than pointing to a single data point saying, "Oh, we've gone down, what are you gonna do, or issue sanctions to this school or to this teacher", that's not the way to improve. What we need to do is improve the system of third grade reading instruction across the state, so a completely different mindset.
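For readers who want to try the mechanics themselves, the natural process limits John describes can be sketched in a few lines of Python. This is a minimal illustration of an XmR (individuals and moving range) chart, using 2.66, the standard constant for individuals charts; the five proficiency rates below are illustrative stand-ins, not the official Ohio figures.

```python
# Minimal sketch of a process behavior (XmR) chart calculation.
# The proficiency rates are illustrative stand-ins, not the
# official Ohio third grade reading figures.

def xmr_limits(values):
    """Return (average, lower limit, upper limit) for an XmR chart.

    The natural process limits are the mean plus or minus 2.66 times
    the average moving range -- the standard individuals-chart constant.
    """
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    mean = sum(values) / len(values)
    return mean, mean - 2.66 * mr_bar, mean + 2.66 * mr_bar

# Five years of proficiency rates bouncing around roughly 60%.
rates = [54.9, 63.8, 61.2, 58.5, 62.0]
mean, lower, upper = xmr_limits(rates)

# Every point inside the limits means only common cause variation:
# a stable (if unsatisfactory) system, nothing "special" to chase.
stable = all(lower <= r <= upper for r in rates)
```

Against data like this, a year-over-year "decline" that stays inside the limits is just the system's routine variation, not a signal worth issuing sanctions over.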
0:24:08.1 AS: That's a great explanation. The idea that I get from you is the idea of taking all of the emotion out of it, and let's say how do we use this to improve? And what you're describing here is, and what you've done is you've taken the most recent period of time, now, some people would say, "Oh well, you should look at it over a longer period of time", but what you've described is that the system has changed.
0:24:30.1 JD: That's right.
0:24:31.9 AS: There's been some significant change, and so it may not make sense to look at that prior period, so now you brought it down to the most recent period, what's operating under the same type of system, and what you find is that it's pretty much random variation, which I'm even surprised for 2020, 2021, given COVID and all that, I would have thought that instead of coming down to 53 or so, that that would have come down to 40 or something, just because now maybe it does in the next year, I don't know, but... Okay, that's a great illustration. Now, you had an example to try to show the good use of a chart.
0:25:11.2 JD: Yeah, I have a personal example I can sort of talk through. This one is a little busier, but I think what I'm trying to illustrate is, when I think of continual improvement, I think that is the same thing as what I would call intermediate statistical methods; those two things are equivalent. So what I mean is that in order to bring about improvement, it's very, very powerful to use one of these charts, whether it's a run chart or a process behavior chart, but the point is display your data over time and see how it's performing. And then what you can do is run these systematic tests. This is sort of that theory of knowledge component of Deming's System of Profound Knowledge, specifically the Plan-Do-Study-Act cycle. So you sort of run this structured test to try something within your system. In this case, if you can see the video of this, I'm displaying my weight over the course of three or four months, and you kind of see that over time, it's slowly shifting down, but there's a lot of ups and downs in this data as I'm trying various things. So I have PDSA 1 marked with a vertical dotted line, I have PDSA 2 marked with a vertical dotted line, and PDSA 3 marked with a vertical dotted line.
0:26:46.5 AS: And for the listeners that don't know what PDSA is, it's Plan-Do-Study-Act. It's that cycle.
0:26:53.0 JD: Right, the Plan-Do-Study-Act cycle, these scientific tests where I'm trying something, maybe it's related to my eating habits, or maybe it's related to my workout habits, or maybe it's sort of a combination of those two, but I'm writing those things down in this Plan-Do-Study-Act cycle. And then the second thing, I'll describe for those that are only listening: I'm also annotating when special events have happened that then lead to signals in my data. So for one, towards the end of December, it's probably not hard to guess, I have these holiday cheat days marked, because you see this jump in weight that goes above the sort of upper limit, which says to me, "Oh, wait, something so different has happened in my system that I really need to attend to it." Now, if I had waited until February 21st, the day we're recording this, to look back and see this highest data point in my system over the last three or four months, I probably would not remember what caused that. Because I annotated it as it happened, I have this picture, I have this narrative tied to my data, that allows me to think back and reflect and figure out what happened to make the weight, in this case, jump off the page. Over time, what I'm trying to do is both shrink the limits, so lessen the variation around the average, and, since in this case it's weight and a decrease is good, bring that average down over time. And the idea would be the same no matter what type of data, whether it's those state test scores, whether it's attendance rates, homework completion, whatever it is that you're trying to improve: this same combination of understanding variation, combined with these Plan-Do-Study-Act cycles. This is the method. This is continual improvement, I think, in a nutshell.
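The annotation habit John describes can be mimicked in code in the same spirit. This is a hedged sketch, with made-up weights and notes rather than his actual chart data: record each point with its annotation as it happens, then flag anything outside the natural process limits as a special cause worth investigating.

```python
# Sketch of annotating data as it is recorded, then flagging
# special causes. All weights and notes are made up for illustration.

def xmr_limits(values):
    """Mean plus or minus 2.66 times the average moving range."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mean = sum(values) / len(values)
    spread = 2.66 * sum(moving_ranges) / len(moving_ranges)
    return mean - spread, mean + spread

# (weight in kg, annotation) pairs recorded day by day.
log = [
    (82.0, ""), (81.8, ""), (81.9, "started PDSA 1"),
    (81.6, ""), (81.5, ""), (81.7, ""),
    (84.5, "holiday cheat days"),  # the jump above the upper limit
    (81.4, ""), (81.2, ""),
]

weights = [w for w, _ in log]
lower, upper = xmr_limits(weights)

# Points outside the limits are special causes; the note recorded at
# the time tells you why, even months later.
signals = [(w, note) for w, note in log if not (lower <= w <= upper)]
```

Because the note is attached when the point is logged, the context survives: months later, the out-of-limits point still carries its own explanation.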
0:29:00.2 AS: Fantastic. Alright, I think that illustrates it well, and maybe if you stop sharing that screen and then I'm gonna show you something, John, and for the viewers, they'll be able to see it, but for the listeners, I'll explain it. I'm gonna walk over to my other part of my room here for a second, I'm gonna grab the chart that I fill out each day.
0:29:26.9 JD: Oh my.
0:29:27.8 AS: And this is my chart, and for the listeners out there, I'm holding up a big chart paper, and it's related to my top three goals. I ripped it by... But my goals are related to sales, to health as in yoga and doing exercise every day and my sleep patterns. And I'm tracking how many hours I work on each one, or how many hours I slept, how much time I did on yoga, and I'm not even putting in any kind of limits into it. What I like to do is to track data and just observe, don't try to think about it, don't try to work towards it, just chart it and start observing, and one of my goals is to sleep more. I wanna sleep seven hours, and on average I sleep about six, and I don't have a solution for it, but I know that charting it and observing it and starting to think about it just raises the awareness and gets me thinking, "Okay, I'm far away from my goal, what do I need to do"? So charting is just fantastic, and I think that what you've described is a great way of understanding it.
0:30:34.0 JD: Yeah, I think when you read Deming stuff or listen to him talk, there's often these sort of short phrases that he'll refer to or say, and over time you start to understand what he was saying in just a few words, these powerful statements. When I think of looking at data in a chart over time, Deming said, "Knowledge has temporal spread", four words, knowledge has temporal spread, so what does that mean? So it's not until... Sorry, it's not until you understand or look at data unfolding over time and how it's moving about, how it's varying from point to point, it's not until you see that over the course of 20 or 25 or 30 points that you really start to know how your system is performing, and I think that's really what I was trying to show with the state testing data, with this personal example in a process behavior chart. I think that's the power of the Deming methods when you put all this together.
0:31:43.7 AS: Fantastic. Well, let me try to wrap up a couple of things. We started off with the title of our discussion, which is Data is Meaningless Without Context, and you were asking the question, Are we really improving, or are we just writing fiction here? And I was thinking about how, in a lot of cases, people are sort of massaging the meaning of it. And then another thing that you raised was the idea of, what is improvement? Do we have a definition? What does it mean? And then you referenced The Improvement Guide book, which talked about the three things that are critical for real improvement: first, that it alters something; second, that it produces visible results; and third, that it has a lasting impact. I wrote after that, I was taking notes, and I thought, "Is it repeatable?" was kind of what they were saying, but I think from a business perspective, and maybe from an education perspective, the better word is, is it replicable? Can it be implemented at other places and bring about the same type of improvement? And then finally, I'll wrap up my summary of what you said with your discussion about accountability data versus improvement data, and how it's important not to tie improvement data to incentives; that data is really for how we understand the system and how we think about improving that system through PDSA and other things. Is there anything else you would add to that summary?
0:33:11.5 JD: Yeah, I think that this idea of what's the purpose of the data? Is it for accountability? Or is it for improvement? I think that it sort of gets at one of Deming's 14 points, which is drive out fear, he said, "Where there is fear, there will be wrong figures", and I think that really ties to that idea of Well, what's the purpose of this data? If there's fear and people are thinking that they're gonna be sanctioned in one way or another, then you're not gonna get correct figures, that's just sort of human nature, and I think that's why all this stuff sort of fits together, and you need the sort of full picture about the four components of the system of profound knowledge, the 14 points like drive out fear. And it's bringing all those things together at the leadership level to create the conditions for improvement to actually occur in an organization.
0:34:04.7 AS: Yeah, and I'd imagine as your organization really improves, you'll kinda laugh at all the charts and graphs you used to produce or you used to talk about, and now you're really making use and making data meaningful. So John, I think that's a great discussion. And on behalf of everyone at the Deming Institute, I wanna thank you again for it. And for listeners, remember to go to Deming.org to continue your journey. This is your host Andrew Stotz, and I'll leave you with one of my favorite quotes from Dr. Deming, "People are entitled to joy in work".