I’ve read a dozen articles about Personalized Learning and about bringing Analytics such as artificial intelligence and machine learning into Education – just this week alone. If you’re in education, you’ve probably received an increasing number of solicitations and seen a growing number of conference sessions touting Personalized Learning and Analytics. And you may have sat in discussions about innovation in edtech. And what topic consumes most of these discussions? That’s right – Personalized Learning and Analytics.

I’ve spent a long career in the Analytics field specializing in advanced analytics, learning algorithms and simulation systems. As I’ve been applying my experience to my current home in education technology, I’ve come to the conclusion that, when it comes to data and personalized learning, the proverbial horse is well behind the cart – so much so that the poor horse doesn’t even see the cart and probably long forgot about it.

Cart before the horse

Personalized Learning before Integrated Data

In this analogy, the cart = personalized learning/analytics and the horse = data. My premise, based on this last year’s meetings with school and district leadership, is simple: most schools simply don’t have access to the breadth of data needed to make personalized learning and advanced analytics meaningful. The data is likely there, but it is typically locked away in various software vendors’ systems or data warehouses and not accessible for a truly integrated view.

When applying mathematics to address an issue, the primary and fundamental challenge is to use relevant data that aligns to the nature of what you’re analyzing. The analytics profession, in general, has evolved significantly from exclusively applying statistical models of inference toward mathematical models (often more empirically based) designed to more closely approximate the system they are meant to represent. This is the notion of mathematical fidelity – how closely the math captures the actual dynamics of the system it represents.

But these types of models and algorithms require a wider breadth of data to be meaningful.

It is easy to question how well a regression equation represents an individual student’s achievements (or lack thereof). Obviously it doesn’t. It represents an aggregation of individual students into a mathematical formulation with a goal of minimizing individual, observed error. But real students don’t behave this way. It’s the same reason a mean, or average, doesn’t represent people well – most people are not exactly average.
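The aggregation point above can be sketched in a few lines. This is a minimal illustration with made-up scores, not real student data: the least-squares "model" here is just the mean, which minimizes aggregate error yet describes no individual student well.

```python
# A minimal sketch: the least-squares fit to a single column of data is
# its mean. It minimizes total squared error across students, but can
# sit far from every individual. Scores below are illustrative only.
scores = [55, 60, 62, 95, 98]

mean = sum(scores) / len(scores)          # the "model": one number for everyone
residuals = [s - mean for s in scores]    # how far each student is from the model

print(f"mean = {mean}")                   # 74.0 -- matches no one in the list
print(f"residuals = {residuals}")         # every student is far from "average"
```

Every residual here is at least 12 points – the model is optimal in aggregate and wrong for each person, which is the gap between statistical inference and individual-level fidelity.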

In the past decade, the notion of “big data” also captured our attention. And, one could argue, created a distraction. The idea that we’re able to capture, store and process increasingly obscene amounts of data pushed organizations across industries to focus on simply amassing more data. But over that decade it has also become increasingly clear that having much, much more of the same type of data is less valuable than having a lot more variety of data. This argument fits well into the education sphere. Having 10 times more assessment data on a group of students is nice. But (pardon the run-on sentence, it’s for effect) being able to integrate assessment data with a student’s attendance and conduct over time, their participation in extracurricular activity, their standardized test scores, their performance against certain standards, their depth of knowledge in specific topic areas, maybe a survey of their interests and perhaps the concentration of homework, projects, quizzes or tests at a given time is likely, collectively, significantly more valuable to understanding the trajectory of that student than 10x more assessments.
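The integration idea above can be sketched concretely. This is a hypothetical example, assuming pandas and entirely made-up tables and column names (no real schema is implied): joining assessment data with attendance data yields a view that neither source offers on its own.

```python
# A hypothetical sketch of data variety over data volume: two narrow,
# made-up tables joined on a student identifier. Columns are illustrative.
import pandas as pd

assessments = pd.DataFrame({
    "student_id": [1, 2, 3],
    "avg_score": [88, 71, 93],
})
attendance = pd.DataFrame({
    "student_id": [1, 2, 3],
    "days_absent": [2, 14, 1],
})

# One row per student, with both signals side by side.
integrated = assessments.merge(attendance, on="student_id")
print(integrated)
```

Even in this toy view, a middling score next to 14 absences tells a different story than the score alone – and that contrast is exactly what siloed vendor systems prevent.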

Without that variety of data, those responsible for developing effective models of student behavior are operating with only a partial toolbox. This puts more distance between the math they are developing and the real system they are trying to understand. And the risk of making recommendations based on spurious findings goes up.

Today’s education leadership needs to be sure that all the data flowing through their software and systems is accessible. Then they can turn their focus to what amazing things can happen when that data is harnessed. Then the horse will be pulling the cart, not standing there wondering when the cart will show up.