March 16, 2019

My Soylent-Green Moment: Instructure and the Future of Canvas

This past week ranks as one of the worst weeks of my professional life: I learned that Instructure is going to be using (is already using?) the data collected about students in Canvas for machine learning and algorithms. I'm still completely shocked. If you haven't read the statements by Instructure CEO Dan Goldsmith in this report by Phil Hill, here is the article:
Instructure: Plans to expand beyond Canvas LMS into machine learning and AI

It's a kind of "Soylent Green" moment for me, realizing that a company and a product in which I had put a lot of faith and trust is going to be pursuing an agenda which I cannot endorse and in which I will not participate.

In this blog post, I'll explain my understanding of the situation, and then close with three main concerns that I have. There will be many more posts to come, and I hope those who know more than I do about machine learning in education will chime in and help me further my own education about this grim topic.

The Now: Canvas Data for Classes and Schools

I've not been impressed by the current Instructure data analytics since their approach is based only on surface behaviors, with no attempt to ask students the "why" for those behaviors (for example, a short time spent on a content page: because the student is bored? because they are confused? because it was the wrong page? because they have limited time available? because they got distracted by something else? etc.). Yes, Instructure collects a lot of data from students (all those eyeballs! all those clicks!), but having a lot of data does not make that data meaningful or useful. Speaking for myself, I get no benefit of any kind from the "Analytics" page for each student in my class that the Canvas LMS wants to show me:

I know that some schools also use the data from Canvas on an institutional level, but that's not something I know a lot about, and I also know there are commercial products, like Dropout Detective, that help schools extend their use of the data in Canvas. Just how a school tracks and uses the data it gathers about its students is for each school to decide.

At my school, for example, there is a strong presumption of student privacy when it comes to enrollment and grading data, as you would expect from FERPA. As an instructor, I use my students' ID numbers to report the students' grades (I am required to do that at the end of the semester, and I am urged to report midsemester grades, but not required), and that is all I can do. I cannot find out what other courses a student is enrolled in or has enrolled in, nor can I find out a student's grades or GPA.

And that is how it should be: it is not my business. Yes, that data exists. And yes, in some cases that data might also be helpful to me in working with a student. But just because the data exists and might be helpful does not mean that I can use it. The student starts with a fundamental right to privacy about their enrollment and grades, and it is up to the school to make decisions about how that data is shared beyond the classroom, such as when advisors are able to look at a student's courses and grades overall, or when the data is used in aggregate analysis, like the way the university publicly reports on the aggregate GPA of student athletes.

The Future: Instructure Robo-Tutor in the Sky

So, while my students' performance in their other classes is not my business, Instructure has decided to make it their business. In fact, they have decided to make it the future of their business. Goldsmith is emphatic: the Instructure database is no longer about data reports shared with instructors and with schools. Instead, it is about AI and machine learning. Instructure is going to be using my students' data (my students, your students, all the students) in order to teach its machine to predict what students will do, and then the system will act on those predictions. Quoting Instructure CEO Dan Goldsmith (from Phil's article, and yes, if they do have "the most comprehensive database on the educational experience in the globe," well, that's because we gave them all our data):

Welcome to your worst education nightmare: they are going to lump together all the data across all the schools, predict outcomes, and modify our behavior accordingly... thus sayeth Dan Goldsmith:

In future posts, I'll write in more detail about why this is bound to fail. The hubris here is really alarming; it's as if the executive team at Instructure learned nothing from the costly failures of other edtech machine-learning solutionists during the late, not-great era of the MOOCs. Back in February 2019, Michael Feldstein had speculated that this kind of hype might be subsiding (Is Ed Tech Hype in Remission?), but here we are just a few weeks later, and the hype is strong. Very strong.

Three Concerns

For now, I have three concerns I want to focus on:

1. What exactly did I agree to? To my shame, I put a lot of trust in Instructure, so it is indeed true that I clicked a checkbox somewhere at some point without reading the privacy policy and related legal policies. My students clicked such a checkbox too. At the Instructure website there is a Privacy Policy that relates to personally identifying information (you can access that from the Canvas Dashboard), and once you get to the Instructure website, you can also find an Acceptable Use Policy, but it seems primarily focused on indemnifying Instructure from wrongdoing by users (illegal content, objectionable content, etc.). I'm not a lawyer, but I guess it all hinges on this: "Instructure reserves all rights not granted in the AUP Guidelines." That sounds like they can use all the non-personally-identifying data, as delimited in the separate privacy policy, in any way they want. Is that right?

They do state that they "respect the intellectual property of others and ask that you do too," but it's not clear at all if they regard all the content we create inside the system (assignments submitted, quizzes created and taken, discussion board posts, etc.) as our intellectual property that they should respect and not exploit without our permission. Hopefully someone who knows more than me can figure out how this AUP compares to the kind of terms-of-service that are being used by a company like, say, Coursera, which from the start was committed to machine learning and exploitation of user content in the system.

I don't know what the Coursera terms-of-service looks like now, but back when they first got started, they were very explicit about reusing our content to build their machine-learning system, as I wrote about when I took a first-generation Coursera course back in 2012: Coursera TOS: All your essay are belong to us. See that blog post for language like this: "you grant Coursera and the Participating Institutions a fully transferable, worldwide, perpetual, royalty-free and non-exclusive license to use, distribute, sublicense, reproduce, modify, adapt, publicly perform and publicly display such User Content," etc. I didn't see that kind of language in the Instructure policies, but I'm honestly not sure where to look.

Instructure does have a "Privacy Portal" with a cutesy graphic (visit the page to see the curtain being drawn and clouds of steam arising from behind the shower curtain). I thought the text in bold beside the graphic would be links leading to more information, but they are not links. There's a privacy policy, an acceptable use policy, and a data processing policy linked across the top of the page, but I don't see something labeled "terms of service" like what Coursera had in place. The shower curtain is labeled "privacy shield." Yeah, right.

2. What about opting out? Without an opt-out, Instructure is putting us in an impossible situation, way worse than with TurnItIn, for example. If a student insists that they will not use TurnItIn (as I think every student should do: just say no!), then it's easy to find work-arounds; teachers would just have to read the student's work for themselves without robo-assistance. But if a student says, no, they will not use Canvas because they do not want their data to be exploited for corporate profit, then that puts the teacher in a really awkward position. If you put all your content and course activities and assessments inside Canvas and a student does not want Instructure to use their data, what can the teacher do? It seems to me that Instructure needs, at a minimum, an opt-out for people who do not want their data to be used in this way by our new corporate overlords. Even better: it could all be opt-in, so that instead of assuming students and teachers all want to give their data to Instructure without compensation, you start with the assumption that we do not want to do that, and then Instructure can persuade us to opt in after all.

3. What about FERPA? Right now instructors at my school can put grades in Canvas for institutional reporting purposes (although I actually put mine directly into the SIS instead because the Canvas grading schemes can't accommodate my course design). My school then controls very strictly how that grade data is used, as I explained above. Now, however, it looks like that grade data is something that Instructure is going to be mining, at the course level and at the assignment level, so that its machine-learning engine will track a student's performance both within classes and also from course to course, analyzing their grades and their related data to create the algorithms. To me, that seems like a violation of privacy. In legal terms, perhaps it is not a problem because they are anonymizing the data, but just because it is legal does not make it right. We are apparently giving Instructure extraordinary freedom to take our students' grades and supporting work in order to exploit that not just beyond courses at an institutional level but, as Goldsmith stated (see above), across institutions in ways that will be totally beyond our control. It's like TurnItIn profiting from our students' work (to the tune of 1.7 billion dollars, also in this week's news) without any form of compensation to the students, but way worse. WAY worse. It's not just the students' essays now. It's... everything. Every eyeball. Every click. Teachers and students alike.

Of course, I know Instructure, just like TurnItIn, will hire the lawyers they need to make sure they can get away with this. But how sad is that? I never thought I would write a sentence that says "Instructure, just like TurnItIn" ... and yes, I'm angry about it. Angry at Instructure for squandering money, time, and people's trust on what will turn out to be hype rather than reality (but more on that in a separate post). I'm also angry at myself for having put so much trust in Instructure. When I expressed my anger at the Canvas Community this week, I was told that my opinions violated the Community Guidelines, which require that everything we post there be "uplifting," so that is why I am back blogging here again after blogging for a couple of years at the Community. I have nothing uplifting to say about the new turn Instructure is taking, and I need a blog space where I am free to say that I am angry about this.

Human Learning

But every cloud (including a SaaS cloud) has a silver lining. I am now going to take my casual layperson's knowledge of machine learning and predictive algorithms in education (mostly gleaned from reading about robograding of student writing) and learn more about that. If the machines are learning, we better get to work on our own learning too! And hey, perfect timing: it's Spring Break and I'll be spending two days in airports. Which means two days of reading.

I'm going to start with Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy by Cathy O'Neil.

Then what should I read next? Let me know here or at Twitter (@OnlineCrsLady).

Update: More on Canvas AI, plus a new weekly datamongering round-up. :-)