Tuesday, June 14, 2016


Aggregating Course Evaluations

Most industries passed this point some time ago, but it’s new to me.

I just saw a demo of a program that allows students to do course evaluations on mobile devices.  The data are automatically aggregated, and put into an easy-to-analyze format.  We ran a pilot this Spring; the demo showed what could be done with the data on a large scale.

It got me thinking.

Paper-based course evaluations were misnamed; they were mostly instructor evaluations.  At that level, their merits and demerits are well-rehearsed.  They’re integrated into the promotion and tenure process, for better or worse.  Most of us have a pretty good sense of how to read them.  They also come with compilation sheets showing collegewide averages in various categories.  I’ve written before on how to read them, but the short version is: ignore the squiggle in the middle.  Look for red flags.  And never make them entirely dispositive one way or the other; at most, they’re warning lights.  

But when answers to the same few questions from thousands of students can be sliced and diced quickly and easily, new uses suggest themselves.

For example, with an active dataset, it’s no great challenge to isolate, say, one course from the rest.  In a high-enrollment class with lots of sections taught by many different people -- the English Comps and Intro to Psychs of the world -- you could look at scores across the questions for the entire course to see if there are consistent trouble spots.  If the same red flag pops up in nearly every section of the same class, regardless of who teaches it, then there’s probably a course issue. Administratively, that suggests a couple of things.  First, don’t penalize instructors for a course issue.  Second, target professional development or curricular design resources to those areas.  
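As a rough sketch of what that query might look like -- assuming the system can export a flat table of responses, with hypothetical column names like course, section, question, and score -- it could be a few lines of pandas:

    # A minimal sketch, assuming a hypothetical flat export of responses with
    # columns: course, section, question, score (1-5). All names are illustrative.
    import pandas as pd

    responses = pd.read_csv("evaluations.csv")   # hypothetical export file

    # Mean score on each question, within each section of one high-enrollment course.
    by_section = (
        responses[responses["course"] == "ENG-101"]
        .groupby(["section", "question"])["score"]
        .mean()
        .unstack("question")
    )

    # A candidate "course issue": a question that scores below a threshold in
    # nearly every section, regardless of who teaches it.
    share_flagged = (by_section < 3.0).mean()   # fraction of sections below 3.0, per question
    print(share_flagged[share_flagged > 0.8])

If the same question lights up in most sections no matter who is teaching, that is a course conversation, not a personnel one.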

I could imagine a department building a question like “among the following topics covered in this class, which one do you wish got more time?”  Getting answers from dozens of sections, taught by many different people, could be useful.  A consensus may exist, but from the perspective of any one person, it may be hard to distinguish between “I didn’t do that part well” and “the course doesn’t do that part well.”  Rack up a large enough sample, though, and the effects of any one person should come out in the wash.  A department could find real value in a consistent answer.
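Tallying that kind of question would be even simpler. A sketch, again with a hypothetical column (more_time_topic) holding each student's pick:

    # A minimal sketch: consensus on a hypothetical "which topic needs more time?"
    # item, overall and by section. Column names are illustrative.
    import pandas as pd

    responses = pd.read_csv("evaluations.csv")
    comp1 = responses[responses["course"] == "ENG-101"]

    # Overall consensus across all sections...
    print(comp1["more_time_topic"].value_counts(normalize=True))

    # ...and broken out by section, to check the consensus isn't driven by a few instructors.
    print(
        comp1.groupby("section")["more_time_topic"]
        .value_counts(normalize=True)
        .unstack(fill_value=0)
    )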

The social scientist in me would love to run other, less action-oriented queries.  For example, if we broke out the ratings by gender of instructor, what would it show?  I wouldn’t recommend basing hiring or scheduling decisions on that -- discrimination is discrimination, and aggregates don’t map cleanly onto individuals anyway -- but it might reveal something interesting about the local culture.  We could break them out by full-time/adjunct status, with the usual caveats about perverse incentives and limited resources.  At some point, I’d love to (somehow) track the correlation between perceived quality of the intro course and student performance in the next course in a sequence: for example, did students who gave higher ratings to their Comp 1 instructors do better in Comp 2?  Anecdotes abound, but we could get an actual reality check.
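That last question is the one where the "somehow" does a lot of work, since evaluations are normally anonymous. But if institutional research could link ratings to later grades under appropriate safeguards, the reality check itself would be trivial. A sketch, with hypothetical table and column names:

    # A minimal sketch, assuming (hypothetically) that IR can join a student's
    # Comp 1 evaluation rating to that same student's later Comp 2 grade.
    import pandas as pd

    comp1_ratings = pd.read_csv("comp1_ratings.csv")   # student_id, instructor_rating
    comp2_grades = pd.read_csv("comp2_grades.csv")     # student_id, grade_points

    merged = comp1_ratings.merge(comp2_grades, on="student_id", how="inner")
    print(merged["instructor_rating"].corr(merged["grade_points"]))   # Pearson r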

As with any data, there would have to be procedural and ethical safeguards, as well as some training for the folks looking at it to understand what they’re seeing.  But that doesn’t strike me as a deal-breaker.  If anything, it suggests making the warning lights more accurate.

Wise and worldly readers, if you could slice and dice the data set of student course evaluations, what questions would you ask of it?  What would you want it to reveal?

Great questions!

My research question aligns with what I identify as a potential barrier to these investigations: What characteristics of students are associated with completing course evaluations? For example: Does really liking or disliking the professor make a student more likely to complete the survey? Are students who earn A's more likely to complete the survey? Is higher survey completion associated with registering for the course earlier rather than later?

As these questions might hint at, we have tremendous difficulty getting students to actually do the course evaluations. When we switched from paper to online surveys, the completion rate dropped by an enormous margin. We too have piloted course evaluations that students can complete on their phones, but I've had difficulty with these. Too many students have technological difficulties, somehow don't know their student ID number, or say they'd just rather do it at home (and then skip it altogether). It's also awkward to troubleshoot for students who ask for help logging in or want to verify that their responses were submitted, when I don't know that it's even ethical for me to be standing in the room. All good research relies on good data -- so how do we get enough students, and a fairly representative sample, to complete the course evaluation?

I'll note that I do know of one creative approach. My grad school gave a big incentive to complete the surveys: before the beginning of the next semester, those students who submitted all their evaluations were given access to a database that allowed you to look up last semester's evaluation results by course and instructor. The idea was that by contributing to the data you got the privilege of using it to your advantage in deciding which courses/sections to register for. Has anyone else seen something like this?
The questions I as an instructor want answered bear very little relationship to what administrators want to see. See my blog post for some of the questions I asked this quarter.

Note: paper forms had an 85% return rate, while our new, improved online forms have only a 15% return rate. Going to online forms also resulted in almost no written comments, which were the most useful part of the old paper forms. Going online has resulted in less data and lower-quality data, so don't expect miracles of "big data" just because the data are now numeric and searchable.

My personal suggestion, which you have heard before, is to look at how one prof's students do in the next class in a sequence at your college. That can expose grading that is too easy as well as grading that is too hard, and it might expose whether an entire course, regardless of instructor, is failing at the next level. It is also "easy" because it is all in your computer system.

I like CC Bio Prof's first suggestion. It is terrible that Institutional Research can't correlate grade with evaluation. (Or attendance. With on-line evaluations, a student can still evaluate a course despite not attending class for weeks or even months and maybe skipping the final exam.) There is a lot to learn there.

And I like your idea of evaluating the course rather than the instructor. I already do that myself, but can't see across sections taught by others. But we did get some interesting info by having IR look at course completion correlated with when students registered. You do, however, have to be careful with multivariate data. Don't ask for too much until you can refine the question(s).

Although my own course feedback data isn't as rich as I would like, it (along with more anecdotal feedback from students who drop by to visit after transferring) has been extremely useful in working on the course from year to year. What is frustrating is that I know the computer systems can also look at how my specific students do after transfer, but that is not where IR can spend its limited resources (because that particular task requires the highest level of expertise). We've only seen aggregated data of what happens after transfer, but what we have seen is really interesting.
Reacting to Gas Station Without Pumps:

I get more written feedback now than I used to, but that was a flaw in our paper forms. Students had to use their own paper to write comments so it never looked to them like it was part of the evaluation. Or they just wanted to be done with it.
The response rate experience at my school has been similar to that of CC/HS Bio and Gas Station w/o pumps. We tried an on-line pilot several years ago, and the response rate went down quite a lot compared to paper forms. (Unfortunately I don't recall the numbers.) There continues to be pressure to move to on-line evaluations in terms of lower costs and faster responses, but the poor historical response rate has prevented the change from happening.

I am not sure why an on-line response is necessary to make more Big Data questions possible. Our paper evaluations have been "fill in the bubbles" for at least 15 years. Won't answers on paper evaluations populate a database (eventually...) in the same way that on-line answers do, just more slowly?
I would like to see data across courses concerning how individual students answer particular questions.

So, student A gave no profs 5/5, student B gave all profs 5/5, etc. It's a way of norming scores.
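One hedged sketch of what that norming might look like, assuming responses could be tied to an anonymized student identifier (column names are hypothetical): center each rating on that student's own average, so the chronic 5/5 giver and the student who never awards a 5 land on a comparable scale.

    # A minimal sketch of per-student norming, with illustrative column names:
    # student_id, instructor, score.
    import pandas as pd

    responses = pd.read_csv("evaluations.csv")

    # Subtract each student's own average rating from each of their ratings.
    responses["normed_score"] = (
        responses["score"]
        - responses.groupby("student_id")["score"].transform("mean")
    )

    # Instructor averages on the normed scale.
    print(responses.groupby("instructor")["normed_score"].mean().sort_values())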
HS Lab partner:

In principle, an on-line survey can capture who is doing the evaluation and supply that information to institutional research for detailed analysis. (You know, the way Google reads your mail.) You can never do that with paper evaluations. Doing so would require disclosure, of course, but if the college does it like Google does it, students will never know they approved that research.
I think we already know what parsing the data by gender of the instructor would show: in general, female instructors receive worse evaluations. I don't have citations at hand, but I know I've seen at least half a dozen published papers in the last 3-5 years.

I would second (third? fourth?) the observation that the number of completed evaluations drops dramatically with the shift from paper to on-line. In my current life as an adjunct, the response rate in my intro econ class in the spring of 2015 was 90% (27 out of 30); the shift to on-line occurred in fall 2015, and my spring 2016 response rate was 13% (4 out of 30).
@CCPhysicist, thanks for the clarification of that nuance in the data collection. I hadn't appreciated that aspect, and now I wonder if it was present in our system. It also makes me think more about the level of trust that we use with our still-on-paper system.

It also makes me wonder if the tracking aspect is related to the low response rate. I suspect it isn't, given the strong love of Google email.
We have online evaluations, but have the students fill them out on their phones in class, the way we used to do with paper forms. There hasn't been a noticeable dip in completion rates since we made this transition. The difference is that, probably because they are so used to texting, I have received more written feedback these last couple of years than I did on the old paper forms. (Unfortunately, it is not particularly more useful.)
@HS Lab Partner:

Our system, which is run by a private company, does know who has completed the survey. (I do not know if they tie the responses to the student, which is the critical detail for doing any further analytics.) However, students say they don't get any followup requests to do the survey, so the college doesn't use that info to improve response rates.

We have anecdotes suggesting the falloff is simply due to distractions. A few faculty got great response rates by having students do it in class (if their devices were compatible with the system being used), but they also said they discovered the survey took a long time to complete. We asked for, but did not get, information about the fraction of students who start it and then quit partway through. That could also be a factor.
We use an online system. I'm not sure how much effort is required by the IT people, but by the time it gets to the instructors the data is pretty easy to look at, including question-by-question comparisons across sections of a course, departments, colleges, or the whole university.

As a whole, though, I'm not a fan. There are 40-something questions, many of which are not relevant to many courses. Even the well-meaning students get discouraged answering far too many, largely irrelevant questions. This is particularly true for science majors, who take more courses (a 1-credit-hour lab takes just as long to evaluate as a 3- or 4-credit-hour lecture). As a result, it requires a huge effort to get any sort of meaningful response rate, and I've never been able to match the response rates I used to get with the 16-question-plus-comments paper form.
Stupid question: Why not give the students enough information to make an informed decision? "Here is the paper form. If you put your student ID number on it, we can track feedback across sections and courses, which will help to improve the institution as a whole. However if you fear retaliation by the professor, you do not have to bother."
Can you set up an interactive evaluation, where questions are omitted based on lack of relevance (auto-fill "not relevant")? That would help pare down the total length, hopefully making it more likely to be completed.