Review of Calibrated Peer Review Pilot by Judith Annicchiarico
I’m desperate for tools and strategies to help me get my job done and still live a life. Like many other writing instructors have done over the past twenty years, I’ve tried out scores of promising technology solutions that looked like they might help me do my job more effectively or efficiently, whether that means helping students apply the concepts they’re learning or freeing up time for me to spend on course planning, meetings, and grading stacks of papers. No Luddite, I. From building my own Web sites and coaching lit students to develop their arguments in Daedalus integrated computer classrooms in the 90s, to creating interactive online lectures in the 00s, I have adopted, adapted, and more often than not abandoned more technological strategies for supplementing writing instruction than I can recall offhand.
Regardless of past disappointments, I had high hopes for the programs coming out a few years ago that promised well-managed and statistically proven tools for students to “workshop” (peer review) their writing outside of the classroom, on their own time, freeing up time for instructors to teach their actual subject material, rather than how to write about it. Even if their subject material is “how to write,” instructors could theoretically use their classroom time to instruct, and let students workshop on their own time using one of these programs.
Last semester, I joined a group of faculty across the curriculum testing out various programs to support writing instruction. In this case, since we were earning a small stipend, we were asked to pilot programs that the instructional technology department was interested in. I was hoping to be assigned to pilot SWoRD, based on a talk I had heard by one of the program’s creators, who described the program’s algorithmic formula as a proven method: the designer’s studies had indicated that if students received peer-review input from five other randomly assigned students in the same class, they would be able to revise their work and improve it before submitting it to the instructor (or that feedback could be used in lieu of instructor feedback). I found the word “algorithm” oddly compelling, especially combined with the stats, and I jumped at the chance to pilot the program. I imagined spending those class days I had previously allocated to in-class “workshopping” in more effective ways—perhaps demonstrating revision strategies to the whole class, or meeting with individuals who required more one-on-one help. And of course, students teaching other students is a proven method for generating deeper understanding of concepts, so SWoRD, with its enticing algorithmic appeal, seemed like a win-win technology to me. Unfortunately, when it came time for my working group to pick our pilot projects, someone else got to the SWoRD before me, and I was advised to pick another pilot – “We need you to test CPR” is actually what someone said to me. So Calibrated Peer Review it was.
This program from UCLA was designed to allow instructors of math, engineering, and science to assign writing projects to their students but not actually have to assess them, thus allowing them to have their cake and eat it too, I bitterly thought, stewing in my envy of faculty who spend no more time grading than it takes to hand a stack of Scantrons to a graduate assistant. But I digress.
Essentially, the instructor first uploads the writing assignment and three models (one very good, one satisfactory, and one poor version) of the assigned paper to the program, and then creates a list of assessment questions for students to apply to the models so the program can “calibrate” the students’ assessment skills. Once the students have been trained by the program to effectively assess good, satisfactory, and poor writing, the program lets them loose on other students’ papers, and on their own writing. Their final grade on the project is some combination of how effectively they assessed the model papers, their fellow students’ papers, and their own paper (the instructor is able to set up what percentage of the final grade is allocated to each of these activities). This program was not designed for writing instructors, but it did potentially offer a way to have students “workshop” their papers outside of class, so I jumped in and gave it a try. And it was an interesting experiment.
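For readers who, like me, find the grading mechanics abstract: the final score amounts to a weighted average of the three activities, with the instructor choosing the weights. The sketch below is only my illustration of that idea – the function name, the example weights, and the 0–100 scales are my assumptions, not CPR’s actual code.

```python
# Illustrative only: CPR's internal scoring isn't public, so the weights,
# scales, and function name here are my assumptions, not the program's code.

def final_score(calibration, peer_review, self_assessment,
                weights=(0.4, 0.4, 0.2)):
    """Combine the three CPR activity scores (each assumed to be 0-100)
    using instructor-configured percentage weights."""
    w_cal, w_peer, w_self = weights
    assert abs(w_cal + w_peer + w_self - 1.0) < 1e-9, "weights must sum to 100%"
    return w_cal * calibration + w_peer * peer_review + w_self * self_assessment

# Example: a student who calibrated well but reviewed peers carelessly.
print(final_score(calibration=90, peer_review=60, self_assessment=80))  # 76.0
```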
Of course, like many new technologies, this project that I embarked on as a way of using my time more efficiently was, at first, a gigantic black hole sucking in every minute of my life outside the classroom. I had to create the good, satisfactory, and poor models of the writing projects I was testing (a short essay in one class, and a statement of purpose in another class) for use in the calibration stage of the program. I pieced together parts of previous student writing for this stage, but I had to work for quite a while to make these models represent the sorts of writing goals (and problems) that the projects were addressing, while also changing them in large and small ways so that they did not resemble any one actual student’s paper. This stage took hours – two whole nights that I could have, and should have, spent grading papers.
Once that stage was completed, I created fifteen assessment questions that students could answer with a “yes” or “no” for the most part. “Does the thesis/project statement helpfully signal what the paper goes on to argue?” “Is the phrasing concise and clear?” “Is transition phrasing used effectively to guide the reader through the essay’s various sections?” “Is the terminology of rhetorical analysis used effectively?” Etc. Because I was testing the program on two separate projects in two different classes, I had to create two separate sets of assessment questions. And once I created the questions, I figured out that I should go back into the model papers and insert more/less effective phrasing, rhetorical analysis terms, transitions, etc., so that the students could easily determine whether they should click yes or no for each question. This stage of the set-up probably took another couple full nights of work. Thinking in “yes” and “no” question style does not come naturally to me, apparently.
Once I was ready to launch the program, I checked with my school’s administrator in charge and learned that she still had to load the names and IDs of my students into the program in order for me to use it. Since the semester was already half over at that point, and my students already had quite enough to do, I decided to offer participation in the “online workshop” as an extra-credit option (knowing how students love extra credit, and that I never offer it, I figured I’d get enough participants to make the pilot worthwhile). Thirty students (out of about 150 across five sections) signed up. Not a lot, but a good enough sample size to learn how the program really worked.
Students are required by the program to upload their own completed writing project before a date the instructor sets, after which the calibration text models become available. Then students must complete the calibration stage of assessing the models (and reassessing based on feedback the program gives them) to prepare them for the third stage, in which they apply the assessment questions to other student papers randomly chosen for them by the program. Finally, they apply the assessment questions to their own papers – the ones they submitted before starting the calibration process. After the assessment deadline has passed, students can log back in to read what other students have said about their papers (the answers submitted to the assessment questions, plus any optional comments students chose to add) and learn what score the program gave them for all their efforts.
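In other words, the assignment is a gated sequence of deadlines. The sketch below is just my own way of modeling that sequence from the student’s side – the stage names, deadlines, and gating logic are assumptions about how CPR behaves, not its actual code.

```python
# A rough sketch of the assignment flow as I experienced it from the student
# side; the stage names and gating logic are my own modeling, not CPR's code.
from datetime import datetime

# Stages must be completed in order; each has an instructor-set cutoff.
STAGES = ["submit_own_paper", "calibrate_on_models",
          "review_peer_papers", "review_own_paper"]

def next_open_stage(now: datetime, deadlines: dict, completed: set) -> str:
    """Return the first unfinished stage whose deadline has not passed."""
    for stage in STAGES:
        if stage in completed:
            continue
        if now <= deadlines[stage]:
            return stage
        return "deadline_missed"   # an earlier stage expired; later ones stay locked
    return "view_results"          # everything finished; feedback and score unlock

# Example: a student who has uploaded a paper but not yet calibrated.
deadlines = {s: datetime(2012, 4, 30) for s in STAGES}   # hypothetical cutoffs
print(next_open_stage(datetime(2012, 4, 15), deadlines, {"submit_own_paper"}))
# -> calibrate_on_models
```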
My goal for the students was that using the program – being “calibrated” to correctly assess the models and applying what they learned from this “calibration” – would sharpen their understanding of the writing concepts we were practicing. In fact, the results were mixed. I surveyed the students after the whole experiment was over (and also read their comments in CPR regarding one another’s work) and learned that some students did indeed feel that reviewing and assessing the models refined their understanding of the writing projects we were working on. Just as many students report regarding in-class workshops, some of the CPR pilot students enjoyed reading one another’s papers and learning how other students interpreted the project.
Some students, however, were exasperated by the calibration process, feeling as though they weren’t learning to better understand why one of the thesis statements was effective and two others were less so, etc., but instead being forced by the program to answer “correctly” so that they could move on from the calibration stage. It turned out that the program allowed each student to assess the models only twice. Partly because the program includes explanations for why each question should have been answered “yes” or “no” (did I mention the hours it took me to input those explanations?), after the second try, the program deems students ready to move on to assessing peers’ papers, regardless of their calibration success. This irked some students who didn’t feel that they were offered enough training to effectively assess other students’ papers.
What I realized after learning about this frustration is that I had been taking for granted some of what students need to know about how to look for and appraise the various moves in a piece of writing, and that I needed to be even more descriptive about how to assess those moves. I also learned that students for whom “distance learning” of any sort is not an effective method are not good candidates for a program like CPR. No matter how intricate the descriptions might have been for each assessment question and how to apply it to a piece of writing, some students simply needed to be in the room with me while I talked about the text in question on a projection screen in front of the class, or while I sat in with a workshop group, pointing to phrasing in a student’s paper. Whatever that extra give and take is when questions and answers are being exchanged in real time between people physically present in the same space—that’s what some students can’t function without (which is of course one of the problems with arguments for abandoning brick-and-mortar universities to teach and learn in cyberspace)… But I digress.
What the instructional technology people at my school had not entirely understood was that CPR was never intended to be used for peer review as a stage in the writing process. It was designed as a system in which students are given a writing assignment (including any relevant texts) and then grade one another’s work as the final stage of the process; the instructor for a CPR assignment need never look at the papers. Once I figured out that there was no way to tweak the program to make it work the way I like to use peer review, I decided to try it out for a credit/no-credit “rhetorical précis” assignment (just one short paragraph), which I used this semester. This time I required full participation from all my students, but it was a low-stakes assignment, so some students who encountered technical difficulties or were easily annoyed by unfamiliar technologies simply opted out. Other students were never loaded into the system by the admin person (even though this was supposed to be an automatic process). And the same sorts of frustrations my first group of students had experienced occurred again. More than half of the students felt they learned something from the assignment, but a good number were put off by the program and completed the assignment more confused and frustrated than anything else.
Part of this frustration was due to the program’s need for right and wrong answers applied to the model texts and to students’ own papers. The program can’t generate a score unless there is only one correct way to assess each attribute of the text in question. Either the thesis statement is effective or it isn’t. Either the prose style is mostly concise or it isn’t. Either source material is effectively synthesized into the paragraphs or, well, you get the point. In face-to-face peer-review workshops, students can discuss the shades of gray in each of these questions and work out their differing assessments in conversation. In a program like CPR, they can’t. Hence their frustration. I’m sure I could keep refining the assessment questions to make them more instructive and sensitive to students’ needs, but this right/wrong issue will not go away, and it is a major sticking point for some students.
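To make the constraint concrete: every rubric question has to collapse to a single keyed answer, and a reviewer’s score is simply the match rate against that key. The sketch below is purely illustrative – the question labels, the answer key, and the scoring formula are my guesses at how any tool of this kind has to work, not CPR’s documented internals.

```python
# Purely illustrative: a guess at how binary rubric scoring has to work in a
# tool like CPR. The question labels and answer key below are made up.
ANSWER_KEY = {                        # the instructor's "correct" yes/no judgments
    "thesis_signals_argument": True,
    "prose_is_concise": True,
    "sources_are_synthesized": False,
}

def assessment_score(student_answers: dict) -> float:
    """Percent of yes/no answers that match the instructor's key.

    Every question collapses to a single right answer, which is exactly
    how the shades of gray get flattened out."""
    matches = sum(student_answers[q] == key for q, key in ANSWER_KEY.items())
    return 100 * matches / len(ANSWER_KEY)

# A reviewer who reads "concise" differently from the instructor simply loses points.
print(assessment_score({"thesis_signals_argument": True,
                        "prose_is_concise": False,
                        "sources_are_synthesized": False}))  # ~66.7
```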
I recently spoke on the phone with a physics professor who was planning to use CPR for the first time with his students. He and a colleague had teamed up (thereby cutting the workload in half – very smart) to create an assignment for their large sections of science students, and this professor was hoping I might know how to get the program to accept certain image files that his students needed to complete their assignment. Unfortunately, I had no idea how to do this and could offer only a recommendation of who else he might call. I’m guessing that the physics professor’s use of the program was more successful than mine, since it was designed for applications in science classrooms, after all. When an email arrived recently announcing our campus’s exciting access to a new, upgraded version of Calibrated Peer Review, however, I hit delete. It’s not for me, but I’m sure it can be useful to the physical science instructors for whom it was intended – I hope it is. I’ve also decided not to pursue SWoRD, but rather to learn from the experience of my colleague who piloted it and, algorithm or not, leave this program for others. I will still use the peer review function in Turnitin for online classes, but I’m back to in-class peer-review workshopping in my on-the-ground classes for now.
I am, however, happily jumping on board the Camtasia Relay lecture capture bandwagon, creating instant audio-visual materials out of real-time or asynchronous instruction. This time I’m sure the software will make me a more efficient, effective teacher. I just need to make a couple more demo runs through the program to work out a few things…