How to Create and Use Formative Assessments at Scale
A version of this blog post was originally published on Dr. Stephens-Martinez's blog.
Codio asked if I'd be interested in sharing my expertise on scalable formative assessments through a webinar on July 14th, 1PM EDT (slides). I liked the idea given what the next academic year is likely to look like. It was also an opportunity for me to consolidate my thoughts on the scalable formative assessment techniques I use, reflect on how I currently use them, and think about how I can do better. To help with my thought process, I wrote this blog post. It also collects all the relevant resources and references in one place, as a companion piece for the webinar.
For context, when I'm referring to things like lecture, I'm talking about in-person live lectures because that's where my experience lies. I do write about ideas for remote teaching since that is the world many are likely facing.
For full disclosure, I am receiving an honorarium from Codio for doing this webinar. However, I do not use Codio and have no plans to use it in the future. Codio also stated in their inquiry email that they were not interested in promoting Codio during the webinar; their motivation was to provide professional development to the broader community.
Let's start with why you should even listen to my opinion. Since you are on my blog, I'm going to assume you already know some of my background. My understanding of this webinar's main topic stems from my dissertation work and applying what I learned to my teaching since I graduated.
A large part of my dissertation, "Serving CS Formative Feedback on Assessments Using Simple and Practical Teacher-Bootstrapped Error Models," focused on analyzing students' wrong answers to code-tracing, constructed-response, answer-until-correct questions. That's a lot of adjectives, so let's break them down. Code-tracing is when a student reads through code to predict its output. Constructed-response questions are where the student creates (i.e., constructs) the answer instead of choosing it from a set of options (as in multiple-choice questions). Answer-until-correct questions are where the student keeps re-answering the same, identical question until they get it correct; they cannot continue to the next question until they answer the current one correctly.
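To make the answer-until-correct format concrete, here is a minimal Python sketch, assuming each student's attempts on one question are available as a list of strings. The function name and the whitespace normalization are illustrative assumptions, not part of any real system. It records every wrong answer submitted along the way, which is the raw material for the kind of analysis described next.

```python
from typing import Iterable, List, Tuple


def answer_until_correct(correct: str, attempts: Iterable[str]) -> Tuple[int, List[str]]:
    """Replay one student's attempts on an answer-until-correct question.

    Returns the number of attempts needed to reach the correct answer and the
    wrong answers submitted before it. Leading/trailing whitespace is ignored
    (an assumption for this sketch). Raises ValueError if the attempts run out
    first, since the student cannot move on until they answer correctly.
    """
    wrong: List[str] = []
    for count, attempt in enumerate(attempts, start=1):
        if attempt.strip() == correct.strip():
            return count, wrong
        wrong.append(attempt)
    raise ValueError("student never reached the correct answer")
```

For example, `answer_until_correct("5", ["0", "None", "5"])` returns `(3, ["0", "None"])`.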
I collected all of the wrong answers students generated in an introductory CS class. I found that while the number of unique answers is large, the number of popular answers is not. I then inspected the most popular answers and tagged them using emergent coding. The tags represent our best guess at how the student arrived at that wrong answer and are in Appendix A of my dissertation (page 81, or page 93 of the pdf). An example tag is a student believing that when a variable is not defined, rather than causing an error, the variable's value is the closest approximation to an "empty" value for its type: 0 for ints or an empty string for strings.
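The filtering step, separating the few popular wrong answers from the long tail of unique ones, can be sketched in Python with `collections.Counter`. This assumes the wrong answers for one question are collected as strings; the `min_share` cutoff is an arbitrary illustrative value, not the threshold used in the dissertation.

```python
from collections import Counter
from typing import Dict, List


def popular_wrong_answers(wrong_answers: List[str], min_share: float = 0.05) -> Dict[str, int]:
    """Return wrong answers that make up at least `min_share` of all wrong answers.

    Rare answers are dropped so only the popular ones remain to be inspected
    and tagged by hand. Results are ordered from most to least common.
    """
    counts = Counter(wrong_answers)
    total = sum(counts.values())
    return {
        answer: n
        for answer, n in counts.most_common()
        if n / total >= min_share
    }
```

For example, if one question's wrong answers were six `"0"`s, three empty strings, and four one-off answers, there would be six unique wrong answers, but only `"0"` and the empty string would pass a 20% cutoff. Those are the answers worth tagging.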
What is a Formative Assessment?
I believe one of the best definitions of formative feedback, of which formative assessments are a part, is from Valerie Shute's 2008 review: "information communicated to the learner that is intended to modify his or her thinking or behavior to improve learning." An example of formative feedback that is not an assessment is a teacher asking a student how they study and giving them advice on how to improve their study skills.
However, most of the time, when we think of formative feedback, we think of assessments, which is why I focus on them in this webinar. It's easiest to define formative assessments by comparing them to summative assessments. Broadly speaking, formative assessments are given while learning is happening and are low stakes. Summative assessments are given when learning is done and are higher stakes because they usually contribute significantly to the class grade. I like to think of these two as the extremes of a continuum.
Given our current times, I think it's important to understand ways to assess how students are doing continuously and at scale, rather than only sampling two or three times throughout the term (a.k.a. exams/summative assessments). So this webinar focuses on formative assessments. I will also explain how they can be used as formative feedback for students and how I, as a teacher, can use the information to improve the learning happening in class.
How I Use Formative Assessments
This section goes into detail on two of my primary formative assessments. I will explain how I create these assessments, my class policies on them, and how I use their data.
My reading quizzes are based on the reading I assign to help students prepare for lecture. They are administered in our school's learning management system (LMS). A lecture's reading quiz is due when lecture starts.
I create the quizzes from question pools that I split by lecture and concept. I reuse these questions every semester because they are only formative. Since the stakes are low, I'm not concerned about cheating (if students cheat to get these right, they are really cheating themselves). I release these quizzes whenever I have them ready, and they are always ready by the Friday before they are due, so students have at least the weekend to work on them. They usually have between 5 and 20 questions, depending on what I am covering. I also add 2 questions at the front that I call "blast from the past": questions randomly chosen from pools the students have already done. Sometimes the question is one the students did very poorly on. This gives students spaced practice, which makes their learning more durable.
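Most LMSs provide question pools natively, so the sketch below only illustrates the selection logic in Python. It assumes a hypothetical `pools` dictionary mapping each lecture to its question IDs; the two "blast from the past" questions are sampled from lectures the students have already completed and placed at the front.

```python
import random
from typing import Dict, List, Optional


def build_reading_quiz(
    pools: Dict[str, List[str]],
    current_lecture: str,
    past_lectures: List[str],
    n_review: int = 2,
    seed: Optional[int] = None,
) -> List[str]:
    """Assemble one reading quiz: review questions first, then this lecture's questions.

    `pools` maps a lecture name to its question IDs (a hypothetical structure).
    Review questions are drawn without replacement from earlier lectures' pools.
    """
    rng = random.Random(seed)  # seedable so a quiz can be regenerated exactly
    past_questions = [q for lecture in past_lectures for q in pools[lecture]]
    review = rng.sample(past_questions, k=min(n_review, len(past_questions)))
    return review + list(pools[current_lecture])
```

Weighting the review sample toward questions students did poorly on would be a natural extension; it is left out here to keep the sketch small.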
In terms of policies, quizzes are due right before lecture, but most students finish them the night before. Students can take each quiz up to 3 times. I limit attempts because most of the questions are multiple-choice, and I don't want students to brute-force guess the answer. After each submission, students are told which questions they got correct and incorrect, but not which option was the correct answer for the ones they got wrong. My goal is to get them to reflect on why they got the question wrong rather than spoil the answer. All of this is to encourage them to use recall when answering the reading quiz. For a good discussion on recall, see the Cult of Pedagogy's podcast episode on 4 research-based strategies for learning. These quizzes make up 3% of the students' overall grade, and they only need 75% of the possible points to earn full credit. The intention of making them part of the grade is to get students to do them. However, lately I've been reconsidering whether any part of the grade should be something that doesn't directly measure mastery of the material. For more, I recommend reading KQED's "How Teachers Are Changing Grading Practices With an Eye on Equity."
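The attempt limit and the "correct/incorrect only" feedback policy can be sketched as a small Python class. `ReadingQuizAttempts` is a hypothetical helper for illustration; in practice these settings live in the LMS quiz configuration.

```python
from typing import Dict

MAX_ATTEMPTS = 3  # the post's limit on reading-quiz submissions


class ReadingQuizAttempts:
    """Tracks one student's submissions to one reading quiz (hypothetical helper)."""

    def __init__(self, answer_key: Dict[str, str]):
        self.answer_key = answer_key
        self.attempts_used = 0

    def submit(self, responses: Dict[str, str]) -> Dict[str, bool]:
        """Grade a submission, reporting only correct/incorrect per question.

        The correct option is never included in the result, so a student who
        misses a question has to work out why before retrying.
        """
        if self.attempts_used >= MAX_ATTEMPTS:
            raise RuntimeError("no attempts remaining")
        self.attempts_used += 1
        # Unanswered questions (missing keys) count as incorrect.
        return {q: responses.get(q) == key for q, key in self.answer_key.items()}
```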
I check the results of the reading quizzes right before lecture, focusing on the questions students performed most poorly on. Remember, students had up to three chances to get each question right. So if many still had it wrong when lecture started, that's a clear indicator of what students struggled with the most. Based on the quiz results, I spend more time in lecture on what students struggled with and less time on what they did well on.
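The pre-lecture check amounts to ranking questions by how many students were still wrong after their final attempt. A minimal Python sketch, assuming results have been exported from the LMS into a hypothetical `final_results` dictionary of student to per-question correctness:

```python
from typing import Dict, List, Tuple


def hardest_questions(
    final_results: Dict[str, Dict[str, bool]], top_n: int = 3
) -> List[Tuple[str, float]]:
    """Rank questions by the share of students still wrong after their last attempt.

    `final_results` maps student -> {question: correct on final attempt}.
    Returns (question, fraction_wrong) pairs, worst first.
    """
    wrong: Dict[str, int] = {}
    seen: Dict[str, int] = {}
    for answers in final_results.values():
        for question, correct in answers.items():
            seen[question] = seen.get(question, 0) + 1
            wrong[question] = wrong.get(question, 0) + (0 if correct else 1)
    ranked = sorted(
        ((q, wrong[q] / seen[q]) for q in seen),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:top_n]
```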
I use peer instruction in my live lecture right after I've covered a topic. These questions check whether students understood what I just covered. Generally, the peer instruction process has students answer one or more questions individually, discuss their answers with a neighbor, and then answer the same questions again. I use two identical Google forms with the same 1 to 4 questions. The only difference is that in the second form I may make some of the questions optional.
My specific process is as follows.
- Round 1: Open the first Google form to accept answers and share the bit.ly link to the form with the class.
- When enough students have answered, I privately check how well students did on each question.
  - If > 75% of students are correct: don't require the question in the second round.
  - If < 75% and > 35% are correct: tell them to discuss the question.
  - If < 35% are correct: give them a hint, then tell them to discuss the question.
- Round 2: After a minute or two, close the first form, open the second Google form, and share its bit.ly link with the class.
- After enough students have answered, close the form and go over all the results and answers.
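The between-rounds decision is a simple threshold rule, sketched here in Python. The function names are hypothetical, and the post does not say what happens at exactly 75% or 35%; in this sketch both boundary values fall into the "discuss" band.

```python
from typing import Sequence


def percent_correct(responses: Sequence[str], answer: str) -> float:
    """Percentage of round-1 responses that match the correct answer."""
    if not responses:
        return 0.0
    return 100 * sum(r == answer for r in responses) / len(responses)


def round_two_plan(pct: float, high: float = 75.0, low: float = 35.0) -> str:
    """Decide what to do with one question between peer-instruction rounds."""
    if pct > high:
        return "optional in round 2"  # little left to discuss
    if pct >= low:
        return "discuss"
    return "hint, then discuss"  # too few correct for discussion alone
```

For example, if 3 of 4 responses are correct, `percent_correct` gives `75.0`, and `round_two_plan(75.0)` returns `"discuss"`.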
Multiple papers show the benefits of using peer instruction in computer science, so I will not list them here. Instead, I will explain the rationale for my own steps. First, I open and close the forms during lecture so that students need to come to lecture to earn the participation points; I believe students greatly benefit from discussing with their peers. I usually have someone from my teaching staff in class to handle the form logistics for me. Second, I don't have students discuss a question when more than 75% of the class got it right, because odds are good there will be little to discuss in the groups. I'd rather they spend more time on the questions fewer got right. Finally, I do not show students the results of the first round until we are done with the second round, because I do not want students to simply default to the most popular answer without arriving at their own conclusions.
To continue the thread that there are multiple ways to do peer instruction: yes, you can do something different from what I do. The critical elements of peer instruction are:
- Students answer an assessment to find out whether they actually understand the concept.
- Students discuss when it's clear the class does not understand the concept at a reasonable level, so discussion would be worthwhile.
- Students answer an assessment again to confirm that the discussion benefited their learning and they now understand the concept.
Occasionally, students still do poorly on a question even after round 2. This is a sign that students are really struggling with the concept. At that point, I change the lecture plan to focus on that issue. This usually means I slow the lecture down and go through every incremental step in detail with frequent "thumb checks." A thumb check is where I ask students to show me a thumbs up if they understand so far, a thumbs down if they are still confused, or any thumb position in between if they are somewhere in the middle.
In terms of class policy, peer instruction counts toward participation points. Participation is worth 3% of the overall grade, and students need to earn 75% of the possible points to get full credit, so the same caveats from the reading quizzes apply here. I collect students' emails in the forms so I can track who participated. It does not matter whether they got the questions right, only that they submitted at least one of the forms.
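A minimal Python sketch of this bookkeeping, assuming each form's responses are exported as a list of emails. Normalizing emails to lowercase is an assumption, as is the proportional credit below the 75% threshold; the post only says that 75% earns full credit.

```python
from typing import Iterable, Set


def participants(round1_emails: Iterable[str], round2_emails: Iterable[str]) -> Set[str]:
    """Students who submitted at least one of the two forms; correctness is ignored."""
    normalize = lambda email: email.strip().lower()
    return {normalize(e) for e in round1_emails} | {normalize(e) for e in round2_emails}


def participation_credit(
    sessions_attended: int,
    total_sessions: int,
    weight: float = 3.0,
    threshold: float = 0.75,
) -> float:
    """Percentage points of the course grade earned from participation.

    Full `weight` once the student reaches `threshold` of the possible points;
    below that, credit scales proportionally (an assumption for this sketch).
    """
    if total_sessions == 0:
        return weight
    fraction = sessions_attended / total_sessions
    return weight * min(1.0, fraction / threshold)
```

For example, a student who participated in 15 of 20 sessions reaches the 75% threshold and gets the full 3 points.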
What about online?
Given that I'm describing peer instruction within the context of a live lecture, that raises the question: what if I'm teaching online? If you are planning a synchronous lecture, you can do what I explained above and put students into breakout rooms for discussion. I personally do not think requiring synchronous lectures is a good idea, though, because of equity issues. If students are at home, they will not necessarily have a quiet place and good internet to attend lectures at a specific time of the week.
If you plan a mix of synchronous and asynchronous content, here is my idea with the caveat that I have not tried this. Have the asynchronous content be things that happen every semester, such as the lecture material you would give. Chunk the material up per topic and have students do round 1 of the peer instruction asynchronously as they consume that material.
Then have students do the discussion and round 2 in the synchronous part of class, with the discussion in breakout rooms. A colleague pointed out that breakout rooms take time. So if you are concerned about running multiple breakout sessions during a class, put all of the peer instruction discussions at the beginning of lecture and consolidate the round 2 questions into a single form. This way you only have one breakout session during class, with the added benefit that students also get to socialize at the beginning of class with a clear topic to discuss.
This is only an idea, though, and I have not tried it. So your mileage may vary, and experiments are allowed to fail. If you do the above in your own class, I would love to know how it went. So sending me an email or adding a comment below would be great!
How to Write Formative Assessments
I will start by saying that this will be biased towards the kinds of questions I write. Still, hopefully, the process itself will be generic enough that you'll be able to adapt it to your own needs. I'm also focusing this on questions that can be autograded. The first section will discuss different kinds of questions. The second will walk you through how I write questions and common things to consider.
Kinds of Questions
In two of The CS-Ed Podcast episodes (a podcast I host), we discuss different kinds of questions. One of the episodes is with Dan Garcia. The episode's topic is writing exams. We discussed his "five finger rule" when designing exams and a bunch of different kinds of exam questions. The other episode was a Q/A session with Colleen Lewis. One discussion centered on her AP (Advanced Placement) CS Principles work on non-coding strategies for assessing students.
Here is the long list of different kinds of questions that came up in both episodes.
- Simple fact check
- Predict the output of code
- Given code and an output, what was the input?
- What does this code do?
- Given code, could the function ever do X?
- Comment on code
- Find a case that reveals the bug
- Find a case that doesn't reveal the bug
- What class of inputs doesn't trigger the bug?
- What class of inputs does trigger the bug?
- Fix the bug
- Compare code
- What is similar/different about the code?
- Do code A and B do the same thing?
- Compare code between two languages
- Modify existing code
- Parsons problems - where students are given a list of code statements and must arrange them in the proper order
- Compare problem prompts
- How are these two problems similar/different in how they could be solved?
- Write a function that does X
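Parsons problems are among the easier question types on this list to autograde. A minimal Python sketch, assuming a single accepted ordering (real Parsons problems may accept several valid orderings, which this simplification does not handle). `grade_parsons` is a hypothetical name.

```python
from typing import List, Sequence


def grade_parsons(student_order: Sequence[str], solution: Sequence[str]) -> List[int]:
    """Return the positions where the student's arrangement differs from the solution.

    An empty list means the arrangement is correct. Raises ValueError if the
    student added, dropped, or altered a line instead of only reordering.
    """
    if sorted(student_order) != sorted(solution):
        raise ValueError("student must use exactly the given lines")
    return [i for i, (got, want) in enumerate(zip(student_order, solution)) if got != want]
```

Returning the mismatched positions, rather than a single pass/fail flag, gives a formative hint about where the arrangement went wrong without revealing the full answer.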
I will be the first to admit I don't use all of these. In fact, now that I've created this list, I want to expand my assessment writing skills. In general, I use predict the output (no surprise there given my dissertation), simple fact check, write a function, what does this code do, and debugging. For formative assessment, I mainly use predict the output, simple fact check, and what does this code do. I've used a few question types outside of this list, but not consistently.
My Process for Writing Questions
I start writing questions with the end in mind. My goal is to learn what my students do not understand. So my first step is to think of common ways students misunderstand or get a concept wrong. Then my intention with the question is that if a student has this misunderstanding, they will get the question wrong. If the question is multiple-choice, an added bonus is students with the same misunderstanding will choose the same wrong answer. This clustering lets me target what I need to emphasize in lecture to fix that misunderstanding.
For example, my students struggle with pointers and nested lists. Common ways students misunderstand nested lists are:
- When two lists have a nested element pointing to the same list, they think there are two copies of that list.
- When a nested list is added, it gets "unrolled" into the outer list, so there is no nested list.
So if I had a code-tracing question like the one below, I'd offer the following options (option 1 is the correct answer):
lstA = [0, 1]
lstB = [3, lstA]
lstA.append(3)
What is lstB?
- [3, [0, 1, 3]]
- [3, [0, 1]]
- [3, 0, 1]
- [3, 0, 1, 3]
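Each option corresponds to a different mental model of Python lists. The sketch below is illustrative code, not the original quiz code. It shows which code would actually produce options 1 to 3: a shared reference, an explicit copy, and concatenation. Option 4 has no equally simple Python counterpart, which fits it being a stretch as a distractor.

```python
# Correct model: lstB's second element refers to the same list object as lstA.
lstA = [0, 1]
lstB = [3, lstA]
lstA.append(3)
print(lstB)  # [3, [0, 1, 3]]  (option 1)

# Misconception 1: "lstB holds its own copy of lstA". An explicit copy behaves that way.
lstA = [0, 1]
lstB_copy = [3, list(lstA)]
lstA.append(3)
print(lstB_copy)  # [3, [0, 1]]  (option 2)

# Misconception 2: "the nested list is unrolled into lstB". Concatenation behaves that way.
lstA = [0, 1]
lstB_flat = [3] + lstA
lstA.append(3)
print(lstB_flat)  # [3, 0, 1]  (option 3)
```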
Option 2 is wrong because it taps into the first misunderstanding. Option 3 taps into the second misunderstanding I mentioned. Option 4 is a bit of a stretch, because it would require the student to understand pointers but still hold the unrolling misunderstanding. After using this question for the first time, I'd check how many students chose option 4. If not many did, it's not worth keeping because it doesn't give me much useful information.
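That check is easy to automate once you have each student's chosen option. A minimal Python sketch, assuming options are numbered from 1; the 5% cutoff is an arbitrary illustrative value, and `rarely_chosen_distractors` is a hypothetical name. The same counts for the other wrong options show which misconception is more common in your class.

```python
from collections import Counter
from typing import List


def rarely_chosen_distractors(
    choices: List[int], correct: int, n_options: int, min_share: float = 0.05
) -> List[int]:
    """Return wrong options chosen by fewer than `min_share` of students.

    Such distractors reveal little about student thinking and are candidates
    for removal the next time the question is used.
    """
    if not choices:
        return []
    counts = Counter(choices)
    total = len(choices)
    return [
        option
        for option in range(1, n_options + 1)
        if option != correct and counts[option] / total < min_share
    ]
```

For example, if 100 students chose options 1, 2, 3, and 4 a total of 50, 30, 18, and 2 times, only option 4 would be flagged.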
Things to consider:
- A common mistake when writing multiple-choice questions is thinking the question must have a certain number of wrong answers. Don't worry about that. A wrong answer that is obviously wrong is not helpful to you or your students. As the teacher, you gain very little information, except about the students who are wildly guessing and happened to pick that wrong answer. As for the students, if they are wildly guessing, answering this question is probably not the most helpful thing they could be doing with their time.
- Pay attention to the stem of your question:
  - Have the stem highlight what topic the question is checking the student on (e.g., "Which statements are true? [various statements about colors]" becomes "Which statements are true about colors?").
  - Fill-in-the-blank questions increase cognitive load without adding any benefit (e.g., "Cars and ____ are both motor vehicles." becomes "Besides cars, what are other kinds of motor vehicles?").
- Thwart students' tactics for guessing:
  - Pick the longest or most complex answer → All choices are about the same length and use similar prose
  - Avoid choices with the words "always" or "never" → Do not use these words
  - If two choices are opposites, one is probably the answer → Include two wrong answers that are opposites
  - Look for keywords related to the topic → Use keywords in the wrong answers
  - True/False questions are more often True → Use True/False equally or avoid them altogether
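A few of these guessing tactics can be caught mechanically before a quiz goes out. The Python sketch below is a hypothetical linter covering three of them: a choice much longer than the others, the words "always" or "never", and an unbalanced True/False answer key. The 1.5x length ratio is an arbitrary illustrative cutoff, and the other tactics still need a human read.

```python
import re
from typing import List


def lint_choices(choices: List[str], max_length_ratio: float = 1.5) -> List[str]:
    """Flag answer choices that hand test-wise students an easy guess."""
    if not choices:
        return []
    warnings: List[str] = []
    lengths = [len(c) for c in choices]
    shortest = min(lengths)
    for choice, length in zip(choices, lengths):
        if shortest and length / shortest > max_length_ratio:
            warnings.append(f"much longer than the shortest choice: {choice!r}")
        if re.search(r"\b(always|never)\b", choice, flags=re.IGNORECASE):
            warnings.append(f"uses 'always' or 'never': {choice!r}")
    return warnings


def true_share(true_false_keys: List[bool]) -> float:
    """Fraction of True/False questions whose answer is True; aim for about 0.5."""
    return sum(true_false_keys) / len(true_false_keys)
```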
So that's it! I hope that was helpful. I actually found writing this helped me think through how I can improve my own formative assessments. As Colleen and I discussed in our episode of the CS-Ed Podcast, you're always improving your teaching. And this blog post has definitely shown me that I have areas for improvement.
Dr. Kristin Stephens-Martinez
Dr. Kristin Stephens-Martinez, Assistant Professor of the Practice at Duke University’s Computer Science Department and host of the CS-Ed Podcast.