Rethinking Assessment in the Age of AI

Maura Lyons

31 Mar 2026

•

3 min read

Generative AI is reshaping not only how students learn, but also how educators assess that learning. In our recent webinar, Codio’s Principal Research Scientist, Mohit Chandaraana, explored a critical question: What does mastery look like when AI can produce correct answers instantly? The answer lies in moving beyond the final submission.

The Problem with Traditional Assessments

The traditional assessment model is all about assigning a task and having learners complete the work that leads to the final product, which is then submitted for grading.

This process is still critical to how learning happens, but in the age of AI, the final product is no longer a reliable signal of mastery, as Generative AI can produce work that looks correct instantly. Learners can now bypass the traditional struggle and iteration that is necessary for learning.

The Evidence-First Shift:

With AI in the education landscape, educators are shifting from evaluating the learner’s product to evaluating the learner’s process. While correctness is still pivotal for assessments, we’re no longer just asking if they got the answer right. We also need to assess how the learner arrived at that answer, which involves evaluating the trail AND the artifact.

How do we do this?

Firstly, we can change what we ask. Instead of asking a learner to write code from scratch, we can move towards assessing their abilities to critique, evaluate, and inquire.

Secondly, we can change what we evaluate. By focusing on the learner process, instructors can gain a deeper understanding of student mastery.

Watch the full webinar

What is the education community currently doing?

We’ve heard from current educators about the various approaches they're experimenting with to either discourage AI use or better understand mastery, like:

Poison pill problem: to discourage AI use in the classroom, instructors embed secret LLM instructions to “poison” the answers given by an LLM
Pen and paper: for foundational skills, pen and paper assessments can provide critical insight into students' skills.
Interview-based assessments: educators prepare learners for industry-style interviews, where instructors and students go through every line of code together, discussing what the learner’s code does and how they reached their conclusions, or by randomizing interviews.

While these approaches can be successful, they are often time-intensive, lack immediate feedback, and are not built for scale.

Assessments from the research:

The ability to autograde assessments, provide learners with rich, immediate feedback, and scale are all critical for many educators. Luckily, research focused on “anti-shortcut” assessments for the AI era has yielded two types of assessments that factor those needs in: refute questions and probeable problems.

Refute Problems:

In a refute problem, students are given a programming task and a plausible solution. They are then asked to decide whether the solution is correct. If it is, they have to find a valid example, and if it's not, they need to find a counterexample.

These problems measure code reading, comprehension, boundary reasoning, falsification, and explanation skills. A refute question can easily be set up in Codio.

However, LLMs' capabilities are increasing at an unprecedented pace rendering refute problems no longer fully AI-resistant. The latest AI models are evolving to think and reason, and can solve complex problems, but as models evolve, so do researchers and educators as they develop different assessment types.

For example, the same group that researched refute problems has worked on implementing a new feature in the assessments: concrete, cluttered specifications. These specifications entail taking the core learning objectives and changing them into something more of a story, including elements that might not be relevant. A human will know to ignore the cluttered specifications, but an LLM won’t.

Probeable Problems:

LLMs are great at providing correct solutions when given the entire specification. But if the spec is incomplete, an LLM will struggle. This line of thought is the basis of probeable problems.

In these problems, we give learners an intentionally incomplete specification, and they must complete the spec before they can begin working on the assessment. This requires them to probe behavior before coding, infer edge cases, and iterate and implement. The problems measure experimentation, hypothesis-building, and judgment under ambiguity, all of which are crucial skills in a professional setting.

Learners go through an iterative journey with probeable problems: they must continually engage and think critically to determine the exact solution.

These problems can easily be implemented in Codio. Features such as hiding solutions from students and allowing learners to track their test cases and continually test their code enable the smooth delivery of these assessments, which are as easy to set up as writing regular coding tests.

Changing what and how we observe

While we can implement new assessments, concerns about AI agents and web-based LLMs remain valid, making it critical to evaluate and understand learners’ approaches and processes.

In Codio, Behavior Insights does just that: it tracks non-comparison-based learner process metrics and flags findings based on severity and instructor-selected configurations.

Based on a student’s behavior in Codio, Behavior Insights can tell an instructor their:

Time spent
Rate of edits
Coding vs debugging time
History of external pastes
Insertion vs deletions

Instructors can even play the student’s code back to see exactly where students struggled, whether they pasted anything, and how the pasted code looks, flagging LLM use.

Behavior Insights give educators a deeper understanding of learning performance and process, which is necessary in the AI era.

Codio’s evidence-based approach to AI tools:

Behavior insights isn’t Codio’s only evidence-based feature. We have always taken a measured and evidence-driven approach to developing new tools.

For example, our AI helpbot, Codio Coach, is research-based, and our team has continually investigated how learners use the tool. We believe that educators know their students’ needs best and should have control over the tools they use, so Coach is fully customizable and extensible. By helping explain errors, summarizing the assignment, and providing hints, Coach promotes independent learning, unlocks learner potential, and maintains academic integrity.

Speak with an educator on our team

{Ready} to transform your tech learning experience?

Watch Demo Try for Free