
"One evening, my partner Boyan Li sat at the kitchen table marking student submissions for a coding course he was teaching as part of his PhD at Harvard Medical School in Boston, Massachusetts. The assignment required students to implement a computational-biology algorithm on a given data set. Each submission demanded more than a quick check. He ran the code, examined the output and traced the logic line by line. Some submissions were clearly correct; others were clearly wrong. But many fell into a grey zone: they were partly right, but uneven in their execution or reasoning. These were the hardest to assess, and the most time-consuming."
"Assessing coding assignments involves deciding what counts as understanding, what counts as error and how much variation is acceptable. This resonated with my own research on student learning and development, which views educational activities as inherently relational: even something as seemingly mechanical as marking becomes a dialogue between the examiner and the learner. Seeing this interplay of technical skill and human judgement led me to ask: can generative artificial intelligence (genAI) assist in assessing without erasing the interpretative work that makes it meaningful?"
"Coding assignments seem to be especially well-suited to AI tools. Unlike essays, computer code follows clear structures and strict rules, making it easier to evaluate. My partner tested this idea using OpenAI's ChatGPT. He gave it the assignment prompt alongside the reference solution and asked it to assess a student's code for accuracy. In practice, ChatGPT mainly compared the student's code with the reference solution and struggled to recognize valid alternative approaches."
"It often focused on minor issues, such as lower compu"
A coding assignment marking process involves running student code, examining outputs, and tracing logic line by line. Some submissions are clearly correct or incorrect, while many are partially correct and uneven, creating a difficult grey zone for evaluation. Assessment requires deciding what counts as understanding, what counts as error, and how much variation is acceptable, turning technical checking into interpretive judgement. This judgement functions as a dialogue between examiner and learner. Generative AI is considered a potential support for assessment, especially for coding tasks with clear structures and rules. An experiment using ChatGPT compared student code to a reference solution and often failed to recognize valid alternative approaches, instead emphasizing minor issues.
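The experiment described above (assignment prompt plus reference solution given to ChatGPT to assess a student's code) could be sketched as prompt assembly in Python. This is a minimal sketch, not the author's actual setup: the function name `build_marking_messages`, the rubric wording, and the example inputs are all illustrative assumptions. The system instruction deliberately asks the model not to penalize valid alternative approaches, which is exactly where the article reports ChatGPT struggled.

```python
def build_marking_messages(assignment: str, reference: str, student_code: str) -> list[dict]:
    """Assemble chat messages asking a model to assess a submission for accuracy.

    Hypothetical sketch of the workflow described in the text: bundle the
    assignment prompt, the reference solution and a student's submission
    into a single grading request.
    """
    system = (
        "You are marking a computational-biology coding assignment. "
        "Judge correctness of approach, not similarity to the reference: "
        "a valid alternative algorithm should not be penalized."
    )
    user = (
        f"Assignment:\n{assignment}\n\n"
        f"Reference solution:\n{reference}\n\n"
        f"Student submission:\n{student_code}\n\n"
        "Assess the submission's accuracy and explain your reasoning."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]


# Illustrative inputs only (not from the course in the article).
messages = build_marking_messages(
    assignment="Implement k-mer counting for a DNA sequence.",
    reference="def count_kmers(seq, k): ...",
    student_code="from collections import Counter\n...",
)
# The messages could then be sent to a chat API, e.g. OpenAI's
# client.chat.completions.create(model=..., messages=messages).
```

Keeping prompt assembly separate from the API call makes the grading instructions easy to inspect and revise, which matters given how sensitive the model's judgements are to the rubric wording.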
#educational-assessment #coding-and-programming #generative-ai #student-learning #computational-biology
Read at Nature