The article examines how effectively large language models such as GPT-3.5 and GPT-4 generate feedback on tutoring practice, focusing on their ability to identify praise. Identification quality is evaluated with a Modified Intersection over Union (M-IoU) metric, which is validated through a correlation analysis against human coders' ratings. The results show a significant positive correlation between M-IoU scores and the human ratings, supporting the reliability of the metric and illustrating the potential of these models to provide nuanced, constructive feedback that enhances students' learning experiences.
To validate our Modified Intersection over Union (M-IoU) metric for assessing how well the GPT models identified praise elements, we conducted a correlation analysis against human judgments. M-IoU scores showed a significant positive correlation with the ratings of each individual coder, supporting the reliability of M-IoU for evaluating praise identification.
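The summary does not give the exact form of the M-IoU, so the Python sketch below shows one plausible token-level variant: it scores the overlap between GPT-highlighted praise tokens and a coder's annotation. The function name, example spans, and the beta weighting scheme are illustrative assumptions, not the study's definition.

# A hypothetical token-level M-IoU; the study's exact modification is not
# stated in this summary, so the weighting below is an illustrative assumption.
def m_iou(predicted: set, reference: set, beta: float = 1.0) -> float:
    """Overlap between GPT-predicted and human-annotated praise tokens.

    beta trades off false positives against false negatives; beta = 1.0
    reduces to the standard IoU. (Weighting scheme assumed for illustration.)
    """
    if not predicted and not reference:
        return 1.0  # both spans empty: count as perfect agreement
    intersection = len(predicted & reference)
    false_pos = len(predicted - reference)   # tokens GPT marked but the coder did not
    false_neg = len(reference - predicted)   # tokens the coder marked but GPT missed
    denom = intersection + beta * false_pos + (1.0 / beta) * false_neg
    return intersection / denom if denom else 0.0

# Example: token indices highlighted as praise by GPT vs. a human coder.
gpt_span = {3, 4, 5, 6}
coder_span = {4, 5, 6, 7}
print(f"M-IoU = {m_iou(gpt_span, coder_span):.2f}")  # 0.60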
Descriptive statistics of the human-coded ratings and the M-IoU scores, together with the correlation results, further support using M-IoU to gauge how effectively the GPT models identified praise in tutor feedback.
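As a sketch of how such a validation could be run, the snippet below correlates M-IoU scores with each coder's ratings using Spearman's rank correlation from scipy. The choice of coefficient and all data values are placeholders, since the summary does not report them.

from scipy.stats import spearmanr

# Placeholder data: M-IoU scores for six GPT-highlighted responses and
# the corresponding quality ratings from two human coders (1-5 scale).
m_iou_scores = [0.82, 0.45, 0.91, 0.30, 0.67, 0.75]
coder_ratings = {
    "coder 1": [5, 3, 5, 2, 4, 4],
    "coder 2": [4, 2, 5, 2, 4, 5],
}

for coder, ratings in coder_ratings.items():
    rho, p = spearmanr(m_iou_scores, ratings)
    print(f"M-IoU vs. {coder}: rho = {rho:.2f}, p = {p:.3f}")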
Overall, this study demonstrates the potential of large language models such as GPT-3.5 and GPT-4 in education, particularly for analyzing tutoring practice and generating effective feedback.