Making the Grade: Assessing the Assessment Capabilities of ChatGPT-3

Education
28th September 2023

Article by Peter Neal AMIChemE and Sarah Grundy

Peter Neal and Sarah Grundy put ChatGPT to the test to understand how it can reshape education

GENERATIVE AI is a force of disruption that is transforming what it means to teach, learn, and practice chemical engineering. For many educators, the explosion of readily available generative AI platforms over the last year may have felt like a sucker punch in a sector still recovering from the impacts of the pandemic. For others it was a new dawn, shedding light on a landscape of new opportunities and the consummation of dreams.

To help us understand how it would affect engineering education, we joined with colleagues from six other Australian universities to evaluate the output from ChatGPT-3 in assessment tasks from a broad range of science and engineering courses.¹ We found that, in many cases, ChatGPT could generate passable responses. This finding was even more striking as we watched the capabilities of generative AI improve while we researched, wrote, and published our paper.

To better understand how generative AI requires us to rethink chemical engineering education, consider how ChatGPT performed in common assessment types. When given closed question-type tasks (eg multiple choice, numerical, short answer, and some coding tasks), ChatGPT performed extremely well and often scored full marks. These tasks generally require the retrieval of facts or the application of well-defined processes – something AI can easily accomplish using its expansive training data. However, the advent of new and improved platforms means even graphical and symbolic tasks are now in reach of AI.

A simple response to that finding might be to ditch or devalue these tasks in favour of practical, oral, project-based, or reflective tasks. However, we found that in those tasks too, the AI could achieve component passes with moderate to major prompt refinement. For example, when given sufficient background information, AI can write quite authentic-sounding reflections, and generate plausible presentation scripts and report outlines. Microsoft’s Copilot system will be able to generate reports and presentations from notes or a briefing document.

Even when the AI failed assessments – particularly in research-type assessments (eg literature reviews, theses) – refining prompts through more specific guidance and supplying background improved the output. Issues with fake sources and working with scholarly information are now being overcome by Scite and other platforms.

Turning to vivas and invigilated analogue tasks, and emphasising Q&A, has its place in verifying learning. However, relying only on those methods will fail our students and our profession. As educators and engineers, we need to understand and explore this technology’s capabilities, and re-examine what it means to authentically assess our students.

Generative AI has great promise to help our students map out complex tasks, partner in concept generation and explore diverse bodies of knowledge. We must be there to guide and exemplify how to do this with integrity and responsibility.

References

1. Sasha Nikolic, Scott Daniel, Rezwanul Haque, Marina Belkina, Ghulam M Hassan, Sarah Grundy, Sarah Lyden, Peter Neal & Caz Sandison (2023) ChatGPT versus engineering education assessment, European Journal of Engineering Education, vol 48, Issue 3, pp559-614, https://www.tandfonline.com/doi/full/10.1080/03043797.2023.2213169

Tips on using AI at university

If you missed it, Christopher Honig and colleagues at the University of Melbourne published advice for educators and students on using generative AI as an ideation assistant; as a code reviewer; and having students use “the tool until it breaks” in order to become proficient in its use, understand its weaknesses, and better comprehend chemeng content.
TCE 984: https://bit.ly/44WQG4y

Elsewhere on AI…

A book: In The Coming Wave, AI pioneer and DeepMind co-founder Mustafa Suleyman writes that the future of humanity both depends on AI and synthetic biology and is endangered by them: “If containing it is impossible, the consequences for our species are dramatic, potentially dire. Equally, without its fruits we are exposed and precarious.”
ISBN: 9781847927484; Penguin Random House; £25; 2023

A podcast series: In Humans vs Machines, AI entrepreneur and cognitive scientist Gary Marcus delves into the history of AI, and talks to engineers and scientists working at the forefront of AI to explain its risks and opportunities. Giving evidence to the US Senate, he recommends that AI, like drugs, needs safety reviews before public release and that a nimble monitoring agency should be created to oversee and recall dangerous AI products: https://pca.st/2xvatsxi

An IChemE webinar: Chemical engineer and computer scientist Dinu Ajikutira is presenting a webinar on 25 September, shortly after we go to print, about real-world industrial AI solutions for process management and control. Join the Process Management and Control SIG here to watch a recording of the event: https://www.icheme.org/pmc

IChemE feedback on AI

In July, IChemE fed into a consultation by the Australian government on safe and responsible AI practices. It noted that systematic risk assessments used in the process industries could be effectively adopted to mitigate the risk of AI. It also warned that “any regulation should prevent the use of AI for circumventing or substituting good engineering judgement.” Read the full response here:
https://bit.ly/3rgynti