GitHub Copilot has helped software engineers at the Australia and New Zealand Banking Group (ANZ Bank) improve productivity and code quality, and the trial was enough to convince the finance house to deploy the generative AI programming assistant in production workflows.
From mid-June 2023 through the end of July that year, Melbourne-based ANZ Bank ran an internal trial of GitHub Copilot involving 100 of the firm’s 5,000 engineers.
The six-week trial, comprising two weeks of preparation and four weeks of code challenges, aimed to assess how participants perceived using GitHub Copilot with Microsoft Visual Studio Code and to evaluate the impact the AI-based system had on programmers’ productivity, code quality, and software security.
The findings of the experiment are detailed in a report with a title that could use some help: “The Impact of AI Tool on Engineering at ANZ Bank, An Empirical Study on GitHub Copilot within Corporate Environment.”
Co-authored by Sayan Chatterjee, cloud architect at ANZ, and Louis Liu, engineering AI and data analytics capability area lead at ANZ, the report refers to several previous studies about programming productivity with Copilot.
One study from Microsoft, which owns GitHub, found that coding with an AI assistant improved productivity by more than 55 percent – unsurprising given other vendor surveys.
An ACM/IEEE study on programming with AI assistance indicated robo-help was more of a trade-off: It found that Copilot generated more code, though the quality of the resulting software was lower than that of human-written code.
ANZ Bank wanted to conduct its own evaluation, citing the potential productivity benefits of AI while also acknowledging that the technology “raises inherent risks, uncertainties and unintentional consequences regarding intellectual property, data security and privacy.”
Those risks – highlighted by the ongoing copyright lawsuit against GitHub, Microsoft, and OpenAI over Copilot – aren’t addressed in the study, except as a nod to regulatory compliance.
“Before commencing the experiment, risks related to intellectual property, data security and privacy were assessed in collaboration with ANZ’s legal and security teams to arrive at a set of guidelines,” it said.
The bank’s experiment examined Copilot’s effect on developer sentiment and productivity, as well as code quality and security. It required participating software engineers, cloud engineers, and data engineers to tackle six algorithmic coding challenges per week using Python. Those in the control group were not permitted to use Copilot but were allowed to search the internet or use Stack Overflow.
“The group that had access to GitHub Copilot was able to finish their tasks 42.36 percent quicker than the control group participants,” the report says. “…The code produced by Copilot participants contained fewer code smells and bugs on average, meaning it would be more maintainable and less likely to break in production.”
Both of these results were regarded as statistically significant. As for security, the experiment was inconclusive.
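For readers wondering how figures like these are typically derived, here is a minimal Python sketch. It is not the report's method: the article does not say which statistical test the authors used, so a simple one-sided permutation test stands in, "quicker" is assumed to mean the percentage reduction in mean completion time, and the timing data is invented purely for illustration. The function names `percent_faster` and `permutation_p_value` are hypothetical.

```python
import random
from statistics import mean


def percent_faster(control_times, treatment_times):
    """Percentage reduction in mean completion time relative to the control group."""
    control_mean = mean(control_times)
    return 100.0 * (control_mean - mean(treatment_times)) / control_mean


def permutation_p_value(control_times, treatment_times, rounds=10_000, seed=0):
    """One-sided permutation test: the share of random relabellings whose
    gap in mean times is at least as large as the observed gap."""
    rng = random.Random(seed)
    observed = mean(control_times) - mean(treatment_times)
    pooled = list(control_times) + list(treatment_times)
    n_control = len(control_times)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        gap = mean(pooled[:n_control]) - mean(pooled[n_control:])
        if gap >= observed:
            hits += 1
    return hits / rounds


# Hypothetical completion times in minutes, NOT data from the ANZ report.
control = [50, 62, 47, 58, 66, 53, 60, 55]
copilot = [31, 36, 28, 40, 33, 29, 37, 34]

speedup = percent_faster(control, copilot)
p_value = permutation_p_value(control, copilot)
print(f"{speedup:.2f}% quicker, p = {p_value:.4f}")
```

A small p-value here means a gap that large would rarely arise from randomly shuffling group labels, which is the general idea behind calling a result statistically significant.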
“The experiment could not generate meaningful data which would measure code security,” the report says. “However, the data suggest that Copilot did not introduce any major security issues into the code.”
This may have been due to the nature of the challenges, which were designed to be short enough that participants could complete them alongside their usual daily work. Consequently, the submitted solutions were fairly short and didn’t leave a lot of room for bugs, the report notes.
In terms of sentiment, those using Copilot felt positive about the experience, though not strongly so.
“They felt it helped them review and understand existing code, create documentation, and test their code; they felt it allowed them to spend less time debugging their code and reduced their overall development time; and they felt the suggestions it provided were somewhat helpful, and aligned well with their project’s coding standards,” the report says.
One intriguing finding is that Copilot was the most useful to the most experienced programmers.
“Assessment of productivity based on Python proficiency found Copilot was beneficial to participants for all skill levels but was most helpful for those who were ‘Expert’ Python programmers,” the study says, adding that the AI helper provided the most improvement (in terms of time saved) on hard tasks.
While observing that the mildly positive endorsements from participants indicate that Copilot can be improved further, the report nonetheless endorsed putting Copilot into production workflows at the bank.
“As of the writing of this paper, GitHub Copilot has already seen significant adoption within the organization, with over 1,000 users using it in their workflows,” the report concludes, adding that a broader investigation of Copilot’s productivity impact is underway. ®