
Measuring GitHub Copilot's Impact on Productivity – Communications of the ACM
notes
it's been challenging to follow these studies; many are entangled with the very companies selling the tools, which are eager for marketing claims to boost sales.
it's fascinating to see that the strongest measures of performance tend to come down to how users "feel". that's not a bad thing: emotions and motivation matter when working with tools.
unfortunately, the companies behind these LLMs are quick to push tight integrations that support indefinite "vibe coding sessions",
when in reality (as with any tool) learning when and how to use them makes the difference between actually being productive and being less productive because you are fighting the tool.
scaffolding unit tests, auto-completing basic routines, or even prompting a developer with new ideas that can quickly be dismissed feels like the sweet spot for many co-pilots.
certainly, there is room for improvement in both the technology and the UX, but until then, these studies make it hard to justify a high price tag on these integrations for most teams.
link
summary
Code-completion systems offering suggestions to a developer in their integrated development environment (IDE) have become the most frequently used kind of programmer assistance. When generating whole snippets of code, they typically use a large language model (LLM) to predict what the user might type next from the context of what they are working on. This system allows for completions at any position in the code, often spanning multiple lines at once. This article investigates whether usage measurements of developer interactions with GitHub Copilot can predict perceived productivity as reported by developers. The study analyzes survey responses from developers using GitHub Copilot and matches their responses to measurements collected from the IDE. The findings indicate that acceptance rate of shown suggestions is a better predictor of perceived productivity than alternative measures. The study also finds that acceptance rate varies significantly over the developer population as well as over time.
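To make the headline metric concrete: acceptance rate is simply accepted suggestions divided by shown suggestions, computed per developer. Below is a minimal sketch of that arithmetic. The event format (a list of `(developer_id, was_accepted)` pairs) and the names `acceptance_rates`, "alice" and "bob" are assumptions made up for illustration; they are not the study's telemetry schema or data.

```python
from collections import defaultdict


def acceptance_rates(events):
    """Return {developer_id: accepted / shown} for each developer.

    `events` is a hypothetical log with one (developer_id, was_accepted)
    pair per suggestion that was shown to the developer.
    """
    shown = defaultdict(int)
    accepted = defaultdict(int)
    for developer_id, was_accepted in events:
        shown[developer_id] += 1
        if was_accepted:
            accepted[developer_id] += 1
    # Every developer in `shown` has at least one event, so no division by zero.
    return {dev: accepted[dev] / shown[dev] for dev in shown}


# Made-up example log: four shown suggestions per developer.
events = [
    ("alice", True), ("alice", False), ("alice", False), ("alice", True),
    ("bob", False), ("bob", True), ("bob", False), ("bob", False),
]
rates = acceptance_rates(events)
print(rates)
```

Since the summary notes that acceptance rate varies across developers and over time, a per-developer (and possibly per-time-window) breakdown like this is probably more informative than a single team-wide number.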