When it comes to using ChatGPT, are you a centaur or a cyborg?

A Harvard working paper, “Navigating the jagged technological frontier: field experimental evidence of the effects of AI on knowledge worker productivity and quality” (pdf), examines the performance implications of AI on realistic, complex, and knowledge-intensive tasks. Across eighteen different tasks selected to be realistic samples of the types of work performed in an elite consulting firm, consultants who used ChatGPT outperformed those who did not by a wide margin on every dimension of performance measured.
The conclusion is so unequivocal that the study, although not yet reviewed for publication, has already been widely commented on: the methodology used is solid, the sample large and well thought out, and the experiment rigorous. Furthermore, the sample base is about as good as it gets: 758 consultants from Boston Consulting Group, who obviously have the training and ability to make the best use of a tool like ChatGPT.

Consultants with access to GPT-4 completed 12.2% more tasks on average than those without, and did so 25.1% faster, with significantly higher quality results (over 40% higher quality compared to the control group). These results applied to all consultants: those below the average performance threshold improved their scores by 43%, while those above it improved by 17%. In the graph, you can see the distribution of output quality across all tasks: the blue group did not use ChatGPT, while the green and red groups did, and the red group also received additional training.
The thinking behind the study is also worthy of consideration: it contrasts the centaurs, who divide the work and delegate some tasks to the algorithm while keeping others for themselves, with the cyborgs, who integrate the generative algorithm into their entire workflow and interact with it continuously. The boundary between tasks that are suitable for ChatGPT and those that are not is extremely fuzzy, hence the “jagged technological frontier”. The criteria consultants use to decide when to accept the algorithm’s ideas or output and when to modify it, reprocess it, doubt it or take it “with a pinch of salt” are therefore fundamental, and are likely to distinguish the consultants who achieved the best performance. As always, it’s not the tool, it’s how you use it.
Although the study is still awaiting publication in a scientific journal, I found it very well designed and developed and, above all, very strong in its conclusion: regardless of their obvious limitations, generative algorithms can be used to improve productivity across a wide range of tasks, from the most creative, such as coming up with ideas, to more mechanical or applied tasks, such as report writing, and the best results are achieved when they are used by people with the right skills.
The conclusions of the experiment coincide with my ideas about how this type of tool should be used in higher education: full access, so that students learn to integrate it into their workflows in a natural way, and not simply as a point of reference for certain tasks.
This is a very interesting experiment and one that those of us who work in education should think about.






