ChatGPT and Galactica are taking scientific papers to their logical conclusion

The research articles of the future will be nothing like the 10-page word salads that can be emulated by AI. What will become of them once you take away the prose?

Dec 05, 2022

“... taken to its logical conclusion, every story is sad, because at the end everyone dies.”
― Margaret Atwood, The Blind Assassin

The story of the academic paper has been long and illustrious, but last month we glimpsed how it might end. An AI model called Galactica was fed an academic word salad, and out came gibberish that read surprisingly like an academic paper.

Prompt: Artificial intelligence swallowing an academic article, Banksy style. AI image generated by DALL-E

Several camps formed in response to this development. One camp, led by Yann LeCun of Meta AI, the lab that developed the model, predicted that Galactica would greatly improve academic papers by serving as a writing assistant. The opposing camp claimed that the model was outright dangerous because it could generate fake scientific articles that sounded authoritative. Within 72 hours, Galactica was taken down.

This is an argument as old as Wikipedia, the Internet, or the Gutenberg printing press. There have always been optimists and Luddites, and they have always disagreed on whether access to a new technology will revolutionize or destroy the world. Yet there is a third camp, one that suggests this kind of development will neither improve nor weaponize academic papers. Instead, it will drive them to irrelevance.

Bojan Tunguz @tunguz

I believe that scientific papers are vestiges of the bygone era, especially in the fields of DS/ML/AI. IMHO, ML researchers should skip paper/arXiv altogether. Just publish the code and your data. Explain what code does. Publish a few benchmarking scripts. That’s it.

vitalik.eth @VitalikButerin

@tylercowen My theory is different: the AI-driven commoditization of articulateness in elite registers of the English language is going to degrade its value, and by extension further degrade the social status of those who depend on it as a status marker.

I have spent the better part of a decade arguing that academic articles are already on the path to irrelevance. These arguments are partly based on personal experiences, having seen how the sausage gets made as an author, a reviewer, and an editor. But they are also based on empirical observations that papers are multiplying, that they are becoming longer and more difficult to read, and that scientists are having a hard time keeping up.

Are you really reading a paper a day? Or are you just skimming? Sources: 1, 2

In reality, we stopped reading papers years (maybe decades) ago. Mark Strasser has convincingly argued that 'close to nothing of what makes science actually work is published as text on the web'. Scientists look at the figures, read the captions, and rarely spend more than 10 minutes skimming the filler, which might as well be written by AI.

And regardless of whether Galactica survives, other language models such as GPT-3 by OpenAI prove that the genie is out of the bottle. Galactica outperformed OpenAI’s GPT-3 by 20 percent on technical knowledge probes, but how long will it take for other models to catch up? Just days after access to Galactica was closed, OpenAI released ChatGPT (based on an improved GPT model) and I am positive the scientific community will use it mercilessly for generating sections of papers and grant proposals. And this will further contribute to the exponential growth of BS in academic literature.

This is not necessarily a bad thing. As science is getting more computational, papers will either evolve or die. Either way, the academic article of the future will be nothing like the 10-page word salads that AI now so successfully emulates. The question is what will become of the academic paper once you take away the prose?

The obvious answer is that what will remain will be the figures, the data and the code, glued together by a minimal interpretation. And while AI is perfectly capable of emulating those as well, once we can verify the provenance of the research, from the experiment to the figures, via the data and the code, we will become immune to AI (or humans) tampering with the truth. And maybe we will also reduce our reading burden in the process.

There are already promising developments in this direction. The Galactica article bypassed traditional PDF publishers and was hosted on Papers with Code, a project supported by Meta AI Research. And while similar initiatives such as Distill went on hiatus because they didn't gain traction, projects such as NeuroLibre hope to survive long enough for the rest of the academic community to catch up. The moment this happens, only high-quality prose will stand a chance of ever being read by a human scientist.

Prompt: Artificial intelligence eating a word salad and outputting an academic paper, digital art. AI image generated by DALL-E

My favorite speculative fiction writer, Margaret Atwood, often calls death the logical conclusion of any process. And she is right that in the long run everything dies, academic papers included. The only question is how long it will take them to get there. ChatGPT and Galactica have greatly shortened that time horizon. I just hope it is short enough that I can say goodbye to 10-page PDFs in my lifetime.

“Had he been a lunatic or an intellectually honourable man who’d thought things through to their logical conclusion? And was there any difference?”
— Margaret Atwood, Oryx and Crake

P.S. Big thank you to Julien Cohen-Adad, Nadia Blostein, and Agâh Karakuzu for reading drafts of this essay.

P.P.S. For a version of this article written by AI, click here. The beginning of each paragraph was written by me and submitted as a prompt to GPT-3.

KANTAROT / КАНТАРОТ
