AI dabbles in science

AI publishing falls flat (Part 7/8)

Dec 20, 2025

Still from "The Funniest Joke in the World" showing a man collapsing in mirth, clutching the joke. His name is Ernest Scribbler, and he is about to die.

This series of 8 posts started^⌘ by ridiculing a chatbot-assisted article about sex and robots, and a nonsensical graphic of a well-endowed rat, mindlessly made by artificial intelligence (AI). We’ve now seen worse.

More and more, AI is being recruited for evil. Mostly it’s about money, sometimes it’s about status, and a lot of the rest reflects people who are either quietly desperate or profoundly stupid. Strangely enough, there’s humour here too.

⚠A warning⚠

The still at the start of my post is from Monty Python’s The Funniest Joke in the World. The nonsense ‘German’ used to lethal effect reads:

Wenn ist das Nunstück git und Slotermeyer? Ja! Beiherhund das Oder die Flipperwaldt gersput!

One author reports that when input into Google Translate, this produced the message ‘FATAL ERROR’.1 On similar lines, some consider Scott Tenorman Must Die to be one of the greatest sitcom episodes of all time. Whether you know the episode or hate South Park, an interesting observation is that Kenny McCormick, who repeatedly suffers excruciating deaths in this series, here dies laughing.

However, fatal laughter has its serious side. There are rare circumstances where vigorous laughter can trigger loss of consciousness, for example in Angelman syndrome, gelastic epilepsy, and Niemann-Pick disease type C. And very rarely indeed, people who have ion channel defects in their hearts can die laughing: extreme emotion precipitates a fatal rhythm disturbance in the heart.2

Read on with due caution :)

Killer phrases

Where bots are recruited to behave badly, they’re often a bit crap at the job. A side effect of some deployments of such ‘artificial stupidity’ may be excess humour. Take the ‘tortured phrases’ that started appearing around the time of COVID-19.

Subsequently shielding touchy data from malwares or web real; irregular woods phishing is troublesome.

This is from the Turkish Journal of Computer and Mathematics Education 2021, 12(7) 2649. Congratulations if you turned ‘touchy’ into ‘sensitive’, and the more obscure ‘irregular woods’ into ‘random forest’. How about this quote, from the International Journal of Engineering and Technology?

Clamor decrease is the way toward expelling commotion from a flag. All chronicle gadgets, both simple and computerized, have characteristics that make them vulnerable to commotion.

Tricky. For ‘clamor’ read ‘noise’; ‘flag’ corresponds to ‘signal’, as in ‘flag-to-clamor ratio’, I guess.3 The tricky term here is the one they use for ‘recording devices’. Now can you work out these?

‘profound neural organisation’, ‘colossal information’ and ‘data misfortune’.

Yep. Respectively these are deep neural network, big data, and data loss. But why should anyone do this? It’s tempting to blame incompetent translators, but the truth is more stark. We think the authors stole others’ work and used computer tools to make the substitutions, to avoid plagiarism detectors.4

Just for fun, here are a few more:

Fast Fourier Change—fast Fourier transform (ridiculously common)
Innocent Bayes—naive Bayes
Bosom peril—breast cancer (a rare treasure)
Counterfeit consciousness—artificial intelligence
Haze figuring—cloud computing
Worldwide parameters—global parameters
Subterranean insect settlement—ant colony5
Motor vitality—kinetic energy
individual computerized collaborator—PDA
glucose bigotry—glucose intolerance
mean squared blunder—mean squared error
bogus up-sides—false positives
unfriendly medication responses—adverse drug reactions

And last, but emphatically not least, ‘heterosexual structure of carbon’.6
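
Once a phrase like this has been spotted, finding its siblings is mostly a matter of looking them up. Here is a minimal Python sketch of that idea, assuming a tiny hand-made dictionary built from the examples in this post. The names `FINGERPRINTS` and `find_tortured_phrases` are mine, invented for illustration; this is not the code of any real screening tool, and a usable list would be far longer.

```python
import re

# Hypothetical mini-dictionary: tortured phrase -> the established term it mangles.
# Entries come from the examples quoted in this post.
FINGERPRINTS = {
    "irregular woods": "random forest",
    "profound neural organisation": "deep neural network",
    "colossal information": "big data",
    "innocent bayes": "naive Bayes",
    "bosom peril": "breast cancer",
    "counterfeit consciousness": "artificial intelligence",
    "haze figuring": "cloud computing",
    "mean squared blunder": "mean squared error",
}


def find_tortured_phrases(text):
    """Return (phrase, likely_original) pairs found in text, ignoring case."""
    lowered = " ".join(text.lower().split())  # collapse whitespace and line breaks
    hits = []
    for phrase, original in FINGERPRINTS.items():
        # Word boundaries stop a phrase from matching inside a longer word.
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            hits.append((phrase, original))
    return hits


# Invented sample sentence, stitched together from the phrases above.
sample = "We apply Irregular Woods and counterfeit consciousness to colossal information."
for phrase, original in find_tortured_phrases(sample):
    print(f"'{phrase}' probably means '{original}'")
```

A plain lookup like this only catches phrases somebody has already noticed, so it will always lag behind whatever the paraphrasing tools come up with next.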

The Problematic Paper Screener

In 2021, Guillaume Cabanac set up the Problematic Paper Screener. He teamed up with Alexander Magazinov and Cyril Labbé to write a paper on the subject that caught a lot of attention and raised a few laughs. They’ve extended the website to look for other defects too—papers written with SciGen and Mathgen, papers that cite already-retracted papers (“Feet of Clay”), papers that cite cell lines that don’t exist and refer to rubbish RNA sequences, hijacked websites and other increasingly deceptive products of AI ingenuity. According to Cabanac, a tool called Spinbot is responsible for a lot of the tortured phrases.

Papers have also been detected with clear clues like “Regenerate response” in the text, indicating their AI provenance. There’s a never-ending stream. But perhaps we’re missing the point?

Text screenshot from a 'problematic paper' with highlighting of ludicrous comment "I'm very sorry ..." in the middle of pseudoscientific sentence. — Problematic text is from The Conversation

Overconfidence?

Humans generally believe that they can detect AI-generated content. A large part of the time, we’re wrong. We’re unsurprisingly crap at spotting deception. And generative AI has now become really good at writing plausible scientific papers. Disturbingly, the performance of most AI-based detectors is far from stellar. Let’s explore by looking at recent attempts to “set a thief to catch a thief”.7 Some appear good:

Raidar is a tool built around the idea that when you feed an AI human-generated text, it has a relentless tendency to ‘improve’ it. However, if the text was AI-generated, then that tendency is far less, because it’s “already improved”. It seems to work.
xFakeSci is based on the relative frequencies of word pairs (bigrams). In human text, these tend to be rich, varied and a bit idiosyncratic; AI text draws on a more limited set of ‘the best words’.
xFakeBibs is more recent and also depends on bigrams.
‘Watermarking’ is the idea that you can use a whole array of tricks to modify the distribution of words in an AI-generated text, without changing the meaning that much. Later, you can examine a text and see whether it was generated by a specific AI that applied the watermarking.
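
To make the bigram idea concrete, here is a toy Python sketch. It is emphatically not the algorithm of xFakeSci or xFakeBibs, just a crude stand-in that counts adjacent word pairs and reports what fraction of them are distinct. The function names and both sample sentences are invented for illustration, and a real detector would need far more text and a properly trained threshold.

```python
from collections import Counter


def bigrams(text):
    """Split text into lowercase words and return the list of adjacent word pairs."""
    words = [w.strip(".,;:!?()\"'") for w in text.lower().split()]
    words = [w for w in words if w]  # drop tokens that were pure punctuation
    return list(zip(words, words[1:]))


def bigram_diversity(text):
    """Fraction of bigrams that are distinct; 1.0 means no pair is ever repeated."""
    pairs = bigrams(text)
    if not pairs:
        return 0.0
    return len(set(pairs)) / len(pairs)


def top_bigrams(text, n=3):
    """The n most frequent bigrams, with their counts."""
    return Counter(bigrams(text)).most_common(n)


# Two invented sentences: one varied, one that keeps reusing the same pairs.
varied = "The cat sat on the mat while the dog chewed a bone"
repetitive = ("The model is robust and the model is efficient "
              "and the model is scalable")

print(round(bigram_diversity(varied), 2))
print(round(bigram_diversity(repetitive), 2))
print(top_bigrams(repetitive, 2))
```

The repetitive sentence scores lower because pairs such as ‘the model’ recur. Whether a gap like that survives contact with real papers is another matter.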

All of these are, however, doomed.

Image of Scylla not yet stolen from the Louvre

Missing the joke

Unfortunately, the above trickery is pretty much irrelevant. There’s a problem with swanky detection tools. There are two possible scenarios here, the oracular and the commonplace:

The Oracle sits on high, and makes ultimate pronouncements about the humanness of the text. The problem here is that there are limitations to usage—access, payments (likely) and bureaucracy. The oracle is isolated, and therefore largely ineffective.
The tool is everywhere. Anyone can use it, but then any would-be deceiver can test their text too.

This second point is the killer. This sort of ‘adversarial testing’ will inevitably make your tests useless soon after they are widely available. The deceiver just adjusts their text until it passes.
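
The loop is depressingly simple. Here is a minimal Python sketch, assuming a toy detector that flags a few ‘tell-tale’ words. The detector, its word list, the rewrites and the draft sentence are all hypothetical, made up for illustration. The point is only that the deceiver needs no idea how the detector works, just the ability to ask it again.

```python
import re


def toy_detector(text, banned=("delve", "tapestry", "multifaceted")):
    """Hypothetical public detector: flags text containing any 'tell-tale' word."""
    lowered = text.lower()
    return any(word in lowered for word in banned)


# Hypothetical substitutions a deceiver might try, one per round.
REWRITES = {
    "delve": "dig",
    "tapestry": "mix",
    "multifaceted": "complex",
}


def evade(text, detector, rewrites, max_rounds=10):
    """Apply one rewrite per round until the detector stops flagging the text.

    Returns (final_text, rounds_used).
    """
    rounds = 0
    while detector(text) and rounds < max_rounds:
        for old, new in rewrites.items():
            changed = re.sub(old, new, text, flags=re.IGNORECASE)
            if changed != text:
                text = changed
                break  # one change per round, then ask the detector again
        else:
            break  # no rewrite changed anything, so give up
        rounds += 1
    return text, rounds


draft = "We delve into the multifaceted tapestry of noise."
final, rounds = evade(draft, toy_detector, REWRITES)
print(final)
print(rounds)
```

Swap in any detector you like for `toy_detector`: as long as it can be queried freely, the same loop applies, only with cleverer rewrites.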

Every release of AI detectors is also accompanied by ‘humaniser’ tools that will bypass them. There’s no safe course. Even as you skirt the whirlpool of Charybdis, Scylla will reach out and pluck you from the boat. And this observation also misses the main point.

It’s about quality

There’s no magic. We sensibly abandoned^⌘ special pleading like élan vital, qualia and souls ages ago. Ultimately, there’s no reason^⌘ why AI won’t surpass humans at pretty much anything you can imagine (and some you can’t).

So what matters with Science—as always—is whether it’s being done right,^⌘ not who is doing it, AI or human. It’s perfectly conceivable that once they’re thinking consistently at Pearl level 3,^⌘ AIs may do better Science than us. It just so happens that they’re not there yet. Like some authors, actually.

It similarly doesn’t really matter that much whether we’re being deceived by people, bots, or a combination of the two. We’ve seen enough human deception in my past few posts to last a lifetime. And that was just a peek at what’s on display.

What really matters is spotting bad science and flagging deception. This is never easy: see Brandolini’s law.^⌘ We need to fret less about bots, and simply concentrate on upping our game.

Which, coincidentally, is the topic of my next post.

My 2c, Dr Jo.

^⌘ This ‘of interest’ flag suggests you might profitably read my linked post on Substack.

The previous post in this series is Karmaceuticals.^⌘

1. Nowadays, Google Translate is more prosaic in its response.

2. Yet others with the long QT syndrome find laughter helpful. What a way to go!

3. You can even google this phrase. Try it.

4. Even a superficial literature search pulls up multiple papers similar to our Turkish one, and many of them seem to have copied others’ work! There’s Shwetha DK et al; another by Dutta AK (already retracted by PLOS One) and several more.

5. See the Evolutionary Computation Bestiary for an explanation, or consult my note here.

6. There’s a long list of others on Kaggle, if you’re interested.

7. Or, as Terry Pratchett notes in Guards! Guards!:

The phrase ‘Set a thief to catch a thief’ had by this time (after strong representation from the Thieves’ Guild) replaced a much older and quintessentially Ankh-Morporkian proverb, which was ‘Set a deep hole with spring-loaded sides, tripwires, whirling knife blades driven by water power, broken glass and scorpions, to catch a thief.’
