Hirsch hacking
Journals go bad (Part 4/8)

Jorge Eduardo Hirsch is a very smart physicist. All the evidence that I can find also suggests that he’s a good bloke. In April 2006, he wrote a letter to George W Bush that warned of the dangers of using tactical nuclear weapons against Iran.1 He has vigorously pursued fraudsters in the field of high-temperature superconductivity, his domain of knowledge and excellence.
He is, however, most famous for the h-index. Some even call it the ‘Hirsch index’. Like many useful tools, it has been abused. We can even argue that a lot of the behaviour we see among the papermillers from my last post⌘ is a direct consequence of this index.
First, I’ll explain the index. Next, we’ll examine the forces that inexorably drive its abuse. Finally, we’ll look at examples both grievous and amusing.
What is the h-index?
Let’s say you’re a scientist who got lucky. Among your papers, just one is considered so prominent by other scientists that they refer to it (cite it) hundreds of times in their papers. Does this make you ‘great’? Does your impact on science reach far and wide? Is this a good metric? Very possibly not. Alternatively, you might be immensely prolific—but nobody cites you. Counting citations and counting papers produced both have their limitations.
Jorge Hirsch suggested a better measure:
I propose the index h, defined as the number of papers with citation number ≥h, as a useful index to characterize the scientific output of a researcher.
This is only slightly tricky. A concrete example from a nearly impactless person like myself will help. I have somewhere north of 30 articles and letters on Google Scholar, but my h-index is just 9, because just 9 of those articles are cited 9 or more times. If I try to plug in the value ‘10’, this doesn’t work, because the tenth article is cited just 9 times.
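For the computationally minded, Hirsch’s definition is easy to check: sort the citation counts in descending order and take the largest rank at which the count still meets or exceeds the rank. A minimal Python sketch (the citation counts are invented for illustration):

```python
def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

# Nine papers clear the bar at 9, but the tenth-ranked paper has only
# 9 citations, so h = 10 fails and the index stays at 9.
print(h_index([50, 30, 20, 15, 12, 11, 10, 10, 9, 9, 3, 1]))  # 9
```

This mirrors the situation above: the tenth article, cited just 9 times, blocks the step up to 10.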
In contrast, heavy hitters like physicist Ed Witten have high h-indices. Witten’s was 110 when Hirsch wrote the paper, and is now an astronomical 214. That’s a very unusual number. Hirsch points out that after 20 years of hard slog, a solid research scientist might have an h-index of 20, and 40 is pretty exceptional.2
The h-index conveniently combines volume and citations. And naturally, like every metric, it’s prone to abuse. All you need is the wrong set of circumstances.
Goodhart, again
We know two things, and know them well:
Deming⌘ cautioned us that most of our behaviour is a consequence of our environment. We’re less free to choose than we imagine.
Goodhart’s law⌘ says that if you set a metric as a target, it becomes useless for measurement. People will game it.
Around the world, it’s obvious that academic institutions are mostly run by managers who don’t understand Science.⌘ They also don’t understand Goodhart’s law. So they hire and fire, promote and demote based on numeric targets.⌘ And because the h-index seems, and likely is, better than a lot of other simple metrics, they use it. So all we need to understand is how the h-index is gamed. That it will be gamed is a given.
Many academic institutions now live or die by their citation status. In fact, as shown by the comical superposition of the h-index over local dances in my parody at the start, entire nations have been ranked on their ‘h-index’.

* In German, ‘Ente’ is a duck and ‘Hirsch’ is a stag. Make of that what you will.
h-madness
As with every metric, there are anomalies. You can’t quantify genius with a single metric; success isn’t a number. For example, someone can have the most money in the world, and still be a complete jerk. There’s a powerful argument that the entirety of Sigmund Freud’s output is pseudoscience—but one calculation puts his h-index at 269.3
So grrl, can we game this metric. Think about it. Although it’s frowned upon, you can self-cite—with the caveat that some h-index calculations now automatically omit self-citations. The measurers wised up a little.
Alternatively, you can get others to cite you. Here “candy is dandy, but liquor is quicker”. You can pay people to cite you.4 You can even integrate this seamlessly into the products of papermills⌘ and predatory journals.⌘ Some anomalies may arise.
The Vickers curse
Yet another ardent hunter of questionable science is Alexander Magazinov, who graduated from Moscow State University in 2011. He has contributed a lot to PubPeer and For Better Science. In 2022, he and others spotted something very peculiar: an editorial in the Elsevier journal Current Biology, titled “Animal Communication: When I’m Calling You, Will You Answer Too?”, was receiving hundreds of citations. The author is Neil J Vickers, an entomologist from the University of Utah.
For anyone other than moth biologists, this editorial about pheromone mixtures in the adult cotton bollworm Helicoverpa armigera is a bit obscure.5 Today, if I look up the article on the Dimensions website (free registration required) I find 1994 citations. Just one concerns moths. What on Earth is going on? Magazinov soon found the secret.
It turns out that papermillers are given long lists of digital object identifiers (DOIs) they must cite to bolster the authors’ h-indices. It seems that when a Word list of DOIs contains a line break, copying and pasting it into Google Scholar renders the break as a space. Here’s an example.
10.1016/j. enganabound.2022.10.020

The space is after “10.1016/j”. Previously, Google Scholar parsed just that first bit, and popped up the moth paper. The mindless papermiller then copied it!
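The mechanics of the mangled DOI are easy to reproduce. The sketch below is my guess at the failure mode (Google Scholar’s actual parser isn’t public): a line break inside a wrapped DOI becomes a space on paste, so a whitespace-splitting parser sees only a bare publisher prefix.

```python
# A wrapped DOI in a Word reference list: the line break falls
# right after the Elsevier "10.1016/j." prefix.
raw = "10.1016/j.\nenganabound.2022.10.020"

# Pasting into a search box renders the line break as a space.
pasted = raw.replace("\n", " ")
print(pasted)  # 10.1016/j. enganabound.2022.10.020

# A parser that splits on whitespace sees only the bare prefix,
# which is not a complete DOI and can match an unrelated paper.
first_token = pasted.split()[0]
print(first_token)  # 10.1016/j.
```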
It’s a pity that Google has now fixed this bug, as it’s a wonderful way of spotting cheaters. But Neil Vickers got a paper out of it :)
A smart cat
Above is Larry Richardson, the world’s most cited cat. His owner (read: servant) Reese Richardson (also pictured) earned an h-index of 11 for Larry, simply by uploading multiple self-citing fake articles to ResearchGate.6
Creative authors have found almost endless ways of inflating their h-index. As we’ve just seen, authors upload PDFs to ResearchGate and stuff them with references. Google Scholar doesn’t verify sources. There are better-curated bibliometric databases (Web of Science, Scopus and so on) but these are not readily accessible, so people go for the free stuff: Google Scholar.7
You can achieve an impressive CV by registering your own website, one that looks like a university website but for a few letters, and then building citations around this. If you have a common name like, say, Muhammad Irfan,8 you might harvest articles by “M Irfan” and lay claim to them. And so on.
But the fundamental problem rests with institutions that set targets and measure using the h-index. They simply don’t understand Goodhart’s law, and that’s pretty much that.
In my next post I look at researchers who I think are worse than mere citation manipulators driven to desperation by institutional imbeciles. We will look at some bad players.
My 2c, Dr Jo.
⌘ This symbol indicates my other posts where you can read more on the linked topic.
The preceding article in this series is Papermills and Deceit.⌘
A dozen other physicists signed up.
My co-authors Atul Gawande and Alan Merry have h-indices of 102 and 60 respectively on Google Scholar. The h-index also depends on the field you’re working in. If you’re researching, say, moth pheromones (to choose a random example), don’t expect a vast number of citations.
Google Scholar currently pegs him at ‘just’ 183.
The authors of this Scientific Reports piece (a Nature Portfolio journal) purchased 50 citations after assuming a fake identity.
The picture is from the editorial, which describes how moths use smell to identify younger, less fecund females and left-swipe them.
Formal citations of animals as authors are uncommon, but include FDC Willard (Jack Hetherington’s cat), Galadriel Mirkwood (Polly Matzinger’s dog), HAMS ter Tisha (Andre Geim’s hamster), and the 3 Wamba bonobos sharing authorship with Sue Savage‐Rumbaugh, a primatologist.
Addendum: Subsequent to publishing this post on Substack, I found out that Larry is Reese’s grandmother’s cat, and the human in the picture is Reese’s dad. It pays to check :)
Microsoft Academic Graph was discontinued in 2021, and OpenAlex doesn’t quite fill its shoes.
Whose ‘homepage’ on Google Scholar refers to ResearchGate. He’s ostensibly a polymath with papers going back to 1967, on topics as varied as crop productivity, gender dynamics, nanostructures, the 1947 Poonch rebellion, skin photoageing, aerospace design, and tuberculosis diagnostics. Yeah right!



The thing that immediately strikes me with this metric-gaming is the similarity with internet search. Google revolutionised internet search with a simple idea from network theory: the importance of a web page can be gauged by how many incoming links it has. To put it crudely, if everyone is linking to this web page, it must be important!
It wasn't long before people started looking for ways to game the page-ranking system. You soon had paid links, link exchanges (you scratch my back, I'll scratch yours), link farms, and hidden links (in white-on-white text); you had people registering multiple domain names, all linking to each other. In the early days, Google didn't even check whether content was hidden, or whether the links came from a page whose content was even vaguely relevant to the target.
Until social media became the dominant marketing arena, there was a constant arms race between Google and so-called Search Engine Optimizers, a battle between gaming and game-proofing the system.
Fortunately I don't have to involve myself with this nonsense any more. It's probably going on in some other form, both in search and social media marketing.
“Goodhart’s law⌘ says that if you set a metric as a target, it becomes useless for measurement. People will game it.”
Half true. Yes, people will game it! But you still need measurements and targets if you are going to drive improvement. So it does not become useless for measurement: the metric is still useful and necessary, but you need to be very sceptical and keep a close eye on it. And never use just one metric as a target; it's much more difficult to game the system if you have to meet a number of targets.