The horizon is not so far as we can see, but as far as we can imagine

Why Human Judgment Must Be Primary Over Metrics (Academic Edition)

So, you’re all probably aware of the replication and fraud crisis in the scientific community. Psychology’s been hit hardest, and the social sciences, but the physical sciences have not been immune.

Retractions have risen sharply in recent years for two main reasons: first, sleuthing, largely by volunteers who comb academic literature for anomalies, and, second, major publishers’ (belated) recognition that their business models have made them susceptible to paper mills – scientific chop shops that sell everything from authorships to entire manuscripts to researchers who need to publish lest they perish.

These researchers are required – sometimes in stark terms – to publish papers in order to earn and keep jobs or to be promoted. The governments of some countries have even offered cash bonuses for publishing in certain journals. Any surprise, then, that some scientists cheat? (my emphasis)

And these are not merely academic matters. Particularly when it comes to medical research, fakery hurts real people. Take the example of Joachim Boldt – the German anesthesiologist who, with 186 retractions, now sits atop the Retraction Watch leader board of scientists with the most pulled papers.

The key paragraph is #2: academics are judged on how many papers they have, and how many citations those papers receive. Getting hired and getting tenure are based on them. Since it’s hard to get a full time real academic job these days, let alone get tenure, there’s a LOT at stake for academics. Publish or perish.

This isn’t how such decisions were always made, however. At one point, human judgment was given a much bigger sway. Hiring committees read the research, looked at teaching, and talked to the academic. Some academics published only a few papers, but they were good papers, and others were considered to have potential.

Such a system was subject to standard human abuse: hiring people who were liked, in effect. So an independent measure of academic excellence was sought, and what was settled on was citations: if your research was important, presumably other academics would refer to it.

But any metric which is used to make monetary decisions is quickly gamed. If you must have those citations, many people will cut corners to get them. After spending 10 years to earn a Ph.D., the idea of being part of the large majority who either get no job or become adjunct profs, badly paid and treated, isn’t palatable.

For a long time this went on and cutting corners worked: the people inside the system were those who had benefited from it, after all. Everyone knew it was occurring but the incentives to prove it were lacking. Then some outsiders started looking, people funded with outside money, and they found a ton of fraud and sloppiness.

We keep doing this: we keep seeking metrics to cut out human judgment, but it can’t be done. It’s not that metrics aren’t useful, but, again, as soon as everyone knows what the metrics are, they game them. (Note how similar this is to Google’s early metric: how many links a webpage received. Remember how good early Google was before everyone started search engine optimization and Google decided to maximize monetization.)

The solution isn’t to find new metrics and get back on the treadmill; it is to go back to judgment, and to review the results over time with groups of outsiders and insiders.

You can’t outsource human decisions on who gets power to algorithms. It never works and it never will, as we’re finding out with “AI”.

Just bite the bullet and take responsibility.





  1. Curt Kastens

    Could this be an “it can’t be done because it’s never been done” problem?
    Or is it more like choosing an economic system? You can choose capitalism, which does not work for most of the people most of the time. Or you can choose socialism, which does not work for a lot of people some of the time. Or you can choose a mixed economy, which has the same problem as trying to draw national borders: no matter what mix of capitalism and socialism one comes up with, a lot of people are going to be disappointed, at least some of the time. The second way, using human discretion, can be gamed as well. The choice between metrics that can easily be gamed and the personal subjective opinions of those in positions of institutional power looks bad in both cases to me.
    Either way, failure may not be completely guaranteed, but success is very unlikely. The chances of success can only be increased a bit with “proper” indoctrination (education) of young people. But since there is no consensus on what “proper” indoctrination is, that is not going to happen.

  2. someofparts

    Returning to the wider use of human judgement could work where the basis for judgement still exists. Can we really say that is the case in a society as dominated by propaganda as this one? We are in the middle of an upsurge of covid and everyone has agreed to an official policy of just ignoring it. That is ideological blinders on steroids, my friends. Looks like we have a long way to go to get back to a context where sensible judgement has a fighting chance of functioning.

  3. “The solution is to use judgment and to review the results”

    Imagine if society had behaved this way.
    The opioid crisis would never have happened because anyone who looked at the evidence would have known opioids were addicting, unsafe and lack long term effectiveness. Instead people pretended that whatever the pharma funded doctors/researchers said was gospel.

    Over 20% of the population wouldn’t be addicted to deadly psych drugs, which in the 2010 book “Anatomy of an Epidemic” Robert Whitaker showed worsen long term outcomes. Whitaker found hundreds of studies proving this. Psychiatry in response could not find a single study that showed their drugs improve mental health. Instead they hurled insults and yelled how you need to trust the “experts”. The experts being those whose income depends on selling the drugs.

    Pfizer’s own clinical trial for the Covid vaccines had 40%-50% worse overall health outcomes in the vaccine group (depending on if you used adverse events or morbidity). The vaccine group had more deaths and more hospitalizations as well. This is despite the study containing half a dozen major design flaws to make the vaccines look better. People not only didn’t look at the stated results but censored those who did and forced others to get the vaccine, all while shouting “follow the science”.

    A major cause of this problem is that people’s egos and sense of morality get attached to a position. The result is they then have a painful time even reading the evidence, because being right is now tied to their sense of self worth and morality.

  4. Willy

    Maybe we can get AI to grade these papers, check for plagiarism, or hell, just do all of this kind of work to free up our academics ‘for more important things’?

    I saw Frank Herbert in a Seattle supermarket once. I recognized the face but couldn’t place it, befuddled, and nothing came from the encounter. If I could go back I’d ask him for ideas about how to inspire a Butlerian Jihad. And then how to create a more integrous society, especially at the highest meritocratic levels. Most of the PhDs I’ve met since have been pretty uninspiring, not exactly what I’d call mentats.

  5. Joan

    Good teaching should be valued more. At a lot of universities, students can submit anonymous feedback surveys. Compare those to how the students do in subsequent levels of the topic to see if the teacher and curriculum adequately prepared them. If someone is really good at teaching the information so that students get it, that teacher shouldn’t have to publish at all in order to keep their job. The students footing the bill don’t care about these research articles that they probably don’t even have access to.

  6. sorry, the show is over

    “A major cause of this problem is peoples egos and sense of morality get attached to a position.”

    This is THE problem. The saying “It’s hard to get a person to understand something if their salary depends on them not understanding it” is only the tip of the iceberg. It’s not only a person’s salary – it’s their entire “ego-identity.”

  7. Feral Finster

    Goodhart’s Law: when a measure becomes a target, it ceases to be a good measure. In other words, if you pick a measure to assess people’s performance, they will find a way to game it.

  8. Grey

    I recently finished reviewing 30 papers for a national-level conference with significant international interest. The limit for submissions was 10 pages. The manuscripts from one country in particular stood out, with what I will call the ‘Conspiracy of References’: 1 to 2 pages out of the 10-page budget went to references, many of them of the ‘rain is wet’ variety. That country was also the primary source of the papers I rejected.

  9. Purple Library Guy

    Put it back in the realm of individual judgement and that brings us back into the realm of individual corruption and deception. Still probably an improvement, but not a solution.

    The real problem is the size of the incentives. If most academics could expect a fairly decent life, the motivation to mess things up would be much less corrosive in the first place.

  10. @Joan
    One problem with the schooling system is that it makes people dislike learning. One example is how it conditions kids to dislike reading. This is done by forcing them to read 400-year-old English books and having them do boring plot-by-plot book reports. Can’t think of a better way to get people to not read.

    Another problem is that the focus is on “giving a man a fish” instead of “teaching a man to fish”. School is set up to get kids to obey the teacher and memorize things kids (and adults) couldn’t care less about. The end result is a society that follows orders, doesn’t know how to think critically, and is emotionally trained to fear thinking for itself.

  11. Curt Kastens

    How can society value good teaching when there is no consensus on what good teaching is? There is not even a consensus on who should be allowed to decide what good teaching is. This lack of consensus leads to another problem.
    Do feedback surveys work? I do not trust them myself. First of all, the idea of such surveys is that those providing the info are actually qualified to pass judgement on someone else. Second of all, if those passing judgement are providing info on a subjective thing, like “did the professor inspire me?”, and the answer is no, the problem might be with the rater, not the ratee. Even if a ratee gets a low score from a large group of people, I say it does not count for much, because it is certainly known that whole generations, and other subsets of people, can be maladjusted. You cannot necessarily blame a professor who cannot teach his students, or a grade school teacher who fails to teach the pupils reading because the teacher is using 400-year-old English books.
    Using 400-year-old English books worked just fine for teaching reading for at least 350 years. It has worked well not only in the English-speaking world; it has worked in China and India and South Africa as well. The problem with using 400-year-old or 60-year-old books is not that one cannot learn to read using them, or that these books make school boring and uninspiring. The problem with the old books is that they inspire racism, patriarchy, militarism, and other bad values. Which is exactly why these books are hard to replace, because so many people love them.
    American (US) schools suck because they are filled with violence, even the good ones.
    But as far as boredom goes, pupils have to be taught the discipline to accept boredom as a necessary part of life. In my view it is part of a proper indoctrination. I even think that boredom is really a crucial ingredient in encouraging pupils to think for themselves.

  12. StewartM

    Hmm, I posted a long reply, only to have my internet connection die just when I hit ‘post’, and the web browser ate it.

    Do we actually use metric-based evaluations more? Perhaps we say we do. But I’m not sure we actually do, based on my experience. Or not universally so.

    In academia, the rule is ‘publish or perish’. Many departments of universities already in the 1990s were in the habit of hiring new professors, not granting them tenure, and saddling them with a huge teaching load. This of course, lightened the teaching load on the existing tenured professors to do research and publish papers. Then, using the justification of ‘you didn’t publish enough’ they’d send the young non-tenured professors packing and then hire in a fresh set. Rinse and repeat.

    Of course, this had two very bad effects–and Camille Paglia is absolutely right about this, and it’s as true in the sciences as it is in Paglia’s humanities. One is that, as the people doing the evaluations aren’t going to be so impolitic as to criticize the output (or they simply are incapable of it), quantity gets valued more than quality. Paglia wrote (paraphrasing) that ‘it’s far better to produce one quality work in an academic career than a mountain of trash’. But the system rewards the ‘mountain of trash’ instead, and in any system one gets what one actually rewards. Two, the students–particularly the undergraduates–get cheated of time spent interacting with top-notch professors. And isn’t *education* supposedly the core purpose of having higher education in the first place?

    So in academia one could make the argument that the metric distorts the output. However, I have less direct experience in academia.

    In the corporate world it’s ‘similar’ but different. In the corporate world there are metrics galore, as our MBA-trained business managerial class striving to maximize ‘shareholder value’ have been trained to discover and measure an organization’s ‘key outputs’ or ‘key result areas’. However, once you dig down a bit, the situation becomes more complex. By my experience, the metrics are more fungible than they are made out to be; and their job is more to justify denial or punishment rather than to reward. I have quite literally seen cases where person A had a factor of 10, 20, 30 times or more of an ‘objective output’ than person B, but it was person B who got the good ratings and promotions while person A did not. What seems to be more important than ‘objective metrics’ is:

    a) How well does your immediate supervisor think of you?

    b) How well does *his/her supervision* think of your supervisor?

    c) Does anyone higher up from those two tiers think well of you, or think poorly of you? (Dislike hurts you more than liking you helps you).

    d) What is the dynamic of the team that does the evaluations? Is your supervisor aggressive, would he/she push and argue for you, or is he/she passive? Many of these teams contain ‘wolves’ who argue stridently for more funding, more capital, and promotions for the people in their groups, whether objectively justified or not, while others are ‘sheep’ that will sacrifice their teams and, like the ancient Romans, fall upon their own sword ‘for the good of the company’ even when the objective need is dire. Obviously wolves don’t play well with sheep.

    And what is the leader of that team like? Do they actively try to constrain the ‘wolves’ and point out the needs of the ‘sheep’? Or are they passive? Is there someone outside your nominal group that will make a pitch on your behalf (rare, as this behavior is not rewarded at that level, but it occasionally happens)?

    What you get, thus, despite a plethora of ‘objective metrics’ being in place, is something akin to ‘high school drama’ or even (in the words of a colleague) “Romper Room”. Likes/dislikes/peeves play a larger role than one would imagine, and how ‘objective criteria’ are actually used is akin to the old saw about how drunks use lampposts: more for support than for illumination. The same colleague said you could line up a group of people, ask them questions to ascertain the depth of their expertise, and then guess their title, and if you hit 50% right you’d be doing well.

  13. Trinity

    What’s fascinating to think about is how Indigenous Peoples managed to survive and sometimes even thrive (into the present day, no less) without any kind of computing device, let alone any “metrics”. Best practices survived, things that didn’t work, didn’t.

    They managed to understand and deal with what are very complex processes with nothing more sophisticated than their own minds. They even figured out how to deal with crazy people who tried to do harm to the group (and other living things). They didn’t take anything for granted.

    They did so by just observing the world around them, codifying the knowledge (and wisdom) they gained into stories told to future generations of their people to ensure the continuation of the people (and other living things).

    Of course, it helped that they actually cared about the safety and success of future generations, which meant caring for the land, and suppressing overpopulation, greed, and pretty much all the bad things running amok these days.

  14. Flaser

    This topic really reminds me of the documentaries made by Adam Curtis (The Trap, All Watched Over by Machines of Loving Grace, Hypernormalization)

    One recurring theme of his work is highlighting this exact tendency of neoliberal technocrats to “cut out human bias” and thus “set us free”.

    There is a strong undercurrent of (almost religious) belief in supposedly ideal systems that’s very similar to their obsession with free markets.

    Hmm… A study of the religious underpinnings of neoliberalism, or the religious nature of it, would be quite interesting.

    (Yves Smith has already collected a body of works proving how the ideology isn’t scientific in Econned)
