On statistics and virtue: ANVUR’s criteria for the evaluation of research quality (VQR 2020-2024) and the European reform of research evaluation (COARA)

The European Union, recognizing that quantitative evaluation of research harms quality for the sake of quantity, urged evaluators, universities, research institutions, and scholarly societies to join together in a coalition (COARA) for the reform of evaluation itself. Even the Italian evaluation agency, ANVUR, joined the coalition and committed itself to reducing bibliometrics to a complement to qualitative evaluation (peer review), which cannot be done without reading the texts. However, ANVUR has hardly fulfilled its commitments: it continues to use bibliometrics in the five-year evaluation of research quality (Valutazione della qualità della ricerca, or VQR) and in the national scientific qualification for professorships (Abilitazione scientifica nazionale, or ASN). In the SSH fields, a list of scientific journals drawn up by experts appointed by the Agency is, and will continue to be, maintained, while in the STEMM fields, bibliometric criteria calculated on proprietary databases are still used. Despite the COARA commitments, these criteria are mandatory, not complementary, for the selection of candidates and potential commissioners for the ASN, as well as for the eligibility of expert evaluators in the VQR. And in cases where they are formally complementary, such as in the evaluation of works submitted to the VQR, the rule can easily be circumvented under the cloak of anonymous review.

Why did ANVUR not honor its signature? One possible explanation is that it is not an autonomous entity, and that COARA may have made a mistake in including it in its coalition instead of the Italian Ministry of University and Research. However, the literature produced by scholars who are practically and theoretically close to Italian state evaluation suggests at least one other hypothesis: peer review involves reading texts, a process that is not scalable, and one in which personal idiosyncrasies influence both the selection of evaluators and the evaluations themselves. This may explain why a state and mass evaluation agency such as the Italian one is inclined to cling to bibliometrics. As a mass evaluation agency, it needs bibliometrics as a weapon of mass evaluation to maintain its pervasive power. And as a state evaluation agency, it can more easily hide its authoritarian nature behind a veil of statistics.

1. Bibliometrics in the VQR 2020-2024

State evaluation of research, especially when it is as hierarchical and pervasive as in Italy, is an act of mistrust in the freedom of the public use of reason. This approach undermines the legitimacy of both state universities and research institutions, which are portrayed as so flawed that they need to be evaluated by a source outside the scientific community. Furthermore, it undermines the legitimacy of the government that enforces it as an administrative measure without being a scientific authority.

Because state evaluation relies on coercion rather than on science, it is not surprising that many state evaluation agencies employ bibliometric criteria in their assessments. These agencies rely on the quantity of publications, the journals in which they are published, and the number of citations, rather than on the quality of scholarship, which is incomprehensible to administrators. Even the European Union has recently acknowledged that the quantitative approach to evaluation results in quantity rather than scientific quality: governments always get what they want, often only to discover that those outcomes were not, in fact, what they wanted. For this reason, the EU sponsored a coalition, COARA, with the objective of transitioning the evaluation process to a more qualitative approach, primarily based on peer review. The main commitments of those who join are as follows:

  1. recognize the diversity of researchers’ contributions and careers;
  2. base research evaluation primarily on qualitative evaluations focused on peer review, supported by responsible use of quantitative indicators;
  3. abandon the inappropriate use in research evaluation of journal- and publication-based metrics such as the JIF and the h-index (briefly sketched after this list);
  4. avoid the use of rankings of research organizations in research evaluation.
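
For readers less familiar with the indicators named in commitment 3, here is a minimal sketch of their standard textbook definitions; nothing below reflects how ANVUR, Scopus, or WoS actually implements them. The Journal Impact Factor of a journal J in year y and the h-index of an author a can be written as

  \mathrm{JIF}_y(J) = \frac{c_y\big(P_{y-1}(J)\big) + c_y\big(P_{y-2}(J)\big)}{\lvert P_{y-1}(J)\rvert + \lvert P_{y-2}(J)\rvert},
  \qquad
  h(a) = \max \{\, k : a \text{ has at least } k \text{ papers, each cited at least } k \text{ times} \,\},

where P_y(J) is the set of citable items published by J in year y, c_y(\cdot) counts the citations those items received in year y, and \lvert\cdot\rvert is the number of items in a set. Both indicators are computed from publication and citation counts alone; neither requires reading a single text.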

The European reform addresses how we evaluate, but not who evaluates, for what purpose, and with what legitimacy. Consequently, state agencies such as ANVUR, which have imposed and utilized predominantly bibliometric forms of evaluation, have also joined COARA.

As already noted, ANVUR has effectively disregarded commitments 2 and 3 by imposing mandatory and non-complementary bibliometric requirements on both applicants and evaluators involved in the ASN, as well as on expert evaluators who can be appointed or drawn by lot in the five-year evaluation exercise of universities and research institutions, bureaucratically called VQR 2020-2024.

On the other hand, the VQR 2020-2024 expert evaluators’ criteria meet COARA’s commitments. Indeed,

in the VQR 2020-2024 Research Quality Assessment exercise, the GEV assesses the quality of each product using the peer review methodology. […] This approach also takes into account the provisions of the Coalition for Advancing Research Assessment’s second recommendation, which states that the assessment should be primarily based on qualitative aspects, for which the role of peer review supported by responsible use of quantitative indicators is central.1

However, we have to ask whether, in the so-called bibliometric fields (STEMM), this compliance does not run the risk of being merely pro forma. Citation indicators from expensive databases, which are in the hands of commercial oligopolies such as Elsevier (Scopus) and Clarivate Analytics (WoS),2 may “inform” the process of peer review. In other words, they can be used to support peer review, but not to determine it.3 Yet there is no impediment to the continued influence of bibliometric data in the secrecy of anonymous evaluation, without in-depth reading and without a human justification of judgments, since the writing of those judgments could be entrusted to text generation systems marketed as “artificial intelligence”. ASN commissioners also have a heavy workload, which makes it tempting for them to use such tools; but since their names are known, they must take public responsibility for the texts that are generated. Why are VQR evaluators granted the privilege of avoiding that responsibility?

2. A fragile compromise

COARA commitments 2 and 3 require that peer review be the primary method of evaluating research, which necessitates the reading of texts. However, this does not preclude the use of quantitative, journal- and publication-based criteria, provided that they are employed “responsibly” and “appropriately”. But if evaluating researchers requires reading and understanding texts, how can we use indicators that require neither reading nor understanding “responsibly” – it is not clear to whom – and “appropriately”? How can popularity within an expensive journal system, controlled by a few commercial publishing oligopolists, possibly be considered a complement to peer review? We can also ask ourselves a more radical question: can pursuing research quality really be complementary to trying to get published and cited by the journals in this proprietary ecosystem, despite the unreliability of the literature produced under the pressures of “publish or perish” and bibliometrics, which have the effect of making research “not about curiosity anymore,” but “just a career”?

A recent article,4 The forced battle between peer-review and scientometric research assessment: Why the CoARA initiative is unsound, helps address these doubts. Bibliometrics – it asserts – does not assess the quality of research, but rather its impact. “Like any goods producer, more is needed for a researcher than merely producing high-quality research products; instead, they must be disseminated effectively, akin to the necessity for selling good”: the good researcher, in other words, must know how to sell his or her (paper) products, and bibliometrics measures this ability. Indeed, “it is not about curiosity anymore, it’s just a career.” “Would a company” – the article asks rhetorically – “ever evaluate the success of a product already launched in the market by convening expert panels instead of relying on quantitative sales analysis?”

It happens in business that mediocre products, effectively promoted, are nevertheless highly successful. But when researchers are forced to sell themselves – or rather, to give themselves away – in an oligopolistic pseudo-market because it is administratively imposed and circumscribed, we must treat their research papers as outputs of mass production systems and not as unique pieces of craftsmanship. And when it comes to making mass evaluations of mass production, the article argues, we must recognize that qualitative evaluation becomes unreliable, if not unfeasible, under these conditions of overload.

So, is the use of bibliometrics as a weapon of mass evaluation inevitable? The answer is yes, but only if we want mass evaluations to continue.

3. A political question

The primacy of peer review, which is among COARA’s commitments, can only be systematically applied under one condition: that mass evaluation be minimized, if not eliminated. And here the interests of ANVUR, whose raison d’être and extraordinary power depend largely on mass evaluation, do not necessarily coincide with those of science.

Not surprisingly, a significant proportion of the agency’s actions have not aligned with its commitments.5 Furthermore, its recent action plan for COARA implementation continues to rely on bibliometric, journal-based evaluation methods rather than on content-based assessment approaches. The action plan assumes the continued existence of lists of journals whose scientific status and excellence are directly or indirectly defined by ANVUR (pp. 6, 7, 10), and the possibility of emancipation from the proprietary databases of Elsevier and Clarivate Analytics is never discussed, not even in the most general terms.

While its action plan (p. 2) presents ANVUR as an independent agency, Roberto Caso (pp. 9 ff.) reminds us that it is, in fact, not legally independent. In 2008, Fiorella Kostoris wrote:

ANVUR’s independence is undermined by its lack of third-party status with respect to the government and by the excessive control exerted by various stakeholders. Notably, all members of its Board are selected directly or indirectly by the Minister for Universities and Research (MUR) and report, inform, and propose to the Minister or his department.

In light of the above, it is possible that ANVUR is evading the substance of the commitments it has signed, not because it does not want to, but because it cannot. And that COARA may have been mistaken in including it instead of the body that really makes the decisions, namely the Italian Ministry of University and Research.

And yet, on the Italian side, there may be deep-rooted reasons why an agency that clings to bibliometrics and presents itself as independent is a member of COARA. The aforementioned The forced battle between peer-review and scientometric research assessment: Why the CoARA initiative is unsound remarks that peer review is heavily influenced by personal bias, and that it is therefore better to rely on objective bibliometric experts, possibly supported by SALAMI, who treat scientists as “limited resources” whose use must be “optimized”.

Let us resist the temptation to reply that the use of statistics, whether automated or not, as a weapon of mass evaluation still aggregates subjective biases. Indeed, authors with a long commitment to state evaluation have asked: how can research evaluation be reformed in line with COARA standards, so as to balance or even replace bibliometrics? And the answer they received was that researchers should be judged on their virtues, i.e. on their motivations and character traits. But in the homeland of Giovanni Gentile, the suggestion that a government-appointed agency be tasked with judging the dianoetic and ethical virtues of researchers may evoke memories – perhaps welcome to some – of the fascist ethical state of the first half of the 20th century.

What would happen if a non-independent, government-appointed agency judged the character of researchers and ranked the most virtuous institutions? It would become much more evident that state evaluation is an authoritarian process. Conversely, an esoteric veil of statistics, preferably based on closed and proprietary data, diverts the time of subordinates into tedious administrative procedures and impotent arguments about indicators. And this ultimately serves to hide the fact that Italian research evaluation is not a scientific evaluation, but a state evaluation.


  1. This quotation is taken from article 4 of the documents of the expert evaluator panels, which can be accessed here. ↩︎

  2. Although it is possible to conceive and foster open alternatives. ↩︎

  3. As stated in article 6 of the documents of the expert panels evaluating the so-called bibliometric (STEMM) fields, which can be found here. ↩︎

  4. Its author is a researcher who also works as an official in the service of ANVUR. His positions, precisely because they sit within the structural conflict between the Mertonian reasons of science and the administrative reasons of the agency, deserve the greatest attention, at least from the administrative point of view. His article complains that the invocation of a “responsible” use of bibliometrics degrades the professionalism and scientific competence of bibliometricians. And this would certainly be true if the exercise of bibliometrics were only scientific, and therefore open to free discussion and adoption by communities of scholars, and not also administrative, and therefore unquestionable and imposed on the basis of direct or indirect governmental appointment rather than on the basis of spontaneous recognition of scientific authority. ↩︎

  5. In addition to the above-mentioned mandatory and non-complementary use of bibliometric criteria for admission as a candidate or as a commissioner to the ASN, and for the drawing and appointment of reviewers in the VQR 2020-2024, we recall the attempts to deny or administratively downgrade the scientific status of open peer review, and to minimize the open access requirement for the works evaluated in the VQR 2020-2024 by leaving it to the discretion of the publisher and to an extraordinarily long embargo (ASN and VQR after Italy’s accession to COARA, p. 7). ↩︎

This text is licensed under the CC BY-SA 4.0 license.
