Analyzing REF outputs: exploring new approaches

December 2022

REF—the Research Excellence Framework—is an evaluation of the research impact of UK higher education institutions. In a recent study, Science-Metrix, collaborating with Technopolis, investigated the possibilities for using machine learning and automated processes on REF 2021 data. The goal of this study, which was commissioned by a group of UK funders, was to deepen understanding of the UK research landscape within disciplines, sub-disciplines and research areas. REF 2021 used Units of Assessment (UoAs) to categorize outputs submitted by UK institutions; this study explored alternatives to that system, using a data-driven and automated approach that would allow more granular or more flexible categorizations that cut across UoAs. Through several sub-projects, Science-Metrix tested different methodologies and explored their benefits and limitations.

Several recommendations came out of the study regarding classification approaches. In a sub-project involving medical research areas, Science-Metrix experimented with both bottom-up and top-down classifications, concluding that in the medical sub-disciplines, a top-down approach using machine learning (ML) provided a reliable classification of REF outputs that paralleled how those outputs were categorized when submitted to the UoAs. In contrast, bottom-up approaches produced a classification that was too granular to be meaningful for the REF exercise but could serve to identify emerging research areas.
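
To make the top-down/bottom-up distinction concrete, here is a minimal illustrative sketch of a top-down classifier: outputs are assigned to predefined categories (here, two hypothetical UoA-like labels) using labeled training examples, whereas a bottom-up approach would instead let clusters emerge from the data. The labels, training texts and centroid method are all assumptions for illustration, not the study's actual model.

```python
# Toy "top-down" text classification: predefined categories, labeled
# examples, nearest-centroid assignment by cosine similarity.
# Category names and training texts are hypothetical.
from collections import Counter
import math

TRAIN = {
    "Clinical Medicine": [
        "randomised trial of statin therapy in cardiovascular patients",
        "clinical outcomes of surgical intervention in oncology",
    ],
    "Public Health": [
        "population level smoking cessation policy evaluation",
        "epidemiology of obesity in urban communities",
    ],
}

def tokens(text):
    return text.lower().split()

# One term-frequency "centroid" per predefined category.
CENTROIDS = {
    label: Counter(t for doc in docs for t in tokens(doc))
    for label, docs in TRAIN.items()
}

def cosine(c1, c2):
    num = sum(c1[t] * c2[t] for t in set(c1) & set(c2))
    den = (math.sqrt(sum(v * v for v in c1.values()))
           * math.sqrt(sum(v * v for v in c2.values())))
    return num / den if den else 0.0

def classify(abstract):
    """Assign an abstract to the most similar predefined category."""
    counts = Counter(tokens(abstract))
    return max(CENTROIDS, key=lambda label: cosine(counts, CENTROIDS[label]))
```

In a real pipeline the categories would be the UoA scheme (or an alternative taxonomy) and the model far richer, but the defining feature of the top-down approach is the same: the category set is fixed in advance.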

In a second area of investigation, Science-Metrix took an experimental approach to building a thematic data set on the cross-cutting theme of “aging and gerontology” (a theme which was not part of the REF UoA classification, but which is highly relevant to one of the UK’s Grand Challenges). Publications on this theme were identified using a query-based approach and covered all scientific disciplines. The thematic query was then extended to incorporate not only traditional research outputs such as peer-reviewed journal publications, but also non-traditional outputs (for example, films or books). Applying the bottom-up approach to the unstructured full text (i.e., title, abstract, authors, author keywords and references) produced unreliable results: the text of references was not necessarily relevant to the topic, and author names could coincidentally match search terms. The study recommended structuring the full text so that only certain sections (keywords, title, abstract) are searched, reducing false positive results.
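
The recommendation above can be sketched in a few lines: match the thematic query only against selected sections of a structured record, rather than the undifferentiated full text. The record fields, query terms and matching logic below are illustrative assumptions, not the study's actual pipeline.

```python
# Section-restricted thematic matching: search only title, abstract and
# keywords, so author names and reference lists cannot trigger matches.
# Field names and query terms are hypothetical.

QUERY_TERMS = ["ageing", "aging", "gerontolog"]       # crude stem-like terms
SEARCH_FIELDS = ["title", "abstract", "keywords"]     # exclude authors/references

def matches_theme(rec, terms=QUERY_TERMS, fields=SEARCH_FIELDS):
    """True if any query term appears in the selected fields only."""
    parts = []
    for f in fields:
        value = rec.get(f, "")
        if isinstance(value, list):
            value = " ".join(value)
        parts.append(value)
    text = " ".join(parts).lower()
    return any(t in text for t in terms)

on_topic = {
    "title": "Frailty and mobility in older adults",
    "abstract": "A cohort study of gerontological outcomes in later life.",
    "keywords": ["ageing", "frailty"],
    "authors": ["J. Smith"],
    "references": ["Doe A. Bone density methods. 2001."],
}

off_topic = {
    "title": "Corrosion in steel pipelines",
    "abstract": "We model oxide layer growth under marine conditions.",
    "keywords": ["corrosion", "steel"],
    "authors": ["B. Ageing"],                          # would match naively
    "references": ["Lee K. Ageing of alloys. 1998."],  # would match naively
}
```

Searching all fields (`fields=list(off_topic)`) would wrongly flag the second record through the author name and the cited title; restricting the search to keywords, title and abstract excludes it.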

Similarly, in a sub-project on interdisciplinarity, a broader approach to measuring the disciplinary diversity of research was tested, again using machine learning, with the goal of producing a new metric. More work will be needed to refine the use of this metric on full text, particularly for non-traditional research outputs.

In another sub-project on interdisciplinarity, an attempt was made to develop a new ML method to capture the degree of knowledge integration from diverse disciplines (interdisciplinarity) using the full text of submitted REF outputs. Interdisciplinarity indicators that capture the disciplinary diversity of individual scientific publications using information on references and authors already exist; unlike a method built on full text, however, these indicators are not directly applicable to non-traditional outputs. The results based on the new ML-based indicators were not conclusive at this stage, but Science-Metrix’s alternative indicators (based on references and authors) mitigated this risk and provided insightful results on interdisciplinary patterns in the UK. Further work is needed to improve the ML-based approach.
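
As a point of reference for the existing indicators mentioned above, a common reference-based measure of disciplinary diversity is the Shannon entropy of the discipline distribution of an output's cited references. The sketch below shows that conventional measure only; it is not the study's full-text ML indicator, and the discipline labels are hypothetical.

```python
# Reference-based diversity: Shannon entropy over the disciplines of
# an output's cited references. 0 means all references come from one
# discipline; higher values mean a more even spread across disciplines.
import math
from collections import Counter

def shannon_diversity(reference_disciplines):
    """Entropy of the discipline distribution of cited references."""
    counts = Counter(reference_disciplines)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())
```

For example, an output citing only biology papers scores 0, while one citing biology and sociology in equal measure scores ln 2 ≈ 0.69. Such indicators require reference metadata, which is exactly why they do not extend to non-traditional outputs like films, where a full-text method would be needed.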

Overall, the study concluded that caution is warranted when automatic methods are used in assessment. Although automatic processes can be applied, substantial work is still required to prepare the data, and automated methods “do not replace the need for thematic expertise and peer-reviewed assessment.” Expert guidance on thematic areas, for example, is still essential to shape classification methodologies and ensure meaningful results.

Read the report [PDF].
