How should academic institutions best assess science and scientists? Burgeoning interest in this question accompanies a growing recognition of significant problems in how scientific research is conducted and reported. The right questions are not being asked; the research is not appropriately planned and conducted; reproducibility is lacking; and when the research is completed, results remain unavailable, unpublished, or selectively reported.
Such problems are connected to the processes by which scientists are assessed to inform decisions about their hiring, promotion, and tenure. Building, writing, presenting, evaluating, prioritizing, and selecting curricula vitae is a prolific and often time-consuming industry for grant applicants, faculty candidates, and assessment committees. Institutions need to make decisions in an environment of limited time and constrained budgets. Many current assessment efforts consider primarily what is easily determined, such as the number and dollar amount of funded grants and the number of published papers and their citations.
Even for readily measurable aspects of a scientist's performance, though, the criteria used for assessment and decisions vary across institutions and are not necessarily applied consistently, even within the same institution. Moreover, many institutions use metrics that are well known to be problematic. For example, a large literature documents the problems with the journal impact factor (JIF) for appraising citation impact. That faculty hiring and advancement at top institutions requires papers published in journals with the highest JIF (Nature, Science, Cell, etc.) is more than just a myth circulating among postdoctoral researchers. The JIF is still a benchmark that most institutions use to assess faculty or even to determine monetary rewards. But emphasis on the JIF does not make sense when only 10% to 20% of the papers published in a journal are responsible for 80% to 90% of that journal's impact factor.
More important, aspects of research impact and quality for which no automated indices are available are ignored. For example, faculty practices that make a university and its research more open and available, through data sharing or education, could feed into researcher assessments. Few assessments of scientists focus on the use of good or bad research practices, nor do currently used measures say much about what researchers contribute to society, which is the ultimate goal of most applied research. In applied and life sciences, the reproducibility of methods and findings by others is only now starting to be systematically evaluated. Most of the findings indicate substantial concerns. A former dean of medicine at Harvard University, Jeffrey Flier, has indicated that reproducibility should be a consideration when assessing scientists' performance.
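The concern about the JIF raised above is arithmetic: the JIF is an average, and an average over a highly skewed citation distribution says little about a typical paper. The following Python sketch illustrates this under stated assumptions. The citation counts are invented for a hypothetical journal of 20 papers, not real data, and `impact_factor` is a simplified stand-in (mean citations per paper) rather than the official calculation, which divides citations received in one year by the number of citable items published in the two preceding years. The function names are illustrative only.

```python
from statistics import median


def impact_factor(citations):
    """Mean citations per paper: a simplified stand-in for the JIF."""
    return sum(citations) / len(citations)


def top_share(citations, fraction):
    """Share of all citations earned by the most-cited `fraction` of papers."""
    ranked = sorted(citations, reverse=True)
    k = max(1, round(len(ranked) * fraction))  # number of top papers to count
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical journal with 20 papers; these counts are invented for illustration.
citations = [120, 95, 60, 25] + [5] * 4 + [2] * 6 + [1] * 4 + [0] * 2

print(f"mean citations per paper (JIF-like): {impact_factor(citations):.1f}")
print(f"median citations per paper:          {median(citations):.1f}")
print(f"share of citations from top 20%:     {top_share(citations, 0.20):.0%}")
```

In this invented example, four papers (20% of the total) account for roughly 89% of all citations, so the journal-level mean of 16.8 is far above the median of 2 citations that a typical paper receives. A scientist credited with the journal's average would, in most cases, be credited with citations that other authors' papers earned.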
Using more appropriate incentives and rewards may help improve clinical and life sciences and their impact at all levels, including their societal value. A number of existing efforts demonstrate the growing awareness of the need for reform as well as the range of approaches and ideas on the table. Large group efforts, including the Leiden Manifesto for Research Metrics and the Declaration on Research Assessment (DORA), both developed at academic society meetings, and both international in focus, are but two examples. Individual or small-group proposals for assessing scientists include one...