Strengthening the Open Science Ecosystem Through Preprints

What We Learned at the Preprints I/O Workshop

Chan Zuckerberg Initiative Science
Jan 21, 2020

Preprints, or versions of manuscripts posted online by authors ahead of peer review, are seeing a strong increase in adoption and recognition among many communities in the biomedical sciences. As a vehicle for innovation, preprints allow scientists to build on each other’s work faster. They also offer an opportunity to incorporate new types of content — and transformations of that content — into the scholarly record that anyone can access and build upon.

Aside from the emergence of preprints, the dissemination of scientific research is tightly constrained by tradition. Biomedical science research findings are typically shared via journal articles, which can accommodate only a static presentation of results that fit into a narrative structure. The publishing process is slow, data and code are often excluded, and the majority of journal articles remain hidden behind paywalls.

This makes it hard for practitioners, clinicians, patients and non-academic researchers to access results and resources to reproduce and build on existing research quickly. When rapid and open sharing does occur, it usually happens in venues (like scientific conferences or within networks of collaborators) accessible only to researchers from well-resourced and established institutions. This creates additional barriers for researchers from emerging countries or under-resourced areas and prevents them from participating in the scientific discourse.

Preprints are poised to change this. In addition to enabling rapid sharing, preprints also 1) offer novel opportunities for feedback and peer review; 2) improve the overall quality, integrity, and reproducibility of research outputs; and 3) help prevent scooping and incentivize early collaboration.

These benefits can be dramatically enhanced by third-party services (authoring tools, commenting platforms, and machine extraction projects) that act as both inputs and outputs to preprints. As arXiv founder Paul Ginsparg envisioned in the early 1990s, preprints can provide “a relatively complete raw archive, unfettered by any unnecessary delays in availability” on top of which “any type of information could be overlayed… and maintained by any third parties,” including tools for validation, filtering, and communication.

Preprints have also fundamentally changed journal policies and researcher attitudes toward sharing outputs ahead of formal publication. By permitting researchers to share content outside of the context of the journal, preprints allow for sharing new types of outputs such as data, research code, or methods. These products can be referenced, linked, or transformed into a narrative preprint as the research progresses.

While preprints as a whole still represent only 2% of the biomedical literature published every month, a strong community of early adopters is already beginning to experiment with such value-enhancing tools.

Fostering open practices through preprints

In December 2019, CZI convened a group of experts to identify opportunities that preprints can offer to foster open and more efficient collaboration among scientists. The Preprints I/O workshop, co-hosted by ASAPbio executive director Jessica Polka, brought together open science advocates, preprint service providers, researchers in biomedicine invested in rapid dissemination, scholarly publishers, and tool developers.

Participants in the Preprints I/O workshop.

The workshop focused on exploring open practices that can be built on top of preprints, in order to accelerate the pace of scientific discovery and collaboration. From the participants, we sought to understand:

  1. What are researchers’ unmet needs that preprints can help support? What would the world look like if these needs were addressed?
  2. What are the blockers? What stands in the way of meeting these needs?
  3. What opportunities should we (as a community) consider in the short term? What should we build?

The groups identified several emerging themes and opportunities in response to these questions, and we’re excited to share some of what we learned.

1. The interoperability and enrichment of preprints need to be enhanced

As works in progress, preprints lend themselves better than published articles to enrichment and integration with tools and services that support scientists while their research is still developing. This is analogous to continuous integration in software engineering, the iterative process through which code is collaboratively improved and made more stable, efficient, and secure through contributions from multiple parties. The opportunities that participants identified centered on:

  • Making preprints a “linking hub” for all associated outputs — data, code, protocols and methods, reagents, and other resources — that may help other scientists reproduce and extend analyses;
  • Developing automated processes to enrich and link preprints to other outputs and to perform additional checks and validations, in the form of post-publication recommendations and badging, as a way of reducing the burden on researchers and editors. Projects such as Manubot and editorial assistants like Whedon were discussed as possible models;
  • Creating derivative works from preprints, such as stand-alone “methods papers,” or individually shareable figures and micropublications, with the help of semi-automated processes;
  • Creating frameworks for overlay journals, or collections of preprints curated and aggregated on top of preprint servers, catering to the needs and interests of individual scientific communities; and
  • Developing services to notify authors, readers and scientific communities at large about new versions of a preprint, the availability of a peer-reviewed article based on a preprint, or any flags and updates that apply to a preprint after its original creation.

Outbreak Science, a nonprofit founded in 2016, aims to support open data, open access, and open science in the context of epidemic responses.

The participants also identified several blockers to the implementation of these opportunities, including the lack of standards to facilitate interoperability and cross-platform integration; the inconsistent support for versioning and revision control across preprint platforms; and the lack of native, machine-readable content (for example, HTML or XML) available in real time that automated processes can interact with.

2. Publishing smaller units of knowledge requires stronger incentives and more granular sharing

While preprints offer many benefits for researchers, they are still limited because they replicate the format of a narrative journal article. Researchers can share smaller units of discovery in micro- or nano-publications, which might range from a single scientific figure accompanied by methods and underlying data down to a single data point accompanied by metadata. While researchers share such units in informal or private settings, public release of micropublications is in its infancy. Researchers are not rewarded for sharing micropublications, and because such small units have low visibility in the community, researchers may feel vulnerable to being scooped.

One solution to encourage more sharing, paradoxically, may be to begin normalizing the sharing of these objects within private, close-knit communities to raise trust among participants. This would parallel the success of an early preprint system operated by the NIH in the 1960s, which sent photocopied research objects to a fixed distribution list. Crucially, such communities would need to avoid replicating existing biases and inequalities while piloting ways of disseminating smaller units of scientific knowledge.

Jessica Polka, Executive Director of ASAPbio, at the CZI Preprints I/O workshop.

3. Preprints can help remove key barriers to participation in the scientific discourse

One of the tenets of open science is to enable broader and more equitable participation in scientific discourse. Ensuring research outputs aren’t hidden behind paywalls or shared privately within limited communities means that more people can have unrestricted access to the products of science. However, access isn’t sufficient to ensure full participation in the scientific process. We identified three categories of obstacles:

  • Major inequalities in who participates in peer review. Preprints offer, at least in theory, an opportunity for anyone to comment on scientific outputs. However, public feedback and open peer review are often hindered by many researchers’ reluctance to share comments in the open. Ideas discussed as possible solutions include progressive disclosure to trusted circles and review systems that support pseudonymity. The PREreview project is currently testing this model by involving more scientists in the peer review process and by focusing on the role of preprints in fighting outbreaks. Treating openness as a process with a progressive on-ramp, rather than a single all-in decision to shoulder the risks of open science, can also be key to increasing adoption and participation.
  • Language barriers. A large fraction of the biomedical literature is published in English, which makes it hard for non-native speakers to discover. Although machine translations may contain errors that require human correction, machine translation tools offer a practical way to increase equity. Humberto Debat proposed an automatic preprint translation site called PanLingua; another attendee, Richard Abdill, prototyped it at the meeting, and the two later continued to develop it into the version available today.
  • Ownership and governance of preprint servers, which determine whose needs are served by them. For example, preprint servers centered in North America and Europe can exclude non-English speakers and be limited in the perspectives that drive their policies and practices. As more publishers enter the preprint space, the application of journal-like brands to preprint servers may perpetuate the same barriers that exist in the peer-reviewed literature. Regional preprint servers, while able to promote a sense of community, may actually balkanize a scientific field, further marginalizing groups that currently lack attention. The group discussed how best to empower local communities to participate in coordinated but decentralized models.

PanLingua, a tool to improve the discoverability of preprints across languages through machine translation, was developed as a prototype during the workshop.

4. Funders should help incentivize preprints and openness

A significant part of the discussion focused on the openness and reusability of content shared by scientists in the form of preprints. The creation of derivative works, the ability to apply text and data mining to extract information from the body of a preprint, and the reuse of figures in educational materials all depend critically on the availability of freely and, to some extent, consistently licensed content.

While preprint platforms like In Review only support freely licensed content (under a Creative Commons Attribution license), other popular platforms like OSF and bioRxiv are license-agnostic, leaving it up to authors to decide whether to open their preprints for reuse or retain all rights. Because authors can be unfamiliar with licenses (and with journal and funder attitudes toward them), this situation is far from ideal, but preprint platforms are often reluctant to be prescriptive about licensing options. Many participants also acknowledged that the benefits of different licenses are hard to explain in plain language, which adds to researchers’ confusion.

Participants in the Preprints I/O workshop share ideas.

There seemed to be strong consensus at the workshop that funders such as philanthropic foundations and government entities have significant potential to spearhead and scale up the adoption of open practices. Many participants advocated broader adoption of preprint mandates, such as the one described in the Plan U proposal and similar to the mandate CZI has adopted for its grantees.

However, such a mandate would provide no guarantee that outputs are openly licensed and reusable in addition to being immediately accessible. A proposal emerged during the group discussion to explore asking funders to issue statements about recommended licenses for preprints. This would let researchers defer to their funders on the best choice of license for a preprint when they don’t have a specific preference, while allowing platforms to remain license-agnostic if they see requiring specific licenses as a threat to the growth and broader adoption of preprints.

Sustaining Preprints as a Vehicle for Innovation

While not immediately focused on user needs, an underlying theme of the group discussion at the workshop was the question of sustainability. Preprints are generally free to readers and authors, but the infrastructure that supports them is not. Few preprint servers have a long-term sustainability plan, and some have difficulty raising funds beyond initial seed funding.

However, arXiv, which is over 25 years old and is supported by libraries and individual community members through fundraising, shows that preprints can be sustainable. Each server and community may require its own business model, such as fee-for-service (e.g., paid for by funders whose research is being disseminated), in-kind contributions from journals (which may benefit from preprint moderation), or library subscriptions.

Through this workshop and the generous contributions of its participants, we identified a number of strategies to enhance, reuse, and build on top of preprints — as well as the necessary underlying infrastructure enabling these integrations — to foster the adoption of open practices among researchers.

While many of these directions are at a nascent stage and will require additional user research and development, we are excited by the optimism and opportunities for innovation that the participants shared during the workshop. We look forward to applying these learnings to inform our strategies in supporting and advocating for open science practices in the future.

Jessica Polka, Executive Director, ASAPbio
Jessica serves as Executive Director of ASAPbio, a researcher-driven nonprofit organization working to promote innovation and transparency in life sciences publishing in areas such as preprinting and open peer review. Prior to this, she performed postdoctoral research in synthetic biology following a PhD in biochemistry & cell biology. Jessica is an affiliate of the Knowledge Futures Group and the MIT Libraries, a Plan S ambassador, and a member of the PREreview steering committee.

Dario Taraborelli, CZI Science Program Officer, Open Science
Dario is a social computing researcher and an open knowledge advocate. As the Science Program Officer for Open Science at CZI, his goal is to build programs and technology to support open, reproducible, and accessible research. Prior to joining CZI, he served as the Director, Head of Research at the Wikimedia Foundation, the non-profit that operates Wikipedia and its sister projects. As a co-author of the Altmetrics Manifesto, a co-founder of the Initiative for Open Citations, and a long-standing open access advocate, he has been designing systems and programs to accelerate the discoverability and reuse of scientific knowledge by scholars, policy makers, and the general public alike.
