Pre-print, Post-print, Publisher’s Version: Who Cares and What Does It Mean for Open Access?

Doing research online on an everyday basis, more often than not I come across newspaper and magazine articles, blog posts, online CVs, postings on Academia and ResearchGate, tweets etc. that link directly to PDF downloads of very recent articles that look like publisher’s versions of otherwise paywalled journal articles. Often they are indeed the publisher’s versions. (More on what ‘publisher’s version’ means below.)

The reader clicks on a link and the joy of a free PDF download pops up on the reader’s screen. No questions asked.

This would require a much longer discussion, so this is just an initial contribution. The culture of direct links to PDF downloads irrespective of licensing within academia and to a more limited extent within journalism is troublesome for more than one reason.

I am not only referring here to so-called ‘dark social’ (Madrigal 2012) but to public sharing of deep links to PDF files that have often been re-hosted not on the publisher’s sites but on users’ personal cloud folders such as Dropbox folders and Google Drive. Enabling direct downloads to publisher’s versions of academic articles originally published online as paywalled articles circumvents the paywall, and renders said paywall (by definition a barrier) invisible to users. The direct link disconnects the PDF from the online record or full text HTML version on the journal’s web site. The article simply downloads onto our computers. Some think that if one is an Open Access advocate then one should be happy that research is being made available at all. To me it demonstrates a lack of understanding or willingness to be honest about the very restrictions that slow down the dissemination of scholarly work. Circumventing paywalls does not help to communicate why open access and therefore open licenses are so urgent in the first place.

It seems to me that in academic publishing it is indeed easier to ask forgiveness than it is to get permission, which is why there is widespread sharing of direct links to publisher’s version PDFs. Open Access seeks to legally create a culture of fairer, faster dissemination, exchange and reuse of academic publications, and I’ll argue here that the widespread tolerance of the circumvention of paywalls makes the case for Open Access much harder. If articles still circulate ‘freely’ amongst scholars and some members of the public through these direct links, it will not be clear to users what the problem with paywalled outputs is. As long as one is not in charge of a library budget, that is.

As journals and publishers seen as ‘reputable’ offer very expensive Open Access ‘options’ via article processing charges, many academics do not think twice about sticking to publishing their work as paywalled articles. Institutions, research officers and some (but not all) funders are happy for authors to paywall their outputs. In spite of being paywalled, these articles seem to continue being freely shared amongst those interested. (Grey areas around what is ‘fair dealing’ in educational networks do encourage the inter-academic sharing, at no cost or friction to the receiver, of otherwise-paywalled PDFs.) As a result many don’t blink at the fact that there is a paywall somewhere on the main output online. Win-win, right? I think not.

In the 21st century, awareness of academic publishers’ copyright and self-archiving policies should be a key academic literacy. The publishing process is today, perhaps more than ever before, an integral component of the research life cycle. Publishing is not the work that strangers perform for authors; authors are directly embedded in the process and do take important decisions throughout that determine the ways in which research is produced, disseminated, consumed and potentially adopted or reused.

My view is that we need wider awareness of publishers’ and journals’ self-archiving and licensing policies. It is easy to find out what we as authors and readers can do online with published academic outputs. SHERPA/RoMEO should be bookmarked on every academic’s browser’s tool bar; the user can simply search by journal title or ISSN. Below, as an example, you can see a screenshot of the information I got after searching for a journal’s ISSN:

This is faster than trying to find out this information directly on the journal’s web site, but if there’s doubt it’s always a good idea to double check. The article that got me to check the journal policies above is paywalled, and I came across it as a direct PDF download via a newspaper article. Because of its formatting and layout characteristics and download watermark, the file that I obtained freely via a newspaper article appears to be a ‘Publisher’s Version/PDF’. According to SHERPA/RoMEO, that particular journal requires that the ‘Publisher’s Version/PDF’ is embargoed for 12 months. (Alternatively, it is also possible this particular journal asks authors to use the publisher-generated .pdf as Post-print, but I would have to check.)

The ‘Pre-print’, ‘Post-print’ and even ‘Publisher’s Version/PDF’ terminology can be confusing. Luckily, SHERPA/RoMEO recognised this and has useful information that seeks to clarify what is meant by them (link):

“To try to clarify the situation, this listing characterises pre-prints as being the version of the paper before peer review and post-prints as being the version of the paper after peer-review, with revisions having been made.

This means that in terms of content, post-prints are the article as published. However, in terms of appearance this might not be the same as the published article, as publishers often reserve for themselves their own arrangement of type-setting and formatting. Typically, this means that the author cannot use the publisher-generated .pdf file, but must make their own .pdf version for submission to a repository.

Having said that, some publishers insist that authors use the publisher-generated .pdf – seemingly because the publishers want their material to be seen as a professionally produced .pdf that fits with their own house-style.”

That apparently “some publishers insist that authors use the publisher-generated .pdf” contributes to the opacity of publishers’ licensing information and to authors’ and readers’ confusion about whether we are legally allowed to share the publisher’s version at all (i.e. a ‘version of record’ that features the type-setting, layout and design of the professionally produced publication. For CrossRef’s definition of ‘Version of Record’, see their Glossary).

Some publishers have been known to send take-down notices to some authors, but it seems that publishers either don’t have the capacity to ensure license enforcement or intentionally tolerate the practice. As in other forms of piracy, in the end the sharing of publisher’s versions when the journal’s self-archiving policy does not allow it perpetuates the culture of brand reputation and the wider dissemination helps promote the brand. When good citation practice is followed, the published version’s DOI and/or URL gets clicked on and cited even though it was not the location that enabled access to the full version of the article in the first place. The paywall remains, and the libraries (some libraries) keep paying the subscriptions. Meanwhile, many within academic networks get the papers without ever having to log in to their libraries.

Recently, so-called ‘hybrid’ journals (essentially paywalled journals that also offer Open Access options) have confused things further by making some articles ‘free’ (but not Open Access, due to non-open licensing and the temporary nature of the free access allowed). The journal may be enabling free downloads of an article, but that does not necessarily mean an author or anyone else is allowed to share the PDF freely and indefinitely too. The small print must be read at all times.

If we want more colleagues and students to understand the reasons behind Open Access we need to communicate better what the effects of restrictive policies such as embargoes are. If authors are not respecting the licensing terms they have signed, and therefore fail to see the disadvantages, it is hard to demonstrate why open licensing is needed.

This will sound prescriptive and it is likely to be an unpopular opinion, but if authors decide to publish their work as a paywalled article, then they need to be aware that links to direct downloads of the publisher’s version PDF are most likely not allowed by the journal’s policy. Otherwise, what is the point of the paywall and of reserving all rights?

If authors want to share freely not pre-prints or post-prints but the publisher’s versions of their newly published articles, they can seek funding to pay (or seek waivers for) the Article Processing Charges (APCs) for Open Access options. Even better, authors could choose to submit to non-APC, fully-fledged Open Access journals. In other words, if authors want to share the shiny, professionally-produced PDF of their article, they should ensure they have submitted to a journal with a copyright and self-archiving policy that allows such sharing.

Perhaps one of the main myths to debunk around Open Access is that it is an anti-copyright stance. It is quite the opposite. It is precisely because of an acute awareness of copyright and self-archiving policies that Open Access seeks to ensure there are legal and technological frameworks to enable academics to publish under more flexible paradigms. As long as the real, pragmatic obstacles to accessing and reusing academic research are avoided by most academics, it does feel like Open Access has a long, long road ahead.