The scholarref tools: never deal with journal webpages again

Rationale

During the writing phase of an academic paper, common tasks include downloading PDFs of publications and getting their references into your bibliography. However, I am not a fan of navigating the slow, bloated, tracker-filled, and distracting webpages of academic journals and publication aggregators. For some reason, many publishers decided that clicking the "Download PDF" link should redirect the user to an unusable in-browser PDF viewer instead of providing the PDF file directly. While the majority of journal webpages provide formatted citations for their publications, these are inconsistent in style and content.

For these reasons, I constructed a set of shell tools called scholarref that allow me to perform most of these tasks without opening a web browser. As the title of this post indicates, the goal of the toolset is to provide as much as possible of the functionality a person might need during scientific writing without leaving the command line. The tools are under continuous development. At present I avoid roughly 90% of visits to journal webpages. I hope to get to 100% someday.

The scholarref design goals are the following:

  • Written as POSIX shell scripts with minimal external dependencies: Ensures maximum flexibility and portability.
  • Aim for simplicity: Fewer lines of code make the programs easier to understand, maintain, and debug.
  • Each tool should do one thing, and do it well: Let the users piece the components together to fit their workflow.
  • Return references in BibTeX format.

DISCLAIMER: The functionality provided by these programs depends on communication with third-party webpages, which may or may not be permitted by law and by the terms of service of those third parties. What is demonstrated here are examples only. Use of the tools is entirely your own responsibility.

Installation

$ git clone git://src.adamsgaard.dk/scholarref
$ cd scholarref
# make install

The make install command may require superuser privileges to install the tools to /usr/local. Prefix it with doas or sudo, whichever is appropriate for the target system.

The scholarref toolset

The core functionality is provided by the scripts getdoi, getref, and shdl. All programs accept input as command-line arguments or from standard input (stdin). The programs come with several options, and I encourage you to explore the help text (invoke with option -h). The -t option may be of particular interest, since it tunnels all communication through Tor via torsocks (if available on the system).

getdoi

This tool accepts either names of PDF files or arbitrary search queries. If a PDF file name is supplied, getdoi scans the PDF text in order to find the first occurring DOI entry, which typically is the DOI of the publication itself. If an arbitrary query is supplied, the CrossRef API is used to find the DOI of the closest publication match. You can supply author names, parts of the title, ORCID, journal name, etc. Examples:

$ getdoi damsgaard2018.pdf
10.1029/2018ms001299
$ getdoi 'damsgaard sergienko adcroft journal advances modeling earth systems'
10.1029/2018ms001299
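
The PDF branch can be approximated with a short pipeline. The following is a minimal sketch and not the actual getdoi source: it assumes the PDF text has already been extracted to plain text (for instance with pdftotext, which is an assumption here), and the function name first_doi and the regular expression are illustrative only. It also relies on grep -o, which GNU and BSD grep provide but POSIX does not require.

```sh
#!/bin/sh
# first_doi: print the first DOI-like string found in the text on stdin.
# Hypothetical helper for illustration. The pattern is a common heuristic
# for DOIs of the form 10.<registrant>/<suffix>, not an exhaustive matcher.
first_doi() {
	grep -oE '10\.[0-9]{4,9}/[^[:space:]"<>]+' |
		head -n 1 |                     # keep the first match only
		sed 's/[.,;)]*$//' |            # drop trailing sentence punctuation
		tr '[:upper:]' '[:lower:]'      # DOIs are case-insensitive
}

# Example input; in practice the text could come from `pdftotext file.pdf -`
printf 'Received 1 Jan 2018\nhttps://doi.org/10.1029/2018MS001299\nSee also 10.1002/2016GL071579.\n' | first_doi
# prints 10.1029/2018ms001299
```

Taking the first match mirrors the observation above that the first DOI in a paper is typically its own, but the heuristic fails for PDFs that cite another DOI on the cover page.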

getref

The getref tool fetches the BibTeX citation for a given DOI from doi.org. By default, the journal names and author first names are abbreviated, which is what most journals want. I have taken abbreviations from the Caltech Library list of Journal Title Abbreviations. The getref ruleset of journal-title abbreviations is incomplete, and is expanded on a per-need basis. If desired, the abbreviation functionality can be disabled. See getref -h for details.

$ getref 10.1029/2018ms001299
@article{Damsgaard2018,
        doi = {10.1029/2018ms001299},
        year = 2018,
        publisher = {American Geophysical Union ({AGU})},
        volume = {10},
        number = {9},
        pages = {2228--2244},
        author = {A. Damsgaard and A. Adcroft and O. Sergienko},
        title = {Application of Discrete Element Methods to Approximate Sea Ice Dynamics},
        journal = {J. Adv. Mod. Earth Sys.}
}
$ getref -j 10.1029/2018ms001299   # do not abbreviate journal title
@article{Damsgaard2018,
        doi = {10.1029/2018ms001299},
        year = 2018,
        publisher = {American Geophysical Union ({AGU})},
        volume = {10},
        number = {9},
        pages = {2228--2244},
        author = {A. Damsgaard and A. Adcroft and O. Sergienko},
        title = {Application of Discrete Element Methods to Approximate Sea Ice Dynamics},
        journal = {Journal of Advances in Modeling Earth Systems}
}
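
For readers curious about the mechanics, the two steps can be sketched in a few lines of shell. This is not the getref source. It assumes curl is installed, uses the content negotiation offered by doi.org (requesting the application/x-bibtex media type), and applies a toy abbreviation ruleset with sed. The function names fetch_bibtex and abbreviate_journal are hypothetical.

```sh
#!/bin/sh
# fetch_bibtex: request a BibTeX record for a DOI through doi.org content
# negotiation. Hypothetical helper; needs curl and network access.
fetch_bibtex() {
	curl -sL -H 'Accept: application/x-bibtex' "https://doi.org/$1"
}

# abbreviate_journal: rewrite selected journal titles in BibTeX on stdin.
# Only two illustrative rules are listed; a real ruleset would be far longer.
abbreviate_journal() {
	sed \
		-e 's/{Journal of Advances in Modeling Earth Systems}/{J. Adv. Mod. Earth Sys.}/' \
		-e 's/{Geophysical Research Letters}/{Geophys. Res. Lett.}/'
}

# Offline demonstration on a single BibTeX field
printf 'journal = {Geophysical Research Letters}\n' | abbreviate_journal
# prints journal = {Geophys. Res. Lett.}
```

With network access, the two functions could be combined as fetch_bibtex 10.1029/2018ms001299 | abbreviate_journal. The raw record returned by doi.org may differ in key and field formatting from the getref output shown above.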

shdl

This tool takes a DOI as input and attempts to download the corresponding publication as a PDF through sci-hub. Unfortunately, the sci-hub web interface often puts up CAPTCHAs to restrict automated downloads. If that is the case, shdl opens the Tor Browser (if installed) or the system web browser so that the download can be completed manually. Output PDF files are saved in the present working directory.

Usage examples

The scholarref tools are meant to be chained together. For example, if you want a BibTeX reference for a search query, simply use UNIX pipes to send the getdoi output as input to getref:

$ getdoi 'damsgaard egholm ice flow dynamics' | getref
@article{Damsgaard2016,
        doi = {10.1002/2016gl071579},
        year = 2016,
        publisher = {American Geophysical Union ({AGU})},
        volume = {43},
        number = {23},
        pages = {12,165--12,173},
        author = {A. Damsgaard and D. L. Egholm and L. H. Beem and S. Tulaczyk and N. K. Larsen and J. A. Piotrowski and M. R. Siegfried},
        title = {Ice flow dynamics forced by water pressure variations in subglacial granular beds},
        journal = {Geophys. Res. Lett.}
}
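
The same chaining extends to loops. As a minimal sketch, the hypothetical wrapper below (refs_for_pdfs is not part of scholarref) prints a BibTeX entry for every PDF given as an argument. It assumes getdoi and getref are in $PATH, and it skips a file whenever getdoi exits with an error or prints nothing.

```sh
#!/bin/sh
# refs_for_pdfs: print a BibTeX entry for each PDF file given as argument.
# Hypothetical wrapper; relies on getdoi and getref being in $PATH.
refs_for_pdfs() {
	for f in "$@"; do
		doi=$(getdoi "$f") || continue   # skip files where getdoi fails
		[ -n "$doi" ] || continue        # skip files without a DOI
		printf '%s\n' "$doi" | getref    # getref reads the DOI from stdin
	done
}

# Usage: refs_for_pdfs *.pdf
```

Keeping the loop outside the tools is in line with the design goal of letting users piece the components together themselves.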

The scholarref program itself is an aggregation of the getdoi and getref commands. If called with the -a option, the reference is inserted directly into the system bibliography. The full path to the bibliography file (.bib) is assumed to be set in the $BIB environment variable, for instance defined in the user's ~/.profile.

$ echo $BIB
/home/ad/articles/own/BIBnew.bib
$ scholarref -a 'damsgaard egholm ice flow dynamics'
Citation Damsgaard2016 added to /home/ad/articles/own/BIBnew.bib
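
The append step itself is simple enough to sketch. The function below is a hypothetical illustration, not what scholarref -a actually runs: it reads one BibTeX entry on stdin, extracts the citation key from the first line, and appends the entry to the file named by $BIB unless that key is already present. Whether the real tool checks for duplicates is not something this sketch claims.

```sh
#!/bin/sh
# bib_add: append the BibTeX entry on stdin to the file named by $BIB,
# unless an entry with the same citation key already exists there.
# Hypothetical helper; scholarref -a may behave differently.
bib_add() {
	entry=$(cat)
	# Citation key: text between '{' and ',' on the first line of the entry
	key=$(printf '%s\n' "$entry" | sed -n '1s/^@[A-Za-z]*{\([^,]*\),.*$/\1/p')
	[ -n "$key" ] || { echo 'no citation key found' >&2; return 1; }
	if [ -f "$BIB" ] && grep -qF "{$key," "$BIB"; then
		echo "Citation $key already in $BIB" >&2
		return 1
	fi
	printf '%s\n' "$entry" >> "$BIB"
	echo "Citation $key added to $BIB"
}

# Usage: getdoi 'damsgaard egholm ice flow dynamics' | getref | bib_add
```

A line such as export BIB="$HOME/articles/own/BIBnew.bib" in ~/.profile makes the variable available to the function and to editors started from the shell.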

Integrating into your favorite $EDITOR

The scholarref tool is particularly useful when called from within a text editor. Below I demonstrate how key bindings can be set up in various editors to provide scholarref functionality.

vi

My editor of choice is the plain, old, and simple vi(1). I have the following binding in my ~/.exrc, including a trailing space:

map qr :r !scholarref 

The rest of my editor configuration can be found under my dotfiles source code repository.

vim

You can add the following bindings to ~/.vimrc or ~/.vim/vimrc in order to get scholarref functionality within vim(1):

" insert reference into current buffer
nnoremap <leader>r :r !scholarref<space>
" append reference to the $BIB file
nnoremap <leader>R :r !scholarref --add<space>

vis

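The vis(1) editor is an interesting combination of modal editing and structural regular expressions from the Plan 9 editor sam(1). If desired, add the following binding to ~/.config/vis/visrc.lua. The snippet assumes that a Lua variable named leader, holding the prefix key, is defined earlier in the file: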

vis:map(vis.modes.NORMAL, leader..'r', ':< scholarref ')

emacs

Don't know, figure it out yourself.

Integrating into your PDF viewer

My PDF viewer of choice is zathura(1), which has a minimal graphical user interface and is keyboard-centric. The following configuration calls getdoi on the currently open file if I press Ctrl-i. The resultant DOI is copied to the clipboard. Similarly, Ctrl-s tries to extract the DOI in the same manner, but fetches the accompanying reference and adds it directly to the bibliography.

map <C-i> feedkeys ":exec getdoi --notify --clip '$FILE'<Return>"
map <C-s> feedkeys ":exec scholarref --add '$FILE'<Return>"

My full zathura configuration is available here.

Questions/bugs/feedback/improvements

Please get in touch if you encounter any. Improvement suggestions are best sent as patches by e-mail.