From about 2003, when I started to compile scientific references, till now, I have been entering references in my sci.bib bibTeX file by hand! That included 3549 entries according to bibtex-count-entries. This stops today.
I have always suspected this should not be terribly difficult to do, but I got bullied into doing it by Eduardo, who saw me entering one such bibliographic record and told me that zotero does it automatically. I had to explain that this does not do quite what I want and need. In fact, better tools like doi2bib.org almost bring me there but, again, not quite, since my format is quite strict.
So on a very hot Sunday, I resolved to hack the code to do it. It turned out, not surprisingly either, to be more complicated than I had expected, but a day against 20 years tipped the balance against me and my laziness. Here are the details of the script:
The bibliographic information is first obtained from curl:
curl -LH "Accept: application/json" http://dx.doi.org/[...] > from_doi.json
where [...] contains the actual doi, e.g.,
laussy@covid:~$ curl -LH "Accept: application/json" http://dx.doi.org/10.1038/lsa.2015.123 > from_doi.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   217  100   217    0     0    927      0 --:--:-- --:--:-- --:--:--   927
100 16695    0 16695    0     0  22274      0 --:--:-- --:--:-- --:--:-- 22274
I then rely on jq to parse it.
This produces the list of authors:
<from_doi.json jq -r '.author|map([.given,.family]|join(" "))|join(" and ")'
with output:
David Colas and Lorenzo Dominici and Stefano Donati and Anastasiia A Pervishko and Timothy CH Liew and Ivan A Shelykh and Dario Ballarini and Milena de Giorgi and Alberto Bramati and Giuseppe Gigli and Elena del Valle and Fabrice P Laussy and Alexey V Kavokin and Daniele Sanvitto
that is, with full first names. That may be a good thing, but as a rule I do not keep that information in my file (although this should probably be sorted out at the bst level). To enforce initials anyway:
<from_doi.json jq -r '.author|map([.given,.family]|join(" "))|join(" and ") | splits (" and ")' | awk '{for(i=1; i<NF; i++){printf substr($i,1,1) ". "} print($NF)}'
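The one-liner above prints one abbreviated author per line. For a bibTeX author field, the names need to be joined back with " and ". A minimal sketch of that step, assuming one "Given Family" name per line on standard input; the function name abbreviate_authors is hypothetical and not part of the original script:

```shell
# Hypothetical helper (not from the original script): read one
# "Given Family" name per line, reduce every word except the last
# to an initial, and join all names with " and " on a single line.
abbreviate_authors() {
  awk '{
    name = ""
    for (i = 1; i < NF; i++) name = name substr($i, 1, 1) ". "
    name = name $NF
    out = (NR == 1) ? name : out " and " name
  }
  END { if (NR > 0) print out }'
}

# Three names taken from the author list above:
printf '%s\n' "David Colas" "Fabrice P Laussy" "Elena del Valle" | abbreviate_authors
```

With jq, the input could presumably be produced by something like `<from_doi.json jq -r '.author[] | "\(.given) \(.family)"'`. Note that particles such as "del" are reduced to "d." here, so a later name-fixing pass is still needed.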
The other bibliographic information is more straightforwardly extracted:
<from_doi.json jq '.title'
<from_doi.json jq '."container-title-short"'
<from_doi.json jq '.published."date-parts"[0][0]'
<from_doi.json jq '.volume'
<from_doi.json jq '.page'
<from_doi.json jq '.DOI'
Quite regrettably, some journals replace the page with a so-called article-number:
<from_doi.json jq '."article-number"'
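Both cases can be handled in one call with jq's alternative operator `//`, which falls through to its right-hand side when the left-hand side is null or false. This is a sketch only: the function name page_or_article_number is hypothetical, and the two JSON records are placeholders standing in for from_doi.json, not real metadata:

```shell
# Hypothetical helper (not from the original script): print the page
# if the record has one, otherwise the article number, otherwise nothing.
page_or_article_number() {
  jq -r '.page // ."article-number" // empty' "$@"
}

# Placeholder records, one of each kind:
echo '{"page": "101-110"}' | page_or_article_number
echo '{"article-number": "012345"}' | page_or_article_number
```

Called as `page_or_article_number from_doi.json`, it reads the file instead of standard input.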
One can also use the long name of the journal:
<from_doi.json jq '."container-title"'
To replace that with ISO 4 titles, I prefer to use sed:
<from_doi.json jq -r '."container-title-short"' | sed -f iso4
where iso4 contains things like:
s/J. Phys. B: At. Mol. Phys./jpb/
s/Light Sci Appl/lsa/
s/Phys. Rev. B/prb/
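To see how such a rule file behaves, here is a small self-contained sketch. It writes the same three rules to a temporary file and runs two titles through them. Escaping the dots (so that they match literal periods only) is my variation, and the function name abbreviate_journal is hypothetical:

```shell
# Temporary rule file with the three rules above; the dots are escaped
# here so that each one matches a literal period rather than any character.
rules=$(mktemp)
cat > "$rules" <<'EOF'
s/J\. Phys\. B: At\. Mol\. Phys\./jpb/
s/Light Sci Appl/lsa/
s/Phys\. Rev\. B/prb/
EOF

# Hypothetical helper: apply the rule file to titles read from stdin.
abbreviate_journal() {
  sed -f "$rules"
}

echo "Light Sci Appl" | abbreviate_journal       # matched by the second rule
echo "Some Other Journal" | abbreviate_journal   # no rule matches: passes through unchanged
```

A title without a rule passes through untouched, so a journal missing from the file shows up in the output under its original name.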
The same can (and must) be done for special names too, in particular given the mayhem with Spanish names (which have two surnames) or titles:
<from_doi4.json jq -r '.author|map([.given,.family]|join(" "))|join(" and ") | splits (" and ")' | awk '{for(i=1; i<NF; i++){printf substr($i,1,1) ". "} print($NF)}' | sed -f bibnames
with bibnames containing something like:
s/J. C. L. Carreño/J. C. {L\\'opez Carre\\~no}/
s/E. Z. Casalengua/E. {Zubizarreta Casalengua}/
s/E. d. Valle/E. {del Valle}/
The bibTeX key is, in my case, the first author's name, the last two digits of the year and a letter for lifting degeneracies.
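One way to sketch this key rule in the shell is shown below. Everything in it is an assumption rather than the actual script: the function name make_key, dropping spaces from compound family names, leaving the case of the name untouched, and detecting used keys by searching the .bib file for "{key,":

```shell
# Hypothetical helper (not from the original script).
# Usage: make_key FAMILY YEAR BIBFILE
# Prints FAMILY + last two digits of YEAR + the first letter a..z
# whose resulting key does not yet appear as "{key," in BIBFILE.
make_key() {
  family=$1
  year=$2
  bibfile=$3
  base=$(printf '%s' "$family" | tr -d ' ')   # assumption: drop spaces in the name
  base="$base${year#??}"                      # strip the first two digits of the year
  for letter in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
    if ! grep -qF "{${base}${letter}," "$bibfile" 2>/dev/null; then
      printf '%s%s\n' "$base" "$letter"
      return 0
    fi
  done
  return 1   # all 26 letters are taken
}

# Toy .bib file with a single placeholder entry:
bib=$(mktemp)
printf '@article{Colas15a,\n  title = {placeholder}\n}\n' > "$bib"
make_key "Colas" 2015 "$bib"   # Colas15a is taken, so this prints Colas15b
make_key "Colas" 2016 "$bib"   # prints Colas16a
```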
Now everything can be packed in a perl script to provide the final output. This is (my version of) doi2bib, which works as follows: