In a script like doi2bib, you'd want to have a simple
$bibkey =~ tr/àáâãäåèéêëìíîïòóôõöùúûüçñ/aaaaaaeeeeiiiiooooouuuucn/;
to get rid of all the pesky accents; but with Unicode, that doesn't work so well.
This is because the string still holds raw UTF-8 bytes (ñ, for instance, is the two bytes c3 b1), and tr/// translates it one byte at a time rather than one character at a time; run on pdf = {sci/lópezcarreño25a}, it gives:
pdf = {sci/lanpezcarreano25a},
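To see the mangling for yourself, here is a minimal repro (a hypothetical standalone script, assuming the file is saved as UTF-8 without use utf8, so both the string and the tr/// lists are raw bytes):

#!/usr/bin/perl
use strict;
use warnings;
my $bibkey = "sci/l\xc3\xb3pezcarre\xc3\xb1o25a"; # "sci/lópezcarreño25a" as raw bytes
# the search list is 50 bytes but the replacement only 25, so every search
# byte past the 25th maps to the last replacement character, 'n', while the
# shared lead byte c3 maps to 'a': both ó and ñ come out as "an"
$bibkey =~ tr/àáâãäåèéêëìíîïòóôõöùúûüçñ/aaaaaaeeeeiiiiooooouuuucn/;
print "$bibkey\n"; # sci/lanpezcarreano25a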
The substitution
$bibkey =~ s/ó/o/g;
works (here the pattern and the string are the same byte sequence), but it doesn't scale: you need one line per accented character. Nightmare!
A way out, which I implemented in v0.8.0 of doi2bib, is to decode the bytes with decode_utf8 and then apply NFKD normalization: it handles every accented character (ä, ü, ç, å, etc.) in one go and is fairly straightforward. In the preamble, add:
use Encode qw(decode_utf8);
use Unicode::Normalize qw(NFKD);
and, at the point where you sanitize the string:
$bibkey = decode_utf8($bibkey); # interpret bytes as UTF-8
$bibkey = NFKD($bibkey); # decompose: ó becomes o + combining accent
$bibkey =~ s/\p{NonspacingMark}//g; # remove diacritics
$bibkey =~ s/[^\x00-\x7F]//g; # safety: strip any remaining non-ASCII
What is done here is: decode_utf8 turns the raw bytes into a proper Perl character string; NFKD (compatibility decomposition) splits each accented character into its base letter followed by a combining mark; the first substitution deletes those combining marks (Unicode category NonspacingMark); and the last one strips any leftover characters that have no ASCII decomposition at all (ø, æ, ...). And this works.
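Putting it all together, a minimal self-contained sketch (not the actual doi2bib code; the sample string is made up):

#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode_utf8);
use Unicode::Normalize qw(NFKD);
my $bibkey = "sci/l\xc3\xb3pezcarre\xc3\xb1o25a"; # raw UTF-8 bytes
$bibkey = decode_utf8($bibkey); # bytes -> characters
$bibkey = NFKD($bibkey); # ó -> o + U+0301, ñ -> n + U+0303
$bibkey =~ s/\p{NonspacingMark}//g; # drop the combining accents
$bibkey =~ s/[^\x00-\x7F]//g; # strip any leftover non-ASCII
print "$bibkey\n"; # sci/lopezcarreno25a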