Say you want to extract the yearly citation counts from someone's Google Scholar page (here, Jeremy Baumberg's):
It could of course be copied by hand (each value appears as a tooltip when you hover over its bar), but this is impractical if you want to do it for several people (something I had to do for PLMCN24).
Peeking at the page source, the data is in plain sight, enclosed in span elements of class gsc_g_al. This, for instance, is Baumberg's number of citations in 2009:
<span class="gsc_g_al">1080</span>
The following command therefore extracts these numbers from the saved page (here _Jeremy J. Baumberg_ - _Google Scholar_.html, the name proposed by default when saving it) and writes them, one per line, to the file Baumberg.txt:
grep -oP '(?<=gsc_g_al">).*?(?=</)' _Jeremy\ J.\ Baumberg_\ -\ _Google\ Scholar_.html > Baumberg.txt
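The -P (Perl regex) flag is not available in every grep build, so here is a minimal Python sketch of the same extraction, reusing the exact lookbehind/lookahead pattern of the grep command. The names COUNT_PATTERN and extract_counts are hypothetical, and it assumes every matched value is a plain integer:

```python
import re

# Same pattern as the grep command: the text between 'gsc_g_al">' and the next '</'
COUNT_PATTERN = re.compile(r'(?<=gsc_g_al">).*?(?=</)')


def extract_counts(html: str) -> list[int]:
    """Return the yearly citation counts in page order (oldest year first)."""
    return [int(match) for match in COUNT_PATTERN.findall(html)]


# Example with the 2009 snippet quoted above
print(extract_counts('<span class="gsc_g_al">1080</span>'))  # [1080]
```

On the saved page, extract_counts(open(filename, encoding="utf-8").read()) should return the same numbers that grep writes to Baumberg.txt.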
The years could be extracted similarly, although each one carries a varying style attribute that would require further filtering:
<span class="gsc_g_t" style="right:451px">2009</span><span class="gsc_g_t" style="right:419px">2010</span>
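That filtering can be handled by a regular expression that skips over whatever attributes follow the class. A minimal Python sketch, assuming the attribute order shown in the snippet above (class first, then style); YEAR_PATTERN and extract_years are hypothetical names:

```python
import re

# Match the year inside any span of class gsc_g_t, whatever attributes follow the class
YEAR_PATTERN = re.compile(r'<span class="gsc_g_t"[^>]*>(\d{4})</span>')


def extract_years(html: str) -> list[int]:
    """Return the years labelling the histogram bars, in page order."""
    return [int(year) for year in YEAR_PATTERN.findall(html)]


snippet = ('<span class="gsc_g_t" style="right:451px">2009</span>'
           '<span class="gsc_g_t" style="right:419px">2010</span>')
print(extract_years(snippet))  # [2009, 2010]
```

Pairing this list element by element with the extracted counts gives the dated histogram directly, provided both lists have the same length.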
It is probably easier (it was in my case) to reconstruct the year axis backward from the number of values returned, since the last value belongs to the current year and you are presumably collecting all the pages within the same year!
I did that with the following Mathematica code:
(* pair each count with its year, the last count belonging to year *)
DateThisList[list_, year_] := Transpose[{Range[year - Length[list] + 1, year], list}]
And that's how I processed the files:
fncit = FileNames["*.txt"];
Do[cit[FileBaseName@fncit[[i]]] = DateThisList[Flatten[Import[fncit[[i]], "CSV"]], 2023], {i, Length[fncit]}]
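For those without Mathematica, here is a Python sketch of DateThisList and of the loop over the files. It assumes the .txt files sit in one folder and hold one count per line, as produced by grep; date_this_list and load_citations are hypothetical names, and the counts in the example are made up:

```python
from pathlib import Path


def date_this_list(counts: list[int], year: int) -> list[tuple[int, int]]:
    """Pair each count with its year, assuming the last count belongs to `year`."""
    first_year = year - len(counts) + 1
    return list(zip(range(first_year, year + 1), counts))


def load_citations(folder: str, year: int) -> dict[str, list[tuple[int, int]]]:
    """Read every *.txt file (one count per line) into {base name: dated counts}."""
    cit = {}
    for path in sorted(Path(folder).glob("*.txt")):
        counts = [int(token) for token in path.read_text().split()]
        cit[path.stem] = date_this_list(counts, year)  # Baumberg.txt -> "Baumberg"
    return cit


print(date_this_list([10, 20, 30], 2023))
# [(2021, 10), (2022, 20), (2023, 30)]
```

As in the Mathematica version, the dictionary keys are the file base names, so Baumberg.txt is stored under "Baumberg".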
Unfortunately, not all years are shown (the earliest ones are cut off), but the total number of citations is given. Baumberg, for instance, has 517 citations from before 1998:
43711 - Total[cit["Baumberg"][[All, 2]]]
(* 517 *)
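The same bookkeeping as a small Python sketch, assuming dated is the list of (year, count) pairs sorted oldest first and total is the all-time total read off the profile; citations_before_histogram is a hypothetical name and the example numbers are made up:

```python
def citations_before_histogram(total: int, dated: list[tuple[int, int]]) -> tuple[int, int]:
    """Return (first year shown, citations from before that year).

    The citations missing from the histogram are the profile total minus
    the sum of the yearly counts that are shown.
    """
    first_year = dated[0][0]
    shown = sum(count for _, count in dated)
    return first_year, total - shown


print(citations_before_histogram(100, [(2022, 30), (2023, 50)]))  # (2022, 20)
```

For Baumberg, total would be 43711 and dated his reconstructed histogram.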
These are, for instance, the citation counts of all the people nominated at least twice by the PLMCN24 program committee:
The same can be done for the yearly citations of a single paper, using the gsc_oci_g_al class instead (here on the paper's page, saved as paper-citations.html):
grep -oP '(?<=gsc_oci_g_al">).*?(?=</)' paper-citations.html
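It is interesting to compare scientists pairwise, by taking the ratio of their yearly citations over the years both histograms cover:
ratioCitations[name1_, name2_] := Module[{min},
  (* number of years covered by both histograms *)
  min = Min[Length[cit[name1]], Length[cit[name2]]];
  (* year-by-year ratio over those most recent years, oldest first *)
  cit[name1][[-min ;;, 2]]/cit[name2][[-min ;;, 2]]]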
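A Python sketch of the pairwise ratio, working on (year, count) lists like those built by DateThisList. One deliberate difference from ratioCitations: it aligns the two series by year rather than by position, which gives the same result when both histograms end on the same year. A year where the second scientist has zero citations yields None instead of a division error. ratio_citations is a hypothetical name and the example counts are made up:

```python
from typing import Optional


def ratio_citations(
    dated1: list[tuple[int, int]], dated2: list[tuple[int, int]]
) -> list[tuple[int, Optional[float]]]:
    """Yearly ratio of citations of scientist 1 to scientist 2.

    Only years present in both series are kept, oldest first.
    """
    counts2 = dict(dated2)
    return [
        (year, count / counts2[year] if counts2[year] else None)
        for year, count in dated1
        if year in counts2
    ]


print(ratio_citations([(2021, 10), (2022, 20), (2023, 30)], [(2022, 40), (2023, 60)]))
# [(2022, 0.5), (2023, 0.5)]
```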
There is much to extract from this. One compelling feature is the point at which you become "established" or "settled" in your field, as measured by when your ratio to a more senior author (in my case, my Ph.D. advisor) stops fluctuating wildly. For me, that happened around 2009.
It matters less whether you plateau, increase, or decrease relative to the reference: large fluctuations mean you are still at the early-career stage, while once they smooth out, you have probably penetrated your market.
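To put a number on "settling", one hypothetical criterion is sketched below: compute the year-over-year relative change of the ratio and report the first year from which it never again exceeds a chosen threshold. This is an illustration, not the method used for the plots here; the 20% default threshold is an arbitrary assumption to tune by eye, settling_year is a hypothetical name, and the example ratios are made up:

```python
from typing import Optional


def settling_year(ratios: list[tuple[int, float]], threshold: float = 0.2) -> Optional[int]:
    """First year from which every year-over-year relative change of the ratio
    stays within `threshold` (0.2 = 20%). Returns None if it never settles.

    `ratios` holds (year, ratio) pairs, oldest first; all ratios are assumed positive.
    """
    # Relative change from each year to the next, tagged with the earlier year
    changes = [
        (y1, abs(r2 - r1) / abs(r1))
        for (y1, r1), (_, r2) in zip(ratios, ratios[1:])
    ]
    settled_from = None
    for year, change in changes:
        if change > threshold:
            settled_from = None  # a large jump restarts the search
        elif settled_from is None:
            settled_from = year
    return settled_from


example = [(2005, 1.0), (2006, 2.0), (2007, 0.5), (2008, 0.55), (2009, 0.6), (2010, 0.62)]
print(settling_year(example))  # 2007
```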
Here is the Mathematica Notebook if you want to play with your own scientists.