The Schwartz factor of a CPAN author is the ratio of the number of distributions to the number of tarballs sitting in his CPAN directory. For example, an author with 10 distributions and 40 tarballs has a factor of 0.25. A low number indicates that it’s probably time for this author to do some clean-up (without fear of losing the old tarballs, as they will always be available via the BackPAN, natch).
As such, I wanted to add a periodic check of my Schwartz factor to my monitoring system. Coming up with a script to extract the information from my CPAN home directory was simple enough:
```perl
#!/usr/bin/perl

# see https://use.perl.org/~brian_d_foy/journal/8314

use strict;
use warnings;
use 5.10.0;

use LWP::Simple qw/ get /;
use List::Util qw/ sum /;

my $author = 'YANICK';

$author =~ s#(.)(.)#$1/$1$2/$&#;    # YANICK => Y/YA/YANICK

# get() returns undef on failure, so bail out early in that case
my $page = get "https://search.cpan.org/CPAN/authors/id/$author"
    or die "could not fetch the directory listing of $author\n";

# count the tarballs found for each distribution
my %dist;
$dist{$1}++ while $page =~ /<a href="(.*)-v?[\d_.]+\.tar\.gz"/ig;

# number of distributions over total number of tarballs
say "Schwartz factor: ", keys( %dist ) / sum values %dist;

while( my ( $dist, $num ) = each %dist ) {
    say $dist, ' - ', $num;
}
```
This is not exactly the most robust code I’ve ever written (the parsing of the page should be left to HTML::Tree, really), but it does what it’s supposed to do. Depending on which mirror site you hit, the factor may vary a little bit.
But then I thought, why keep the fun offline? So I imported the logic into a GreaseMonkey script and I now have the Schwartz factor of CPAN authors added to their CPAN pages:
The Schwartz is weak with this one.
The script will not work for authors who dropped an index.html in their home directory, or if they use sub-directories, but I expect that they should be more the exception than the rule.
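The counting logic carries over to JavaScript almost verbatim. Here is a minimal sketch of the idea, not the userscript itself: the function name `schwartzFactor` and the sample listing are made up for illustration, and it assumes the page is a plain directory listing with one link per tarball.

```javascript
// Hypothetical helper, for illustration only: compute the Schwartz factor
// from the HTML of an author's directory listing.
function schwartzFactor(html) {
  // Same pattern as the Perl script: capture the distribution name that
  // precedes the version number of each .tar.gz link.
  const tarball = /<a href="([^"]*)-v?[\d_.]+\.tar\.gz"/gi;

  // distribution name => number of tarballs found for it
  const counts = new Map();
  for (const match of html.matchAll(tarball)) {
    const dist = match[1];
    counts.set(dist, (counts.get(dist) || 0) + 1);
  }

  let tarballs = 0;
  for (const n of counts.values()) tarballs += n;

  // No tarball links at all (custom index.html, sub-directories, ...):
  // there is no factor to report.
  if (tarballs === 0) return null;

  // number of distributions over total number of tarballs
  return { factor: counts.size / tarballs, counts };
}

// Made-up listing: two distributions, three tarballs.
const sample = `
<a href="Foo-Bar-0.01.tar.gz">Foo-Bar-0.01.tar.gz</a>
<a href="Foo-Bar-0.02.tar.gz">Foo-Bar-0.02.tar.gz</a>
<a href="Baz-v1.2.3.tar.gz">Baz-v1.2.3.tar.gz</a>
`;
const result = schwartzFactor(sample);
console.log("Schwartz factor:", result.factor);
```

In a userscript, the HTML could come from something like `document.body.innerHTML`, and the `null` return value covers the cases above where no tarball links show up on the page.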
The GreaseMonkey script is available on the userscripts.org site, and on GitHub.