Your Schwartz Factor on your CPAN Page


The Schwartz factor of a CPAN author is the ratio of the number of distributions to the number of tarballs sitting in his CPAN directory. For example, 10 distributions spread over 40 tarballs give a factor of 0.25. A low number indicates that it’s probably time for this author to do some clean-up (without fear of losing the old tarballs, as they will always be available via the BackPAN, natch).

As such, I wanted to add a periodic check of my Schwartz factor to my monitoring system. Coming up with a script to extract the information from my CPAN home directory was simple enough:

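#!/usr/bin/perl
# see https://use.perl.org/~brian_d_foy/journal/8314
use strict;
use warnings;
use 5.010;
use LWP::Simple qw/ get /;
use List::Util qw/ sum /;

my $author = 'YANICK';
$author =~ s#(.)(.)#$1/$1$2/$&#;  # YANICK => Y/YA/YANICK

# get() returns undef on failure
my $page = get "https://search.cpan.org/CPAN/authors/id/$author"
    or die "could not fetch the directory listing for $author\n";

# count the tarballs of each distribution; [^"]* keeps the match inside one href
my %dist;
$dist{$1}++  while $page =~ /<a href="([^"]*)-v?[\d_.]+\.tar\.gz"/ig;
die "no tarballs found for $author\n" unless %dist;

# number of distributions over number of tarballs
say "Schwartz factor: ", keys( %dist) / sum values %dist;
while( my ( $dist, $num ) = each %dist ) {
    say $dist, ' - ', $num;
}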

This is not exactly the most robust code I’ve ever written (the parsing of the page should really be left to HTML::Tree), but it does what it’s supposed to do. Depending on which mirror you hit, the factor may vary a little.
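For the curious, here is a rough sketch of what the HTML::Tree route could look like. It assumes HTML::TreeBuilder (part of the HTML-Tree distribution) is installed. The `tarballs_per_dist` helper and the sample listing are made up for illustration; this is not what the script above or the GreaseMonkey version actually uses.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use HTML::TreeBuilder;
use List::Util qw/ sum /;

# Count the tarballs of each distribution found in an HTML directory listing.
# (Hypothetical helper, named for this example only.)
sub tarballs_per_dist {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my %dist;
    for my $link ( $tree->look_down( _tag => 'a' ) ) {
        my $href = $link->attr('href') // next;
        # Foo-Bar-1.02.tar.gz => Foo-Bar
        $dist{$1}++ if $href =~ /^(.+)-v?[\d_.]+\.tar\.gz$/i;
    }
    $tree->delete;    # release the parse tree explicitly
    return %dist;
}

# Made-up listing standing in for an author's directory page.
my $html = <<'END';
<html><body>
<a href="CHECKSUMS">CHECKSUMS</a>
<a href="Foo-Bar-0.01.tar.gz">Foo-Bar-0.01.tar.gz</a>
<a href="Foo-Bar-0.02.tar.gz">Foo-Bar-0.02.tar.gz</a>
<a href="Foo-Bar-v1.0.0.tar.gz">Foo-Bar-v1.0.0.tar.gz</a>
<a href="Baz-1.2_01.tar.gz">Baz-1.2_01.tar.gz</a>
</body></html>
END

my %dist = tarballs_per_dist($html);

# distributions over tarballs, guarding against an empty listing
my $factor = %dist ? keys(%dist) / sum( values %dist ) : 0;
say "Schwartz factor: $factor";    # 2 distributions / 4 tarballs
```

With the sample listing, Foo-Bar has three tarballs and Baz has one, so the factor comes out at 0.5. Feeding it the page fetched with LWP::Simple instead of the heredoc should work the same way, as long as the mirror serves a plain list of links.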

But then I thought, why keep the fun offline? So I imported the logic into a GreaseMonkey script and I now have the Schwartz factor of CPAN authors added to their CPAN pages:

The Schwartz is weak with this one.

The script will not work for authors who dropped an index.html in their home directory, or who use sub-directories, but I expect those cases to be the exception rather than the rule.

The GreaseMonkey script is available on the userscripts.org site, and on GitHub.
