Your Schwartz Factor on your CPAN Page

The Schwartz factor of a CPAN author is the ratio of the number of distinct distributions in their CPAN directory to the number of tarballs sitting there. A low number means that many old releases are lying around, so it’s probably time for the author to do some clean-up (without fear of losing the old tarballs, as they will always be available via the BackPAN, natch).

As such, I wanted to add a periodic check of my Schwartz factor to my monitoring system. Coming up with a script to extract the information from my CPAN home directory was simple enough:

# see
use strict;
use warnings;
use 5.10.0;
use LWP::Simple qw/ get /;
use List::Util qw/ sum /;
my $author = 'YANICK';
$author =~ s#(.)(.)#$1/$1$2/$&#;  # YANICK => Y/YA/YANICK
my $page = get "$author";
my %dist;
$dist{$1}++  while $page =~ /<a href="(.*)-v?[\d_.]+\.tar\.gz"/ig;
say "Schwartz factor: ", keys( %dist) / sum values %dist;
while( my ( $dist, $num ) = each %dist ) {
    say $dist, ' - ', $num;
}

This is not exactly the most robust code I’ve ever written (the parsing of the page should really be left to HTML::Tree), but it does what it’s supposed to do. The factor may vary a little depending on which mirror site you hit.

But then I thought, why keep the fun offline? So I imported the logic into a GreaseMonkey script and I now have the Schwartz factor of CPAN authors added to their CPAN pages:

The Schwartz is weak with this one.

The script will not work for authors who dropped an index.html in their home directory, or if they use sub-directories, but I expect that they should be more the exception than the rule.
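For the curious, here is a minimal sketch of how the same counting logic might look in JavaScript. It is not the published Greasemonkey script: the function name `schwartzFactor` and its return shape are made up for illustration, and it assumes the page is a plain directory listing whose tarball links look like `<a href="Name-1.23.tar.gz">`.

```javascript
// Hypothetical helper, not taken from the actual Greasemonkey script.
// Takes the HTML of an author's directory listing and returns the factor
// (distinct distributions / tarballs) plus the per-distribution counts.
function schwartzFactor(html) {
  // Same idea as the Perl regex: capture the distribution name that
  // precedes the version number in each tarball link.
  const tarball = /<a href="([^"]*)-v?[\d_.]+\.tar\.gz"/gi;
  const counts = new Map();
  for (const match of html.matchAll(tarball)) {
    counts.set(match[1], (counts.get(match[1]) || 0) + 1);
  }

  let tarballs = 0;
  for (const n of counts.values()) tarballs += n;

  // No tarball links found (custom index.html, sub-directories, ...):
  // report nothing rather than dividing by zero.
  if (tarballs === 0) return null;

  return { factor: counts.size / tarballs, counts };
}
```

In a userscript, something like `schwartzFactor(document.body.innerHTML)` would be the natural call, with the result injected back into the page. The `null` return covers the two cases mentioned above, where the listing contains no recognizable tarball links.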

The GreaseMonkey script is available on the site, and on GitHub.
