Scheduling a reboot via puppet after a kernel update with RHEL style distributions

Posted on March 30, 2023April 6, 2023 by lindsey

I’ve had people ask me how to schedule a reboot of their system via puppet if a kernel upgrade was installed. I like to use the ‘at’ command


class yoursite::common::rebootforkernelchange {

   exec {'schedule-kernel-reboot':
      command => '/bin/echo "/usr/sbin/reboot" | at -m 4:00',
      path => '/usr/bin:/bin',
      unless => ['test `atq | wc -l` != 0', 'test `uname -r` = `rpm -q kernel --queryformat "%{RPMTAG_BUILDTIME} %{VERSION}-%{RELEASE}.%{ARCH}\n" | sort -nr | cut -d" " -f 2 | head -1`']
   }
}

This compares the current kernel against the latest built kernel that is installed, and if they don’t match it sets an at job to reboot the system at 4am.

If you want to check the latest installed kernel instead of the latest built kernel, change RPMTAG_BUILDTIME to RPMTAG_INSTALLTIME.

Setting up GitLab as a docker container behind a secure reverse proxy on CentOS 7.x

Posted on October 29, 2020November 2, 2020 by lindsey

Last weekend I decided to install GitLab for Mallorn Computing. I’d been meaning to do it for ages, but I never really got around to it. Surprisingly, I couldn’t find much about the process online, so I thought I’d document and share it.

I dutifully downloaded the installation package to a fairly empty server, ran it, and immediately freaked out. My server wasn’t just going to run GitLab; it was going to have Apache on it, other Redis databases, MariaDB, etc. GitLab didn’t seem to play nice with other applications and took over all kinds of ports. The install eventually failed when it realized it couldn’t bind to port 80 since Apache was already installed, and I was left with a mess (hint: the command to clean things up is gitlab-ctl cleanse).

I really didn’t want to set up a new dedicated server, so I hoped there was a Docker container I could install instead. Thankfully, a quick Google search showed there was.

The first thing I did was to install the Docker container:

docker run --detach --hostname git.mallorn.com --publish 4443:4443 --publish 2211:2211 --name gitlab --restart always gitlab/gitlab-ee:latest

Then I edited the GitLab config file and tried a bunch of settings for a few hours. Ultimately, I figured out which ones worked. Running docker exec -it gitlab vi /etc/gitlab/gitlab.rb opened a vi session editing the configuration file for GitLab. I had to change the variables below to the values you see.

   
   external_url 'https://git.mallorn.com:4443'
   nginx['ssl_client_certificate'] = "/etc/gitlab/trusted-certs/cert.pem"
   nginx['ssl_certificate'] = "/etc/gitlab/trusted-certs/fullchain.pem"
   nginx['ssl_certificate_key'] = "/etc/gitlab/trusted-certs/privkey.pem"
   gitlab_rails['trusted_proxies'] = ["172.16.1.10"]
   nginx['proxy_set_headers'] = {
     "X-Forwarded-Proto" => "https",
     "X-Forwarded-Ssl" => "on"
   }

Note that trusted_proxies is my internal network; set it to the IP of the interface on your proxy server that will talk to your GitLab installation.

Once that was done, I had to edit the ssh configuration file to tell it to listen on port 2211 (I now realize I could have just changed my startup parameters when I started the container, but too late now): I had to run docker exec -it gitlab vi /assets/sshd_config and change the port to 2211, then save the file. The last step was to restart the sshd process with docker exec -it gitlab /etc/init.d/sshd restart.

At this point I needed to import my SSL certificates because I had told GitLab to use certs that didn’t exist yet. I used letsencrypt to generate a certificate, then copied it into my Docker container:


   cd /etc/letsencrypt/live/git.mallorn.com
   docker cp -a -L cert.pem gitlab:/etc/gitlab/trusted-certs
   docker cp -a -L privkey.pem gitlab:/etc/gitlab/trusted-certs
   docker cp -a -L fullchain.pem gitlab:/etc/gitlab/trusted-certs

Your directory will be different; change into the one for your server. After this is done, restart your container by running sudo docker restart gitlab.

The last step was to create /etc/httpd/conf.d/git.mallorn.com.conf:


   <VirtualHost *:80 *:443>

      ServerName git.mallorn.com
      RewriteCond %{HTTPS} !=on
      RewriteRule ^/?(.*) https://%{SERVER_NAME}/$1 [R,L]
           
      SSLEngine on
      SSLHonorCipherOrder on
      SSLProtocol all -SSLv2 -SSLv3 -TLSv1 -TLSv1.1
      SSLCipherSuite "EECDH+ECDSA+AESGCM:EECDH+aRSA+AESGCM:EECDH+ECDSA+SHA384:EECDH+ECDSA+SHA256:EECDH+aRSA+SHA384:EECDH+aRSA+SHA256:EECDH+aRSA!RC4:EECDH:!RC4:!aNULL:!eNULL:!LOW:!3DES:!MD5:!EXP:!PSK:!SRP:!DSS"
      SSLCompression off
      SSLSessionTickets off
      SSLCertificateFile /etc/letsencrypt/live/git.mallorn.com/cert.pem
      SSLCertificateKeyFile /etc/letsencrypt/live/git.mallorn.com/privkey.pem
      SSLCertificateChainFile /etc/letsencrypt/live/git.mallorn.com/chain.pem

      <Proxy *>
         Require all granted
      </Proxy>
   
      ProxyPreserveHost on
           
      SSLProxyEngine on
      SSLProxyVerify none
      SSLProxyCheckPeerCN off
      SSLProxyCheckPeerName off
      SSLProxyCheckPeerExpire off
      ProxyRequests Off
      ProxyPass / https://git.mallorn.com:4443/ nocanon
      ProxyPassReverse / https://git.mallorn.com:4443/ nocanon
      AllowEncodedSlashes NoDecode

      Header edit Location ^http://git.mallorn.com/ https://git.mallorn.com/
      RequestHeader unset Accept-Encoding
      RequestHeader set Host "git.mallorn.com"
      RequestHeader add X-Forwarded-Ssl on
      RequestHeader set X-Forwarded-Proto "https"
           
      LogFormat "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\""
      ErrorLog /var/log/httpd/git.mallorn.com/error
      TransferLog /var/log/httpd/git.mallorn.com/access
   </VirtualHost>

I ran apachectl graceful to reload Apache, and from that point forward git.mallorn.com answered HTTP requests.

Next I had to make ssh work. Create an ssh key:

ssh-keygen -t rsa -b 4096 -f ~/.ssh/gitlab.id_rsa

Then set up ~/.ssh/config to reflect that key:


   host git.mallorn.com
    HostName git.mallorn.com
    Port 2211
    IdentityFile ~/.ssh/mallorn-git.id_rsa
    User [your login in GitLab]

Paste the contents of your ~/.ssh/gitlab.id_rsa.pub file into your key settings in GitLab and you will now be able to check files in and out via ssh.

Finally, set up a cron job to copy your SSL certs into the container regularly. This cron entry (saved as /etc/cron.d/gitlab) copies the certificates in and restarts the container if they have changed.

   5 30 * * * root (cd /etc/letsencrypt/live/git.mallorn.com ; find . -mmin 2 -exec "docker cp -a -L cert.pem gitlab:/etc/gitlab/trusted-certs; docker cp -a -L privkey.pem gitlab:/etc/gitlab/trusted-certs; docker cp -a -L fullchain.pem gitlab:/etc/gitlab/trusted-certs; docker restart gitlab)

Note that you could make the certs available directly to the container instead by starting it with the -v option, and that may ultimately be the route that I take.

Lastly, don’t forget to setup backups!

Randomly selecting records in MySQL… slowly

Posted on October 10, 2019October 11, 2019 by lindsey

A sample of our quick and (until now) painless image vetting page.

Sometimes it’s easy to become tunnel-visioned, and sometimes it takes a really long time to realize that you’ve done it.

This happened to me today. Well, the realization part did — the process started about three months ago.

We have a giant database at hort.net that contains lots of information about lots of images. We want to identify all of the plant images, so I wrote a little web page that displays fifty images at random from within the database, lets you click on the plant photos, and then sends all of your picks to the database when you’re done. It worked great when we only had 100,000 images to search through in the beginning. The page would load in about a second, fifty photos would be inspected and submitted, rinse, repeat.

Unfortunately, the image vetting process is a lot slower than the database population. We’re at a point now where the database has 1,000,000 photos, and that vetting page became very slow. Usually the page would just give us an error, but even running the database query by hand was taking 3-4 minutes. That’s just not acceptable.

Here’s what we started with as a query:

SELECT photo.id, url FROM photo LEFT JOIN vet ON photo.id=vet.id WHERE (vetting = 'none' OR vetting IS NULL) AND upload_date IS NOT NULL AND server > 0 ORDER BY RAND() LIMIT 0,50

That’s the one that worked great in the beginning, but failed with more records. After a little digging, I discovered that the problem was the RAND() function in MySQL. When you use ORDER BY RAND(), MySQL actually retrieves every row and then generates a random value for all of them. That means that I was running through the random number generator 1,000,000 times each time that the page was loaded, even though I was only displaying fifty.

Since the same table contained data for vetted and unvetted images, we couldn’t just randomly select fifty numbers from 1 .. COUNT(photo.id). Continuing down the same path, the fix was to increment a counter and prepend it to our search results.

SELECT photo.id, url FROM photo JOIN (SELECT @rownum := @rownum + 1 as row_number, photo.id FROM photo LEFT JOIN vet ON photo.id=vet.id  CROSS JOIN (SELECT @rownum := 0) r WHERE (vetting = 'none' OR vetting IS NULL) AND upload_date IS NOT NULL AND server > 0) b;

@rownum was now set to the number of matching database entries for the query and each row that was retrieved was numbered, but we still didn’t have a way to randomly select fifty entries that wasn’t slow. Just generating a random number is pretty fast, so telling it to generate fifty random numbers from 1 .. @rownum and then extracting only those entries is faster. I wrote a little script on the calling program to generate a lengthy subquery that was appended into MySQL’s IN function, and this is what we ended up with:

SELECT photo.id, url FROM photo JOIN (SELECT @rownum := @rownum + 1 as row_number, photo.id FROM photo LEFT JOIN vet ON photo.id=vet.id  CROSS JOIN (SELECT @rownum := 0) r WHERE (vetting = 'none' OR vetting IS NULL) AND upload_date IS NOT NULL AND server > 0 ORDER BY photo.id) b ON (photo.id=b.id AND b.row_number IN (CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum))) LIMIT 0,50

Sometimes this was fast(-ish). We’d see two minute load times every once in a while, but often it was around seven seconds. But when the database was under load, things were usually annoyingly slow. I kicked this around in my head for a couple of days, tried different things, and then it hit me (this is where the recognition of tunnel vision occurs, by the way).

I had become fixated on the idea that I needed to randomly select from the entire database, but in reality, I didn’t. Entries were added to the database fairly randomly already (more like chunks of 50 similar plants at a time), but I certainly didn’t need to choose from 1,000,000 plants. Randomly selecting from the first 10000 entries would probably work too. And there you have it. Query times were consistently shortened to under one second, and everything was right with the world again:

SELECT photo.id, url FROM photo JOIN (SELECT @rownum := @rownum + 1 as row_number, photo.id FROM photo LEFT JOIN vet ON photo.id=vet.id  CROSS JOIN (SELECT @rownum := 0) r WHERE (vetting = 'none' OR vetting IS NULL) AND upload_date IS NOT NULL AND server > 0 ORDER BY photo.id LIMIT 10000) b ON (photo.id=b.id AND b.row_number IN (CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum), CEIL(RAND()*@rownum))) LIMIT 0,50

Tightening up security, restricting by country

Posted on June 8, 2015May 9, 2019 by lindsey

Lately we’ve noticed that more and more traffic to our servers involves attempted brute-force logins. That means that a remote computer is connecting to our server and trying to guess usernames and passwords, with different variations of both, and hoping they’ll get in.

Things got to the point where we were seeing hundreds of thousands of login attempts daily for accounts like ‘admin’, ‘hortnet’, ‘wordpress’, variations of my name, and so on. That’s crazy! The attacks were also coming in bursts, so there might be a 30-second lull followed by thirty attempts within a few seconds. We weren’t overly concerned because we have other protections in place like two-factor authentication, but there were so many failed logins that our server performance was starting to suffer.

The first thing we did is what everyone else does: install software that blocks multiple failed attempts. OSSEC is great for that, but it only catches people after they’ve jiggled the door handle a few times. Even with OSSEC installed we were still seeing 50,000 attempts daily.

After studying the logs, a commonality became apparent. Almost all of the attacks were coming from other countries. Russia and China were the worst offenders, but just about every country around the globe was represented. What if we could limit administrative access only to computers based in the US? Nobody outside of the US needs to get in, and MaxMind maintains a database of IP to country code mappings that would work perfectly. Fortunately, they’re gracious enough to share a free version that will meet our needs.

We crafted some scripts and discovered that limiting access from only US systems reduced login attempts by 99.9 percent.

The first thing we did was install ipset on our server and configure the system to use it and our scripts (this is RedHat/CentOS-centric). Cut and paste the section below into your shell.

sh -c "PERL_MM_USE_DEFAULT=1 perl -MCPAN -e 'install Net::CIDR'"
yum install -y ipset perl-Net-CIDR perl-libwww-perl
echo "create -exist US hash:net family inet hashsize 2048 maxelem 65536" >> /etc/sysconfig/ipset
mkdir -p /var/cache/ipset
mkdir -p /usr/services/GeoIP
echo "ipset restore < /var/cache/ipset/us.conf" >> /etc/rc.local
echo "15 12 * * 2 root /usr/services/GeoIP/createUSipset" >> /etc/cron.d/ipset
/etc/rc.d/init.d/ipset start

This creates a set named ‘US’ in ipset that will store the IP ranges of all known US IPs. Note that the actual population of the set is done via /etc/rc.local — this is because /etc/sysconfig/ipset can’t contain a ‘restore’ command. We could store all of our IPs directly in /etc/sysconfig/ipset, but if you want to use ipset for anything different it becomes complicated. Also, note that our cron entry only downloads on Tuesday because that’s the day that MaxMind releases database changes.

Anyhow, once the stuff above is done, paste the section below into your shell to create the script that downloads the database.

cat <<'EOF' > /usr/services/GeoIP/createUSipset
#!/usr/bin/perl

use Net::CIDR;
use LWP::UserAgent;
use v5.10;
$|++;

my $ua    = LWP::UserAgent->new;
my $dir   = qq(/usr/services/GeoIP);
my $file  = qq(GeoIPCountryCSV.zip);
my $url   = qq(https://geolite.maxmind.com/download/geoip/database/$file);
my $cache = qq(/var/cache/ipset/us.conf);

exit if ! ($ua->mirror($url, qq($dir/$file)))->is_success;

open (IPSET, "|ipset restore");
say IPSET qq(create -exist US-new hash:net family inet);
open (COUNTRY, "zcat $dir/$file|");
while (<COUNTRY>) {
   next if $_ !~ /"US"/;
   my ($ipstart, $ipend) = ($_ =~ /^"([^"]+)","([^"]+)",/);
   foreach my $i (Net::CIDR::range2cidr($ipstart .  "-" .  $ipend)) {
      say IPSET qq(add US-new $i);
   }
}
close COUNTRY;
close IPSET;

system qq(ipset swap US US-new);
system qq(ipset save US | sed 's/create US/create -exist US/' > $cache);
system qq(ipset destroy US-new);
EOF

chmod ug+rx /usr/services/GeoIP/createUSipset
/usr/services/GeoIP/createUSipset

Now you have a set populated with US IP addresses — all we have left is to actually use it with our iptables rules. Here’s an example that blocks all non-US traffic to ssh, then saves the changes for a reboot:

# BE CAREFUL RUNNING THIS, AND ONLY DO IT IF YOU UNDERSTAND WHAT YOU'RE DOING
iptables -A INPUT -m set --set US src -p tcp -m tcp --dport 22 -j ACCEPT
iptables -A INPUT -p tcp -m tcp --dport 22 -j DROP
iptables-save

The critical part above is the ‘-m set –set US’ bit; it tells iptables to compare the connecting IP against the set contained within ipset.

Detecting if a remote connection is mobile in Apache

Posted on April 12, 2015April 12, 2015 by lindsey

hort.net-mobile

As many of you have heard, Google is requiring that sites pass their ‘mobile usability’ test if they want to show up in mobile search rankings. After April 22nd, Google will start rolling out changes to their listings, and those who don’t pass the test stand to lose a lot of traffic.

I was pretty concerned until I realized that our biggest problem is all of the small menus around our content. That stuff works fine on a desktop, but fat fingers (compared to a mouse pointer) can’t discern between tightly spaced links.

So… Why not just make all of the menus disappear for mobile users?

It seemed like the easiest way would be to create an environment variable from within Apache. This would let me either use Server Side Includes (SSI) or make programmatic changes in scripts.

The trick, of course, was figuring out which visitors were mobile and which ones weren’t. Setting an environment variable wouldn’t be of much use if I was identifying the wrong visitors.

I ended up finding a great Apache regular expression at detectmobilebrowers.com. Although the regular expression had what I needed, they were using it to rewrite mobile visitors to another URL. I just wanted to set a variable, not redirect people. Here’s what I came up with instead.

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (android|bb\d+|meego).+mobile|avantgo|bada\/|blackberry|blazer|compal|elaine|fennec|hiptop|iemobile|ip(hone|od)|iris|kindle|lge\ |maemo|midp|mmp|mobile.+firefox|netfront|opera\ m(ob|in)i|palm(\ os)?|phone|p(ixi|re)\/|plucker|pocket|psp|series(4|6)0|symbian|treo|up\.(browser|link)|vodafone|wap|windows\ ce|xda|xiino [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(1207|6310|6590|3gso|4thp|50[1-6]i|770s|802s|a\ wa|abac|ac(er|oo|s\-)|ai(ko|rn)|al(av|ca|co)|amoi|an(ex|ny|yw)|aptu|ar(ch|go)|as(te|us)|attw|au(di|\-m|r\ |s\ )|avan|be(ck|ll|nq)|bi(lb|rd)|bl(ac|az)|br(e|v)w|bumb|bw\-(n|u)|c55\/|capi|ccwa|cdm\-|cell|chtm|cldc|cmd\-|co(mp|nd)|craw|da(it|ll|ng)|dbte|dc\-s|devi|dica|dmob|do(c|p)o|ds(12|\-d)|el(49|ai)|em(l2|ul)|er(ic|k0)|esl8|ez([4-7]0|os|wa|ze)|fetc|fly(\-|_)|g1\ u|g560|gene|gf\-5|g\-mo|go(\.w|od)|gr(ad|un)|haie|hcit|hd\-(m|p|t)|hei\-|hi(pt|ta)|hp(\ i|ip)|hs\-c|ht(c(\-|\ |_|a|g|p|s|t)|tp)|hu(aw|tc)|i\-(20|go|ma)|i230|iac(\ |\-|\/)|ibro|idea|ig01|ikom|im1k|inno|ipaq|iris|ja(t|v)a|jbro|jemu|jigs|kddi|keji|kgt(\ |\/)|klon|kpt\ |kwc\-|kyo(c|k)|le(no|xi)|lg(\ g|\/(k|l|u)|50|54|\-[a-w])|libw|lynx|m1\-w|m3ga|m50\/|ma(te|ui|xo)|mc(01|21|ca)|m\-cr|me(rc|ri)|mi(o8|oa|ts)|mmef|mo(01|02|bi|de|do|t(\-|\ |o|v)|zz)|mt(50|p1|v\ )|mwbp|mywa|n10[0-2]|n20[2-3]|n30(0|2)|n50(0|2|5)|n7(0(0|1)|10)|ne((c|m)\-|on|tf|wf|wg|wt)|nok(6|i)|nzph|o2im|op(ti|wv)|oran|owg1|p800|pan(a|d|t)|pdxg|pg(13|\-([1-8]|c))|phil|pire|pl(ay|uc)|pn\-2|po(ck|rt|se)|prox|psio|pt\-g|qa\-a|qc(07|12|21|32|60|\-[2-7]|i\-)|qtek|r380|r600|raks|rim9|ro(ve|zo)|s55\/|sa(ge|ma|mm|ms|ny|va)|sc(01|h\-|oo|p\-)|sdk\/|se(c(\-|0|1)|47|mc|nd|ri)|sgh\-|shar|sie(\-|m)|sk\-0|sl(45|id)|sm(al|ar|b3|it|t5)|so(ft|ny)|sp(01|h\-|v\-|v\ )|sy(01|mb)|t2(18|50)|t6(00|10|18)|ta(gt|lk)|tcl\-|tdg\-|tel(i|m)|tim\-|t\-mo|to(pl|sh)|ts(70|m\-|m3|m5)|tx\-9|up(\.b|g1|si)|utst|v400|v750|veri|vi(rg|te)|vk(40|5[0-3]|\-v)|vm40|voda|vulc|vx(52|53|60|61|70|80|81|83|85|98)|w3c(\-|\ )|webc|whit|wi(g\ |nc|nw)|wmlb|wonu|x700|yas\-|your|zeto|zte\-) [NC]
RewriteRule ^ - [E=MOBILE_TMP:1]
RequestHeader add MOBILEUSER %{MOBILE_TMP}e env=MOBILE_TMP

As you can see, the first three lines came from detectmobilebrowsers.com. The last two lines set the environment variable in a two-step process.

The mod_rewrite line that starts with RewriteRule sets a variable named MOBILE_TMP to ‘1’ if any of the regular expressions match. Unfortunately, Apache won’t pass that environment variable on to CGI scripts. mod_header, however, can pass variables that it sets.

The last line uses this feature in mod_header to set another variable named HTTP_MOBILEUSER to the value of MOBILE_TMP if MOBILE_TMP is set. Note that the variable is called HTTP_MOBILEUSER, not MOBILEUSER. The RequestHeader directive will prepend the string ‘HTTP_’ to whatever environment variable you pick.

Most of our pages at hort.net use SSI, so it’s very easy for me to tell the system to exclude blocks of code for mobile users now. Here’s a sample SSI snippet that only includes the files if it’s a mobile user. If not, they’re excluded.

<!--#if expr="${HTTP_MOBILEUSER} != /1/" -->
   <!--#include virtual="/include/root-top.html"-->
   <!--#include virtual="/include/vertical-spacer8.html"-->
   <!--#include virtual="/include/root-left.html"-->
<!--#endif -->

Another example sets the background color of a page to green for mobile users, but white for desktop users:

<!--#if expr="${HTTP_MOBILEUSER} != /1/" -->
<body bgcolor="#FFFFFF" vlink="#668F66" link="#000000">
<!--#else -->
<body bgcolor="#004400" vlink="#668F66" link="#000000">
<!--#endif -->

Of course, it would be ideal to use CSS for these, but when you have 800,000 legacy pages dating back nearly twenty years this is much simpler!

Sometimes our Perl scripts also had stylistic changes for mobile users too. In this case, we choose a different image for mobile visitors:

if ($ENV{'HTTP_MOBILEUSER'} == 1) {
   $imagedir = "/mnt/WWW/hort.net/images/gallery/mobile";
} else {
   $imagedir = "/mnt/WWW/hort.net/images/gallery/desktop";
}

Or:

  $imagedir = sprintf(qq(/mnt/WWW/hort.net/images/gallery/%s),
     ($ENV{'HTTP_MOBILEUSER'} == 1) ? "mobile" : "desktop"
  );

Making these changes ended up being easier than I thought, and for that I’m grateful. There are still some tweaks to make to hort.net, and we still need to add some mobile navigation options to our pages. But for now we’ll continue to show up in the search rankings, our traffic won’t disappear, and most importantly, people will be able to use our pages on mobile devices.

Who are the top plant patent holders?

Posted on February 2, 2015February 3, 2015 by lindsey

So far we’ve queried our plant database to find out who the patent examiners are and which plants are most prolific in the patent system. Today we’re going to look at the top inventors and patent holders on granted patents.

First, let’s see who the top 20 plant patent holders are worldwide:

mysql> SELECT COUNT(application_applicant.entity_id) AS total, entity_name, entity_city, entity_country FROM application_applicant LEFT JOIN entity ON entity.entity_id = application_applicant.entity_id LEFT JOIN patent ON patent.application_id = application_applicant.application_id WHERE issue_date IS NOT NULL GROUP BY entity_name ORDER BY total DESC LIMIT 20;
+-------+-----------------------+------------------------+----------------+
| total | entity_name           | entity_city            | entity_country |
+-------+-----------------------+------------------------+----------------+
|   207 | Dümmen, Tobias        | Rheinberg              | DE             |
|   192 | Klemm, Nils           | Stuttgart              | DE             |
|   140 | Bergman, Wendy R.     | Lehigh Acres           | US             |
|   136 | Smith, Mark A.        | Fort Myers             | US             |
|   115 | Korlipara, Harini     | Canby                  | US             |
|   113 | Danziger, Gabriel     | Moshav Nir-Zvi         | IL             |
|   110 | Bradford, Lowell Glen | Le Grand               | US             |
|   110 | Egger, Janet N.       | Wilsonville            | US             |
|   106 | Olesen, Mogens N.     | Fredensborg            | DK             |
|   102 | Zaiger, Gary Neil     | Modesto                | US             |
|   102 | Gardner, Leith Marie  | Modesto                | US             |
|   102 | Zaiger, Grant Gene    | Modesto                | US             |
|    98 | Kordes, Tim-Hermann   | Offenseth-Sparrieshoop | DE             |
|    93 | Pieters, Dirk         | Oostnieuwkerke         | BE             |
|    88 | Sakazaki, Ushio       | Shiga                  | JP             |
|    79 | Trees, Scott C.       | Shell Beach            | US             |
|    77 | Kanaya, Takeshi       | Shiga                  | JP             |
|    76 | Meilland, Alain A.    | Antibes                | FR             |
|    70 | Maillard, Laurence    | Elne                   | FR             |
|    70 | Maillard, Arsene      | Elne                   | FR             |
+-------+-----------------------+------------------------+----------------+
20 rows in set (0.14 sec)

Germany is heavily represented by Tobias Dümmen (who seems to focus on Petunia, Pelargonium, Calibrachoa, Verbena, poinsettia, and a handful of other annuals) and Nils Klemm (who focuses on pretty much the same thing). Mr. Dümmen hasn’t had a patent granted on anything filed since he merged with Agribio in 2013. Mr. Klemm is affiliated with Selecta
(which is partnered with Ball), but also doesn’t have anything granted on a patent filed since September, 2012.

Suprisingly, many of the top patent holders in the US patent system above are not from the US.

mysql> SELECT COUNT(entity_country) AS total, countryname, population FROM entity LEFT JOIN application_applicant ON entity.entity_id = application_applicant.entity_id LEFT JOIN geonames.country ON entity.entity_country = geonames.country.countrycode LEFT JOIN patent ON patent.application_id = application_applicant.application_id WHERE issue_date IS NOT NULL GROUP BY entity_country ORDER BY total DESC limit 10;
+-------+----------------+------------+
| total | countryname    | population |
+-------+----------------+------------+
|  4676 | United States  |  303824000 |
|  1926 | Netherlands    |   16645000 |
|  1366 | Germany        |   82369000 |
|   931 | Japan          |  127288000 |
|   508 | United Kingdom |   60943000 |
|   476 | Denmark        |    5484000 |
|   409 | Australia      |   20600000 |
|   355 | France         |   64094000 |
|   305 | Israel         |    6500000 |
|   217 | New Zealand    |    4154000 |
+-------+----------------+------------+
10 rows in set (0.22 sec)

As you can see, the majority of granted plant patents are assigned to US residents. Netherlands, Germany, and Japan are high on the list too, and Netherlands has greatest number of patents per capita (at about 29 patents for every 250,000 people).

So who are the top US granted patent applicants?

mysql> SELECT COUNT(application_applicant.entity_id) AS total, entity_name, entity_city, entity_country FROM application_applicant LEFT JOIN entity ON entity.entity_id = application_applicant.entity_id LEFT JOIN patent ON patent.application_id = application_applicant.application_id WHERE issue_date IS NOT NULL AND entity_country = "US" GROUP BY entity_name ORDER BY total DESC LIMIT 20;
+-------+-------------------------+---------------+----------------+
| total | entity_name             | entity_city   | entity_country |
+-------+-------------------------+---------------+----------------+
|   140 | Bergman, Wendy R.       | Lehigh Acres  | US             |
|   136 | Smith, Mark A.          | Fort Myers    | US             |
|   115 | Korlipara, Harini       | Canby         | US             |
|   110 | Bradford, Lowell Glen   | Le Grand      | US             |
|   110 | Egger, Janet N.         | Wilsonville   | US             |
|   102 | Zaiger, Gary Neil       | Modesto       | US             |
|   102 | Gardner, Leith Marie    | Modesto       | US             |
|   102 | Zaiger, Grant Gene      | Modesto       | US             |
|    79 | Trees, Scott C.         | Shell Beach   | US             |
|    57 | Hanes, Mitchell         | Morgan Hill   | US             |
|    50 | Kobayashi, Ruth         | Carlsbad      | US             |
|    44 | Dupont, Sr., Robert J.  | Plaquemine    | US             |
|    40 | Zary, Keith W.          | Thousand Oaks | US             |
|    39 | Carruth, Thomas F.      | Altadena      | US             |
|    38 | Wood, Timothy D.        | Spring Lake   | US             |
|    34 | Vandenberg, Cornelis P. | Fort Myers    | US             |
|    32 | Hanes, Mitchell E.      | Morgan Hill   | US             |
|    30 | Lim, Peter P.           | Yamhill       | US             |
|    30 | Ren, Jianping           | Geneva        | US             |
|    29 | Lyrene, Paul M.         | Micanopy      | US             |
+-------+-------------------------+---------------+----------------+
20 rows in set (0.07 sec)

So there’s our list. Some quick Google searches give us background and affiliations for the top listings:

Bergman, Wendy R.

Syngenta

Aris

Hibiscus

Chrysanthemums

Korlipara, Harini

Terra Nova

Echinacea

Coreopsis

Bradford, Lowell Glen

Fruit hybridizer, specializing in cherries, nectarines, plums, peaches…

Egger, Janet N.

Terra Nova

Heuchera

Heucherella

Zaiger, Gary Neil

Inventor of pluot, works with nectarines, peaches, plums, almonds. Father of Grant Zaiger and Leith Marie Gardner.

Gardner, Leith Marie

Daughter of Gary Neil Zaiger (above), with similar patents.

Zaiger, Grant Gene

Son of Gary Neil Zaiger (above), with similar patents.

Trees, Scott

Ball Horticultural Company

PanAmerican Seed Company

Lantana

Pelargonium

Verbena

Angelonia

Based on the totals above, Terra Nova Nurseries probably has the largest ornamental plant patent portfolio for US patent holders, and the Zaiger family has the largest number of plant patents on food crops.

What do you want to know about plant patents? Let us know in the comments and we’ll look it up. Or maybe we can make a tool that you can use regularly.

Which plants are most represented by patents?

Posted on January 20, 2015June 30, 2015 by lindsey

As we looked at the different plant patents that have been granted, we started to wonder which genera are most represented.

This was pretty easy to do with a simple database query; we get a total for the first word (genus) of all plant names that start with the letters ‘A’ through ‘Z’ (which eliminates greges and the like).

mysql> SELECT COUNT(sci_name) AS total, SUBSTRING_INDEX(sci_name, ' ', 1) AS name FROM patent WHERE sci_name REGEXP '^[A-Z]' GROUP BY name ORDER BY total DESC LIMIT 50;
+-------+---------------+
| total | name          |
+-------+---------------+
|   742 | Rosa          |
|   613 | Chrysanthemum |
|   433 | Prunus        |
|   420 | Pelargonium   |
|   323 | Impatiens     |
|   317 | Petunia       |
|   265 | Calibrachoa   |
|   226 | Osteospermum  |
|   200 | Verbena       |
|   173 | Euphorbia     |
|   165 | Hydrangea     |
|   140 | Hibiscus      |
|   134 | Fragaria      |
|   131 | Begonia       |
|   130 | Echinacea     |
|   125 | Malus         |
|   121 | Dianthus      |
|   118 | Dahlia        |
|   115 | Phlox         |
|   109 | Alstroemeria  |
|   108 | Heuchera      |
|   108 | Vitis         |
|   100 | Kalanchoe     |
|    97 | Vaccinium     |
|    95 | Mandevilla    |
|    92 | Nemesia       |
|    81 | Clematis      |
|    78 | Rhododendron  |
|    70 | Argyranthemum |
|    69 | Sedum         |
|    68 | Lobelia       |
|    68 | Coreopsis     |
|    66 | Hosta         |
|    65 | Rubus         |
|    63 | Lantana       |
|    62 | Lavandula     |
|    61 | Salvia        |
|    60 | Diascia       |
|    57 | Helleborus    |
|    55 | Phalaenopsis  |
|    55 | Campanula     |
|    53 | Angelonia     |
|    52 | Aster         |
|    49 | Lilium        |
|    49 | Sutera        |
|    47 | Veronica      |
|    41 | Torenia       |
|    40 | Gerbera       |
|    38 | Citrus        |
|    38 | Caladium      |
+-------+---------------+
50 rows in set (0.02 sec)

So, there you have it. Those are the 50 most popular genera to receive patents in the last decade.

What do you want to know about plant patents? Let us know in the comments and we’ll look it up. Or maybe we can make a tool that you can use regularly.

Meet your plant patent examiners

Posted on January 17, 2015May 9, 2019 by lindsey

We’ve added nearly 10,000 patents to our plant patent database at hort.net going back through 2005, and we thought that it would be fun to look at the patent examiners. How many are there? Which ones are busiest? This won’t reveal anything new, but we’re hoping it offers insights into what can be done with the data and you’ll give feedback about what you’d like to see.

If we query our database for unique primary patent examiners, this list is returned:

mysql> SELECT COUNT(application_examiner.entity_id) AS total,entity_name FROM application_examiner LEFT JOIN entity ON entity.entity_id = application_examiner.entity_id WHERE examiner_type="primary" GROUP BY entity_name ORDER BY total DESC; 
+-------+----------------------------+
| total | entity_name                |
+-------+----------------------------+
|  1922 | Bell, Kent                 |
|  1271 | Bell, Kent L               |
|  1163 | Para, Annette H            |
|  1061 | Hwu, June                  |
|  1048 | Grunberg, Anne Marie       |
|   705 | Para, Annette              |
|   594 | McCormick Ewoldt, Susan    |
|   415 | McCormick Ewoldt, Susan B  |
|   336 | Haas, Wendy C.             |
|   328 | Haas, Wendy C              |
|   218 | Bell, Kent L.              |
|   186 | Haas, Wendy                |
|   114 | Grunberg, Anne             |
|    98 | Locker, Howard J.          |
|    82 | Locker, Howard             |
|    64 | Campell, Bruce R.          |
|    31 | Para, Annette H.           |
|    30 | Ewoldt, Susan McCormick    |
|    21 | McCormick Ewoldt, S. B     |
|    18 | McCormick Ewoldt, S. B.    |
|    16 | Locker, Howard J           |
|     9 | Kruse, David H             |
|     6 | McCormick-Ewoldt, S. B.    |
|     2 | Grunberg, Ann Marie        |
|     2 | McCormick Ewoldt, Susan B. |
|     2 | Krawczewicz Myers, Louanne |
|     1 | Haas, W. C.                |
|     1 | Bell., Kent L.             |
|     1 | Grundberg, Anne Marie      |
|     1 | Para, Annetta H            |
|     1 | Hass, Wendy                |
|     1 | Bell, Kentl L              |
|     1 | Wu, June                   |
|     1 | Ball, Kent                 |
|     1 | Para, Annertte H           |
|     1 | Hwu, Jane                  |
|     1 | McCormick Edwoldt, Susan   |
|     1 | Helmer, Georgia            |
|     1 | Bell, Kenbt L              |
|     1 | Campell, Bruce R           |
|     1 | Ewoldt, S. B McCormick     |
|     1 | McCormick-Ewoldt, S. B     |
|     1 | McCormick, S. B.           |
|     1 | Grunbeg, Anne Marie        |
|     1 | Para, Annnette H           |
|     1 | Campbell, Bruce            |
|     1 | Campbell, Bruce R.         |
|     1 | Hass, Wendy C.             |
|     1 | McCormick-Ewoldt, Susan B  |
|     1 | Campell, Btuce R.          |
|     1 | Grunsberg, Anne Marie      |
|     1 | McCormack Ewoldt, Susan B  |
|     1 | McCormick Ewoldt, Sysan B  |
+-------+----------------------------+
53 rows in set (0.03 sec)

Unfortunately, the list doesn’t deal too well with the misspellings and variations that we talked about in our last post.

If we only group patent examiners based on the first four characters of their last name (which wouldn’t always work, but will in this case) we come up with a much shorter list:

mysql> select count(application_examiner.entity_id) as total,left(entity_name,4) as shortname, entity_name from application_examiner left join entity on entity.entity_id = application_examiner.entity_id where examiner_type="primary" group by shortname order by total desc;
+-------+-----------+----------------------------+
| total | shortname | entity_name                |
+-------+-----------+----------------------------+
|  3414 | Bell      | Bell, Kent                 |
|  1902 | Para      | Para, Annette H            |
|  1167 | Grun      | Grunberg, Anne Marie       |
|  1062 | McCo      | McCormick Ewoldt, Susan    |
|  1062 | Hwu,      | Hwu, June                  |
|   851 | Haas      | Haas, Wendy C.             |
|   196 | Lock      | Locker, Howard J.          |
|    68 | Camp      | Campell, Bruce R.          |
|    31 | Ewol      | Ewoldt, S. B McCormick     |
|     9 | Krus      | Kruse, David H             |
|     2 | Kraw      | Krawczewicz Myers, Louanne |
|     2 | Hass      | Hass, Wendy                |
|     1 | Ball      | Ball, Kent                 |
|     1 | Helm      | Helmer, Georgia            |
|     1 | Wu,       | Wu, June                   |
+-------+-----------+----------------------------+
15 rows in set (0.01 sec)

It’s still not perfect, though. You can see that Kent Bell was once listed as Kent Ball, Wendy Haas was Wendy Hass twice, and June Hwu was once listed as June Wu. If we manually adjust the list, we come up with this:

+-------+-----------+----------------------------+
| total | shortname | entity_name                |
+-------+-----------+----------------------------+
|  3415 | Bell      | Bell, Kent                 |
|  1902 | Para      | Para, Annette H            |
|  1167 | Grun      | Grunberg, Anne Marie       |
|  1093 | McCo      | McCormick Ewoldt, Susan    |
|  1063 | Hwu,      | Hwu, June                  |
|   853 | Haas      | Haas, Wendy C.             |
|   196 | Lock      | Locker, Howard J.          |
|    68 | Camp      | Campell, Bruce R.          |
|     9 | Krus      | Kruse, David H             |
|     2 | Kraw      | Krawczewicz Myers, Louanne |
|     1 | Helm      | Helmer, Georgia            |
+-------+-----------+----------------------------+
11 rows in set (0.01 sec)

Kent Bell has granted a whopping 34.9% of plant patents in the past decade, followed by 19.4% by Annette H. Para, followed by roughly 11% each for Anne Marie Grunberg, Susan McCormick Ewoldy, and June Hwu. Wendy Haas rounds out the top six with 8.7% of plant patent grants.

There aren’t nearly as many examiners as we expected, and they’re busy. They don’t only handle plant patents, either.

So, there you go! Those are the primary plant patent examiners at the USPTO over the past ten years. Now we need to edit our database upload scripts to consolidate the names automatically.

Fun parsing plant patents

Posted on January 16, 2015January 16, 2015 by lindsey

As we mentioned in our non-tech blog, hort.net has been processing plant patents to try to find and understand commonalities and trends in the industry. Besides that, it’s fun.

Well, sort of fun. It would be a lot more fun if the patent data was consistent.

The United States Patent and Trademark Office (USPTO) releases patent data in a machine-parseable format called XML. The layout of the XML is specified by a Document Type Definition (DTD), but these change regularly. Normally these changes are minor, but sometimes they’re subtle and more major.

For example, consider this bit of XML:

<parties>
   <applicants>
      <applicant sequence="001" app-type="applicant-inventor" designation="us-only">
         <addressbook>
            <last-name>Hofmann</last-name>
            <first-name>Birgit Christa</first-name>
         </addressbook>
      </applicant>
   </applicants>
</parties>

It later became this:

<us-parties>
   <us-applicants>
      <us-applicant sequence="001" app-type="applicant-inventor" designation="us-only">
         <addressbook>
            <last-name>Hofmann</last-name>
            <first-name>Birgit Christa</first-name>
         </addressbook>
      </us-applicant>
   </us-applicants>
</us-parties>

Can you spot the difference? At one point ‘us-‘ was prepended to the parties, applicants, and applicant XML tags, so any code that was loading things into the database was suddenly coming up empty.

It’s not a big deal, and things change over time, but it would be nice if the USPTO just converted all of their old documents to the new DTD.

Other issues are less technical and more policy-driven. Consider the case of patent examiner Susan B. McCormick-Ewoldt. Of the patents we’ve processed so far, she examined nineteen different plant patents that were granted, and on all nineteen her name appears differently. At least, we assume it’s the same person. Here are the variations:

mysql> SELECT entity_name, patent_id, sci_name FROM application_examiner LEFT JOIN entity ON application_examiner.entity_id = entity.entity_id LEFT JOIN patent ON application_examiner.application_id = patent.application_id WHERE entity_name LIKE "mccor%";
+----------------------------+-----------+-------------------------------------------------+
| entity_name                | patent_id | sci_name                                        |
+----------------------------+-----------+-------------------------------------------------+
| McCormick, Susan B.        | 15460     | Impatiens hawkeri 'Fisnics Sweet Red'           |
| McCormick-Ewoldt, S. B.    | 15488     | Prunus persica var. nucipersica 'GBN-One'       |
| McCormick-Ewoldt, Susan B. | 16270     | Malus pumila 'Fugachee Fuji'                    |
| McCormick-Ewoldt, S B      | 15496     | Prunus persica 'Calara'                         |
| McCormick-Ewoldt, S B.     | 15794     | Rosa hybrida 'POULac007'                        |
| McCormick, S. B.           | 17451     | Chrysanthemum x morifolium 'Elegant Yomarjorie' |
| McCormick Ewoldt, S. B.    | 19011     | Baptisia x variicolor 'Twilite'                 |
| McCormick, S. B            | 19098     | Phlox hybrida 'USPHL03M'                        |
| McCormick/Ewoldt, S. B.    | 18988     | Lobelia erinus 'Balwalila'                      |
| McCormick Ewoldt, S. B     | 19664     | Styrax japonicus 'Fragrant Fountain'            |
| McCormick Ewoldt, S B      | 19933     | Scoparia hybrid 'USSCO401-3'                    |
| McCormick-Ewoldt, S. B     | 20130     | Begonia x hiemalis 'Binos Pinky White'          |
| McCormick Ewoldt, Susan B  | 20634     | Cordyline australis 'Sunrise'                   |
| McCormick Ewoldt, Susan B. | 20113     | Petunia hybrid 'KLEPH07140'                     |
| McCormick-Ewoldt, Susan B  | 20626     | Penstemon hartwegii benth 'Peni Vio09'          |
| McCormack Ewoldt, Susan B  | 20901     | Pelargonium x hortorum 'Pacneon'                |
| McCormick Ewoldt, Sysan B  | 20809     | Geranium x cantabrigiense 'ABPP'                |
| McCormick Edwoldt, Susan   | 22972     | Rosa hybrida  'AUStobias'                       |
| McCormick Ewoldt, Susan    | 25207     | Mandevilla hybrida 'Sunparaoros'                |
+----------------------------+-----------+-------------------------------------------------+
19 rows in set (0.00 sec)

We have initials, ‘Sysan’ (presumably a typo), ‘Susan’, ‘Susan’ with initials, ‘McCormack’ (presumably a typo), a hyphenated last name, a last name with a slash, and a last name with two words. It becomes very difficult to automate processing of patents if there’s no consistency in values, and that’s something USPTO will have to deal with procedurally.

Fixing Mail::ClamAV to work with >= clamav-0.98.4

Posted on October 28, 2014February 3, 2015 by lindsey

The latest version of clamav relies on OpenSSL, but libclamav doesn’t automatically intialize that connection. This patch we threw together for Mail-ClamAV-0.29 fixes the problem by calling cl_initialize_crypto() first.

*** ClamAV.pm.orig      2014-10-28 16:27:30.000000000 -0500
--- ClamAV.pm   2014-10-28 16:26:48.000000000 -0500
***************
*** 205,210 ****
--- 205,215 ----
      if (stat(path, &st) != 0)
          croak("%s does not exist: %s\n", path, strerror(errno));
  
+     if ((status = cl_initialize_crypto()) != CL_SUCCESS) { 
+        error(status);
+        return &PL_sv_undef; 
+     } 
+ 
      if ((status = cl_init(CL_INIT_DEFAULT)) != CL_SUCCESS) {
          error(status);
          return &PL_sv_undef;