Meet your plant patent examiners

We’ve added nearly 10,000 patents to our plant patent database at hort.net going back through 2005, and we thought that it would be fun to look at the patent examiners. How many are there? Which ones are busiest? This won’t reveal anything new, but we hope it shows what can be done with the data and prompts you to tell us what you’d like to see.

If we query our database for unique primary patent examiners, this list is returned:

mysql> SELECT COUNT(application_examiner.entity_id) AS total,entity_name FROM application_examiner LEFT JOIN entity ON entity.entity_id = application_examiner.entity_id WHERE examiner_type="primary" GROUP BY entity_name ORDER BY total DESC; 
+-------+----------------------------+
| total | entity_name                |
+-------+----------------------------+
|  1922 | Bell, Kent                 |
|  1271 | Bell, Kent L               |
|  1163 | Para, Annette H            |
|  1061 | Hwu, June                  |
|  1048 | Grunberg, Anne Marie       |
|   705 | Para, Annette              |
|   594 | McCormick Ewoldt, Susan    |
|   415 | McCormick Ewoldt, Susan B  |
|   336 | Haas, Wendy C.             |
|   328 | Haas, Wendy C              |
|   218 | Bell, Kent L.              |
|   186 | Haas, Wendy                |
|   114 | Grunberg, Anne             |
|    98 | Locker, Howard J.          |
|    82 | Locker, Howard             |
|    64 | Campell, Bruce R.          |
|    31 | Para, Annette H.           |
|    30 | Ewoldt, Susan McCormick    |
|    21 | McCormick Ewoldt, S. B     |
|    18 | McCormick Ewoldt, S. B.    |
|    16 | Locker, Howard J           |
|     9 | Kruse, David H             |
|     6 | McCormick-Ewoldt, S. B.    |
|     2 | Grunberg, Ann Marie        |
|     2 | McCormick Ewoldt, Susan B. |
|     2 | Krawczewicz Myers, Louanne |
|     1 | Haas, W. C.                |
|     1 | Bell., Kent L.             |
|     1 | Grundberg, Anne Marie      |
|     1 | Para, Annetta H            |
|     1 | Hass, Wendy                |
|     1 | Bell, Kentl L              |
|     1 | Wu, June                   |
|     1 | Ball, Kent                 |
|     1 | Para, Annertte H           |
|     1 | Hwu, Jane                  |
|     1 | McCormick Edwoldt, Susan   |
|     1 | Helmer, Georgia            |
|     1 | Bell, Kenbt L              |
|     1 | Campell, Bruce R           |
|     1 | Ewoldt, S. B McCormick     |
|     1 | McCormick-Ewoldt, S. B     |
|     1 | McCormick, S. B.           |
|     1 | Grunbeg, Anne Marie        |
|     1 | Para, Annnette H           |
|     1 | Campbell, Bruce            |
|     1 | Campbell, Bruce R.         |
|     1 | Hass, Wendy C.             |
|     1 | McCormick-Ewoldt, Susan B  |
|     1 | Campell, Btuce R.          |
|     1 | Grunsberg, Anne Marie      |
|     1 | McCormack Ewoldt, Susan B  |
|     1 | McCormick Ewoldt, Sysan B  |
+-------+----------------------------+
53 rows in set (0.03 sec)

Unfortunately, the list doesn’t deal too well with the misspellings and variations that we talked about in our last post.

If we only group patent examiners based on the first four characters of their last name (which wouldn’t always work, but will in this case) we come up with a much shorter list:

mysql> select count(application_examiner.entity_id) as total,left(entity_name,4) as shortname, entity_name from application_examiner left join entity on entity.entity_id = application_examiner.entity_id where examiner_type="primary" group by shortname order by total desc;
+-------+-----------+----------------------------+
| total | shortname | entity_name                |
+-------+-----------+----------------------------+
|  3414 | Bell      | Bell, Kent                 |
|  1902 | Para      | Para, Annette H            |
|  1167 | Grun      | Grunberg, Anne Marie       |
|  1062 | McCo      | McCormick Ewoldt, Susan    |
|  1062 | Hwu,      | Hwu, June                  |
|   851 | Haas      | Haas, Wendy C.             |
|   196 | Lock      | Locker, Howard J.          |
|    68 | Camp      | Campell, Bruce R.          |
|    31 | Ewol      | Ewoldt, S. B McCormick     |
|     9 | Krus      | Kruse, David H             |
|     2 | Kraw      | Krawczewicz Myers, Louanne |
|     2 | Hass      | Hass, Wendy                |
|     1 | Ball      | Ball, Kent                 |
|     1 | Helm      | Helmer, Georgia            |
|     1 | Wu,       | Wu, June                   |
+-------+-----------+----------------------------+
15 rows in set (0.01 sec)

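It’s still not perfect, though. You can see that Kent Bell was once listed as Kent Ball, Wendy Haas was listed as Wendy Hass twice, and June Hwu was once listed as June Wu. The 31 patents grouped under ‘Ewol’ also belong to Susan McCormick Ewoldt, whose last name was sometimes entered in a different order. If we manually adjust the list, we come up with this: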

+-------+-----------+----------------------------+
| total | shortname | entity_name                |
+-------+-----------+----------------------------+
|  3415 | Bell      | Bell, Kent                 |
|  1902 | Para      | Para, Annette H            |
|  1167 | Grun      | Grunberg, Anne Marie       |
|  1093 | McCo      | McCormick Ewoldt, Susan    |
|  1063 | Hwu,      | Hwu, June                  |
|   853 | Haas      | Haas, Wendy C.             |
|   196 | Lock      | Locker, Howard J.          |
|    68 | Camp      | Campell, Bruce R.          |
|     9 | Krus      | Kruse, David H             |
|     2 | Kraw      | Krawczewicz Myers, Louanne |
|     1 | Helm      | Helmer, Georgia            |
+-------+-----------+----------------------------+
11 rows in set (0.01 sec)

Kent Bell has granted a whopping 34.9% of plant patents in the past decade. Annette H. Para follows with 19.4%, and Anne Marie Grunberg, Susan McCormick Ewoldt, and June Hwu account for roughly 11% each. Wendy Haas rounds out the top six with 8.7% of plant patent grants.

There aren’t nearly as many examiners as we expected, and they’re busy. They don’t only handle plant patents, either.

So, there you go! Those are the primary plant patent examiners at the USPTO over the past ten years. Now we need to edit our database upload scripts to consolidate the names automatically.
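As a rough idea of what that consolidation step could look like, here is a minimal Python sketch that maps a misspelled examiner name onto the closest known spelling using the standard library's difflib. The CANONICAL list, the consolidate() helper, and the 0.75 cutoff are all assumptions made for illustration; the names are simply copied from the manually adjusted table above, and this is not the code our upload scripts use.

```python
import difflib

# Hypothetical canonical spellings, copied from the manually adjusted table.
CANONICAL = [
    "Bell, Kent", "Para, Annette H", "Grunberg, Anne Marie",
    "McCormick Ewoldt, Susan", "Hwu, June", "Haas, Wendy C.",
    "Locker, Howard J.", "Campell, Bruce R.", "Kruse, David H",
    "Krawczewicz Myers, Louanne", "Helmer, Georgia",
]


def consolidate(name, cutoff=0.75):
    """Return the closest canonical spelling, or None if nothing is close enough.

    The comparison is case-insensitive; the cutoff is an assumed value
    that would need tuning against real data.
    """
    lookup = {c.casefold(): c for c in CANONICAL}
    hits = difflib.get_close_matches(
        name.casefold(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[hits[0]] if hits else None


if __name__ == "__main__":
    for raw in ("Hass, Wendy", "Ball, Kent", "Wu, June", "Smith, John"):
        print(raw, "->", consolidate(raw))
```

Names that fall below the cutoff come back as None, so they could be queued for manual review instead of being merged into the wrong examiner.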

Fun parsing plant patents

As we mentioned in our non-tech blog, hort.net has been processing plant patents to try to find and understand commonalities and trends in the industry. Besides that, it’s fun.

Well, sort of fun. It would be a lot more fun if the patent data was consistent.

The United States Patent and Trademark Office (USPTO) releases patent data in a machine-parseable format called XML. The layout of the XML is specified by a Document Type Definition (DTD), but the DTD changes regularly. Most of these changes are minor, but some are subtle and have a much bigger impact.

For example, consider this bit of XML:

<parties>
   <applicants>
      <applicant sequence="001" app-type="applicant-inventor" designation="us-only">
         <addressbook>
            <last-name>Hofmann</last-name>
            <first-name>Birgit Christa</first-name>
         </addressbook>
      </applicant>
   </applicants>
</parties>

It later became this:

<us-parties>
   <us-applicants>
      <us-applicant sequence="001" app-type="applicant-inventor" designation="us-only">
         <addressbook>
            <last-name>Hofmann</last-name>
            <first-name>Birgit Christa</first-name>
         </addressbook>
      </us-applicant>
   </us-applicants>
</us-parties>

Can you spot the difference? At one point ‘us-’ was prepended to the parties, applicants, and applicant XML tags, so any code that looked for the old tag names while loading the database was suddenly coming up empty.

It’s not a big deal, and things change over time, but it would be nice if the USPTO just converted all of their old documents to the new DTD.
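Until that happens, a loader can simply accept both spellings of the tag. The following Python sketch uses the standard xml.etree.ElementTree module on the two fragments shown above; the applicant_names() helper and the constant names are hypothetical, and real USPTO files contain far more structure than this.

```python
import xml.etree.ElementTree as ET

# The two fragments from this post: the older layout and the 'us-' layout.
OLD_XML = """<parties>
   <applicants>
      <applicant sequence="001" app-type="applicant-inventor" designation="us-only">
         <addressbook>
            <last-name>Hofmann</last-name>
            <first-name>Birgit Christa</first-name>
         </addressbook>
      </applicant>
   </applicants>
</parties>"""

NEW_XML = """<us-parties>
   <us-applicants>
      <us-applicant sequence="001" app-type="applicant-inventor" designation="us-only">
         <addressbook>
            <last-name>Hofmann</last-name>
            <first-name>Birgit Christa</first-name>
         </addressbook>
      </us-applicant>
   </us-applicants>
</us-parties>"""

# Tag names to try, oldest first (an assumption based on the two samples).
APPLICANT_TAGS = ("applicant", "us-applicant")


def applicant_names(xml_text):
    """Return (last, first) tuples for applicants under either tag name."""
    root = ET.fromstring(xml_text)
    names = []
    for tag in APPLICANT_TAGS:
        # iter() walks all descendants, so the wrapper tag names don't matter.
        for node in root.iter(tag):
            last = node.findtext("addressbook/last-name", default="")
            first = node.findtext("addressbook/first-name", default="")
            names.append((last, first))
    return names


if __name__ == "__main__":
    print(applicant_names(OLD_XML))
    print(applicant_names(NEW_XML))
```

Searching by the element's own tag name, not by the full path from the root, means the renamed wrapper elements never have to be listed at all.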

Other issues are less technical and more policy-driven. Consider the case of patent examiner Susan B. McCormick-Ewoldt. Of the patents we’ve processed so far, she examined nineteen different plant patents that were granted, and on all nineteen her name appears differently. At least, we assume it’s the same person. Here are the variations:

mysql> SELECT entity_name, patent_id, sci_name FROM application_examiner LEFT JOIN entity ON application_examiner.entity_id = entity.entity_id LEFT JOIN patent ON application_examiner.application_id = patent.application_id WHERE entity_name LIKE "mccor%";
+----------------------------+-----------+-------------------------------------------------+
| entity_name                | patent_id | sci_name                                        |
+----------------------------+-----------+-------------------------------------------------+
| McCormick, Susan B.        | 15460     | Impatiens hawkeri 'Fisnics Sweet Red'           |
| McCormick-Ewoldt, S. B.    | 15488     | Prunus persica var. nucipersica 'GBN-One'       |
| McCormick-Ewoldt, Susan B. | 16270     | Malus pumila 'Fugachee Fuji'                    |
| McCormick-Ewoldt, S B      | 15496     | Prunus persica 'Calara'                         |
| McCormick-Ewoldt, S B.     | 15794     | Rosa hybrida 'POULac007'                        |
| McCormick, S. B.           | 17451     | Chrysanthemum x morifolium 'Elegant Yomarjorie' |
| McCormick Ewoldt, S. B.    | 19011     | Baptisia x variicolor 'Twilite'                 |
| McCormick, S. B            | 19098     | Phlox hybrida 'USPHL03M'                        |
| McCormick/Ewoldt, S. B.    | 18988     | Lobelia erinus 'Balwalila'                      |
| McCormick Ewoldt, S. B     | 19664     | Styrax japonicus 'Fragrant Fountain'            |
| McCormick Ewoldt, S B      | 19933     | Scoparia hybrid 'USSCO401-3'                    |
| McCormick-Ewoldt, S. B     | 20130     | Begonia x hiemalis 'Binos Pinky White'          |
| McCormick Ewoldt, Susan B  | 20634     | Cordyline australis 'Sunrise'                   |
| McCormick Ewoldt, Susan B. | 20113     | Petunia hybrid 'KLEPH07140'                     |
| McCormick-Ewoldt, Susan B  | 20626     | Penstemon hartwegii benth 'Peni Vio09'          |
| McCormack Ewoldt, Susan B  | 20901     | Pelargonium x hortorum 'Pacneon'                |
| McCormick Ewoldt, Sysan B  | 20809     | Geranium x cantabrigiense 'ABPP'                |
| McCormick Edwoldt, Susan   | 22972     | Rosa hybrida  'AUStobias'                       |
| McCormick Ewoldt, Susan    | 25207     | Mandevilla hybrida 'Sunparaoros'                |
+----------------------------+-----------+-------------------------------------------------+
19 rows in set (0.00 sec)

We have initials, ‘Sysan’ (presumably a typo), ‘Susan’, ‘Susan’ with initials, ‘McCormack’ (presumably a typo), a hyphenated last name, a last name with a slash, and a last name with two words. It becomes very difficult to automate processing of patents if there’s no consistency in values, and that’s something USPTO will have to deal with procedurally.
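Some of that variation is purely punctuation, and that part can be removed on our side. Here is a small Python sketch of a grouping key that treats hyphens and slashes in the last name as spaces and keeps only the first initial of the given name. The name_key() helper is hypothetical, and the sample list is a handful of the variants from the table above. Genuine typos such as ‘McCormack’ still end up in their own group, so this only narrows the problem.

```python
import re
from collections import defaultdict


def name_key(entity_name):
    """Build a coarse key from 'Last, Given' text: (last name, first initial)."""
    last, _, given = entity_name.partition(",")
    # Treat hyphens and slashes in the last name as plain spaces.
    last = re.sub(r"[-/]", " ", last)
    last = " ".join(last.split()).casefold()
    # Drop periods so 'S. B.', 'S B' and 'Susan B' all start with 's'.
    given_parts = given.replace(".", " ").split()
    initial = given_parts[0][0].casefold() if given_parts else ""
    return (last, initial)


if __name__ == "__main__":
    variants = [
        "McCormick-Ewoldt, S. B.",
        "McCormick/Ewoldt, S. B.",
        "McCormick Ewoldt, Susan B",
        "McCormick Ewoldt, Sysan B",
        "McCormick, S. B",
        "McCormack Ewoldt, Susan B",
    ]
    groups = defaultdict(list)
    for v in variants:
        groups[name_key(v)].append(v)
    for key, members in groups.items():
        print(key, len(members))
```

Keeping only the first initial is a deliberate trade-off: it absorbs ‘Sysan’ and the bare initials, but it would also merge two different examiners who share a last name and a first initial.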

Fixing Mail::ClamAV to work with >= clamav-0.98.4

The latest version of clamav relies on OpenSSL, but libclamav doesn’t automatically initialize it. This patch that we threw together for Mail-ClamAV-0.29 fixes the problem by calling cl_initialize_crypto() before cl_init().

*** ClamAV.pm.orig      2014-10-28 16:27:30.000000000 -0500
--- ClamAV.pm   2014-10-28 16:26:48.000000000 -0500
***************
*** 205,210 ****
--- 205,215 ----
      if (stat(path, &st) != 0)
          croak("%s does not exist: %s\n", path, strerror(errno));
  
+     if ((status = cl_initialize_crypto()) != CL_SUCCESS) { 
+        error(status);
+        return &PL_sv_undef; 
+     } 
+ 
      if ((status = cl_init(CL_INIT_DEFAULT)) != CL_SUCCESS) {
          error(status);
          return &PL_sv_undef;