{"id":164,"date":"2015-01-16T17:53:09","date_gmt":"2015-01-16T17:53:09","guid":{"rendered":"https:\/\/www.hort.net\/techblog\/?p=164"},"modified":"2015-01-16T18:16:13","modified_gmt":"2015-01-16T18:16:13","slug":"fun-reading-plant-patents","status":"publish","type":"post","link":"https:\/\/www.hort.net\/techblog\/2015\/01\/16\/fun-reading-plant-patents\/","title":{"rendered":"Fun parsing plant patents"},"content":{"rendered":"<p>As we <a href=\"https:\/\/www.hort.net\/blog\/2014\/12\/05\/announcing-patent-tracker\/\" target=\"_blank\">mentioned in our non-tech blog<\/a>, hort.net has been processing plant patents to try to find and understand commonalities and trends in the industry.  Besides that, it&#8217;s fun.<\/p>\n<p>Well, sort of fun.  It would be a lot more fun if the patent data was consistent.<\/p>\n<p>The United States Patent and Trademark Office (USPTO) releases patent data in a machine-parseable format called XML.  
The layout of the XML is specified by a Document Type Definition (DTD), but the DTD changes regularly.  Most of these changes are minor, but some are subtle and still have major consequences.<\/p>\n<p>For example, consider this bit of XML:<\/p>\n<div class=\"horttech-code\"><pre class=\"preserve-code-formatting\">\n&lt;parties&gt;\n&nbsp;&nbsp; &lt;applicants&gt;\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;applicant sequence=&quot;001&quot; app-type=&quot;applicant-inventor&quot; designation=&quot;us-only&quot;&gt;\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;addressbook&gt;\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;last-name&gt;Hofmann&lt;\/last-name&gt;\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;first-name&gt;Birgit Christa&lt;\/first-name&gt;\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;\/addressbook&gt;\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;\/applicant&gt;\n&nbsp;&nbsp; &lt;\/applicants&gt;\n&lt;\/parties&gt;\n<\/pre><\/div>\n<p>It later became this:<\/p>\n<div class=\"horttech-code\"><pre class=\"preserve-code-formatting\">\n&lt;us-parties&gt;\n&nbsp;&nbsp; &lt;us-applicants&gt;\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;us-applicant sequence=&quot;001&quot; app-type=&quot;applicant-inventor&quot; designation=&quot;us-only&quot;&gt;\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;addressbook&gt;\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;last-name&gt;Hofmann&lt;\/last-name&gt;\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;first-name&gt;Birgit Christa&lt;\/first-name&gt;\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;\/addressbook&gt;\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;\/us-applicant&gt;\n&nbsp;&nbsp; &lt;\/us-applicants&gt;\n&lt;\/us-parties&gt;\n<\/pre><\/div>\n<p>Can you spot the difference?  
At one point &#8216;us-&#8217; was prepended to the parties, applicants, and applicant XML tags, so any code that looked for the old tag names while loading records into the database suddenly came up empty.<\/p>\n<p>It&#8217;s not a big deal, and things change over time, but it would be nice if the USPTO just converted all of their old documents to the new DTD.<\/p>\n<p>Other issues are less technical and more policy-driven.  Consider the case of patent examiner Susan B. McCormick-Ewoldt.  Of the patents we&#8217;ve processed so far, she examined nineteen different plant patents that were granted, and her name appears differently on all nineteen.  At least, we assume it&#8217;s the same person.  Here are the variations:<\/p>\n<div class=\"horttech-code\"><pre class=\"preserve-code-formatting\">\nmysql&gt; SELECT entity_name, patent_id, sci_name FROM application_examiner LEFT JOIN entity ON application_examiner.entity_id = entity.entity_id LEFT JOIN patent ON application_examiner.application_id = patent.application_id WHERE entity_name LIKE &quot;mccor%&quot;;\n+----------------------------+-----------+-------------------------------------------------+\n| entity_name&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;| patent_id | sci_name&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|\n+----------------------------+-----------+-------------------------------------------------+\n| McCormick, Susan B.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;| 15460&nbsp;&nbsp;&nbsp;&nbsp; | Impatiens hawkeri &#039;Fisnics Sweet Red&#039;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |\n| McCormick-Ewoldt, S. B.&nbsp;&nbsp;&nbsp;&nbsp;| 15488&nbsp;&nbsp;&nbsp;&nbsp; | Prunus persica var. 
nucipersica &#039;GBN-One&#039;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |\n| McCormick-Ewoldt, Susan B. | 16270&nbsp;&nbsp;&nbsp;&nbsp; | Malus pumila &#039;Fugachee Fuji&#039;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|\n| McCormick-Ewoldt, S B&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;| 15496&nbsp;&nbsp;&nbsp;&nbsp; | Prunus persica &#039;Calara&#039;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |\n| McCormick-Ewoldt, S B.&nbsp;&nbsp;&nbsp;&nbsp; | 15794&nbsp;&nbsp;&nbsp;&nbsp; | Rosa hybrida &#039;POULac007&#039;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|\n| McCormick, S. B.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | 17451&nbsp;&nbsp;&nbsp;&nbsp; | Chrysanthemum x morifolium &#039;Elegant Yomarjorie&#039; |\n| McCormick Ewoldt, S. B.&nbsp;&nbsp;&nbsp;&nbsp;| 19011&nbsp;&nbsp;&nbsp;&nbsp; | Baptisia x variicolor &#039;Twilite&#039;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |\n| McCormick, S. B&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;| 19098&nbsp;&nbsp;&nbsp;&nbsp; | Phlox hybrida &#039;USPHL03M&#039;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|\n| McCormick\/Ewoldt, S. B.&nbsp;&nbsp;&nbsp;&nbsp;| 18988&nbsp;&nbsp;&nbsp;&nbsp; | Lobelia erinus &#039;Balwalila&#039;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|\n| McCormick Ewoldt, S. 
B&nbsp;&nbsp;&nbsp;&nbsp; | 19664&nbsp;&nbsp;&nbsp;&nbsp; | Styrax japonicus &#039;Fragrant Fountain&#039;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|\n| McCormick Ewoldt, S B&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;| 19933&nbsp;&nbsp;&nbsp;&nbsp; | Scoparia hybrid &#039;USSCO401-3&#039;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|\n| McCormick-Ewoldt, S. B&nbsp;&nbsp;&nbsp;&nbsp; | 20130&nbsp;&nbsp;&nbsp;&nbsp; | Begonia x hiemalis &#039;Binos Pinky White&#039;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|\n| McCormick Ewoldt, Susan B&nbsp;&nbsp;| 20634&nbsp;&nbsp;&nbsp;&nbsp; | Cordyline australis &#039;Sunrise&#039;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |\n| McCormick Ewoldt, Susan B. | 20113&nbsp;&nbsp;&nbsp;&nbsp; | Petunia hybrid &#039;KLEPH07140&#039;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |\n| McCormick-Ewoldt, Susan B&nbsp;&nbsp;| 20626&nbsp;&nbsp;&nbsp;&nbsp; | Penstemon hartwegii benth &#039;Peni Vio09&#039;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|\n| McCormack Ewoldt, Susan B&nbsp;&nbsp;| 20901&nbsp;&nbsp;&nbsp;&nbsp; | Pelargonium x hortorum &#039;Pacneon&#039;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|\n| McCormick Ewoldt, Sysan B&nbsp;&nbsp;| 20809&nbsp;&nbsp;&nbsp;&nbsp; | Geranium x cantabrigiense &#039;ABPP&#039;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|\n| McCormick Edwoldt, Susan&nbsp;&nbsp; | 22972&nbsp;&nbsp;&nbsp;&nbsp; | Rosa hybrida&nbsp;&nbsp;&#039;AUStobias&#039;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |\n| McCormick Ewoldt, Susan&nbsp;&nbsp;&nbsp;&nbsp;| 
25207&nbsp;&nbsp;&nbsp;&nbsp; | Mandevilla hybrida &#039;Sunparaoros&#039;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|\n+----------------------------+-----------+-------------------------------------------------+\n19 rows in set (0.00 sec)\n<\/pre><\/div>\n<p>We have initials only, &#8216;Susan&#8217; on its own, &#8216;Susan&#8217; with a middle initial, &#8216;Sysan&#8217;, &#8216;McCormack&#8217;, and &#8216;Edwoldt&#8217; (all presumably typos), a hyphenated last name, a last name with a slash, and a last name written as two words.  Automating patent processing becomes very difficult when values are this inconsistent, and that&#8217;s something the USPTO will have to deal with procedurally.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>As we mentioned in our non-tech blog, hort.net has been processing plant patents to try to find and understand commonalities and trends in the industry. 
Besides that, it&#8217;s fun. Well, sort of fun. It would be a lot more fun if the patent data was consistent. The United States Patent and Trademark Office (USPTO) releases [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5,11,1],"tags":[12,14,13],"class_list":["post-164","post","type-post","status-publish","format-standard","hentry","category-databases","category-patents","category-uncategorized","tag-patent","tag-plant","tag-uspto"],"_links":{"self":[{"href":"https:\/\/www.hort.net\/techblog\/wp-json\/wp\/v2\/posts\/164","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.hort.net\/techblog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.hort.net\/techblog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.hort.net\/techblog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.hort.net\/techblog\/wp-json\/wp\/v2\/comments?post=164"}],"version-history":[{"count":7,"href":"https:\/\/www.hort.net\/techblog\/wp-json\/wp\/v2\/posts\/164\/revisions"}],"predecessor-version":[{"id":172,"href":"https:\/\/www.hort.net\/techblog\/wp-json\/wp\/v2\/posts\/164\/revisions\/172"}],"wp:attachment":[{"href":"https:\/\/www.hort.net\/techblog\/wp-json\/wp\/v2\/media?parent=164"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.hort.net\/techblog\/wp-json\/wp\/v2\/categories?post=164"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.hort.net\/techblog\/wp-json\/wp\/v2\/tags?post=164"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}