This is a public-interest archive. Personal data is pseudonymized and retained under GDPR Article 89.

Re: blogs and scrapers


On 25-Jan-08, at 3:42 PM, Lawrence F. London, Jr. wrote:

> Douglas Green wrote:
>
>> A while ago in the "blogging world", the topic of "scrapers" came up.
>> A bunch of blogs went to partial feeds as a result, in what I
>> think was a misguided sense that they could "protect" their content
>> by doing so.  (For the uninitiated, "scraping" is when automated
>> software hooks into your blog feed and republishes the content of
>> that feed on another website - it "scrapes" the content from your
>> site and puts it up on its own.)
>
> Without dragging this out, it seems that you can either allow or
> deny RSS feeds. So, if you allow them, you have no control over who
> uses your blog content, where, or for what purpose on their own sites.


You have no electronic control over any of your site's contents at any  
time. Period.  I'm sure Chris Lindsey can comment further on this, but  
robot scrapers can take content from any site - blog or web.  If  
it's electronic and they can find it, they can scrape it/take it.
The control you do have is the DMCA and the ability to have their  
copy taken down (always assuming you can find the owner, etc., etc. -  
it's black and white in the text of the law, but not always easy to  
accomplish).


>
>
> I assume this WP plugin at least guarantees an equal exchange of  
> blog material and gives you the ID of the blog using your content  
> so you can contact them or complain, etc.


No - the plugin puts a bit of code in each feed item from your blog  
that delivers a back link.  Period.  You have to use Google or  
Technorati or another search engine to find the sites using your  
material with this system.  If you don't use this system, you can use  
copyscape.com.
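As a rough sketch of what a back-link plugin like this does (the function name, footer wording, and example URL below are my own illustration, not the plugin's actual code), it amounts to appending an attribution link to each item before the feed goes out, so a scraper that republishes the item verbatim carries a link back to you:

```python
def add_attribution(post_html: str, permalink: str, blog_name: str) -> str:
    """Append a back link to a post's HTML before it goes into the feed.

    A scraper that republishes the feed item verbatim then carries
    this link back to the original article.
    """
    footer = (
        '<p>Originally published at '
        f'<a href="{permalink}">{blog_name}</a>.</p>'
    )
    return post_html + footer


# Hypothetical post and permalink, for illustration only.
item = add_attribution(
    "<p>How to divide hostas in spring.</p>",
    "http://blog.douggreensgarden.com/divide-hostas",
    "Doug Green's Garden",
)
```

Of course, the link only survives if the scraper republishes the item unmodified - which is exactly why this works on lazy, fully automated scrapers and not on anyone who edits what they take.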
>
>
>> Site content is not particularly "safe" with only a partial feed.   Sad
>
> What is a "partial" feed? I thought RSS feeds either were or weren't.

Nope.  With WordPress, you have the choice of putting the entire  
article in your feed or only a small portion (an excerpt) of it.  This  
is probably software dependent, I'd note.
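To illustrate the difference (a minimal sketch of my own, not WordPress's actual code): a full feed puts the whole article body in each item, while a partial feed ships only a truncated excerpt:

```python
def feed_excerpt(article_text: str, word_limit: int = 55) -> str:
    """Return the item body for a partial feed: the first `word_limit`
    words of the article, with a marker if anything was cut.
    (The 55-word default mirrors what I understand WordPress's default
    excerpt length to be - check your own install.)"""
    words = article_text.split()
    if len(words) <= word_limit:
        return article_text  # short posts look the same in either feed
    return " ".join(words[:word_limit]) + " [...]"
```

A scraper pulling a partial feed gets only that excerpt - which is what the "protect your content" crowd is counting on, and why I call it misguided: the full article is still sitting on the public web page.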

>
>
>> In other words, if your site is scraped, you get a link back from the
>> scraping site.  This works on two levels.  The first is that you get
>> an inbound link
>
> What if you don't want their content?

You're not getting their content.  All you're getting is an inbound  
link, and that is always good.  And no, Google doesn't penalize you for  
the quality of your inbound links, since you have no control over that  
(to forestall another common question) :-)
>
>
>> The second is that if you're really determined, you can find
>> and stomp scraper sites with DMCA complaints by using a "link:your-
>> site" search on Google or by going to Technorati and finding inbounds.
>
> So, you can use this Google command to discover who is using your  
> RSS feeds?


You can use Google to discover damn near anything. LOL!


All the best

Doug


Douglas Green
Online Garden Publishing
Blog:  http://blog.douggreensgarden.com
Home: http://www.simplegiftsfarm.com



_______________________________________________
gardenwriters mailing list
gardenwriters@lists.ibiblio.org
http://lists.ibiblio.org/mailman/listinfo/gardenwriters

GWL has searchable archives at:
http://www.hort.net/lists/gardenwriters

Send photos for GWL to gwlphotos@hort.net to be posted
at: http://www.hort.net/lists/gwlphotos

Post gardening questions/threads to
"Gardenwriters on Gardening" <gwl-g@lists.ibiblio.org>

For GWL website and Wiki, go to
http://www.ibiblio.org/gardenwriters


