What is Web Scraping and How Is It Helpful?
The title above is a little bit misleading. Web scraping is generally thought of as a vile process engaged in by unscrupulous people who want to profit from someone else’s hard work. However, it can also be helpful to your website and business, depending on how and when it’s employed. Here’s what you need to know.
Just What Is Web Scraping Anyway?
Put simply, web scraping involves taking content from another website and republishing it on your own website. This can be done through a variety of means, some of them more underhanded than others:
You could do a manual cut and paste, taking the content from another website and pasting it up on your own website with your own name on it, offering no credit to the website where the material originally came from.
You could also grab someone’s RSS feed and publish it on your website in its entirety, thus at least giving credit to the originating website, though not giving your readers much reason to go and visit it (after all, if I can see all the content here, why go there?).
Finally, you could simply embed material (generally videos or graphics) from another website on your own site, sometimes giving credit and other times not doing so.
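To make the RSS option above a little more concrete, here is a minimal Python sketch of a friendlier variant: instead of republishing the feed in its entirety, it keeps only a short excerpt of each item along with a link back to the original. Everything in it is an assumption made for illustration: the sample feed, the example.com URLs, the function name excerpt_items, and the 60-character cutoff. A real feed would be downloaded rather than pasted in as a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, inlined so the sketch runs without a network call.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>https://example.com/</link>
    <item>
      <title>First post</title>
      <link>https://example.com/first-post</link>
      <description>This is the full text of the first post, which runs on for quite a while.</description>
    </item>
  </channel>
</rss>"""


def excerpt_items(feed_xml, max_chars=60):
    """Return one dict per feed item: source name, title, link, short excerpt."""
    root = ET.fromstring(feed_xml)
    source = root.findtext("channel/title", default="")
    items = []
    for item in root.iterfind("channel/item"):
        text = item.findtext("description", default="").strip()
        # Keep only the opening of the post so readers still have a reason
        # to click through to the originating site.
        if len(text) > max_chars:
            text = text[:max_chars].rstrip() + "..."
        items.append({
            "source": source,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "excerpt": text,
        })
    return items


if __name__ == "__main__":
    for entry in excerpt_items(SAMPLE_FEED):
        print(entry["title"], "(from", entry["source"] + ")")
        print(entry["excerpt"])
        print("Read the rest at", entry["link"])
```

Publishing an excerpt with a visible link credits the source and still sends visitors its way, which the full-feed approach does not.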
Is It Legal?
Now, you may be asking, is any of this legal? The answer is: it depends. For example, embedding content from YouTube is perfectly legal and even an accepted practice. You might, for example, embed one of their videos on your own site; the video still runs from their servers and features their ads, but you also benefit by showing a video to your visitors.
Publishing an RSS feed gets stickier since you are then taking content from someone else in its entirety. On the other hand, they made the RSS feed publicly available, so maybe they want it syndicated. If you want to make sure you’re on solid legal ground, you’d need to request permission from the owner of the content to republish an RSS feed.
Obviously, taking content from someone else and republishing it as your own with no credit to the original owner is generally going to be looked at as a very big no-no.
I personally will contact the web hosts of companies that do that with my content, threatening to sue and demanding the site be taken down if my content isn’t removed. The one exception, of course, is when you buy PLR (private label rights) articles, in which case you simply won’t have very high SERP (search engine results page) rankings, since 200 other sites have the same stuff up.
By the way, for the record, if you want to republish any content I write here, you need to ask the owner of QuantumSEO Labs, Yasir Khan, for permission, and you should contact me as well. If it’s at a site I own personally, such as my personal finance blog, you just have to ask me.
When To Engage in Web Scraping
Google tends to frown on web scrapers since it doesn’t want to serve up duplicate content to its users. Therefore, basing most or all of your website on someone else’s content is likely to get you delisted, even if you have their permission (the exception to the rule: aggregator services such as Yahoo!’s “My Yahoo!” service).
However, occasional web scraping, when you know you’re on solid legal ground, can be helpful in filling out the content of your website so that your visitors gain a richer experience. Examples include posting a YouTube video with a comment about it, or putting up an occasional article from an RSS feed with the permission of the owner.
After all, why recreate what someone else did so well and what they are willing to syndicate? Just make sure to add your own comments as well in order to avoid Google’s duplicate content filters and make sure to keep your web scraping to a minimum.
A web scraper is a program or script, written in a language such as PHP, Java, .NET, JavaScript, or ASP, that processes the HTML pages of a target website to extract information, converting unstructured raw data into structured records. Our web scraping scripts and tools simulate a person viewing a website with a browser. With web scraping you can connect to a website and request the pages you need, exactly as your browser would. The web server sends back the HTML page, from which you can then extract the specific data you want.
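The extraction step described above can be sketched in a few lines of Python using only the standard library. This is a rough illustration, not anyone's actual tool: the sample page, the LinkExtractor class, and the extract_links function are all made up for the example, and the HTML is pasted in as a string where a real scraper would first request the page from the web server.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would receive this HTML
# from the web server in response to a request.
SAMPLE_HTML = """
<html><body>
  <h1>Widgets</h1>
  <ul>
    <li><a href="/widgets/1">Blue widget</a></li>
    <li><a href="/widgets/2">Red widget</a></li>
  </ul>
</body></html>
"""


class LinkExtractor(HTMLParser):
    """Collect every link on a page as a structured record."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Anchors without an href leave _href as None and are skipped.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.records.append(
                {"url": self._href, "text": "".join(self._text).strip()}
            )
            self._href = None


def extract_links(html):
    """Turn unstructured HTML into a list of {"url", "text"} records."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.records


if __name__ == "__main__":
    for record in extract_links(SAMPLE_HTML):
        print(record["url"], record["text"])
```

The same pattern works for any tag: watch for the opening tag, gather the text inside it, and store a record when the tag closes.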
The thing I don’t like about such tools is that they grab content which you don’t have the right to use.
I completely disagree. If you put your information out on the Internet freely, then provided I either give credit or link back to the source, I will scrape it, and I certainly won’t waste my time getting permission.
There is absolutely no need, and if you don’t want it accessed, don’t put it out there. The responsibility is completely on the content provider to lock it off.
You should be honoured if I scrape you.
I’m not honored. I consider you a petty thief if you steal my content. But hey, that’s just me — I work hard and you make money from my work without payment.
Google Panda updates 2.1 and 2.2 removed all scraping sites.
Not all. Most, to be certain. In any event, scraping is generally not such a good idea, though content curation, which is similar, has become very popular now.