One of the most common questions people have about data extraction practice is: Is web scraping legal?
Two terms are commonly used to separate ethical from unethical scraping of data from the web: “good bots” and “bad bots.”
“Good bots” do useful work. For example, they index web content for price comparison services and let market researchers measure engagement on social media. “Bad bots,” by contrast, mine information in order to use it for purposes that fall outside the information owner’s control.
It is estimated that about 10% of all web crawling bots are bad bots. They perform a variety of damaging activities, such as competitive data mining, denial of service attacks, account hijacking, online fraud, theft of intellectual property, data theft, spam and digital ad fraud, unauthorized vulnerability scans, and attacks on stock markets.
Newly formed businesses find web scraping very profitable because it is a cheap and powerful way to gather data. Big companies, by contrast, like using crawlers for their own gain but don’t want others to use bots against them.
Web scraping companies know that how you use the information matters as much as where you got it. You can mine data about business contacts, for example, in just a few minutes, allowing you to generate leads of potential customers. But this can violate the privacy of your potential clients. I’m sure you remember large spam email campaigns; these are just one small example of scraped data being abused.
So how can you use web scraping and stay balanced from a legal standpoint? Let’s take a look at the history.
Web Scraping Legal Issues and How They Were Solved
In 2000, eBay sought a preliminary injunction against Bidder’s Edge, which brought attention to the practice of scraping. eBay argued that using bots on its site against the company’s will amounted to trespass to chattels.
The court granted the injunction, noting that eBay’s terms of service did not permit this kind of automated access and that a large number of bots could damage eBay’s computer systems. The lawsuit was later settled out of court, but the ruling had already set a legal precedent.
Later, in 2001, a travel agency sued a competitor that had “scraped” prices from its website to build a competing pricing system. The judge ruled that the fact that the site’s owner did not welcome the scraping was not enough to make it “unauthorized access” under federal hacking laws.
Over the following years, courts repeatedly ruled that merely putting a “do not scrape us” warning in your website’s terms of service is not enough. For the terms to be enforceable against a scraper, the user must explicitly agree or consent to them.
In 2009, Facebook won one of the first copyright lawsuits against a web scraper. This laid the foundation for later cases that tie data scraping to breach of copyright and clear monetary damages.
In 2019, the US Court of Appeals denied LinkedIn’s request to prevent analytics company HiQ from scraping its data. This was a historic moment in the data privacy and data regulation era. The decision suggests that scraping publicly available data that is not copyrighted is unlikely, on its own, to cause web scraping legal issues.
However, the decision does not give web crawlers the freedom to use scraped data for any commercial purpose, nor the freedom to obtain data from sites that require authentication, even when those are otherwise public-facing ecommerce websites. Such sites have terms of service that usually forbid this activity. Fully public sites, on the other hand, do not make visitors agree to any terms of service before viewing them, so users are generally free to use web crawlers to collect data from them.
Regulations Surrounding Illegal Scraping
To curb illegal scraping, Congress passed the Better Online Ticket Sales (BOTS) Act in 2016, the first legislation specifically targeting bad bots. It forbids the use of software that bypasses security controls on ticket provider websites. To do their dirty work, automated ticket scalping bots use many techniques, including web scraping combined with advanced business logic to detect scalping opportunities, enter shopping cart purchase information, and even resell stock on secondary markets.
In other words, whether you’re a venue, a company, or a software platform, it is also up to you to protect your major sales against this kind of fraudulent, illegal scraping.
Great Britain followed the USA with the Digital Economy Act 2017, which has received Royal Assent. In an increasingly digital world, the act aims to protect consumers in many ways, including cracking down on ticket touts by making it a criminal offense to misuse bot technology to sweep up tickets and sell them on the secondary market at inflated prices.
Information Protection Issue
Although companies are less likely to take legal measures against web crawlers today, they can still use different techniques to limit web crawling for information protection purposes.
Some techniques will limit the access of bots to the site, for example, “rate-throttling” – which means
“to control the rate of requests sent or received by a network interface controller. It can be used to prevent DoS attacks and limit web scraping.”
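To make the idea concrete, here is a minimal sketch of how a site might throttle requests from a single client using a token bucket. This is only one common way to implement rate limiting, not a description of any particular site's setup; the class name `TokenBucket`, its parameters, and the injectable `clock` are all assumptions made for this illustration.

```python
import time


class TokenBucket:
    """Hypothetical per-client throttle: about `rate` requests per second,
    with short bursts of up to `capacity` requests."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum number of stored tokens
        self.clock = clock            # injectable so the logic can be tested
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        """Return True if the request may proceed, False if it is throttled."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill tokens for the time that has passed, up to the capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A server would typically keep one bucket per IP address or API key and answer with an HTTP 429 status whenever `allow()` returns False.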
Sites can also use technology like CAPTCHA to test whether it is a human or a web crawler trying to access the page.
These techniques are typically used to stop “bad bots” that overload and consequently crash the site. But they may also be used to make automated scraping less cost-effective for web crawling companies.
To be sure that you are scraping ethically, you should know that there are pages that the website owner doesn’t want bots or crawlers to visit. Now, of course, all search engines are continually crawling the web. To keep other parties from indexing or listing sensitive information, website owners take measures that tell crawlers which parts of the website to stay out of.
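The most widely used of these measures is a robots.txt file. As a rough sketch, a crawler written in Python could check it with the standard library's `urllib.robotparser` before requesting a page. The robots.txt content, the bot name `example-bot`, and the example.com URLs below are made up for illustration; a real crawler would load the file from the target site instead of a string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Ask whether our (hypothetical) bot may fetch each URL.
allowed = parser.can_fetch("example-bot", "https://example.com/products/1")
blocked = parser.can_fetch("example-bot", "https://example.com/private/report")

# Requested pause between requests in seconds, or None if not specified.
delay = parser.crawl_delay("example-bot")
```

Here `allowed` is true, `blocked` is false, and `delay` is the pause the site asks for. Note that robots.txt is a request from the site owner rather than a technical barrier, so honoring it is up to the crawler.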
To protect the privacy of individuals, governments around the world have been drafting information protection legislation, the most notable examples being the EU’s GDPR and California’s CCPA. However, these laws protect individuals, not businesses. For example, if you want to scrape a piece of business information, such as a company’s name, phone number, or address, privacy laws have little to say about it, since this information is public data.
There is a significant amount of publicly available business information on the internet. Still, some of the biggest datasets are made available by websites only after creating an account or paying for a subscription.
If you sign up with a service like this, you will have to agree to terms and conditions that almost always restrict automated data collection and control how you use the data.
Basically, if you’re scraping data from a service that requires you to create an account, you will almost certainly be breaking the terms of service. Also, your account allows the service provider to collect additional information about you. For example, how and where you log in, your usage patterns, and so on are all visible to the owner of the platform on which you’ve registered.
These details make it much easier for the service provider to detect web scraping on their platform and ban your account.
Through services such as the Google and Bing search engines, web scraping has helped us make the best use of the web. It is a powerful tool that lets companies use the information on the internet, but it should be done ethically.
Even though we have looked at many of the legal aspects, it’s still hard to tell whether your data scraping campaign is legal without the advice of a lawyer. The truth is that the same can be said of just about any other activity on the web. Legality is one aspect of the process; it is equally important to focus on the ethics behind it. Web scraping services offer incredibly powerful potential for generating business leads, as long as they are used with the wishes of the target website in mind and with respect for any individual whose data is collected.
It is crucial to be considerate of other people’s sites while trying to use their resources. Respect their rules and wishes. Read over their Terms of Service. If you suspect a site does not want to be crawled, consider contacting the webmaster and asking for permission. Be considerate of their resources and try not to burn through their bandwidth: use a slower crawl rate. Do not publish any content you find that was not intended to be published. In short, web scraping is generally not illegal unless you use it unethically, e.g., by scraping nonpublic data. To stay on the safe side, it is smart to hire a data scraping company that takes legal matters into consideration.
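As a rough illustration of a slower crawl rate, the sketch below pauses between requests and identifies itself with a User-Agent header so the webmaster knows who to contact. The function names, the bot name, the contact address, and the five-second default are all assumptions chosen for this example, not recommended values for any specific site.

```python
import time
import urllib.request


def fetch(url, user_agent="example-bot/0.1 (contact@example.com)"):
    """Download one page, identifying the crawler via a User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read()


def polite_crawl(urls, delay_seconds=5.0, fetch_page=fetch, sleep=time.sleep):
    """Fetch URLs one at a time, waiting `delay_seconds` between requests."""
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # pause so the site is not hit in bursts
        pages[url] = fetch_page(url)
    return pages
```

Calling `polite_crawl(["https://example.com/a", "https://example.com/b"])` would fetch the two pages five seconds apart. If the site publishes a preferred crawl delay, use that value instead of a guess.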
And lastly, if you are looking to find out more about this matter or are simply interested in finding out how these processes operate, do not hesitate to contact us; our support team is always happy to hear from you.