Is web scraping legal? That’s one of the most frustrating questions people have about data extraction. There are terms for ethical and unethical web scraping: “good bots” and “bad bots.”
For example, “good bots” index web content and support market research, such as measuring engagement on social media platforms. Companies also commission procurement data scraping software to monitor market activity.
“Bad bots,” by contrast, mine information for purposes that fall outside the information owner’s control. By some estimates, about 10% of all web crawling bots are bad bots.
Potential Damage from Harmful Bots
Harmful bots carry out a range of damaging activities, including competitive data mining, denial-of-service attacks, account hijacking, online fraud, intellectual property theft, data theft, spam and digital ad fraud, unauthorized vulnerability scans, attacks on the stock market, and unauthorized data retrieval.
Startups find scraping highly profitable because it is a cheap and powerful way to gather data. Large companies, meanwhile, like using crawlers for their own gain but don’t want others to use bots against them.
It is essential to understand how you will use the information and where it came from. For example, you can mine business contact data in just a few minutes and use it to generate leads for potential customers.
But this would violate the privacy of those potential clients. You probably remember large spam email campaigns; they are just one small example of scraped data being abused. So how can you use scraping while staying on the right side of the law? Let’s look at how past cases were handled.
Web Scraping Legal Issues and How They Were Resolved
In 2000, eBay sought a preliminary injunction against Bidder’s Edge, a case that brought attention to the practice of scraping. eBay argued that Bidder’s Edge’s use of bots on its site, against the company’s will, amounted to trespass to chattels.
The court granted the injunction, reasoning that users had to agree to the site’s terms of service and that a large number of bots could damage eBay’s computer systems.
The case was later settled out of court, but the ruling set a legal precedent. In 2001, a travel agency sued a competitor that had “scraped” prices from its website to build a competing pricing system.
The judge ruled that the site owner’s disapproval of the scraping was not, on its own, enough to make it “unauthorized access” under federal hacking laws.
How the Law on Scraping Has Evolved
Over the following years, courts repeatedly ruled that merely putting a “do not scrape us” warning in a website’s terms of service is not enough.
For scraping to create legal issues on those grounds, a user must explicitly agree or consent to the terms. In 2009, Facebook won one of the first copyright lawsuits against a web scraper.
This laid the foundation for later cases that link data scraping to a clear breach of copyright and demonstrable monetary damages.
In 2019, the US Court of Appeals for the Ninth Circuit denied LinkedIn’s request to prevent analytics company hiQ Labs from scraping its data. This was a historic moment in the era of data privacy and regulation.
The decision indicates that scraping publicly available, non-copyrighted data does not, by itself, cause legal issues.
However, the decision does not give web crawlers license to use scraped data for any commercial purpose, nor the freedom to obtain data from sites that require authentication.
The legal picture can also vary across the industries that use web scraping. Sites that require an account usually have terms of service that forbid this kind of activity. Publicly available sites, however, cannot require a visitor to agree to terms of service before viewing them, so users are generally free to collect data from such sites with web crawlers.
Regulations Surrounding Illegal Scraping
In 2016, to curb illegal scraping, Congress passed the Better Online Ticket Sales (BOTS) Act, the first legislation explicitly targeting bad bots.
The act forbids the use of software that bypasses security controls on ticket sellers’ websites.
To do their dirty work, automated ticket-scalping bots use many techniques, including web scraping driven by business logic that detects scalping opportunities, fills in shopping cart purchase information, and even resells stock on secondary markets.
In other words, whether you are a venue, a company, or a software platform, it is also up to you to protect against this kind of fraudulent, illegal scraping during high-volume sales.
The United Kingdom followed the US with the Digital Economy Act 2017, which received Royal Assent.
In an increasingly digital world, the act aims to protect consumers in many ways, including cracking down on ticket touting by making it a criminal offense to misuse bot technology to sweep up tickets and resell them on the secondary market at inflated prices.
Information Protection Issue
Although companies are less likely to take legal action against web crawlers today, they can still use various techniques to limit crawling and protect their information.
Some techniques limit bots’ access to a site. One example is “rate throttling,” which means “to control the rate of requests sent or received by a network interface controller. It can be used to prevent DoS attacks and limit web scraping.”
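To make rate throttling more concrete, here is a minimal Python sketch of one common way a site might implement it: a token bucket kept per client. This is an illustrative assumption, not a description of any particular site’s setup. The `TokenBucket` and `Throttle` names, keying buckets by client IP, and the capacity and refill values are all hypothetical. Each client can make a short burst of requests, after which it is held to a steady rate; a rejected request would typically receive an HTTP 429 (Too Many Requests) response.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity   # start with a full bucket
        self.last = clock()      # time of the most recent refill

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to the time elapsed, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1     # spend one token on this request
            return True
        return False             # over the limit: the server would reject (e.g. HTTP 429)


class Throttle:
    """Keeps one bucket per client key (hypothetically, the client's IP address)."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_id: str) -> bool:
        if client_id not in self.buckets:
            self.buckets[client_id] = TokenBucket(self.capacity, self.rate, self.clock)
        return self.buckets[client_id].allow()


if __name__ == "__main__":
    fake_now = [0.0]  # a controllable clock so the example is deterministic
    throttle = Throttle(capacity=3, rate=1.0, clock=lambda: fake_now[0])
    print([throttle.allow("203.0.113.7") for _ in range(5)])  # [True, True, True, False, False]
    fake_now[0] += 2.0  # two seconds later, two tokens have been refilled
    print([throttle.allow("203.0.113.7") for _ in range(3)])  # [True, True, False]
    print(throttle.allow("198.51.100.1"))  # True: each client has its own bucket
```

A scraper that stays under the limit barely notices the throttle, while one that hammers the site spends most of its time being refused, which is how throttling makes aggressive scraping less cost-effective.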
Sites can also use CAPTCHAs to test whether a human or a web crawler is trying to access a page. These techniques are typically used to stop “bad bots” that overload a site and crash it.
But they can also be used simply to make automated scraping less cost-effective for web crawling companies.
To be sure you are scraping ethically, keep in mind that there are pages a website owner doesn’t want bots or crawlers to visit, even though search engines, of course, crawl the web continually.
To keep other parties from indexing or listing sensitive information from their sites, website owners take various measures, such as a robots.txt file, to tell crawlers which parts of the site to stay out of.
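The usual way to check which parts of a site are off-limits is to read its robots.txt file before crawling, and Python’s standard library includes a parser for it, `urllib.robotparser.RobotFileParser`. The sketch below parses an inline, made-up robots.txt so it runs offline; the rules, the `example.com` URLs, and the `example-research-bot` user-agent name are hypothetical. Against a live site you would point the parser at that site’s own robots.txt URL and call `read()` instead.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. For a real site you would instead use
# RobotFileParser("https://<site>/robots.txt") followed by .read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /checkout/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been obtained."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """True if the parsed robots.txt lets `user_agent` fetch `url`."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    agent = "example-research-bot"  # hypothetical user-agent name
    for url in ("https://example.com/products/42",
                "https://example.com/private/report.pdf"):
        print(url, "->", "allowed" if allowed(rp, agent, url) else "disallowed")
    print("crawl delay:", rp.crawl_delay(agent))  # seconds, or None if not set
```

Here the product page is reported as allowed, the `/private/` file as disallowed, and the crawl delay as 5 seconds. Keep in mind that robots.txt is a request, not an access control: honoring it is part of scraping ethically, not something the site can enforce.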
Nowadays, most websites have Terms and Conditions that state what users can and cannot do on the site.
If you are considering crawling a website to obtain data for your web scraper, start by reading its T&Cs, privacy policy, and any other terms it publishes.
Data Privacy and Protection
To protect individuals’ privacy, governments around the world have been drafting data protection legislation, most notably the EU’s GDPR and California’s CCPA.
However, these laws protect individuals, not businesses. For example, if you want to scrape business information such as a company’s name, phone number, or address, privacy laws have little to say about it, since this information is public.
There is a significant amount of publicly available business information on the internet. Still, some of the most extensive datasets are available only after you create an account or pay for a subscription.
If you sign up for such a service, you will have to agree to terms and conditions that almost always restrict automated data collection and govern how you use the data.
If you scrape data from a service that requires an account, you are almost certainly breaking its terms of service. Your account also lets the service provider collect additional information about you.
For example, the platform’s owner can see how and where you log in, your patterns of usage on its system, and so on.
These details make it much easier for the service provider to detect web scraping on its platform and ban your account.
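To illustrate why an account makes scraping easy to spot, here is a hypothetical sketch of the kind of check a platform might run on its side. It is not how any specific service works: the `UsageMonitor` class, the one-minute sliding window, and the request and IP thresholds are all assumptions chosen for the example. It flags an account that makes too many requests within the window, or that uses too many different IP addresses within it.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Tuple


class UsageMonitor:
    """Flags accounts whose recent activity looks automated.

    The window and both thresholds are hypothetical; a real platform would
    combine many more signals (login locations, devices, navigation order).
    """

    def __init__(self, window_s: float = 60.0, max_requests: int = 120,
                 max_ips: int = 3) -> None:
        self.window_s = window_s
        self.max_requests = max_requests
        self.max_ips = max_ips
        # account -> (timestamp, ip) of each request still inside the window
        self.events: Dict[str, Deque[Tuple[float, str]]] = defaultdict(deque)

    def record(self, account: str, ip: str, ts: float) -> bool:
        """Log one request and return True if the account should be flagged."""
        q = self.events[account]
        q.append((ts, ip))
        # Slide the window: forget requests older than window_s seconds.
        while q and q[0][0] <= ts - self.window_s:
            q.popleft()
        distinct_ips = {event_ip for _, event_ip in q}
        return len(q) > self.max_requests or len(distinct_ips) > self.max_ips


if __name__ == "__main__":
    monitor = UsageMonitor(window_s=60, max_requests=5, max_ips=2)
    # Someone browsing by hand: a few requests spread over several minutes.
    human = [monitor.record("alice", "198.51.100.4", t) for t in (0, 45, 130, 200)]
    # A scraper on one account: ten requests in ten seconds.
    scraper = [monitor.record("scraper01", "203.0.113.9", t) for t in range(10)]
    print(any(human), scraper.index(True))  # False 5
```

Because every request is tied to the same account, even a simple rule like this separates hand browsing from bulk collection quickly.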
Through services such as the Google and Bing search engines, web scraping has helped us make the most of the web. It is a powerful tool that lets companies use the information on the internet, but it should be done ethically.
Even after reviewing the main legal issues, it is still hard to tell whether a particular data scraping campaign is legal without a lawyer’s advice.
The same is true of just about any other activity on the web. Legality is only one aspect of the process; it is equally important to focus on the ethics behind it.
Web scraping offers compelling potential for generating business intelligence, as long as it is used with the target website’s wishes in mind and with respect for any individual whose data is collected.
It is crucial to be considerate of other people’s sites while trying to use their resources. Respect their rules and wishes. Read over their Terms of Service.
If you suspect a site does not allow crawling, consider contacting the webmaster and asking for permission to crawl it.
Be considerate of their resources and try not to exhaust their bandwidth; use a slower crawl rate. Do not publish any content you find that was not intended to be published.
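A slower crawl rate is easy to enforce in your own code. The following Python sketch waits a minimum interval between requests to the same host, and uses the site’s robots.txt Crawl-delay instead when that is longer (for example, the value returned by `RobotFileParser.crawl_delay()` from the standard library). The `PoliteFetcher` class, the 5-second default, and the user-agent string are hypothetical choices for illustration. The clock and sleep functions are injectable so the pacing logic can be checked without touching the network.

```python
import time
import urllib.request
from typing import Callable, Dict, Optional
from urllib.parse import urlparse


class PoliteFetcher:
    """Waits at least `delay` seconds between requests to the same host.

    The 5-second default and the user-agent string are hypothetical. Pass the
    site's robots.txt Crawl-delay (if any) as `crawl_delay`; the larger value wins.
    """

    def __init__(self, min_delay_s: float = 5.0,
                 crawl_delay: Optional[float] = None,
                 user_agent: str = "example-research-bot/0.1 (+https://example.com/contact)",
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.delay = max(min_delay_s, crawl_delay or 0.0)
        self.user_agent = user_agent
        self.clock = clock
        self.sleep = sleep
        self.last_hit: Dict[str, float] = {}  # host -> when we last contacted it

    def wait_turn(self, url: str) -> float:
        """Sleep until `url`'s host may be contacted again; return seconds slept."""
        host = urlparse(url).netloc
        now = self.clock()
        wait = 0.0
        if host in self.last_hit:
            wait = max(0.0, self.last_hit[host] + self.delay - now)
            if wait > 0:
                self.sleep(wait)
        self.last_hit[host] = now + wait  # the moment this request goes out
        return wait

    def fetch(self, url: str) -> bytes:
        """Fetch `url` with an identifying User-Agent, respecting the delay."""
        self.wait_turn(url)
        request = urllib.request.Request(url, headers={"User-Agent": self.user_agent})
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.read()
```

The delay is tracked per host, so crawling several different sites does not slow each other down, while any single site sees at most one request every few seconds. Sending an identifying User-Agent with contact details can also make it easier for a webmaster to reach you rather than simply blocking you.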