When making HTTP requests with the Python Requests library, the default User-Agent string sent with each request is easy for websites (e.g. Amazon) to detect and block. Websites do this to prevent web scraping and other automated requests from bots. In this article, we will explore how to change the User-Agent string in Python Requests to avoid getting blocked by websites.
What is a User Agent?
When you make an HTTP request using Python’s Requests library, a User-Agent string is automatically included in the request headers. This string identifies the client making the request and typically includes information about the client’s operating system and browser. For example, a User-Agent string might look like this:
“Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36”.
This string tells the server that the request is coming from a Windows 10 computer running Google Chrome. Some websites use the User-Agent string to customize the content returned to the client, while others use it to block automated requests from bots.
By customizing the User-Agent string, you can make your request appear to come from a different browser or operating system, which can be useful for web scraping and other purposes.
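To make the structure of the example string above more concrete, here is a minimal sketch that splits it into its platform details and its product/version tokens using only the standard library. The helper name summarize_user_agent and the two regular expressions are assumptions made for illustration. They fit this particular browser-style string and are not a general-purpose User-Agent parser.

```python
import re

# The example User-Agent string shown earlier in this article
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/58.0.3029.110 Safari/537.36"
)


def summarize_user_agent(user_agent):
    """Hypothetical helper: return (platform, products) for a browser-style string."""
    # The first parenthesised group usually carries the platform details.
    platform_match = re.search(r"\(([^)]*)\)", user_agent)
    platform = platform_match.group(1) if platform_match else None
    # Product tokens have the form name/version, e.g. Chrome/58.0.3029.110.
    products = re.findall(r"([A-Za-z]+)/([\d.]+)", user_agent)
    return platform, products


platform, products = summarize_user_agent(USER_AGENT)
print(platform)   # platform details such as the operating system
print(products)   # list of (product, version) pairs
```

Servers that inspect the User-Agent typically look at these same pieces, which is why a string that names a real browser and platform looks less like an automated client.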
Sending a Request without a Custom User-Agent
Let’s start with an example where we try to connect to amazon.com without a custom User-Agent string. Amazon actively blocks scrapers and bots, so it is a good example to demonstrate the importance of using a custom User-Agent string.
import requests
url = 'https://www.amazon.com/'
response = requests.get(url)
print(response.status_code) # Often 503 when the request is blocked
When we run this code, it will send a GET request to Amazon.com and print the status code of the response. However, in most cases, the request will be blocked, and the response will be a 503 error code.
But what User-Agent is actually being sent? Even though we have not defined one, the Requests library sends a default User-Agent.
To check this, we can look at the headers of the request object, which is stored in the response object (response.request). These headers behave like a dictionary and contain a key-value pair for “User-Agent”.
import requests
url = 'https://www.amazon.com/'
response = requests.get(url)
print(response.request.headers["User-Agent"])
python-requests/2.25.1
As you can see, the User-Agent identifies our request as coming from the Python Requests library (the version number depends on the release you have installed). This clearly indicates that we are a bot.
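The default value can also be checked without contacting any website. The sketch below assumes only that the Requests library is installed. It reads the default string from requests.utils.default_user_agent() and then prepares a request, without sending it, to confirm which User-Agent header would go out.

```python
import requests

# The default User-Agent that Requests uses when none is supplied
default_ua = requests.utils.default_user_agent()
print(default_ua)

# Build the request without sending it, so no network access is needed
session = requests.Session()
prepared = session.prepare_request(
    requests.Request("GET", "https://www.amazon.com/")
)

# The prepared request carries the headers that would actually be sent
print(prepared.headers["User-Agent"])
```

Both lines should print the same value, in the form python-requests/ followed by the installed version.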
Change User-agent in Python Requests
To avoid getting blocked by websites like Amazon, we need to use a custom User-Agent string. This can be done easily with the Requests library by passing in a dictionary of headers that includes the User-Agent string.
import requests
url = 'https://www.amazon.com/'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}
response = requests.get(url, headers=headers)
print(response.status_code) # Should print 200 if the request is accepted
In this code, we create a dictionary of headers that includes a custom User-Agent string. The User-Agent string in this case is a common User-Agent string for the Google Chrome browser. We then pass this dictionary of headers as an argument to the get() method of the Requests library.
When we run this code, it should return a 200 status code, indicating that the request was successful.
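If you make several requests, passing the headers dictionary every time becomes repetitive. One possible alternative, sketched here under the assumption that a single User-Agent is acceptable for all of your requests, is to set the header once on a requests.Session object. The constant name CHROME_UA is chosen for illustration. The request is only prepared, not sent, so the example runs without network access.

```python
import requests

# Same Chrome User-Agent string used in the example above
CHROME_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/58.0.3029.110 Safari/537.36"
)

# Headers set on the session are merged into every request made through it
session = requests.Session()
session.headers.update({"User-Agent": CHROME_UA})

# Prepare a request (without sending it) to confirm the header is applied
prepared = session.prepare_request(
    requests.Request("GET", "https://www.amazon.com/")
)
print(prepared.headers["User-Agent"])
```

After this setup, calling session.get(url) sends the custom User-Agent without needing a headers argument on each call.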
If you are wondering where to get User-Agent strings from, you can either use the one above or search online. Simply type “my user agent” into Google, and it will show you the User-Agent of your own browser. You can also find websites that generate random User-Agent strings for you.
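Instead of relying on a website to generate random User-Agent strings, you could keep a small list of your own and pick one per request. The following is a minimal sketch of that idea. The USER_AGENTS list and the random_headers helper are hypothetical, and the strings are illustrative examples of the browser format, so replace them with ones collected from real, current browsers.

```python
import random

# Illustrative browser-style strings (assumed examples, not an official list)
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) "
    "Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/16.0 Safari/605.1.15",
]


def random_headers():
    """Hypothetical helper: build a headers dict with a randomly chosen User-Agent."""
    return {"User-Agent": random.choice(USER_AGENTS)}


headers = random_headers()
print(headers["User-Agent"])
```

The resulting dictionary can be passed to Requests in the same way as before, for example requests.get(url, headers=random_headers()).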
This marks the end of the “Change User-Agent in Python Requests” tutorial. Any suggestions or contributions for CodersLegacy are more than welcome. Questions regarding the tutorial content can be asked in the comments section below.