How to Get Encoding Format of URL in Python?

As a blogger and a programmer, I often encounter the need to get the encoding format of a URL in Python. Recently, while working on a project that involved web scraping, I came across a situation where I had to decode the URL to its original format. Here's how I did it:

Using urlparse()

A good first step is the urlparse() function from the urllib.parse module. It is part of the Python Standard Library and splits a URL into its component parts. It does not report an encoding by itself, but it isolates the path and query, which are the parts that usually carry percent-encoded characters. Here's how to use it:


from urllib.parse import urlparse

url = "https://www.example.com/path/to/page.html?query=parameter"
parsed_url = urlparse(url)

print(parsed_url.scheme) # output: https
print(parsed_url.netloc) # output: www.example.com
print(parsed_url.path) # output: /path/to/page.html
print(parsed_url.query) # output: query=parameter

In the above code, we first import the urlparse() function from the urllib.parse module. Then, we define a URL string and pass it to urlparse(). The function returns a ParseResult, a named tuple that holds the parsed components of the URL, which we can access through its attributes.
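When the part of interest is the query string, the parsed result can be handed to parse_qs(), also in urllib.parse, which splits the parameters and decodes their percent-escapes (as UTF-8 by default). Here is a minimal sketch; the URL and its lang parameter are made up for illustration:

```python
from urllib.parse import urlparse, parse_qs

# Illustrative URL; the query value contains percent-encoded characters.
url = "https://www.example.com/path/to/page.html?query=Hello%20World%21&lang=en"

parsed_url = urlparse(url)

# parse_qs() returns a dict that maps each parameter name to a list of
# values, with percent-escapes already decoded.
params = parse_qs(parsed_url.query)

print(params["query"][0])  # output: Hello World!
print(params["lang"][0])   # output: en
```

Each value is a list because a parameter name may appear more than once in a query string.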

Using urllib.parse.unquote()

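If you want to decode the URL to its original format, you can use the unquote() function from the urllib.parse module. This function replaces each %xx escape with the character it represents. By default it interprets the escaped bytes as UTF-8; pass the encoding argument if the URL was encoded with a different character set. Here's an example: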


from urllib.parse import unquote

url = "https://www.example.com/path/to/page.html?query=Hello%20World%21"
decoded_url = unquote(url)

print(decoded_url) # output: https://www.example.com/path/to/page.html?query=Hello World!

In the above code, we first import the unquote() function from the urllib.parse module. Then, we define a URL string that contains encoded characters. We pass this URL string to the unquote() function, which returns the URL in its original format.
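A URL does not declare which character set its percent-escapes use, so the encoding sometimes has to be guessed. The sketch below shows the encoding argument of unquote() and a trial-decoding heuristic. The guess_percent_encoding() helper and its candidate list are my own assumptions for illustration, not part of the standard library:

```python
from urllib.parse import unquote, unquote_to_bytes

# unquote() assumes UTF-8 unless told otherwise.
print(unquote("caf%C3%A9"))                   # output: café
print(unquote("caf%E9", encoding="latin-1"))  # output: café


def guess_percent_encoding(url, candidates=("utf-8", "latin-1")):
    """Hypothetical helper: return the first candidate codec that decodes
    the percent-escaped bytes of `url` without error, or None."""
    raw = unquote_to_bytes(url)
    for codec in candidates:
        try:
            raw.decode(codec)
        except UnicodeDecodeError:
            continue
        return codec
    return None


print(guess_percent_encoding("caf%C3%A9"))  # output: utf-8
print(guess_percent_encoding("caf%E9"))     # output: latin-1
```

This is only a heuristic. latin-1 can decode any byte sequence, so it works as a last-resort fallback rather than as proof of the real encoding, and a URL that contains only ASCII characters will match the first candidate.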

Using the requests library

If you're working with web pages and want the character encoding that the server declares in the HTTP headers, you can use the requests library. It is a third-party package (install it with pip install requests) that lets you send HTTP requests from Python and inspect the response. Here's an example:


import requests

url = "https://www.example.com"
response = requests.get(url)

print(response.encoding) # e.g. UTF-8, depending on the server's Content-Type header

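In the above code, we first import the requests library. Then, we define a URL string and send an HTTP GET request using the requests.get() function. This function returns a Response object, which includes the response headers. The encoding attribute holds the character encoding that requests inferred from the Content-Type header. If the header declares no charset, the value may be a default or None; in that case, response.apparent_encoding offers a guess based on the response body.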
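To see where that value comes from, it can help to pull the charset out of a Content-Type header by hand. The sketch below uses only the standard library and makes no network request. The charset_from_content_type() helper and the sample header strings are assumptions for illustration:

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Hypothetical helper: extract the charset parameter from a
    Content-Type header value, or return None if there is none."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the charset in lower case.
    return msg.get_content_charset()


print(charset_from_content_type("text/html; charset=UTF-8"))  # output: utf-8
print(charset_from_content_type("text/html"))                 # output: None
```

With requests, the header value to pass in would be response.headers.get("Content-Type").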

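These are some of the ways to work with the encoding of a URL in Python: urlparse() splits the URL into its parts, unquote() decodes percent-escapes, and requests reports the encoding that the server declares for the response. Depending on your specific use case, you can choose the method that works best for you.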