Python Requests BeautifulSoup

If you are into web scraping or data extraction from websites, you have probably come across two libraries: Python Requests and BeautifulSoup. In this post, I will explain what these libraries do and how they can be used together.

Python Requests

Python Requests is a library that allows you to send HTTP/1.1 requests using Python. It provides an easy-to-use interface for making HTTP requests and handling the response.

The library can be installed using pip:


pip install requests

Here's an example of how to use the library to make a GET request:


import requests

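# Send a GET request; the timeout stops the call from waiting forever
response = requests.get('https://www.example.com', timeout=10)
# response.content is the raw response body as bytes
print(response.content)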

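This will send a GET request to 'https://www.example.com' and print the body of the response. Note that response.content holds the raw bytes; use response.text if you want the body decoded to a string.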
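
In a real script you usually want a timeout and some error handling around the request. Here is a minimal sketch of that idea. The helper name fetch_html and the 10-second timeout are my own choices for illustration, not something the library prescribes:

```python
import requests


def fetch_html(url, timeout=10):
    """Return the decoded body of `url`, or None if the request fails.

    fetch_html is a hypothetical helper for this post, not part of Requests.
    """
    try:
        response = requests.get(url, timeout=timeout)
        # Turn 4xx/5xx status codes into an exception
        response.raise_for_status()
    except requests.exceptions.RequestException as exc:
        # Covers connection errors, timeouts, invalid URLs and HTTP errors
        print(f"Request failed: {exc}")
        return None
    # response.text is the body decoded to str; response.content is raw bytes
    return response.text
```

Calling fetch_html('https://www.example.com') should return the page as a string, while a malformed URL such as 'not-a-url' returns None instead of raising.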

BeautifulSoup

BeautifulSoup is a Python library for pulling data out of HTML and XML files. It provides ways of navigating and searching the parse tree created from the HTML/XML document.

The library can be installed using pip:


pip install beautifulsoup4

Here's an example of how to use the library to parse an HTML document:


from bs4 import BeautifulSoup

html_doc = """
<html>
  <head>
    <title>Example</title>
  </head>
  <body>
    <p>This is an example.</p>
  </body>
</html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')
print(soup.prettify())

This will parse the HTML document stored in the variable html_doc and print it in a prettified format.
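
Once the document is parsed, you can navigate and search the tree. The sketch below reuses the same html_doc and shows a few of the most common calls; the variable names are just for illustration:

```python
from bs4 import BeautifulSoup

html_doc = """
<html>
  <head>
    <title>Example</title>
  </head>
  <body>
    <p>This is an example.</p>
  </body>
</html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# The text inside the <title> tag
title = soup.title.string

# find() returns the first matching tag, or None if there is no match
first_paragraph = soup.find('p')

# find_all() returns a list of every matching tag
paragraphs = [p.get_text() for p in soup.find_all('p')]

print(title)
print(first_paragraph.get_text())
print(paragraphs)
```

Because find() can return None, it is worth checking the result before calling get_text() on pages whose structure you do not control.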

Using Python Requests and BeautifulSoup together

Now that we know what Python Requests and BeautifulSoup do, let's see how they can be used together for web scraping. We will use Python Requests to get the HTML content of a webpage and then use BeautifulSoup to parse it and extract the data we need.


import requests
from bs4 import BeautifulSoup

url = 'https://www.example.com'
response = requests.get(url, timeout=10)
response.raise_for_status()  # raise an exception on 4xx/5xx responses
html_content = response.content

soup = BeautifulSoup(html_content, 'html.parser')
# Now we can use BeautifulSoup to extract the data we need

In this example, we first use Python Requests to send a GET request to 'https://www.example.com' and get the HTML content of the page. We then pass that content to BeautifulSoup, which parses it into a soup object that we can search to extract the data we need.
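
To make the extraction step concrete, here is one possible sketch that collects every link on a page. The helper extract_links and the sample HTML are made up for this post. The sample stands in for response.content so the snippet runs without a network connection:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url):
    """Return (text, absolute_url) pairs for every <a href=...> in `html`.

    extract_links is a hypothetical helper written for this post.
    """
    soup = BeautifulSoup(html, 'html.parser')
    links = []
    # href=True skips <a> tags that have no href attribute
    for a in soup.find_all('a', href=True):
        text = a.get_text(strip=True)
        # Resolve relative links against the page URL
        links.append((text, urljoin(base_url, a['href'])))
    return links


# Made-up HTML standing in for response.content
sample_html = b"""
<html><body>
  <a href="/about">About</a>
  <a href="https://www.example.org/">Elsewhere</a>
  <a>No link here</a>
</body></html>
"""

links = extract_links(sample_html, 'https://www.example.com')
for text, href in links:
    print(text, href)
```

In a real script you would pass response.content and url from the example above instead of sample_html. Keeping the parsing in its own function also makes it easy to try out on saved HTML.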

There are other ways of using Python Requests and BeautifulSoup together, such as sending POST requests with Python Requests or using BeautifulSoup's more advanced search features like CSS selectors. However, the basic method shown above is enough for pages whose data is already present in the HTML that the server returns.
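
As a rough illustration of the POST case, the sketch below builds a form submission and inspects it without sending anything. The '/search' endpoint and the 'q' field are hypothetical; a real site will have its own URL and field names:

```python
import requests

# Hypothetical form endpoint and field name, used only for illustration
request = requests.Request(
    'POST',
    'https://www.example.com/search',
    data={'q': 'python'},
)
# prepare() builds the method, URL, headers and body without sending anything
prepared = request.prepare()

print(prepared.method)
print(prepared.headers['Content-Type'])
print(prepared.body)

# To actually send the form, the usual shortcut is requests.post():
# response = requests.post('https://www.example.com/search',
#                          data={'q': 'python'}, timeout=10)
```

The response from requests.post() can be handed to BeautifulSoup in exactly the same way as the response from requests.get().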