Python Requests HTML

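If you want to download a web page and pull data out of its HTML in Python, you can use the Requests HTML library (published on PyPI as requests-html). It is built on top of the Requests library, so sending HTTP requests and handling responses work the same way, and it adds HTML parsing, CSS selectors, and optional JavaScript rendering.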

Installation

You can install Requests HTML using pip:


pip install requests-html

Make sure you have the latest version of pip installed.
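To confirm the install worked and see which version pip picked up, a small check using only the standard library's importlib.metadata (Python 3.8+) might look like this. The helper name installed_version is hypothetical, chosen for this sketch.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="requests-html"):
    """Return the installed version string of `distribution`, or None if it is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # pip has not installed this distribution in the current environment
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("requests-html is not installed; run: pip install requests-html")
    else:
        print(f"requests-html {found} is installed")
```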

Basic Usage

To use Requests HTML, you first need to create a session:


from requests_html import HTMLSession

session = HTMLSession()

Then you can use the session to make a GET request to a website:


response = session.get('https://www.example.com/')

# response.html is an HTML object; its .html attribute is the markup as a string
print(response.html.html)

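The response.html attribute is a parsed HTML object, not a plain string: printing it directly shows only a short summary such as <HTML url='https://www.example.com/'>. The page's markup is in response.html.html, and its text content is in response.html.text.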

Rendering JavaScript

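Requests HTML can also render JavaScript, which lets you get the full HTML of a page, including content that JavaScript loads after the initial response. It does this with a headless Chromium browser driven through pyppeteer. The first time you call render(), Chromium is downloaded automatically, so expect a delay on that first run: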


response = session.get('https://www.example.com/')

# run the page's JavaScript in headless Chromium; response.html is updated in place
response.html.render()

print(response.html.html)

The render() method loads the page in Chromium, executes its JavaScript, and updates response.html in place with the rendered markup, so later calls such as find() operate on the rendered page.

Using CSS Selectors

You can use CSS selectors to extract specific elements from the HTML content:


response = session.get('https://www.example.com/')

# find all links on the page
links = response.html.find('a')

for link in links:
    # .get() avoids a KeyError for <a> tags that have no href attribute
    print(link.text, link.attrs.get('href'))

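The find() method returns a list of Element objects that match the CSS selector, or a single element (or None) when called with first=True. Each Element exposes attributes such as .text, .attrs, and .html, so you can iterate over the list and pull out the information you need.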

Conclusion

Requests HTML combines Requests-style HTTP calls with HTML parsing in a single library for scraping websites in Python. It makes it easy to get HTML content, render JavaScript when a page needs it, and extract specific elements using CSS selectors. Try it out for yourself and see how it can simplify your web scraping tasks.