How Do I Make HTTP Requests with Python?

What is the Requests Library and Why Do You Need It?

The requests library is one of the most popular and easy-to-use Python libraries for making HTTP requests. It provides a convenient interface for sending GET, POST, PUT, DELETE, and other types of HTTP requests with a minimal amount of code.

Key Advantages of the Requests Library

  • Simple syntax and intuitive API
  • Built-in session support and cookie management
  • Automatic JSON and form handling
  • Support for various authentication methods
  • Ability to work with proxy servers
  • Comprehensive documentation in English
  • Active support from the developer community

Installing the Requests Library

Installing the requests library is done via the pip package manager. The installation process depends on the version of Python you are using.

Standard Installation

pip install requests

Installation for Python 3

If multiple versions of Python are installed on your system, use the pip3 command to install it into the Python 3 environment:

pip3 install requests

Verifying the Installation

After installation, you can verify that it succeeded by importing the library in a Python console:

import requests
print(requests.__version__)

Basics of Working with GET Requests

GET requests are used to retrieve data from a web server. This is the most common type of HTTP request in web development.

Simple GET Request

import requests

response = requests.get('https://api.github.com')
print(response.status_code)  # Response status (200 - OK)
print(response.text)         # Response body as a string

Analyzing the Server Response

When working with GET requests, it is important to properly analyze the received response. The response object contains many useful attributes:

  • status_code - HTTP status of the response
  • text - response content in text format
  • content - response content in bytes
  • headers - response headers
  • url - the final URL of the request
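
As a rough illustration, these attributes can be pulled together into a small logging helper (the helper name and output format are my own, not part of requests):

```python
def summarize(response):
    """Print the key attributes of a requests Response object (sketch)."""
    print(f"{response.status_code} {response.url}")
    print(f"Content-Type: {response.headers.get('Content-Type')}")
    print(f"Body: {len(response.content)} bytes")
```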

Working with Request Parameters

URL parameters can be passed through a dictionary, making the code more readable and easier to modify:

params = {
    'q': 'python programming',
    'page': 2,
    'limit': 50
}

response = requests.get('https://www.example.com/search', params=params)
print(response.url)  # Show the formed URL with parameters
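
If you only want to see the final URL, requests can also prepare a request locally without sending anything over the network; this reuses the same params dictionary as above:

```python
import requests

params = {
    'q': 'python programming',
    'page': 2,
    'limit': 50
}

# Prepare the request without sending it
prepared = requests.Request('GET', 'https://www.example.com/search', params=params).prepare()
print(prepared.url)  # https://www.example.com/search?q=python+programming&page=2&limit=50
```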

POST Requests and Sending Data to the Server

POST requests are used to send data to the server, for example when submitting forms or creating new records in a database.

Sending Form Data

data = {
    'username': 'admin',
    'password': '12345',
    'email': 'admin@example.com'
}

response = requests.post('https://httpbin.org/post', data=data)
print(response.text)

Sending JSON Data

When working with modern APIs, it is often necessary to send data in JSON format. The requests library automatically sets the correct Content-Type when using the json parameter:

data = {
    'name': 'John Doe',
    'age': 30,
    'city': 'Moscow'
}

response = requests.post('https://httpbin.org/post', json=data)
result = response.json()  # Automatic conversion of the response to JSON
print(result)

Differences Between the Data and JSON Parameters

  • The data parameter sends data as a form (application/x-www-form-urlencoded)
  • The json parameter sends data in JSON format (application/json)
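
This difference is easy to verify without a network call by preparing both request types locally (httpbin.org is only a placeholder URL here; nothing is actually sent):

```python
import requests

payload = {'name': 'John', 'age': 30}

form_req = requests.Request('POST', 'https://httpbin.org/post', data=payload).prepare()
json_req = requests.Request('POST', 'https://httpbin.org/post', json=payload).prepare()

print(form_req.headers['Content-Type'])  # application/x-www-form-urlencoded
print(form_req.body)                     # name=John&age=30
print(json_req.headers['Content-Type'])  # application/json
print(json_req.body)                     # b'{"name": "John", "age": 30}'
```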

Working with HTTP Headers

HTTP headers contain metadata about the request or response. Proper header configuration is critical when working with APIs and web scraping.

Setting Custom Headers

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Accept': 'application/json',
    'Content-Type': 'application/json'
}

response = requests.get('https://httpbin.org/headers', headers=headers)
print(response.json())

Important Headers for Web Scraping

  • User-Agent - client identification (browser, bot)
  • Accept - content types that the client can process
  • Accept-Language - preferred languages
  • Referer - URL of the page from which the transition was made

Session Management and Cookies

Sessions allow you to maintain state across multiple HTTP requests. This is especially important for authentication and for sites that require you to log in.

Creating and Using a Session

session = requests.Session()

# Set cookies via the first request
session.get('https://httpbin.org/cookies/set/sessioncookie/123456789')

# Cookies are automatically sent in subsequent requests
response = session.get('https://httpbin.org/cookies')
print(response.text)

Advantages of Using Sessions

  • Automatic cookie management
  • Reuse of TCP connections to improve performance
  • Ability to set common headers for all session requests
  • Persisting authentication credentials across requests
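
For example, headers set once on a session are merged into every request it prepares; the token below is a made-up placeholder:

```python
import requests

session = requests.Session()
session.headers.update({
    'Authorization': 'Bearer demo-token',  # placeholder token
    'Accept': 'application/json'
})

# Session-level headers are merged into each prepared request
req = session.prepare_request(requests.Request('GET', 'https://api.example.com/data'))
print(req.headers['Authorization'])  # Bearer demo-token
print(req.headers['Accept'])         # application/json
```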

Downloading Files from the Internet

The requests library makes it easy to download files of various formats from web servers.

Downloading Images and Documents

url = 'https://example.com/document.pdf'
response = requests.get(url)

with open('document.pdf', 'wb') as file:
    file.write(response.content)

Downloading Large Files with Streaming

For large files, it is recommended to use streaming to save memory:

url = 'https://example.com/large-file.zip'
response = requests.get(url, stream=True)

with open('large-file.zip', 'wb') as file:
    for chunk in response.iter_content(chunk_size=8192):
        file.write(chunk)
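
The streaming pattern above can be wrapped into a reusable helper; the function name and timeout values are illustrative choices, not part of the requests API:

```python
import requests

def download(url, path, chunk_size=8192):
    """Stream a file to disk, failing fast on HTTP errors (sketch)."""
    with requests.get(url, stream=True, timeout=(5, 120)) as response:
        response.raise_for_status()
        with open(path, 'wb') as fh:
            for chunk in response.iter_content(chunk_size=chunk_size):
                fh.write(chunk)
    return path
```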

Error and Exception Handling

Proper error handling is critical to building reliable network applications.

Main Types of Exceptions

try:
    response = requests.get('https://api.github.com/invalid-url', timeout=5)
    response.raise_for_status()  # Raises an exception for HTTP errors

except requests.exceptions.HTTPError as err:
    print(f"HTTP error: {err}")

except requests.exceptions.ConnectionError:
    print("Error connecting to the server")

except requests.exceptions.Timeout:
    print("Request timeout exceeded")

except requests.exceptions.RequestException as e:
    print(f"An unexpected error occurred: {e}")

Checking Response Statuses

response = requests.get('https://api.example.com/data')

if response.status_code == 200:
    print("Request was successful")
elif response.status_code == 404:
    print("Resource not found")
elif response.status_code == 500:
    print("Internal server error")
else:
    print(f"Received status: {response.status_code}")
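
If you need human-readable descriptions for many statuses, the standard library already knows the official reason phrases; this helper is a stdlib-only sketch:

```python
from http import HTTPStatus

def describe(status_code):
    """Return the standard reason phrase for an HTTP status code."""
    try:
        return HTTPStatus(status_code).phrase
    except ValueError:
        return f'Unknown status {status_code}'

print(describe(200))  # OK
print(describe(404))  # Not Found
print(describe(500))  # Internal Server Error
```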

Setting Request Timeouts

Setting timeouts prevents the program from hanging when servers are slow or unavailable.

Types of Timeouts

# Connection and data read timeout
response = requests.get('https://httpbin.org/delay/10', timeout=(5, 30))

# A single number sets both the connect and read timeouts (not the total request time)
try:
    response = requests.get('https://httpbin.org/delay/5', timeout=2)
    print(response.text)
except requests.exceptions.Timeout:
    print("Timeout exceeded")

Recommendations for Setting Timeouts

  • Connection timeout: 3-5 seconds
  • Read timeout: 15-30 seconds
  • For APIs: 10-15 seconds
  • For downloading files: 60-120 seconds
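
One way to apply these recommendations consistently is a small table of presets; the names and exact values below are just one possible choice:

```python
import requests

# (connect timeout, read timeout) in seconds — illustrative presets
TIMEOUTS = {
    'api': (5, 15),
    'download': (5, 120),
    'default': (5, 30)
}

def fetch(url, kind='default', **kwargs):
    """GET with a preset timeout instead of repeating magic numbers."""
    return requests.get(url, timeout=TIMEOUTS[kind], **kwargs)
```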

Working with Proxy Servers

Proxy servers are used to bypass geographical restrictions, ensure anonymity, or work through corporate networks.

Setting HTTP and HTTPS Proxies

proxies = {
    'http': 'http://proxy-server.com:8080',
    'https': 'http://proxy-server.com:8080'  # the https key usually points to the same plain-HTTP proxy
}

response = requests.get('https://httpbin.org/ip', proxies=proxies)
print(response.text)

Proxies with Authentication

proxies = {
    'http': 'http://username:password@proxy-server.com:8080',
    'https': 'http://username:password@proxy-server.com:8080'
}

response = requests.get('https://example.com', proxies=proxies)

HTTP Authentication Methods

Modern web services use various authentication methods to protect data and control access.

HTTP Basic Authentication

from requests.auth import HTTPBasicAuth

response = requests.get(
    'https://httpbin.org/basic-auth/user/pass',
    auth=HTTPBasicAuth('user', 'pass')
)
print(response.status_code)

# Alternative method
response = requests.get(
    'https://httpbin.org/basic-auth/user/pass',
    auth=('user', 'pass')
)

Bearer Token Authentication

headers = {
    'Authorization': 'Bearer YOUR_API_TOKEN_HERE',
    'Content-Type': 'application/json'
}

response = requests.get('https://api.example.com/data', headers=headers)
print(response.json())

API Key Authentication

# In headers
headers = {'X-API-Key': 'your-api-key-here'}
response = requests.get('https://api.example.com/data', headers=headers)

# In URL parameters
params = {'api_key': 'your-api-key-here'}
response = requests.get('https://api.example.com/data', params=params)

Additional Requests Features

Sending Files to the Server

# Sending one file (with-statements ensure the files are closed afterwards)
with open('document.txt', 'rb') as file:
    files = {'file': file}
    response = requests.post('https://httpbin.org/post', files=files)

# Sending multiple files
with open('document1.txt', 'rb') as file1, open('document2.txt', 'rb') as file2:
    files = {
        'file1': file1,
        'file2': file2
    }
    response = requests.post('https://httpbin.org/post', files=files)

Configuring Retries

from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()

retry_strategy = Retry(
    total=3,                # Maximum number of retries
    backoff_factor=1,       # Exponential backoff between attempts
    status_forcelist=[429, 500, 502, 503, 504],  # Retry on these statuses
)

adapter = HTTPAdapter(max_retries=retry_strategy)
session.mount('http://', adapter)
session.mount('https://', adapter)

response = session.get('https://unstable-api.example.com')

Frequently Asked Questions

What is Requests in Python?

Requests is a third-party Python library designed for making HTTP requests. It provides a simple and convenient interface for working with web services, APIs, and web scraping.

How to Solve the Problem of Request Blocking?

If a website blocks your requests, try the following methods:

  • Add a realistic User-Agent header
  • Use delays between requests
  • Configure proxy server rotation
  • Simulate real browser behavior
  • Follow the site's robots.txt rules
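
A minimal "polite client" sketch combining the first two points might look like this; the User-Agent strings and delay range are arbitrary examples:

```python
import random
import time
import requests

# Example User-Agent pool — replace with strings matching real browsers
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36'
]

def polite_headers():
    """Pick a realistic User-Agent at random for each request."""
    return {
        'User-Agent': random.choice(USER_AGENTS),
        'Accept-Language': 'en-US,en;q=0.9'
    }

def polite_get(url, min_delay=1.0, max_delay=3.0, **kwargs):
    """GET with a random pause first, to avoid hammering the server."""
    time.sleep(random.uniform(min_delay, max_delay))
    return requests.get(url, headers=polite_headers(), **kwargs)
```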

Differences Between Requests and Standard Urllib

The requests library has several advantages over the standard library's urllib modules:

  • Simpler and more intuitive API
  • Automatic cookie and session handling
  • Built-in JSON support
  • Better error handling
  • Support for various authentication methods
  • More readable code
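
The difference in ergonomics shows up even before a request is sent; here both libraries build the same query URL, with requests doing the encoding for you:

```python
from urllib.parse import urlencode
import requests

params = {'q': 'python', 'page': 2}

# urllib: assemble the query string by hand
url_urllib = 'https://example.com/search?' + urlencode(params)

# requests: pass params and let the library encode them
url_requests = requests.Request('GET', 'https://example.com/search', params=params).prepare().url

print(url_urllib == url_requests)  # True — identical result, less manual work
```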

Working with REST APIs

The requests library is ideal for working with REST APIs due to its support for all HTTP methods and automatic JSON processing:

# GET - retrieving data
response = requests.get('https://api.example.com/users/1')
user = response.json()

# POST - creating a new record
new_user = {'name': 'John', 'email': 'john@example.com'}
response = requests.post('https://api.example.com/users', json=new_user)

# PUT - updating a record
updated_user = {'name': 'John Updated', 'email': 'john.new@example.com'}
response = requests.put('https://api.example.com/users/1', json=updated_user)

# DELETE - deleting a record
response = requests.delete('https://api.example.com/users/1')
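
These CRUD calls are often wrapped in a small client class; everything below (class name, base-URL handling, token header) is an illustrative sketch, not a built-in requests feature:

```python
import requests

class ApiClient:
    """Tiny REST client wrapping a requests.Session (sketch)."""

    def __init__(self, base_url, token=None):
        self.base_url = base_url.rstrip('/')
        self.session = requests.Session()
        if token:
            self.session.headers['Authorization'] = f'Bearer {token}'

    def _url(self, path):
        return f"{self.base_url}/{path.lstrip('/')}"

    def get(self, path, **kwargs):
        return self.session.get(self._url(path), **kwargs)

    def post(self, path, payload, **kwargs):
        return self.session.post(self._url(path), json=payload, **kwargs)
```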

Useful Resources for Learning

For in-depth study of the requests library, it is recommended to familiarize yourself with the official documentation and additional materials:

  • Official Documentation
  • Code examples on GitHub
  • Training articles and video tutorials
  • Forums and Python developer communities

Conclusion

The requests library is a powerful and versatile tool for working with HTTP requests in Python. It is equally well suited for simple tasks of retrieving data from the Internet and for complex integrations with modern APIs and authentication systems.

Key benefits of using requests include ease of learning, reliability, extensive functionality, and active community support. When working with the library, it is important to remember security, properly handle possible errors, and always use trusted data sources.

By mastering the principles of working with requests, you will get a reliable tool for solving a wide range of tasks related to network interaction in Python applications.
