
When you make an HTTP request using Python’s built-in http.client module, the response you get back is an instance of http.client.HTTPResponse. This object is the gateway to everything the server sends back: status codes, headers, and the actual data.
At its core, an HTTPResponse object represents the raw HTTP response stream. It’s not just a container for data; it’s a live connection from which you read bytes as they arrive, which means you need to be mindful about how you extract information.
Here’s what the key attributes look like:
status: An integer representing the HTTP status code (e.g., 200, 404).
reason: The textual reason phrase associated with the status code (e.g., “OK”, “Not Found”).
headers: An instance of http.client.HTTPMessage, which behaves like a case-insensitive dictionary for headers.
version: The HTTP version used by the server (usually 10 for HTTP/1.0 or 11 for HTTP/1.1).
To get a sense of this in action, here’s a snippet that makes a simple GET request and inspects the response object:
import http.client
conn = http.client.HTTPSConnection("www.example.com")
conn.request("GET", "/")
response = conn.getresponse()
print("Status:", response.status)
print("Reason:", response.reason)
print("Headers:", response.getheaders())
Notice how getresponse() gives you this HTTPResponse instance directly. You don’t have to parse the raw socket stream yourself; that’s all handled under the hood.
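The same attributes can be explored without touching the network. The sketch below pushes a canned response through HTTPResponse using a stand-in socket. The FakeSocket class and the sample payload are illustrative assumptions, not part of http.client, and calling begin() by hand is something getresponse() normally does for you.

```python
import io
from http.client import HTTPResponse


class FakeSocket:
    """Hypothetical stand-in for a socket: it only supports makefile()."""

    def __init__(self, payload: bytes):
        self._file = io.BytesIO(payload)

    def makefile(self, *args, **kwargs):
        return self._file


# A made-up response, exactly as it would appear on the wire.
raw = (
    b"HTTP/1.1 404 Not Found\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Length: 9\r\n"
    b"\r\n"
    b"not here\n"
)

response = HTTPResponse(FakeSocket(raw))
response.begin()  # parse the status line and headers

print(response.status, response.reason, response.version)
# Header lookups ignore case, whichever accessor you use.
print(response.headers["content-type"])
print(response.getheader("CONTENT-LENGTH"))
```

Because the payload says HTTP/1.1, version comes back as the integer 11, and both header lookups succeed even though neither matches the capitalization used on the wire.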
Something less obvious but very important: the HTTPResponse object is a file-like object. This means you can read from it using methods like read(), readline(), or even iterate over it line-by-line. However, since it’s a network stream, each of these calls blocks until data arrives or the connection closes, so consider passing a timeout when you create the connection.
Here’s a quick demonstration of reading the response body:
body = response.read()
print("Body length:", len(body))
print(body.decode('utf-8'))
Note that read() without arguments reads the entire response body into memory. If you are dealing with large responses or streaming data, this can be a problem. That’s where chunked reading or reading line by line becomes crucial.
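As a rough illustration of line-by-line reading, the sketch below iterates over a response the same way you would iterate over an open file. To stay runnable offline it uses a canned payload and a hypothetical FakeSocket helper (neither is part of http.client); with a real connection you would iterate over the object returned by getresponse().

```python
import io
from http.client import HTTPResponse


class FakeSocket:
    """Hypothetical stand-in for a socket: it only supports makefile()."""

    def __init__(self, payload: bytes):
        self._file = io.BytesIO(payload)

    def makefile(self, *args, **kwargs):
        return self._file


body = b"alpha\nbeta\ngamma\n"
head = f"HTTP/1.1 200 OK\r\nContent-Length: {len(body)}\r\n\r\n".encode("ascii")

response = HTTPResponse(FakeSocket(head + body))
response.begin()  # normally done for you by getresponse()

lines = []
for raw_line in response:  # each step reads one line, not the whole body
    lines.append(raw_line.decode("utf-8").rstrip("\n"))

print(lines)
```

Only one line is held in memory at a time, which suits line-oriented formats such as logs or newline-delimited records.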
Also, keep in mind that once you call read() or otherwise consume the stream, the data is gone: you can’t rewind or re-read from the same HTTPResponse object. If you need to process the data multiple times, you’ll have to store it somewhere first.
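The one-shot nature of the stream is easy to see in a small experiment. This sketch again relies on a canned payload and a hypothetical FakeSocket helper so it runs offline; wrapping the saved bytes in io.BytesIO is just one possible way to get a rewindable copy.

```python
import io
from http.client import HTTPResponse


class FakeSocket:
    """Hypothetical stand-in for a socket: it only supports makefile()."""

    def __init__(self, payload: bytes):
        self._file = io.BytesIO(payload)

    def makefile(self, *args, **kwargs):
        return self._file


body = b"hello world"
head = f"HTTP/1.1 200 OK\r\nContent-Length: {len(body)}\r\n\r\n".encode("ascii")

response = HTTPResponse(FakeSocket(head + body))
response.begin()  # normally done for you by getresponse()

first = response.read()   # consumes the whole body
second = response.read()  # nothing left: returns empty bytes

# Keep the bytes in a seekable buffer if you need several passes.
buffer = io.BytesIO(first)
pass_one = buffer.read()
buffer.seek(0)            # rewinding works on the buffer, not the response
pass_two = buffer.read()

print(first, second, pass_one == pass_two)
```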
For large responses, reading the entire body at once is a recipe for disaster. A much safer approach is to read the response in chunks. The read() method can take an optional argument specifying the maximum number of bytes to read. You can call it in a loop until it returns an empty bytes object, which indicates that the entire response has been consumed. This is ideal for downloading files or processing large data streams without exhausting your system’s memory.
# Assuming 'response' is an http.client.HTTPResponse object
# and the connection is still open.
# This example downloads a large response and saves it to a file.
with open("downloaded_file.dat", "wb") as f:
    while chunk := response.read(8192):  # Read in 8 KB chunks
        f.write(chunk)
conn.close()  # Don't forget to close the connection
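The same loop can feed any incremental consumer, not only a file. As a sketch, the hypothetical iter_chunks() generator below wraps the read loop and a SHA-256 digest is updated piece by piece. The FakeSocket helper and the sample body are assumptions that keep the example offline. Note that "chunk" here means a fixed-size read, which is independent of HTTP chunked transfer encoding.

```python
import hashlib
import io
from http.client import HTTPResponse


class FakeSocket:
    """Hypothetical stand-in for a socket: it only supports makefile()."""

    def __init__(self, payload: bytes):
        self._file = io.BytesIO(payload)

    def makefile(self, *args, **kwargs):
        return self._file


def iter_chunks(response, chunk_size=8192):
    """Yield the body in pieces of at most chunk_size bytes."""
    while chunk := response.read(chunk_size):
        yield chunk


body = bytes(range(256)) * 100  # 25,600 bytes of sample data
head = f"HTTP/1.1 200 OK\r\nContent-Length: {len(body)}\r\n\r\n".encode("ascii")

response = HTTPResponse(FakeSocket(head + body))
response.begin()  # normally done for you by getresponse()

digest = hashlib.sha256()
sizes = []
for chunk in iter_chunks(response):
    digest.update(chunk)      # process each piece, then let it go
    sizes.append(len(chunk))

print(sizes)
print(digest.hexdigest())
```

Peak memory use stays near the chunk size regardless of how large the body is. Over a real socket a read may return fewer bytes than requested, so the code should not rely on every chunk being full.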
Speaking of closing connections, this is critically important. An open HTTPResponse object holds onto the underlying socket connection. If you fail to close it, you’ll leak resources. The simplest way to ensure resources are cleaned up properly is to use a context manager (the with statement). HTTPResponse objects can be used as context managers directly. The http.client.HTTPConnection and HTTPSConnection classes do not implement the context manager protocol themselves, but wrapping them in contextlib.closing() gives the same guarantee: close() is called even if an exception occurs during processing.
import http.client
from contextlib import closing

try:
    # closing() calls conn.close() when the block exits.
    with closing(http.client.HTTPSConnection("www.example.com")) as conn:
        conn.request("GET", "/")
        with conn.getresponse() as response:
            print("Status:", response.status)
            # The response object will be closed automatically
            # when this block is exited.
            body = response.read()
            print("Body length:", len(body))
except (http.client.HTTPException, OSError) as e:
    # OSError covers DNS, connection, and TLS failures.
    print("The request failed:", e)
# Both the response and connection are now closed.
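The cleanup guarantee for the response object can be checked offline. In the sketch below, a simulated failure is raised partway through processing and the response still ends up closed. The FakeSocket helper, the payload, and the RuntimeError are all illustrative assumptions.

```python
import io
from http.client import HTTPResponse


class FakeSocket:
    """Hypothetical stand-in for a socket: it only supports makefile()."""

    def __init__(self, payload: bytes):
        self._file = io.BytesIO(payload)

    def makefile(self, *args, **kwargs):
        return self._file


body = b"hello world"
head = f"HTTP/1.1 200 OK\r\nContent-Length: {len(body)}\r\n\r\n".encode("ascii")

response = HTTPResponse(FakeSocket(head + body))
response.begin()  # normally done for you by getresponse()

try:
    with response:
        prefix = response.read(5)  # read only part of the body
        raise RuntimeError("simulated processing failure")
except RuntimeError:
    pass

# The with block closed the response despite the exception.
print(prefix, response.closed, response.isclosed())
```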
Another detail that often trips people up is character encoding. The response.read() method returns raw bytes, not a string. To get text, you must decode these bytes. While hardcoding utf-8 is a common default, it’s not always correct. A robust application should inspect the Content-Type header to determine the correct encoding. The server usually specifies it with a charset parameter, like Content-Type: text/html; charset=ISO-8859-1.
The HTTPResponse.headers object is an http.client.HTTPMessage, a subclass of email.message.Message, so it inherits a convenient method for this: get_content_charset(). This method parses the Content-Type header and returns the declared charset in lowercase. If no charset is declared, it returns the fallback you pass as its argument (None by default).
import http.client
from contextlib import closing

# closing() ensures conn.close() runs when the block exits.
with closing(http.client.HTTPSConnection("www.python.org")) as conn:
    conn.request("GET", "/")
    with conn.getresponse() as response:
        # Determine the encoding, defaulting to 'utf-8' if not specified
        encoding = response.headers.get_content_charset('utf-8')
        raw_body = response.read()
        try:
            # Decode the body using the declared (or fallback) encoding
            html_content = raw_body.decode(encoding)
            print("Successfully decoded", len(html_content), "characters.")
        except (UnicodeDecodeError, LookupError):
            # LookupError covers a charset name Python does not recognize
            print(f"Failed to decode the response using {encoding}.")
            # Here you might try a different encoding or handle the raw bytes
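To see the charset lookup in isolation, the following sketch parses two hand-written header blocks with http.client.parse_headers() and decodes a small Latin-1 body. The declared_charset() helper and the sample headers are hypothetical, introduced only for this illustration.

```python
import io
from http.client import parse_headers


def declared_charset(header_block: bytes, fallback: str = "utf-8") -> str:
    """Hypothetical helper: parse raw header lines and return the charset."""
    headers = parse_headers(io.BytesIO(header_block))
    return headers.get_content_charset(fallback)


latin1 = declared_charset(b"Content-Type: text/html; charset=ISO-8859-1\r\n\r\n")
missing = declared_charset(b"Content-Type: text/html\r\n\r\n")

# "caf\u00e9" is the word "cafe" with an acute accent on the final letter.
body = "caf\u00e9".encode("iso-8859-1")
text = body.decode(latin1)

try:
    body.decode("utf-8")  # the hardcoded default fails on this body
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(latin1, missing, text, utf8_ok)
```

The declared name comes back lowercased, the fallback is used when no charset parameter is present, and the Latin-1 body that decodes cleanly with the declared charset is rejected by a hardcoded UTF-8 decode.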
