Responsible Steam Market Scraping
We all share one big, tasty pie 🥧 called SteamMarket, and we all want a piece of it. That's why we're here. However, it's crucial to approach market data collection responsibly.
Bear in mind that we all benefit if Steam spends fewer resources fighting aggressive market scraping. Excessive scraping can lead to IP bans, rate limits, and a worse experience for everyone.
HTTP Caching with If-Modified-Since
One of the most effective ways to scrape responsibly is to implement proper HTTP caching using the If-Modified-Since header. This standard HTTP mechanism allows clients to:
- Retrieve data only when it has changed since the last request
- Reduce bandwidth usage for both client and server
- Minimize the risk of hitting rate limits
- Create more efficient and responsive applications
How It Works
The HTTP caching mechanism works as follows:
- When you first request a resource, the server includes a Last-Modified header in the response
- For subsequent requests, you include an If-Modified-Since header with the timestamp from the previous response
- If the resource hasn't changed since that time, the server returns a 304 Not Modified status code without the resource body
- If the resource has changed, the server returns the updated resource with a new Last-Modified timestamp
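The exchange above can be sketched without any network access. The following is a minimal simulation of the server's decision, not Steam's or aiosteampy's actual code; `Resource` and `conditional_get` are hypothetical names introduced purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


@dataclass
class Resource:
    """Hypothetical server-side resource with a modification time."""
    body: str
    last_modified: datetime


def conditional_get(
    resource: Resource, if_modified_since: Optional[datetime] = None
) -> Tuple[int, Optional[str], datetime]:
    """Return (status, body, last_modified) the way a caching-aware server would."""
    # HTTP dates have one-second resolution, so compare whole seconds only.
    modified = resource.last_modified.replace(microsecond=0)
    if if_modified_since is not None:
        if modified <= if_modified_since.replace(microsecond=0):
            return 304, None, modified  # unchanged: no body is sent
    return 200, resource.body, modified  # first request or changed: full body


start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
resource = Resource(body="v1", last_modified=start)

# 1. First request: no header, so the full body comes back with Last-Modified.
status, body, last_modified = conditional_get(resource)
print(status, body)

# 2. Repeat request with If-Modified-Since: nothing changed, so 304 and no body.
status, body, _ = conditional_get(resource, if_modified_since=last_modified)
print(status, body)

# 3. The resource changes; the same conditional request now returns the new body.
resource = Resource(body="v2", last_modified=start + timedelta(minutes=5))
status, body, last_modified = conditional_get(resource, if_modified_since=last_modified)
print(status, body)
```

The client's only job is to remember the last timestamp it saw and send it back; the server decides whether a body is needed.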
In aiosteampy, this mechanism is implemented through the if_modified_since parameter and the ResourceNotModified exception.
Implementation in aiosteampy
Several methods in aiosteampy support the if_modified_since parameter:
- SteamCommunityPublicMixin.get_item_orders_histogram
- SteamCommunityPublicMixin.get_item_listings
- SteamCommunityPublicMixin.market_listings
These methods:
- Accept an optional if_modified_since parameter (either a datetime object or a formatted string)
- Return a last_modified timestamp along with the requested data
- Raise a ResourceNotModified exception when the resource hasn't changed
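If you persist timestamps as strings (for example in a database or a JSON file), Python's standard library can produce and parse the standard HTTP date format. This sketch assumes the "formatted string" mentioned above is an HTTP-date like the one servers send in Last-Modified; that is an assumption, so check the aiosteampy source for the exact format it expects.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# A timezone-aware UTC datetime, e.g. one saved from an earlier response.
last_modified = datetime(2024, 1, 1, 12, 30, 0, tzinfo=timezone.utc)

# Format as an HTTP-date; usegmt=True emits "GMT" instead of "+0000".
header_value = format_datetime(last_modified, usegmt=True)
print(header_value)  # Mon, 01 Jan 2024 12:30:00 GMT

# Parse a stored header value back into a timezone-aware datetime.
parsed = parsedate_to_datetime(header_value)
```

Keeping the value timezone-aware and in UTC avoids off-by-hours comparisons when the machine's local timezone differs from GMT.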
Basic Example
Here's a simple example of how to use this mechanism:
from aiosteampy import ResourceNotModified, SteamPublicClient

client = SteamPublicClient(...)

# Initial request to get data and last_modified timestamp
histogram, last_modified = await client.get_item_orders_histogram(123456)

# Later, when you need to check for updates
try:
    # Pass the previous last_modified timestamp
    histogram, last_modified = await client.get_item_orders_histogram(
        123456,
        if_modified_since=last_modified,  # Use the timestamp from the previous response
    )
    # Process the updated data
    print("Data has been updated!")
    # Do something with the new histogram data
except ResourceNotModified:
    print("Data hasn't changed since last request")
    # Use your cached data instead
Advanced Implementation with Caching
For a more complete implementation with caching:
from aiosteampy import ResourceNotModified


class SimpleCache:
    """In-memory cache that stores data alongside its last_modified timestamp."""

    def __init__(self):
        self.data = {}
        self.timestamps = {}

    def get(self, key):
        # Returns (None, None) when the key has never been cached
        return self.data.get(key), self.timestamps.get(key)

    def set(self, key, data, timestamp):
        self.data[key] = data
        self.timestamps[key] = timestamp


# Create a cache
cache = SimpleCache()


async def get_histogram_with_cache(client, item_nameid):
    # Look up any cached copy and its timestamp
    cached_data, last_modified = cache.get(item_nameid)
    try:
        # Always make the request; last_modified is None on the first call,
        # which makes that request unconditional
        histogram, new_last_modified = await client.get_item_orders_histogram(
            item_nameid,
            if_modified_since=last_modified,
        )
    except ResourceNotModified:
        # Data hasn't changed, so use the cached copy
        print("Using cached data - resource not modified")
        return cached_data
    # Update cache with new data
    cache.set(item_nameid, histogram, new_last_modified)
    return histogram


# Usage: histogram = await get_histogram_with_cache(client, 123456)
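To see the caching flow end to end without contacting Steam, the pattern can be exercised against a stub. Everything in this sketch is a stand-in: `FakeClient` is a hypothetical class that only mimics the behaviour described above, and the local `ResourceNotModified` replaces the aiosteampy exception so the demo runs offline.

```python
import asyncio
from datetime import datetime, timezone


class ResourceNotModified(Exception):
    """Stand-in for aiosteampy's exception so the demo runs offline."""


class FakeClient:
    """Hypothetical client that mimics the behaviour described above."""

    def __init__(self):
        self.histogram = {"buy": 10, "sell": 12}
        self.modified_at = datetime(2024, 1, 1, tzinfo=timezone.utc)
        self.full_responses = 0  # how many times a full body was "sent"

    async def get_item_orders_histogram(self, item_nameid, if_modified_since=None):
        if if_modified_since is not None and self.modified_at <= if_modified_since:
            raise ResourceNotModified
        self.full_responses += 1
        return dict(self.histogram), self.modified_at


cache = {}  # item_nameid -> (data, last_modified)


async def get_histogram_with_cache(client, item_nameid):
    cached_data, last_modified = cache.get(item_nameid, (None, None))
    try:
        data, new_last_modified = await client.get_item_orders_histogram(
            item_nameid, if_modified_since=last_modified
        )
    except ResourceNotModified:
        return cached_data  # unchanged: serve the cached copy
    cache[item_nameid] = (data, new_last_modified)
    return data


async def main():
    client = FakeClient()
    first = await get_histogram_with_cache(client, 123456)   # full response
    second = await get_histogram_with_cache(client, 123456)  # served from cache
    # The "server" data changes, so the next call fetches a fresh copy.
    client.histogram = {"buy": 11, "sell": 12}
    client.modified_at = datetime(2024, 1, 2, tzinfo=timezone.utc)
    third = await get_histogram_with_cache(client, 123456)
    return first, second, third, client.full_responses


first, second, third, full_responses = asyncio.run(main())
print(full_responses)  # 2
```

Three calls produce only two full responses; the middle call is answered from the cache.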
Benefits
Using the If-Modified-Since mechanism provides several benefits:
- Reduced Bandwidth: You only download the full data when it has actually changed
- Fewer Rate Limits: You're less likely to hit Steam's 429: Too Many Requests errors
- Faster Responses: 304 responses are faster as they don't include the resource body
- Server-Friendly: Reduces load on Steam's servers, making you a good API citizen
- More Reliable: Your application can continue to function even during high-traffic periods
By implementing proper caching with the if_modified_since parameter, you can create more efficient and reliable applications that interact with the Steam Market.