Supercharging Walrus Aggregators with Nginx and Cloudflare Cache Reserve
Introduction
According to the Walrus design, “Caches are aggregators with additional caching functionality to decrease latency and reduce load on storage nodes. Such cache infrastructures may also act as CDNs, split the cost of blob reconstruction over many requests, and be better connected.” In other words, caching within a Web2 aggregator environment — especially when large volumes of data must be reconstructed — can drastically enhance performance and reliability. We’ll show how to implement caching at two layers (Nginx and Cloudflare) to handle potentially high read demands on your Walrus aggregator.
1. Nginx Caching Basics
Define your cache zone in nginx.conf (or an included file):
proxy_cache_path /cache levels=1:2 keys_zone=agg_cache:10m max_size=16g inactive=1h use_temp_path=off;
- /cache is the cache directory.
- levels=1:2 organizes cached files into subdirectories.
- keys_zone=agg_cache:10m allocates shared memory for storing cached object metadata.
- max_size=16g caps total cache size.
- inactive=1h evicts items unused for over an hour.
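As an aside, you can reproduce the on-disk layout that levels=1:2 produces with a few lines of Python: Nginx names each cache file after the MD5 of its cache key (by default $scheme$proxy_host$request_uri) and builds subdirectory levels from the tail of that digest. The key string below is purely illustrative:

```python
import hashlib

def nginx_cache_path(cache_dir, cache_key, levels=(1, 2)):
    """Mirror how Nginx maps a cache key to an on-disk path.

    The file name is the MD5 hex digest of the cache key; each entry
    in `levels` takes that many hex characters from the END of the
    digest to form one subdirectory level.
    """
    digest = hashlib.md5(cache_key.encode()).hexdigest()
    parts = []
    pos = len(digest)
    for width in levels:
        parts.append(digest[pos - width:pos])
        pos -= width
    return "/".join([cache_dir, *parts, digest])

# Default proxy_cache_key is $scheme$proxy_host$request_uri
key = "httplocalhost:9000/v1/blobs/some-blob-id"
print(nginx_cache_path("/cache", key))
```

This is only a sketch of the documented naming scheme; it's handy when you want to check whether a specific request landed in the cache directory.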
Site Configuration Example
Point your aggregator domain to your backend and activate the cache:
server {
    server_name walrus-testnet-aggregator.natsai.xyz;

    location / {
        proxy_pass http://localhost:9000;
        proxy_cache agg_cache;
        proxy_cache_bypass $http_cache_control;
        proxy_cache_valid 200 302 10m;
        proxy_cache_valid 404 1m;
        proxy_cache_use_stale error timeout invalid_header updating http_500 http_502 http_503 http_504;
        add_header X-Cache-Status $upstream_cache_status;
    }
}
2. Cloudflare Cache Setup
- Tiered Cache: Turn on under Cache → Tiered Cache. This lets Cloudflare route requests more efficiently among its PoPs, cutting hits to your origin.
- Cache Reserve: Enable under Cache → Cache Reserve, giving you an economical storage layer for long-tail objects.
- Cache Rule: Create a rule matching:
  - Hostname equals walrus-testnet-aggregator.natsai.xyz
  - Path contains /v1/blobs/
  Then override the default caching with:
  - Edge TTL (e.g., 7 days)
  - Browser TTL (e.g., 2 hours)
  - Optionally ignore origin cache-control headers if you want heavier caching.
Pricing & Importance:
Cloudflare Cache Reserve charges a modest rate (e.g., $0.015 per GB-month, $0.36 per million reads, $4.50 per million writes). This pay-as-you-go model can be far more economical than constantly hitting your origin, especially if your aggregator serves large or infrequently accessed blobs. Since Walrus can store massive datasets over multiple epochs, lowering origin egress with Cache Reserve helps you stay scalable and cost-effective.
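To get a feel for the economics, here is a back-of-the-envelope estimator using the rates quoted above (illustrative figures; check Cloudflare's current pricing page before budgeting):

```python
def cache_reserve_monthly_cost(stored_gb, reads_millions, writes_millions):
    """Rough monthly Cache Reserve bill at the example rates:
    $0.015 per GB-month stored, $0.36 per million reads,
    $4.50 per million writes.
    """
    return stored_gb * 0.015 + reads_millions * 0.36 + writes_millions * 4.50

# e.g. 500 GB of cached blobs, 20M reads, 2M writes in a month
print(f"${cache_reserve_monthly_cost(500, 20, 2):.2f}")
```

At those hypothetical volumes the bill comes to roughly $23.70 for the month, which is typically far below the egress and reconstruction cost of serving the same traffic from the origin.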
3. Verifying Cache Hits and Misses (Python + Nginx Checks)
Once you’ve configured Nginx and Cloudflare as discussed, you want to confirm that requests are being cached, and measure the difference between cold (MISS) and warm (HIT) responses. Here’s how:
3.1. Use a Python Test Script
Below is a Python script that:
- Publishes random data as a blob on your Walrus publisher.
- Fetches each published blob from the aggregator endpoint.
- Logs timing information to roughly distinguish cache hits from disk/origin fetches.
import requests
import uuid
import random
import time
import datetime

# Publisher and Aggregator endpoints
PUBLISHER_URL = "https://walrus-testnet-publisher.natsai.xyz/v1/blobs"
AGGREGATOR_URL = "https://walrus-testnet-aggregator.natsai.xyz/v1/blobs/"

def generate_random_data():
    random_string = "".join(random.choices("abcdefghijklmnopqrstuvwxyz0123456789", k=16))
    print(f"Generated random data string: {random_string}")
    return f"testdata-{uuid.uuid4()}-{random_string}", random_string

def publish_blob(data):
    try:
        print(f"Making publisher call with data: {data}")
        start_time = datetime.datetime.now()
        response = requests.put(PUBLISHER_URL, data=data)
        end_time = datetime.datetime.now()
        duration = (end_time - start_time).total_seconds() * 1000
        print(f"Publisher response: {response.status_code}, {response.text}, Time taken: {duration:.2f} ms")
        response.raise_for_status()
        result = response.json()
        print(f"Parsed response JSON: {result}")
        if "alreadyCertified" in result:
            print("Blob already exists:", result)
            return result["alreadyCertified"]["blobId"]
        elif "newlyCreated" in result:
            print("New blob created:", result)
            return result["newlyCreated"]["blobObject"]["blobId"]
        else:
            print("Unexpected response structure:", result)
            return None
    except Exception as e:
        print("Error publishing blob:", e)
        return None

def fetch_blob(blob_id):
    try:
        print(f"Fetching blob with ID: {blob_id}")
        start_time = datetime.datetime.now()
        response = requests.get(f"{AGGREGATOR_URL}{blob_id}")
        end_time = datetime.datetime.now()
        duration = (end_time - start_time).total_seconds() * 1000
        print(f"Aggregator response for blob {blob_id}: Status Code {response.status_code}, Response: {response.text}, Time taken: {duration:.2f} ms")
        # Simple threshold to guess whether this was a cache hit
        if duration > 100:
            print("Data likely fetched from disk or origin.")
        else:
            print("Data likely fetched from cache.")
        return response.text if response.status_code == 200 else None
    except Exception as e:
        print("Error fetching blob:", e)
        return None

def stress_test():
    blob_ids = []
    original_data = []
    print("Starting stress test...")

    # Publish 10 random blobs
    for i in range(10):
        print(f"\nIteration {i + 1}: Generating and publishing blob")
        data, original_string = generate_random_data()
        print(f"Publishing blob {i + 1}: {data}")
        blob_id = publish_blob(data)
        if blob_id:
            print(f"Successfully published blob with ID: {blob_id}")
            blob_ids.append(blob_id)
            original_data.append((blob_id, original_string))
        else:
            print(f"Failed to publish blob {i + 1}")
        time.sleep(1)

    print("\nPublishing complete. Fetching blobs...")
    # Fetch each blob (cold: first request after publishing)
    for i, (blob_id, original_string) in enumerate(original_data):
        print(f"\nFetching blob {i + 1}/{len(blob_ids)}: {blob_id}")
        fetched_data = fetch_blob(blob_id)
        if fetched_data:
            print(f"Successfully fetched blob {blob_id}, Original Data: {original_string}, Fetched: {fetched_data}")
        else:
            print(f"Failed to fetch blob {blob_id}")
        time.sleep(1)

    print("\nRe-fetching blobs to test cache performance...")
    # Re-fetch the same blobs (warm: they should now be cached)
    for i, (blob_id, original_string) in enumerate(original_data):
        print(f"\nRe-fetching blob {i + 1}/{len(blob_ids)}: {blob_id}")
        fetched_data = fetch_blob(blob_id)
        if fetched_data:
            print(f"Re-fetched blob {blob_id}, Original Data: {original_string}, Fetched: {fetched_data}")
        else:
            print(f"Failed to re-fetch blob {blob_id}")
        time.sleep(1)

    print("\nStress test completed.")

if __name__ == "__main__":
    stress_test()
Key Observations:
- Publish Time: Varies based on how quickly the publisher can store data on Walrus nodes and return a certificate.
- Fetch Time (Cold): The first fetch for a blob is typically slower (Nginx or Cloudflare must retrieve the data from the aggregator backend or from the Walrus nodes themselves).
- Fetch Time (Warm): Repeated fetches within a short timespan often show a faster response, indicating a cache HIT.
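If you collect the printed timings, a small helper (hypothetical, not part of the script above) makes the cold/warm contrast explicit:

```python
def summarize_timings(cold_ms, warm_ms):
    """Compare first-fetch vs re-fetch latencies collected by the script.

    Returns average cold time, average warm time, and the speedup
    factor; a ratio well above 1 suggests the re-fetches were HITs.
    """
    avg_cold = sum(cold_ms) / len(cold_ms)
    avg_warm = sum(warm_ms) / len(warm_ms)
    return avg_cold, avg_warm, avg_cold / avg_warm

cold = [220.5, 180.2, 250.9]   # first fetch per blob (ms), illustrative
warm = [12.1, 9.8, 15.3]       # re-fetch per blob (ms), illustrative
avg_cold, avg_warm, speedup = summarize_timings(cold, warm)
print(f"cold {avg_cold:.1f} ms, warm {avg_warm:.1f} ms, ~{speedup:.0f}x faster")
```

Averages smooth over one-off network jitter better than the per-request threshold the script uses.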
3.2. Interpreting Logs & Headers
1. Script Output
   - The script prints Time taken in milliseconds for both publishing and fetching.
   - If duration > 100 ms (an arbitrary threshold), the script logs "Data likely fetched from disk or origin."; if under 100 ms, it logs "Data likely fetched from cache."
   - These aren't precise metrics, but they give a quick gauge of whether you likely got a cache HIT.
2. Nginx Access Logs
   - If you're running Nginx, check your access logs (often in /var/log/nginx/access.log). Look for MISS or HIT in the X-Cache-Status header (if you've set add_header X-Cache-Status $upstream_cache_status;).
   - A line might look like:
     "GET /v1/blobs/xxx HTTP/1.1" 200 346 "-" "Python/3.9" "HIT"
     indicating an Nginx cache hit.
3. Cloudflare Response Headers
   - If you enable Cloudflare's caching, check for response headers such as cf-cache-status: HIT or MISS.
   - You can do this using browser dev tools or a simple curl -I https://... command.
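These header checks are easy to automate. The sketch below is a hypothetical helper (assuming the add_header directive from the Nginx config is active) that classifies a response's headers per caching layer; note it uses a plain case-sensitive dict, whereas requests returns case-insensitive headers:

```python
def cache_verdict(headers):
    """Report the HIT/MISS status of each caching layer.

    Looks at Cloudflare's cf-cache-status header and the
    X-Cache-Status header emitted by the Nginx add_header directive.
    """
    return {
        "cloudflare": headers.get("cf-cache-status", "absent"),
        "nginx": headers.get("X-Cache-Status", "absent"),
    }

# e.g. with requests: cache_verdict(dict(requests.get(url).headers))
print(cache_verdict({"cf-cache-status": "HIT"}))
```

A "HIT" from Cloudflare means the request never reached your server at all; an Nginx "HIT" means it reached your server but not the aggregator backend.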
3.3. Inspecting the /cache Directory
To confirm that Nginx is storing data locally, examine the /cache directory you configured in nginx.conf:
- du -sh /cache shows overall size usage.
- ls -R /cache reveals subdirectories (based on the levels=1:2 setting).
- As you fetch new blobs, watch for growth in this directory. Over time, older, inactive items are evicted (inactive=1h).

For large-scale reads or many concurrent blob requests, you'll see the cache fill up to max_size=16g (in the example config). If you need more space, tweak that parameter accordingly in nginx.conf.
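If you'd rather monitor cache growth from a script (say, on a cron schedule) than run du by hand, here is a minimal sketch, assuming the /cache path from the example config:

```python
import os

def cache_size_bytes(cache_dir="/cache"):
    """Walk the cache directory and total up file sizes, like `du -s`."""
    total = 0
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # the cache manager may evict files mid-walk
    return total

print(f"{cache_size_bytes() / 2**30:.2f} GiB used")
```

Tracking this number over time shows whether you're approaching max_size and whether the inactive=1h eviction is keeping up with your write rate.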
Conclusion
By layering a local Nginx cache with Cloudflare’s Tiered Cache and Cache Reserve, you ensure that frequently requested blobs stay hot in memory or on-disk near users. This significantly reduces blob reconstruction overhead on storage nodes and accelerates response times for all clients. As Walrus scales and larger files become more common, these caching best practices help maintain low-latency, high-throughput blob deliveries — critical for any system striving to handle high-volume off-chain storage.