Supercharging Walrus Aggregators with Nginx and Cloudflare Cache Reserve
Introduction
According to the Walrus design, “Caches are aggregators with additional caching functionality to decrease latency and reduce load on storage nodes. Such cache infrastructures may also act as CDNs, split the cost of blob reconstruction over many requests, and be better connected.” In other words, caching within a Web2 aggregator environment — especially when large volumes of data must be reconstructed — can drastically enhance performance and reliability. We’ll show how to implement caching at two layers (Nginx and Cloudflare) to handle potentially high read demands on your Walrus aggregator.
1. Nginx Caching Basics
Define your cache zone in nginx.conf (or an included file):
proxy_cache_path /cache levels=1:2 keys_zone=agg_cache:10m max_size=16g inactive=1h use_temp_path=off;
- /cache is the cache directory.
- levels=1:2 organizes cached files into subdirectories.
- keys_zone=agg_cache:10m allocates shared memory for storing cached object metadata.
- max_size=16g caps total cache size.
- inactive=1h evicts items unused for over an hour.
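As an aside, you can reproduce the on-disk layout that levels=1:2 produces with a few lines of Python: Nginx names each cache file after the MD5 of its cache key (by default $scheme$proxy_host$request_uri) and builds subdirectory levels from the tail of that digest. The key string below is purely illustrative:

```python
import hashlib

def nginx_cache_path(cache_dir, cache_key, levels=(1, 2)):
    """Mirror how Nginx maps a cache key to an on-disk path.

    The file name is the MD5 hex digest of the cache key; each entry
    in `levels` takes that many hex characters from the END of the
    digest to form one subdirectory level.
    """
    digest = hashlib.md5(cache_key.encode()).hexdigest()
    parts = []
    pos = len(digest)
    for width in levels:
        parts.append(digest[pos - width:pos])
        pos -= width
    return "/".join([cache_dir, *parts, digest])

# Default proxy_cache_key is $scheme$proxy_host$request_uri
key = "httplocalhost:9000/v1/blobs/some-blob-id"
print(nginx_cache_path("/cache", key))
```

This is only a sketch of the documented naming scheme; it's handy when you want to check whether a specific request landed in the cache directory.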
Site Configuration Example
Point your aggregator domain to your backend and activate the cache:
server {
    server_name walrus-testnet-aggregator.natsai.xyz;

    location / {
        proxy_pass http://localhost:9000;
        proxy_cache agg_cache;
        proxy_cache_bypass $http_cache_control;
        proxy_cache_valid 200 302 10m;
        proxy_cache_valid 404 1m;
        proxy_cache_use_stale error timeout invalid_header updating http_500 http_502 http_503 http_504;
        add_header X-Cache-Status $upstream_cache_status;
    }
}
2. Cloudflare Cache Setup
- Tiered Cache: Turn on under Cache → Tiered Cache. This lets Cloudflare route requests more efficiently among its PoPs, cutting hits to your origin.
- Cache Reserve: Enable under Cache → Cache Reserve, giving you an economical storage layer for long-tail objects.
- Cache Rule: Create a rule matching:
  - Hostname equals walrus-testnet-aggregator.natsai.xyz
  - Path contains /v1/blobs/
  Then override the default caching with:
  - Edge TTL (e.g., 7 days)
  - Browser TTL (e.g., 2 hours)
  - Optionally ignore origin cache-control headers if you want heavier caching.
Pricing & Importance:
Cloudflare Cache Reserve charges a modest rate (e.g., $0.015 per GB-month, $0.36 per million reads, $4.50 per million writes). This pay-as-you-go model can be far more economical than constantly hitting your origin, especially if your aggregator serves large or infrequently accessed blobs. Since Walrus can store massive datasets over multiple epochs, lowering origin egress with Cache Reserve helps you stay scalable and cost-effective.
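To get a feel for the economics, here is a back-of-the-envelope estimator using the rates quoted above (illustrative figures; check Cloudflare's current pricing page before budgeting):

```python
def cache_reserve_monthly_cost(stored_gb, reads_millions, writes_millions):
    """Rough monthly Cache Reserve bill at the example rates:
    $0.015 per GB-month stored, $0.36 per million reads,
    $4.50 per million writes.
    """
    return stored_gb * 0.015 + reads_millions * 0.36 + writes_millions * 4.50

# e.g. 500 GB of cached blobs, 20M reads, 2M writes in a month
print(f"${cache_reserve_monthly_cost(500, 20, 2):.2f}")
```

At those hypothetical volumes the bill comes to roughly $23.70 for the month, which is typically far below the egress and reconstruction cost of serving the same traffic from the origin.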
3. Verifying Cache Hits and Misses (Python + Nginx Checks)
Once you’ve configured Nginx and Cloudflare as discussed, you want to confirm that requests are being cached, and measure the difference between cold (MISS) and warm (HIT) responses. Here’s how:
3.1. Use a Python Test Script
Below is a Python script that:
- Publishes random data as a blob on your Walrus publisher.
- Fetches each published blob from the aggregator endpoint.
- Logs timing information to roughly distinguish cache hits from disk/origin fetches.
import requests
import uuid
import random
import time
import datetime

# Publisher and Aggregator endpoints
PUBLISHER_URL = "https://walrus-testnet-publisher.natsai.xyz/v1/blobs"
AGGREGATOR_URL = "https://walrus-testnet-aggregator.natsai.xyz/v1/blobs/"

def generate_random_data():
    random_string = "".join(random.choices("abcdefghijklmnopqrstuvwxyz0123456789", k=16))
    print(f"Generated random data string: {random_string}")
    return f"testdata-{uuid.uuid4()}-{random_string}", random_string

def publish_blob(data):
    try:
        print(f"Making publisher call with data: {data}")
        start_time = datetime.datetime.now()
        response = requests.put(PUBLISHER_URL, data=data)
        end_time = datetime.datetime.now()
        duration = (end_time - start_time).total_seconds() * 1000
        print(f"Publisher response: {response.status_code}, {response.text}, Time taken: {duration:.2f} ms")
        response.raise_for_status()
        result = response.json()
        print(f"Parsed response JSON: {result}")
        if "alreadyCertified" in result:
            print("Blob already exists:", result)
            return result["alreadyCertified"]["blobId"]
        elif "newlyCreated" in result:
            print("New blob created:", result)
            return result["newlyCreated"]["blobObject"]["blobId"]
        else:
            print("Unexpected response structure:", result)
            return None
    except Exception as e:
        print("Error publishing blob:", e)
        return None

def fetch_blob(blob_id):
    try:
        print(f"Fetching blob with ID: {blob_id}")
        start_time = datetime.datetime.now()
        response = requests.get(f"{AGGREGATOR_URL}{blob_id}")
        end_time = datetime.datetime.now()
        duration = (end_time - start_time).total_seconds() * 1000
        print(f"Aggregator response for blob {blob_id}: Status Code {response.status_code}, Response: {response.text}, Time taken: {duration:.2f} ms")
        # Simple threshold to guess whether this was a cache hit
        if duration > 100:
            print("Data likely fetched from disk or origin.")
        else:
            print("Data likely fetched from cache.")
        return response.text if response.status_code == 200 else None
    except Exception as e:
        print("Error fetching blob:", e)
        return None

def stress_test():
    blob_ids = []
    original_data = []
    print("Starting stress test...")

    # Publish 10 random blobs
    for i in range(10):
        print(f"\nIteration {i + 1}: Generating and publishing blob")
        data, original_string = generate_random_data()
        print(f"Publishing blob {i + 1}: {data}")
        blob_id = publish_blob(data)
        if blob_id:
            print(f"Successfully published blob with ID: {blob_id}")
            blob_ids.append(blob_id)
            original_data.append((blob_id, original_string))
        else:
            print(f"Failed to publish blob {i + 1}")
        time.sleep(1)

    print("\nPublishing complete. Fetching blobs...")
    # Fetch each blob (cold: first request after publishing)
    for i, (blob_id, original_string) in enumerate(original_data):
        print(f"\nFetching blob {i + 1}/{len(blob_ids)}: {blob_id}")
        fetched_data = fetch_blob(blob_id)
        if fetched_data:
            print(f"Successfully fetched blob {blob_id}, Original Data: {original_string}, Fetched: {fetched_data}")
        else:
            print(f"Failed to fetch blob {blob_id}")
        time.sleep(1)

    print("\nRe-fetching blobs to test cache performance...")
    # Re-fetch the same blobs (warm: they should now be cached)
    for i, (blob_id, original_string) in enumerate(original_data):
        print(f"\nRe-fetching blob {i + 1}/{len(blob_ids)}: {blob_id}")
        fetched_data = fetch_blob(blob_id)
        if fetched_data:
            print(f"Re-fetched blob {blob_id}, Original Data: {original_string}, Fetched: {fetched_data}")
        else:
            print(f"Failed to re-fetch blob {blob_id}")
        time.sleep(1)

    print("\nStress test completed.")

if __name__ == "__main__":
    stress_test()
Key Observations:
- Publish Time: Varies based on how quickly the publisher can store data on Walrus nodes and return a certificate.
- Fetch Time (Cold): The first fetch for a blob is typically slower (Nginx or Cloudflare must retrieve the data from the aggregator backend or from the Walrus nodes themselves).
- Fetch Time (Warm): Repeated fetches within a short timespan often show a faster response, indicating a cache HIT.
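If you collect the printed timings, a small helper (hypothetical, not part of the script above) makes the cold/warm contrast explicit:

```python
def summarize_timings(cold_ms, warm_ms):
    """Compare first-fetch vs re-fetch latencies collected by the script.

    Returns average cold time, average warm time, and the speedup
    factor; a ratio well above 1 suggests the re-fetches were HITs.
    """
    avg_cold = sum(cold_ms) / len(cold_ms)
    avg_warm = sum(warm_ms) / len(warm_ms)
    return avg_cold, avg_warm, avg_cold / avg_warm

cold = [220.5, 180.2, 250.9]   # first fetch per blob (ms), illustrative
warm = [12.1, 9.8, 15.3]       # re-fetch per blob (ms), illustrative
avg_cold, avg_warm, speedup = summarize_timings(cold, warm)
print(f"cold {avg_cold:.1f} ms, warm {avg_warm:.1f} ms, ~{speedup:.0f}x faster")
```

Averages smooth over one-off network jitter better than the per-request threshold the script uses.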
3.2. Interpreting Logs & Headers
1. Script Output
   - The script prints Time taken in milliseconds for both publishing and fetching.
   - If duration > 100 ms (an arbitrary threshold), the script logs "Data likely fetched from disk or origin."; if under 100 ms, it logs "Data likely fetched from cache."
   - These aren't precise metrics, but they give a quick gauge of whether you likely got a cache HIT.
2. Nginx Access Logs
   - If you're running Nginx, check your access logs (often in /var/log/nginx/access.log). Look for MISS or HIT in the X-Cache-Status header (if you've set add_header X-Cache-Status $upstream_cache_status;).
   - A line might look like:
     "GET /v1/blobs/xxx HTTP/1.1" 200 346 "-" "Python/3.9" "HIT"
     indicating an Nginx cache hit.
3. Cloudflare Response Headers
   - If you enable Cloudflare's caching, check for response headers such as cf-cache-status: HIT or MISS.
   - You can do this using browser dev tools or a simple curl -I https://... command.
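These header checks are easy to automate. The sketch below is a hypothetical helper (assuming the add_header directive from the Nginx config is active) that classifies a response's headers per caching layer; note it uses a plain case-sensitive dict, whereas requests returns case-insensitive headers:

```python
def cache_verdict(headers):
    """Report the HIT/MISS status of each caching layer.

    Looks at Cloudflare's cf-cache-status header and the
    X-Cache-Status header emitted by the Nginx add_header directive.
    """
    return {
        "cloudflare": headers.get("cf-cache-status", "absent"),
        "nginx": headers.get("X-Cache-Status", "absent"),
    }

# e.g. with requests: cache_verdict(dict(requests.get(url).headers))
print(cache_verdict({"cf-cache-status": "HIT"}))
```

A "HIT" from Cloudflare means the request never reached your server at all; an Nginx "HIT" means it reached your server but not the aggregator backend.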
3.3. Inspecting the /cache Directory
To confirm that Nginx is storing data locally, examine the /cache directory you configured in nginx.conf:
- du -sh /cache shows overall size usage.
- ls -R /cache reveals subdirectories (based on the levels=1:2 setting).
- As you fetch new blobs, watch for growth in this directory. Over time, older, inactive items are evicted (inactive=1h).

For large-scale reads or many concurrent blob requests, you'll see the cache fill up to max_size=16g (in the example config). If you need more space, tweak that parameter accordingly in nginx.conf.
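If you'd rather monitor cache growth from a script (say, on a cron schedule) than run du by hand, here is a minimal sketch, assuming the /cache path from the example config:

```python
import os

def cache_size_bytes(cache_dir="/cache"):
    """Walk the cache directory and total up file sizes, like `du -s`."""
    total = 0
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # the cache manager may evict files mid-walk
    return total

print(f"{cache_size_bytes() / 2**30:.2f} GiB used")
```

Tracking this number over time shows whether you're approaching max_size and whether the inactive=1h eviction is keeping up with your write rate.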
Conclusion
By layering a local Nginx cache with Cloudflare’s Tiered Cache and Cache Reserve, you ensure that frequently requested blobs stay hot in memory or on-disk near users. This significantly reduces blob reconstruction overhead on storage nodes and accelerates response times for all clients. As Walrus scales and larger files become more common, these caching best practices help maintain low-latency, high-throughput blob deliveries — critical for any system striving to handle high-volume off-chain storage.