Today, I will introduce a system we call Shadow Proxy, which we recently released in our development environment.

What’s Shadow Proxy?

Shadow Proxy copies HTTP requests sent to production and replays them against our staging applications, so staging receives the same real-world traffic that production does. The main purpose of this system is to check in advance how modifying a MongoDB index will affect the service. Our service has been running for more than two years and its data has grown enormous, so we have to be careful when adding or removing an index (or forgetting to add one), because a mistake might cause a serious problem in production.

This system consists of Fluentd (td-agent) and its plugins. Fluentd is an open-source data collector that is both robust and flexible. td-agent is the stable distribution of Fluentd provided by Treasure Data, Inc.

Architecture


Reverse Proxy

Nginx proxies HTTP requests to the production applications. Its access logs are forwarded to the Log Aggregator with the forward Output Plugin.

Log Aggregator

The hub of the log stream. It receives access logs with the forward Input Plugin and sends them to Google BigQuery, Amazon S3, and Shadow Proxy with the BigQuery Output Plugin, the S3 Output Plugin, and the forward Output Plugin, respectively.
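The Log Aggregator delivers every access-log record to three destinations. The Python sketch below illustrates that fan-out pattern in isolation; it is not Fluentd's API, and `fan_out` and the sink functions are hypothetical stand-ins for the three output plugins. In Fluentd itself this kind of duplication is usually configured with the built-in copy output plugin and one <store> section per destination, though the aggregator's actual configuration is not shown in this post.

```python
from typing import Callable, Dict, List

# Hypothetical record type: one parsed Nginx access-log entry.
Record = Dict[str, str]
# A sink stands in for one output plugin (BigQuery, S3, or Shadow Proxy).
Sink = Callable[[str, Record], None]


def fan_out(tag: str, record: Record, sinks: List[Sink]) -> None:
    """Deliver the same record to every sink, each receiving its own copy."""
    for sink in sinks:
        # Copy the dict so one sink mutating its record cannot affect the others.
        sink(tag, dict(record))


if __name__ == "__main__":
    received: Dict[str, List[Record]] = {"bigquery": [], "s3": [], "shadow_proxy": []}
    # Bind each name as a default argument so every lambda appends to its own list.
    sinks = [
        lambda tag, rec, name=name: received[name].append(rec)
        for name in received
    ]
    fan_out("nginx.access", {"method": "GET", "path": "/"}, sinks)
    print({name: len(recs) for name, recs in received.items()})
```

Giving each destination its own copy keeps the outputs independent, which matters when one of them (here, Shadow Proxy) rewrites fields before sending.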

Shadow Proxy

Receives access logs from the Log Aggregator and builds HTTP requests from them with the http shadow Output Plugin.

We forked the original plugin and added some features:

  • Send the request body and cookies
  • Adjust the sending rate for each target host
  • More details can be found here.
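The per-host rate feature can be pictured as sampling: for each record, decide whether to replay it to its staging host. The following is a minimal Python sketch under an explicit assumption, namely that `rate` and `rate_per_host_hash` are percentages (0 to 100) of requests to replay and that a per-host entry overrides the default. The function name `should_send` is hypothetical, and this is not the fork's actual code.

```python
import random
from typing import Dict, Optional


def should_send(host: str, default_rate: int, rate_per_host: Dict[str, int],
                rng: Optional[random.Random] = None) -> bool:
    """Decide whether to replay one request to the given staging host.

    Assumption: rates are percentages (0-100) of requests to replay, and a
    per-host entry overrides the default rate.
    """
    rate = rate_per_host.get(host, default_rate)
    if rate >= 100:
        return True
    if rate <= 0:
        return False
    if rng is None:
        rng = random.Random()
    # randrange(100) is uniform over 0..99, so this is True with probability rate/100.
    return rng.randrange(100) < rate


if __name__ == "__main__":
    # Values mirror the rate / rate_per_host_hash settings in td-agent.conf.
    per_host = {"a-staging-app-1.quipper.com": 25, "a-staging-app-2.quipper.com": 100}
    rng = random.Random(42)
    sent = sum(should_send("a-staging-app-1.quipper.com", 100, per_host, rng)
               for _ in range(10_000))
    print(f"replayed {sent} of 10000 requests to a-staging-app-1")
```

Under this reading, roughly a quarter of the traffic for a-staging-app-1 would be replayed while a-staging-app-2 receives all of it; presumably this lets a smaller staging host be shadowed without being overwhelmed.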

The td-agent.conf for Shadow Proxy looks like the following.

<source>
  type forward
  port 24224
</source>

<match nginx.access>
  type http_shadow
  protocol_format https

  host_key domain
  path_format ${path}
  method_key method
  cookie_key cookie
  body_key request_body

  # Resolve staging hosts
  host_hash {
    "a-production-app-1.quipper.com": "a-staging-app-1.quipper.com",
    "a-production-app-2.quipper.com": "a-staging-app-2.quipper.com"
  }

  # Create HTTP Headers
  header_hash {
    "Referer": "${referer}",
    "User-Agent": "${agent}",
    "Authorization": "${authorization}",
    "Content-Type": "${content_type}"
  }

  # Load balancing
  max_concurrency 2
  timeout 5
  rate 100
  rate_per_host_hash {
    "a-staging-app-1.quipper.com": 25,
    "a-staging-app-2.quipper.com": 100
  }
  followlocation false

  num_threads 24
  flush_interval 1s
  buffer_type file
  buffer_path /var/log/td-agent/buffer/http_shadow

  retry_limit 0
  buffer_chunk_limit 256k
  buffer_queue_limit 3600
</match>
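Two ideas in this configuration do most of the work: `host_key`/`host_hash` map a production host to its staging counterpart, and `${field}` placeholders in `path_format` and `header_hash` are filled from the log record. The Python sketch below shows one way to implement both; the function names are hypothetical, and two behaviours are choices made for the sketch rather than documented plugin behaviour: unmapped hosts return None (so nothing is replayed against an unexpected, possibly production, host) and headers that expand to an empty string are dropped.

```python
import re
from typing import Dict, Optional

# Matches placeholders such as ${path} or ${referer}.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def expand(template: str, record: Dict[str, str]) -> str:
    """Substitute each ${field} with record[field]; missing fields become ''."""
    return PLACEHOLDER.sub(lambda m: record.get(m.group(1), ""), template)


def resolve_host(record: Dict[str, str], host_key: str,
                 host_hash: Dict[str, str]) -> Optional[str]:
    """Map the production host named by host_key to its staging host.

    Returns None for unmapped hosts (a choice made for this sketch).
    """
    return host_hash.get(record.get(host_key, ""))


def expand_headers(header_hash: Dict[str, str],
                   record: Dict[str, str]) -> Dict[str, str]:
    """Build headers from header_hash, dropping any that expand to ''."""
    headers = {name: expand(tmpl, record) for name, tmpl in header_hash.items()}
    return {name: value for name, value in headers.items() if value}


if __name__ == "__main__":
    record = {"domain": "a-production-app-1.quipper.com", "path": "/some/path",
              "referer": "https://a-production-app-1.quipper.com"}
    host_hash = {"a-production-app-1.quipper.com": "a-staging-app-1.quipper.com"}
    header_hash = {"Referer": "${referer}", "User-Agent": "${agent}"}
    print(resolve_host(record, "domain", host_hash))  # a-staging-app-1.quipper.com
    print(expand("${path}", record))                   # /some/path
    print(expand_headers(header_hash, record))
```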

When Shadow Proxy receives the following input,

nginx.access: {
  "domain":"a-production-app-1.quipper.com",
  "method":"POST",
  "path":"/some/path",
  "referer":"https://a-production-app-1.quipper.com",
  "agent":"Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36",
  "content_type":"application/json",
  "cookie":"name1=value1; name2=value2",
  "authorization":"xxxxxxxxxxxxxxxxxxxx",
  "request_body":"{\\x22key1\\x22:\\x22value1\\x22,\\x22key2\\x22:\\x22value2\\x22}"
}

it will send the following HTTP request to a-staging-app-1.quipper.com.

POST /some/path HTTP/1.1
Host: a-staging-app-1.quipper.com
Referer: https://a-production-app-1.quipper.com
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36
Authorization: xxxxxxxxxxxxxxxxxxxx
Content-Type: application/json
Cookie: name1=value1; name2=value2

{"key1":"value1","key2":"value2"}
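Note how the request body changes between the two: Nginx writes characters such as the double quote into its access log as \xHH escapes (\x22 is "), and the replayed request carries the original JSON again. The Python sketch below shows that decoding step on its own; `unescape_nginx` is a hypothetical helper, not the fork's code. It works on bytes because Nginx escapes each byte of a multi-byte UTF-8 character separately.

```python
import json
import re

# Nginx logs bytes such as '"' (0x22) or '\' (0x5C) as the escape \xHH.
ESCAPED_BYTE = re.compile(rb"\\x([0-9A-Fa-f]{2})")


def unescape_nginx(value: str) -> str:
    """Turn \\xHH escapes from an Nginx access log back into the original text."""
    raw = ESCAPED_BYTE.sub(
        lambda m: bytes([int(m.group(1), 16)]), value.encode("utf-8")
    )
    # Escaped multi-byte characters are reassembled and decoded as UTF-8.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # request_body after JSON-decoding the record: the backslashes are literal.
    logged = r'{\x22key1\x22:\x22value1\x22,\x22key2\x22:\x22value2\x22}'
    body = unescape_nginx(logged)
    print(body)              # {"key1":"value1","key2":"value2"}
    print(json.loads(body))  # {'key1': 'value1', 'key2': 'value2'}
```

Scanning left to right means an escaped backslash followed by text that looks like an escape (\x5Cx22) decodes to a literal \x22 rather than a quote.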

Summary

  • We released Shadow Proxy to check the impact on MongoDB before modifying an index in production.
  • Shadow Proxy builds HTTP requests from the production Nginx access logs with a Fluentd plugin.
  • Fluentd is really useful!