JSON Format

Why use JSON?

JSON offers several big advantages as a foundation for open data formats like ours.

Nearly every modern programming language and data processing system provides JSON support without any extra libraries or special dependencies. In many cases building and parsing JSON is actually done via native routines, which are efficient and fast.

JSON is easily readable by humans, is highly compressible, and requires little extra encoding compared with other open formats like XML. Best practices for processing JSON are generally well understood by developers.

Certainly there are binary formats, like protobuf and BSON, that offer better runtime efficiency. But these are harder to consume, especially for humans. None of these alternatives is as universally available as JSON, and their dependencies can potentially conflict with your app's existing dependencies. Given all these factors, we think JSON strikes the right balance with decent efficiency and excellent ease of use.


JSON grammar

JSON itself defines much about how usage data will be formatted. What we need to define here is a basic grammar for a few data structures that are specific to usage logging.

stream
    [ messages ]

messages
    message
    message, messages

message
    [ message-details ]

message-details
    message-detail
    message-detail, message-details

message-detail
    [ "<key-string>", "<string>" ]
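As an illustration, the grammar above can be checked with a few lines of Python using only the standard json module. This is a minimal sketch rather than part of any logger: the function name is_valid_stream is hypothetical, and it assumes (as the grammar is written) that a stream holds at least one message and a message holds at least one detail.

```python
import json

def is_valid_stream(text):
    """Return True if text is JSON that matches the stream grammar above."""
    try:
        stream = json.loads(text)
    except ValueError:
        return False
    # stream: a non-empty array of messages
    if not isinstance(stream, list) or not stream:
        return False
    for message in stream:
        # message: a non-empty array of message details
        if not isinstance(message, list) or not message:
            return False
        for detail in message:
            # message-detail: exactly two strings, the key and the value
            if not (isinstance(detail, list) and len(detail) == 2
                    and all(isinstance(part, str) for part in detail)):
                return False
    return True
```

Note that every value is a string, so numbers such as return codes and timestamps must be quoted to pass this check.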


Key strings

All key strings are formatted based on the type of key, and whether the key includes an identifying name. Keys with names may appear multiple times in a message, whereas keys without names are expected to appear only once. The most interesting keys are for HTTP logging, while the others provide information about the loggers.


Key string                Meaning of value
request_body              HTTP body content (in plain text)
request_header:<name>     HTTP header for name
request_method            HTTP method string
request_param:<name>      Param from HTTP post data or URL
request_url               HTTP URL as seen by app
response_body             HTTP body content (in plain text)
response_code             Return code as string (values <300 or 302 only)
response_header:<name>    HTTP header for name
session_field:<name>      User session field for name
agent                     Logger agent string (read only)
now                       Unix time in millis (read only)
version                   Logger version string (read only)

By convention, key strings are always all lowercase (including the name portion). This is more convenient when consuming this format and when writing your own logging rules.
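When consuming this format, it is often handy to split a key string into its type and optional name, or to build one that follows the lowercase convention. The Python sketch below shows one way to do that. The helper names split_key and make_key are hypothetical, and the sketch assumes the name is everything after the first colon.

```python
def split_key(key):
    """Split a key string into (type, name); name is None for keys without one."""
    key_type, separator, name = key.partition(":")
    return (key_type, name if separator else None)

def make_key(key_type, name=None):
    """Build a key string, lowercasing both parts to follow the convention."""
    if name is None:
        return key_type.lower()
    return f"{key_type.lower()}:{name.lower()}"
```

For example, split_key("request_header:user-agent") yields the type "request_header" and the name "user-agent", while split_key("now") yields the type "now" and no name.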


JSON examples

This first example is a stream with one message (one HTTP request/response). Most of the details have been removed, but basic information about the request/response is still present. This is a good example of the minimum amount of data to expect for each HTTP request/response.

[

[
["request_method","GET"],
["request_url","http://localhost:5000/"],
["request_header:user-agent","Mozilla/5.0..."],
["response_code","200"],
["response_header:content-type","text/html; charset=utf-8"],
["response_header:content-length","8803"],
["now","1518492245245"],
["agent","http_logger.rb"],
["version","1.9.0"]
]

]

Now we'll add a second, more complicated HTTP request/response to our example stream. Note that the second message is appended to the stream after the first message (using a comma as a separator). For the second request, a JSON document is posted to the application, which responds with an HTML document. Many more details are retained here, showing how much debugging information can potentially be logged about each request/response. Logging rules control how many details are kept and how many are discarded.

[

[
["request_method","GET"],
["request_url","http://localhost:5000/"],
["request_header:user-agent","Mozilla/5.0..."],
["response_code","200"],
["response_header:content-type","text/html; charset=utf-8"],
["response_header:content-length","8803"],
["now","1518492245245"],
["agent","http_logger.rb"],
["version","1.9.0"]
],

[
["request_method", "POST"],
["request_url","http://localhost:5000/?action=new"],
["request_body", "{ \"customerID\" : \"1234\" }"],
["request_header:version","HTTP/1.1"],
["request_header:host","localhost:5000"],
["request_header:connection","keep-alive"],
["request_header:cache-control","max-age=0"],
["request_header:upgrade-insecure-requests","1"],
["request_header:user-agent","Mozilla/5.0..."],
["request_header:accept","text/html,application/xhtml+xml,application/xml"],
["request_header:accept-encoding","gzip, deflate, br"],
["request_header:accept-language","en-US,en;q=0.9"],
["request_header:cookie","_ruby-getting-started_session=MTFxM0tmZG"],
["request_header:if-none-match","W/\"70bd4196dfa68808be58606609ed8357\""],
["request_param:action","new"],
["response_code","200"],
["response_header:x-frame-options","SAMEORIGIN"],
["response_header:x-xss-protection","1; mode=block"],
["response_header:x-content-type-options","nosniff"],
["response_header:content-type","text/html; charset=utf-8"],
["response_header:etag","W/\"1467037e1e8\""],
["response_header:cache-control","max-age=0, private, must-revalidate"],
["response_header:set-cookie","_ruby_session=WHZtbllOcU...; path=/; HttpOnly"],
["response_header:x-request-id","2209f8b1-ed2f-420c-9941-9625d7308583"],
["response_header:x-runtime","0.314384"],
["response_header:content-length","8803"],
["response_body","<!DOCTYPE html>\n<html>\n<head>\n <title>Ruby Getting Started</title>\n\n</head>\n<body>...</body>\n</html>\n"],
["session_field:session_id","8687e4ba9"],
["session_field:_csrf_token","nMI/JGb4GB"],
["now","1518492826956"],
["agent","http_logger.rb"],
["version","1.9.0"]
]

]


Processing with jq

jq is a command-line JSON processor that can read, convert, and transform usage data in many different ways. While jq is powerful, it admittedly has a steep learning curve. The examples below will be helpful if you aren't already a jq guru.

A typical use of jq is to process a message stream (previously saved to a local file) based on a given filter string. The filter string will vary based on how the resulting data is to be processed.

jq '<filter>' stream.txt > results.txt

Usage data from our demo environment can be piped into jq with curl or wget.

curl -s https://demo.resurface.io/listener/<id>/messages | jq '<filter>' > results.txt

Summarizing messages

The jq filters below produce summary data based on all messages in the stream.

# count number of messages
[ . | length ]

# calculate average number of key-value pairs per message
[ .[] | length ] | (add / length)

# list all unique urls (sorted)
[ .[][] | select(.[0]=="request_url")[1] ] | unique

# list all unique urls with count (reverse sorted)
[ .[][] | select(.[0]=="request_url")[1] ] | group_by(.) | map({group: .[0], count: . | length}) | sort_by(-.count)
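For readers who would rather post-process a stream in a general-purpose language, the same summaries can be sketched in Python with the standard library. This is an illustrative equivalent of the filters above, not a replacement for them. The function name summarize and the keys of the returned dictionary are hypothetical, and the input is assumed to be a stream already parsed with json.load.

```python
from collections import Counter

def summarize(stream):
    """Summarize a parsed stream (a list of messages, each a list of [key, value] pairs)."""
    # count how often each request_url value appears across all messages
    url_counts = Counter(
        value
        for message in stream
        for key, value in message
        if key == "request_url"
    )
    total_details = sum(len(message) for message in stream)
    return {
        "message_count": len(stream),
        "average_details": total_details / len(stream) if stream else 0.0,
        "unique_urls": sorted(url_counts),
        "url_counts": url_counts.most_common(),  # most frequent first
    }
```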

Selecting messages

The jq filters below limit the results to messages that meet the specified criteria. Filtering can be done against any available message keys or values, and can include complex regular expressions.

# select messages with matching keys
[ .[] | select(any(.[0]=="response_header:content-length")) ]
[ .[] | select(any(.[0] | startswith("session"))) ]

# select messages with any matching value
[ .[] | select(any(.[1]=="GET")) ]
[ .[] | select(any(.[1] | startswith("GET"))) ]
[ .[] | select(any(.[1] | test(".*/check.html"))) ]

# select messages with specific key and value
[ .[] | select(any((.[0]=="request_method") and (.[1]=="GET"))) ]
[ .[] | select(any((.[0]=="request_method") and (.[1] | startswith("GET")))) ]
[ .[] | select(any((.[0]=="request_url") and (.[1] | test(".*/check.html")))) ]
[ .[] | select(any((.[0]=="request_header:host") and (.[1] | test("localhost.*")))) ]
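The selection filters above follow one pattern: keep a message if any of its details matches a key, a value pattern, or both. A minimal Python sketch of that pattern follows. The function name select_messages and its parameters are hypothetical, and it uses Python's re module, whose regular expression syntax differs in some details from the one jq uses.

```python
import re

def select_messages(stream, key=None, value_pattern=None):
    """Keep messages with at least one detail matching the key and/or value regex."""
    regex = re.compile(value_pattern) if value_pattern is not None else None

    def matches(detail):
        detail_key, detail_value = detail
        if key is not None and detail_key != key:
            return False
        # search() matches anywhere in the value, like test() in jq
        if regex is not None and not regex.search(detail_value):
            return False
        return True

    return [message for message in stream if any(matches(d) for d in message)]
```

Calling select_messages(stream, key="request_method", value_pattern="^GET") corresponds roughly to the startswith("GET") filter above.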

Converting to CSV

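The jq filter below produces CSV output that can be imported into any spreadsheet or database. Run jq with the -r (--raw-output) option so that each row is written as plain CSV text rather than as a JSON-encoded string. Add a get() call for each message detail to include in the CSV file, in the order the columns should appear.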

def get($f): reduce .[] as $x (""; if $x[0]==$f then $x[1] else . end);
.[] | [ get("request_method"), get("request_url"), get("request_header:user-agent"), get("response_code"), get("response_header:content-type"), get("response_header:content-length") ] | @csv
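The same conversion can be sketched in Python with the standard csv module, which may be convenient when the CSV step is part of a larger script. The function name to_csv is hypothetical. As in the jq filter, a missing key becomes an empty field and, when a key appears more than once in a message, the last value is used. Quoting every field is a choice made here to resemble the output of @csv for string values.

```python
import csv
import io

def to_csv(stream, keys):
    """Render one CSV row per message, with one column per requested key."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for message in stream:
        # dict() keeps the last value when a key is repeated in a message
        details = dict(message)
        writer.writerow([details.get(key, "") for key in keys])
    return out.getvalue()
```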