Logging Rules

What are logging rules?

With resurface.io, usage logging is always done in the context of a set of rules. These rules describe when consent has been given to collect user data, what kinds of data may be collected, and how sensitive fields must be masked. All rules are applied within a logger before any usage data is sent.

Rules can perform many different actions:

Rules are expressed in code, like a regular part of your application, so they can easily be kept in sync with your app and validated as it changes. Rules are portable between logger implementations in different languages, so they can be shared across your organization.

Best of all, you don't have to be a programmer to create or manage rules for your applications. Rules are expressed with a simple syntax described here.


Predefined rules

The easiest way to configure rules for a logger is by including a predefined set of rules. This is done with an 'include' statement that gives the name of the set of rules to load. This example includes the current default rules as a starting point.


include default

Predefined rules cannot be modified, but they can be extended by adding more rules. The next example includes default rules and randomly keeps 10% of all logged messages.


include default
sample 10

As in the example above, you'll often start with a set of predefined rules and then add more rules specific to your applications. Next we'll dive into all of the predefined sets of rules -- strict, debug and standard -- and when to use each of them.

Strict rules

This predefined set of rules logs a minimum amount of detail, similar to a traditional weblog. Interesting details like body content, request parameters, and most headers are dropped. You're unlikely to need additional rules to avoid logging sensitive user information, but the trade-off is that not many details are actually retained.

For most configurations, strict rules are applied by default: either when no rules are specified or when 'include default' is used. Advanced configurations can redefine the meaning of 'include default' through the logger API -- but unless you've done so, 'include default' and 'include strict' will have the same meaning.


include strict

OR

include default   # strict unless redefined

Actions taken by strict rules:

Debug rules

This predefined set of rules logs every available detail, including user session fields, without any filtering or sensitive data protections at all. Debug rules are helpful for application debugging and testing, but are not appropriate for real environments with real users.


include debug

Actions taken by debug rules:

Standard rules

This predefined set of rules logs as much request and response data as possible, while protecting against common forms of personally identifiable information. Standard rules are helpful when strict rules don't provide enough detail and debug rules don't provide enough protection. However, standard rules may not block all sensitive data fields automatically -- so test first to see if additional rules are needed for your application.


include standard

Actions taken by standard rules:


Rule syntax

Eventually you'll want to go beyond predefined includes and start creating custom rules that are tailored to your applications. Custom rules are defined using the syntax described here.

A set of custom rules is a block of text where:

The example below configures two rules and has some helpful comments. Here the 'sample' rule takes parameter '10', while the 'skip_compression' rule takes no parameters.


# example of custom rules

sample 10         # keep 10% at random
skip_compression  # reduce CPU time

Because comments and whitespace are ignored and order of rules is not significant, this next set of rules has exactly the same meaning as the previous example.


skip_compression
        sample     10

All the simplest rules -- allow_http_url, include, sample, and skip_compression -- take zero or one string parameter, depending on how the rule is defined.

Regular expressions

To create more interesting rules, we rely on regular expressions. These are very flexible and efficient for matching and transforming strings. Regular expressions are also portable between languages, which is ideal for sharing rules across loggers in different languages.

Regular expressions admittedly require some training for the uninitiated, but are far easier to learn than a full-blown programming language (and we provide lots of helpful examples for you to copy!).

The following examples show regular expressions delimited with slashes.


/.*/       # match any value
/foo.*/    # starts with foo
/.*foo.*/  # contains foo
/.*foo/    # ends with foo
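These patterns are matched against the entire value, which can be illustrated in Python with re.fullmatch (a sketch of the matching semantics only, not tied to any particular logger implementation):

```python
import re

# Full-value matching, as used when a rule's pattern is compared
# against an entire key or value.
assert re.fullmatch(r".*", "anything")       # match any value
assert re.fullmatch(r"foo.*", "foobar")      # starts with foo
assert re.fullmatch(r".*foo.*", "a foo b")   # contains foo
assert re.fullmatch(r".*foo", "barfoo")      # ends with foo
assert not re.fullmatch(r"foo.*", "barfoo")  # does not start with foo
```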

In our syntax, regular expressions can be written using one of several delimiters:   /  ~  !  %  |


/foo.*/   # starts with foo
~foo.*~   # starts with foo
!foo.*!   # starts with foo
%foo.*%   # starts with foo
|foo.*|   # starts with foo

If a delimiter character appears in a regular expression, then it must be escaped with a preceding backslash. This is where having a choice of delimiters is helpful, as you can pick the one that requires the least amount of escaping. This is great for matching against structured content like JSON, XML, or HTML, which have different conventions for escaping special characters.


# with an escaped delimiter (yuck!)
/A\/B/  # match 'A/B'

# with a different delimiter (better!)
|A/B|   # match 'A/B'
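To make the delimiter behavior concrete, here is a small Python sketch (a hypothetical helper written for this illustration, not part of any logger) that strips the delimiter from a rule's regular expression and unescapes any escaped delimiters:

```python
DELIMITERS = "/~!%|"

def parse_regex(text):
    """Extract the pattern from a delimited regular expression.

    Illustrative helper: accepts any of the five supported delimiters
    and unescapes the delimiter character inside the pattern.
    """
    d = text[0]
    if d not in DELIMITERS or len(text) < 2 or not text.endswith(d):
        raise ValueError("not a delimited regular expression: " + text)
    body = text[1:-1]
    return body.replace("\\" + d, d)  # unescape the delimiter

# Both spellings yield the same pattern:
assert parse_regex(r"/A\/B/") == "A/B"
assert parse_regex("|A/B|") == "A/B"
```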

Simple rules like copy_session_field take a single regular expression as a parameter, whereas keyed rules take multiple regular expressions as parameters.

Keyed rules

These rules are the most powerful since they act directly on details of a logged message. A message is internally represented as a list of key/value pairs, which is the same structure used for JSON encoding. The following is an example of the key/value pairs for a message.


Key string                       Value string
-------------------------------  -------------------------------------
request_method                   GET
request_url                      http://localhost:5000/?action=new
request_header:user-agent        Mozilla/5.0...
request_param:action             new
response_code                    200
response_header:content-type     text/html; charset=utf-8
response_header:content-length   8803
response_body                    <!DOCTYPE html><html>...
session_field:session_id         8687e4ba9

Keyed rules are those where the first parameter is always a regular expression matched against a key string. This special regular expression always appears to the left of the name of the rule. These rules are only evaluated against details whose key string matches the left-hand regular expression.

The following example is a rule that deletes response body details but has no effect on other details.


/response_body/ remove

If the keyed rule takes additional parameters, these appear to the right of the name of the rule, like any regular parameter. The following example is a rule that takes a second regular expression as a parameter.


# remove response bodies containing foo
/response_body/ remove_if /.*foo.*/
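To make these semantics concrete, here is a short Python sketch (an illustration only, not the logger's actual implementation) that applies a remove_if rule to the key/value representation shown earlier:

```python
import re

# A message as a list of key/value pairs (abbreviated from the table above)
message = [
    ("request_method", "GET"),
    ("request_url", "http://localhost:5000/?action=new"),
    ("response_code", "200"),
    ("response_body", "<html>foo</html>"),
]

def remove_if(message, key_regex, value_regex):
    """Drop details whose key matches key_regex AND whose
    entire value matches value_regex."""
    return [
        (k, v) for (k, v) in message
        if not (re.fullmatch(key_regex, k) and re.fullmatch(value_regex, v))
    ]

# /response_body/ remove_if /.*foo.*/
filtered = remove_if(message, r"response_body", r".*foo.*")
assert ("response_body", "<html>foo</html>") not in filtered
```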

Keyed rules are the largest category of rules, featuring: remove, remove_if, remove_if_found, remove_unless, remove_unless_found, replace, stop, stop_if, stop_if_found, stop_unless, stop_unless_found

Rule ordering

Rules can be declared in any order. There is no special priority given to rules declared earlier versus later, nor to rules loaded by an include statement versus declared inline. Rules are always run in a preset order that gives ideal logging performance.

Why is this so crucial? Because if rules were run in declared order, users would have to remember many important optimizations. Any rule that relies on a partial match (like remove_if_found) should run before similar rules that match an entire value (like remove_if). Any sampling should be done only after all stop rules have run. Replace rules are the slowest and should run last. And so on -- it would be very difficult to create efficient sets of custom rules if ordering were not automatically optimized.

The following algorithm is applied every time an HTTP request/response is logged:

Most rules (with the exception of sample) can appear more than once within a set of rules. This is helpful for some complex expressions that would not be possible otherwise. When multiple rules with the same name are present, they all will be run by the logger, but their relative order is not strictly guaranteed.
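One way to picture automatic ordering is as sorting rules into phases before any are run. The phase numbers below are assumptions made for this sketch, reflecting only the heuristics described in this section; the logger's real internal ordering may differ:

```python
# Assumed phase order based on the heuristics above: session copying
# first, partial-match (found) variants before full-value variants,
# sampling only after all stop rules, and replace rules last.
PHASES = {
    "copy_session_field": 0,
    "stop_if_found": 1, "stop_unless_found": 1,
    "stop_if": 2, "stop_unless": 2, "stop": 2,
    "remove_if_found": 3, "remove_unless_found": 3,
    "remove_if": 4, "remove_unless": 4, "remove": 4,
    "sample": 5,
    "replace": 6,
}

declared = ["replace", "sample", "stop_if", "remove_if_found", "stop_if_found"]
optimized = sorted(declared, key=PHASES.get)
assert optimized == ["stop_if_found", "stop_if", "remove_if_found",
                     "sample", "replace"]
```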


Supported rules

All available rules are listed below in alphabetical order.

allow_http_url

By default, loggers will refuse to send messages over HTTP, as this is not secure. Add this rule to allow logger URLs with HTTP to be configured, but be advised this should never be used in real production environments.


allow_http_url
include standard

copy_session_field

This copies data from the active user session into the outgoing message. Only session field names that match the specified regular expression will be copied. Session data is copied before any other rules are run, so that stop and replace rules can inspect session fields just like any detail from the request or response. When no user session is active, nothing will be done.


# copy any available fields
copy_session_field /.*/

# copy any fields starting with 'foo'
copy_session_field /foo.*/

remove

This removes any detail from the message where the specified regular expression matches its key. The value associated with the key is not checked. If all details are removed, the entire message will be discarded before doing any further processing.


# block cookie headers
/request_header:cookie/ remove
/response_header:set-cookie/ remove

remove_if

This removes any detail from the message where the first regular expression matches its key, and the second regex matches its entire value. If all details are removed, the message will be discarded.


# block response body if directed by comment
/response_body/ remove_if |<html>.*<!--SKIP_LOGGING-->.*|

remove_if_found

This removes any detail from the message where the first regular expression matches its key, and the second regex is found at least once in its value. This is faster than matching against the entire value. If all details are removed, the message will be discarded.


# block response body if directed by comment
/response_body/ remove_if_found |<!--SKIP_LOGGING-->|

remove_unless

This removes any detail from the message where the first regular expression matches its key, but the second regex does not match its entire value. If all details are removed, the message will be discarded.


# block response body without opt-in comment
/response_body/ remove_unless |<html>.*<!--DO_LOGGING-->.*|

remove_unless_found

This removes any detail from the message where the first regular expression matches its key, but the second regex is not found at least once in its value. This is faster than matching against the entire value. If all details are removed, the message will be discarded.


# block response body without opt-in comment
/response_body/ remove_unless_found |<!--DO_LOGGING-->|

replace

This masks sensitive user information that appears in a message. When the first regular expression matches the key of a message detail, all instances of the second regex in its value will be found and replaced. The third parameter is the safe mask string, which can be a static value or an expression that includes backreferences. (Please note that backreferences are specified in a language-specific manner.)


# chop out long sequence of numbers from all details
/.*/ replace /[0-9\.\-\/]{9,}/, /xyxy/

# chop url after first '?' (Node & Java)
/request_url/ replace /([^\?;]+).*/, |$1|

# chop url after first '?' (Python & Ruby)
/request_url/ replace /([^\?;]+).*/, |\\1|
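The Python flavor of that backreference can be demonstrated with re.sub (this shows only the regular expression mechanics, not the logger itself; the sample URL is made up):

```python
import re

url = "http://localhost:5000/login?user=jane&token=secret"

# chop url after first '?', keeping group 1 via a \1 backreference
masked = re.sub(r"([^\?;]+).*", r"\1", url)
assert masked == "http://localhost:5000/login"
```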

sample

This discards messages at random while attempting to keep the specified percentage of messages over time. The percentage must be between 1 and 99. Sampling is applied only to messages that were not intentionally discarded by any form of stop rule.


include standard
sample 10

NOTE: Unlike most rules, 'sample' may appear only once in a set of rules.
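One plausible way to picture sampling (an illustrative sketch, not the logger's actual algorithm) is an independent random keep/discard decision per message:

```python
import random

def keep_message(percent):
    """Keep roughly `percent` of messages at random (illustrative)."""
    if not 1 <= percent <= 99:
        raise ValueError("percentage must be between 1 and 99")
    return random.random() * 100 < percent

random.seed(42)  # fixed seed so the sketch is repeatable
kept = sum(keep_message(10) for _ in range(10_000))
# roughly 10% of messages survive sampling
assert 800 < kept < 1200
```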

skip_compression

This disables deflate compression of messages, which is ordinarily enabled by default. This reduces CPU overhead related to logging, at the expense of higher network utilization to transmit messages.


include standard
skip_compression

stop

This discards the entire message if the specified regular expression matches any available key. The value associated with the key is not checked.


# block messages if requested via header
/request_header:nolog/ stop

stop_if

This discards the message if the first regular expression matches an available key, and the second regex matches its entire value.


# block messages if directed by body comment
/response_body/ stop_if |<html>.*<!--STOP_LOGGING-->.*|

stop_if_found

This discards the message if the first regular expression matches an available key, and the second regex is found at least once in its value. This is faster than matching against the entire value.


# block messages if directed by body comment
/response_body/ stop_if_found |<!--STOP_LOGGING-->|

stop_unless

This discards the message if the first regular expression matches an available key, but the second regex fails to match its entire value. If several of these rules are present, then all must be satisfied for logging to be done.


# block messages without url opt-in
/request_url/ stop_unless |.*/fooapp/.*log=yes.*|

stop_unless_found

This discards the message if the first regular expression matches an available key, but the second regex fails to be found at least once in its value. This is faster than matching against the entire value. If several of these rules are present, then all must be satisfied.


# block messages without url opt-in
/request_url/ stop_unless_found |log=yes|

Limitations