Translation

beng-proxy knows two ways to locate the resource a request URI points to:

  • via an external translation server

  • static translation

The latter is only for debugging. The URI path is appended to the document root (/var/www by default). For security (by obscurity) reasons, beng-proxy has no code for generating directory listings. If the request has a trailing slash, beng-proxy looks for a file named index or index.html and serves it. Without the trailing slash, beng-proxy refuses to handle the request.
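The static translation rules above can be pictured with a short Python sketch. It is only an illustration, not beng-proxy's code: the function name static_translate is hypothetical, the lookup order (index before index.html) is assumed from the order in the text, and "refuses without the trailing slash" is read here as applying to directories.

```python
import os


def static_translate(uri_path, document_root="/var/www"):
    """Return the file to serve for `uri_path`, or None to refuse.

    Debugging sketch only: it does not filter ".." segments.
    """
    path = document_root + uri_path
    if uri_path.endswith("/"):
        # Directory request: look for an index file, never list the directory.
        for name in ("index", "index.html"):
            candidate = path + name
            if os.path.isfile(candidate):
                return candidate
        return None
    # No trailing slash: only regular files are served (assumption).
    return path if os.path.isfile(path) else None
```

With a document root containing foo/index.html, static_translate("/foo/") yields that file, while static_translate("/foo") is refused.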

The translation server should be the default on production servers. It is a daemon on the same physical machine which does all the translation work for us. beng-proxy connects to a Unix socket to contact this translation server.

A request may consist of several micro commands. The request is initialized with the BEGIN command, which is followed by any number of commands which provide parameters. After all parameters have been transferred, the client sends the END command, and waits for the server’s response.

The client can send any number of requests over the socket until one side closes the connection.

Example conversation

  1. client sends BEGIN “\x03”

  2. client sends REMOTE_HOST “192.168.1.77:1234”

  3. client sends HOST “www.example.com”

  4. client sends URI “/foo/index.html”

  5. client sends END

  6. server sends BEGIN “\x01”

  7. server sends PATH “/var/www/foo/index.html”

  8. server sends CONTENT_TYPE “text/html; charset=utf8”

  9. server sends PROCESS

  10. server sends END
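The framing rule (BEGIN, any number of parameters, END, repeated for each request on the same socket) can be sketched in Python. The snippet models packets symbolically as (name, payload) tuples instead of wire bytes, and the helper name split_messages is hypothetical.

```python
def split_messages(packets):
    """Group a flat sequence of (command, payload) tuples into messages.

    Each message must start with BEGIN and finish with END.
    Raises ValueError on a framing violation.
    """
    messages = []
    current = None  # packets of the message being collected, or None
    for command, payload in packets:
        if command == "BEGIN":
            if current is not None:
                raise ValueError("BEGIN inside an unfinished message")
            current = [(command, payload)]
        elif current is None:
            raise ValueError(f"{command} outside of BEGIN/END")
        else:
            current.append((command, payload))
            if command == "END":
                messages.append(current)
                current = None
    if current is not None:
        raise ValueError("message not terminated by END")
    return messages


# The request half of the example conversation, in symbolic form.
example_request = [
    ("BEGIN", b"\x03"),
    ("REMOTE_HOST", b"192.168.1.77:1234"),
    ("HOST", b"www.example.com"),
    ("URI", b"/foo/index.html"),
    ("END", b""),
]
```

Feeding two back-to-back copies of example_request to split_messages returns two separate messages, which mirrors several requests sharing one connection.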

Command packets

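The protocol is binary and uses host byte order. In C, a command packet could be declared like this:

#include <stdint.h>

struct beng_proxy_translate_packet {
    uint16_t length;   /* payload size in bytes, header excluded */
    uint16_t command;  /* packet id, e.g. BEGIN or URI */
    char payload[];    /* "length" bytes follow the header */
};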

The length only refers to the payload. The maximum supported payload size is 65535 bytes.

Most parameters are ASCII strings; in this case, the payload contains just the raw string, without terminating zero.
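As a rough illustration of the packet layout, the following Python sketch encodes and decodes packets with the standard struct module. The numeric command ids are placeholders invented for this example (the real values are not listed in this section), and the function names are hypothetical.

```python
import struct

# Header: two unsigned 16-bit integers (length, command) in host byte
# order. "=" selects native byte order with standard sizes and no padding.
HEADER = struct.Struct("=HH")

# Placeholder command ids for illustration only.
CMD_HOST = 0x1001
CMD_URI = 0x1002


def encode_packet(command, payload=b""):
    """Serialize one packet: 4-byte header followed by the raw payload."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload exceeds 65535 bytes")
    return HEADER.pack(len(payload), command) + payload


def decode_packets(data):
    """Yield (command, payload) for each complete packet in `data`."""
    offset = 0
    while offset + HEADER.size <= len(data):
        length, command = HEADER.unpack_from(data, offset)
        end = offset + HEADER.size + length
        if end > len(data):
            break  # incomplete packet; the caller should read more bytes
        yield command, data[offset + HEADER.size:end]
        offset = end
```

String payloads are passed as raw bytes without a terminating zero, for example encode_packet(CMD_HOST, b"www.example.com").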

Request

  • BEGIN: Begins the request. The payload is an 8-bit unsigned integer specifying the protocol version. The protocol version described here is 3.

  • END: Finishes the request.

  • LISTENER_TAG: The tag of the listener (as specified in the listener configuration section) that accepted the connection.

  • REMOTE_HOST: the client’s address or host name and the port number (as string). This packet is optional and is only submitted if requested via WANT (see below).

  • HOST: the Host HTTP request header

  • URI: the raw URI from the HTTP request (without the query string)

  • QUERY_STRING: the query string from the request URI, without the question mark. This packet is optional and is only submitted if requested via WANT (see below).

  • SESSION: a session identifier generated by the translation server, see section Sessions

  • REALM_SESSION: Like SESSION, but realm-local. Unlike SESSION, it is only sent under certain conditions (e.g. in TOKEN_AUTH requests), because the realm is only known after the regular translation response has been applied already.

  • PARAM: a parameter passed by the browser

  • USER_AGENT: the User-Agent request header sent by the client (not in the widget registry). This packet is optional and is only submitted if requested via WANT (see below).

  • USER: the name of the user currently logged in using AUTH (see The AUTH packet). This packet is optional and is only submitted if requested via WANT (see below).

  • LANGUAGE: the Accept-Language request header sent by the client (not in the widget registry). This packet is optional and is only submitted if requested via WANT (see below).

  • AUTHORIZATION: the Authorization request header sent by the client (see RFC 2617); only for HTTP-level Authentication.

  • CONTENT_TYPE_LOOKUP: Look up the Content-Type of a file name suffix. See Content-Type Lookup for a detailed description.

  • SUFFIX: The file name suffix without the dot for CONTENT_TYPE_LOOKUP. See Content-Type Lookup for a detailed description.

  • ERROR_DOCUMENT: a resource has failed, and the translation server is asked to provide the location of the error document. This is followed by the packets URI and STATUS. See Error documents for a detailed description.

  • PROBE_PATH_SUFFIXES: Result of PROBE_PATH_SUFFIXES. This is an echo of the PROBE_PATH_SUFFIXES from the previous translation response. If a file with one of the given suffixes exists, then PROBE_SUFFIX specifies the first existing suffix. If no PROBE_SUFFIX follows, then no file was found.

  • PATH_EXISTS: This is an echo of PATH_EXISTS from the previous translation response, accompanied by STATUS describing whether the given file exists.

  • FILE_NOT_FOUND: The specified file does not exist. The translation server is asked to provide an alternate translation. This is an echo of the FILE_NOT_FOUND from the previous translation response.

  • ENOTDIR: The specified file does not exist, but a portion of the path points to a regular file. This is an echo of the ENOTDIR packet from the previous translation response. The given URI has been shortened: the last slash and what follows has been moved to PATH_INFO. This may be repeated until the regular file has been found.

  • DIRECTORY_INDEX: The specified file is a directory. The translation server is asked to provide an alternate translation. This is an echo of the DIRECTORY_INDEX from the previous translation response.

  • WANT: causes beng-proxy to submit the same translation request again, with this packet echoed plus the requested packets. The payload is an array of 16-bit integers with requested packet ids. The following packets are allowed/supported here: LISTENER_TAG, REMOTE_HOST, USER_AGENT, USER, LANGUAGE, ARGS, QUERY_STRING

  • WANT_FULL_URI: causes beng-proxy to submit the same translation request again, with this packet appended (its payload is opaque to beng-proxy), and with the full request URI (including semicolon-arguments and the follow-up suffix, but excluding the query string).

  • INTERNAL_REDIRECT: causes beng-proxy to submit the same translation request again, with this packet appended (its payload is opaque to beng-proxy). However, instead of the original request URI, beng-proxy uses the one from this response’s URI or EXPAND_URI packet.

  • CHECK: causes beng-proxy to submit the same translation request again, with this packet appended (its payload is opaque to beng-proxy). The current response is remembered, to be used when the second response contains the PREVIOUS packet. This can be used to implement authentication (see Authentication).

  • CHECK_HEADER: the CHECK request shall contain the specified request header. Payload is the header name (lower case). For the CHECK request, the payload is the header name and the value separated by a colon; if no such request header exists, the value is empty.

  • AUTH: Indicates that authentication is necessary (see The AUTH packet).

  • READ_FILE: This is a repeated translation in reply to a translation response with a READ_FILE packet. The payload is the file contents or empty if the file does not exist (or if there was another problem reading the file). This packet is implicitly on “vary”.
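The WANT payload, an array of 16-bit packet ids, can be sketched as follows. The ids in PACKET_IDS are hypothetical stand-ins (this section does not list the real numeric values), and host byte order is assumed as for the packet header.

```python
import struct

# Hypothetical packet ids, for illustration only.
PACKET_IDS = {"REMOTE_HOST": 0x2001, "USER_AGENT": 0x2002, "LANGUAGE": 0x2003}
PACKET_NAMES = {value: name for name, value in PACKET_IDS.items()}


def encode_want(names):
    """Build a WANT payload: one 16-bit id per requested packet."""
    ids = [PACKET_IDS[name] for name in names]
    return struct.pack(f"={len(ids)}H", *ids)


def decode_want(payload):
    """Return the packet names requested by a WANT payload."""
    if len(payload) % 2 != 0:
        raise ValueError("WANT payload must be a whole number of 16-bit ids")
    ids = struct.unpack(f"={len(payload) // 2}H", payload)
    return [PACKET_NAMES[i] for i in ids]
```

A translation server that needs the client address and language would send encode_want(["REMOTE_HOST", "LANGUAGE"]); the repeated request then echoes WANT and adds those two packets.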

Response

  • BEGIN: Begins the response. The payload is an 8-bit unsigned integer specifying the protocol version. The initial protocol version is 0.

  • END: Finishes the response.

  • URI: the “real” raw URI from the HTTP request (without the query string); this is used to override the URI, e.g. when beng-proxy is behind another proxy which modifies the URI

  • EXPAND_URI: Override URI with the given value (after expanding).

  • HOST: the host name for generating absolute URLs; default is the Host HTTP request header

  • SCHEME: the scheme for generating absolute URLs; default is http. This packet is useful if beng-proxy is behind stunnel

  • ALLOW_REMOTE_NETWORK: Allow only clients with addresses in the specified network; all other addresses get a “403 Forbidden” response. The payload is a struct sockaddr_in or struct sockaddr_in6 plus one byte specifying the prefix length (in bits). This packet may be sent more than once.

  • UNTRUSTED: sets the “untrusted” host name for this request: only untrusted widgets matching this host name are allowed. Trusted widgets are rejected.

  • STATUS: HTTP status code, encoded as uint16_t; this parameter is usually not used

  • HTTP: load the resource from a remote HTTP server (see HTTP proxying). Payload is an absolute URI starting with http:// or https://.

  • HTTP2: force HTTP/2 for the preceding HTTP packet. No payload.

  • CERTIFICATE: Use the named client certificate for the outbound SSL connection (see CERTIFICATE).

  • PIPE: a local program which reads input from stdin and prints the modified resource on stdout (see Pipe filters).

  • LHTTP_PATH: a local path which is executed as HTTP server

  • LHTTP_URI: the request URI for LHTTP_PATH

  • EXPAND_LHTTP_URI: the regular expression rule for LHTTP_URI

  • LHTTP_HOST: the “Host” request header for LHTTP_PATH

  • CONCURRENCY: a 16 bit integer specifying the maximum number of concurrent requests to this server (FastCGI, LHTTP and Multi-WAS only)

  • PARALLELISM: a 16 bit integer specifying the maximum number of parallel child processes of this kind (FastCGI, WAS, Multi-WAS, LHTTP)

  • DISPOSABLE: Mark the child process as “disposable”, which may give it a very short idle timeout (or none at all). To be used for processes that will likely only be used once.

  • NON_BLOCKING: If present, make the socket passed to a child process non-blocking (LHTTP only currently). This is needed by NodeJS 0.12.

  • CGI: a local path which is executed as CGI script (see CGI, FastCGI, WAS and Pipe)

  • FASTCGI: a local path which is executed as FastCGI script (see CGI, FastCGI, WAS and Pipe)

  • WAS: a local path which is executed as WAS application (see CGI, FastCGI, WAS and Pipe). May be followed by CONCURRENCY to enable Multi-WAS mode.

  • REDIRECT: another alternative to PATH: redirect the HTTP client to this URL; STATUS must be set to one of the HTTP 3xx codes

  • EXPAND_REDIRECT: Override REDIRECT with the given value (after expanding); see Response.

  • REDIRECT_QUERY_STRING: Append the query string to the given REDIRECT URL.

  • REDIRECT_FULL_URI: Use the full request URI path (including semicolon-arguments and the follow-up suffix, but excluding the query string) for expanding REDIRECT. This packet must be preceded by BASE, EASY_BASE and REDIRECT. It makes sense to combine it with REDIRECT_QUERY_STRING.

  • HTTPS_ONLY: Allow this request to be handled only on encrypted connections (HTTPS with SSL/TLS). If the connection is encrypted, then this is a no-op. If it is not encrypted, the server generates a permanent redirect to https://. The payload may contain a 16 bit integer specifying the port number (zero means default port).

  • BOUNCE: Redirects the browser with a 303 See Other status to this URI, and appends the current absolute URI (form-encoded). This is useful to redirect to another server, which will need to redirect back to the original URI.

  • MESSAGE: Generate a response with the given body (text/plain and US-ASCII).

  • TINY_IMAGE: Generate a response with a tiny (one-pixel GIF) image.

  • EXPAND_PATH: Override the PATH with the given value (applicable to static files, CGI, FastCGI, WAS, HTTP). Backslash references are expanded to the value of the match group of REGEX. In the presence of this packet, the URI suffix after the base will not be appended to other paths. The translation server is responsible for ensuring that the resulting path cannot point to files that are not supposed to be published. beng-proxy disallows /../ sequences in the URI tail string, but it may nonetheless be possible for an attacker to break out if the regular expression and the expansion string are phrased improperly. (Since version 2.0.5)

  • LISTENER_TAG: override the LISTENER_TAG. All following translation requests will feature the new listener tag.

  • SITE: optional identification or name of the site this resource belongs to

  • EXPAND_SITE: provide a cache expansion for the preceding SITE

  • SESSION_SITE: Set a SITE for all requests in the current session. This packet with an empty payload can be used to clear the session’s SITE value.

  • RATE_LIMIT_SITE_REQUESTS: limit the rate of requests to this site. Payload is two 32-bit floats describing the rate and burst for the underlying token bucket. Requests that fail the token bucket get a “429 Too Many Requests” response.

  • RATE_LIMIT_SITE_TRAFFIC: limit the traffic rate of requests to this site. Payload is two 32-bit floats describing the rate [bytes per second] and burst [bytes] for the underlying token bucket. Requests that fail the token bucket get a “429 Too Many Requests” response.

  • DOCUMENT_ROOT: base directory of the site; may also be passed after a CGI command, to set the document root only for this CGI

  • FILTER: the next resource address (HTTP, CGI) will denote an output filter, see section Filters for details.

  • CHAIN: similar to FILTER, but the translation server is asked again after the current response has been generated. See section Chains for details.

  • CACHE_TAG: Mark a cache item with this tag (an opaque string). This can be used to flush/invalidate groups of cache items in one control command. The following parts of the response can be tagged:

    • After FILTER: for filter cache items, to be used with FLUSH_FILTER_CACHE.

    • After an HTTP resource address (e.g. HTTP, FASTCGI, WAS): for HTTP cache items, to be used with FLUSH_HTTP_CACHE.

    • Prior to any of the above: for the whole translation response (i.e. the translation cache item), to be used with TCACHE_INVALIDATE.

  • REVEAL_USER: If present after FILTER, then the filter will see X-CM4all-BENG-User as an additional request header (if a user is logged in).

  • FILTER_4XX: Enable filtering of client errors (status 4xx). Without this flag, only successful responses (2xx) are filtered. Only useful when at least one FILTER was specified.

  • PROCESS: enables the beng-proxy processor, see section The Beng Template Language

  • PROCESS_TEXT: enables the beng-proxy text processor (Since version 1.3.2)

  • PROCESS_CSS: enables the beng-proxy CSS processor

  • DOMAIN: the domain name for partitioned frames

  • SESSION: a session identifier generated by the translation server, see section Sessions

  • RECOVER_SESSION: A token to be stored in a browser cookie which can later be used by the translation server to recover the current session. In particular, it will be sent back to the translation server in a Token Authentication request.

  • ATTACH_SESSION: Attach to an existing session (or mark this session to be attached by others with the same identifier). The payload is a non-empty unique identifier for sessions to be attached/merged. This value can also be used to discard the session using the DISCARD_SESSION control packet.

  • USER: the user name associated with this session

  • REALM: a realm name for this session. An existing session matches only if its realm matches the current request’s realm; on mismatch, a new session with the same public id is created for this realm. If this packet is not specified in the translation response, then the “Host” request header is used.

  • REALM_FROM_AUTH_BASE: Copy the AUTH or AUTH_FILE contents to REALM (i.e. without APPEND_AUTH).

  • TRANSPARENT: Transparent proxy: forward URI path segment params to the request handler instead of using them. This disables legacy handling of these params (which was used to control widget rendering).

  • LANGUAGE: overrides the Accept-Language request header for this session

  • DISCARD_SESSION: discard the current browser session

  • DISCARD_REALM_SESSION: Like DISCARD_SESSION, but discard only the part of the session that is specific to the current realm (see REALM).

  • SECURE_COOKIE: Set the “secure” flag on the session cookie.

  • SESSION_COOKIE_SAME_SITE: Set the “SameSite” attribute on the (realm) session cookie. Valid payloads are strict, lax and none (all lower case).

  • CHDIR: change the working directory (after namespace setup).

  • HOME: home directory of the account this site belongs to; will be mounted in the jail; defaults to DOCUMENT_ROOT

  • EXPAND_HOME: Expansion for HOME.

  • ADDRESS: after each HTTP packet, there must be one or more ADDRESS packets which specify the resolved addresses. The payload of each is a struct sockaddr.

  • STICKY: Make the resource address “sticky”, i.e. attempt to forward all requests of a session to the same worker.

  • VIEW: starts a new view; the body of the packet is the name of the view (ASCII letters, digits, underscore, dash only). Each view can have different address/processor/filter settings. The first view (the one before the first VIEW packet) is the default and has no name.

  • MAX_AGE: a 32 bit unsigned integer specifying the number of seconds the preceding piece of information is valid without having to revalidate. A value of 0 specifies that beng-proxy should not remember this value at all. Without this packet, the maximum age is not limited. Currently, this is only supported for the following packets:

    • BEGIN (refers to the whole translate response)

    • USER

  • VARY: similar to the HTTP Vary response header; the payload contains an array of translation request commands which this response depends upon.

    The following request packets are currently supported: PARAM, SESSION, LISTENER_TAG, REMOTE_HOST, HOST, LANGUAGE, USER_AGENT, QUERY_STRING, USER, INTERNAL_REDIRECT, ENOTDIR.

    The following request packets are on “vary” implicitly: WIDGET_TYPE, CONTENT_TYPE_LOOKUP, URI, STATUS, CHECK, WANT_FULL_URI, PROBE_PATH_SUFFIXES, PROBE_SUFFIX, PATH_EXISTS, FILE_NOT_FOUND, DIRECTORY_INDEX, WANT.

  • INVALIDATE: Invalidates existing translation cache items which depend on some of the request values. The payload has the same format as VARY. Additionally, the URI command is supported, to invalidate all items pointing to the request URI, and SITE to invalidate all items with the given site name.

    If you specify more than one command, all must match. If you list a command which was not specified in the request (or a command which is not supported here), nothing will be deleted.

    Example: INVALIDATE on SESSION invalidates all cache items for the current session.

  • REQUEST_HEADER_FORWARD: See Forwarding HTTP Headers

  • RESPONSE_HEADER_FORWARD: See Forwarding HTTP Headers

  • WWW_AUTHENTICATE: the WWW-Authenticate response header sent to the client (see RFC 2617). Currently, this is never cached; this behavior is subject to change, and the header is expected to become cacheable in the future.

  • AUTHENTICATION_INFO: the Authentication-Info response header sent to the client (see RFC 2617).

  • HEADER: A custom HTTP response header sent to the client. Name and value are separated by a colon (without any whitespace). This will not override existing headers. It is not allowed to set hop-by-hop headers (RFC 2616 13.5.1) this way. This packet shall only be a last resort, when there is no other way to set a required response header.

  • EXPAND_HEADER: Same as HEADER, but expand the value.

  • REQUEST_HEADER: A custom HTTP request header for the backend server. Name and value are separated by a colon (without any whitespace). This will override existing headers. It is not allowed to set hop-by-hop headers (RFC 2616 13.5.1) this way.

  • EXPAND_REQUEST_HEADER: Same as REQUEST_HEADER, but expand the value.

  • CONTENT_TYPE_LOOKUP: Indicates that the translation server is willing to look up Content-Type by file name suffix. See Content-Type Lookup for a detailed description.

  • ERROR_DOCUMENT: Indicates that the translation server is willing to provide a custom error document. See Error documents for a detailed description.

  • PROBE_PATH_SUFFIXES: Check if the TEST_PATH (or EXPAND_TEST_PATH) plus one of the suffixes from PROBE_SUFFIX exists (regular files only). beng-proxy will send another translation request, echoing this packet and echoing the PROBE_SUFFIX that was found. This packet must be followed by at least two PROBE_SUFFIX packets.

  • PATH_EXISTS: Check if the given PATH exists; the translation shall be repeated, echoing this packet accompanied by a STATUS packet describing whether the given file exists (200 or 404).

  • FILE_NOT_FOUND: Indicates that the translation server would like to provide an alternate translation when the specified file does not exist. beng-proxy will repeat the translation request with this packet echoed. This is supported by the following address types: PATH, CGI, FASTCGI, WAS, LHTTP_PATH.

  • ENOTDIR: Indicates that the translation server would like to provide an alternate translation when the specified file does not exist, but a portion of the path points to a regular file.

  • DIRECTORY_INDEX: Indicates that the translation server would like to provide an alternate translation when the specified file is a directory. beng-proxy will repeat the translation request with this packet echoed.

  • DIRECTORY_INDEX_SLASH: If DIRECTORY_INDEX applies but the request URI path does not end with a slash, automatically send a redirect appending the slash.

  • TEST_PATH: Test the specified file. If this packet is not present, then the path from the resource address is used (PATH, CGI, FASTCGI, LHTTP_PATH). Affects the packets FILE_NOT_FOUND, DIRECTORY_INDEX, ENOTDIR.

  • EXPAND_TEST_PATH: Override the TEST_PATH with the given value. Backslash references are expanded to the value of the match group of REGEX. (Since version 4.0.34)

  • COOKIE_DOMAIN: Set the session cookie’s “Domain” attribute.

  • COOKIE_HOST: Override the cookie host name. This host name is used for storing and looking up cookies in the jar. It is especially useful for protocols that don’t have a host name, such as CGI.

  • EXPAND_COOKIE_HOST: Expansion for COOKIE_HOST.

  • COOKIE_PATH: Override the cookie’s Path attribute. This is sent to the client when beng-proxy generates a new session cookie. Be careful with overlapping locations that create conflicting cookies.

  • VALIDATE_MTIME: A cached response is valid only if the file specified in this packet is not modified. The first 8 bytes are the mtime (seconds since UNIX epoch), the rest is the absolute path to a regular file (symlinks not supported). The translation fails when the file does not exist or is inaccessible. The special value 0 matches only when the file does not exist; as soon as the file appears, the cached response will be discarded.

  • READ_FILE: Asks beng-proxy to read the specified (small) file and submit another translation request with the file contents in another READ_FILE packet.

  • EXPAND_READ_FILE: Expansion for READ_FILE.

  • DEFER: Defer the request to the next translation server.

  • PREVIOUS: Tells beng-proxy to use the resource address of the previous translation response. Only allowed if the request contains a CHECK or AUTH packet.

  • UNCACHED: Disable the HTTP cache for the given resource address.

  • IGNORE_NO_CACHE: Ignore the Cache-Control:no-cache request header, i.e. don’t allow the client to circumvent the HTTP cache.

  • EAGER_CACHE: Enable caching for the given resource address, even if it is not declared to be cacheable.

  • DISCARD_QUERY_STRING: Discard the query string from the request URI. This can be combined with EAGER_CACHE to prevent cache-busting with random query strings.

  • NO_QUERY_STRING: No query string is allowed/supported on this request URI. The webserver is allowed to reject requests with a query string.

  • AUTO_FLUSH_CACHE: All (successful) modifying requests (POST, PUT …) flush the HTTP cache of the specified CACHE_TAG.

  • GENERATOR: A short symbolic identifier (alphanumeric, underscore, dash) for the entity that generates the HTTP response (according to the rest of this translation response). If non-empty, then this will set the GENERATOR attribute in access log datagrams. Without this packet, the value of the X-CM4all-Generator response header is used.
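The token bucket behind RATE_LIMIT_SITE_REQUESTS can be pictured with the generic sketch below. It shows the textbook algorithm under stated assumptions (the bucket starts full, each request costs one token) and is not taken from beng-proxy's implementation; the class name is hypothetical.

```python
class TokenBucket:
    """Minimal token bucket: `rate` tokens per second, at most `burst` stored."""

    def __init__(self, rate, burst, now=0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # assumption: the bucket starts full
        self.last = now

    def allow(self, now, cost=1.0):
        """Return True if a request costing `cost` tokens may pass at `now`."""
        elapsed = max(0.0, now - self.last)
        # Refill according to the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # the caller would answer "429 Too Many Requests"
```

For RATE_LIMIT_SITE_TRAFFIC, the same structure would apply with the cost measured in bytes instead of requests.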

To send a standard error page, the translation server sends a response containing only the STATUS parameter with the desired HTTP status.
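Several response packets carry fixed-width binary payloads instead of strings. The sketch below shows how a few of them might be built in Python, including the STATUS-only error response. Packets are modelled symbolically as (name, payload) tuples, all function names are hypothetical, and treating the VALIDATE_MTIME timestamp as an unsigned 64-bit integer is an assumption.

```python
import struct


def status_payload(status):
    """STATUS: the HTTP status code as uint16_t (host byte order)."""
    return struct.pack("=H", status)


def max_age_payload(seconds):
    """MAX_AGE: 32-bit unsigned number of seconds."""
    return struct.pack("=I", seconds)


def rate_limit_payload(rate, burst):
    """RATE_LIMIT_SITE_*: two 32-bit floats (rate, burst)."""
    return struct.pack("=ff", rate, burst)


def validate_mtime_payload(mtime, path):
    """VALIDATE_MTIME: 8-byte mtime followed by the absolute path.

    Assumption: the mtime is encoded as an unsigned 64-bit integer.
    """
    return struct.pack("=Q", mtime) + path.encode("ascii")


def error_response(status, protocol_version):
    """A response that only carries STATUS, yielding a standard error page."""
    return [
        ("BEGIN", bytes([protocol_version])),
        ("STATUS", status_payload(status)),
        ("END", b""),
    ]
```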

Sending a packet twice is regarded an error. It cannot be used to override a previous value.

Caching

Almost all translation responses must be cacheable. The following response packets allow reusing cache items for different requests:

  • LIKE_HOST: Repeat the translation, but with the specified HOST value (which can be an artificial name, even one which is not RFC-valid). This allows sharing the translation cache between different hosts. It can be combined with BASE and REGEX to share only a part of the URI location space.

  • BASE: Defines a realm in the URI space. The payload specifies the URI prefix (of the original request URI, ending with a slash) which contains this realm. All resources in this realm can be addressed by beng-proxy with a trivial pattern: append the relative URI (within the realm) to the resource address (e.g. the PATH, HTTP or PATH_INFO value).

    The address in this response applies to request URI, not the base URI (to allow backwards compatibility with translation clients which do not support this packet).

    Example: in the request, URI is /foo/bar/index.html; in the response, PATH is /var/www/foo/bar/index.html and BASE is /foo/. The beng-proxy translation cache now knows: if a request on /foo/test.png is received, it can serve /var/www/foo/test.png without querying the translation server.

  • UNSAFE_BASE: Modifier for BASE: omit the security checks. This allows /../ to be part of the remaining URI, possibly allowing clients to break out of the given directory.

  • EASY_BASE: Modifier for BASE which aims to simplify its usage: the resource address given in the response refers to the BASE, not to the actual request URI. It is important to include the trailing slash which is part of BASE in the resource address (e.g. BASE=“/foo/”, PATH=“/var/www/foo/”). beng-proxy applies the URI suffix before handling the HTTP request.

  • REGEX: Reuse a cached response only if the request URI matches the specified regular expression (Perl compatible, anchored). This works only when a BASE was specified. (Since version 1.3.2)

  • INVERSE_REGEX: Don’t apply the cached response if the request URI matches the specified regular expression (Perl compatible, anchored). (Since version 1.3.2)

  • REGEX_TAIL: Apply REGEX and INVERSE_REGEX to the URI suffix following BASE instead of the whole request URI. (Since version 4.0.21)

  • REGEX_RAW: By default, URI paths are normalized when expanding a cached translation response (i.e. multiple consecutive slashes are compressed to one and occurrences of /./ are compressed to /). This option disables the URI path normalization.

  • REGEX_UNESCAPE: Unescape the URI for REGEX.

  • INVERSE_REGEX_UNESCAPE: Unescape the URI for INVERSE_REGEX.

  • REGEX_ON_HOST_URI: Prepend the Host header to the string used with REGEX and INVERSE_REGEX.

  • REGEX_ON_USER_URI: Prepend the user name (from USER) and a ’@’ to the string used with REGEX and INVERSE_REGEX.

  • LAYOUT: The translation server gives an overview of the URI layout. Its payload is a non-empty opaque value which is mirrored in the next request.

    This packet is followed by one or more URI / BASE / REGEX packets specifying exact URI matches, URI bases or regular expressions which shall not share cache items. The first matching base/regex specifies where translation cache items will be stored; all URIs without a match have their own cache.

    This way, cacheable URI bases can be constructed easily without excessively complex INVERSE_REGEX packets.

    Example for a response after a request to /.cm4all/foo:

    • BASE=/

    • LAYOUT=[opaque]

    • URI=/robots.txt

    • BASE=/.cm4all/private/

    • BASE=/.cm4all/

    • BASE=/.well-known/

    • REGEX=\.php$

    Here, the whole host is separated into the specified items (one exact URI, three bases and one regular expression) and everything else. Responses don’t need INVERSE_REGEX to exclude the specified bases.

    The following request will mirror the LAYOUT packet and the matching URI / BASE / REGEX packet:

    • URI=/.cm4all/foo

    • LAYOUT=[opaque]

    • BASE=/.cm4all/

    The server recognizes that this is a follow-up request, and responds:

    • BASE=/.cm4all/

    • EASY_BASE

    • PATH=/var/www/cm4all/

    This response can be cached and reused for everything below /.cm4all/, except for URIs below /.cm4all/private/.

    If LAYOUT is followed by REGEX_TAIL, then all regular expressions (and other URI comparisons) are matched against the tail of the URI after the given BASE. Example LAYOUT response:

    • BASE=/foo/

    • LAYOUT=[opaque]

    • REGEX_TAIL

    • URI=hello.txt

    • BASE=bar/

    • REGEX=\.php$

    In the follow-up request, these are mirrored; for example, after a request to /foo/hello.txt, the next translation request looks like this:

    • URI=/foo/hello.txt

    • LAYOUT=[opaque]

    • URI=hello.txt

    Note how there are now two URI packets: the first one is the actual request URI and the second one mirrors the matching LAYOUT item.

    As a shortcut for implementing CORS, a layout item may be followed by ACCESS_CONTROL_ALLOW_ALL. All matching OPTIONS requests will then lead to an empty response with Access-Control-Allow-{Origin,Methods,Headers}: *. Use this for API endpoints with unrestricted script access to avoid roundtrips to the actual API process.
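The way a cached BASE response is reused for other URIs can be sketched as follows. This is a simplified model, not beng-proxy's cache code: the function names are hypothetical, and the ".." check is only a stand-in for the security checks that UNSAFE_BASE turns off.

```python
def address_base(base, request_uri, address, easy_base=False):
    """Return the resource address that corresponds to BASE itself."""
    if easy_base:
        return address  # EASY_BASE: the address already refers to BASE
    if not request_uri.startswith(base):
        raise ValueError("request URI is not below BASE")
    tail = request_uri[len(base):]
    if not address.endswith(tail):
        raise ValueError("address does not end with the URI tail")
    # Strip the URI tail to obtain the address of the BASE.
    return address[:len(address) - len(tail)]


def expand(base, base_address, uri):
    """Map another URI below BASE to its resource address, or None."""
    if not uri.startswith(base):
        return None  # outside of this BASE; ask the translation server
    tail = uri[len(base):]
    if ".." in tail.split("/"):
        return None  # refuse attempts to break out of the directory
    return base_address + tail
```

Using the values from the BASE example: address_base("/foo/", "/foo/bar/index.html", "/var/www/foo/bar/index.html") gives "/var/www/foo/", and expand("/foo/", "/var/www/foo/", "/foo/test.png") gives "/var/www/foo/test.png" without another translation request.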

Static files

See Static files for an explanation of static file resources.

The response packet PATH declares a static file that will be served. The following packets are available:

  • PATH: Absolute path of the local file to be served.

  • EXPAND_PATH: Override the path with the given value (after expanding); see Response.

  • APPEND_PATH: Append this string to the PATH (after applying BASE or EXPAND_PATH).

  • AUTO_BROTLI_PATH: Build the precompressed Brotli path by appending .br to the PATH.

  • GZIPPED: Absolute path of a precompressed version of the file. The file is compressed with gzip. May follow the PATH packet.

  • AUTO_GZIPPED: Build the precompressed path by appending “.gz” to the PATH. Unlike GZIPPED, this is compatible with BASE.

  • AUTO_GZIP: Compress the response on-the-fly if the client accepts the gzip encoding. This consumes a lot of CPU and should only be used for dynamic responses which can be compressed well.

  • AUTO_BROTLI: Compress the response on-the-fly if the client accepts the br encoding. This consumes a lot of CPU and should only be used for dynamic responses which can be compressed well.

  • AUTO_COMPRESS_ONLY_TEXT: apply AUTO_GZIP and AUTO_BROTLI only to text responses.

  • CONTENT_TYPE: MIME type of the file (optional)

  • EXPIRES_RELATIVE: Generate an Expires response header. The payload is a 32 bit integer specifying the number of seconds from now.

  • EXPIRES_RELATIVE_WITH_QUERY: Like EXPIRES_RELATIVE, but this value is only used if there is a non-empty query string. This is useful for serving static files which are usually referenced with a version number in the query string.

  • BENEATH: Absolute path of a directory that the PATH shall not escape, not even using symlinks. This is implemented using the RESOLVE_BENEATH flag of Linux’s openat2() system call.
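How EXPIRES_RELATIVE and EXPIRES_RELATIVE_WITH_QUERY might translate into an Expires header is sketched below. The function name is hypothetical, and the precedence rule (the WITH_QUERY value wins when both packets are present and the query string is non-empty) is an assumption drawn from the descriptions above.

```python
from email.utils import formatdate


def expires_header(now, expires_relative=None,
                   expires_relative_with_query=None, query_string=""):
    """Return the Expires header value, or None if no packet applies.

    `now` is a UNIX timestamp; the two packet values are seconds from now.
    """
    seconds = expires_relative
    if query_string and expires_relative_with_query is not None:
        # Assumption: the WITH_QUERY value takes precedence here.
        seconds = expires_relative_with_query
    if seconds is None:
        return None
    # Format as an HTTP date in GMT.
    return formatdate(now + seconds, usegmt=True)
```

A versioned static file such as /app.js?v=2 could then receive a long lifetime via EXPIRES_RELATIVE_WITH_QUERY, while the same file requested without a query string keeps the shorter EXPIRES_RELATIVE value.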

Proxying requests

When proxying HTTP requests with an HTTP packet, beng-proxy forwards the request to the specified location (with headers filtered as described in Forwarding HTTP Headers), including the HTTP method and the request body. There is one exception: if PROCESS is enabled and a widget is focused (see Focus), the other HTTP server receives a GET request without a body, because the focused widget is going to receive the request body.

If the filter URL starts with a slash, beng-proxy assumes it is the absolute path to a Unix socket.

CGI, FastCGI, WAS and Pipe

The protocols CGI, FastCGI and WAS can be used to generate or filter resources (see CGI and FastCGI and WAS). A “pipe” can be used as a filter (see Pipe filters). The following packets are used to choose the protocol:

  • CGI: a local path which is executed as CGI script

  • FASTCGI: a local path which is executed as FastCGI script. To connect to an existing FastCGI server, specify one or more ADDRESS packets.

  • WAS: a local path which is executed as WAS application

  • PIPE: a local program which reads input from stdin and prints the modified resource on stdout

The following packets can be used to specify more details:

  • EXPAND_PATH: Override the executable path with the given value (after expanding); see Response.

  • APPEND: appends an argument to the command line

  • EXPAND_APPEND: provide a cache expansion for the preceding APPEND

  • PAIR: adds a FastCGI/WAS parameter in the form KEY=VALUE.

  • EXPAND_PAIR: provide a cache expansion for the preceding PAIR

  • SETENV: adds an environment variable for CGI, FastCGI, WAS or LHTTP in the form KEY=VALUE.

  • EXPAND_SETENV: provide a cache expansion for the preceding SETENV

  • PATH_INFO: optional URI substring which was left after finding the file

  • EXPAND_PATH_INFO: Override the PATH_INFO with the given value. Backslash references are expanded to the value of the match group of REGEX. In the presence of this packet, the URI suffix after the base will not be appended to other paths. (Since version 2.0.4)

  • DOCUMENT_ROOT: set the document root passed to this CGI process

  • EXPAND_DOCUMENT_ROOT: Override the DOCUMENT_ROOT with the given value. Backslash references are expanded to the value of the match group of REGEX. (Since version 6.0)

  • INTERPRETER: run a CGI script with the specified interpreter: invokes the specified interpreter with the mapped file path added as a command-line argument. This can be used to run Perl scripts without setting the “execute” bit.

  • ACTION: run the specified CGI program instead of the mapped file. This program reads the mapped file path from SCRIPT_FILENAME and loads this script. This is modeled after the Apache directive Action, and implements a protocol understood by PHP and COMA.

  • SCRIPT_NAME: the SCRIPT_NAME environment variable for a CGI

  • EXPAND_SCRIPT_NAME: Override the SCRIPT_NAME with the given value. Backslash references are expanded to the value of the match group of REGEX. (Since version 4.0.33)

  • AUTO_BASE: Auto-calculate the BASE from PATH_INFO (only CGI, FastCGI and WAS)

  • REQUEST_URI_VERBATIM: Pass the CGI parameter REQUEST_URI verbatim instead of building it from SCRIPT_NAME, PATH_INFO and QUERY_STRING. (Since version 16.29)

See Resource Limits for how to configure resource limits and Namespaces for how to configure namespaces.

Local HTTP

  • APPEND: appends an argument to the command line

  • EXPAND_APPEND: provide a cache expansion for the preceding APPEND

See Resource Limits for how to configure resource limits and Namespaces for how to configure namespaces.

Forwarding HTTP Headers

There are two translation packets which control which HTTP headers are going to be forwarded:

  • REQUEST_HEADER_FORWARD: this packet specifies which request headers are forwarded to the request handler. The payload is a list of group/mode pairs (struct beng_header_forward_packet).

  • RESPONSE_HEADER_FORWARD: same as REQUEST_HEADER_FORWARD, but applies to response headers forwarded to the client.

Group is one of:

  • IDENTITY: headers Via, X-Forwarded-For, X-CM4all-Generator

  • CAPABILITIES: Server, User-Agent, Accept-*

  • COOKIE: Cookie[2], Set-Cookie[2]

  • FORWARD: forward information about the original request/response that would usually not be visible. If set to MANGLE, then Host is translated to X-Forwarded-Host.

  • CORS: forward CORS request/response headers

  • SECURE: forward “secure” request/response headers such as X-CM4all-BENG-User

  • SSL: forward information about the SSL connection, i.e. X-CM4all-HTTPS (set to on if the request was received on a SSL/TLS connection, see SSL/TLS), X-CM4all-BENG-Peer-Subject and X-CM4all-BENG-Peer-Issuer-Subject (see Client Certificates)

  • TRANSFORMATION: forward headers that affect the transformation (i.e. X-CM4all-View)

  • LINK: forward headers that contain links, such as Location, Content-Location and Referer; if set to MANGLE, then beng-proxy attempts to rewrite the Location URI relative to itself

  • AUTH: forward HTTP authentication headers (e.g. basic/digest auth), such as WWW-Authenticate, Authentication-Info and Authorization; if set to MANGLE, then beng-proxy allows the translation server to handle HTTP authentication. The default is NO for request headers and MANGLE for response headers.

    MANGLE on the request header settings generates an Authorization request header containing bearer USER, where USER is the current user as specified by the USER translation response packet. This can be used for servers which do not understand the X-CM4all-BENG-User request header (from header group SECURE).

  • OTHER: other end-to-end headers not explicitly mentioned here

  • ALL: all of the above except for SECURE, SSL and AUTH

Mode is one of:

  • NO: don’t forward the headers

  • YES: forward the headers

  • MANGLE: beng-proxy processes the headers

  • BOTH: both beng-proxy and the backend server process the headers (special case for cookie headers, which is a combination of YES and MANGLE)

beng-proxy’s session management is only active when COOKIE is MANGLE (which is the default) or BOTH. The behavior of the COOKIE setting on widgets is undefined.
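
The way group/mode pairs combine, in particular the ALL group, can be sketched in C as follows. This is only an illustration: the enum values, the struct names and the rule that pairs are applied in the order given are assumptions of this sketch, not the layout of struct beng_header_forward_packet.

```c
#include <stddef.h>

/* Hypothetical numbering; the real group and mode values are defined
   by the protocol headers and are not listed in this text. */
enum header_group {
    GROUP_ALL = -1,
    GROUP_IDENTITY,
    GROUP_CAPABILITIES,
    GROUP_COOKIE,
    GROUP_FORWARD,
    GROUP_CORS,
    GROUP_SECURE,
    GROUP_SSL,
    GROUP_TRANSFORMATION,
    GROUP_LINK,
    GROUP_AUTH,
    GROUP_OTHER,
    GROUP_MAX
};

enum header_mode { MODE_NO, MODE_YES, MODE_MANGLE, MODE_BOTH };

struct header_forward_pair {
    enum header_group group;
    enum header_mode mode;
};

struct header_forward_settings {
    enum header_mode modes[GROUP_MAX];
};

/* Apply the pairs in order (an assumption of this sketch).  ALL covers
   every group except SECURE, SSL and AUTH, as described above. */
static void
apply_header_forward(struct header_forward_settings *settings,
                     const struct header_forward_pair *pairs, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        const int group = pairs[i].group;
        if (group == GROUP_ALL) {
            for (int g = 0; g < GROUP_MAX; ++g)
                if (g != GROUP_SECURE && g != GROUP_SSL && g != GROUP_AUTH)
                    settings->modes[g] = pairs[i].mode;
        } else if (group >= 0 && group < GROUP_MAX) {
            settings->modes[group] = pairs[i].mode;
        }
    }
}
```

With this model, a payload of ALL=YES followed by COOKIE=MANGLE forwards most headers verbatim, keeps session management active, and leaves SECURE, SSL and AUTH at their previous settings.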

Resource Limits

The packet RLIMITS specifies Linux resource limits for child processes. Its payload is a string, a sequence of resource limit codes and their respective limit values. The following resource limits are supported:

  • t (CPU): CPU time limit in seconds.

  • f (FSIZE): The maximum size of files that the process may create.

  • d (DATA): The maximum size of the process’s data segment.

  • s (STACK): The maximum size of the process stack, in bytes.

  • c (CORE): Maximum size of core file.

  • m (RSS): The limit of the process’s resident set, in pages.

  • u (NPROC): The maximum number of processes that can be created for the real user ID.

  • n (NOFILE): The maximum file descriptor number that can be opened by this process.

  • l (MEMLOCK): The maximum number of bytes of memory that may be locked into RAM.

  • v (AS): The maximum size of the process’s virtual memory (address space) in bytes.

  • i (SIGPENDING): The maximum number of signals that may be queued.

  • q (MSGQUEUE): The maximum number of bytes that can be allocated for POSIX message queues.

  • e (NICE): A ceiling to which the process’s nice value can be raised.

  • r (RTPRIO): Ceiling on the real-time priority that may be set for this process.

The letter at the start of each entry is the code for the payload; it is followed by ’!’ (for “unlimited”) or by the numeric limit value, which may carry a suffix “K”, “M” or “G” (for “kibi”, “mebi”, “gibi”), as in v1G.

The limits are applied to both “soft” and “hard” by default. The code S changes all following specifications to “soft” only, and H does the same for “hard”.

Example:

c!Sv1Gn256Hn512

Explanation:

  • c!: unlimited core file size (both soft and hard)

  • S: the following will be soft limits

  • v1G: limit address space to 1 GiB (soft; the hard limit is unchanged)

  • n256: maximum 256 file descriptors (soft)

  • H: the following will be hard limits

  • n512: maximum 512 file descriptors (hard)
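
The payload syntax can be made concrete with a small parser. The following C sketch only illustrates the rules described in this section and is not beng-proxy's actual implementation; the struct and function names are hypothetical, the unit letter is read after the number (as in the example v1G), and overflow checking is omitted.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Resource codes in the order listed above; the index into this string
   is used as the index into the result table. */
static const char rlimit_codes[] = "tfdscmunlviqer";
enum { N_LIMITS = sizeof(rlimit_codes) - 1 };

#define LIMIT_UNLIMITED UINT64_MAX

struct limit_value { bool set; uint64_t value; };
struct limit_pair { struct limit_value soft, hard; };

/* Parse an RLIMITS payload (as a C string); false on syntax error. */
static bool
parse_rlimits(const char *s, struct limit_pair limits[N_LIMITS])
{
    bool soft = true, hard = true; /* default: both soft and hard */
    memset(limits, 0, N_LIMITS * sizeof(limits[0]));

    while (*s != '\0') {
        const char code = *s++;
        if (code == 'S') { soft = true; hard = false; continue; }
        if (code == 'H') { soft = false; hard = true; continue; }

        const char *p = strchr(rlimit_codes, code);
        if (p == NULL)
            return false; /* unknown resource code */

        uint64_t value;
        if (*s == '!') {
            value = LIMIT_UNLIMITED;
            ++s;
        } else {
            if (!isdigit((unsigned char)*s))
                return false;
            value = 0;
            while (isdigit((unsigned char)*s))
                value = value * 10 + (uint64_t)(*s++ - '0');
            switch (*s) {
            case 'K': value <<= 10; ++s; break;
            case 'M': value <<= 20; ++s; break;
            case 'G': value <<= 30; ++s; break;
            }
        }

        struct limit_pair *l = &limits[p - rlimit_codes];
        if (soft) { l->soft.set = true; l->soft.value = value; }
        if (hard) { l->hard.set = true; l->hard.value = value; }
    }
    return true;
}
```

Applied to the example payload, this leaves the entry for c unlimited in both columns, sets only the soft value of v, and gives n a soft value of 256 and a hard value of 512.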

Namespaces

Child processes such as FastCGI programs can run in separate Linux namespaces to improve separation from the rest of the server. That requires a fairly new Linux kernel.

Articles on http://lwn.net/ on Linux namespaces:

User Namespaces

The translation packet USER_NAMESPACE launches the process in a new user namespace. This creates a new mapping for user ids inside this namespace. More importantly, this gives the process a full set of capabilities. This is a precondition for some of the other namespaces.

Requires Linux 3.8 or newer.

PID Namespaces

The translation packet PID_NAMESPACE launches the process in a new PID namespace. This creates a new mapping for process ids inside this namespace. Only processes in this namespace are visible and only these can be killed.

The translation packet PID_NAMESPACE_NAME reassociates the process with an existing PID namespace, selected by its name (in the payload). This requires the cm4all-spawn daemon, which manages PID namespaces.

By default, other processes are actually still visible through /proc. For complete PID namespace support, one would need to mount a new proc filesystem connected to the new namespace.

Requires Linux 3.8 or newer.

Cgroup Namespaces

The translation packet CGROUP_NAMESPACE launches the process in a new cgroup namespace.

Requires Linux 4.6 or newer.

Network Namespaces

The translation packet NETWORK_NAMESPACE launches the process in a new network namespace. Without further configuration, this leaves the process without access to the network, because there is no network device in the new namespace.

The packet NETWORK_NAMESPACE_NAME instead reassociates the process with an existing network namespace configured with ip netns.

Requires Linux 2.6.29 or newer.

Mount Namespaces

A mount namespace makes the VFS mount table private to the new process. This namespace is created implicitly by the packets described in this section.

  • PIVOT_ROOT works like the chroot command; its payload specifies the directory which will be the new root. All other mounts will be removed from the namespace. The new root must contain a top-level directory called mnt. It will be mounted read-only and with option nosuid.

  • CHROOT is plain old chroot(). Can be combined with PIVOT_ROOT; and unlike that command, it does not need a top-level mnt directory.

  • MOUNT_ROOT_TMPFS creates an empty read-only tmpfs as the filesystem root. All required mountpoints will be created, but the filesystem will contain nothing else.

  • TMPFS_DIRS_READABLE: Make all directories created in tmpfs (MOUNT_ROOT_TMPFS, MOUNT_EMPTY) readable. By default, such directories are only “executable”, but not “readable”.

  • MOUNT_PROC mounts a new read-only instance of the proc filesystem.

  • MOUNT_DEV mounts a minimalistic /dev.

  • MOUNT_HOME bind-mounts the home directory (specified by HOME) to the given directory within the PIVOT_ROOT. It will be mounted with option nosuid.

  • MOUNT_TMP_TMPFS mounts a new tmpfs on /tmp. This is private to the namespace and is deleted when the process exits. The payload may specify additional tmpfs mount options such as size=64M.

    By default, code execution from this filesystem is disabled via MS_NOEXEC. A follow-up MOUNT_TMP_TMPFS_EXEC packet disables this behavior, i.e. allows executing code from this tmpfs.

  • MOUNT_TMPFS mounts a new (user-writable) tmpfs on the given path. This is private to the namespace and is deleted when the process exits.

  • MOUNT_NAMED_TMPFS mounts a new (user-writable) tmpfs on the given path that can be shared across processes. The payload is the name of the tmpfs source directory and the target directory (absolute path within the new root), separated by a null byte. The tmpfs will be deleted if it is not used for a certain amount of time.

  • MOUNT_EMPTY mounts a new (read-only) tmpfs on the given path. Inside this filesystem, mount points will be created automatically. Other than that, it can be used to hide parts of an existing filesystem.

  • BIND_MOUNT mounts arbitrary directories from the old root into the new root. The payload is the source directory and the target directory (absolute path within the new root), separated by a null byte. The new mount will have the options ro,noexec,nosuid,nodev.

    The source directory is an absolute path on the host. If it is prefixed with container:, it is relative to the new mount namespace, i.e. the container. The prefix host: is the same as no prefix.

    This (and all variants of this packet) may be followed by an empty OPTIONAL packet: if the source directory does not exist, this directive is ignored silently.

  • EXPAND_BIND_MOUNT is the same as BIND_MOUNT, but the source directory is expanded using REGEX results.

  • BIND_MOUNT_RW and EXPAND_BIND_MOUNT_RW do the same, just in writable mode (mount option rw). BIND_MOUNT_EXEC and EXPAND_BIND_MOUNT_EXEC omit the noexec option.

    BIND_MOUNT_RW_EXEC makes the mount both writable and executable.

  • BIND_MOUNT_FILE mounts a (read-only, non-executable) regular file onto an existing regular file. The payload is the source path (absolute within the old root) and the target path (absolute within the new root), separated by a null byte.

    BIND_MOUNT_FILE_EXEC omits the noexec option.

  • MOUNT_LISTEN_STREAM creates a stream listener socket and mounts it at the specified path into the container. Once the first process connects to this socket, beng-proxy sends a request to the translation server echoing just this packet; its response may contain one of:

    • STATUS: an error condition.

    • EXECUTE: a process to be spawned which starts with the listener socket on stdin.

    • ACCEPT_HTTP: create a transient HTTP listener which receives HTTP requests from the child process; a LISTENER_TAG packet may be present which will be echoed on all translation requests for this listener. If STATS_TAG is present, it will be used instead of LISTENER_TAG for Prometheus metrics.

    The payload is the socket path inside the new mount namespace. After the socket path, a null byte may follow with opaque data which is ignored by beng-proxy, but which may be evaluated by the translation server.

  • WRITE_FILE writes a small text file in a mount namespace. Payload is the absolute path and the file contents separated by a null byte. The file can either be written to a tmpfs that was already mounted, or bind-mounted over an existing read-only file.

  • SYMLINK: Create a symlink. Payload is target and linkpath separated by a null byte.

  • PIVOT_ROOT depends on user namespaces. MOUNT_PROC, MOUNT_HOME and MOUNT_TMP_TMPFS depend on PIVOT_ROOT, user namespaces and PID namespaces.
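
Several of the packets above (BIND_MOUNT, MOUNT_NAMED_TMPFS, BIND_MOUNT_FILE, WRITE_FILE, SYMLINK) carry two values separated by a null byte. The following C sketch shows one way to split a BIND_MOUNT payload and to recognize the container: and host: prefixes. The struct and function names are hypothetical, and the validation rules (non-empty source, absolute target) are assumptions of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct bind_mount {
    const char *source;       /* points into the payload, prefix removed */
    size_t source_length;
    const char *target;       /* points into the payload */
    size_t target_length;
    bool source_in_container; /* the "container:" prefix was present */
};

static bool
has_prefix(const char *s, size_t length, const char *prefix)
{
    const size_t n = strlen(prefix);
    return length >= n && memcmp(s, prefix, n) == 0;
}

/* Split "source\0target".  The payload is not null-terminated as a
   whole, so both parts are returned with explicit lengths. */
static bool
parse_bind_mount(const char *payload, size_t length, struct bind_mount *m)
{
    const char *separator = memchr(payload, '\0', length);
    if (separator == NULL)
        return false;

    m->source = payload;
    m->source_length = (size_t)(separator - payload);
    m->target = separator + 1;
    m->target_length = length - m->source_length - 1;
    m->source_in_container = false;

    size_t skip = 0;
    if (has_prefix(m->source, m->source_length, "container:")) {
        skip = strlen("container:");
        m->source_in_container = true;
    } else if (has_prefix(m->source, m->source_length, "host:")) {
        skip = strlen("host:"); /* same meaning as no prefix */
    }
    m->source += skip;
    m->source_length -= skip;

    return m->source_length > 0 &&
        m->target_length > 0 && m->target[0] == '/';
}
```

The same splitting step applies to the other two-part payloads; only the interpretation of the two halves differs.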

UTS Namespaces

A UTS namespace allows manipulating the host name reported by the kernel. UTS_NAMESPACE creates the namespace; its payload is the new host name.

Namespaces Summary

The following example shows part of a translation response that attempts to execute a child process as securely as possible:

USER_NAMESPACE
PID_NAMESPACE
NETWORK_NAMESPACE
PIVOT_ROOT "/var/lib/lxc/wheezy/rootfs"
HOME "/var/www/foo"
MOUNT_HOME "/home/www"

The child process cannot see or kill processes other than the ones that were started by itself. It cannot access the network. It lives in another filesystem namespace. It can access the directory /var/www/foo at /home/www. The proc filesystem is not mounted.

Cgroups

Control cgroups (“cgroups”) are a Linux kernel feature for grouping processes. They are useful in many ways, such as assigning/accounting resources (CPU, memory, network bandwidth, …).

beng-proxy can use cgroups only when launched with systemd.

CGROUP specifies a cgroup name for the new child process. It is a name below beng-proxy’s own cgroup assigned by systemd. All controllers managed by systemd are enabled.

CGROUP_SET sets a cgroup attribute. Payload is in the form controller.name=value, e.g. cpu.shares=42.

CGROUP_XATTR sets an extended attribute on the cgroup directory. Payload is in the form namespace.name=value, e.g. user.account_id=42.
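
A receiver has to split such a payload into the attribute name and its value. The C sketch below shows one possible way for CGROUP_SET; the names are hypothetical, and rejecting names that contain a slash is an extra precaution of this sketch rather than documented behavior.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct cgroup_set {
    const char *name;  /* "controller.name", points into the payload */
    size_t name_length;
    const char *value; /* points into the payload */
    size_t value_length;
};

/* Split "controller.name=value" at the first '='. */
static bool
parse_cgroup_set(const char *payload, size_t length, struct cgroup_set *dest)
{
    const char *eq = memchr(payload, '=', length);
    if (eq == NULL)
        return false;

    const size_t name_length = (size_t)(eq - payload);

    /* the name needs a non-empty controller and a non-empty attribute */
    const char *dot = memchr(payload, '.', name_length);
    if (dot == NULL || dot == payload || dot + 1 == eq)
        return false;

    /* precaution: the name must not contain a path separator */
    if (memchr(payload, '/', name_length) != NULL)
        return false;

    dest->name = payload;
    dest->name_length = name_length;
    dest->value = eq + 1;
    dest->value_length = length - name_length - 1;
    return true;
}
```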

Other Child Process Options

  • UID_GID specifies (effective) uid and gid (and supplementary groups) for the child process. Payload is an array of 32 bit integers. All selected users and groups must be explicitly allowed with the user and group settings in the spawn configuration. The default is to run child processes with the same unprivileged credentials as beng-proxy itself (or the one specified with --spawn-user).

  • MAPPED_UID_GID is like UID_GID, but these are the numbers visible inside the user namespace. Currently, only the uid is implemented, therefore the payload must be a 32-bit integer.

  • REAL_UID_GID specifies the real uid and gid for the child process. Payload is either one or two 32 bit integers. Defaults to the UID_GID value.

    This feature works only if https://lore.kernel.org/linux-security-module/20250306082615.174777-1-max.kellermann@ionos.com/ is applied. Without it, the kernel will revert the euid on execve().

  • MAPPED_REAL_UID_GID adds user namespace mappings for REAL_UID_GID. Currently, only the uid is implemented, therefore the payload must be a 32-bit integer.

  • CAP_SYS_RESOURCE grants the new child process the CAP_SYS_RESOURCE capability, allowing it to ignore filesystem quotas. It is not possible to use it together with user namespaces (USER_NAMESPACE).

  • NO_NEW_PRIVS permanently disables new privileges for the child process. That is, setuid and setgid bits are ignored on executed programs. It is recommended to set this flag on all processes by default, unless there are strong reasons against it.

  • FORBID_USER_NS forbids the child process from creating new user namespaces and thus from gaining a full set of capabilities. This is useful because there have been lots of namespace-related vulnerabilities in the kernel.

  • FORBID_MULTICAST forbids the child process from adding multicast group memberships. This is useful because it disallows snooping on the host’s multicast traffic.

  • FORBID_BIND makes bind() and listen() return EACCES.

  • ALLOW_PTRACE allows the child process to use the ptrace() and similar system calls which are disallowed by default.

  • STDERR_PATH specifies an absolute path that will be created. The child’s error messages will be appended there. STDERR_NULL redirects standard error to /dev/null instead.

  • STDERR_POND enables the child_error_logger when it was disabled with is_default="no" (see Child Error Logger).

  • CHILD_TAG specifies a “tag” string for the child process. This can be used to address groups of child processes (e.g. for FADE_CHILDREN). A child process may have more than one tag.
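
The UID_GID payload is an array of 32 bit integers. A decoder could look like the following C sketch, which assumes the order uid, gid, then supplementary groups, and which requires at least uid and gid to be present; both points, the names and the MAX_GROUPS limit are assumptions of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { MAX_GROUPS = 32 }; /* arbitrary limit chosen for this sketch */

struct uid_gid {
    uint32_t uid, gid;
    uint32_t groups[MAX_GROUPS]; /* supplementary groups */
    size_t n_groups;
};

/* Decode a UID_GID payload (host byte order, like the rest of the
   protocol).  Assumed order: uid, gid, supplementary groups. */
static bool
parse_uid_gid(const void *payload, size_t length, struct uid_gid *dest)
{
    if (length % sizeof(uint32_t) != 0)
        return false;

    const size_t n = length / sizeof(uint32_t);
    if (n < 2 || n - 2 > MAX_GROUPS)
        return false;

    /* copy first, because the payload may not be aligned */
    uint32_t values[2 + MAX_GROUPS];
    memcpy(values, payload, length);

    dest->uid = values[0];
    dest->gid = values[1];
    dest->n_groups = n - 2;
    for (size_t i = 0; i < dest->n_groups; ++i)
        dest->groups[i] = values[2 + i];
    return true;
}
```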

Filters

The translation server can tell beng-proxy to apply a filter to the resource by sending the FILTER command. It is followed by a packet specifying the filter server (HTTP, CGI, FASTCGI, PIPE).

A filter server is a HTTP server. beng-proxy sends the original resource with a POST request and expects the filtered resource as response.

If the filter returns status 200 OK or 204 No Content, then the previous status code is used instead.

It is important that a filter is completely stateless. Running the same filter twice on the same source must always render the same result, at any time.

There may be more than one filter; the order of the PROCESS and FILTER packets is important.

According to the HTTP specification, POST requests are not cached. To gain the necessary performance, beng-proxy caches filter results, extending the HTTP specification. This is limited to resources which have an ETag response header, because beng-proxy uses the ETag internally to address cache items.

Chains

Chained request handlers behave similarly to FILTER: the current handler’s response is passed to the next handler as a POST request. But unlike FILTER, beng-proxy waits for the current handler to generate the response, and only then asks the translation server for further instructions. This is useful in situations where one handler prepares something which the translation server needs to decide about the next stage.

To enable chaining, the translation server sends a response specifying the request handler plus a CHAIN packet with opaque payload. Once that request handler has generated the response, beng-proxy sends another translation request containing a copy of the CHAIN packet and a STATUS packet. Additionally, the CHAIN_HEADER packet may contain the value of the X-CM4all-Chain response header, if one exists in the current HTTP response.

Now the translation server generates another request handler, or BREAK_CHAIN to send the pending response to the browser as-is.

Example:

request 1:
 URI "/chain/"
 HOST "example.com"
 ...

response 1:
 HTTP "http://foo/bar/"
 CHAIN "42"

request 2:
 CHAIN "42"
 CHAIN_HEADER "xyz"
 STATUS "200"

response 2:
 WAS "/the/filter/program"

If the response packet CHAIN is followed by an empty TRANSPARENT_CHAIN packet, the chain handler will only see a GET request without a body, and the original request method/body will be sent to the following request handler. In that case, the chain handler’s response body will be ignored.

Sessions

beng-proxy lets the translation server manage a “session” variable, which may be empty, or contain an opaque string. It is up to the translation server to manage its contents. With every translation request, beng-proxy sends its contents unless it is empty (in which case it omits this parameter). With every response, the translation server may provide a new value (which may be empty).

Additionally, the REALM_SESSION packet may contain a value that is specific to the session realm. It is only sent to the translation server in TOKEN_AUTH requests.

External Session Manager

Sometimes, the translation server involves an external entity in its session management, for example to handle authentication. The translation server can then ask beng-proxy to handle refreshes by sending a GET to a specified HTTP server.

The packet EXTERNAL_SESSION_MANAGER contains the HTTP URL, and must be followed by one or more ADDRESS packets (just like the HTTP packet). After that, the packet EXTERNAL_SESSION_KEEPALIVE may contain a 16 bit integer specifying the refresh interval in seconds.

The refresh is performed only while handling a request for this session.

Example:

EXTERNAL_SESSION_MANAGER=http://foo/session/42
ADDRESS=192.168.1.100:80
EXTERNAL_SESSION_KEEPALIVE=300

This example sends a GET request every 5 minutes to http://foo/session/42 on IP address 192.168.1.100.

Content-Type Lookup

The presence of CONTENT_TYPE_LOOKUP in a translation response indicates that the translation server is willing to look up Content-Type by file name suffix. It will disable the normal lookup via extended attributes.

When a HTTP request for a static file is handled, beng-proxy will check if the file name has a “suffix” (short alphanumeric name after a dot). If so, it will ask the translation server for a Content-Type for this suffix. This translation request contains the packets CONTENT_TYPE_LOOKUP (echoing the server’s packet) and SUFFIX (containing the non-empty suffix without the dot).

Example conversation:

  • client sends BEGIN “\x03”

  • client sends CONTENT_TYPE_LOOKUP “foo”

  • client sends SUFFIX “png”

  • client sends END

  • server sends BEGIN “\x03”

  • server sends CONTENT_TYPE “image/png”

  • server sends END

If the suffix is unknown, the translation server may omit the CONTENT_TYPE packet and only reply with BEGIN and END.
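
The suffix check described above might look like the following C sketch. It is not beng-proxy's actual code: the function name is hypothetical, and because the text only says “short”, the limit MAX_SUFFIX_LENGTH is an assumption.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum { MAX_SUFFIX_LENGTH = 5 }; /* assumed meaning of "short" */

/* Return a pointer to the suffix (without the dot) inside the given
   path, or NULL if the file name has no usable suffix. */
static const char *
find_suffix(const char *path)
{
    /* only look at the last path segment */
    const char *slash = strrchr(path, '/');
    const char *name = slash != NULL ? slash + 1 : path;

    const char *dot = strrchr(name, '.');
    if (dot == NULL)
        return NULL;

    const char *suffix = dot + 1;
    const size_t length = strlen(suffix);
    if (length == 0 || length > MAX_SUFFIX_LENGTH)
        return NULL;

    for (size_t i = 0; i < length; ++i)
        if (!isalnum((unsigned char)suffix[i]))
            return NULL;

    return suffix;
}
```

For /var/www/logo.png this yields png, which would become the payload of the SUFFIX packet; a file without a suffix triggers no lookup.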

AUTO_GZIPPED and AUTO_BROTLI_PATH may be specified if this file type is likely to have a precompressed file in the same directory.

Additionally, the translation server may specify transformations (PROCESS or FILTER) for all files of this type. They will be applied before other transformations from the original translation response.

Error documents

Errors from remote servers are forwarded to the client. If no error document is available, beng-proxy generates a simple one.

The translation server indicates that it is willing to override the error document by sending an empty ERROR_DOCUMENT packet in the translation response. As soon as an error occurs (response status 400..599), beng-proxy sends another translation request, consisting of ERROR_DOCUMENT, URI and STATUS. The payload of ERROR_DOCUMENT is opaque to beng-proxy, and will be echoed.

The translation server responds with a pointer to another resource which shall be used as the error document. If the translation response is empty, or if the error document itself fails, beng-proxy forwards the original error document (or generates one). The error document cannot be filtered or processed.

CSRF Protection

To help applications fix cross-site request forgery vulnerabilities, beng-proxy implements the X-CM4all-CSRF-Token header. This feature needs to be enabled explicitly with the following packets:

  • REQUIRE_CSRF_TOKEN requires a valid token request header for modifying requests (POST, PUT etc.). This option is not only supported for regular HTTP requests, but also for widgets (for modifying requests to widgets).

    This requirement only applies to requests with a session cookie. Requests without a session are assumed to be harmless, because there is no authenticated identity associated with them.

  • SEND_CSRF_TOKEN adds a valid token header to successful responses. This option is not supported for widgets.

Covert cross-site requests which don’t have this header (with a valid value) will be denied with status 403 Forbidden, effectively avoiding this kind of vulnerability.

Clients can obtain a token by inspecting the response header of a request to a location with SEND_CSRF_TOKEN enabled. They may then use this token in subsequent modifying requests to REQUIRE_CSRF_TOKEN locations.

This token is specific to the session and expires after a while (currently an hour). It can be reused until it expires.

Since this is implemented as a header, this cannot be used for plain HTML FORM requests. If the client is a browser, it is necessary to use the XMLHttpRequest or Fetch API which allows sending custom headers.

Widget registry

The translation server provides access to the widget database, where all widget servers are registered. A widget request can use the following packets:

  • WIDGET_TYPE: the name of the widget type

The translation server’s response consists of these packets:

  • STATUS: in case of a lookup error, this packet provides the HTTP status code

  • PATH, CGI, HTTP: choose one of these packets: a static widget (local file path), a local CGI script, or a HTTP server

  • PROCESS: enable the BENG processor

  • UNTRUSTED: sets the externally visible host name for requests which are proxied to this widget. This marks the widget as “untrusted” and disallows any other way of embedding it. This is useful for widget code whose JavaScript must not be executed in the same context as another widget.

  • UNTRUSTED_PREFIX: same as UNTRUSTED, but is a prefix for the request host name. This widget can only be used when the request’s UNTRUSTED packet begins with this prefix. Example: UNTRUSTED_PREFIX="foo" matches a request with UNTRUSTED="foo.example.com", but not UNTRUSTED="foobar.example.com".

  • UNTRUSTED_SITE_SUFFIX: similar to UNTRUSTED_PREFIX, but matches the suffix instead of the prefix. When generating untrusted URIs, the site name is prepended. During verification, the request’s UNTRUSTED value must exactly match this scheme.

  • UNTRUSTED_RAW_SITE_SUFFIX: Like UNTRUSTED_SITE_SUFFIX, but do not insert a dot.

  • DIRECT_ADDRESSING: Enable “direct” URI addressing for this widget. It is used when the widget is requested in a “frame”. It is a simpler scheme that is more natural; relative links can be built without URI rewriting and without the special beng-proxy encoding. In some cases, the processor can therefore be disabled, reducing overhead.

  • STATEFUL: Remember the state of this widget, i.e. path info and query string. It is remembered for GET requests to the widget when it is focused and the XML processor is enabled. POST requests do not update the state because the POST URI may not be valid in a follow-up GET request. AJAX requests on the other hand should not update the state, and they do not because they usually do not use the XML processor, which is only useful for generating the initial HTML page, and not for incremental (AJAX) updates.

  • WIDGET_INFO: Send the request headers X-CM4all-Widget-Id, X-CM4all-Widget-Type and X-CM4all-Widget-Prefix to the widget server. (Since version 1.3.2)

  • LOCAL_URI: The URI of the “local” location of a widget class. This may refer to a location that serves static resources. It is used by the processor for rewriting URIs beginning with @/ (see Static Widget Resources). The payload must end with a slash. beng-proxy does not process this URI. It is going to be evaluated by the browser, and may be absolute. For example, it may refer to a dedicated resource server.

  • DUMP_HEADERS: Enable header dumps for the widget: on a HTTP request, the request and response headers will be logged. Only for debugging purposes.

  • PEEK: Mark this request as a “peek” request, which means the server shall generate the translation response, but shall not account it (e.g. shall not mark a ticket as “consumed”).
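
The host name rules for UNTRUSTED_PREFIX, UNTRUSTED_SITE_SUFFIX and UNTRUSTED_RAW_SITE_SUFFIX can be illustrated with the following C sketch. The function names and the sample host names used with them are hypothetical; the matching rule follows the UNTRUSTED_PREFIX example given above.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* UNTRUSTED_PREFIX: the host must consist of the prefix followed by a
   dot, so "foo" matches "foo.example.com" but not "foobar.example.com". */
static bool
untrusted_prefix_matches(const char *prefix, const char *host)
{
    const size_t n = strlen(prefix);
    return n > 0 && strncmp(host, prefix, n) == 0 && host[n] == '.';
}

/* UNTRUSTED_SITE_SUFFIX: prepend the site name and a dot to the suffix.
   UNTRUSTED_RAW_SITE_SUFFIX (raw=true): the same without the dot.
   Returns false if the result does not fit into the buffer. */
static bool
build_untrusted_host(char *buf, size_t size,
                     const char *site, const char *suffix, bool raw)
{
    const int n = snprintf(buf, size, raw ? "%s%s" : "%s.%s", site, suffix);
    return n >= 0 && (size_t)n < size;
}
```

During verification, the request’s UNTRUSTED value would then be compared for exact equality with the host name built this way.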

Login translation

To support interactive login, the translation server can implement this protocol. It translates a user name to information on how to launch the user’s processes.

The request contains the following packets:

  • LOGIN: Marks this request as a “login” request. No payload.

  • SERVICE: Payload specifies the service that wants to log in. Examples for well-known service names:

    • ssh: Secure Shell. The response describes how to execute commands in a SSH session channel.

    • sftp: SSH File Transfer Protocol, i.e. SSH subsystem sftp.

    • rsync: rsync over SSH. This request is sent by Lukko when it sees a rsync --server command. The response contains an EXECUTE packet with a path to a statically linked rsync executable that will be executed using execveat().

  • LISTENER_TAG: A string which specifies the listener this login was accepted on; this is optional and its configuration is specific to the translation client.

  • USER: Contains the user name specified by the client.

  • PASSWORD: If this packet is present, then the client asks to verify a password (clear-text in the payload). A password mismatch must result in a negative reply.

If the user does not exist, the translation server shall respond with STATUS=404.

A successful response must contain at least HOME and UID_GID:

  • HOME: Path of the user’s home directory.

  • SHELL: An absolute path specifying the user’s shell.

  • UID_GID: Specify uid and gid (and supplementary groups) for the child process. Payload is an array of 32 bit integers.

  • TOKEN: A token to be matched by the OpenSSH configuration file.

  • NO_PASSWORD: If present, then the LOGIN request can be approved without a password. This can happen when the username is a secret token. An optional payload may describe a service-specific limitation, e.g. sftp to limit LOGIN/SERVICE=ssh to SERVICE=sftp.

  • AUTHORIZED_KEYS: The contents of an OpenSSH authorized_keys file.

  • NO_HOME_AUTHORIZED_KEYS: If present, then ~/.ssh/authorized_keys is not used.

  • SERVICE: Begin a new partition of the response for the specified service. The translation server can do this to send an individual response for all supported services in a single response. This is useful if the request was SERVICE=ssh when the client (i.e. the SSH server, i.e. Lukko) doesn’t yet know whether the SSH client will open a shell or a SFTP session. Returning all possible services eliminates further translation requests: the translation server promises that these are the only allowed services (in the context of the SERVICE specified in the request) and all other services shall be denied.

Cron translation

This sub-protocol can tell the cron job execution layer of Workshop how to spawn a child process.

The request contains the following packets:

  • CRON: Marks this request as a “cron” request. The payload is the name of the cron section in Workshop’s configuration file, or none if none was specified there.

  • URI: If the job refers to a URN instead of a command, then this packet is present and contains the URN. A successful response must specify the program to be executed in EXECUTE with command-line arguments in APPEND packets.

  • USER: The account id owning the job.

  • PARAM: An opaque string from the cron job table. Its contents are specific to the translation server; they should be considered user input and should not be trusted. Optional.
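
As a rough sketch, a client could assemble the cron request packets listed above like this. The command codes CMD_CRON, CMD_USER and CMD_PARAM are placeholders, not the real protocol values, and all function and type names are hypothetical. The sketch also assumes that "none" for the CRON payload means an empty payload, and it leaves out the optional URI packet.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Placeholder command codes: NOT the real protocol values. */
enum {
    CMD_BEGIN = 1,
    CMD_END = 2,
    CMD_CRON = 1001,
    CMD_USER = 1002,
    CMD_PARAM = 1003,
};

struct request_buffer {
    uint8_t data[1024];
    size_t used;
};

/* Append one packet: 16 bit length, 16 bit command, then the payload. */
int
append_packet(struct request_buffer *b, uint16_t command,
              const void *payload, size_t length)
{
    assert(b != NULL && b->used <= sizeof(b->data));

    if (length > 0xffff || sizeof(b->data) - b->used < 4 + length)
        return -1;

    const uint16_t length16 = (uint16_t)length;
    uint8_t *p = b->data + b->used;
    memcpy(p, &length16, sizeof(length16));
    memcpy(p + 2, &command, sizeof(command));
    if (length > 0)
        memcpy(p + 4, payload, length);

    b->used += 4 + length;
    return 0;
}

/* Append a packet whose payload is a string without the trailing zero;
   NULL yields an empty payload. */
int
append_string(struct request_buffer *b, uint16_t command, const char *s)
{
    return append_packet(b, command, s, s != NULL ? strlen(s) : 0);
}

/* Build BEGIN, CRON, USER, optional PARAM, END.  `section` may be NULL
   if the configuration file names no cron section. */
int
build_cron_request(struct request_buffer *b, const char *section,
                   const char *account, const char *param)
{
    const uint8_t version = 3; /* protocol version sent in BEGIN */

    b->used = 0;
    if (append_packet(b, CMD_BEGIN, &version, sizeof(version)) < 0 ||
        append_string(b, CMD_CRON, section) < 0 ||
        append_string(b, CMD_USER, account) < 0)
        return -1;

    if (param != NULL && append_string(b, CMD_PARAM, param) < 0)
        return -1;

    return append_packet(b, CMD_END, NULL, 0);
}
```

After a successful call, the first b->used bytes of b->data hold the complete request, ready to be written to the translation server's socket.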

If the account does not exist, the translation server shall respond with STATUS=404.

If no STATUS packet is present, the request is assumed to be successful.

A successful response must contain at least HOME and UID_GID:

  • HOME: Path of the user’s home directory.

  • UID_GID: Specify uid and gid (and supplementary groups) for the child process. Payload is an array of 32 bit integers.

Additional packets may configure resource limits (Resource Limits, Namespaces) and so on (Other Child Process Options).

The client may cache all responses indefinitely.

Execute translation

This sub-protocol is used to query how to spawn a process which was requested to be executed.

The request contains the following packets:

  • EXECUTE: Marks this request as an “execute” request. The payload is a token describing which process shall be executed. This token was provided by an unprivileged process and should not be trusted.

  • PARAM: An opaque parameter with more details about the process. This parameter was provided by an unprivileged process and should not be trusted.

  • SERVICE: Payload specifies the service that wants to execute the process, e.g. workshop.

  • LISTENER_TAG: A tag which was set in the client’s configuration file.

  • PLAN: If this request was triggered by a Workshop plan, then this is its name.

A successful response contains at least EXECUTE with the path of the program to be spawned, plus the usual process parameters.

A failed response contains STATUS and optionally MESSAGE.

  • HOME: Path of the user’s home directory.

  • UID_GID: Specify uid and gid (and supplementary groups) for the child process. Payload is an array of 32 bit integers.
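
The distinction between a successful and a failed "execute" response can be sketched as a small packet walker. The command codes below are placeholders rather than the real protocol values, the names are hypothetical, and treating every STATUS packet as a failure is a simplification made for this sketch. Payload pointers refer into the input buffer and are not zero-terminated.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Placeholder command codes: NOT the real protocol values. */
enum {
    CMD_BEGIN = 1,
    CMD_END = 2,
    CMD_STATUS = 1001,
    CMD_MESSAGE = 1002,
    CMD_EXECUTE = 1003,
};

enum execute_result { EXECUTE_OK, EXECUTE_FAILED, EXECUTE_MALFORMED };

struct execute_response {
    const char *path;       /* EXECUTE payload, not zero-terminated */
    size_t path_length;
    const char *message;    /* MESSAGE payload, not zero-terminated */
    size_t message_length;
};

/* Walk all packets in `data` and classify the response. */
enum execute_result
parse_execute_response(const uint8_t *data, size_t size,
                       struct execute_response *r)
{
    assert(data != NULL && r != NULL);

    int have_status = 0;
    size_t pos = 0;
    memset(r, 0, sizeof(*r));

    while (size - pos >= 4) {
        uint16_t length, command;
        memcpy(&length, data + pos, sizeof(length));
        memcpy(&command, data + pos + 2, sizeof(command));
        pos += 4;

        if (length > size - pos)
            return EXECUTE_MALFORMED; /* truncated payload */

        const char *payload = (const char *)(data + pos);
        pos += length;

        switch (command) {
        case CMD_STATUS:
            have_status = 1;
            break;
        case CMD_MESSAGE:
            r->message = payload;
            r->message_length = length;
            break;
        case CMD_EXECUTE:
            r->path = payload;
            r->path_length = length;
            break;
        case CMD_END:
            if (have_status)
                return EXECUTE_FAILED;
            return r->path_length > 0 ? EXECUTE_OK : EXECUTE_MALFORMED;
        default:
            break; /* BEGIN and process parameters are skipped here */
        }
    }

    return EXECUTE_MALFORMED; /* END packet missing */
}
```

A real client would additionally evaluate the process parameters that this sketch skips, and would copy the payloads before reusing the receive buffer.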

Pool translation

This sub-protocol is used by beng-lb. It allows the translation server to choose a pool which shall handle a specific HTTP request.

The request contains the following packets:

  • POOL: Marks this request as a “pool” request. The payload is the name of the translation_handler section in lb.conf.

  • HOST: The Host HTTP request header.

The response contains the following packets:

  • POOL: The name of the pool (or branch or lua_handler …) which shall handle the HTTP request.

  • CANONICAL_HOST: A string which shall be used instead of the Host request header for the “host” sticky mode.

  • SITE: Optional identification or name of the site this resource belongs to. It has no meaning for beng-lb, and is only used for TCACHE_INVALIDATE.

  • STATUS: Can be used instead of POOL to generate a brief error response.

  • REDIRECT: Can be used instead of POOL to generate a redirect response (303 See Other with the specified Location header value). Can be combined with STATUS to select a different status code.

  • HTTPS_ONLY: See the general description of the HTTPS_ONLY packet.

  • MESSAGE: Can be used instead of POOL to generate a text/plain response. Can be combined with STATUS and REDIRECT.

  • VARY: See the general description of the VARY packet.

  • ARCH: Prefer this CPU architecture for the selected pool member. Payload can be amd64 or arm64. If no member with a matching architecture exists, the behavior is unspecified; the request may fail or be forwarded to a server with a mismatching architecture. (This is implemented for rendezvous_hashing only.)

The client may cache all responses indefinitely.
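
How POOL, STATUS, REDIRECT and MESSAGE interact in a pool response can be illustrated with a small decision function. This is a sketch under assumptions, not beng-lb's actual logic: all names are hypothetical, POOL is assumed to take precedence when present, and a status of 0 means the response did not specify one, leaving the default to the caller. Only the 303 default for REDIRECT comes from the packet description.

```c
#include <assert.h>
#include <stddef.h>

/* Already-decoded packets of a pool response; NULL or 0 means absent. */
struct pool_response {
    const char *pool;     /* POOL payload */
    const char *redirect; /* REDIRECT payload (Location header value) */
    const char *message;  /* MESSAGE payload (text/plain body) */
    unsigned status;      /* STATUS payload, 0 if absent */
};

enum lb_action {
    LB_FORWARD,     /* forward the request to the named pool */
    LB_REDIRECT,    /* send a redirect response */
    LB_MESSAGE,     /* send a text/plain response */
    LB_STATUS_ONLY, /* send a brief error response */
    LB_INVALID,     /* none of the packets above was present */
};

struct lb_decision {
    enum lb_action action;
    unsigned status; /* 0 = not specified by the response */
};

struct lb_decision
decide_pool_response(const struct pool_response *r)
{
    assert(r != NULL);

    struct lb_decision d = { LB_INVALID, 0 };

    if (r->pool != NULL) {
        /* assumption: POOL wins over the "instead of POOL" packets */
        d.action = LB_FORWARD;
    } else if (r->redirect != NULL) {
        /* 303 See Other unless STATUS selects a different code */
        d.action = LB_REDIRECT;
        d.status = r->status != 0 ? r->status : 303;
    } else if (r->message != NULL) {
        d.action = LB_MESSAGE;
        d.status = r->status;
    } else if (r->status != 0) {
        d.action = LB_STATUS_ONLY;
        d.status = r->status;
    }

    return d;
}
```

A response carrying both REDIRECT and MESSAGE is reported as LB_REDIRECT here; the caller can still use the message as the body of the redirect response.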