Translation
===========

:program:`beng-proxy` knows two ways to locate the resource a request
URI points to:

- via an external translation server
- static translation

The latter is only for debugging. The URI path is appended to the
document root (:file:`/var/www` by default). For security (by
obscurity) reasons, :program:`beng-proxy` has no code for generating
directory listings. If the request has a trailing slash,
:program:`beng-proxy` looks for a file named ``index`` or
``index.html`` and serves it. Without the trailing slash,
:program:`beng-proxy` refuses to handle the request.

The translation server should be the default on production servers.
It is a daemon on the same physical machine which does all the
translation work for us. :program:`beng-proxy` connects to a Unix
socket to contact this translation server.

A request may consist of several micro commands. The request is
initialized with the ``BEGIN`` command, which is followed by any
number of commands which provide parameters. After all parameters
have been transferred, the client sends the ``END`` command, and waits
for the server’s response. The client can send any number of requests
over the socket until one side closes the connection.

Example conversation
--------------------

#. client sends ``BEGIN`` “\\x03”
#. client sends ``REMOTE_HOST`` “192.168.1.77:1234”
#. client sends ``HOST`` “www.example.com”
#. client sends ``URI`` “/foo/index.html”
#. client sends ``END``
#. server sends ``BEGIN`` “\\x01”
#. server sends ``PATH`` “/var/www/foo/index.html”
#. server sends ``CONTENT_TYPE`` “text/html; charset=utf8”
#. server sends ``PROCESS``
#. server sends ``END``

Command packets
---------------

The protocol is binary and uses host byte order. A command packet may
look like this in pseudo C::

   struct beng_proxy_translate_packet {
       uint16_t length;
       uint16_t command;
       char payload[length];
   };

The ``length`` only refers to the payload. The maximum supported
payload size is 65535 bytes.
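The packet layout above can be sketched in Python with the standard
``struct`` module. This is a minimal illustration, not the reference
implementation: the numeric command ids (``BEGIN``, ``END``, ``HOST``,
``URI``) are placeholder values chosen for the example, and the helper
names ``encode_packet`` and ``decode_packets`` are hypothetical. The
``=`` prefix in the format string selects the host's native byte order
without padding, in line with the statement that the protocol uses host
byte order.

```python
import struct

# Header as in the pseudo-C struct: uint16 length, uint16 command.
# "=" means native (host) byte order, standard sizes, no padding.
HEADER = struct.Struct("=HH")
MAX_PAYLOAD = 65535

# Placeholder command ids for illustration only (assumption); the real
# values are defined by the translation protocol, not by this sketch.
BEGIN, END, HOST, URI = 1, 2, 3, 4


def encode_packet(command, payload=b""):
    """Serialize one command packet: header followed by the raw payload."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload exceeds 65535 bytes")
    return HEADER.pack(len(payload), command) + payload


def decode_packets(data):
    """Split a byte string into a list of (command, payload) tuples."""
    packets = []
    offset = 0
    while offset < len(data):
        if len(data) - offset < HEADER.size:
            raise ValueError("truncated header")
        length, command = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        payload = data[offset:offset + length]
        if len(payload) != length:
            raise ValueError("truncated payload")
        packets.append((command, payload))
        offset += length
    return packets


# A request similar to the client half of the example conversation.
request = b"".join([
    encode_packet(BEGIN, b"\x03"),
    encode_packet(HOST, b"www.example.com"),
    encode_packet(URI, b"/foo/index.html"),
    encode_packet(END),
])
```

With the real command ids substituted, a byte string like ``request``
is what a client would write to the translation server's Unix socket
before waiting for the response.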
Most parameters are ASCII strings; in this case, the payload contains
just the raw string, without terminating zero.

Request
-------

- ``BEGIN``: Begins the request. The payload is an 8-bit unsigned
  integer specifying the protocol version. The protocol version
  described here is 3.

- ``END``: Finishes the request.

.. _t-listener_tag:

- ``LISTENER_TAG``: The ``tag`` of the listener (as specified in the
  ``listener`` configuration section) that accepted the connection.

- ``REMOTE_HOST``: the client’s address or host name and the port
  number (as string). (This packet is optional and is only submitted if
  requested via :ref:`WANT <want>`.)

- ``HOST``: the ``Host`` HTTP request header

- ``URI``: the raw URI from the HTTP request (without the query string)

- ``QUERY_STRING``: the query string from the request URI, without the
  question mark. (This packet is optional and is only submitted if
  requested via :ref:`WANT <want>`.)

- ``SESSION``: a session identifier generated by the translation
  server, see section :ref:`sessions`

- ``REALM_SESSION``: Like ``SESSION``, but realm-local. Unlike
  ``SESSION``, it is only sent under certain conditions (e.g. in
  ``TOKEN_AUTH`` requests), because the realm is only known after the
  regular translation response has been applied already.

- ``PARAM``: a parameter passed by the browser

- ``USER_AGENT``: the ``User-Agent`` request header sent by the client
  (not in the widget registry). (This packet is optional and is only
  submitted if requested via :ref:`WANT <want>`.)

- ``USER``: the user name currently logged in using ``AUTH`` (see
  :ref:`auth`). (This packet is optional and is only submitted if
  requested via :ref:`WANT <want>`.)

- ``LANGUAGE``: the ``Accept-Language`` request header sent by the
  client (not in the widget registry). (This packet is optional and is
  only submitted if requested via :ref:`WANT <want>`.)

- ``AUTHORIZATION``: the ``Authorization`` request header sent by the
  client (see :rfc:`2617`); only for :ref:`http_auth`.
- ``CONTENT_TYPE_LOOKUP``: Look up the ``Content-Type`` of a file name
  suffix. See :ref:`ctlookup` for a detailed description.

- ``SUFFIX``: The file name suffix without the dot for
  ``CONTENT_TYPE_LOOKUP``. See :ref:`ctlookup` for a detailed
  description.

- ``ERROR_DOCUMENT``: a resource has failed, and the translation server
  is asked to provide the location of the error document. This is
  followed by the packets ``URI`` and ``STATUS``. See :ref:`errdoc` for
  a detailed description.

- ``PROBE_PATH_SUFFIXES``: Result of ``PROBE_PATH_SUFFIXES``. This is
  an echo of the ``PROBE_PATH_SUFFIXES`` from the previous translation
  response. If a file with one of the given suffixes exists, then
  ``PROBE_SUFFIX`` specifies the first existing suffix. If no
  ``PROBE_SUFFIX`` follows, then no file was found.

- ``PATH_EXISTS``: This is an echo of ``PATH_EXISTS`` from the previous
  translation response, accompanied by ``STATUS`` describing whether
  the given file exists.

- ``FILE_NOT_FOUND``: The specified file does not exist. The
  translation server is asked to provide an alternate translation.
  This is an echo of the ``FILE_NOT_FOUND`` from the previous
  translation response.

- ``ENOTDIR``: The specified file does not exist, but a portion of the
  path points to a regular file. This is an echo of the ``ENOTDIR``
  packet from the previous translation response. The given URI has
  been shortened: the last slash and what follows has been moved to
  ``PATH_INFO``. This may be repeated until the regular file has been
  found.

- ``DIRECTORY_INDEX``: The specified file is a directory. The
  translation server is asked to provide an alternate translation.
  This is an echo of the ``DIRECTORY_INDEX`` from the previous
  translation response.

.. _want:

- ``WANT``: causes :program:`beng-proxy` to submit the same translation
  request again, with this packet echoed plus the requested packets.
  The payload is an array of 16-bit integers with requested packet ids.
  The following packets are allowed/supported here: ``LISTENER_TAG``,
  ``REMOTE_HOST``, ``USER_AGENT``, ``USER``, ``LANGUAGE``, ``ARGS``,
  ``QUERY_STRING``

- ``WANT_FULL_URI``: causes beng-proxy to submit the same translation
  request again, with this packet appended (its payload is opaque to
  :program:`beng-proxy`), and with the full request URI (including
  semicolon-arguments and the follow-up suffix, but excluding the query
  string).

- ``INTERNAL_REDIRECT``: causes beng-proxy to submit the same
  translation request again, with this packet appended (its payload is
  opaque to :program:`beng-proxy`). However, instead of the original
  request URI, :program:`beng-proxy` uses the one from this response’s
  ``URI`` or ``EXPAND_URI`` packet.

- ``CHECK``: causes beng-proxy to submit the same translation request
  again, with this packet appended (its payload is opaque to
  :program:`beng-proxy`). The current response is remembered, to be
  used when the second response contains the ``PREVIOUS`` packet. This
  can be used to implement authentication (see :ref:`authentication`).

- ``CHECK_HEADER``: the ``CHECK`` request shall contain the specified
  request header. Payload is the header name (lower case). For the
  ``CHECK`` request, the payload is the header name and the value
  separated by a colon; if no such request header exists, the value is
  empty.

- ``AUTH``: Indicates that authentication is necessary (see
  :ref:`auth`).

- ``READ_FILE``: This is a repeated translation in reply to a
  translation response with a ``READ_FILE`` packet. The payload is the
  file contents or empty if the file does not exist (or if there was
  another problem reading the file). This packet is implicitly on
  “vary”.

.. _tresponse:

Response
--------

- ``BEGIN``: Begins the response. The payload is an 8-bit unsigned
  integer specifying the protocol version. The initial protocol
  version is 0.

- ``END``: Finishes the response.
- ``URI``: the “real” raw URI from the HTTP request (without the query
  string); this is used to override the URI, e.g. when
  :program:`beng-proxy` is behind another proxy which modifies the URI

- ``EXPAND_URI``: Override ``URI`` with the given value (after
  expanding).

- ``HOST``: the host name for generating absolute URLs; default is the
  ``Host`` HTTP request header

- ``SCHEME``: the scheme for generating absolute URLs; default is
  ``http``. This packet is useful if :program:`beng-proxy` is behind
  ``stunnel``

- ``ALLOW_REMOTE_NETWORK``: Allow only clients with addresses in the
  specified network; all other addresses get a "403 Forbidden"
  response. The payload is a ``struct sockaddr_in`` or ``struct
  sockaddr_in6`` plus one byte specifying the prefix length (in bits).
  This packet may be sent more than once.

- ``UNTRUSTED``: sets the “untrusted” host name for this request: only
  untrusted widgets matching this host name are allowed. Trusted
  widgets are rejected.

- ``STATUS``: HTTP status code, encoded as ``uint16_t``; this parameter
  is usually not used

- ``HTTP``: load the resource from a remote HTTP server (see
  :ref:`http`). Payload is an absolute URI starting with ``http://``
  or ``https://``.

- ``HTTP2``: force HTTP/2 for the preceding ``HTTP`` packet. No
  payload.

- ``CERTIFICATE``: Use the named client certificate for the outbound
  SSL connection (see :ref:`CERTIFICATE `).

- ``PIPE``: a local program which reads input from stdin and prints the
  modified resource on stdout (see :ref:`pipe`).
- ``LHTTP_PATH``: a local path which is executed as HTTP server

- ``LHTTP_URI``: the request URI for ``LHTTP_PATH``

- ``EXPAND_LHTTP_URI``: the regular expression rule for ``LHTTP_URI``

- ``LHTTP_HOST``: the “Host” request header for ``LHTTP_PATH``

- ``CONCURRENCY``: a 16 bit integer specifying the maximum number of
  concurrent requests to this server (FastCGI, LHTTP and Multi-WAS
  only)

- ``PARALLELISM``: a 16 bit integer specifying the maximum number of
  parallel child processes of this kind (FastCGI, WAS, Multi-WAS,
  LHTTP)

- ``DISPOSABLE``: Mark the child process as "disposable", which may
  give it a very short idle timeout (or none at all). To be used for
  processes that will likely only be used once.

- ``NON_BLOCKING``: If present, make the socket passed to a child
  process non-blocking (LHTTP only currently). This is needed by
  NodeJS 0.12.

- ``CGI``: a local path which is executed as CGI script (see
  :ref:`t-cgi`)

- ``FASTCGI``: a local path which is executed as FastCGI script (see
  :ref:`t-cgi`)

- ``WAS``: a local path which is executed as WAS application (see
  :ref:`t-cgi`). May be followed by ``CONCURRENCY`` to enable Multi-WAS
  mode.

- ``REDIRECT``: another alternative to ``PATH``: redirect the HTTP
  client to this URL; ``STATUS`` must be set to one of the HTTP 3xx
  codes

- ``EXPAND_REDIRECT``: Override ``REDIRECT`` with the given value
  (after expanding); see :ref:`tresponse`.

- ``REDIRECT_QUERY_STRING``: Append the query string to the given
  ``REDIRECT`` URL.

- ``REDIRECT_FULL_URI``: Use the full request URI path (including
  semicolon-arguments and the follow-up suffix, but excluding the query
  string) for expanding ``REDIRECT``. This packet must be preceded by
  ``BASE``, ``EASY_BASE`` and ``REDIRECT``. It makes sense to combine
  it with ``REDIRECT_QUERY_STRING``.

.. _httpsonly:

- ``HTTPS_ONLY``: Allow this request to be handled only on encrypted
  connections (HTTPS with SSL/TLS). If the connection is encrypted,
  then this is a no-op.
  If it is not encrypted, the server generates a permanent redirect to
  ``https://``. The payload may contain a 16 bit integer specifying
  the port number (zero means default port).

- ``BOUNCE``: Redirects the browser with a ``303 See Other`` status to
  this URI, and appends the current absolute URI (form-encoded). This
  is useful to redirect to another server, which will need to redirect
  back to the original URI.

- ``MESSAGE``: Generate a response with the given body (``text/plain``
  and US-ASCII).

- ``TINY_IMAGE``: Generate a response with a tiny (one-pixel GIF)
  image.

- ``EXPAND_PATH``: Override the ``PATH`` with the given value
  (applicable to static files, CGI, FastCGI, WAS, ``HTTP``). Backslash
  references are expanded to the value of the match group of ``REGEX``.
  In the presence of this packet, the URI suffix after the base will
  not be appended to other paths. The translation server is
  responsible for ensuring that the resulting path cannot point to
  files that are not supposed to be published. :program:`beng-proxy`
  disallows ``/../`` sequences in the URI tail string, but it may
  nonetheless be possible for an attacker to break out if the regular
  expression and the expansion string are phrased improperly. (Since
  version 2.0.5)

- ``LISTENER_TAG``: override the ``LISTENER_TAG``. All following
  translation requests will feature the new listener tag.

- ``SITE``: optional identification or name of the site this resource
  belongs to

- ``EXPAND_SITE``: provide a cache expansion for the preceding ``SITE``

- ``SESSION_SITE``: Set a ``SITE`` for all requests in the current
  session. This packet with an empty payload can be used to clear the
  session’s ``SITE`` value.

- ``RATE_LIMIT_SITE_REQUESTS``: limit the rate of requests to this
  site. Payload is two 32-bit floats describing the rate and burst for
  the underlying token bucket. Requests that fail the token bucket get
  a "429 Too Many Requests" response.

- ``RATE_LIMIT_SITE_TRAFFIC``: limit the traffic rate of requests to
  this site.
  Payload is two 32-bit floats describing the rate [bytes per second]
  and burst [bytes] for the underlying token bucket. Requests that
  fail the token bucket get a "429 Too Many Requests" response.

- ``DOCUMENT_ROOT``: base directory of the site; may also be passed
  after a ``CGI`` command, to set the document root only for this CGI

- ``FILTER``: the next resource address (``HTTP``, ``CGI``) will denote
  an output filter, see section :ref:`filter` for details.

- ``CHAIN``: similar to ``FILTER``, but the translation server is asked
  again after the current response has been generated. See section
  :ref:`chain` for details.

.. _cache_tag:

- ``CACHE_TAG``: Mark a cache item with this tag (an opaque string).
  This can be used to flush/invalidate groups of cache items in one
  control command. The following parts of the response can be tagged:

  - After ``FILTER``: for filter cache items, to be used with
    :ref:`FLUSH_FILTER_CACHE `.

  - After an HTTP resource address (e.g. ``HTTP``, ``FASTCGI``,
    ``WAS``): for HTTP cache items, to be used with
    :ref:`FLUSH_HTTP_CACHE `.

  - Prior to any of the above: for the whole translation response (i.e.
    the translation cache item), to be used with
    :ref:`TCACHE_INVALIDATE `.

- ``REVEAL_USER``: If present after ``FILTER``, then the filter will
  see ``X-CM4all-BENG-User`` as an additional request header (if a user
  is logged in).

- ``FILTER_4XX``: Enable filtering of client errors (status 4xx).
  Without this flag, only successful responses (2xx) are filtered.
  Only useful when at least one ``FILTER`` was specified.
- ``PROCESS``: enables the :program:`beng-proxy` processor, see section
  :ref:`processor`

- ``PROCESS_TEXT``: enables the :program:`beng-proxy` text processor
  (Since version 1.3.2)

- ``PROCESS_CSS``: enables the :program:`beng-proxy` CSS processor

- ``DOMAIN``: the domain name for partitioned frames

- ``SESSION``: a session identifier generated by the translation
  server, see section :ref:`sessions`

- ``RECOVER_SESSION``: A token to be stored in a browser cookie which
  can later be used by the translation server to recover the current
  session. In particular, it will be sent back to the translation
  server in a :ref:`token_auth` request.

.. _t_attach_session:

- ``ATTACH_SESSION``: Attach to an existing session (or mark this
  session to be attached by others with the same identifier). The
  payload is a non-empty unique identifier for sessions to be
  attached/merged. This value can also be used to discard the session
  using the :ref:`DISCARD_SESSION ` control packet.

- ``USER``: the user name associated with this session

.. _t_realm:

- ``REALM``: a realm name for this session. An existing session
  matches only if its realm matches the current request’s realm; on
  mismatch, a new session with the same public id is created for this
  realm. If this packet is not specified in the translation response,
  then the “Host” request header is used.

- ``REALM_FROM_AUTH_BASE``: Copy the ``AUTH`` or ``AUTH_FILE`` contents
  to ``REALM`` (i.e. without ``APPEND_AUTH``).

- ``TRANSPARENT``: Transparent proxy: forward URI path segment params
  to the request handler instead of using them. This disables legacy
  handling of these params (which was used to control widget
  rendering).

- ``LANGUAGE``: overrides the ``Accept-Language`` request header for
  this session

- ``DISCARD_SESSION``: discard the current browser session

- ``DISCARD_REALM_SESSION``: Like ``DISCARD_SESSION``, but discard only
  the part of the session that is specific to the current realm (see
  :ref:`t_realm`).
- ``SECURE_COOKIE``: Set the "secure" flag on the session cookie.

- ``SESSION_COOKIE_SAME_SITE``: Set the "SameSite" attribute on the
  (realm) session cookie. Valid payloads are ``strict``, ``lax`` and
  ``none`` (all lower case).

- ``CHDIR``: change the working directory (after namespace setup).

- ``HOME``: home directory of the account this site belongs to; will be
  mounted in the jail; defaults to ``DOCUMENT_ROOT``

- ``EXPAND_HOME``: Expansion for ``HOME``.

- ``ADDRESS``: after each ``HTTP`` packet, there must be one or more
  ``ADDRESS`` packets which specify the resolved addresses. The
  payload of each is a ``struct sockaddr``.

- ``STICKY``: Make the resource address "sticky", i.e. attempt to
  forward all requests of a session to the same worker.

- ``VIEW``: starts a new view; the body of the packet is the name of
  the view (ASCII letters, digits, underscore, dash only). Each view
  can have different address/processor/filter settings. The first view
  (the one before the first ``VIEW`` packet) is the default and has no
  name.

- ``MAX_AGE``: a 32 bit unsigned integer specifying the number of
  seconds the preceding piece of information is valid without having to
  revalidate. A value of 0 specifies that :program:`beng-proxy` should
  not remember this value at all. Without this packet, the maximum age
  is not limited. Currently, this is only supported for the following
  packets:

  - ``BEGIN`` (refers to the whole translate response)
  - ``USER``

.. _tvary:

- ``VARY``: similar to the HTTP ``Vary`` response header; the payload
  contains an array of translation request commands which this response
  depends upon. The following request packets are currently supported:
  ``PARAM``, ``SESSION``, ``LISTENER_TAG``, ``REMOTE_HOST``, ``HOST``,
  ``LANGUAGE``, ``USER_AGENT``, ``QUERY_STRING``, ``USER``,
  ``INTERNAL_REDIRECT``, ``ENOTDIR``.
  The following request packets are on “vary” implicitly:
  ``WIDGET_TYPE``, ``CONTENT_TYPE_LOOKUP``, ``URI``, ``STATUS``,
  ``CHECK``, ``WANT_FULL_URI``, ``PROBE_PATH_SUFFIXES``,
  ``PROBE_SUFFIX``, ``PATH_EXISTS``, ``FILE_NOT_FOUND``,
  ``DIRECTORY_INDEX``, ``WANT``.

- ``INVALIDATE``: Invalidates existing translation cache items which
  depend on some of the request values. The payload has the same
  format as ``VARY``. Additionally, the ``URI`` command is supported,
  to invalidate all items pointing to the request URI, and ``SITE`` to
  invalidate all items with the given site name. If you specify more
  than one command, all must match. If you list a command which was
  not specified in the request (or a command which is not supported
  here), nothing will be deleted. Example: ``INVALIDATE`` on
  ``SESSION`` invalidates all cache items for the current session.

- ``REQUEST_HEADER_FORWARD``: See :ref:`tfwdheader`

- ``RESPONSE_HEADER_FORWARD``: See :ref:`tfwdheader`

- ``WWW_AUTHENTICATE``: the ``WWW-Authenticate`` response header sent
  to the client (see :rfc:`2617`). Currently, this is never cached;
  this behavior is subject to change, and the header is expected to
  become cacheable in the future.

- ``AUTHENTICATION_INFO``: the ``Authentication-Info`` response header
  sent to the client (see :rfc:`2617`).

- ``HEADER``: A custom HTTP response header sent to the client. Name
  and value are separated by a colon (without any whitespace). This
  will not override existing headers. It is not allowed to set
  hop-by-hop headers (:rfc:`2616` section 13.5.1) this way. This
  packet shall only be a last resort, when there is no other way to set
  a required response header.

- ``EXPAND_HEADER``: Same as ``HEADER``, but expand the value.

- ``REQUEST_HEADER``: A custom HTTP request header for the backend
  server. Name and value are separated by a colon (without any
  whitespace). This will override existing headers. It is not allowed
  to set hop-by-hop headers (:rfc:`2616` section 13.5.1) this way.
- ``EXPAND_REQUEST_HEADER``: Same as ``REQUEST_HEADER``, but expand the
  value.

- ``CONTENT_TYPE_LOOKUP``: Indicates that the translation server is
  willing to look up ``Content-Type`` by file name suffix. See
  :ref:`ctlookup` for a detailed description.

- ``ERROR_DOCUMENT``: Indicates that the translation server is willing
  to provide a custom error document. See :ref:`errdoc` for a detailed
  description.

- ``PROBE_PATH_SUFFIXES``: Check if the ``TEST_PATH`` (or
  ``EXPAND_TEST_PATH``) plus one of the suffixes from ``PROBE_SUFFIX``
  exists (regular files only). :program:`beng-proxy` will send another
  translation request, echoing this packet and echoing the
  ``PROBE_SUFFIX`` that was found. This packet must be followed by at
  least two ``PROBE_SUFFIX`` packets.

- ``PATH_EXISTS``: Check if the given ``PATH`` exists; the translation
  shall be repeated, echoing this packet accompanied by a ``STATUS``
  packet describing whether the given file exists (200 or 404).

- ``FILE_NOT_FOUND``: Indicates that the translation server would like
  to provide an alternate translation when the specified file does not
  exist. :program:`beng-proxy` will repeat the translation request
  with this packet echoed. This is supported by the following address
  types: ``PATH``, ``CGI``, ``FASTCGI``, ``WAS``, ``LHTTP_PATH``.

- ``ENOTDIR``: Indicates that the translation server would like to
  provide an alternate translation when the specified file does not
  exist, but a portion of the path points to a regular file.

- ``DIRECTORY_INDEX``: Indicates that the translation server would like
  to provide an alternate translation when the specified file is a
  directory. :program:`beng-proxy` will repeat the translation request
  with this packet echoed.

- ``DIRECTORY_INDEX_SLASH``: If ``DIRECTORY_INDEX`` applies but the
  request URI path does not end with a slash, automatically send a
  redirect appending the slash.

- ``TEST_PATH``: Test the specified file.
  If this packet is not present, then the path from the resource
  address is used (``PATH``, ``CGI``, ``FASTCGI``, ``LHTTP_PATH``).
  Affects the packets ``FILE_NOT_FOUND``, ``DIRECTORY_INDEX``,
  ``ENOTDIR``.

- ``EXPAND_TEST_PATH``: Override the ``TEST_PATH`` with the given
  value. Backslash references are expanded to the value of the match
  group of ``REGEX``. (Since version 4.0.34)

- ``COOKIE_DOMAIN``: Set the session cookie’s "Domain" attribute.

- ``COOKIE_HOST``: Override the cookie host name. This host name is
  used for storing and looking up cookies in the jar. It is especially
  useful for protocols that don’t have a host name, such as CGI.

- ``EXPAND_COOKIE_HOST``: Expansion for ``COOKIE_HOST``.

- ``COOKIE_PATH``: Override the cookie’s ``Path`` attribute. This is
  sent to the client when :program:`beng-proxy` generates a new session
  cookie. Be careful with overlapping locations that create
  conflicting cookies.

- ``VALIDATE_MTIME``: A cached response is valid only if the file
  specified in this packet is not modified. The first 8 bytes are the
  mtime (seconds since UNIX epoch), the rest is the absolute path to a
  regular file (symlinks not supported). The translation fails when
  the file does not exist or is inaccessible. The special value 0
  matches only when the file does not exist; as soon as the file
  appears, the cached response will be discarded.

- ``READ_FILE``: Asks :program:`beng-proxy` to read the specified
  (small) file and submit another translation request with the file
  contents in another ``READ_FILE`` packet.

- ``EXPAND_READ_FILE``: Expansion for ``READ_FILE``.

.. _tdefer:

- ``DEFER``: Defer the request to the next translation server.

- ``PREVIOUS``: Tells beng-proxy to use the resource address of the
  previous translation response. Only allowed if the request contains
  a ``CHECK`` or ``AUTH`` packet.

- ``UNCACHED``: Disable the HTTP cache for the given resource address.

- ``IGNORE_NO_CACHE``: Ignore the ``Cache-Control:no-cache`` request
  header, i.e.
  don't allow the client to circumvent the HTTP cache.

- ``EAGER_CACHE``: Enable caching for the given resource address, even
  if it is not declared to be cacheable.

- ``DISCARD_QUERY_STRING``: Discard the query string from the request
  URI. This can be combined with ``EAGER_CACHE`` to prevent
  cache-busting with random query strings.

- ``NO_QUERY_STRING``: No query string is allowed/supported on this
  request URI. The webserver is allowed to reject requests with a
  query string.

- ``AUTO_FLUSH_CACHE``: All (successful) modifying requests (``POST``,
  ``PUT`` ...) flush the HTTP cache of the specified ``CACHE_TAG``.

- ``GENERATOR``: A short symbolic identifier (alphanumeric, underscore,
  dash) for the entity that generates the HTTP response (according to
  the rest of this translation response). If non-empty, then this will
  set the ``GENERATOR`` attribute in access log datagrams. Without
  this packet, the value of the ``X-CM4all-Generator`` response header
  is used.

To send a standard error page, the translation server sends a response
containing only the ``STATUS`` parameter with the desired HTTP status.

Sending a packet twice is regarded as an error. It cannot be used to
override a previous value.

.. _tcache:

Caching
-------

Almost all translation responses must be cacheable. The following
response packets allow reusing cache items for different requests:

- ``LIKE_HOST``: Repeat the translation, but with the specified
  ``HOST`` value (which can be an artificial name, even one which is
  not RFC-valid). This allows sharing the translation cache between
  different hosts. It can be combined with ``BASE`` and ``REGEX`` to
  share only a part of the URI location space.

- ``BASE``: Defines a realm in the URI space. The payload specifies
  the URI prefix (of the original request URI, ending with a slash)
  which contains this realm. All resources in this realm can be
  addressed by :program:`beng-proxy` with a trivial pattern: append the
  relative URI (within the realm) to the resource address (e.g.
  the ``PATH``, ``HTTP`` or ``PATH_INFO`` value). The address in this
  response applies to the request URI, not the base URI (to allow
  backwards compatibility with translation clients which do not support
  this packet).

  Example: in the request, ``URI`` is ``/foo/bar/index.html``; in the
  response, ``PATH`` is ``/var/www/foo/bar/index.html`` and ``BASE`` is
  ``/foo/``. The :program:`beng-proxy` translation cache now knows: if
  a request on ``/foo/test.png`` is received, it can serve
  :file:`/var/www/foo/test.png` without querying the translation
  server.

- ``UNSAFE_BASE``: Modifier for ``BASE``: omit the security checks.
  This allows ``/../`` to be part of the remaining URI, possibly
  allowing clients to break out of the given directory.

- ``EASY_BASE``: Modifier for ``BASE`` which aims to simplify its
  usage: the resource address given in the response refers to the
  ``BASE``, not to the actual request URI. It is important to include
  the trailing slash which is part of ``BASE`` in the resource address
  (e.g. ``BASE``\ =”/foo/”, ``PATH``\ =”/var/www/foo/”).
  :program:`beng-proxy` applies the URI suffix before handling the HTTP
  request.

- ``REGEX``: Reuse a cached response only if the request ``URI``
  matches the specified regular expression (Perl compatible, anchored).
  This works only when a ``BASE`` was specified. (Since version 1.3.2)

- ``INVERSE_REGEX``: Don’t apply the cached response if the request
  ``URI`` matches the specified regular expression (Perl compatible,
  anchored). (Since version 1.3.2)

- ``REGEX_TAIL``: Apply ``REGEX`` and ``INVERSE_REGEX`` to the URI
  suffix following ``BASE`` instead of the whole request URI. (Since
  version 4.0.21)

- ``REGEX_RAW``: By default, URI paths are normalized when expanding a
  cached translation response (i.e. multiple consecutive slashes are
  compressed to one and occurrences of ``/./`` are compressed to
  ``/``). This option disables the URI path normalization.

- ``REGEX_UNESCAPE``: Unescape the URI for ``REGEX``.
- ``INVERSE_REGEX_UNESCAPE``: Unescape the URI for ``INVERSE_REGEX``.

- ``REGEX_ON_HOST_URI``: Prepend the ``Host`` header to the string used
  with ``REGEX`` and ``INVERSE_REGEX``.

- ``REGEX_ON_USER_URI``: Prepend the user name (from ``USER``) and a
  ’@’ to the string used with ``REGEX`` and ``INVERSE_REGEX``.

- ``LAYOUT``: The translation server gives an overview of the URI
  layout. Its payload is a non-empty opaque value which is mirrored in
  the next request. This packet is followed by one or more ``URI`` /
  ``BASE`` / ``REGEX`` packets specifying exact URI matches, URI bases
  or regular expressions which shall not share cache items. The first
  matching base/regex specifies where translation cache items will be
  stored; all URIs without a match have their own cache. This way,
  cacheable URI bases can be constructed easily without excessively
  complex ``INVERSE_REGEX`` packets.

  Example for a response after a request to ``/.cm4all/foo``:

  - ``BASE=/``
  - ``LAYOUT=[opaque]``
  - ``URI=/robots.txt``
  - ``BASE=/.cm4all/private/``
  - ``BASE=/.cm4all/``
  - ``BASE=/.well-known/``
  - ``REGEX=\.php$``

  Here, the whole host is separated into three bases (the three which
  are specified, and everything else). Responses don't need
  ``INVERSE_REGEX`` to exclude the specified bases.

  The following request will mirror the ``LAYOUT`` packet and the
  matching ``URI`` / ``BASE`` / ``REGEX`` packet:

  - ``URI=/.cm4all/foo``
  - ``LAYOUT=[opaque]``
  - ``BASE=/.cm4all/``

  The server recognizes that this is a follow-up request, and responds:

  - ``BASE=/.cm4all/``
  - ``EASY_BASE``
  - ``PATH=/var/www/cm4all/``

  This response can be cached and reused for everything below
  ``/.cm4all/``, except for URIs below ``/.cm4all/private/``.

  If ``LAYOUT`` is followed by ``REGEX_TAIL``, then all regular
  expressions (and other URI comparisons) are matched against the tail
  of the URI after the given ``BASE``.
  Example ``LAYOUT`` response:

  - ``BASE=/foo/``
  - ``LAYOUT=[opaque]``
  - ``REGEX_TAIL``
  - ``URI=hello.txt``
  - ``BASE=bar/``
  - ``REGEX=\.php$``

  In the follow-up request, these are mirrored; for example, after a
  request to ``/foo/hello.txt``, the next translation request looks
  like this:

  - ``URI=/foo/hello.txt``
  - ``LAYOUT=[opaque]``
  - ``URI=hello.txt``

  Note how there are now two ``URI`` packets: the first one is the
  actual request URI and the second one mirrors the matching ``LAYOUT``
  item.

  As a shortcut for implementing CORS, a layout item may be followed by
  ``ACCESS_CONTROL_ALLOW_ALL``. All matching ``OPTIONS`` requests will
  then lead to an empty response with
  ``Access-Control-Allow-{Origin,Methods,Headers}: *``. Use this for
  API endpoints with unrestricted script access to avoid roundtrips to
  the actual API process.

.. _tstatic:

Static files
------------

See :ref:`static` for an explanation of static file resources.

The response packet ``PATH`` declares a static file that will be
served. The following packets are available:

- ``PATH``: Absolute path of the local file to be served.

- ``EXPAND_PATH``: Override the path with the given value (after
  expanding); see :ref:`tresponse`.

- ``APPEND_PATH``: Append this string to the ``PATH`` (after applying
  ``BASE`` or ``EXPAND_PATH``).

- ``AUTO_BROTLI_PATH``: Build the precompressed Brotli path by
  appending :file:`.br` to the ``PATH``.

- ``GZIPPED``: Absolute path of a precompressed version of the file.
  The file is compressed with ``gzip``. May follow the ``PATH``
  packet.

- ``AUTO_GZIPPED``: Build the precompressed path by appending
  “``.gz``” to the ``PATH``. Unlike ``GZIPPED``, this is compatible
  with ``BASE``.

- ``AUTO_GZIP``: Compress the response on-the-fly if the client accepts
  the ``gzip`` encoding. This consumes a lot of CPU and should only be
  used for dynamic responses which can be compressed well.

- ``AUTO_BROTLI``: Compress the response on-the-fly if the client
  accepts the ``br`` encoding.
  This consumes a lot of CPU and should only be used for dynamic
  responses which can be compressed well.

- ``AUTO_COMPRESS_ONLY_TEXT``: apply ``AUTO_GZIP`` and ``AUTO_BROTLI``
  only to text responses.

- ``CONTENT_TYPE``: MIME type of the file (optional)

- ``EXPIRES_RELATIVE``: Generate an ``Expires`` response header. The
  payload is a 32 bit integer specifying the number of seconds from
  now.

- ``EXPIRES_RELATIVE_WITH_QUERY``: Like ``EXPIRES_RELATIVE``, but this
  value is only used if there is a non-empty query string. This is
  useful for serving static files which are usually referenced with a
  version number in the query string.

- ``BENEATH``: Absolute path of a directory that the ``PATH`` shall
  not escape, not even using symlinks. This is implemented using the
  ``RESOLVE_BENEATH`` flag of Linux's ``openat2()`` system call.

Proxying requests
-----------------

When proxying HTTP requests with an ``HTTP`` packet,
:program:`beng-proxy` forwards the request to the specified location
(with headers filtered as described in :ref:`tfwdheader`), including
the HTTP method and the request body. There is one exception: if
``PROCESS`` is enabled and a widget is focused (see :ref:`focus`), the
other HTTP server receives a ``GET`` request without a body, because
the focused widget is going to receive the request body.

If the filter URL starts with a slash, :program:`beng-proxy` assumes
it is the absolute path to a Unix socket.

.. _t-cgi:

CGI, FastCGI, WAS and Pipe
--------------------------

The protocols CGI, FastCGI and WAS can be used to generate or filter
resources (see :ref:`cgi` and :ref:`was`). A “pipe” can be used as a
filter (see :ref:`pipe`).

The following packets are used to choose the protocol:

- ``CGI``: a local path which is executed as CGI script

- ``FASTCGI``: a local path which is executed as FastCGI script. To
  connect to an existing FastCGI server, specify one or more
  ``ADDRESS`` packets.
- ``WAS``: a local path which is executed as WAS application

- ``PIPE``: a local program which reads input from stdin and prints
  the modified resource on stdout

The following packets can be used to specify more details:

- ``EXPAND_PATH``: Override the executable path with the given value
  (after expanding); see :ref:`tresponse`.

- ``APPEND``: appends an argument to the command line

- ``EXPAND_APPEND``: provide a cache expansion for the preceding
  ``APPEND``

- ``PAIR``: adds a FastCGI/WAS parameter in the form ``KEY=VALUE``.

- ``EXPAND_PAIR``: provide a cache expansion for the preceding
  ``PAIR``

- ``SETENV``: adds an environment variable for CGI, FastCGI, WAS or
  LHTTP in the form ``KEY=VALUE``.

- ``EXPAND_SETENV``: provide a cache expansion for the preceding
  ``SETENV``

- ``PATH_INFO``: optional URI substring which was left after finding
  the file

- ``EXPAND_PATH_INFO``: Override the ``PATH_INFO`` with the given
  value. Backslash references are expanded to the value of the match
  group of ``REGEX``. In the presence of this packet, the URI suffix
  after the base will not be appended to other paths. (Since version
  2.0.4)

- ``DOCUMENT_ROOT``: set the document root passed to this CGI process

- ``EXPAND_DOCUMENT_ROOT``: Override the ``DOCUMENT_ROOT`` with the
  given value. Backslash references are expanded to the value of the
  match group of ``REGEX``. (Since version 6.0)

- ``INTERPRETER``: run a CGI script with the specified interpreter:
  invokes the specified interpreter with the mapped file path added as
  a command-line argument. This can be used to run Perl scripts
  without setting the “execute” bit.

- ``ACTION``: run the specified CGI program instead of the mapped
  file. This program reads the mapped file path from
  ``SCRIPT_FILENAME`` and loads this script. This is modeled after the
  Apache directive ``Action``, and implements a protocol understood by
  PHP and COMA.
- ``SCRIPT_NAME``: the ``SCRIPT_NAME`` environment variable for a CGI

- ``EXPAND_SCRIPT_NAME``: Override the ``SCRIPT_NAME`` with the given
  value. Backslash references are expanded to the value of the match
  group of ``REGEX``. (Since version 4.0.33)

- ``AUTO_BASE``: Auto-calculate the ``BASE`` from ``PATH_INFO`` (only
  CGI, FastCGI and WAS)

- ``REQUEST_URI_VERBATIM``: Pass the CGI parameter ``REQUEST_URI``
  verbatim instead of building it from ``SCRIPT_NAME``, ``PATH_INFO``
  and ``QUERY_STRING``. (Since version 16.29)

See :ref:`rlimits` for how to configure resource limits and :ref:`ns`
for how to configure namespaces.

Local HTTP
----------

- ``APPEND``: appends an argument to the command line

- ``EXPAND_APPEND``: provide a cache expansion for the preceding
  ``APPEND``

See :ref:`rlimits` for how to configure resource limits and :ref:`ns`
for how to configure namespaces.

.. _tfwdheader:

Forwarding HTTP Headers
-----------------------

There are two translation packets which control which HTTP headers are
going to be forwarded:

- ``REQUEST_HEADER_FORWARD``: this packet specifies which request
  headers are forwarded to the request handler. The payload is a list
  of group/mode pairs (``struct beng_header_forward_packet``).

- ``RESPONSE_HEADER_FORWARD``: same as ``REQUEST_HEADER_FORWARD``, but
  applies to response headers forwarded to the client.

Group is one of:

- ``IDENTITY``: headers ``Via``, ``X-Forwarded-For``,
  ``X-CM4all-Generator``

- ``CAPABILITIES``: ``Server``, ``User-Agent``, ``Accept-*``

- ``COOKIE``: ``Cookie[2]``, ``Set-Cookie[2]``

- ``FORWARD``: forward information about the original request/response
  that would usually not be visible. If set to ``MANGLE``, then
  ``Host`` is translated to ``X-Forwarded-Host``.

- ``CORS``: forward `CORS `__ request/response headers

- ``SECURE``: forward “secure” request/response headers such as
  ``X-CM4all-BENG-User``

- ``SSL``: forward information about the SSL connection, i.e.
  ``X-CM4all-HTTPS`` (set to ``on`` if the request was received on a
  SSL/TLS connection, see :ref:`ssl`), ``X-CM4all-BENG-Peer-Subject``
  and ``X-CM4all-BENG-Peer-Issuer-Subject`` (see :ref:`ssl_verify`)

- ``TRANSFORMATION``: forward headers that affect the transformation
  (i.e. ``X-CM4all-View``)

- ``LINK``: forward headers that contain links, such as ``Location``,
  ``Content-Location`` and ``Referer``; if set to ``MANGLE``, then
  :program:`beng-proxy` attempts to rewrite the ``Location`` URI
  relative to itself

- ``AUTH``: forward HTTP authentication headers (e.g. basic/digest
  auth), such as ``WWW-Authenticate``, ``Authentication-Info`` and
  ``Authorization``; if set to ``MANGLE``, then :program:`beng-proxy`
  allows the translation server to handle HTTP authentication. The
  default is ``NO`` for request headers and ``MANGLE`` for response
  headers. ``MANGLE`` on the request header settings generates an
  ``Authorization`` request header containing :samp:`bearer USER`,
  where ``USER`` is the current user as specified by the ``USER``
  translation response packet. This can be used for servers which do
  not understand the ``X-CM4all-BENG-User`` request header (from
  header group ``SECURE``).

- ``OTHER``: other end-to-end headers not explicitly mentioned here

- ``ALL``: all of the above except for ``SECURE``, ``SSL`` and
  ``AUTH``

Mode is one of:

- ``NO``: don’t forward the headers

- ``YES``: forward the headers

- ``MANGLE``: :program:`beng-proxy` processes the headers

- ``BOTH``: both :program:`beng-proxy` and the backend server process
  the headers (special case for cookie headers, which is a combination
  of ``YES`` and ``MANGLE``)

:program:`beng-proxy`\ ’s session management is only active when
``COOKIE`` is ``MANGLE`` (which is the default) or ``BOTH``. The
behavior of the ``COOKIE`` setting on widgets is undefined.

.. _rlimits:

Resource Limits
---------------

The packet ``RLIMITS`` specifies Linux resource limits for child
processes.
Its payload is a string, a sequence of resource limit codes and their
respective limit values. The following resource limits are supported:

- ``t`` (``CPU``): CPU time limit in seconds.
- ``f`` (``FSIZE``): The maximum size of files that the process may
  create.
- ``d`` (``DATA``): The maximum size of the process’s data segment.
- ``s`` (``STACK``): The maximum size of the process stack, in bytes.
- ``c`` (``CORE``): Maximum size of core file.
- ``m`` (``RSS``): The limit of the process’s resident set, in pages.
- ``u`` (``NPROC``): The maximum number of processes that can be
  created for the real user ID.
- ``n`` (``NOFILE``): The maximum file descriptor number that can be
  opened by this process.
- ``l`` (``MEMLOCK``): The maximum number of bytes of memory that may
  be locked into RAM.
- ``v`` (``AS``): The maximum size of the process’s virtual memory
  (address space) in bytes.
- ``i`` (``SIGPENDING``): The maximum number of signals that may be
  queued.
- ``q`` (``MSGQUEUE``): The maximum number of bytes that can be
  allocated for POSIX message queues.
- ``e`` (``NICE``): A ceiling to which the process’s nice value can be
  raised.
- ``r`` (``RTPRIO``): Ceiling on the real-time priority that may be
  set for this process.

The letter in the first column is the code for the payload, to be
followed by ’!’ (for “unlimited”) or the numeric limit value (with
optional prefix “K”, “M” or “G” for “kibi”, “mebi”, “gibi”).

The limits are applied to both “soft” and “hard” by default. The code
``S`` changes all following specifications to “soft” only, and ``H``
does the same for “hard”.

Example::

   c!Sv1Gn256Hn512

Explanation:

- ``c!``: unlimited core file size (both soft and hard)
- ``S``: the following will be soft limits
- ``v1G``: limit address space to 1 GiB (soft; the hard limit is
  unchanged)
- ``n256``: maximum 256 file descriptors (soft)
- ``H``: the following will be hard limits
- ``n512``: maximum 512 file descriptors (hard)

.. _ns:

Namespaces
----------

Child processes such as FastCGI programs can run in separate Linux
namespaces to improve separation from the rest of the server. That
requires a fairly new Linux kernel.

Articles on http://lwn.net/ on Linux namespaces:

- `Namespaces in operation, part 1: namespaces overview `__
- `Namespaces in operation, part 3: PID namespaces `__
- `Namespaces in operation, part 4: more on PID namespaces `__
- `Namespaces in operation, part 5: User namespaces `__
- `Namespaces in operation, part 6: more on user namespaces `__
- `Network namespaces `__

User Namespaces
^^^^^^^^^^^^^^^

The translation packet ``USER_NAMESPACE`` launches the process in a
new user namespace. This creates a new mapping for user ids inside
this namespace. More importantly, this gives the process a full set of
capabilities. This is a precondition for some of the other namespaces.

Requires Linux 3.8 or newer.

PID Namespaces
^^^^^^^^^^^^^^

The translation packet ``PID_NAMESPACE`` launches the process in a new
PID namespace. This creates a new mapping for process ids inside this
namespace. Only processes in this namespace are visible and only these
can be killed.

The translation packet ``PID_NAMESPACE_NAME`` reassociates the process
with an existing PID namespace, selected by its name (in the payload).
This requires the ``cm4all-spawn`` daemon, which manages PID
namespaces.

By default, other processes are actually still visible through
:file:`/proc`. For complete PID namespace support, one would need to
mount a new ``proc`` filesystem connected to the new namespace.

Requires Linux 3.8 or newer.

Cgroup Namespaces
^^^^^^^^^^^^^^^^^

The translation packet ``CGROUP_NAMESPACE`` launches the process in a
new cgroup namespace.

Requires Linux 4.6 or newer.

Network Namespaces
^^^^^^^^^^^^^^^^^^

The translation packet ``NETWORK_NAMESPACE`` launches the process in a
new network namespace.
Without further configuration, this leaves the process without access
to the network, because there is no network device in the new
namespace.

The packet ``NETWORK_NAMESPACE_NAME`` instead reassociates the process
with an existing network namespace configured with ``ip netns``.

Requires Linux 2.6.29 or newer.

Mount Namespaces
^^^^^^^^^^^^^^^^

A mount namespace makes the VFS mount table private to the new
process. This namespace is created implicitly by the packets described
in this section.

- ``PIVOT_ROOT`` works like the ``chroot`` command; its payload
  specifies the directory which will be the new root. All other mounts
  will be removed from the namespace. The new root must contain a
  top-level directory called ``mnt``. It will be mounted read-only and
  with option ``nosuid``.

- ``CHROOT`` is plain old ``chroot()``. Can be combined with
  ``PIVOT_ROOT``; and unlike that command, it does not need a
  top-level ``mnt`` directory.

- ``MOUNT_ROOT_TMPFS`` creates an empty read-only ``tmpfs`` as the
  filesystem root. All required mountpoints will be created, but the
  filesystem will contain nothing else.

- ``TMPFS_DIRS_READABLE``: Make all directories created in tmpfs
  (``MOUNT_ROOT_TMPFS``, ``MOUNT_EMPTY``) readable. By default, such
  directories are only "executable", but not "readable".

- ``MOUNT_PROC`` mounts a new read-only instance of the ``proc``
  filesystem.

- ``MOUNT_DEV`` mounts a minimalistic :file:`/dev`.

- ``MOUNT_HOME`` bind-mounts the home directory (specified by
  ``HOME``) to the given directory within the ``PIVOT_ROOT``. It will
  be mounted with option ``nosuid``.

- ``MOUNT_TMP_TMPFS`` mounts a new ``tmpfs`` on :file:`/tmp`. This is
  private to the namespace and is deleted when the process exits. The
  payload may specify additional ``tmpfs`` mount options such as
  ``size=64M``. By default, code execution from this filesystem is
  disabled via ``MS_NOEXEC``. A follow-up ``MOUNT_TMP_TMPFS_EXEC``
  packet disables this behavior, i.e. allows executing code from this
  ``tmpfs``.
- ``MOUNT_TMPFS`` mounts a new (user-writable) ``tmpfs`` on the given
  path. This is private to the namespace and is deleted when the
  process exits.

- ``MOUNT_NAMED_TMPFS`` mounts a new (user-writable) ``tmpfs`` on the
  given path that can be shared across processes. The payload is the
  name of the tmpfs source directory and the target directory
  (absolute path within the new root), separated by a null byte. The
  ``tmpfs`` will be deleted if it is not used for a certain amount of
  time.

- ``MOUNT_EMPTY`` mounts a new (read-only) ``tmpfs`` on the given
  path. Inside this filesystem, mount points will be created
  automatically. Other than that, it can be used to hide parts of an
  existing filesystem.

- ``BIND_MOUNT`` mounts arbitrary directories from the old root into
  the new root. The payload is the source directory and the target
  directory (absolute path within the new root), separated by a null
  byte. The new mount will have the options
  ``ro,noexec,nosuid,nodev``.

  The source directory is an absolute path on the host. If it is
  prefixed with ``container:``, it is relative to the new mount
  namespace, i.e. the container. The prefix ``host:`` is the same as
  no prefix.

  This (and all variants of this packet) may be followed by an empty
  ``OPTIONAL`` packet: if the source directory does not exist, this
  directive is ignored silently.

- ``EXPAND_BIND_MOUNT`` is the same as ``BIND_MOUNT``, but the source
  directory is expanded using ``REGEX`` results.

- ``BIND_MOUNT_RW`` and ``EXPAND_BIND_MOUNT_RW`` do the same, just in
  writable mode (mount option ``rw``). ``BIND_MOUNT_EXEC`` and
  ``EXPAND_BIND_MOUNT_EXEC`` omit the ``noexec`` option.
  ``BIND_MOUNT_RW_EXEC`` makes the mount both writable and executable.

- ``BIND_MOUNT_FILE`` mounts a (read-only, non-executable) regular
  file onto an existing regular file. The payload is the source path
  (absolute within the old root) and the target path (absolute within
  the new root), separated by a null byte.
  ``BIND_MOUNT_FILE_EXEC`` omits the ``noexec`` option.

- ``MOUNT_LISTEN_STREAM`` creates a stream listener socket and mounts
  it at the specified path into the container. Once the first process
  connects to this socket, :program:`beng-proxy` sends a request to
  the translation server echoing just this packet; its response may
  contain one of:

  - ``STATUS``: an error condition.

  - ``EXECUTE``: a process to be spawned which starts with the
    listener socket on stdin.

  - ``ACCEPT_HTTP``: create a transient HTTP listener which receives
    HTTP requests from the child process; a ``LISTENER_TAG`` packet
    may be present which will be echoed on all translation requests
    for this listener. If ``STATS_TAG`` is present, it will be used
    instead of ``LISTENER_TAG`` for Prometheus metrics.

  The payload is the socket path inside the new mount namespace. After
  the socket path, a null byte may follow with opaque data which is
  ignored by :program:`beng-proxy`, but which may be evaluated by the
  translation server.

- ``WRITE_FILE`` writes a small text file in a mount namespace.
  Payload is the absolute path and the file contents separated by a
  null byte. The file can either be written to a ``tmpfs`` that was
  already mounted, or bind-mounted over an existing read-only file.

- ``SYMLINK``: Create a symlink. Payload is target and linkpath
  separated by a null byte.

- ``PIVOT_ROOT`` depends on user namespaces. ``MOUNT_PROC``,
  ``MOUNT_HOME`` and ``MOUNT_TMP_TMPFS`` depend on ``PIVOT_ROOT``,
  user namespaces and PID namespaces.

UTS Namespaces
^^^^^^^^^^^^^^

A UTS namespace allows manipulating the host name reported by the
kernel. ``UTS_NAMESPACE`` creates the namespace; its payload is the new
host name.
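Several mount namespace packets described above (``BIND_MOUNT``,
``BIND_MOUNT_FILE``, ``WRITE_FILE``, ``SYMLINK``) carry two values
separated by a null byte. The following is a minimal sketch of how a
``BIND_MOUNT`` payload could be split, including the ``container:`` and
``host:`` source prefixes. The names ``BindMount`` and
``parse_bind_mount`` are hypothetical and not part of
:program:`beng-proxy`; the sketch also assumes that both paths are
UTF-8 text.

```python
from dataclasses import dataclass


@dataclass
class BindMount:
    source: str                # source path with any prefix removed
    target: str                # absolute path within the new root
    source_in_container: bool  # True if the source had the "container:" prefix


def parse_bind_mount(payload: bytes) -> BindMount:
    """Split a BIND_MOUNT-style payload into source and target.

    Hypothetical helper for illustration only.
    """
    # Source and target are separated by the first null byte.
    source, sep, target = payload.partition(b"\0")
    if not sep:
        raise ValueError("payload lacks the null byte separator")

    src = source.decode()  # assumption: paths are UTF-8
    in_container = False
    if src.startswith("container:"):
        # Source is relative to the new mount namespace.
        src = src[len("container:"):]
        in_container = True
    elif src.startswith("host:"):
        # "host:" is the same as no prefix.
        src = src[len("host:"):]

    tgt = target.decode()
    if not tgt.startswith("/"):
        raise ValueError("target must be an absolute path")
    return BindMount(src, tgt, in_container)
```

A translation server would build the payload the other way around, by
joining source and target with a single null byte.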
Namespaces Summary
^^^^^^^^^^^^^^^^^^

The following example shows part of a translation response that
attempts to execute a child process as securely as possible::

   USER_NAMESPACE
   PID_NAMESPACE
   NETWORK_NAMESPACE
   PIVOT_ROOT "/var/lib/lxc/wheezy/rootfs"
   HOME "/var/www/foo"
   MOUNT_HOME "/home/www"

The child process cannot see or kill processes other than the ones
that were started by itself. It cannot access the network. It lives in
another filesystem namespace. It can access the directory
:file:`/var/www/foo` at :file:`/home/www`. The ``proc`` filesystem is
not mounted.

Cgroups
-------

Control cgroups (“cgroups”) are a Linux kernel feature for grouping
processes. They are useful in many ways, such as assigning/accounting
resources (CPU, memory, network bandwidth, ...).

:program:`beng-proxy` can use ``cgroups`` only when launched with
``systemd``.

``CGROUP`` specifies a ``cgroup`` name for the new child process. It is
a name below :program:`beng-proxy`\ ’s own cgroup assigned by
``systemd``. All controllers managed by ``systemd`` are enabled.

``CGROUP_SET`` sets a cgroup attribute. Payload is in the form
``controller.name=value``, e.g. ``cpu.shares=42``.

``CGROUP_XATTR`` sets an extended attribute on the cgroup directory.
Payload is in the form ``namespace.name=value``, e.g.
``user.account_id=42``.

.. _childoptions:

Other Child Process Options
---------------------------

- ``UID_GID`` specifies (effective) uid and gid (and supplementary
  groups) for the child process. Payload is an array of 32 bit
  integers. All selected users and groups must be explicitly allowed
  with the ``user`` and ``group`` settings in the ``spawn``
  configuration. The default is to run child processes with the same
  unprivileged credentials as :program:`beng-proxy` itself (or the one
  specified with ``--spawn-user``).

- ``MAPPED_UID_GID`` is like ``UID_GID``, but these are the numbers
  visible inside the user namespace.
  Currently, only the uid is implemented, therefore the payload must
  be a 32-bit integer.

- ``REAL_UID_GID`` specifies the real uid and gid for the child
  process. Payload is either one or two 32 bit integers. Defaults to
  the ``UID_GID`` value. This feature works only if
  https://lore.kernel.org/linux-security-module/20250306082615.174777-1-max.kellermann@ionos.com/
  is applied. Without it, the kernel will revert the euid on
  ``execve()``.

- ``MAPPED_REAL_UID_GID`` adds user namespace mappings for
  ``REAL_UID_GID``. Currently, only the uid is implemented, therefore
  the payload must be a 32-bit integer.

- ``CAP_SYS_RESOURCE`` grants the new child process the
  CAP_SYS_RESOURCE capability, allowing it to ignore filesystem
  quotas. It is not possible to use it together with user namespaces
  (``USER_NAMESPACE``).

- ``NO_NEW_PRIVS`` permanently disables new privileges for the child
  process. That is, ``setuid`` and ``setgid`` bits are ignored on
  executed programs. It is recommended to set this flag on **all**
  processes by default, unless there are strong reasons against it.

- ``FORBID_USER_NS`` forbids the child process from creating new user
  namespaces and thus gaining a full set of capabilities. This is
  useful because there have been lots of namespace-related
  vulnerabilities in the kernel.

- ``FORBID_MULTICAST`` forbids the child process from adding multicast
  group memberships. This is useful because it disallows snooping on
  the host’s multicast traffic.

- ``FORBID_BIND`` makes ``bind()`` and ``listen()`` return ``EACCES``.

- ``ALLOW_PTRACE`` allows the child process to use the ``ptrace()``
  and similar system calls which are disallowed by default.

- ``STDERR_PATH`` specifies an absolute path that will be created. The
  child’s error messages will be appended there. ``STDERR_NULL``
  redirects standard error to :file:`/dev/null` instead.

- ``STDERR_POND`` enables the ``child_error_logger`` when it was
  disabled with ``is_default="no"`` (see :ref:`child_error_logger`).
- ``CHILD_TAG`` specifies a “tag” string for the child process. This
  can be used to address groups of child processes (e.g. for
  :ref:`FADE_CHILDREN `). A child process may have more than one tag.

.. _filter:

Filters
-------

The translation server can tell :program:`beng-proxy` to apply a filter
to the resource by sending the ``FILTER`` command. It is followed by a
packet specifying the filter server (``HTTP``, ``CGI``, ``FASTCGI``,
``PIPE``).

A filter server is an HTTP server. :program:`beng-proxy` sends the
original resource with a POST request and expects the filtered resource
as response. If the filter returns status ``200 OK`` or ``204 No
Content``, then the previous status code is used instead.

It is important that a filter is completely stateless. Running the same
filter twice on the same source must always render the same result, at
any time.

There may be more than one filter; the order of the ``PROCESS`` and
``FILTER`` packets is important.

According to the HTTP specification, ``POST`` requests are not cached.
To gain the necessary performance, :program:`beng-proxy` caches filter
results, extending the HTTP specification. This is limited to resources
which have an *ETag* response header, because :program:`beng-proxy`
uses the *ETag* internally to address cache items.

.. _chain:

Chains
------

Chained request handlers behave similarly to ``FILTER``: the current
handler's response is passed to the next handler as ``POST`` request.
But unlike ``FILTER``, :program:`beng-proxy` waits for the current
handler to generate the response, and only then asks the translation
server for further instructions. This is useful in situations where one
handler prepares something which the translation server needs to decide
about the next stage.

To enable chaining, the translation server sends a response specifying
the request handler plus a ``CHAIN`` packet with opaque payload.
Once that request handler has generated the response,
:program:`beng-proxy` sends another translation request containing a
copy of the ``CHAIN`` packet and a ``STATUS`` packet. Additionally, the
``CHAIN_HEADER`` may contain the value of the ``X-CM4all-Chain``
response header, if one exists in the current HTTP response. Now the
translation server generates another request handler, or
``BREAK_CHAIN`` to send the pending response to the browser as-is.

Example::

   request 1:
     URI "/chain/"
     HOST "example.com"
     ...
   response 1:
     HTTP "http://foo/bar/"
     CHAIN "42"
   request 2:
     CHAIN "42"
     CHAIN_HEADER "xyz"
     STATUS "200"
   response 2:
     WAS "/the/filter/program"

If the response packet ``CHAIN`` is followed by an empty
``TRANSPARENT_CHAIN`` packet, the chain handler will only see a ``GET``
request without a body, and the original request method/body will be
sent to the following request handler. In that case, the chain
handler's response body will be ignored.

.. _sessions:

Sessions
--------

:program:`beng-proxy` lets the translation server manage a “session”
variable, which may be empty, or contain an opaque string. It is up to
the translation server to manage its contents. With every translation
request, :program:`beng-proxy` sends its contents unless it is empty
(in which case it omits this parameter). With every response, the
translation server may provide a new value (which may be empty).

Additionally, the ``REALM_SESSION`` packet may contain a value that is
specific to the session realm. It is only sent to the translation
server in ``TOKEN_AUTH`` requests.

External Session Manager
^^^^^^^^^^^^^^^^^^^^^^^^

Sometimes, the translation server involves an external entity in its
session management, for example to handle authentication. The
translation server can then ask :program:`beng-proxy` to handle
refreshes by sending a ``GET`` to a specified HTTP server.
The packet ``EXTERNAL_SESSION_MANAGER`` contains the HTTP URL, and must
be followed by one or more ``ADDRESS`` packets (just like the ``HTTP``
packet). After that, the packet ``EXTERNAL_SESSION_KEEPALIVE`` may
contain a 16 bit integer specifying the refresh interval in seconds.
The refresh is performed only while handling a request for this
session.

Example::

   EXTERNAL_SESSION_MANAGER=http://foo/session/42
   ADDRESS=192.168.1.100:80
   EXTERNAL_SESSION_KEEPALIVE=300

This example sends a ``GET`` request every 5 minutes to
``http://foo/session/42`` on IP address ``192.168.1.100``.

.. _ctlookup:

``Content-Type`` Lookup
-----------------------

The presence of ``CONTENT_TYPE_LOOKUP`` in a translation response
indicates that the translation server is willing to look up
``Content-Type`` by file name suffix. It will disable the normal lookup
via *extended attributes*.

When an HTTP request for a static file is handled,
:program:`beng-proxy` will check if the file name has a “suffix” (short
alphanumeric name after a dot). It will then ask the translation server
for a ``Content-Type`` for this suffix. This translation request
contains the packets ``CONTENT_TYPE_LOOKUP`` (echoing the server’s
packet) and ``SUFFIX`` (containing the non-empty suffix without the
dot).

Example conversation:

- client sends ``BEGIN`` “\\x03”
- client sends ``CONTENT_TYPE_LOOKUP`` “foo”
- client sends ``SUFFIX`` “png”
- client sends ``END``
- server sends ``BEGIN`` “\\x03”
- server sends ``CONTENT_TYPE`` “image/png”
- server sends ``END``

If the suffix is unknown, the translation server may omit the
``CONTENT_TYPE`` packet and only reply with ``BEGIN`` and ``END``.

``AUTO_GZIPPED`` and ``AUTO_BROTLI_PATH`` may be specified if this file
type is likely to have a precompressed file in the same directory.

Additionally, the translation server may specify transformations
(``PROCESS`` or ``FILTER``) for all files of this type. They will be
applied before other transformations from the original translation
response.

.. _errdoc:

Error documents
---------------

Errors from remote servers are forwarded to the client. If no error
document is available, :program:`beng-proxy` generates a simple one.

The translation server indicates that it is willing to override the
error document by sending an empty ``ERROR_DOCUMENT`` packet in the
translation response. As soon as an error occurs (response status
400..599), :program:`beng-proxy` sends another translation request,
consisting of ``ERROR_DOCUMENT``, ``URI`` and ``STATUS``. The payload
of ``ERROR_DOCUMENT`` is opaque to :program:`beng-proxy`, and will be
echoed. The translation server responds with a pointer to another
resource which shall be used as the error document.

If the translation response is empty, or if the error document itself
fails, :program:`beng-proxy` forwards the original error document (or
generates one).

The error document cannot be filtered or processed.

CSRF Protection
---------------

To help applications fix cross-site request forgery vulnerabilities,
:program:`beng-proxy` implements the ``X-CM4all-CSRF-Token`` header.
This feature needs to be enabled explicitly with the following packets:

- ``REQUIRE_CSRF_TOKEN`` requires a valid token request header for
  modifying requests (``POST``, ``PUT`` etc.). This option is not only
  supported for regular HTTP requests, but also for widgets (for
  modifying requests to widgets).

  This requirement only applies to requests with a session cookie.
  Requests without a session are assumed to be harmless, because there
  is no authenticated identity associated with them.

- ``SEND_CSRF_TOKEN`` adds a valid token header to successful
  responses. This option is not supported for widgets.

Covert cross-site requests that don't have this header (with a valid
value) will be denied with status ``403 Forbidden``, effectively
avoiding this kind of vulnerability.

Clients can obtain a token by inspecting the response header of a
request to a location with ``SEND_CSRF_TOKEN`` enabled.
They may then use this token in subsequent modifying requests to
``REQUIRE_CSRF_TOKEN`` locations. This token is specific to the session
and expires after a while (currently an hour). It can be reused until
it expires.

Since this is implemented as a header, this cannot be used for plain
``HTML FORM`` requests. If the client is a browser, it is necessary to
use the ``XMLHttpRequest`` or ``Fetch`` API which allows sending custom
headers.

.. _registry:

Widget registry
---------------

The translation server provides access to the widget database, where
all widget servers are registered.

A widget request can use the following packets:

- ``WIDGET_TYPE``: the name of the widget type

The translation server’s response consists of these packets:

- ``STATUS``: in case of a lookup error, this packet provides the HTTP
  status code

- ``PATH``, ``CGI``, ``HTTP``: choose one of these packets: a static
  widget (local file path), a local CGI script, or a HTTP server

- ``PROCESS``: enable the BENG processor

- ``UNTRUSTED``: sets the externally visible host name for requests
  which are proxied to this widget. This marks the widget as
  “untrusted” and disallows any other way of embedding it. This is
  useful for widget code whose JavaScript must not be executed in the
  same context as another widget.

- ``UNTRUSTED_PREFIX``: same as ``UNTRUSTED``, but is a prefix for the
  request host name. This widget can only be used when the request’s
  ``UNTRUSTED`` packet begins with this prefix. Example:
  ``UNTRUSTED_PREFIX="foo"`` matches a request with
  ``UNTRUSTED="foo.example.com"``, but not
  ``UNTRUSTED="foobar.example.com"``.

- ``UNTRUSTED_SITE_SUFFIX``: similar to ``UNTRUSTED_PREFIX``, but
  matches the suffix instead of the prefix. When generating untrusted
  URIs, the site name is prepended. During verification, the request’s
  ``UNTRUSTED`` value must exactly match this scheme.

- ``UNTRUSTED_RAW_SITE_SUFFIX``: Like ``UNTRUSTED_SITE_SUFFIX``, but do
  not insert a dot.
- ``DIRECT_ADDRESSING``: Enable “direct” URI addressing for this
  widget. It is used when the widget is requested in a “frame”. It is a
  simpler scheme that is more natural; relative links can be built
  without URI rewriting and without the special :program:`beng-proxy`
  encoding. In some cases, the processor can therefore be disabled,
  reducing overhead.

- ``STATEFUL``: Remember the state of this widget, i.e. path info and
  query string. It is remembered for ``GET`` requests to the widget
  when it is focused and the XML processor is enabled. ``POST``
  requests do not update the state because the ``POST`` URI may not be
  valid in a follow-up ``GET`` request. AJAX requests on the other hand
  should not update the state, and they do not because they usually do
  not use the XML processor, which is only useful for generating the
  initial HTML page, and not for incremental (AJAX) updates.

- ``WIDGET_INFO``: Send the request headers ``X-CM4all-Widget-Id``,
  ``X-CM4all-Widget-Type`` and ``X-CM4all-Widget-Prefix`` to the widget
  server. (Since version 1.3.2)

.. _local_uri:

- ``LOCAL_URI``: The URI of the "local" location of a widget class.
  This may refer to a location that serves static resources. It is used
  by the processor for rewriting URIs beginning with ``@/`` (see
  :ref:`uriat`). The payload must end with a slash.
  :program:`beng-proxy` does not process this URI. It is going to be
  evaluated by the browser, and may be absolute. For example, it may
  refer to a dedicated resource server.

- ``DUMP_HEADERS``: Enable header dumps for the widget: on a HTTP
  request, the request and response headers will be logged. Only for
  debugging purposes.

- ``PEEK``: Mark this request as a "peek" request, which means the
  server shall generate the translation response, but shall not account
  it (e.g. shall not mark a ticket as "consumed").

.. _login:

Login translation
-----------------

To support interactive login, the translation server can implement this
protocol.
It translates a user name to information on how to launch the user’s
processes.

The request contains the following packets:

- ``LOGIN``: Marks this request as a “login” request. No payload.

- ``SERVICE``: Payload specifies the service that wants to log in.
  Examples of well-known service names:

  - ``ssh``: Secure Shell. The response describes how to execute
    commands in an SSH session channel.

  - ``sftp``: SSH File Transfer Protocol, i.e. SSH subsystem
    ``sftp``.

  - ``rsync``: rsync over SSH. This request is sent by Lukko when it
    sees a ``rsync --server`` command. The response contains an
    ``EXECUTE`` packet with a path to a statically linked ``rsync``
    executable that will be executed using ``execveat()``.

- ``LISTENER_TAG``: A string which specifies the listener this login
  was accepted on; this is optional and its configuration is specific
  to the translation client.

- ``USER``: Contains the user name specified by the client.

- ``PASSWORD``: If this packet is present, then the client asks to
  verify a password (clear-text in the payload). A password mismatch
  must result in a negative reply.

If the user does not exist, the translation server shall respond with
``STATUS=404``.

A successful response must contain at least ``HOME`` and
``UID_GID``:

- ``HOME``: Path of the user’s home directory.

- ``SHELL``: An absolute path specifying the user’s shell.

- ``UID_GID``: Specify uid and gid (and supplementary groups) for the
  child process. Payload is an array of 32-bit integers.

- ``TOKEN``: A token to be matched by the OpenSSH configuration file.

- ``NO_PASSWORD``: If present, then the ``LOGIN`` request can be
  approved without a password. This can happen when the username is a
  secret token. An optional payload may describe a service-specific
  limitation, e.g. ``sftp`` to limit ``LOGIN/SERVICE=ssh`` to
  ``SERVICE=sftp``.

- ``AUTHORIZED_KEYS``: The contents of an OpenSSH
  :file:`authorized_keys` file.
- ``NO_HOME_AUTHORIZED_KEYS``: If present, then
  :file:`~/.ssh/authorized_keys` is not used.

- ``SERVICE``: Begin a new partition of the response for the
  specified service. The translation server can use this to return a
  separate response for each supported service within a single
  response. This is useful if the request was ``SERVICE=ssh`` when
  the client (i.e. the SSH server, i.e. Lukko) does not yet know
  whether the SSH client will open a shell or an SFTP session.
  Returning all possible services eliminates further translation
  requests: the translation server promises that these are the only
  allowed services (in the context of the ``SERVICE`` specified in
  the request) and all other services shall be denied.

.. _cron:

Cron translation
----------------

This sub-protocol can tell the ``cron`` job execution layer of
*Workshop* how to spawn a child process.

The request contains the following packets:

- ``CRON``: Marks this request as a “cron” request. The payload is
  the name of the ``cron`` section in Workshop’s configuration file,
  or none if none was specified there.

- ``URI``: If the job refers to a URN instead of a command, then this
  packet is present and contains the URN. A successful response must
  specify the program to be executed in ``EXECUTE`` with command-line
  arguments in ``APPEND`` packets.

- ``USER``: The account id owning the job.

- ``PARAM``: An opaque string from the cron job table. Its contents
  are specific to the translation server; they should be considered
  user input and should not be trusted. Optional.

If the account does not exist, the translation server shall respond
with ``STATUS=404``. If no ``STATUS`` packet is present, the request
is assumed to be successful.

A successful response must contain at least ``HOME`` and
``UID_GID``:

- ``HOME``: Path of the user’s home directory.

- ``UID_GID``: Specify uid and gid (and supplementary groups) for the
  child process. Payload is an array of 32-bit integers.

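
The following is a minimal sketch of how a translation server might
serialize the packets of a successful cron response, using the packet
layout described under "Command packets" (16-bit length, 16-bit
command, payload, host byte order). The numeric command codes in
``CMD`` are hypothetical placeholders, the element order of the
``UID_GID`` array (uid, gid, then supplementary groups) is an
assumption, and the ``BEGIN``/``END`` framing is omitted.

```python
import struct

# Hypothetical command codes for illustration only; the real values
# are defined by the protocol and are not listed in this section.
CMD = {"HOME": 1001, "UID_GID": 1002, "EXECUTE": 1003, "APPEND": 1004}

# Packet header: payload length, then command, both 16 bit,
# in host byte order without padding.
HEADER = struct.Struct("=HH")


def encode_packet(name, payload=b""):
    """Serialize one packet: header followed by the raw payload."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload exceeds 65535 bytes")
    return HEADER.pack(len(payload), CMD[name]) + payload


def encode_uid_gid(uid, gid, groups=()):
    """Pack uid, gid and supplementary groups as 32-bit integers.

    The order (uid, gid, groups...) is an assumption of this sketch.
    """
    values = (uid, gid, *groups)
    return struct.pack("=%dI" % len(values), *values)


def cron_response(home, uid, gid, program, args=(), groups=()):
    """Build the body packets of a successful cron response."""
    packets = [
        encode_packet("HOME", home.encode()),
        encode_packet("UID_GID", encode_uid_gid(uid, gid, groups)),
        encode_packet("EXECUTE", program.encode()),
    ]
    packets += [encode_packet("APPEND", arg.encode()) for arg in args]
    return b"".join(packets)


def decode_packets(data):
    """Split a byte string back into (name, payload) pairs."""
    names = {code: name for name, code in CMD.items()}
    result, offset = [], 0
    while offset < len(data):
        length, command = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        result.append((names[command], data[offset:offset + length]))
        offset += length
    return result
```

Strings are sent without a terminating zero, so the length field alone
delimits each payload; ``decode_packets()`` relies on that when it
walks the buffer.
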
Additional packets may configure resource limits (:ref:`rlimits`,
:ref:`ns`) and so on (:ref:`childoptions`).

The client may assume that all responses may be cached indefinitely.

.. _execute:

Execute translation
-------------------

This sub-protocol is used to query how to spawn a process which was
requested to be executed.

The request contains the following packets:

- ``EXECUTE``: Marks this request as an "execute" request. The
  payload is a token describing which process shall be executed. This
  token was provided by an unprivileged process and should not be
  trusted.

- ``PARAM``: An opaque parameter with more details about the process.
  This parameter was provided by an unprivileged process and should
  not be trusted.

- ``SERVICE``: Payload specifies the service that wants to execute
  the process, e.g. :samp:`workshop`.

- ``LISTENER_TAG``: A tag which was set in the client's configuration
  file.

- ``PLAN``: If this request was triggered by a Workshop plan, then
  this is its name.

A successful response contains at least ``EXECUTE`` with the path of
the program to be spawned, plus the usual process parameters. A
failed response contains ``STATUS`` and optionally ``MESSAGE``.

- ``HOME``: Path of the user’s home directory.

- ``UID_GID``: Specify uid and gid (and supplementary groups) for the
  child process. Payload is an array of 32-bit integers.

.. _pooltrans:

Pool translation
----------------

This sub-protocol is used by :program:`beng-lb`. It allows the
translation server to choose a pool which shall handle a specific
HTTP request.

The request contains the following packets:

- ``POOL``: Marks this request as a “pool” request. The payload is
  the name of the ``translation_handler`` section in ``lb.conf``.

- ``HOST``: the ``Host`` HTTP request header

The response contains the following packets:

- ``POOL``: The name of the pool (or ``branch`` or ``lua_handler``
  ...) which shall handle the HTTP request.
- ``CANONICAL_HOST``: A string which shall be used instead of the
  ``Host`` request header for the “host” sticky mode.

- ``SITE``: Optional identification or name of the site this resource
  belongs to. It has no meaning for :program:`beng-lb`, and is only
  used for ``TCACHE_INVALIDATE``.

- ``STATUS``: Can be used instead of ``POOL`` to generate a brief
  error response.

- ``REDIRECT``: Can be used instead of ``POOL`` to generate a
  redirect response (``303 See Other`` with the specified
  ``Location`` header value). Can be combined with ``STATUS`` to
  select a different status code.

- ``HTTPS_ONLY``: See page .

- ``MESSAGE``: Can be used instead of ``POOL`` to generate a
  ``text/plain`` response. Can be combined with ``STATUS`` and
  ``REDIRECT``.

- ``VARY``: See page .

- ``ARCH``: Prefer this CPU architecture for the selected pool
  member. Payload can be ``amd64`` or ``arm64``. If no member with a
  matching architecture exists, the behavior is unspecified; the
  request may fail or be forwarded to a server with a mismatching
  architecture. (This is implemented for ``rendezvous_hashing``
  only.)

The client may assume that all responses may be cached indefinitely.
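
To illustrate the decision a translation server makes for a pool
request, here is a minimal sketch that maps the request's ``POOL``
and ``HOST`` payloads to a list of response packets. Packets are
modelled as ``(name, value)`` tuples rather than wire bytes. The
tables ``ROUTES`` and ``REDIRECTS``, the pool names, the host names
and the function ``translate_pool()`` are all hypothetical.

```python
# Hypothetical routing table:
# translation_handler name -> {host name -> pool name}.
ROUTES = {
    "main": {
        "www.example.com": "web_pool",
        "api.example.com": "api_pool",
    },
}

# Hypothetical hosts that are redirected instead of being routed.
REDIRECTS = {"example.com": "https://www.example.com/"}


def translate_pool(handler, host):
    """Return the response packets for one pool request.

    handler: payload of the request's POOL packet
    host:    payload of the request's HOST packet
    """
    host = host.lower()

    # REDIRECT replaces POOL; without STATUS the client answers
    # with "303 See Other".
    if host in REDIRECTS:
        return [("REDIRECT", REDIRECTS[host])]

    pool = ROUTES.get(handler, {}).get(host)
    if pool is None:
        # STATUS plus MESSAGE replaces POOL with a text/plain error.
        return [("STATUS", 404), ("MESSAGE", "No such site")]

    # Normal case: choose the pool, pass the normalized host name
    # for the "host" sticky mode, and tag the response with a site.
    return [("POOL", pool), ("CANONICAL_HOST", host), ("SITE", host)]
```

Because the client may cache responses indefinitely, the sketch
derives its result only from the request packets; a lookup that
depended on changing external state would need a way to invalidate
cached responses.
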