beng-proxy¶
Features¶
beng-proxy delivers resources via HTTP. In its simplest form, it provides a resource in pass-through mode, acting as an HTTP proxy.
It caches resources if possible.
It can filter any resource by POSTing it to an HTTP server, e.g. to apply XSLT to an XML resource.
On HTML resources, it can apply a simple template language. This language provides commands to insert another HTML page, which is called a Widget.
Widgets¶
A Widget is an object which can be inserted into a web site. It is rendered by a Widget server into HTML.
We do not assume that we can trust the widget server. As a consequence, we have to ensure that a malicious widget server cannot compromise the security of beng-proxy, the client or even other widget servers.
There is a global registry for well-known preconfigured widgets. The user can also choose to run his own (non-registered) widget server. In fact, any public HTTP server should be able to act as a widget server.
JavaScript¶
Since all widgets are put together into a single HTML page, all of the
JavaScript runs in the same security context. That will open the door
for malicious widget servers, which are now able to take over the full
web site, including all other widgets. For that reason, only
well-known and trusted widget servers should be allowed to be
inlined. All other widgets must be embedded in an IFRAME in another
domain.
Forms¶
beng-proxy itself does not use the query string and the request body. Both are forwarded to the “focused” widget. See Focus for information on widget focus.
Installation¶
beng-proxy requires a Debian Bullseye operating system: Linux kernel 5.10, glibc 2.31. For compiling the source code, you need a C++23 compiler, e.g. gcc 12.
Install the package cm4all-beng-proxy and the translation server of
your choice.
Configuration¶
The file /etc/cm4all/beng/proxy/beng-proxy.conf contains
beng-proxy’s configuration. The following options are
available:
@include¶
Include another file. Example:
@include "foo/bar.conf"
@include_optional "foo/may-not-exist.conf"
@include "wildcard/*.conf"
The second line silently ignores non-existing files.
The third line includes all files in the directory wildcard ending
with .conf.
The specified file name may be relative to the including file.
Variables (@set)¶
Set a variable. Within double-quoted strings, variables can be expanded
with ${name}. Example:
@set foo = "192.168.1.42"
@set bar = "${foo}:80"
listener {
bind "${bar}"
}
At the time of this writing, the concept of variables is not well-implemented. For example, (backslash) escape sequences don’t work, and the scope of variables is not defined. For now, use variables only for very simple things.
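The expansion rule can be illustrated with a minimal sketch. This is not beng-proxy's parser; the function name `expand` and the regular expression are assumptions made for illustration, and the sketch deliberately ignores escape sequences and scoping, which the text above says are not well-defined.

```python
import re

# Hypothetical helper (not beng-proxy's actual parser): expands ${name}
# references inside a double-quoted value using previously set variables.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand(value, variables):
    """Replace every ${name} in value with variables[name]."""
    def lookup(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError("undefined variable: " + name)
        return variables[name]
    return _VAR.sub(lookup, value)


# Mirrors the two @set lines from the example above.
variables = {}
variables["foo"] = "192.168.1.42"
variables["bar"] = expand("${foo}:80", variables)
print(variables["bar"])  # 192.168.1.42:80
```

Raising on an undefined variable is a design choice of this sketch; the text does not say how beng-proxy treats that case.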
Translation Servers¶
The setting translation_socket specifies the translation server’s
socket. It can be specified multiple times to support
translation deferral. Example:
translation_socket "@translation1"
translation_socket "@translation2"
The default is @translation.
listener¶
Listen for HTTP requests on the configured address. Example:
listener {
bind "*:80"
tag "foo"
zeroconf_service "beng-proxy"
}
This binds to all interfaces on port 80. The (optional) tag is set to “foo”.
Known attributes:
bind: an address to bind to. May be the wildcard * or an IPv4/IPv6 address followed by a port. If you omit the port number, it will default to 80. Specifying port 0 will auto-select a free port (which makes sense only if you publish the listener with Zeroconf). IPv6 addresses should be enclosed in square brackets to disambiguate the port separator. Local sockets start with a slash /, and abstract sockets start with the symbol @.
interface: limit this listener to the given network interface.
mode: for local socket files, this specifies the octal file mode.
mptcp: yes enables Multi-Path TCP.
ack_timeout: close the connection if transmitted data remains unacknowledged by the client for this number of seconds. By default, dead connections can remain open for up to 20 minutes.
keepalive: yes enables the socket option SO_KEEPALIVE. This causes some traffic for the keepalive probes, but allows detecting disappeared clients even when there is no traffic.
v6only: no disables IPv4 support on IPv6 listeners (IPV6_V6ONLY). The default is yes.
reuse_port: yes enables the socket option SO_REUSEPORT, which allows multiple sockets to bind to the same port.
free_bind: yes enables the socket option IP_FREEBIND, which allows binding to an address which does not yet exist. This is useful when the daemon shall be started before all network interfaces are up and configured.
tag: a tag, to be passed to the translation server in a LISTENER_TAG packet.
access_logger: no disables the access logger on this listener. A value other than yes or no selects a named access_logger block (see Logging Protocol).
access_logger_only_errors: yes limits the access log to failed requests (HTTP status 4xx and 5xx).
auth_alt_host: yes forwards the value of the X-CM4all-AltHost request header to the translation server in AUTH requests.
ssl: yes enables SSL/TLS.
ssl_cert: add a certificate/key pair to the listener. If ssl is enabled, at least one pair must be configured; if there is more than one, the server will choose one according to the SNI parameter received from the client.
ssl_verify and ssl_ca_cert can be used to enable client certificate verification (see Client Certificates for details). To generate the request headers X-CM4all-BENG-Peer-Subject and X-CM4all-BENG-Peer-Issuer-Subject, the SSL request header group must be set to MANGLE (see Forwarding HTTP Headers).
zeroconf_service: if specified, then register this listener as Zeroconf service in the local Avahi daemon. This can be used by beng-lb to discover pool members.
zeroconf_domain (optional): The name of the Zeroconf domain.
zeroconf_interface: publish the Zeroconf service only on the given interface.
zeroconf_protocol (optional): Publish only protocol inet or inet6.
zeroconf_weight: publish the Zeroconf service with the specified “weight”, i.e. ask beng-lb to use this weight when choosing nodes (works only with rendezvous_hashing). The value is a decimal number; the implied default value is 1.0. For example, if you specify 0.5, you expect this node to get only half as many requests as others.
translation_socket: if at least one is specified, then this translation server is used instead of one from the global configuration (see Translation Servers).
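The address forms accepted by bind can be summarized in a short classification sketch. The function `parse_bind`, its return shape and the socket path used in the usage note are hypothetical; only the rules themselves (wildcard, default port 80, bracketed IPv6, leading slash, leading @) come from the description of bind above.

```python
def parse_bind(value, default_port=80):
    """Classify a bind value as (kind, address, port).

    Hypothetical helper for illustration; not beng-proxy's parser.
    """
    if value.startswith("/"):
        return ("local", value, None)          # local socket file
    if value.startswith("@"):
        return ("abstract", value[1:], None)   # abstract socket
    if value.startswith("["):
        # bracketed IPv6 address, optionally followed by ":port"
        host, bracket, rest = value[1:].partition("]")
        if not bracket:
            raise ValueError("missing closing bracket")
        if rest == "":
            return ("inet", host, default_port)
        if not rest.startswith(":"):
            raise ValueError("garbage after closing bracket")
        return ("inet", host, int(rest[1:]))
    if value.count(":") > 1:
        # unbracketed IPv6 address: a port cannot be told apart, assume none
        return ("inet", value, default_port)
    host, colon, port = value.rpartition(":")
    if not colon:
        return ("inet", value, default_port)   # port omitted
    return ("inet", host, int(port))
```

For example, `parse_bind("*:80")` yields `("inet", "*", 80)` and `parse_bind("/run/example.sock")` (a made-up path) yields `("local", "/run/example.sock", None)`. Treating an unbracketed address with several colons as port-less is a choice of this sketch.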
ssl_client¶
Configures the SSL/TLS client (for HTTPS). Example:
ssl_client {
cert "/etc/ssl/certs/ssl-cert-snakeoil.pem" "/etc/ssl/private/ssl-cert-snakeoil.key"
}
The section contains a cert line for each client certificate to be
used for outgoing SSL/TLS connections. Each time a server asks for a
client certificate, beng-proxy will look for a matching
certificate for the requested certificate authority.
Instead of letting beng-proxy choose a matching certificate, the translation server can specify a certificate by its name. To give a certificate a name, add a third parameter:
ssl_client {
cert "/etc/ssl/certs/ssl-cert-snakeoil.pem" "/etc/ssl/private/ssl-cert-snakeoil.key" "thename"
}
Now the translation server can send the CERTIFICATE packet with
payload thename to select this certificate.
control¶
See Configuring.
spawn¶
Configures the process spawner. Example:
spawn {
default_user "www-data"
allow_user "www-data"
allow_group "www-data"
CPUWeight "50"
TasksMax "100"
MemoryHigh "12 GB"
MemoryMax "16 GB"
IOWeight "50"
}
default_user: a user name which is used if the translation server does not specify a user id.
allow_user: allow child processes to impersonate the given user. This can be a user name (from /etc/passwd), a numeric user id or an open range (e.g. 2147483648- which allows all user ids from 2147483648 on).
allow_group: allow child processes to impersonate the given group.
cgroups_writable_by_group: make this group the owner of all cgroups and grant the group write access.
CPUWeight: CPU weight for all spawned processes combined (1..10000). systemd’s default is 100.
TasksMax: maximum number of tasks (1..). systemd sets no limit by default.
MemoryMin: “If the memory usage of a cgroup is within its effective min boundary, the cgroup’s memory won’t be reclaimed under any conditions. If there is no unprotected reclaimable memory available, OOM killer is invoked.” (https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#memory-interface-files)
MemoryLow: “Best-effort memory protection. If the memory usage of a cgroup is within its effective low boundary, the cgroup’s memory won’t be reclaimed unless there is no reclaimable memory available in unprotected cgroups.” (https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#memory-interface-files)
MemoryHigh: “Specify the throttling limit on memory usage of the executed processes in this unit. Memory usage may go above the limit if unavoidable, but the processes are heavily slowed down and memory is taken away aggressively in such cases. This is the main mechanism to control memory usage of a unit.” (systemd.resource-control(5))
MemoryMax: “Specify the absolute limit on memory usage of the executed processes in this unit. If memory usage cannot be contained under the limit, out-of-memory killer is invoked inside the unit.” (systemd.resource-control(5))
MemorySwapMax: “Swap usage hard limit. If a cgroup’s swap usage reaches this limit, anonymous memory of the cgroup will not be swapped out.” (https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#memory-interface-files)
IOWeight: IO weight for all spawned processes combined (1..10000). systemd’s default is 100.
Memory limits are in bytes and may be postfixed with kB, MB,
GB or TB. Percent values are relative to total physical
memory.
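How such a value might be interpreted is sketched below. The function name and the regular expression are hypothetical, and the multiplier is an assumption: the text does not say whether kB means 1000 or 1024 bytes, so the sketch uses 1024 and marks that choice in a comment.

```python
import re

# ASSUMPTION: binary multiples (1 kB = 1024 bytes). The documentation above
# does not state which multiplier beng-proxy actually uses.
_UNITS = {"": 1, "kB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3, "TB": 1024 ** 4}
_SIZE = re.compile(r"^\s*(\d+)\s*(kB|MB|GB|TB|%)?\s*$")


def parse_memory_limit(text, total_physical):
    """Convert a limit such as "12 GB" or "50%" to a number of bytes.

    total_physical is the machine's physical memory in bytes; it is only
    consulted for percent values.
    """
    match = _SIZE.match(text)
    if match is None:
        raise ValueError("malformed memory limit: %r" % text)
    number = int(match.group(1))
    unit = match.group(2) or ""
    if unit == "%":
        return total_physical * number // 100
    return number * _UNITS[unit]
```

Under these assumptions, `parse_memory_limit("12 GB", total)` returns 12 * 1024**3 regardless of `total`, while `"50%"` returns half of `total`.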
set¶
Tweak global settings. Most of these are legacy from the old --set
command-line option. Do not confuse with @set, which sets
configuration parser variables! Syntax:
set NAME = "VALUE"
The following settings are available:
session_cookie: The name of the session cookie. The default value is beng_proxy_session.
session_cookie_same_site: Enable the SameSite attribute in the session cookie (see RFC 6265 5.3.7). Supported values are strict, lax and none.
dynamic_session_cookie: Append a suffix to the session cookie generated from the Host request header if set to yes. This is a measure to increase session separation of different hosts under the same domain, accounting for mainstream user agents that are known to ignore the Domain cookie attribute. It is not guaranteed to be collision-free.
session_idle_timeout: After this duration, a session expires, unless it gets refreshed by a request. Example: 30 minutes.
max_connections: The maximum number of incoming HTTP connections.
tcp_stock_limit: The maximum number of outgoing TCP connections per remote host. 0 means unlimited, which has proven to be a bad choice, because many servers do not scale well.
lhttp_stock_limit: The maximum number of LHTTP process copies. 0 means unlimited.
lhttp_stock_max_idle: The maximum number of idle LHTTP process copies. If there are more than that, a timer will incrementally kill excess processes.
fastcgi_stock_limit: The maximum number of child processes for one FastCGI application. 0 means unlimited.
fastcgi_stock_max_idle: The maximum number of idle child processes for one FastCGI application. If there are more than that, a timer will incrementally kill excess processes.
was_stock_limit: The maximum number of child processes for one WAS application. 0 means unlimited.
was_stock_max_idle: The maximum number of idle child processes for one WAS application. If there are more than that, a timer will incrementally kill excess processes.
multi_was_stock_limit: The maximum number of child processes for one Multi-WAS application. 0 means unlimited.
multi_was_stock_max_idle: The maximum number of idle child processes for one Multi-WAS application. If there are more than that, a timer will incrementally kill excess processes.
remote_was_stock_limit: The maximum number of Multi-WAS connections to one Remote-WAS application. 0 means unlimited.
remote_was_stock_max_idle: The maximum number of idle Multi-WAS connections to one Remote-WAS application. If there are more than that, a timer will incrementally kill excess connections.
{lhttp,fastcgi,was,multi_was,remote_was}_stock_max_wait: If the stock wait time goes above this threshold, then further waiters fail with HTTP status “429 Too Many Requests”. The value is a duration with a unit, e.g. 5s or 800ms. By default, there is no maximum wait time.
http_cache_size: The maximum amount of memory used by the HTTP cache. Set to 0 to disable the HTTP cache.
http_cache_obey_no_cache: Set to no to ignore no-cache specifications in Pragma and Cache-Control request headers.
filter_cache_size: The maximum amount of memory used by the filter cache. Set to 0 to disable the filter cache.
encoding_cache_size: The maximum amount of memory used by the encoding cache (which caches compressed responses). Set to 0 to disable the encoding cache.
translate_cache_size: The maximum number of cached translation server responses. Set to 0 to disable the translate cache.
translate_stock_limit: The maximum number of concurrent connections to the translation server. Set to 0 to disable the limit. The default is 64.
populate_io_buffers: yes populates all I/O buffers on startup. This reduces waits for Linux kernel VM compaction/migration.
populate_translate_cache, populate_http_cache, populate_filter_cache, populate_encoding_cache: like populate_io_buffers, but for the respective cache subsystem.
use_io_uring: Set to no to disable the use of io_uring, which can make debugging with strace easier, because strace cannot see io_uring operations. This is the global knob; with no, all other io_uring settings mentioned below have no effect.
http_io_uring: Enables or disables io_uring for HTTP connections.
was_io_uring: Enables or disables io_uring for communication with WAS applications.
io_uring_sqpoll: Enables io_uring submit-queue polling. This reduces the number of io_uring_enter() system calls at the cost of a kernel thread running at 100% all the time, busy-polling for new entries.
io_uring_sq_thread_cpu: Bind the io_uring_sqpoll kernel worker thread to the specified CPU.
verbose_response: Set to yes to reveal internal error messages in HTTP responses.
session_save_path: A file path where all sessions will be saved periodically and on shutdown. On startup, it will attempt to load the sessions from there. This option allows restarting the server without losing sessions.
All memory sizes can be suffixed using kB, MB or GB.
Cluster Options¶
To run beng-proxy as a beng-lb cluster node with sticky sessions, each node needs special configuration. It needs to generate new session numbers in a way that allows beng-lb to derive the cluster node from it.
To do that, specify the two command line options --cluster-size
and --cluster-node to each beng-proxy node. Example for
a cluster with 3 nodes:
first# cm4all-beng-proxy --cluster-size=3 --cluster-node=0 ...
second# cm4all-beng-proxy --cluster-size=3 --cluster-node=1 ...
third# cm4all-beng-proxy --cluster-size=3 --cluster-node=2 ...
Each node number is assigned to exactly one cluster node.
The corresponding lb.conf would look like this:
pool foo {
sticky "session_modulo"
member first:http
member second:http
member third:http
}
The ordering of nodes matters. beng-lb assumes that the
first node runs with --cluster-node=0, the second node runs with
--cluster-node=1 and so on.
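The relation between session numbers and node numbers can be pictured with a small sketch. The text does not describe the actual session id layout; the sketch assumes the plain modulo relation suggested by the name session_modulo, and both function names are hypothetical.

```python
import secrets


def new_session_number(cluster_size, cluster_node):
    """Return a large random number n with n % cluster_size == cluster_node.

    ASSUMPTION: this is only one way to satisfy the requirement described
    above; it is not taken from the beng-proxy source.
    """
    if not 0 <= cluster_node < cluster_size:
        raise ValueError("cluster_node must be in 0..cluster_size-1")
    n = secrets.randbits(64)
    n -= n % cluster_size          # round down to a multiple of cluster_size
    return n + cluster_node


def node_for_session(session_number, cluster_size):
    """What a load balancer could compute to find the owning node."""
    return session_number % cluster_size
```

With three nodes, every number produced by `new_session_number(3, 1)` maps back to node 1 via `node_for_session(n, 3)`, which is why the member order in lb.conf has to match the --cluster-node values.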
Running¶
Signals¶
SIGTERM on the master process initiates shutdown.
On SIGHUP, the error log file is reopened, all caches are flushed
and all spawned child processes are faded out (see
FADE_CHILDREN).
Triggers¶
The Debian trigger cm4all-apps-changed reloads all spawned
applications. It shall be invoked after updating application packages
(or widgets).
Tuning¶
Optimized Build¶
The default package cm4all-beng-proxy is built with debugging code
enabled. It is about 2-10 times slower than the optimized build. If
performance really counts, you should install the package
cm4all-beng-proxy-optimized instead (and restart the daemon).
To switch back to the debug build, uninstall
cm4all-beng-proxy-optimized and then reinstall cm4all-beng-proxy
to get the old /usr/sbin/cm4all-beng-proxy back. Finally, restart
the daemon.
Resource Limits¶
beng-proxy needs to open a lot of file handles at a time, because it serves many connections in one process. Make sure that the file handle limit is adequate. The default init script sets it to 65536. The only reason to set that limit at all is to detect bugs (file descriptor leaks).
Keep in mind that beng-proxy may open more than one file descriptor per connection. For example, a connection to a WAS application needs 3 file descriptors.
Connection Limits¶
beng-proxy is very good at managing lots of incoming connections, and manages system resources economically. The default value of max_connections is 8192.
There are good reasons to limit the number of outgoing connections per
host (tcp_stock_limit): most servers don’t handle so many
connections as well as beng-proxy, and performance degrades when there
are too many. By default, there is no limit.
Pipe Limits¶
Linux has a global setting called
/proc/sys/fs/pipe-user-pages-soft which controls how many
pages of memory one user may allocate for pipe buffers. The default
setting 16384 is too small for beng-proxy, and pipes
will max out at one page, which decreases performance. It is
recommended to increase it to 1048576 by adding to
/etc/sysctl.d:
fs.pipe-user-pages-soft = 1048576
Firewall¶
Benchmarks have demonstrated that Netfilter (and its connection tracking) accounts for a good amount of the CPU load on a busy server. A good server does not need to depend on a firewall for security: rather than blocking protocols and ports, the administrator should make sure that these services aren’t bound to public interfaces in the first place. An internal service bound on all interfaces is an indicator of misconfiguration.
It is a good idea to disable the firewall (in the kernel configuration) and audit all listeners. If you cannot do without a firewall, you can disable connection tracking for beng-proxy connections:
table raw {
chain PREROUTING proto tcp dport http NOTRACK;
chain OUTPUT proto tcp sport http NOTRACK;
}
Cacheable Widgets and Containers¶
If you do a lot of direct communication with a widget, its container should be cacheable. If not, the container will be queried each time a request for the widget is handled. On pages with many widgets, you should try to make all of them cacheable. See Caching for details.
Disabling Widget Options¶
Don’t enable widget options when you don’t need them. This applies to the options “processor”, “container”, “stateful” and others. Each of them adds some bloat to the response handler, and slows down the application. See Widget registry for details.
Load Balancing¶
If a machine serving a resource is too slow, you may be able to parallelize its work. Note that this increases throughput, but usually does not reduce latency considerably. See Load balancing, failover.
The Stopwatch¶
The stopwatch measures the latency of external resources (e.g. remote
HTTP servers, CGI and pipe programs). It is only available in the
debug build (compile-time option --enable-stopwatch).
Example output:
stopwatch[172.30.0.23:80 /test.py]: request=5ms headers=85ms end=88ms (beng-proxy=1+2ms)
Here, the HTTP request to 172.30.0.23:80 was sent within 5
milliseconds. After 85 milliseconds, the response headers were
received, and after 3 more milliseconds, the response body was
received. All of these refer to wallclock time, relative to the start
of the operation. Each client library may have its own set of
breakpoints.
During this HTTP request, beng-proxy consumed 3 milliseconds of raw CPU time (not wallclock time): 1 millisecond in user space, and 2 milliseconds for the kernel.
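To post-process such lines, a small parser can split them into their parts. The following is a minimal sketch that assumes every line has exactly the shape of the example above; the function name and the returned tuple are hypothetical, and other client libraries may emit different breakpoints.

```python
import re

# Matches: stopwatch[NAME]: key=Nms key=Nms ... (beng-proxy=U+Kms)
_LINE = re.compile(
    r"^stopwatch\[(?P<name>[^\]]*)\]:\s*(?P<events>.*?)\s*"
    r"\(beng-proxy=(?P<user>\d+)\+(?P<kernel>\d+)ms\)$"
)


def parse_stopwatch(line):
    """Return (name, events, user_ms, kernel_ms) for one stopwatch line."""
    match = _LINE.match(line.strip())
    if match is None:
        raise ValueError("not a stopwatch line")
    # each breakpoint is wallclock time relative to the start of the operation
    events = {key: int(ms)
              for key, ms in re.findall(r"(\w+)=(\d+)ms", match.group("events"))}
    return (match.group("name"), events,
            int(match.group("user")), int(match.group("kernel")))


name, events, user_ms, kernel_ms = parse_stopwatch(
    "stopwatch[172.30.0.23:80 /test.py]: request=5ms headers=85ms end=88ms "
    "(beng-proxy=1+2ms)")
print(events["end"] - events["headers"])  # 3 (ms spent receiving the body)
print(user_ms + kernel_ms)                # 3 (ms of CPU time)
```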
Resources¶
beng-proxy delivers resources to its HTTP clients. It obtains these resources from several sources.
Static files¶
Local “regular” files can be served by beng-proxy. This is
the fastest mode, and should be preferred, if possible. The Range
request header is supported (bytes only).
Directory index¶
For security (by obscurity) reasons, beng-proxy has no code for generating directory listings.
HTTP proxying¶
beng-proxy implements an HTTP client, which allows it to act as a reverse HTTP proxy server. You should never make beng-proxy connect to itself.
Caching¶
Responses from the remote servers are cached, if possible. To allow
proper caching, the remote server must set the response headers
Last-Modified, Expires and ETag properly. Additionally,
it should understand the corresponding request headers
If-Modified-Since, If-Unmodified-Since, If-Match and
If-None-Match.
The cache is local to a beng-proxy worker.
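What “understanding” a conditional request header means on the remote server's side can be sketched for If-None-Match. This is a simplified illustration with hypothetical function names: it compares entity tags weakly (ignoring a W/ prefix), splits the header naively on commas, and expects the header under exactly the key "If-None-Match".

```python
def _opaque(tag):
    """Strip whitespace and a weak-validator prefix from an entity tag."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def revalidate(request_headers, current_etag):
    """Return 304 if the cached copy is still valid, else 200."""
    header = request_headers.get("If-None-Match")
    if header is None:
        return 200                  # unconditional request
    tags = [_opaque(t) for t in header.split(",")]
    if "*" in tags or _opaque(current_etag) in tags:
        return 304                  # Not Modified: the cache may reuse its copy
    return 200
```

A server that answers 304 here lets the cache refresh an entry without transferring the body again.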
Connection pooling¶
beng-proxy attempts to use HTTP 1.1 keep-alive, to be able to reuse existing connections to a remote server.
Load balancing, failover¶
For a remote URL, more than one server may be specified. beng-proxy tries to use all of these equally. If one server fails on the socket level, beng-proxy ignores it for a short amount of time.
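The behaviour described above resembles round-robin selection with a temporary blacklist. The sketch below is an illustration under assumptions, not beng-proxy's algorithm: the class name, the 20 second default and the fallback when all servers have failed are invented for the example.

```python
import time


class Balancer:
    """Round-robin over servers, skipping recently failed ones."""

    def __init__(self, servers, ignore_seconds=20.0, clock=time.monotonic):
        self._servers = list(servers)
        self._ignore_seconds = ignore_seconds   # ASSUMPTION: "short amount of time"
        self._clock = clock
        self._failed_until = {}
        self._next = 0

    def _advance(self):
        server = self._servers[self._next]
        self._next = (self._next + 1) % len(self._servers)
        return server

    def pick(self):
        now = self._clock()
        for _ in range(len(self._servers)):
            server = self._advance()
            if self._failed_until.get(server, 0.0) <= now:
                return server
        # every server is currently ignored: try the next one anyway
        return self._advance()

    def mark_failed(self, server):
        """Call after a socket-level failure."""
        self._failed_until[server] = self._clock() + self._ignore_seconds
```

A caller would use `pick()` before each request and `mark_failed()` when connecting fails; after `ignore_seconds` the server takes part in the rotation again.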
Forwarded headers¶
Not all request and response headers are forwarded, for various reasons:
hop-by-hop headers (RFC 2616 13.5.1) must not be forwarded
headers describing the body are not forwarded if there is no body
some headers reveal otherwise private information about the communication partner at the other end (e.g. IP address)
some servers rely on the authenticity of the X-CM4all-BENG-User header
due to imponderable security implications, much of the header forwarding is opt-in
By default, only the following original request headers are forwarded to the remote HTTP server:
the Accept-* headers
User-Agent
Cache-Control
in the presence of a forwarded request body: Content-Type and the other Content-* headers
Cookie2 is taken from the current session
Response headers forwarded to beng-proxy’s client:
Age, ETag, Cache-Control, Last-Modified, Retry-After, Vary, Location
Content-Type and the other Content-* headers
Set-Cookie2 is generated from the current session
The translation server can change the header forwarding policy, see Forwarding HTTP Headers.
SSL/TLS¶
To enable SSL/TLS, specify an https:// URL in the HTTP packet.
After that, the CERTIFICATE packet can choose a client certificate.
CGI and FastCGI¶
Local CGI programs may be used to generate dynamic resources.
CGI/FastCGI resources are cached in the same manner as remote HTTP resources.
WAS¶
Web Application Socket (WAS) is a protocol that can let a child
process render a resource, similar to FastCGI. Unlike FastCGI, it
copies raw data through separate pipes, which allows using the
splice() system call for efficient zero-copy transfer.
Pipe filters¶
A pipe is a program which filters a resource by reading it from standard input, and writing the result to standard output. This option cannot be used to generate a resource, but only for resource filters. The same can be achieved with CGI, but pipes are simpler to implement, because they do not need to bother with HTTP status code and headers.
Local HTTP¶
“Local HTTP” is a way for beng-proxy to launch local HTTP servers. An address for a “local HTTP” resource contains at least:
a server program
a request URI
Optional attributes:
command-line arguments (one or more APPEND packets)
a “Host” request header (packet LHTTP_HOST)
concurrency (packet CONCURRENCY)
How it works: beng-proxy spawns the specified process with a bound listener socket on file descriptor 0. The server program then accepts regular HTTP connections on this listener socket.
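From the server program's point of view, this means wrapping file descriptor 0 in a socket object and calling accept() on it. The following minimal sketch shows the idea; the function names and the fixed plain-text response are made up for illustration, and a real server would parse the request and handle several connections concurrently.

```python
import socket


def build_response(body):
    """Build a complete HTTP/1.1 response carrying a plain-text body."""
    return (b"HTTP/1.1 200 OK\r\n"
            b"Content-Type: text/plain\r\n"
            b"Content-Length: %d\r\n"
            b"Connection: close\r\n\r\n" % len(body)) + body


def serve_one(listener):
    """Accept one connection and answer it with a fixed response."""
    connection, _ = listener.accept()
    with connection:
        connection.recv(65536)      # read the request and ignore its content
        connection.sendall(build_response(b"hello\n"))


def main():
    # The already bound, listening socket is inherited as file descriptor 0.
    listener = socket.socket(fileno=0)
    while True:
        serve_one(listener)
```

A program launched this way would call `main()` on startup; it never binds or listens itself, because the listener it inherits is already set up.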
Remote Control Protocol¶
Logging Protocol¶
See Logging Protocol.
Widget protocol¶
A widget server is simply an HTTP server. Its content type must be
text/html or text/xml.
Hyperlinks¶
A widget may provide hyperlinks, e.g. with anchor elements or with FORM elements.
“Internal links” are links which are relative to the widget’s base URI - these links can be loaded into the widget’s dock. In CGI, this feature is called “PATH_INFO”. An internal link may include a query string.
“External URIs” are not relative; they should load in a new browser window.
Redirection¶
Widgets can send the usual HTTP redirection responses (status 3xx).
The new location must be below the widget’s base URI.
beng-proxy is currently limited to sending a GET request following
the redirect, because it does not save the request body. This is always
correct for “303 See Other”, but may not be for the other redirection
types. Widget servers should therefore always redirect with “303 See
Other” as follow-up to a POST request.
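A widget server following this advice could look like the WSGI sketch below. The application name, the relative Location value "result" and the response bodies are hypothetical; the only point is that a POST is answered with “303 See Other”, so that the follow-up GET is the correct method.

```python
def widget_app(environ, start_response):
    """Hypothetical WSGI widget: handle a POST, then redirect with 303."""
    if environ["REQUEST_METHOD"] == "POST":
        # (the request body would be processed here)
        # The target stays below the widget's base URI (relative reference).
        start_response("303 See Other", [("Location", "result")])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<p>widget content</p>"]
```

Any WSGI server can host such an application, for example `wsgiref.simple_server` from the Python standard library.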
Focus¶
To navigate inside a widget, the widget must be “focused”. A focus can be assigned by clicking on a hyperlink that was generated using the “focus” URI rewriting mode (see c:mode).
A link pointing to the focused widget may change its current URI (relative to the widget’s base URI). If the HTTP request contains a query string or a request body, they are forwarded to that widget, instead of being sent to the template.
POSTing and other methods¶
Making the browser send a request body with a POST request is possible. It is recommended that you send a “303 See Other” redirect as a response to a POST request. Always expect that beng-proxy may request a resource multiple times, even without interaction of the browser.
The same is true for other HTTP methods: PUT, DELETE and others
are passed to the focused widget (see Focus).
Session tracking¶
A widget may use HTTP cookies for session tracking, even if the browser does not support it - beng-proxy will take care of it. The widget should not include any kind of session identification in the URI.
These cookies are not available in JavaScript. Besides that, it would be a bad practice to use cookies in JavaScript which are not actually evaluated by the server (and cannot be used by the widget server in this case, since beng-proxy does not forward them). These cookies would generate a lot of network load for no benefit, which would have to go through the visitor’s narrow upstream with every request.
It is recommended to use (cookie based) sessions only if really required. In many situations, there are more elegant solutions, like storing the current state of a widget in its current URI (path info).
Authentication¶
HTTP-level Authentication¶
A translation response containing HTTP_AUTH enables HTTP-based
authentication according to RFC 2617. The packet may contain an
opaque payload. Additionally, the translation server should send
WWW_AUTHENTICATE and AUTHENTICATION_INFO, which will be sent
to the client in the WWW-Authenticate and Authentication-Info
response headers.
Without an Authorization request header, the HTTP request will
result in a 401 Unauthorized response (with headers
WWW-Authenticate and Authentication-Info).
If the Authorization header is available, beng-proxy
submits a new translation request with the following packets:
HTTP_AUTH: echoing the response packet, plus optionally the APPEND_AUTH/EXPAND_APPEND_AUTH payload
AUTHORIZATION contains the Authorization request header
LISTENER_TAG, HOST
The translation server responds with one of:
USER specifying the user handle to be forwarded in X-CM4all-BENG-User request headers (optionally followed by MAX_AGE, because beng-proxy is allowed to cache these responses)
STATUS=401 if the Authorization value was rejected
Example conversation:
beng-proxy: URI=/protected/foo.html
translation server: PATH=/var/www/protected/foo.html HTTP_AUTH=opaque WWW_AUTHENTICATE='Basic realm="Foo"'
beng-proxy: HTTP_AUTH=opaque AUTHORIZATION='Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=='
translation server: USER=Aladdin MAX_AGE=300
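The translation server's side of the last two steps can be sketched as follows. This is not a real translation server: packets are modelled as a plain dictionary, the function name is hypothetical, and the in-memory password table exists only to make the example self-contained (a real service would use a proper credential store and a constant-time comparison).

```python
import base64

# Hypothetical credential table, for illustration only.
_PASSWORDS = {"Aladdin": "open sesame"}


def handle_http_auth(authorization):
    """Map an AUTHORIZATION payload to response packets (as a dict)."""
    scheme, _, encoded = authorization.partition(" ")
    if scheme.lower() == "basic":
        try:
            decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
        except ValueError:
            # covers both invalid Base64 and invalid UTF-8
            return {"STATUS": 401}
        user, colon, password = decoded.partition(":")
        if colon and _PASSWORDS.get(user) == password:
            return {"USER": user, "MAX_AGE": 300}
    return {"STATUS": 401}


print(handle_http_auth("Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="))
# {'USER': 'Aladdin', 'MAX_AGE': 300}
```

The Base64 payload from the example conversation decodes to `Aladdin:open sesame`, which is why the response names the user Aladdin.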
HTTP-level Authentication (old)¶
beng-proxy supports HTTP-level authentication according to
RFC 2617.
It forwards the Authorization request header to the translation
server wrapped in an AUTHORIZATION packet, and allows the translation
server to send WWW-Authenticate and Authentication-Info response
headers back to the client, wrapped in WWW_AUTHENTICATE and
AUTHENTICATION_INFO.
Token Authentication¶
A translation response containing TOKEN_AUTH enables token-based
authentication. The packet may contain an opaque payload.
The token is extracted from the access_token query string parameter.
To check it, beng-proxy sends a new request with the
following packets:
TOKEN_AUTH: echoing the response packet, plus optionally the APPEND_AUTH/EXPAND_APPEND_AUTH payload
AUTH_TOKEN contains the access_token query string parameter (unescaped)
URI is the full request URI with only the access_token query string parameter removed
LISTENER_TAG, HOST
If no access_token parameter was present, beng-proxy
checks if a USER is already set in the current session; if yes,
then the translation request will be skipped completely. If not, then the
TOKEN_AUTH request will be sent, but without an AUTH_TOKEN
packet.
The translation server may now reply:
STATUS (optionally with MESSAGE) on error
REDIRECT (optionally with STATUS), e.g. to redirect to a login page
DISCARD_SESSION, SESSION, USER: the session is updated and the client will be redirected to the current URI, but without the access_token query string parameter
A non-empty USER value means the user is authenticated. This
value is passed in the proprietary X-CM4all-BENG-User request
header (if the request header group SECURE is set to MANGLE).
Additionally, the header X-CM4all-BENG-Has-Session: 1 is sent to
indicate that this authenticated request is based on a cookie-managed
session (and not at the HTTP level with the Authorization
header). This difference is important for some services, e.g. to
decide whether CSRF protection is necessary.
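Extracting the token and producing the URI without it, as described for the AUTH_TOKEN and URI packets above, can be sketched with the standard library. The function name is hypothetical, and the sketch re-encodes the remaining query string, which may change the escaping of other parameters; beng-proxy's own behaviour in that respect is not described here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def split_token(uri, name="access_token"):
    """Return (unescaped token or None, uri without that parameter)."""
    parts = urlsplit(uri)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    token = next((value for key, value in pairs if key == name), None)
    remaining = [(key, value) for key, value in pairs if key != name]
    stripped = urlunsplit(parts._replace(query=urlencode(remaining)))
    return token, stripped


print(split_token("/foo.html?a=b&access_token=x%20y"))
# ('x y', '/foo.html?a=b')
```

The second element is also the URI the client would be redirected to after a successful login, so the token does not stay in the browser's address bar.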
Combining HTTP_AUTH and TOKEN_AUTH¶
When HTTP_AUTH and TOKEN_AUTH are both specified,
HTTP_AUTH is only used if the client sends an Authorization
header.
This precedence implies that WWW_AUTHENTICATE and
AUTHENTICATION_INFO are useless, and they must not be used.
This also implies that if there is neither an Authorization header
nor an authenticated session, then the TOKEN_AUTH handler decides
how the request is going to be handled. Usually, it means that the
client gets redirected to a login HTML page.
Recovering a Session¶
If beng-proxy does not have a valid session for the client
(and there is no access_token query string parameter), but the
client sent a RECOVER_SESSION cookie, the translation request will
contain that value, e.g.:
TOKEN_AUTH (echoing the response packet)
RECOVER_SESSION contains the value of the recover cookie
URI is the full request URI
HOST
The translation server validates the RECOVER_SESSION value
(e.g. by checking a crypto signature contained within) and may then
configure the new session with values copied from the lost session.
Any TOKEN_AUTH translation response may contain a new
RECOVER_SESSION value which beng-proxy will forward to
its client as a cookie.
Application level Authentication¶
Authentication is supported in the translation protocol. After the
translation server sets the USER session variable to a non-empty
string, the session is presumed to be authenticated. This user variable
is passed to widget servers in the proprietary X-CM4all-BENG-User
request header. The user is logged out when the translation server sends an
empty USER packet.
The CHECK packet¶
On a protected resource, the translation server may send the CHECK
packet together with the normal response. Now beng-proxy
queries the translation server again, sending the same request and a
copy of the CHECK packet. The translation server may now verify
the current session, redirect to a login page, or anything else needed
to authenticate the user. The response to this second translation
request may be a resource address as usual, or the PREVIOUS
packet, which indicates that the first translation shall be used.
While the first response is usually cached for a long time, the second
one may specify a short MAX_AGE value. This means the latter is sent
more often, but since it refers to the former, it is very small.
Example 1, unauthenticated user logs in:
beng-proxy: URI=/protected/foo.html
translation server: PATH=/var/www/protected/foo.html SESSION=1234 CHECK=xyz
beng-proxy: URI=/protected/foo.html SESSION=1234 CHECK=xyz
translation server: MAX_AGE=0 STATUS=403 CGI=/usr/lib/cgi-bin/login.pl
user enters his credentials, login.pl marks the session “authenticated”, redirects back to the original URI
beng-proxy: URI=/protected/foo.html SESSION=1234 CHECK=xyz (from the cached translation response)
translation server: MAX_AGE=300 VARY=SESSION PREVIOUS
Example 2, authenticated user:
beng-proxy:
URI=/protected/foo.html SESSION=2345
translation server:
PATH=/var/www/protected/foo.html CHECK=xyz
beng-proxy:
URI=/protected/foo.html SESSION=2345 CHECK=xyz
translation server:
MAX_AGE=300 VARY=SESSION PREVIOUS
Example 3, with CHECK_HEADER:
beng-proxy:
URI=/foo
translation server:
CHECK=abc CHECK_HEADER=api-key
beng-proxy:
URI=/foo CHECK=abc CHECK_HEADER=api-key:12345678
translation server:
PATH=/var/www/12345678/foo
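The two-step flow from Examples 1 and 2 can be sketched from the translation server's point of view. This is only an illustration: packets are modelled as Python tuples rather than the binary wire format, session creation is left out, and the function name, the `/protected/` prefix rule and the `authenticated_sessions` set are assumptions.

```python
DOCUMENT_ROOT = "/var/www"
CHECK_PAYLOAD = "xyz"  # opaque value echoed back by beng-proxy


def translate(request, authenticated_sessions):
    """Return a list of (packet, value) tuples for one translation request."""
    uri = request["URI"]
    if "CHECK" not in request:
        # First request: a long-lived, cacheable response.
        response = [("PATH", DOCUMENT_ROOT + uri)]
        if uri.startswith("/protected/"):
            # Ask beng-proxy to come back with a second request.
            response.append(("CHECK", CHECK_PAYLOAD))
        return response

    # Second request: beng-proxy echoes the CHECK packet.
    if request.get("SESSION") in authenticated_sessions:
        # Small, short-lived response that refers to the cached first one.
        return [("MAX_AGE", 300), ("VARY", "SESSION"), ("PREVIOUS", None)]

    # Not authenticated: serve the login script, uncached.
    return [("MAX_AGE", 0), ("STATUS", 403),
            ("CGI", "/usr/lib/cgi-bin/login.pl")]
```

The second branch is the one that runs often, which is why it returns only a handful of small packets.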
The AUTH packet¶
AUTH provides another authentication protocol that was designed to
support SAM and similar authentication services. If the client is not
already authenticated, the translation server receives a dedicated
authentication request, echoing the AUTH packet. Additionally, it
receives the full request URI in the URI packet, the “Host” header
in the HOST packet and the session id in the SESSION packet.
The response to this AUTH request may be one of the following:
- USER specifying the new session user (optionally followed by MAX_AGE)
- REDIRECT (optionally with STATUS)
- BOUNCE (optionally with STATUS)
- STATUS
Only clients with a fresh USER will be allowed to actually perform
the request.
Caching AUTH requests is not implemented properly; to be
future-proof, the response must begin with MAX_AGE=0.
Compatibility will not be guaranteed without it.
Example:
…
translation server: …
SESSION=opaque1
beng-proxy:
URI=/foo.html HOST=example.com
translation server: …
AUTH=opaque2
beng-proxy:
AUTH=opaque2 SESSION=opaque1 URI=/foo.html;a=b?c=d HOST=example.com
translation server:
MAX_AGE=0 USER=hans MAX_AGE=300
Note the two MAX_AGE packets. The first one disables caching for the
whole translation response (mandatory, see above) and the second one
enforces revalidation every 5 minutes.
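A handler for the dedicated AUTH request might be structured like this sketch. Packets are again modelled as tuples, and `handle_auth_request` and the `lookup_user` callback are hypothetical names; only the packet names and the ordering of the two MAX_AGE packets come from the description above.

```python
def handle_auth_request(request, lookup_user):
    """Build the response to an AUTH translation request.

    `lookup_user` is a caller-supplied function (an assumption of this
    sketch) that returns a user name, or None if authentication fails.
    """
    # Mandatory: disable caching of the whole translation response.
    response = [("MAX_AGE", 0)]

    user = lookup_user(request["AUTH"], request.get("SESSION"),
                       request["URI"], request["HOST"])
    if user:
        # Second MAX_AGE: the user must be revalidated after 5 minutes.
        response += [("USER", user), ("MAX_AGE", 300)]
    else:
        # One of the documented alternatives: answer with a bare STATUS.
        response += [("STATUS", 403)]
    return response
```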
An alternative to AUTH is the packet AUTH_FILE which specifies
the path to a file containing the AUTH payload (no more than 64
bytes). This path can be specified dynamically using
EXPAND_AUTH_FILE.
Additionally, APPEND_AUTH may specify a payload that will be
appended to the contents of the AUTH_FILE. There’s also
EXPAND_APPEND_AUTH.
If the listener option auth_alt_host is enabled, then the request
header X-CM4all-AltHost will be forwarded to the translation server
in an ALT_HOST translation packet.
Referrer¶
The Referer request header is not supported.
Views¶
A widget class may have a number of named views. Only the “default” view has no name, and it cannot be selected explicitly. A view may have a different server address, different transformations and other settings.
A view other than the default one can be selected in three different ways:
- in the template with the element c:view
- as a request argument from the client
- as an HTTP response header from the widget server
For security reasons, the views a client may choose are limited. A view that has an address can only be selected by the template, to avoid unauthorized access to vulnerable areas. If the view chosen by the template enables the HTML processor with the “container” flag, beng-proxy does not allow the client to switch to another view that is not a “container”, to avoid exposing the template’s widget parameters (unless the response is not processable). Switching to a view without an address is always allowed if the previous view does not make the widget a container.
While the limitations described above do not guarantee real security, it was decided that it would be an acceptable compromise.
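The restrictions on client-selected views can be condensed into a small predicate. This is a simplified reading of the rules above, not beng-proxy's actual code: the `View` class and its fields are invented for the sketch, and the exception for responses that are not processable is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    name: str
    has_address: bool   # the view defines its own server address
    is_container: bool  # HTML processor enabled with the "container" flag


def client_may_switch(template_view: View, requested: View) -> bool:
    """May the client switch from the template's view to `requested`?"""
    if requested.has_address:
        # Views with an address can only be selected by the template.
        return False
    if template_view.is_container and not requested.is_container:
        # Would expose the template's widget parameters.
        return False
    return True
```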
The widget server can select the view with the response header
X-CM4all-View. Just the list of transformations (processor, filter)
will be used, the new URI of the view will be ignored. At this point, a
“partial” request for a child widget may be discarded already when the
previous view did not declare the widget as a “container”. Due to these
side effects, this feature should be avoided if possible; it is better
to select the view in the request.
Generic Views¶
Regular HTTP resources can have views, too. Usually, only the default
view is used. There is only one way to select a different view: by using
the X-CM4all-View response header.
The Beng Template Language¶
The beng-proxy template language defines commands which may be
inserted into an XHTML stream. They are implemented as XML elements and
attributes with the prefix c:. If you care about validating the
processor input, you must declare the XML namespace c:. There is
currently no suggested namespace URI, and beng-proxy does not actually
care, because it does not implement a full-featured XML parser.
Options¶
The following translation packets may be used to configure the processor:
- PROCESS: Enables the processor.
- CONTAINER: Allows embedding other widgets.
- SELF_CONTAINER: Allows embedding more instances of the current widget type.
- GROUP_CONTAINER: Allows this widget to embed instances of this group. This can be specified multiple times to allow more than one group. It can be combined with SELF_CONTAINER.
- WIDGET_GROUP: Assigns a group name to the widget type. This is used by GROUP_CONTAINER.
- FOCUS_WIDGET: Sets the default URI rewriting options to “base=widget, mode=focus”.
- ANCHOR_ABSOLUTE: A slash at the beginning of a URI refers to the widget base, not to the server root.
- PREFIX_CSS_CLASS: CSS class names with leading underscore get a widget specific prefix, see Local Classes.
- PREFIX_XML_ID: XML ids with leading underscore get a widget specific prefix, see Local Classes.
- PROCESS_STYLE: Makes the processor invoke the CSS processor for “style” element/attribute contents.
Adding a widget¶
To add a widget, use the following command:
<c:widget id="foo" type="date" />
The following attributes may be specified:
- id: unique identification of this widget; this is required for proper session and form management if there are several widgets with the same server URI
- type: registered name of the widget server
- display: specifies how the widget is to be displayed: inline is the default, and inserts the widget’s HTML code into the current page; none does not display the widget, but it may be referenced later (see section Frames)
- session: the scope of the widget session (which widgets with the same id share the same session data?): resource is the default and means that two documents have different sessions; site means documents in the same site share session data
Registered widgets are not yet implemented.
Passing arguments to widgets¶
Example:
<c:widget id="foo" type="date">
  <c:parameter name="timezone" value="PST" />
  <c:path-info value="/bla" />
</c:widget>
parameter elements add query string parameters. These are added to
the query string provided by the browser. In the value, the standard XML
entities amp, quot, apos, lt, gt are recognized.
There may be one path-info element whose value is appended to the
widget URI, if none was sent by the browser.
This is not a reliable way to transfer bulk data. Only very short values should be passed this way to a widget. There is no guarantee that beng-proxy or other web servers can cope with URIs longer than 2 kB. If your widget comes even close, you should reconsider your approach.
As usual: never trust user input! The widget server cannot see if input came from the template or from the user’s browser.
Passing HTTP headers to widgets¶
Example:
<c:widget id="foo" type="date">
  <c:header name="X-CM4all-Foo" value="Bar" />
</c:widget>
header elements create HTTP request headers. Headers are replaced,
i.e. if a header with such a name was about to be forwarded from the
client to the widget, the client’s value will be removed. In the header
name, only letters, digits and the dash are allowed. It must start with
“X-”.
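A template generator can check header names against this rule before emitting a c:header element. The sketch below reads the rule literally; treating “letters” as ASCII letters and matching the “X-” prefix case-sensitively are assumptions, and the function name is made up.

```python
import re

# "X-" followed by any number of ASCII letters, digits or dashes.
_WIDGET_HEADER_NAME = re.compile(r"X-[A-Za-z0-9-]*")


def is_allowed_widget_header(name: str) -> bool:
    """Check a header name against the c:header naming rule."""
    return _WIDGET_HEADER_NAME.fullmatch(name) is not None
```

For example, `is_allowed_widget_header("X-CM4all-Foo")` is true, while `"Cookie"` and `"X-Foo_Bar"` are rejected.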
Selecting the widget view¶
Example:
<c:widget id="foo" type="bar">
  <c:view name="raw"/>
</c:widget>
The c:view element selects the transformation view for this widget.
It can be one of the view names provided by the widget registry (i.e.
the translation server).
Variable substitutions¶
beng-proxy defines special entities beginning with c: for its
purposes. Namespaced entities are not actually allowed in XML or HTML,
and this is only an interim solution until the javascript filter is
finished. These entities are (unlike normal HTML entities) also expanded
in SCRIPT elements.
- &c:local;: the “local” URI of this widget class (see LOCAL_URI).
- &c:type;: the class name of this widget
- &c:class;: the quoted class name of this widget
- &c:id;: the id of this widget
- &c:path;: the location of this widget
- &c:prefix;: XML id and Javascript prefix
- &c:uri;: absolute external URI of the current page; use this variable for redirecting
- &c:base;: base URI of the current page (i.e. without beng-proxy arguments and without the query string)
- &c:frame;: the top widget in this frame (if any)
- &c:view;: the name of the current view
Before inserting, the values are escaped using the standard XML entities.
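To illustrate the substitution and escaping step, here is a minimal sketch of an entity expander. It is not beng-proxy's implementation: the function name, the dictionary of values, and the decision to leave unknown entities untouched are all assumptions.

```python
import re
from xml.sax.saxutils import escape

_ENTITY = re.compile(r"&c:([a-z]+);")

# escape() handles &, < and > itself; quotes are added here.
_EXTRA = {'"': "&quot;", "'": "&apos;"}


def expand_entities(text: str, values: dict) -> str:
    """Replace &c:NAME; references with XML-escaped values."""
    def replace(match):
        name = match.group(1)
        if name not in values:
            # Unknown entity: leave the reference untouched (assumption).
            return match.group(0)
        return escape(values[name], _EXTRA)
    return _ENTITY.sub(replace, text)
```

With `values = {"uri": "/page?a=1&b=2"}`, the reference `&c:uri;` becomes `/page?a=1&amp;b=2`, so the result can be placed in an attribute value without breaking the markup.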
Relative URIs¶
Relative links are difficult with beng-proxy, because the browser interprets links as relative to the document by default. A widget author cannot specify a link relative to the widget itself. To allow this, beng-proxy can rewrite relative links to the following bases:
- template: links are relative to the main template (default)
- widget: links are relative to the widget; the browser will leave beng-proxy if the user clicks on such a link, because it points to the widget server
- child: link to a child widget; the URI is the ID of the child widget. You may append a relative URI separated by a slash.
- parent: links are relative to the parent of this widget, i.e. the container which declared it
The base name must be specified in the element attribute c:base
before the attribute containing the URI. To specify the mode of the
rewritten URI, you may use the attribute c:mode:
- direct: direct link to the resource
- focus: link to beng-proxy serving the full page (or the current frame), focusing the widget (see Focus)
- partial: link to beng-proxy serving only the selected widget; useful for frame contents
- response: send an HTTP request to the widget and read the response body
The mode is ignored when the base is “template”.
The attribute c:view may be used to specify a view name.
beng-proxy knows the following HTML elements, and optionally rewrites URIs:
- A
- AUDIO
- EMBED
- FORM
- IFRAME
- IMG
- SCRIPT
- VIDEO
Example:
<img c:base="widget" c:mode="partial" c:view="raw" src="foo.jpg"/>
Processing Instruction syntax¶
To set a default value for all following link elements, you may use the
<?cm4all-rewrite-uri?> XML Processing Instruction:
<?cm4all-rewrite-uri c:base="widget" c:mode="focus"?>
This is recommended when many adjacent links share the same URI rewrite settings, or when you cannot guarantee the order of attributes (many XSLT processors mix the attribute order, which is allowed).
Absolute Widget Links¶
For widgets with many nested levels of “directories”, it can become hard to build absolute links to their resources: a URI with a leading slash is difficult, because that would require the widget code to know where it was mounted; a relative link is just as difficult, because it requires the widget to be aware of the current nesting level, and needs extra code.
To do that more easily, the tilde symbol may be used as a URI prefix: the tilde followed by a slash is considered an absolute link pointing to the root of the widget. Example:
Given a widget served from http://widget.server/foo/, the URI
~/bar.html always points to http://widget.server/foo/bar.html.
This is a proprietary extension in the spirit of the UNIX shell syntax (referring to the “home” of a widget). It does not work without beng-proxy.
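The tilde rule itself is simple enough to express in a few lines. The following sketch only resolves the `~/` prefix against a widget base URI; the function name is hypothetical, and every other kind of link is returned unchanged because its handling depends on the rewriting base and mode.

```python
def resolve_widget_link(widget_base: str, uri: str) -> str:
    """Resolve a "~/" link against the root of the widget."""
    if uri.startswith("~/"):
        # Make sure exactly one slash joins base and path.
        return widget_base.rstrip("/") + "/" + uri[2:]
    # Not a tilde link: left for the regular URI rewriting.
    return uri
```

Calling `resolve_widget_link("http://widget.server/foo/", "~/bar.html")` yields `http://widget.server/foo/bar.html`, regardless of how deeply nested the current widget document is.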
Static Widget Resources¶
It is often desirable for widgets to publish static resource files in a special global location, served without the processor overhead. This location can be configured with the LOCAL_URI translation packet.
Within a widget, the URI prefix @/ refers to this
location. Example:
<img src="@/logo.png"/>
All resources in this location are decoupled from the widget instance and from the current document. Therefore, the URI rewriting mode is ignored.
Frames¶
beng-proxy supports displaying widgets in an IFRAME or IMG
element. To do this, declare your widget with display=none. After
that, insert an IFRAME element (or any other element which
references its content with a URI), and let beng-proxy rewrite the
URI:
<c:widget id="post" type="demo_post" display="none"/>
<iframe width="200" height="200" c:base="child"
c:mode="partial" src="post"/>
This may be used for any HTML tag which is supported by the beng-proxy URI rewriting code. Here is an example for a widget rendering an image:
<c:widget id="logo" type="logorenderer" display="none"/>
<img c:base="child" c:mode="partial" c:view="raw" src="logo"
alt="Our website logo"/>
Note that we use c:view=raw here (assuming a view with that name
was defined), because an image should not (and cannot) be processed
by beng-proxy. You can also use c:mode=direct if you
want the browser to request the resource from the widget server directly
instead of proxying through beng-proxy.
Untrusted Widgets¶
Usually, widgets are embedded inside one single HTML page. The problem is that all scripts run with the same privileges: each widget’s scripts can access the whole page, and each widget can invoke requests to any other widget.
As a safeguard against potentially malicious widgets, beng-proxy can run widgets in a separate domain. The default security settings of browsers will disallow cross-domain script access.
To make a widget class “untrusted”, the translation server generates the
HOST packet with a host name for that widget. A host name may be
shared by a group of widget classes.
While translating a request, the translation server may send the
UNTRUSTED packet, repeating the host name of the request. This makes
the request itself “untrusted”: trusted widgets are rejected, and only
those untrusted widgets matching the specified host name are accepted.
If the packet is absent, all untrusted widgets are rejected.
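The acceptance rule for trusted and untrusted widgets can be summarized as a predicate over two optional host names. This is an interpretation of the text above, not code from beng-proxy; the function and parameter names are invented.

```python
from typing import Optional


def widget_allowed(widget_host: Optional[str],
                   request_untrusted: Optional[str]) -> bool:
    """Decide whether a widget may be served for a request.

    widget_host:       HOST packet of the widget class, None if trusted
    request_untrusted: UNTRUSTED packet of the request, None if absent
    """
    if request_untrusted is None:
        # Trusted request: only trusted widgets are accepted.
        return widget_host is None
    # Untrusted request: only untrusted widgets on the same host name.
    return widget_host == request_untrusted
```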
The Beng JavaScript API¶
JavaScript code in a widget frequently needs to send HTTP requests to the widget server. All these requests must go through beng-proxy. Since the structure of a beng-proxy URI is regarded as internal, beng-proxy provides a JavaScript function to generate such a URI:
function beng_widget_uri(base_uri, session_id, frame, focus, mode,
                         path, translate, view);
The return value is the URI which can be safely requested by the
widget server. For base_uri and frame, you should pass the
values of &c:base; and &c:frame;, respectively. The session_id parameter is
obsolete and should be null.
focus is the path of the focused widget, and can be filled with
&c:path; most of the time, unless you want to request a different
widget than the current one.
mode is one of the following:
- focus: the full page (the default if null is passed)
- partial: just this one widget, processor enabled (must be text/html)
The path argument is a URI relative to the widget. It may include a
query string.
The translate argument is passed to the translation server as
PARAM packet.
view is the name of the transformation view to use. This parameter
is ignored unless frame is set, or mode is “partial”.
The Text Processor¶
The text processor expands the entity references described in Variable substitutions, but does nothing else. It may be useful to insert values into JavaScript files.
The CSS Processor¶
The CSS processor is a transformation for cascading style sheets. The
translation server enables it with the packet PROCESS_CSS. It is the
equivalent of the HTML processor for CSS: it can convert URLs to widget
resources. This allows proxying resources that are referenced in CSS.
The proprietary property -c-mode specifies the URL rewriting mode
for the following URLs in the current block. See c:mode for a list of valid values. -c-view configures a view
name. Example:
body {
  -c-mode: partial;
  -c-view: raw;
  background-image: url('background.png');
}
Options¶
The following translation packets may be used to configure the CSS processor:
- PROCESS_CSS: Enables the CSS processor.
- PREFIX_CSS_CLASS: CSS class names with leading underscore get a widget specific prefix, see below.
Local Classes¶
When the option PREFIX_CSS_CLASS is enabled, CSS class names with a
leading underscore are rewritten. The option is available in both
processors (HTML and CSS).
Two leading underscores make the class local to the current widget
class. It may be shared by multiple instances of the same class. The
two underscores are replaced by the value of &c:class; (see
Variable substitutions).
Three leading underscores make the class local to the current widget
instance. The three underscores are replaced by the value of
&c:prefix; (see Variable substitutions). Each instance may define
different styles for this class.
The expansion is applied even when the class/id consists only of two or three underscores.
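The prefixing rules can be sketched as a single function. The names are hypothetical; `class_prefix` stands for the value of &c:class; and `instance_prefix` for the value of &c:prefix;. Leaving names with one, or with four or more, leading underscores unchanged is an assumption, since the text only defines the two- and three-underscore cases.

```python
def rewrite_class_name(name: str, class_prefix: str,
                       instance_prefix: str) -> str:
    """Apply the PREFIX_CSS_CLASS rewriting to one class name."""
    # Count the leading underscores.
    underscores = len(name) - len(name.lstrip("_"))
    if underscores == 3:
        # Local to this widget instance.
        return instance_prefix + name[3:]
    if underscores == 2:
        # Local to the widget class, shared by all its instances.
        return class_prefix + name[2:]
    return name
```

With `class_prefix="date_"` and `instance_prefix="w1_"`, the name `__box` becomes `date_box` and `___box` becomes `w1_box`; a bare `__` expands to `date_`.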
Security Considerations¶
The values are inserted raw into the stream, i.e. without any escaping/quoting. This has implications which need to be kept in mind.
If an attacker controls variable values, he may be able to inject JavaScript or, more dangerously: if the substitution filter comes before an XML processor, he may be able to inject widget instances. On the other hand, if the substitution filter comes after the XML processor, variable references in inline widgets will also be substituted, which may have displeasing consequences.
The prototype translation server¶
Until the jetserv daemon is finished, the prototype translation
server should be used. It is not configurable; this section describes
its hard-coded behaviour.
Request translation¶
The document root is /var/www. File names ending with .html are
mapped to the content type “text/html; charset=utf-8” and are marked
with the flags PROCESS, CONTAINER.
Widget registry¶
The translation server expects a file for each registered widget type
named /etc/cm4all/beng/widgets/TYPENAME. Example:
server "http://cfatest01.intern.cm-ag/date.py"
process
container
The first line is mandatory: it specifies the widget server. process
enables the template processor; if that is not specified, the HTML
output is inserted into the resulting page verbatim. container
allows the widget to embed sub widgets, stateful sets the “stateful”
flag.
Disabling features may increase the performance dramatically, because it
allows beng-proxy to make better assumptions on data it does not know
yet. So if you know the widget is a leaf widget, do not specify
container.
Instead of server, you can use cgi to specify the absolute path
of a CGI script which will serve the widget, or path for a static
widget.
For CGI widgets, you can also specify the options script_name,
document_root, action, interpreter.
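For tooling that needs to inspect these registry files, a small parser along the following lines may be enough. It is a sketch based only on the example and keywords above: the exact grammar of the prototype server is not documented here, so the use of shell-style quoting, the `keyword "value"` form for CGI options, and the function name are all assumptions.

```python
import shlex

ADDRESS_KEYS = {"server", "cgi", "path"}
FLAG_KEYS = {"process", "container", "stateful"}
VALUE_KEYS = {"script_name", "document_root", "action", "interpreter"}


def parse_widget_class(text: str) -> dict:
    """Parse the contents of one /etc/cm4all/beng/widgets/TYPENAME file."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty widget class file")

    result = {"flags": set(), "options": {}}
    for index, line in enumerate(lines):
        words = shlex.split(line)
        key = words[0]
        if index == 0:
            # The first line is mandatory and names the widget address.
            if key not in ADDRESS_KEYS or len(words) != 2:
                raise ValueError("first line must specify the widget address")
            result["address"] = (key, words[1])
        elif key in FLAG_KEYS and len(words) == 1:
            result["flags"].add(key)
        elif key in VALUE_KEYS and len(words) == 2:
            result["options"][key] = words[1]
        else:
            raise ValueError("unrecognized line: " + line)
    return result
```

A leaf widget would simply omit `container`, which shows up here as an absent entry in `result["flags"]`.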