Wasm Labs @ VMware OCTO

WebAssembly and Sockets: PHP development server on WasmEdge

By Asen Alexandrov
At 2022 / 12 20 mins reading

As part of the Wasm Language Runtimes project, we have been working to expand the functionality of the server-side WebAssembly PHP build we developed on top of WASI. As we explained in the outline of our initial work on this, we were not able to port the code that uses server-side sockets because WASI is still incomplete.

However, some WebAssembly runtimes like WasmEdge have gone beyond the current WASI standard and extended it with API methods that offer the missing socket support. We decided to leverage these extensions and provide an improved version of PHP for wasm32-wasi which now includes the PHP development server.

This article explores some of the challenges we found along the way of this effort. We hope that the lessons we learned will help others in their work with WASI and legacy applications.

We will cover, among other topics:

  • Socket support with WASI and WasmEdge
  • Porting a C application with sockets to WASI
  • File descriptors (fd-s) in WASI
  • The call_indirect instruction

Server-side sockets with Wasm/WASI

The Berkeley Sockets API offers a widely used flow for implementing a TCP server. After creating a socket fd (file descriptor), we bind it to a port, start listen-ing for incoming connections and accept them when they arrive. The latter gives us a new fd for the established connection, which we can use to send or recv data.
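For reference, the flow described above can be sketched in plain POSIX C roughly as follows. This is a minimal illustration for a native build, not code taken from PHP; the helper names make_listener and greet_one_client are made up for this example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket, bind it to 127.0.0.1:port
 * and start listening. Returns the listening fd, or -1 on error. */
int make_listener(uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);        /* 1. create the socket fd */
    if (fd < 0) return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);                     /* port 0: let the OS pick */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||  /* 2. bind */
        listen(fd, 16) < 0) {                                   /* 3. listen */
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: accept one connection, send a greeting over the
 * new fd and close it. Returns 0 on success, -1 on error. */
int greet_one_client(int listen_fd) {
    int conn = accept(listen_fd, NULL, NULL);        /* 4. accept: new fd */
    if (conn < 0) return -1;

    static const char msg[] = "hello\n";
    ssize_t sent = send(conn, msg, sizeof msg - 1, 0);  /* 5. send on it */
    close(conn);
    return sent < 0 ? -1 : 0;
}
```

Note that the listening fd and the per-connection fd are distinct: the first one stays open for further accept calls, while the second one lives only as long as the connection.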

The wasi_snapshot_preview1 version of WASI follows the Berkeley Sockets approach, but the committee is taking the time and thought to polish the API to the best possible version. As a result, we still don't have WASI equivalents for the socket, bind and listen part.

It is expected that the Wasm runtime will offer the capability to pre-open the needed sockets and allow the Wasm application to accept connections on them. On one hand, this behavior aligns with the approach of pre-opening host folders, which only adds security. On the other hand, it requires applications to do additional work to find out which fd they should accept connections on, for example via sd_listen_fds.

Support for accepting on pre-opened sockets is implemented by runtimes such as Wasmtime. Here is how that works for a server app written in C.

  1. Wasmtime gets an additional --tcplisten HOST:PORT argument when running the Wasm application
  2. Wasmtime starts listening on HOST:PORT and passes the file descriptor number for this to the Wasm application
  3. The server code uses sd_listen_fds to get the listening file descriptor
  4. The server code calls accept which is implemented by wasi-libc and translated to sock_accept from WASI.
  5. Wasmtime implements wasi_snapshot_preview1 which includes sock_accept

[Figure: Accepting on pre-opened sockets with wasi-libc]
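
To give an idea of what step 3 involves on the application side, here is a simplified stand-in for sd_listen_fds. It assumes the systemd convention, where the number of passed-in sockets is announced through the LISTEN_FDS environment variable and their fd numbers start at 3. The function names are hypothetical, and the real sd_listen_fds performs additional checks (such as LISTEN_PID) that are skipped here.

```c
#include <stdlib.h>

/* systemd convention: passed-in fds start right after stdin/stdout/stderr. */
#define LISTEN_FDS_START 3

/* Simplified stand-in for sd_listen_fds(): returns how many pre-opened
 * listening fds the environment announces, or 0 if there are none. */
int announced_listen_fds(void) {
    const char *value = getenv("LISTEN_FDS");
    if (value == NULL || *value == '\0') return 0;

    char *end = NULL;
    long count = strtol(value, &end, 10);
    /* Reject trailing garbage and implausible values. */
    if (*end != '\0' || count < 0 || count > 1024) return 0;
    return (int)count;
}

/* Returns the first pre-opened listening fd, or -1 if there is none. */
int first_listen_fd(void) {
    return announced_listen_fds() > 0 ? LISTEN_FDS_START : -1;
}
```

The server would then call accept on the returned fd instead of creating its own socket with socket, bind and listen.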

This approach does allow the implementation of server-side applications with WASI. However, it means that one cannot port existing code-bases to WASI as-is. You have to make an extra effort to replace the socket, bind and listen calls with sd_listen_fds. It also requires that part of the application logic gets shifted to the underlying Wasm runtime. This has already caused a known issue around confusing fd-s for pre-opened folders with fd-s for pre-opened listening sockets.

Bearing in mind what we outlined earlier in the article, we decided to follow a different approach while porting the PHP server to wasm32-wasi.

Server-side sockets with WasmEdge

The WasmEdge team decided to implement socket support ahead of WASI standardization. In version 0.8.2 they extended the standard wasi_snapshot_preview1 with the methods that offer parity with a typical application flow with Berkeley sockets. This allows us to compile existing code that uses sockets and run it on WasmEdge.

Of course, while the approach makes things easier for legacy code, it could lead to compatibility problems if one has to use another WebAssembly runtime. As other runtimes do not support the additional methods (e.g. sock_bind, sock_listen, etc.) a binary built for WasmEdge will not run on all of them. If and when a version of those methods gets added to the WASI standard it may not necessarily match the one defined by WasmEdge. This means that there will be a period of lack of compatibility even after the functions get standardized. A typical example is the sock_accept method, which was first introduced by WasmEdge and later standardized, but with a slightly different signature.


With this approach, one would have to use an extra layer that wraps the wasi_snapshot_preview1 socket methods behind a POSIX-compliant interface. The WasmEdge team has provided a Rust SDK which greatly helps in writing new software against their socket API. There is some work in progress on a POSIX-based C SDK, currently at the prototype stage. When WASI gets full socket support, you can expect that the methods provided by wasmedge_wasi_socket_c will be implemented by wasi-libc, so the legacy code will continue working.

The way to go with this approach is shown in the diagram below.

  1. WasmEdge is called as usual. HOST:PORT arguments are passed to the Wasm application to interpret and handle as it sees fit.
  2. The application calls bind, listen, accept as usual. However instead of wasi-libc they are implemented by a wasmedge-socket-c-SDK module, which translates them to sock_bind, sock_listen, sock_accept.
  3. WasmEdge implements wasi_snapshot_preview1 but extends it with its non-standard set of sock_* methods.

![Bind, listen, accept with the WasmEdge socket SDK](WasmEdge-sockets.png)

In Wasm Language Runtimes we are working on building existing codebases to work on WASI, such as traditional programming language interpreters like Python and PHP. Therefore we chose to follow this second approach. The main benefit we get is flexibility. If WASI changes in the future, we only need to apply the required changes on the SDK that translates the calls between WASI and POSIX.

Getting a server to listen with WasmEdge

The first challenge we had to tackle was to make a simple bind, listen, accept server written in C work on WasmEdge. We started with the hangedfish/wasmedge_wasi_socket_c code and the accompanying examples in hangedfish/httpclient_wasmedge_socket.

While wasmedge_wasi_socket_c worked great for client-side sockets, it turned out to have problems with server-side sockets. In particular, we had to fix the following:

  • Translating network addresses between the POSIX standard and the types defined by WasmEdge's wasi_snapshot_preview1
  • Proper handling of memory ownership

After we had a running server that proved we can compile socket-based C code to run on WasmEdge we set off to do this for the PHP development server.

Enabling the PHP development server

This turned out to be harder than expected. We hit several different issues, outlined below.

sock_accept's signature

First, we had to re-enable some networking-related code that was previously disabled for __wasi__ builds. We added our patched wasmedge_wasi_socket_c to the PHP code, with include and link priority over wasi-libc. At include time, our netdb.h shadows the one from wasi-libc. At link time, wherever the two overlap, we get the definitions from wasmedge_wasi_socket_c instead of those in wasi-libc.

However, this exposed the already mentioned incompatibility of sock_accept between the WASI standard and the WasmEdge approach. As a result, even the php-cgi build target stopped working on Wasmtime because of the signature mismatch. And vice versa, if we removed the wasmedge_wasi_socket_c and built only php-cgi (without server-side networking) with wasi-libc, it would not work on WasmEdge.

[error]     Mismatched function type. Expected: FuncType {params{i32 , i32 , i32} returns{i32}} ,
Got: FuncType {params{i32 , i32} returns{i32}}
[error] When linking module: "wasi_snapshot_preview1" , function name: "sock_accept"

This issue forced us to modify our build so that we add the wasmedge_wasi_socket_c code only when we build for WasmEdge explicitly. For us, this is a valuable lesson: WASI is a work in progress. If you want to work across different runtimes, you should be prepared for customized builds or patches per runtime.
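
In C code, such a per-runtime split usually comes down to a preprocessor guard. The sketch below is an illustration only: WASMLABS_USE_WASMEDGE_SOCKETS is a hypothetical macro that a build script could define (for example with -D) when targeting WasmEdge, and it is not necessarily what our build uses.

```c
/* Which implementation provides bind/listen/accept in this build. */
enum socket_backend {
    BACKEND_NATIVE,        /* regular host libc */
    BACKEND_WASI_LIBC,     /* standard WASI: accept on pre-opened sockets only */
    BACKEND_WASMEDGE_SDK   /* WasmEdge-specific sock_* extensions */
};

/* WASMLABS_USE_WASMEDGE_SOCKETS is a hypothetical build-time switch. */
#if defined(__wasi__) && defined(WASMLABS_USE_WASMEDGE_SOCKETS)
#  define SOCKET_BACKEND BACKEND_WASMEDGE_SDK
#elif defined(__wasi__)
#  define SOCKET_BACKEND BACKEND_WASI_LIBC
#else
#  define SOCKET_BACKEND BACKEND_NATIVE
#endif

/* Lets the rest of the code (or a test) see what was compiled in. */
enum socket_backend socket_backend(void) {
    return SOCKET_BACKEND;
}
```

The important property is that the WasmEdge-specific imports never end up in a binary meant for another runtime, which is what triggered the linking error above.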

WASI fd-s are random

Getting the server-side socket PHP code to build and start listening felt great. However, it was still not working. External debugging showed that TCP connections got accepted by the server, but still nothing happened. So after arduous debugging, we found a peculiar thing.

As it turns out, the PHP code uses select to find out which of the currently open fd-s it should act on. So after bind-ing and listen-ing it only calls accept on sockets that have received client connections. This is a normal approach, but it comes with a significant caveat. For one thing, the Linux man page for select is clear about a hard limit.

WARNING: select() can monitor only file descriptors numbers that
are less than FD_SETSIZE (1024)—an unreasonably low limit for
many modern applications—and this limitation will not change.

When designing WASI people made a conscious effort to remove the chance for implicit generation of fd numbers. Thus the path_open documentation states that:

The returned file descriptor is not guaranteed to be the lowest-numbered file descriptor not currently open; it is randomized to prevent applications from depending on making assumptions about indexes, since this is error-prone in multi-threaded contexts.
(From the WASI documentation)

When the WasmEdge team was extending wasi_snapshot_preview1, path_open was the only method returning new fd-s. So they reasonably decided to adopt the same approach for sockets.

So there we had socket fd-s randomly ranging from 3 to 2^31 and different on each run. However, as the PHP code was using select, it assumed a world where socket fd-s are less than FD_SETSIZE. This includes code that silently does nothing for large fd numbers, so those fd-s would just be ignored and "disappear" without a trace in the application flow:

# define PHP_SAFE_FD_SET(fd, set) do { if (fd < FD_SETSIZE) FD_SET(fd, set); } while(0)
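
The effect is easy to reproduce in isolation. The following sketch uses the same guard as the macro above, under a different name so it does not clash with the PHP headers; the watch_fd helper is made up for this example.

```c
#include <stdbool.h>
#include <sys/select.h>

/* Same guard as the PHP macro quoted above. */
#define SAFE_FD_SET(fd, set) \
    do { if ((fd) < FD_SETSIZE) FD_SET((fd), (set)); } while (0)

/* Try to register fd for select() and report whether it actually
 * landed in the set. Large fds are silently skipped by the guard. */
bool watch_fd(int fd, fd_set *set) {
    if (fd < 0) return false;
    SAFE_FD_SET(fd, set);
    /* FD_ISSET is only defined for fd < FD_SETSIZE, so guard it too. */
    return fd < FD_SETSIZE && FD_ISSET(fd, set);
}
```

A small fd such as 5 ends up in the set, while a randomized fd such as 1 << 30 is dropped without any error, so select never reports activity on it.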

Also, there would be code that processes fd-s by looping from 0 to max_fd (the maximum number among all opened fd-s) and checking the state of each one, instead of only checking the known fd-s. This approach is fast enough if the file descriptors are really small numbers and get reused after closing. However, with random numbers in [3, 2^31) the loop can run through up to billions of useless iterations when max_fd happens to be huge. On such runs the PHP server was effectively unusable.

To handle this issue, we first tested a local build of WasmEdge where the random generator of fd-s was limited to FD_SETSIZE. This worked, and we discussed with the WasmEdge team that porting existing applications that use, say, select to WASI could become very hard with these huge fd-s. Therefore they created an issue on their side to make the generator configurable, so one can cap the maximum fd number if necessary.

We needed a solution now, however. We also wanted it to work for WasmEdge today as well as tomorrow. So our final approach to deal with the problem was to:

  1. Modify the PHP_SAFE_FD_* set of macros so they never check if fd < FD_SETSIZE
  2. Modify all places that loop through fd numbers 0 to max_fd to loop only through the "known" fd-s
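
As an illustration of the second point, the sketch below checks only an explicit list of known fd-s, so its cost depends on how many fd-s there are and not on how large their numbers are. Note that it swaps select for poll to keep the example short and independent of FD_SETSIZE; this is a different technique from the one in our PHP patch, and the first_readable name and the MAX_KNOWN_FDS limit are made up for this example.

```c
#include <poll.h>
#include <stddef.h>

#define MAX_KNOWN_FDS 64

/* Check only the fds we know about, however large their numbers are.
 * Returns the first readable fd, or -1 if none becomes ready within
 * timeout_ms, on error, or if too many fds are passed in. */
int first_readable(const int *known_fds, size_t count, int timeout_ms) {
    if (count > MAX_KNOWN_FDS) return -1;

    struct pollfd pfds[MAX_KNOWN_FDS];
    for (size_t i = 0; i < count; i++) {
        pfds[i].fd = known_fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    if (poll(pfds, (nfds_t)count, timeout_ms) <= 0) return -1;

    /* The loop is bounded by the number of known fds, not by max_fd. */
    for (size_t i = 0; i < count; i++) {
        if (pfds[i].revents & POLLIN) return pfds[i].fd;
    }
    return -1;
}
```

With one listening socket and a handful of client connections, each check touches a few entries, even if one of the fd numbers is close to 2^31.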

Fixing a "call_indirect - mismatched function type"

We got the PHP server to work and start serving requests. It worked great for small PHP scripts. However, once we started experimenting with WordPress, we began seeing a strange error:

[error] execution failed: indirect call type mismatch, Code: 0x8c
[error] In instruction: call_indirect (0x11) , Bytecode offset: 0x001df4f1 , Args: [2164]
[error] Mismatched function type. Expected: FuncType {params{i32} returns{}} ,
Got: FuncType {params{i32} returns{i32}}
[error] When executing function name: "_start"

Let's discuss this in comparison to the well-known concept of function pointers in C. The call_indirect instruction is the equivalent of calling a function through a function pointer, so its argument plays the role of the pointer. In our case, the argument 2164 indicates the 2164th function in the function table of the WebAssembly module. The error itself is the equivalent of saying that there is a function pointer of type void(i32), but it has been assigned the address of a function whose signature is actually i32(i32).

To solve this, we used wasm2wat from wabt to get to see the funcref section. There we found out that the function with index 2164 is zend_list_free. From there on, with a bit of code searching we found this function, that was assigned to a function pointer:

int ZEND_FASTCALL zend_list_free(zend_resource *res)

// ...

typedef void (ZEND_FASTCALL *zend_rc_dtor_func_t)(zend_refcounted *p);

This works in a native C build, because the caller simply ignores the value returned through the function pointer. In WebAssembly, however, call_indirect checks the signature at run time, so this is a type mismatch.

As this was only one place and we couldn't change the function pointer type, the fix we used was to wrap the mismatching function in one that has the expected signature.

static void zend_list_free_void_wrapper(zend_resource *res) {
    zend_list_free(res);
}
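
The same pattern can be tried outside of PHP with a self-contained sketch. All names below (resource_t, dtor_func_t, resource_free and so on) are hypothetical stand-ins for the PHP ones.

```c
/* Hypothetical stand-ins for the PHP types. */
typedef struct { int freed; } resource_t;
typedef void (*dtor_func_t)(resource_t *);  /* expected: returns nothing */

/* The existing function returns a status code: int(resource_t *). */
int resource_free(resource_t *res) {
    res->freed = 1;
    return 0;
}

/* Adapter with exactly the signature the pointer type expects. */
static void resource_free_void_wrapper(resource_t *res) {
    (void)resource_free(res);  /* deliberately drop the return value */
}

/* Handing out the wrapper needs no cast, so the function stored in the
 * table and the type used at the call site agree. */
dtor_func_t get_dtor(void) {
    return resource_free_void_wrapper;
}
```

Returning resource_free directly from get_dtor would require an explicit cast to compile cleanly, which is a good hint that the indirect call may trap once the code is built for WebAssembly.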

For anyone interested to know more about call_indirect, the funcref section and how to debug this issue, here is a great article by CoinEx Chain lab with hands-on examples.

Try it out!

Note: as of writing this article our code changes are a work in progress. This section will be updated in the future.

If you want to try out the PHP development server in WebAssembly you can build and run it from the php-server-wasmedge branch of WebAssembly Language Runtimes on https://github.com/vmware-labs/webassembly-language-runtimes/tree/php-server-wasmedge.

  1. Install WasmEdge

  2. Install all prerequisites or use the wasi-builder container image

    ghcr.io/vmware-labs/wasi-builder:16
  3. Define the WASMLABS_RUNTIME environment variable:

    export WASMLABS_RUNTIME=wasmedge
  4. Run wl-make.sh tool to get the binary

    ./wl-make.sh php/php-7.4.32
  5. To serve the example docroot go with

    wasmedge --dir /:/ \
    build-output/php/php-7.4.32/bin/php \
    -S 0.0.0.0:8080 \
    -t images/php/docroot/

Work in progress

Having handled all of the above challenges, we managed to build a stable, working PHP server that is able to run WordPress. However, this is a work in progress. We have cut corners here and there, and future changes to WASI or WasmEdge will likely require that we revisit the way we patched the original PHP code.

As already mentioned we hit issues in trying to translate between the POSIX address types and the ones defined in WASI and WasmEdge's extension of it. Therefore we decided to make the server always listen on 0.0.0.0. If one needs to run this code on a specific IPv4 address or use IPv6 the patched code will need to be revised.

We want to thank Michael Yuan and the WasmEdge team for their help during development. The WebAssembly ecosystem is in its early days and it is exciting to get to collaborate with other projects to keep moving Wasm forward. We hope you enjoyed this article and look forward to your comments and suggestions!
