Writing Portable C

I like to write C that is portable to build as well as run. To that end, all of my projects should build with any C compiler cc that supports the C99, C11 or C23 standard. They have no other development dependencies: no need for pkg-config, or the presence of headers or libraries beyond those that come included with a minimal version of the target operating system (e.g. libc on Unix).

I test all projects with gcc, clang, tcc, cproc, and cl.exe—and most will even build with chibicc.

Code
Compilers
Conclusion
1. Easy targets

The Code

There are two main rules I follow to ensure portability:

statically linked dependencies should be built from source and vendored with the project, using as few translation units (*.o files) as possible;
dynamic dependencies should be avoided at all costs, and when required they must be dynamically loaded into vtables via dlopen and dlsym.

The first rule requires that, besides a C toolchain, everything required to build a project must be included with the source code: no build system should be necessary!

If external dynamic dependencies are necessary, e.g. graphics libraries (libGL, libvulkan), then the second rule requires these to be runtime-only dependencies, not build dependencies. The purpose of this rule is to ensure easy cross-compilation: a minimal sysroot should be all that is required. Userspace libraries that are part of a minimally configured target OS are not considered “dependencies” as they must be present in any viable sysroot; e.g. ntdll.dll is the OS API on Windows, libc.so is (part of) the OS API on FreeBSD.
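
On a POSIX system the second rule could look roughly like the following. This is a minimal sketch rather than code from one of my projects: the DemoMath struct, the demo_math_load and demo_math_unload functions, and the idea of loading cos and sqrt from a math library are all assumptions made for illustration. The library name is passed in by the caller (on glibc-based Linux it would be "libm.so.6"); on Windows, LoadLibraryA and GetProcAddress would take the place of dlopen and dlsym.

```c
#include <dlfcn.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical vtable: one function pointer per symbol needed at runtime.
typedef struct {
    void *handle;
    double (*cos_fn)(double);
    double (*sqrt_fn)(double);
} DemoMath;

// Open the library by name and fill the vtable; returns false on any failure.
bool demo_math_load(DemoMath *math, const char *lib_name) {
    *math = (DemoMath){ 0 };

    math->handle = dlopen(lib_name, RTLD_NOW | RTLD_LOCAL);
    if (math->handle == NULL) {
        return false;
    }

    // POSIX allows the void * from dlsym to be used as a function pointer;
    // storing through a void ** avoids a pedantic conversion warning.
    *(void **)(&math->cos_fn) = dlsym(math->handle, "cos");
    *(void **)(&math->sqrt_fn) = dlsym(math->handle, "sqrt");

    if (math->cos_fn == NULL || math->sqrt_fn == NULL) {
        dlclose(math->handle);
        *math = (DemoMath){ 0 };
        return false;
    }
    return true;
}

// Close the library and clear the vtable so stale pointers are not called.
void demo_math_unload(DemoMath *math) {
    if (math->handle != NULL) {
        dlclose(math->handle);
    }
    *math = (DemoMath){ 0 };
}
```

Nothing here needs the library's headers or a link-time -l flag for it, so the library is only required on the machine that runs the program. Older C libraries may still need -ldl for dlopen itself.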

`aven-cfmt` project structure

The aven-cfmt project builds a single aven-cfmt executable and exports some functionality in include/aven/c.h. The aven-cfmt executable is a C source code formatter that parses and renders C abstract syntax trees. The c.h header contains the implementation of the core functionality (tokenization, parsing, rendering, etc) separate from the application code (command line parsing, IO, etc), to make it easier for other projects to use.

aven-cfmt/
    src/
        aven-cfmt.c
    test/
        test.c
    include/
        aven/
            c.h
    deps/
        libaven/
            include/
                aven/

In order to build the project, you would execute the following.

cc -I include -I deps/libaven/include -o aven-cfmt src/aven-cfmt.c

To build and run the tests, you would instead compile test/test.c.

cc -I include -I deps/libaven/include -o test test/test.c
./test

Even when I provide other build systems, I will always structure the project such that it builds with a single cc command that follows the above pattern.
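
One way to keep a project buildable with a single cc command is to ship library code as headers full of static functions, so that the application source file is the only translation unit. The sketch below imitates that layout in a single listing. The file names demo_words.h and demo.c and the functions demo_count_words and demo_app are hypothetical and are not part of aven-cfmt.

```c
// ---- would live in include/demo_words.h (hypothetical) ----
#ifndef DEMO_WORDS_H
#define DEMO_WORDS_H

#include <stddef.h>

// Count whitespace-separated words in a NUL-terminated string.
static inline size_t demo_count_words(const char *text) {
    size_t count = 0;
    int in_word = 0;
    for (const char *c = text; *c != '\0'; c++) {
        int is_space = (*c == ' ' || *c == '\t' || *c == '\n' || *c == '\r');
        if (is_space) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            count++;
        }
    }
    return count;
}

#endif // DEMO_WORDS_H

// ---- would live in src/demo.c (hypothetical), after including the header ----
#include <stdio.h>

// Application code: print the word count of each command-line argument.
int demo_app(int argc, char **argv) {
    for (int i = 1; i < argc; i++) {
        printf("%zu\n", demo_count_words(argv[i]));
    }
    return 0;
}
```

With that layout, and with demo_app renamed to main, the whole build would be cc -I include -o demo src/demo.c.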

`libavengraph` project structure

The libavengraph project implements several graph algorithms, focusing on topologically embedded graphs and coloring algorithms for embedded graphs. The include/ directory contains the library code, the visualization/ directory contains application code for algorithm visualizations, and the benchmarks/ directory contains application code for algorithm benchmarks.

libavengraph/
    visualization/
        visualization.c
    benchmarks/
        all.c
        pyramid.c
    include/
        graph.h
        graph/
    deps/
        libavengl/
            include/
            deps/
                libaven/
                    include/
                glfw/
                    include/
                    src/
                    glfw.c
                stb/
                    include/
                    stb.c
                gles3/
                    include/
                wayland/
                    include/
                X11/
                    include/
                xkbcommon/
                    include/

The visualization/ code depends on my libavengl library, which in turn depends on glfw and stb_truetype, as well as the headers for gles3, X11, wayland, and xkbcommon. To build the visualization binary you would run a command along the lines of the following.

cc -I include -I deps/libavengl/include \
    -I deps/libavengl/deps/libaven/include \
    -I deps/libavengl/deps/glfw/include \
    -I deps/libavengl/deps/stb/include \
    -I deps/libavengl/deps/gles3/include \
    -I deps/libavengl/deps/wayland/include \
    -I deps/libavengl/deps/X11/include \
    -I deps/libavengl/deps/xkbcommon/include \
    -o visualization visualization/visualization.c \
    deps/libavengl/deps/glfw/glfw.c \
    deps/libavengl/deps/stb/stb.c

The benchmarks do not require the graphical dependencies and thus may be built with fewer flags.

cc -O3 -march=native -I include -I deps/libavengl/deps/libaven/include \
    -o all_benchmarks benchmarks/all.c

Of course, I also provide a simple build system written in C.

$ cc -o build build.c
$ ./build
$ ls build_out
visualization

But this is only for convenience: the project structure dictates how the code is built!
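
A build program of this kind can be very small. The following is only a sketch of the idea and not my actual build.c: the demo_build_command and demo_build_run functions and their parameters are hypothetical. It assembles a cc command of the same shape as the ones above into a caller-provided buffer, which can then be handed to the C library's system function.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

// Append text to buf at offset *len; fails if it would not fit with its NUL.
static int demo_append(char *buf, size_t cap, size_t *len, const char *text) {
    size_t n = strlen(text);
    if (n >= cap - *len) {
        return 0;
    }
    memcpy(buf + *len, text, n + 1);
    *len += n;
    return 1;
}

// Write "<cc> -I <dir>... -o <out> <src>" into buf; returns 0 if it is too small.
int demo_build_command(
    char *buf,
    size_t cap,
    const char *cc,
    const char *const *include_dirs,
    size_t include_count,
    const char *out,
    const char *src
) {
    size_t len = 0;
    if (cap == 0) {
        return 0;
    }
    buf[0] = '\0';

    int ok = demo_append(buf, cap, &len, cc);
    for (size_t i = 0; ok && i < include_count; i++) {
        ok = demo_append(buf, cap, &len, " -I ") &&
            demo_append(buf, cap, &len, include_dirs[i]);
    }
    ok = ok && demo_append(buf, cap, &len, " -o ") &&
        demo_append(buf, cap, &len, out) &&
        demo_append(buf, cap, &len, " ") &&
        demo_append(buf, cap, &len, src);
    return ok;
}

// Run the assembled command through the host shell; returns system()'s status.
int demo_build_run(const char *cmd) {
    return system(cmd);
}
```

Because the build program is itself a single C file with no dependencies beyond libc, it is bootstrapped with the same kind of one-line cc command as the project it builds.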

Compiler specific builtins

If you want to take advantage of compiler builtins, check they are available. E.g. the following code uses an assert macro based on __builtin_unreachable from gcc/clang so long as it is available.

#ifndef __has_builtin
    #define __has_builtin(unused) 0
#endif

#if !defined(assert) && __has_builtin(__builtin_unreachable)
    #define assert(c) ((!(c)) ? __builtin_unreachable() : (void)0)
#else
    #include <assert.h>
#endif
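
The same check works when the alternative is a hand-written fallback rather than a standard header. As an illustration (the demo_popcount32 name is hypothetical), the following sketch uses __builtin_popcountl when the compiler reports it and otherwise counts bits with a loop. Compilers that lack __has_builtin itself simply take the fallback path.

```c
#include <stdint.h>

#ifndef __has_builtin
    #define __has_builtin(unused) 0
#endif

// Count the set bits in x, using the compiler builtin only when it exists.
static inline unsigned demo_popcount32(uint32_t x) {
#if __has_builtin(__builtin_popcountl)
    // unsigned long is at least 32 bits wide, so no bits of x are lost.
    return (unsigned)__builtin_popcountl((unsigned long)x);
#else
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1; // clear the lowest set bit
        count++;
    }
    return count;
#endif
}
```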

The same trick can be used for GNU attribute specifiers.

#ifndef __has_attribute
    #define __has_attribute(unused) 0
#endif

#if __has_attribute(malloc)
    __attribute__((malloc))
#endif
#if __has_attribute(alloc_size)
    __attribute__((alloc_size(2, 4)))
#endif
#if __has_attribute(alloc_align)
    __attribute__((alloc_align(3)))
#endif
void *aven_arena_alloc(
    AvenArena *arena,
    size_t count,
    size_t align,
    size_t size
);
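
If the per-declaration #if blocks become noisy, one option is to wrap each attribute in a macro once and use the macro on declarations. This is a sketch with hypothetical names (DEMO_NODISCARD, demo_checked_add), not a description of how libaven does it.

```c
#include <limits.h>
#include <stdbool.h>

#ifndef __has_attribute
    #define __has_attribute(unused) 0
#endif

// Expands to the GNU attribute where supported and to nothing elsewhere.
#if __has_attribute(warn_unused_result)
    #define DEMO_NODISCARD __attribute__((warn_unused_result))
#else
    #define DEMO_NODISCARD
#endif

// Add two ints; returns false (leaving *sum untouched) instead of overflowing.
DEMO_NODISCARD
bool demo_checked_add(int a, int b, int *sum) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *sum = a + b;
    return true;
}
```

On compilers without the attribute the code still builds; it just loses the warning when a caller ignores the result.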

Graphics and other libraries

As mentioned above, I avoid dynamic dependencies wherever possible, but sometimes there is no getting around them. E.g. in order to access GPU functionality, it is difficult to avoid libvulkan or libGL. To avoid requiring that the development version of each library be installed on each build machine, we can provide the headers with our source and dynamically load the libraries at runtime.

The glfw library is a classic wrapper and dynamic loader for windowing libraries like X11 and/or Wayland. Another example is the volk dynamic loader for Vulkan.

My OpenGL loader works with a common subset of OpenGL 4.3, OpenGL ES 3.2, and WebGL 2.0.

#include <aven.h>

#define GL_GLES_PROTOTYPES 0
#include <GLES3/gl32.h>

typedef struct {
    PFNGLACTIVETEXTUREPROC ActiveTexture;
    PFNGLATTACHSHADERPROC AttachShader;
    // ...
    PFNGLDEBUGMESSAGECALLBACKPROC DebugMessageCallback;
    bool es;
} AvenGl;

typedef void (*AvenGlProcFn)(void);
typedef AvenGlProcFn (*AvenGlLoadProcFn)(const char *);

static inline AvenGl aven_gl_load(AvenGlLoadProcFn load, bool es) {
    AvenGl gl = { .es = es };

    gl.ActiveTexture = (PFNGLACTIVETEXTUREPROC)load("glActiveTexture");
    gl.AttachShader = (PFNGLATTACHSHADERPROC)load("glAttachShader");
    // ...
    gl.DebugMessageCallback = (PFNGLDEBUGMESSAGECALLBACKPROC)load(
        "glDebugMessageCallback"
    );

    return gl;
}

#ifdef AVEN_GL_NDEBUG
    #define aven_gl_check_error(gl) (void)(gl)
#else
    #define aven_gl_check_error(gl) do { \
            if ((gl)->GetError() != 0) { \
                aven_panic("opengl error"); \
            } \
        } while (0)
#endif

#define aven_gl_shader(gl, str) ( \
        (gl)->es ? "#version 300 es\n" str : "#version 430\n" str \
    )

static inline void aven_gl_ActiveTexture(AvenGl *gl, GLenum texture) {
    gl->ActiveTexture(texture);
    aven_gl_check_error(gl);
}

static inline void aven_gl_AttachShader(
    AvenGl *gl,
    GLuint program,
    GLuint shader
) {
    gl->AttachShader(program, shader);
    aven_gl_check_error(gl);
}

// ...

static inline void aven_gl_DebugMessageCallback(
    AvenGl *gl,
    GLDEBUGPROC callback,
    const void *userParam
) {
    gl->DebugMessageCallback(callback, userParam);
    aven_gl_check_error(gl);
}
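
Because aven_gl_load only takes a callback of type AvenGlLoadProcFn, the caller decides where the function pointers come from, e.g. a windowing library's lookup function such as glfwGetProcAddress. The self-contained sketch below mimics that shape with a two-entry table standing in for a real OpenGL driver, and adds a check that every entry was resolved. All of the Demo and demo_ names are invented for the illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef void (*DemoProcFn)(void);
typedef DemoProcFn (*DemoLoadProcFn)(const char *);

typedef int (*DemoAddProc)(int, int);
typedef int (*DemoNegProc)(int);

// Hypothetical vtable with the same layout idea as AvenGl.
typedef struct {
    DemoAddProc Add;
    DemoNegProc Neg;
} DemoApi;

static int demo_add(int a, int b) { return a + b; }
static int demo_neg(int a) { return -a; }

// Stand-in for a platform loader: resolves names from a fixed table.
DemoProcFn demo_load_proc(const char *name) {
    if (strcmp(name, "demoAdd") == 0) {
        return (DemoProcFn)demo_add;
    }
    if (strcmp(name, "demoNeg") == 0) {
        return (DemoProcFn)demo_neg;
    }
    return NULL;
}

// Fill the vtable through the callback; report whether every entry resolved.
bool demo_api_load(DemoApi *api, DemoLoadProcFn load) {
    api->Add = (DemoAddProc)load("demoAdd");
    api->Neg = (DemoNegProc)load("demoNeg");
    return api->Add != NULL && api->Neg != NULL;
}
```

Each pointer is cast back to its real type before it is called, which is what keeps the round trip through the generic void (*)(void) type well defined.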

Compilers

A good way to test whether your project is written in a portable manner is to build it with as many C compilers as you can get your hands on. The standard toolchains are:

Linux/BSDs: gcc, clang
Windows GNU: gcc.exe, clang.exe
Windows MSVC: cl.exe, clang.exe
MacOS: clang

In general, it is quite simple to cross-compile applications between Linux, the BSDs, and Windows using clang or zig cc (the version of clang packaged with the Zig programming language). It is also fairly simple to test Windows apps from Unix via wine, and to test Linux/BSD apps from any system by using a qemu virtual machine. That leaves MacOS as the one major outlier—fuck Apple.
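
When building with several compilers, it can be handy for a test binary to report which one produced it. The sketch below relies only on widely documented predefined macros (__clang__, __TINYC__, _MSC_VER, __GNUC__); the demo_compiler_name function is hypothetical. Clang is checked before the others because it also defines __GNUC__ (and _MSC_VER when imitating cl.exe), and a compiler that defines none of these macros falls through to "unknown".

```c
// Return a short name for the compiler that built this translation unit.
const char *demo_compiler_name(void) {
#if defined(__clang__)
    return "clang";
#elif defined(__TINYC__)
    return "tcc";
#elif defined(_MSC_VER)
    return "msvc";
#elif defined(__GNUC__)
    return "gcc (or a compiler that imitates it)";
#else
    return "unknown";
#endif
}
```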

`tcc`

The Tiny C Compiler (tcc) is probably the most well known C compiler outside of the mainstream. It’s a fantastic, fast compiler, but it doesn’t do much optimization when it comes to code generation. It can generate position independent code and work with dynamic libraries.

`cproc`

One of my favorite open source tools is cproc: a C compiler built on the qbe optimizing code generation backend. Each of cproc and qbe is written in around 10,000 lines of simple C99 code, and each builds using just make and a C toolchain. By my benchmarks, binaries produced by cproc run around 50% as fast as fully optimized gcc/clang binaries (-O3 -march=native), but with a tiny fraction of the build time.

The cproc compiler does not include a preprocessor, assembler, or a linker: it uses those from the system C toolchain (e.g. gcc -E, as, and ld). It also cannot generate position independent code, and therefore cannot compile shared libraries.

`chibicc`

The Chibi C Compiler (chibicc) is an educational, toy compiler that includes a full preprocessor implementation. In order to use chibicc as a drop-in replacement for gcc in my projects, I had to apply the following patch:

diff --git a/main.c b/main.c
index ffaabf4..dbe8d0f 100644
--- a/main.c
+++ b/main.c
@@ -429,6 +429,47 @@ static void run_cc1(int argc, char **argv, char *input, char *output) {
   run_subprocess(args);
 }
 
+static void print_escaped_str(FILE *out, const char *str) {
+  fprintf(out, "\"");
+  for (const char *c = str; *c != 0; c++) {
+    switch (*c) {
+      case '\a':
+        fprintf(out, "\\a");
+        break;
+      case '\b':
+        fprintf(out, "\\b");
+        break;
+      case '\f':
+        fprintf(out, "\\f");
+        break;
+      case '\n':
+        fprintf(out, "\\n");
+        break;
+      case '\r':
+        fprintf(out, "\\r");
+        break;
+      case '\t':
+        fprintf(out, "\\t");
+        break;
+      case '\v':
+        fprintf(out, "\\v");
+        break;
+      case '\\':
+        fprintf(out, "\\\\");
+        break;
+      case '\"':
+        fprintf(out, "\\\"");
+        break;
+      case '\'':
+        fprintf(out, "\\\'");
+        break;
+      default:
+        fprintf(out, "%.*s", 1, c);
+    }
+  }
+  fprintf(out, "\"");
+}
+
 // Print tokens to stdout. Used for -E.
 static void print_tokens(Token *tok) {
   FILE *out = open_file(opt_o ? opt_o : "-");
@@ -439,7 +480,10 @@ static void print_tokens(Token *tok) {
       fprintf(out, "\n");
     if (tok->has_space && !tok->at_bol)
       fprintf(out, " ");
-    fprintf(out, "%.*s", tok->len, tok->loc);
+    if (tok->str)
+      print_escaped_str(out, tok->str);
+    else
+      fprintf(out, "%.*s", tok->len, tok->loc);
     line++;
   }
   fprintf(out, "\n");
@@ -601,6 +645,7 @@ static char *find_gcc_libpath(void) {
     "/usr/lib/gcc/x86_64-linux-gnu/*/crtbegin.o",
     "/usr/lib/gcc/x86_64-pc-linux-gnu/*/crtbegin.o", // For Gentoo
     "/usr/lib/gcc/x86_64-redhat-linux/*/crtbegin.o", // For Fedora
+    "/usr/lib/gcc/x86_64-chimera-linux-musl/*/crtbegin.o", // For Chimera
   };
 
   for (int i = 0; i < sizeof(paths) / sizeof(*paths); i++) {
@@ -656,16 +701,16 @@ static void run_linker(StringArray *inputs, char *output) {
 
   if (opt_static) {
     strarray_push(&arr, "--start-group");
-    strarray_push(&arr, "-lgcc");
-    strarray_push(&arr, "-lgcc_eh");
     strarray_push(&arr, "-lc");
     strarray_push(&arr, "--end-group");
   } else {
     strarray_push(&arr, "-lc");
-    strarray_push(&arr, "-lgcc");
-    strarray_push(&arr, "--as-needed");
-    strarray_push(&arr, "-lgcc_s");
-    strarray_push(&arr, "--no-as-needed");
   }
 
   if (opt_shared)
diff --git a/parse.c b/parse.c
index 6acaeb8..80a86b7 100644
--- a/parse.c
+++ b/parse.c
@@ -1209,6 +1209,7 @@ static void union_initializer(Token **rest, Token *tok, Initializer *init) {
     Member *mem = struct_designator(&tok, tok->next, init->ty);
     init->mem = mem;
     designation(&tok, tok, init->children[mem->idx]);
+    consume(&tok, tok, ",");
     *rest = skip(tok, "}");
     return;
   }

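The escaped string printing fixes an issue with -E: without it, the preprocessed output omits all but the first token of a sequence of adjacent string literals; e.g. "hi " "there" would be preprocessed as "hi ". The remaining hunks add the crtbegin.o search path for Chimera Linux, stop linking against libgcc, and accept a trailing comma after a designated initializer for a union.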
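
The following fragment is a sketch of the kind of code that exercises two of the patched paths: adjacent string literals, which have to survive preprocessing and end up as one string, and a trailing comma after a designated initializer for a union. The demo_ and Demo names are hypothetical.

```c
#include <string.h>

// Adjacent string literals are concatenated during translation, so this
// array holds the single string "hi there".
static const char demo_greeting[] = "hi " "there";

typedef union {
    int i;
    float f;
} DemoValue;

// A designated initializer for a union member, followed by a trailing comma.
static const DemoValue demo_value = { .f = 1.5f, };

// Returns 1 if both constructs were compiled as expected.
int demo_check(void) {
    return strcmp(demo_greeting, "hi there") == 0 && demo_value.f == 1.5f;
}
```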

Conclusion

If your project will build with a single invocation of each of the above compilers targeting each of the mutually supported operating systems and architectures, that is a pretty good sign that you have crafted something lasting and portable.

Easy targets

Every application and library I write supports Linux, the BSDs, and Windows. I only support MacOS for command-line tools: I don’t have a Mac to test with, and it is a pain in the ass to create a full cross-compilation toolchain. For graphical applications, I additionally support using emcc to target the browser with WebAssembly and WebGL 2, and I have a simple Android build process to package Android APKs.