DumbCycle
April 12, 2024  |  asm c  |  GitHub
What is the easiest way to draw pixels to the screen on a modern Linux system?
I have had this question bouncing around in the back of my mind for a long time.
Some time ago I made a few tiny applications that displayed images using the
/dev/fb0
device. Unfortunately,
fbdev
is a legacy subsystem that won't always be present.
Fortunately, the modern
Direct Rendering Manager
subsystem provides corresponding functionality via "dumb
buffer" objects.
DumbCycle is a minimal game that I created using the Direct Rendering Manager subsystem of Linux. In order to meet my criteria it had to:
- run on any1 modern
x86_64
Linux machine with working video card and keyboard drivers;
- compile with any C compiler that supports the C99 standard (or later);
- link with a basic
x86_64
assembly runtime (no libc
or other libraries).
The above restrictions made C preprocessor directives mostly unnecessary,
so none were used. I.e. #include
, #define
, etc. were not allowed.
Code for the finished game is available on GitHub. Each commit corresponds to a section of this article.
Requirements
To build the game you will need a C compiler that
supports at least the C99 standard, e.g.
gcc,
clang,
zig cc,
cproc.
You will also need
make and
binutils
for the as
assembler and
ld
linker. To run the virtual test
environment discussed later in the article you will need
QEMU.
By design the game will only build for
x86_64
Linux.
On Windows the default Ubuntu
WSL installation
should work as a development environment after adding the build-essential
and qemu-system
packages.
If you don't have an x86_64
Linux or Windows system then
you will need cross-compilation tools and a virtual
machine, e.g.
musl-cross-make and
QEMU.
Creating an executable: an entry point in assembly
To link a basic
ELF
executable on Linux we need to define an entry point
symbol that runs our code and then calls the exit
system call. The default
entry point symbol is _start
which we will define in the file src/runtime.s
.
.text
.extern _cstart
.global _start
_start:
xor %rbp, %rbp
mov (%rsp), %rdi
mov %rsp, %rsi
add $8, %rsi
call _cstart
ud2
.section .note.GNU-stack,"",@progbits
Linux follows the
System V ABI.
The ABI states that at the start of
process execution the stack pointer register
rsp
will contain a 16 byte aligned
pointer to the top of the stack. The stack will contain
an integer argument count, stored in an 8 byte slot, indicating the number of
program argument strings,
followed by a null-terminated array of string
pointers of the given length.
Other values may also be present further up the stack, but we can
ignore them for our purposes.
We first zero out the base pointer rbp
to mark that this is the top
stack frame of our process.
Next we copy the argument count from the stack into the first parameter
register rdi
and compute a pointer to the array of argument string
pointers in the second parameter register rsi
.
Finally, we call the
function _cstart
that will be defined in src/main.c
.
An undefined instruction ud2
is inserted following the call to
_cstart
to crash the program if _cstart
happens to return.
The .note.GNU-stack
section indicates
to the linker that the stack memory should
not be executable.
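As a rough illustration of the stack layout described above, the following C sketch reads the argument count, the argument array, and the environment array that follows it from a pointer to the initial stack. The names start_info and parse_initial_stack are hypothetical and are not part of the game; the entry point above does the equivalent work for argc and argv directly in assembly.

```c
// Hypothetical view of the values found at the initial stack pointer.
struct start_info {
    long argc;
    char **argv;
    char **envp;
};

// `stack` is assumed to hold the value of rsp at process entry.
struct start_info parse_initial_stack(long *stack) {
    struct start_info info;
    // The first 8 byte slot holds the argument count.
    info.argc = stack[0];
    // The argument string pointers start in the next slot.
    info.argv = (char **)(stack + 1);
    // argv ends with a null pointer; the environment pointers follow it.
    info.envp = info.argv + info.argc + 1;
    return info;
}
```

The environment array is one of the "other values" further up the stack; the game never needs it, which is why the assembly stops after computing argv.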
We'll also create a src/main.c
file with a dummy
_cstart
function.
void _cstart(int argc, char **argv) {}
Next, let's add a simple Makefile
to build the executable.
CC = gcc
CFLAGS = -fno-stack-protector
LD = ld
LDFLAGS =
AS = as
ASFLAGS =
all: dumb_cycle
clean: clean_dumb_cycle
dumb_cycle: src/main.o src/runtime.o
	$(LD) $(LDFLAGS) -o dumb_cycle src/main.o src/runtime.o
clean_dumb_cycle: clean_main clean_runtime
	rm -f dumb_cycle
src/main.o: src/main.c
	$(CC) $(CFLAGS) -c -o src/main.o src/main.c
clean_main:
	rm -f src/main.o
src/runtime.o: src/runtime.s
	$(AS) $(ASFLAGS) -o src/runtime.o src/runtime.s
clean_runtime:
	rm -f src/runtime.o
If we build and run our program it should terminate with an illegal instruction.
$ make
$ ./dumb_cycle
Illegal instruction (core dumped)
Exiting without crashing: Linux system calls
In order to exit from our process without crashing we need to invoke the
Linux exit
system
call. A system call is made using the syscall
instruction,
similar to calling a function in assembly with the call
instruction.
Rather than specifying a function address to jump to, however, syscall
instead
looks at the rax
register for the number of the system call to execute.
Arguments are passed in registers similar to function calls, but the
register order is slightly different (see the System V ABI).
We will define a set of assembly functions to call from C
that will execute
system calls. The first argument to each function will be the system call
number. Linux syscalls may have up to six arguments, so we will implement a
separate function for each possible argument count. The registers are
re-arranged to place the syscall number in rax
and the remaining arguments
into the appropriate syscall argument registers2.
.global syscall0
syscall0:
movq %rdi, %rax
syscall
ret
.type syscall0, @function
.size syscall0, .-syscall0
.global syscall1
syscall1:
movq %rdi, %rax
movq %rsi, %rdi
syscall
ret
.type syscall1, @function
.size syscall1, .-syscall1
.global syscall2
syscall2:
movq %rdi, %rax
movq %rsi, %rdi
movq %rdx, %rsi
syscall
ret
.type syscall2, @function
.size syscall2, .-syscall2
.global syscall3
syscall3:
movq %rdi, %rax
movq %rsi, %rdi
movq %rdx, %rsi
movq %rcx, %rdx
syscall
ret
.type syscall3, @function
.size syscall3, .-syscall3
.global syscall4
syscall4:
movq %rdi, %rax
movq %rsi, %rdi
movq %rdx, %rsi
movq %rcx, %rdx
movq %r8, %r10
syscall
ret
.type syscall4, @function
.size syscall4, .-syscall4
.global syscall5
syscall5:
movq %rdi, %rax
movq %rsi, %rdi
movq %rdx, %rsi
movq %rcx, %rdx
movq %r8, %r10
movq %r9, %r8
syscall
ret
.type syscall5, @function
.size syscall5, .-syscall5
.global syscall6
syscall6:
movq %r9, %r11
movq %rdi, %rax
movq 8(%rsp), %r9
movq %rsi, %rdi
movq %rdx, %rsi
movq %rcx, %rdx
movq %r8, %r10
movq %r11, %r8
syscall
ret
.type syscall6, @function
.size syscall6, .-syscall6
Now we can define a function stub for each syscall function. I like to refer
to basic C types by short sign-and-size aliases, so we will also
add a typedef
statement for each relevant type.
typedef short i16;
typedef unsigned short u16;
typedef int i32;
typedef unsigned int u32;
typedef long i64;
typedef unsigned long u64;
u64 syscall0(u64 scid);
u64 syscall1(u64 scid, u64 a1);
u64 syscall2(u64 scid, u64 a1, u64 a2);
u64 syscall3(u64 scid, u64 a1, u64 a2, u64 a3);
u64 syscall4(u64 scid, u64 a1, u64 a2, u64 a3, u64 a4);
u64 syscall5(u64 scid, u64 a1, u64 a2, u64 a3, u64 a4, u64 a5);
u64 syscall6(u64 scid, u64 a1, u64 a2, u64 a3, u64 a4, u64 a5, u64 a6);
enum syscall {
SYS_EXIT = 60,
};
static void exit(i32 error_code) {
syscall1(SYS_EXIT, (u64)error_code);
}
void _cstart(i32 argc, char **argv) {
exit(0);
}
To make a specific system call we need to pass the corresponding
syscall number as the first parameter. A quick and simple
way to find the exit
syscall number and required arguments is to
use grep
to search the Linux source.
$ git clone --depth=1 https://github.com/torvalds/linux.git
$ cat linux/arch/x86/entry/syscalls/syscall_64.tbl | grep "exit"
60 common exit sys_exit
231 common exit_group sys_exit_group
$ grep -rn "SYSCALL_DEFINE.\?(exit," linux/
linux/kernel/exit.c:992:SYSCALL_DEFINE1(exit, int, error_code)
The syscall number for exit
is 60
and it takes a single integer
error_code
argument. The error code indicates
whether the process completed successfully or encountered an error.
$ make
$ ./dumb_cycle
Our executable now exits cleanly!
Greeting the world: error handling and file descriptors
Exiting cleanly is nice, but it would be even nicer to have some input and
output. Before defining C functions
to make the write
system call, we will need to write a little
boilerplate to manage error handling.
The standard way for
system calls to return error values is to place a value between -4095
and -1
in the return register3. It is therefore necessary to test if the
return value of a system call fits in this range and, if so, extract the error
code.
enum syscall {
SYS_WRITE = 1,
SYS_EXIT = 60,
};
enum error_code {
EINTR = 4,
};
static i32 syscall_error(u64 return_value) {
if (return_value > -4096UL) {
return (i32)(-return_value);
}
return 0;
}
static i64 write(i32 fd, char *bytes, i64 bytes_len) {
u64 return_value;
i32 error;
do {
return_value = syscall3(SYS_WRITE, (u64)fd, (u64)bytes, (u64)bytes_len);
error = syscall_error(return_value);
} while (error == EINTR);
if (error != 0) {
return -error;
}
return (i64)return_value;
}
There is one particular syscall error that often needs special handling:
EINTR
. The EINTR
error has code 4
on x86_64
and indicates that
the system
call was interrupted by a signal. If we hit an EINTR
error code we should
retry the given syscall.
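To make the error range concrete, here is a small worked sketch that splits a raw return value into either a result or a positive error code. The names syscall_result and decode_return are hypothetical; the comparison is the same one used by syscall_error above.

```c
typedef int i32;
typedef long i64;
typedef unsigned long u64;

// Hypothetical pair holding either a result value or a positive error code.
struct syscall_result {
    i64 value;
    i32 error;
};

struct syscall_result decode_return(u64 raw) {
    struct syscall_result result = { 0, 0 };
    // Values from -4095 to -1, viewed as unsigned, are all above -4096UL.
    if (raw > -4096UL) {
        result.error = (i32)(-raw);
    } else {
        result.value = (i64)raw;
    }
    return result;
}
```

For instance, a raw return of 0xfffffffffffffffc is -4 when read as a signed value, so it decodes to error 4 (EINTR), while a raw return of 14 from write decodes to a value of 14 with no error.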
Now we can write a classic greeting program. The first argument we
need to pass to the write
syscall is the file descriptor that
we want to write
to. There are three file descriptors that are open by default for all
processes: descriptor 0
is standard input, descriptor 1
is standard
output, and descriptor 2
is standard error.
enum std_fd {
STDIN = 0,
STDOUT = 1,
STDERR = 2,
};
enum main_error {
MAIN_ERROR_NONE = 0,
MAIN_ERROR_WRITE_STDOUT,
};
i32 main(i32 argc, char **argv) {
char greeting[] = "Hello, World!\n";
// Subtract one so that the terminating null byte is not written.
i64 len = write(STDOUT, greeting, sizeof(greeting) - 1);
if (len < 0) {
return MAIN_ERROR_WRITE_STDOUT;
}
return MAIN_ERROR_NONE;
}
void _cstart(i32 argc, char **argv) {
exit(main(argc, argv));
}
Building and running we should now receive a friendly message.
$ make
$ ./dumb_cycle
Hello, World!
Accessing the filesystem
Almost everything in Linux is treated as a file, and thus we will need
some system calls to access the file system: read
, open
and close
.
enum syscall {
SYS_READ = 0,
SYS_WRITE = 1,
SYS_OPEN = 2,
SYS_CLOSE = 3,
SYS_EXIT = 60,
};
static i64 read(i32 fd, char *bytes, i64 bytes_len) {
u64 return_value;
i32 error;
do {
return_value = syscall3(SYS_READ, (u64)fd, (u64)bytes, (u64)bytes_len);
error = syscall_error(return_value);
} while (error == EINTR);
if (error != 0) {
return -error;
}
return (i64)return_value;
}
enum open_mode {
O_RDONLY = 0,
O_WRONLY = 1,
O_RDWR = 2,
};
static i32 open(char *fname, i32 mode, i32 flags) {
u64 return_value;
i32 error;
do {
return_value = syscall3(SYS_OPEN, (u64)fname, (u64)mode, (u64)flags);
error = syscall_error(return_value);
} while (error == EINTR);
if (error != 0) {
return -error;
}
return (i32)return_value;
}
static i32 close(i32 fd) {
// Linux releases the descriptor even when close is interrupted, so the
// call must not be retried on EINTR.
u64 return_value = syscall1(SYS_CLOSE, (u64)fd);
return syscall_error(return_value);
}
We can now open files, read and write to them, and close them. For a simple
example we will have our program try to open a file name.txt
. If the file
opens successfully, the program will read the contents into the name
character
array.
If the file fails to open, the program will
ask the user to enter a name and then read the name from the standard input
instead. Finally, the program will print a greeting using the given name.
enum main_error {
MAIN_ERROR_NONE = 0,
MAIN_ERROR_WRITE_STDOUT,
MAIN_ERROR_READ_NAME,
};
i32 main(i32 argc, char **argv) {
i64 len;
i32 name_fd = open("name.txt", O_RDONLY, 0);
if (name_fd < 0) {
char question[] = "What is your name?\n";
len = write(STDOUT, question, sizeof(question) - 1);
if (len < 0) {
return MAIN_ERROR_WRITE_STDOUT;
}
name_fd = STDIN;
}
char name[255];
i64 name_len = read(name_fd, name, sizeof(name));
if (name_len < 0) {
return MAIN_ERROR_READ_NAME;
}
char greeting1[] = "Hello ";
len = write(STDOUT, greeting1, sizeof(greeting1) - 1);
if (len < 0) {
return MAIN_ERROR_WRITE_STDOUT;
}
len = write(STDOUT, name, name_len);
if (len < 0) {
return MAIN_ERROR_WRITE_STDOUT;
}
return MAIN_ERROR_NONE;
}
We can now run and test that both methods work correctly.
$ make
$ ./dumb_cycle
What is your name?
Aven
Hello Aven
$ echo "Aven" > name.txt
$ ./dumb_cycle
Hello Aven
Keeping time: the clock_gettime
system call
To manage game updates and draw frames we will need some method to track how
much time has passed. The standard Linux system call for tracking time in
high resolution is clock_gettime
.
enum syscall {
// ...
SYS_CLOCK_GETTIME = 228,
};
enum clock_id {
CLOCK_MONOTONIC = 1,
};
struct timespec {
i64 sec;
i64 nsec;
};
static i32 clock_gettime(i32 clock_id, struct timespec *timespec) {
u64 return_value = syscall2(
SYS_CLOCK_GETTIME,
(u64)clock_id,
(u64)timespec
);
return syscall_error(return_value);
}
static i64 time_since_ns(struct timespec *end, struct timespec *start) {
i64 seconds = end->sec - start->sec;
return (seconds * 1000L * 1000L * 1000L) + end->nsec - start->nsec;
}
The clock_gettime
system call takes a clock ID and writes the current timestamp
from the given clock into the provided timespec
output parameter. In order to
track time we will call clock_gettime
to get two timestamps and compute the
time elapsed between them with time_since_ns
.
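A short worked example may help with the arithmetic. The sketch below repeats the calculation from time_since_ns using a hypothetical stamp struct (same two fields as struct timespec above, renamed only so it can be compiled next to libc headers) and a hypothetical function name elapsed_ns.

```c
typedef long i64;

// Hypothetical stand-in with the same two fields as struct timespec above.
struct stamp {
    i64 sec;
    i64 nsec;
};

// Whole seconds are scaled to nanoseconds before the nanosecond fields
// are subtracted, so no separate borrow step is needed.
i64 elapsed_ns(struct stamp *end, struct stamp *start) {
    i64 seconds = end->sec - start->sec;
    return (seconds * 1000L * 1000L * 1000L) + end->nsec - start->nsec;
}
```

With a start of 1 s + 900,000,000 ns and an end of 2 s + 100,000,000 ns the result is 1,000,000,000 + 100,000,000 - 900,000,000 = 200,000,000 ns, even though the nanosecond field of the end timestamp is smaller than that of the start.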
It is quite inefficient to make a full system call for clock_gettime
since
such a call would usually be made using
VDSO. However, I have decided
that the runtime setup required for VDSO calls is beyond the scope of this
singularly focused article.
enum main_error {
MAIN_ERROR_NONE = 0,
MAIN_ERROR_WRITE_STDOUT,
MAIN_ERROR_CLOCK_GETTIME,
};
i32 main(i32 argc, char **argv) {
struct timespec last, now;
i32 error = clock_gettime(CLOCK_MONOTONIC, &last);
if (error) {
return MAIN_ERROR_CLOCK_GETTIME;
}
i64 len;
i32 steps = 0;
while (steps < 5) {
i32 error = clock_gettime(CLOCK_MONOTONIC, &now);
if (error) {
return MAIN_ERROR_CLOCK_GETTIME;
}
if (time_since_ns(&now, &last) >= 1000L * 1000L * 1000L) {
last = now;
steps += 1;
len = write(STDOUT, ".", 1);
if (len < 0) {
return MAIN_ERROR_WRITE_STDOUT;
}
}
}
len = write(STDOUT, "\n", 1);
if (len < 0) {
return MAIN_ERROR_WRITE_STDOUT;
}
return MAIN_ERROR_NONE;
}
Our program now prints a dot every second for five seconds.
$ make
$ ./dumb_cycle
.....
Detecting user input: the poll
syscall
In order to run a game loop we will need to be able to read user input while
updating the game state and rendering the screen. To accomplish this we will
need some way of detecting when user input is available to read from a given
file descriptor. There are many different ways to
accomplish this on Linux, but the poll
system call is simple and
suits our use case perfectly.
enum syscall {
// ...
SYS_POLL = 7,
// ...
};
enum poll_event {
POLLIN = 1,
};
struct pollfd {
i32 fd;
i16 events;
i16 revents;
};
static i32 poll(struct pollfd *fds, i64 fds_len, i32 time_ms) {
u64 return_value;
i32 error;
do {
return_value = syscall3(SYS_POLL, (u64)fds, (u64)fds_len, (u64)time_ms);
error = syscall_error(return_value);
} while (error == EINTR);
if (error != 0) {
return -error;
}
return (i32)return_value;
}
The poll
syscall takes an array of struct pollfd
values, the length of the
array, and a number of milliseconds to wait for an event to occur.
The pollfd
struct stores a file descriptor fd
to monitor, a 16 bit integer
events
indicating which events to poll for, and a corresponding
field revents
where the events that actually occurred will be written.
We will only be looking for the POLLIN
event that signals when a file
descriptor has data available to be read.
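For readers who want to experiment with poll on an ordinary hosted system first, here is a minimal sketch that uses the libc poll wrapper from poll.h rather than the freestanding wrapper above. The helper name readable_now is hypothetical.

```c
#include <poll.h>

// Returns 1 if `fd` has data ready to read right now, 0 if not,
// and -1 if the poll call itself failed.
int readable_now(int fd) {
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    // A timeout of 0 makes poll return immediately instead of blocking.
    int ready = poll(&pfd, 1, 0);
    if (ready < 0) {
        return -1;
    }
    // Test the POLLIN bit specifically: revents may also carry
    // POLLERR or POLLHUP, which poll reports even when not requested.
    return (ready > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

Masking revents with POLLIN is slightly stricter than checking for a nonzero value, which matters once a descriptor can hang up or report an error.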
As an example we can change our program to poll for data on the standard input file descriptor while it loops printing dots.
enum main_error {
MAIN_ERROR_NONE = 0,
MAIN_ERROR_WRITE_STDOUT,
MAIN_ERROR_CLOCK_GETTIME,
MAIN_ERROR_POLL,
};
i32 main(i32 argc, char **argv) {
char greeting[] = "Enter a message to exit.\n";
i64 len = write(STDOUT, greeting, sizeof(greeting) - 1);
if (len < 0) {
return MAIN_ERROR_WRITE_STDOUT;
}
struct timespec last, now;
i32 error = clock_gettime(CLOCK_MONOTONIC, &last);
if (error) {
return MAIN_ERROR_CLOCK_GETTIME;
}
struct pollfd stdin_pollfd = { .fd = STDIN, .events = POLLIN, };
i32 steps = 0;
while (1) {
i64 events = poll(&stdin_pollfd, 1, 0);
if (events < 0) {
return MAIN_ERROR_POLL;
}
if (events > 0) {
char dummy_buffer[255];
read(STDIN, dummy_buffer, sizeof(dummy_buffer));
break;
}
i32 error = clock_gettime(CLOCK_MONOTONIC, &now);
if (error) {
return MAIN_ERROR_CLOCK_GETTIME;
}
if (time_since_ns(&now, &last) >= 1000L * 1000L * 1000L) {
last = now;
steps += 1;
len = write(STDOUT, ".", 1);
if (len < 0) {
return MAIN_ERROR_WRITE_STDOUT;
}
}
}
return MAIN_ERROR_NONE;
}
We now have an application that will continually print '.'
characters until
the user enters a message.
$ make
$ ./dumb_cycle
Enter a message to exit.
....Hey!
Making memory: pages and arenas
It is often necessary to allocate memory during runtime, such as when it
isn’t known at compile time how much space a block of data will take up.
In standard C dynamic memory
allocation is accomplished using the malloc
or calloc
functions
from libc
.
Since we aren’t linking with libc
, we will build a very simple
memory allocation scheme from scratch.
The basic way to request memory from the operating system on Linux is the
mmap
system call.
enum syscall {
// ...
SYS_MMAP = 9,
// ...
};
enum mmap_prot {
PROT_READ = 1,
PROT_WRITE = 2,
};
enum mmap_flag {
MAP_SHARED = 0x01,
MAP_ANONYMOUS = 0x20,
};
static void *mmap(
void *hint,
i64 size,
i32 prot,
i32 flags,
i32 fd,
i64 offset
) {
u64 return_value = syscall6(
SYS_MMAP,
(u64)hint,
(u64)size,
(u64)prot,
(u64)flags,
(u64)fd,
(u64)offset
);
i32 error = syscall_error(return_value);
if (error != 0) {
return 0;
}
return (void *)return_value;
}
An mmap
call asks the operating system to allocate contiguous blocks of
virtual memory called pages. On x86_64 Linux a standard memory page
is 4096
bytes. The hint
argument suggests a virtual address to
start the allocation. The prot
argument tells the OS what permissions
the allocated memory should have, e.g. whether it is readable (PROT_READ
)
and/or writable (PROT_WRITE
).
The flags
argument indicates
other properties of the allocation such as whether the mapped memory is shared
with child processes (MAP_SHARED
) and/or not backed by a file descriptor
(MAP_ANONYMOUS
). The fd
argument is used to provide a backing file descriptor
and the offset
argument indicates where in the associated file the mapping
should start.
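Because mappings are made in whole pages, a request for n bytes occupies n rounded up to the next page multiple. The sketch below shows that rounding; the constant ASSUMED_PAGE_SIZE and the function round_up_to_page are hypothetical names, and the 4096 byte page size is the assumption stated above.

```c
typedef long i64;

enum { ASSUMED_PAGE_SIZE = 4096 };

// Round a byte count up to the next multiple of the page size.
// The mask trick only works because the page size is a power of two.
i64 round_up_to_page(i64 size) {
    return (size + ASSUMED_PAGE_SIZE - 1) & ~((i64)ASSUMED_PAGE_SIZE - 1);
}
```

A one byte request therefore costs a full 4096 byte page, while a request of 4097 bytes costs two pages.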
Making a call to mmap
for every allocation is slow (system calls take
much longer than function calls) and wasteful (we must allocate multiples
of 4096 bytes).
Both problems can be solved with memory arenas, also known as bump
allocators. An arena is a simple struct that stores a pair of pointers:
one to the start of a block of memory, and one to the end of the block.
We will create a new src/mem.c
file that will contain our arena code.
typedef long i64;
typedef unsigned long u64;
struct arena {
char *start;
char *end;
};
void *alloc(struct arena *arena, i64 size) {
i64 available = arena->end - arena->start;
if (size > available) {
return 0;
}
char *p = arena->start;
arena->start += size;
for (i64 i = 0; i < size; ++i) {
p[i] = 0;
}
return p;
}
To “allocate” memory from an arena, we simply check that there is enough space
and then return the start
pointer, setting the new
start
to point at the spot in memory just after the allocated chunk.
Some C types are expected to be n
byte aligned in memory, i.e. have a memory
address that is evenly divisible by n
.
On x86_64
Linux the maximum alignment required by any standard C type is 16
bytes, so
we will simply ensure that our arena allocator aligns all pointers to
16
bytes.
void *alloc(struct arena *arena, i64 size) {
i64 available = arena->end - arena->start;
i64 padding = -(i64)arena->start & (16 - 1);
if (size > (available - padding)) {
return 0;
}
char *p = arena->start + padding;
arena->start = p + size;
for (i64 i = 0; i < size; ++i) {
p[i] = 0;
}
return p;
}
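The padding expression is compact enough that a worked example may be useful. The sketch below isolates it in a hypothetical helper named align_padding that operates on plain integers instead of pointers.

```c
typedef unsigned long u64;

// Number of bytes to skip so that `address` becomes a multiple of
// `alignment`. The alignment is assumed to be a power of two.
u64 align_padding(u64 address, u64 alignment) {
    // Negating the address and keeping the low bits gives the distance
    // to the next aligned address, or zero if already aligned.
    return -address & (alignment - 1);
}
```

For an address of 0x1003 and an alignment of 16 the helper returns 13, since 0x1003 + 13 = 0x1010. An address that is already aligned, such as 0x1010, needs no padding.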
We’ll need to add build steps for src/mem.c
to our Makefile
.
dumb_cycle: src/main.o src/mem.o src/runtime.o
	$(LD) $(LDFLAGS) -o dumb_cycle src/main.o src/mem.o src/runtime.o
clean_dumb_cycle: clean_main clean_mem clean_runtime
	rm -f dumb_cycle
src/mem.o: src/mem.c
	$(CC) $(CFLAGS) -c -o src/mem.o src/mem.c
clean_mem:
	rm -f src/mem.o
Now we can make one mmap
call at the start of our program and create an 8MB
arena large enough for all our dynamic memory allocations.
struct arena {
char *start;
char *end;
};
void *alloc(struct arena *arena, i64 size);
enum main_error {
MAIN_ERROR_NONE = 0,
MAIN_ERROR_MMAP,
};
i32 main(i32 argc, char **argv) {
i64 arena_size = 2000 * 4096;
char *mem = mmap(
0,
arena_size,
PROT_WRITE | PROT_READ,
MAP_SHARED | MAP_ANONYMOUS,
-1,
0
);
if (mem == 0) {
return MAIN_ERROR_MMAP;
}
struct arena arena = { .start = mem, .end = mem + arena_size };
char *buf = 0;
i64 buf_len = 0;
return MAIN_ERROR_NONE;
}
You may be wondering why we created a separate src/mem.c
file. The truth is that it would almost certainly be
fine to place the definition of the alloc
function directly in the
src/main.c
file, but it would not be strictly correct according to
the C99 standard.
When you write data into memory that has no declared type, e.g. memory
returned by mmap
, the memory assumes the effective type of the data
that was written to it. A new write to the same memory can change the
effective type again, but the memory will not return to its initial untyped
state.
With an arena it may be the case that the same memory is used for different
purposes, e.g. if a temporary copy of an arena is used in a confined scope.
If alloc
was defined within the same translation unit as the code that calls
it, then the C compiler might deduce at compile time that
memory is being re-used. In such a case a void *
returned by a call to
alloc
would point to memory that has
an effective type from a prior write.
In most cases it is fine if the memory returned by alloc
has an effective type:
we will almost always explicitly write data into
allocated memory before we read from it.
Unfortunately, in the next section we will run into a situation where
technicalities arise.
For a contrived example of how issues crop up, let us suppose that we have an arena set up as shown above and we run into the following situation.
{
struct arena temp_arena = arena;
float *f = alloc(&temp_arena, sizeof(*f));
*f = 3.1415f;
}
int *i = alloc(&arena, sizeof(*i));
foo(i);
int x = *i;
The implementation of foo
is defined in a separate translation unit as follows.
void foo(int *i) {
*i = 42;
}
While we know that calling foo(i)
writes an int
to
*i
, the C compiler does not know about this write.
The compiler will instead believe the effective type of the memory pointed to by i
to be
float
: the last write it can see to that memory is
*f = 3.1415f
.
Thus dereferencing i
to read an int
value is technically undefined behavior.
If the memory returned by alloc
had no declared or effective type, then
dereferencing i
to read an int
value would be perfectly fine:
it is valid to read untyped memory through a typed pointer so long as it
does in fact contain a valid representation for that type.
By defining alloc
in a separate translation
unit we force the compiler to consider each call to alloc
as returning a pointer to unknown memory with no declared type.
Interacting with I/O devices: the ioctl
system call
Our game will need to receive keyboard input and draw pixels to the
screen. Both of these tasks will require the ioctl
system call, short for
“input output control.”
enum syscall {
// ...
SYS_IOCTL = 16,
// ...
};
enum ioctl_dir {
IOCTL_WRITE = 1,
IOCTL_READ = 2,
IOCTL_RDWR = 3,
};
static i32 ioctl(i32 fd, u32 dir, u32 type, u32 number, u32 size, char *arg) {
u32 number_bits = number & 0xff;
u32 type_bits = (type & 0xff) << 8;
u32 size_bits = (size & 0x3fff) << 16;
u32 dir_bits = (dir & 0x3) << 30;
u32 request = dir_bits | size_bits | type_bits | number_bits;
u64 return_value;
i32 error;
do {
return_value = syscall3(SYS_IOCTL, (u64)fd, (u64)request, (u64)arg);
error = syscall_error(return_value);
} while (error == EINTR);
return error;
}
The fd
argument is the file descriptor of the device in question.
The dir
argument indicates whether the request is writing and/or reading.
The type
argument indicates the high-level type of the request, e.g.
a DRM request. The number
parameter indicates the
specific request being made for the given type
. Finally, size
indicates the
size in bytes of the data pointed to by arg
.
Instead of taking dir
, type
, number
, and size
arguments, the standard
C ioctl
implementation will take a single request
integer argument.
It is expected that callers will use C macros to pre-pack the bits of the
request. Since we are not
using C macros in this project, and we are aiming for
clarity over performance, it makes more sense for us to pass each part of the
request separately and then pack the request in the ioctl
function.
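To see the packing in action, the sketch below pulls the bit arithmetic out into a hypothetical helper named pack_ioctl_request so that the resulting request numbers can be inspected on their own.

```c
typedef unsigned int u32;

// Same bit layout as the packing code in the ioctl wrapper above:
// bits 0-7 number, bits 8-15 type, bits 16-29 size, bits 30-31 direction.
u32 pack_ioctl_request(u32 dir, u32 type, u32 number, u32 size) {
    return ((dir & 0x3) << 30) |
           ((size & 0x3fff) << 16) |
           ((type & 0xff) << 8) |
           (number & 0xff);
}
```

Packing a write request (direction 1) of type 'E' with number 0x90 and a 4 byte argument gives 0x40044590. That is the keyboard grab request used later in the article, and it should match the value that the EVIOCGRAB macro expands to in the kernel headers.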
Keyboard input: finding and reading keyboard devices
Reading from the standard input works for text lines, but for realtime key presses we really need direct access to the keyboard device.
If the /dev
filesystem has been mounted correctly then a /dev/input
directory will exist
and contain a file for each input device.
We don’t know which file corresponds to the keyboard that the user
will actually be pressing keys on, so we’ll simply have to iterate through
all of them and open a file descriptor for each device that might be a
keyboard.
enum ioctl_type {
IOCTL_EV = (i32)'E',
};
enum ev_ioctl {
EV_IOCTL_GET_BIT = 0x20,
EV_IOCTL_GET_KEY = 0x21,
EV_IOCTL_GRAB = 0x90,
};
enum ev_bits {
EV_KEY = 0x1,
EV_MAX = 0x1f,
};
enum ev_key_bits {
KEY_ESC = 1,
KEY_W = 17,
KEY_A = 30,
KEY_S = 31,
KEY_D = 32,
KEY_MAX = 0x2ff
};
static i32 test_bit(char *bytes, i32 len, i32 bit_num) {
i32 byte_index = bit_num / 8;
i32 bit_index = bit_num % 8;
if (byte_index >= len) {
return 0;
}
return (bytes[byte_index] & (1 << bit_index)) != 0;
}
static i32 is_keyboard(i32 fd) {
char evio_bits[EV_MAX / 8 + 1];
i32 error = ioctl(
fd,
IOCTL_READ,
IOCTL_EV,
EV_IOCTL_GET_BIT,
sizeof(evio_bits),
evio_bits
);
if (error != 0) {
return 0;
}
if (!test_bit(evio_bits, sizeof(evio_bits), EV_KEY)) {
return 0;
}
char evio_key_bits[KEY_MAX / 8 + 1];
error = ioctl(
fd,
IOCTL_READ,
IOCTL_EV,
EV_IOCTL_GET_KEY,
sizeof(evio_key_bits),
evio_key_bits
);
if (error != 0) {
return 0;
}
if (
test_bit(evio_key_bits, sizeof(evio_key_bits), KEY_ESC) &&
test_bit(evio_key_bits, sizeof(evio_key_bits), KEY_W) &&
test_bit(evio_key_bits, sizeof(evio_key_bits), KEY_A) &&
test_bit(evio_key_bits, sizeof(evio_key_bits), KEY_S) &&
test_bit(evio_key_bits, sizeof(evio_key_bits), KEY_D)
) {
return 1;
}
return 0;
}
To determine whether a given file represents a keyboard device we
need to check whether it produces key events and then check whether it has the
keys that we need for our game, namely ESC
, W
, A
, S
, and D
. Checking
such properties requires passing an array of bytes as the arg
output
parameter to an ioctl
syscall, and then testing which bits in the
array have been set.
The test_bit
helper function takes an array of bytes and a bit index,
returning 1
if the given bit is set in the provided array, and
0
otherwise.
Next we will write the open_keyboards
function to acquire sole access to all
available input devices that could be the user’s keyboard. We will use the
getdents
system call to iterate over all of the files in
/dev/input
.
enum syscall {
// ...
SYS_GETDENTS = 78,
// ...
};
struct dirent {
u64 ino;
u64 off;
u16 reclen;
char name[];
};
static i64 getdents(i32 fd, struct dirent *dents, i64 dents_size) {
u64 return_value = syscall3(
SYS_GETDENTS,
(u64)fd,
(u64)dents,
(u64)dents_size
);
i32 error = syscall_error(return_value);
if (error != 0) {
return -error;
}
return (i64)return_value;
}
The getdents
system call writes a struct dirent
object into the
provided block of memory for each file in the given directory. The system call is designed
to be called repeatedly until a value of zero is returned indicating that all directory
entries have been read. A struct dirent
object contains an
inode number,
a filesystem specific offset value, the total size of the dirent object, and a null terminated string
for the filename.
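Since each record carries its own length, a buffer filled by getdents has to be walked record by record. The sketch below counts the records in such a buffer; it reads the length field byte by byte so that it can be tried on a hand-built buffer. The name count_dirents and the two offset constants are hypothetical, with the offsets derived from the struct dirent layout above (two 8 byte fields followed by the 2 byte reclen).

```c
typedef long i64;
typedef unsigned short u16;

// reclen follows the 8 byte ino and off fields; name starts right after it.
enum { DIRENT_RECLEN_OFFSET = 16, DIRENT_NAME_OFFSET = 18 };

// Count the records in a buffer filled by getdents, where `len` is the
// byte count that the call returned.
i64 count_dirents(unsigned char *buf, i64 len) {
    i64 count = 0;
    i64 pos = 0;
    while (pos + DIRENT_NAME_OFFSET <= len) {
        // x86_64 is little-endian, so the low byte of reclen comes first.
        u16 reclen = (u16)(buf[pos + DIRENT_RECLEN_OFFSET] |
                           (buf[pos + DIRENT_RECLEN_OFFSET + 1] << 8));
        if (reclen < DIRENT_NAME_OFFSET || pos + reclen > len) {
            // Malformed record: stop rather than walk off the buffer.
            break;
        }
        count += 1;
        pos += reclen;
    }
    return count;
}
```

The important step is advancing by reclen instead of by a fixed struct size: the records vary in length with the filename.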
static i32 open_keyboards(
struct arena temp_arena,
i32 *keyboards,
i32 keyboards_capacity
) {
char input_dir[] = "/dev/input";
i32 input_dir_fd = open(input_dir, O_RDONLY, 0);
if (input_dir_fd < 0) {
return -1;
}
void *dents = alloc(&temp_arena, 1024);
char path_buffer[sizeof(input_dir) + 1024];
for (i32 i = 0; i < sizeof(input_dir); ++i) {
path_buffer[i] = input_dir[i];
}
path_buffer[sizeof(input_dir) - 1] = '/';
char *name_buffer = &path_buffer[sizeof(input_dir)];
i64 dents_pos = 0;
i64 dents_len = 0;
i32 keyboards_len = 0;
while (keyboards_len < keyboards_capacity) {
if (dents_pos >= dents_len) {
dents_len = getdents(input_dir_fd, dents, 1024);
if (dents_len <= 0) {
break;
}
// Restart at the beginning of the freshly filled buffer.
dents_pos = 0;
}
struct dirent *dent = (void *)((char *)dents + dents_pos);
i32 dent_name_len = dent->reclen - (dent->name - (char *)dent);
for (i32 i = 0; i < dent_name_len; ++i) {
name_buffer[i] = dent->name[i];
}
dents_pos += dent->reclen;
i32 keyboard_fd = open(path_buffer, O_RDONLY, 0);
// Skip entries that could not be opened at all.
if (keyboard_fd < 0) {
continue;
}
if (!is_keyboard(keyboard_fd)) {
close(keyboard_fd);
continue;
}
i32 error = ioctl(
keyboard_fd,
IOCTL_WRITE,
IOCTL_EV,
EV_IOCTL_GRAB,
sizeof(u32),
(char *)1
);
if (error != 0) {
close(keyboard_fd);
continue;
}
keyboards[keyboards_len] = keyboard_fd;
keyboards_len += 1;
}
close(input_dir_fd);
return keyboards_len;
}
Acquiring a keyboard involves
making another ioctl
call, this time indicating that our process wants to take
sole control over reading input events. Each acquired keyboard file
descriptor will be written into a caller provided keyboards
array output
parameter.
It may seem strange that we use an arena to dynamically allocate
memory for the fixed size 1024 byte dirents
buffer. The reason for
this is subtle and directly related to the effective type issue discussed in
the making memory section above.
During the getdents
syscall the operating system will
write zero or more struct dirent
entries into the memory pointed to by
dents
. Since we don’t a priori know the
size of each entry (struct dirent
has a flexible array member) we can’t
create a stack object with the correct type declared.
The memory pointed to by the void *
returned from alloc
has no
declared type and there are no writes visible to our translation unit.
Thus we are free to read the memory via a struct dirent
pointer.
Now that we have at least one keyboard, we can poll for key press events.
struct input_event {
struct timespec time;
u16 type;
u16 code;
i32 value;
};
enum main_error {
MAIN_ERROR_NONE = 0,
MAIN_ERROR_MMAP,
MAIN_ERROR_WRITE_STDOUT,
MAIN_ERROR_CLOCK_GETTIME,
MAIN_ERROR_POLL,
MAIN_ERROR_OPEN_KEYBOARD,
MAIN_ERROR_READ_KEYBOARD,
};
i32 main(i32 argc, char **argv) {
i64 arena_size = 2000 * 4096;
char *mem = mmap(
0,
arena_size,
PROT_WRITE | PROT_READ,
MAP_SHARED | MAP_ANONYMOUS,
-1,
0
);
if (mem == 0) {
return MAIN_ERROR_MMAP;
}
struct arena arena = { .start = mem, .end = mem + arena_size };
char greeting[] = "Press ESC to exit.\n";
i64 len = write(STDOUT, greeting, sizeof(greeting) - 1);
if (len < 0) {
return MAIN_ERROR_WRITE_STDOUT;
}
struct timespec last, now;
i32 error = clock_gettime(CLOCK_MONOTONIC, &last);
if (error) {
return MAIN_ERROR_CLOCK_GETTIME;
}
i32 keyboards[32];
i32 keyboards_len = open_keyboards(
arena,
keyboards,
sizeof(keyboards) / sizeof(*keyboards)
);
if (keyboards_len <= 0) {
return MAIN_ERROR_OPEN_KEYBOARD;
}
struct input_event keyboard_events[32];
struct pollfd keyboard_pollfds[32];
for (i32 i = 0; i < keyboards_len; ++i) {
keyboard_pollfds[i].fd = keyboards[i];
keyboard_pollfds[i].events = POLLIN;
}
while (1) {
i64 events = poll(keyboard_pollfds, keyboards_len, 0);
if (events < 0) {
return MAIN_ERROR_POLL;
}
for (i32 i = 0; i < keyboards_len; ++i) {
if (keyboard_pollfds[i].revents == 0) {
continue;
}
i32 keyboard_fd = keyboard_pollfds[i].fd;
i64 len = read(
keyboard_fd,
(char *)keyboard_events,
sizeof(keyboard_events)
);
if (len < 0) {
return MAIN_ERROR_READ_KEYBOARD;
}
for (i32 j = 0; j < len / (i64)sizeof(*keyboard_events); ++j) {
struct input_event *keyboard_event = &keyboard_events[j];
if (keyboard_event->type == 1 && keyboard_event->value == 1) {
switch (keyboard_event->code) {
case KEY_ESC:
return MAIN_ERROR_NONE;
default:
continue;
}
}
}
}
i32 error = clock_gettime(CLOCK_MONOTONIC, &now);
if (error) {
return MAIN_ERROR_CLOCK_GETTIME;
}
if (time_since_ns(&now, &last) >= 1000L * 1000L * 1000L) {
last = now;
len = write(STDOUT, ".", 1);
if (len < 0) {
return MAIN_ERROR_WRITE_STDOUT;
}
}
}
return MAIN_ERROR_NONE;
}
Our program will now print a dot every second until the user presses the ESC key. Note that this program needs permission to open and acquire keyboard devices from /dev/input, so non-root users will likely need to run it with sudo ./dumb_cycle.
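The loop above tests `type == 1` and `value == 1` directly. As a minimal sketch, those magic numbers can be given names: in the Linux input event codes, type 1 is `EV_KEY`, and a key event's value is 0 for release, 1 for press, and 2 for auto-repeat. The `SKETCH_` names, the `sketch_input_event` struct, and the `is_key_press` helper are hypothetical and only mirror the layout used in this article; the typedefs are assumed to match the ones defined earlier.

```c
/* Fixed-width aliases, assumed to match the article's earlier typedefs. */
typedef unsigned short u16;
typedef int i32;
typedef long i64;

/* Hypothetical names for the numbers used in the event loop. */
enum sketch_event_type {
    SKETCH_EV_KEY = 1, /* EV_KEY: keyboard and button events */
};

enum sketch_key_value {
    SKETCH_KEY_RELEASED = 0,
    SKETCH_KEY_PRESSED = 1,
    SKETCH_KEY_REPEATED = 2,
};

enum sketch_key_code {
    SKETCH_KEY_ESC = 1,
    SKETCH_KEY_W = 17,
    SKETCH_KEY_A = 30,
    SKETCH_KEY_S = 31,
    SKETCH_KEY_D = 32,
};

/* Same layout as the article's struct input_event on x86_64:
 * a 16 byte timestamp followed by type, code, and value. */
struct sketch_input_event {
    i64 sec;
    i64 subsec;
    u16 type;
    u16 code;
    i32 value;
};

/* Returns 1 only for the initial press of `code`, ignoring releases,
 * auto-repeats, and non-key events. */
static i32 is_key_press(const struct sketch_input_event *event, u16 code) {
    return event->type == SKETCH_EV_KEY
        && event->value == SKETCH_KEY_PRESSED
        && event->code == code;
}
```

With a helper like this, the ESC check in the loop could read `if (is_key_press(keyboard_event, SKETCH_KEY_ESC))`, at the cost of one comparison per key of interest instead of a single `switch`.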
Testing graphics: a basic Linux virtual machine
The DumbCycle game will run directly on the Linux Direct Rendering Manager (DRM) driver and take over an entire screen along with the system keyboard. Because of this, it is not desirable (or generally possible) to test the game on a system that is already running a traditional X11 or Wayland window manager.
Luckily, building a bare-bones Linux system and running it in a QEMU virtual machine is fairly simple to accomplish. Testing the game on such a basic system will also help to ensure that the game only depends on the Linux kernel and the corresponding drivers.
In order to boot a Linux system to run our game we need a kernel and a method to set up /dev/ with at least /dev/dri/ and /dev/input/.
If we want to do any sort of debugging, even “printf debugging,” we will also need a shell and some command line utilities.
I have precompiled binaries available for the Linux kernel and toybox, but source is available to build each from scratch. We’ll add the following rule to our Makefile to download and extract the vm directory.
vm:
curl -o vm.tar.gz https://musing.permutationlock.com/static/vm.tar.gz
tar -xvf vm.tar.gz
After running make vm you should have a vm directory containing a kernel and an initial RAM filesystem. The filesystem contains a /bin directory set up with toybox and an init script to mount the /dev, /proc, and /sys directories and run dumb_cycle in a /bin/sh shell.
The easiest way to get graphical output from a QEMU virtual machine is to use a VNC client like TigerVNC, which provides the vncviewer binary.
We will add a make test rule to our Makefile that builds the game binary, copies it into the VM’s filesystem, runs it in the VM, and views it with vncviewer.
test: dumb_cycle vm
cp dumb_cycle vm/fs/bin/dumb_cycle
cd vm; ./mkinitfs.sh
qemu-system-x86_64 -kernel vm/vmlinuz -initrd vm/initramfs \
-vga std -no-reboot & sleep 1 && vncviewer :5900
The long road to pixels: connectors, encoders, and CRTCs
With a runtime entry point, file input and output, memory allocation, ioctl requests, keyboard input, and a virtual machine test environment, we are finally ready to look at drawing to the screen.
In order to draw to the screen on a modern Linux system we will need to use the Direct Rendering Manager (DRM) subsystem, specifically Kernel Mode Setting (KMS). The standard user space library for interacting with the DRM is libdrm. The code that we will implement in this section was designed using the libdrm C source code as an API reference.
Drawing a picture to the screen requires a DRM object called a CRTC, a legacy acronym for Cathode-Ray Tube Controller. A CRTC connects a buffer of pixel data to a display. To get a CRTC working we’ll need to open a video card (generally found in /dev/dri/), find a connector for the card, find an encoder for the connector, and finally retrieve a CRTC for the encoder.
The first step is to get the available resources for our video card. We will use /dev/dri/card0 since this will be the correct card to use for the majority of systems.
enum ioctl_type {
IOCTL_EV = (i32)'E',
IOCTL_DRM = (i32)'d',
};
enum drm_ioctl {
DRM_IOCTL_MODE_GET_RESOURCES = 0xa0,
};
struct drm_mode_resources {
u32 *fbs;
u32 *crtcs;
u32 *connectors;
u32 *encoders;
u32 fbs_len;
u32 crtcs_len;
u32 connectors_len;
u32 encoders_len;
u32 min_width;
u32 max_width;
u32 min_height;
u32 max_height;
};
static struct drm_mode_resources *drm_mode_get_resources(
struct arena *arena,
i32 fd
) {
struct drm_mode_resources prev_res;
struct drm_mode_resources *res;
struct arena temp_arena;
i32 error;
do {
temp_arena = *arena;
res = alloc(&temp_arena, sizeof(*res));
if (res == 0) {
return 0;
}
error = ioctl(
fd,
IOCTL_RDWR,
IOCTL_DRM,
DRM_IOCTL_MODE_GET_RESOURCES,
sizeof(*res),
(char *)res
);
if (error != 0) {
return 0;
}
prev_res = *res;
if (res->fbs_len > 0) {
res->fbs = alloc(&temp_arena, res->fbs_len * sizeof(*res->fbs));
if (res->fbs == 0) {
return 0;
}
}
if (res->crtcs_len > 0) {
res->crtcs = alloc(
&temp_arena,
res->crtcs_len * sizeof(*res->crtcs)
);
if (res->crtcs == 0) {
return 0;
}
}
if (res->connectors_len > 0) {
res->connectors = alloc(
&temp_arena,
res->connectors_len * sizeof(*res->connectors)
);
if (res->connectors == 0) {
return 0;
}
}
if (res->encoders_len > 0) {
res->encoders = alloc(
&temp_arena,
res->encoders_len * sizeof(*res->encoders)
);
if (res->encoders == 0) {
return 0;
}
}
error = ioctl(
fd,
IOCTL_RDWR,
IOCTL_DRM,
DRM_IOCTL_MODE_GET_RESOURCES,
sizeof(*res),
(char *)res
);
if (error != 0) {
return 0;
}
} while (
prev_res.fbs_len < res->fbs_len ||
prev_res.crtcs_len < res->crtcs_len ||
prev_res.connectors_len < res->connectors_len ||
prev_res.encoders_len < res->encoders_len
);
*arena = temp_arena;
return res;
}
enum main_error {
MAIN_ERROR_NONE = 0,
MAIN_ERROR_MMAP,
MAIN_ERROR_OPEN_CARD0,
MAIN_ERROR_DRM_GET_RESOURCES,
};
i32 main(i32 argc, char **argv) {
i64 arena_size = 2000 * 4096;
char *mem = mmap(
0,
arena_size,
PROT_WRITE | PROT_READ,
MAP_SHARED | MAP_ANONYMOUS,
-1,
0
);
if (mem == 0) {
return MAIN_ERROR_MMAP;
}
struct arena arena = { .start = mem, .end = mem + arena_size };
i32 card_fd = open("/dev/dri/card0", O_RDWR, 0);
if (card_fd < 0) {
return MAIN_ERROR_OPEN_CARD0;
}
struct drm_mode_resources *res = drm_mode_get_resources(&arena, card_fd);
if (res == 0) {
return MAIN_ERROR_DRM_GET_RESOURCES;
}
return MAIN_ERROR_NONE;
}
To get the card resources we first need to allocate arrays to store ids for frame buffers, CRTCs, connectors, and encoders. We make an initial ioctl call to discover the size of each array, then we allocate each array. Finally, we make a second ioctl call to populate the arrays with data.
Unfortunately, the array sizes may have changed between the two ioctl calls such that they no longer fit in the space we allocated. In this case the OS will not have populated the arrays, but instead will have simply modified the length entries. We'll have to discard the allocated memory and start again if this happens. Only once the second ioctl call has successfully populated the arrays do we commit the memory allocations by setting the permanent arena equal to our temp_arena.
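The query, allocate, fill, and retry pattern can be exercised without a video card. The following is a minimal sketch that replaces the kernel with a hypothetical `fake_get_ids` function whose id count grows between the first and second call, which forces exactly one retry. The `id_query` struct, `fake_get_ids`, and `query_ids` are illustrative names, not part of the DRM API, and a caller-provided array stands in for the arena.

```c
typedef unsigned int u32;
typedef int i32;

/* Hypothetical request: `ids_len` is the capacity on the way in and the
 * number of available ids on the way out. */
struct id_query {
    u32 *ids;
    u32 ids_len;
};

static u32 fake_ids[4] = { 31, 32, 33, 34 };
static u32 fake_ids_len = 2;
static u32 fake_calls = 0;

/* Stand-in for the kernel: always reports the current count, but only
 * fills the array when the caller's capacity is large enough. */
static i32 fake_get_ids(struct id_query *query) {
    fake_calls += 1;
    if (fake_calls == 2) {
        fake_ids_len = 4; /* simulate a hotplug between calls one and two */
    }
    u32 capacity = query->ids_len;
    query->ids_len = fake_ids_len;
    if (query->ids != 0 && capacity >= fake_ids_len) {
        for (u32 i = 0; i < fake_ids_len; ++i) {
            query->ids[i] = fake_ids[i];
        }
    }
    return 0;
}

/* Returns the number of ids written to `storage`, or -1 on failure. */
static i32 query_ids(u32 *storage, u32 storage_len) {
    struct id_query query;
    u32 prev_len;
    do {
        /* First call: no array, just ask how many ids exist. */
        query.ids = 0;
        query.ids_len = 0;
        if (fake_get_ids(&query) != 0) {
            return -1;
        }
        prev_len = query.ids_len;
        if (prev_len > storage_len) {
            return -1;
        }
        /* Second call: offer an array with room for `prev_len` ids. */
        query.ids = storage;
        if (fake_get_ids(&query) != 0) {
            return -1;
        }
        /* If the count grew in between, nothing was filled: start over. */
    } while (prev_len < query.ids_len);
    return (i32)query.ids_len;
}
```

Calling `query_ids` with an eight element array takes four calls to the stand-in: the first pair sees the count grow from 2 to 4 and is discarded, and the second pair succeeds.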
With our video card resources successfully retrieved, we need to find a valid connector.
enum drm_ioctl {
DRM_IOCTL_MODE_GET_RESOURCES = 0xa0,
DRM_IOCTL_MODE_GET_CONNECTOR = 0xa7,
};
enum drm_mode {
DRM_MODE_CONNECTED = 1,
};
struct drm_mode_modeinfo {
u32 clock;
u16 hdisplay;
u16 hsync_start;
u16 hsync_end;
u16 htotal;
u16 hskew;
u16 vdisplay;
u16 vsync_start;
u16 vsync_end;
u16 vtotal;
u16 vscan;
u32 vrefresh;
u32 flags;
u32 type;
char name[32];
};
struct drm_mode_connector {
u32 *encoders;
struct drm_mode_modeinfo *modes;
u32 *props;
u64 *prop_values;
u32 modes_len;
u32 props_len;
u32 encoders_len;
u32 encoder_id;
u32 connector_id;
u32 connector_type;
u32 connector_type_id;
u32 connection;
u32 mm_width;
u32 mm_height;
u32 subpixel;
u32 pad;
};
static struct drm_mode_connector *drm_mode_get_connector(
struct arena *arena,
i32 fd,
u32 connector_id
) {
struct drm_mode_connector prev_conn;
struct drm_mode_connector *conn;
struct arena temp_arena;
i32 error;
do {
temp_arena = *arena;
conn = alloc(&temp_arena, sizeof(*conn));
if (conn == 0) {
return 0;
}
conn->connector_id = connector_id;
error = ioctl(
fd,
IOCTL_RDWR,
IOCTL_DRM,
DRM_IOCTL_MODE_GET_CONNECTOR,
sizeof(*conn),
(char *)conn
);
if (error != 0) {
return 0;
}
prev_conn = *conn;
if (conn->props_len > 0) {
conn->props = alloc(
&temp_arena,
conn->props_len * sizeof(*conn->props)
);
conn->prop_values = alloc(
&temp_arena,
conn->props_len * sizeof(*conn->prop_values)
);
if (conn->props == 0 || conn->prop_values == 0) {
return 0;
}
}
if (conn->modes_len > 0) {
conn->modes = alloc(
&temp_arena,
conn->modes_len * sizeof(*conn->modes)
);
if (conn->modes == 0) {
return 0;
}
}
if (conn->encoders_len > 0) {
conn->encoders = alloc(
&temp_arena,
conn->encoders_len * sizeof(*conn->encoders)
);
if (conn->encoders == 0) {
return 0;
}
}
error = ioctl(
fd,
IOCTL_RDWR,
IOCTL_DRM,
DRM_IOCTL_MODE_GET_CONNECTOR,
sizeof(*conn),
(char *)conn
);
if (error != 0) {
return 0;
}
} while (
prev_conn.props_len < conn->props_len ||
prev_conn.modes_len < conn->modes_len ||
prev_conn.encoders_len < conn->encoders_len
);
*arena = temp_arena;
return conn;
}
The drm_mode_get_connector function works in an almost identical way to drm_mode_get_resources.
To find a connector to draw to we’ll need to loop over the available connectors until we hit one that is connected and has at least one valid display mode.
enum main_error {
// ...
MAIN_ERROR_DRM_FIND_CONNECTOR,
};
u32 conn_index;
struct drm_mode_connector *conn = 0;
for (conn_index = 0; conn_index < res->connectors_len; ++conn_index) {
conn = drm_mode_get_connector(
&arena,
card_fd,
res->connectors[conn_index]
);
if (conn == 0) {
continue;
}
if (conn->connection == DRM_MODE_CONNECTED && conn->modes_len != 0) {
break;
}
}
if (conn_index == res->connectors_len || conn == 0) {
return MAIN_ERROR_DRM_FIND_CONNECTOR;
}
We can retrieve an encoder in a similar manner.
enum drm_ioctl {
DRM_IOCTL_MODE_GET_RESOURCES = 0xa0,
DRM_IOCTL_MODE_GET_CONNECTOR = 0xa7,
DRM_IOCTL_MODE_GET_ENCODER = 0xa6,
};
struct drm_mode_encoder {
u32 encoder_id;
u32 encoder_type;
u32 crtc_id;
u32 possible_crtcs;
u32 possible_clones;
};
static struct drm_mode_encoder *drm_mode_get_encoder(
struct arena *arena,
i32 fd,
u32 encoder_id
) {
struct arena temp_arena = *arena;
struct drm_mode_encoder *enc = 0;
enc = alloc(&temp_arena, sizeof(*enc));
if (enc == 0) {
return 0;
}
enc->encoder_id = encoder_id;
i32 error = ioctl(
fd,
IOCTL_RDWR,
IOCTL_DRM,
DRM_IOCTL_MODE_GET_ENCODER,
sizeof(*enc),
(char *)enc
);
if (error != 0) {
return 0;
}
*arena = temp_arena;
return enc;
}
enum main_error {
// ...
MAIN_ERROR_DRM_GET_ENCODER,
};
struct drm_mode_encoder *enc = drm_mode_get_encoder(
&arena,
card_fd,
conn->encoder_id
);
if (enc == 0) {
return MAIN_ERROR_DRM_GET_ENCODER;
}
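Each getter above passes `sizeof` of its request struct to the ioctl wrapper, so the struct layouts have to match the kernel's exactly. As a rough sanity check, and assuming the wrapper from earlier in the article packs its arguments using the standard Linux request layout (direction in the top two bits, then a 14 bit size, an 8 bit type, and an 8 bit number), the encoder request should come out to `0xC01464A6`. The `ioc_request` helper, the `SKETCH_` names, and the `sketch_encoder` copy of the struct are hypothetical.

```c
typedef unsigned int u32;

/* Direction bits of a Linux ioctl request number on x86_64. */
enum sketch_ioc_dir {
    SKETCH_IOC_WRITE = 1,
    SKETCH_IOC_READ = 2,
};

/* Copy of the article's struct drm_mode_encoder, renamed for the sketch. */
struct sketch_encoder {
    u32 encoder_id;
    u32 encoder_type;
    u32 crtc_id;
    u32 possible_crtcs;
    u32 possible_clones;
};

/* Packs direction, size, type, and number into one request value. */
static u32 ioc_request(u32 dir, u32 type, u32 nr, u32 size) {
    return (dir << 30) | (size << 16) | (type << 8) | nr;
}
```

For example, `ioc_request(SKETCH_IOC_READ | SKETCH_IOC_WRITE, 'd', 0xa6, sizeof(struct sketch_encoder))` encodes a 20 byte read-write request of type `'d'` and number `0xa6`. A struct with a missing or mis-sized field would change the size bits and therefore produce a request number the kernel does not recognize.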
In order to draw pixels to the screen we will be creating a “dumb buffer” that contains an array of 32-bit pixels sized according to the resolution of the display mode.
Dumb buffers are so named because they ask the driver to naively copy pixel data from CPU memory to the display and do not take advantage of graphics acceleration hardware. Drawing to the screen with GPU hardware requires specific ioctl APIs that differ significantly depending on the type of graphics card used. Such GPU APIs involve a lot more work to set up than the simple pixel arrays that we will use.
enum drm_ioctl {
DRM_IOCTL_MODE_GET_RESOURCES = 0xa0,
DRM_IOCTL_MODE_GET_CONNECTOR = 0xa7,
DRM_IOCTL_MODE_GET_ENCODER = 0xa6,
DRM_IOCTL_MODE_ADD_FB = 0xae,
DRM_IOCTL_MODE_CREATE_DUMB = 0xb2,
DRM_IOCTL_MODE_MAP_DUMB = 0xb3,
};
struct drm_mode_create_dumb {
u32 height;
u32 width;
u32 bpp;
u32 flags;
u32 handle;
u32 pitch;
u64 size;
};
struct drm_mode_map_dumb {
u32 handle;
u32 pad;
i64 offset;
};
struct drm_mode_fb_cmd {
u32 fb_id;
u32 width;
u32 height;
u32 pitch;
u32 bpp;
u32 depth;
u32 handle;
};
struct drm_mode_dumb_buffer {
u32 width;
u32 height;
u32 stride;
u32 handle;
u32 fb_id;
u32 *map;
u64 size;
};
static struct drm_mode_dumb_buffer *drm_mode_create_dumb_buffer(
struct arena *arena,
i32 fd,
u32 width,
u32 height
) {
struct drm_mode_create_dumb creq = {
.width = width,
.height = height,
.bpp = 32,
};
i32 error = ioctl(
fd,
IOCTL_RDWR,
IOCTL_DRM,
DRM_IOCTL_MODE_CREATE_DUMB,
sizeof(creq),
(char *)&creq
);
if (error != 0) {
return 0;
}
struct drm_mode_fb_cmd fb_cmd = {
.width = width,
.height = height,
.pitch = creq.pitch,
.bpp = 32,
.depth = 24,
.handle = creq.handle,
};
error = ioctl(
fd,
IOCTL_RDWR,
IOCTL_DRM,
DRM_IOCTL_MODE_ADD_FB,
sizeof(fb_cmd),
(char *)&fb_cmd
);
if (error != 0) {
return 0;
}
struct drm_mode_map_dumb mreq = { .handle = creq.handle };
error = ioctl(
fd,
IOCTL_RDWR,
IOCTL_DRM,
DRM_IOCTL_MODE_MAP_DUMB,
sizeof(mreq),
(char *)&mreq
);
if (error != 0) {
return 0;
}
u32 *mem = mmap(
0,
(i64)creq.size,
PROT_READ | PROT_WRITE,
MAP_SHARED,
fd,
mreq.offset
);
if (mem == 0) {
return 0;
}
struct drm_mode_dumb_buffer *buf = alloc(arena, sizeof(*buf));
if (buf == 0) {
return 0;
}
buf->width = width;
buf->height = height;
buf->stride = creq.pitch / sizeof(u32);
buf->size = creq.size / sizeof(u32);
buf->handle = creq.handle;
buf->map = mem;
buf->fb_id = fb_cmd.fb_id;
for (u64 i = 0; i < buf->size; ++i) {
buf->map[i] = 0;
}
return buf;
}
enum main_error {
// ...
MAIN_ERROR_DRM_CREATE_DUMB_BUFFER,
};
struct drm_mode_dumb_buffer *buf = drm_mode_create_dumb_buffer(
&arena,
card_fd,
conn->modes[0].hdisplay,
conn->modes[0].vdisplay
);
if (buf == 0) {
return MAIN_ERROR_DRM_CREATE_DUMB_BUFFER;
}
The last step before we can see pixels on the screen is to get a CRTC and attach the dumb frame buffer that we just created.
enum drm_ioctl {
DRM_IOCTL_MODE_GET_RESOURCES = 0xa0,
DRM_IOCTL_MODE_GET_CONNECTOR = 0xa7,
DRM_IOCTL_MODE_GET_ENCODER = 0xa6,
DRM_IOCTL_MODE_ADD_FB = 0xae,
DRM_IOCTL_MODE_CREATE_DUMB = 0xb2,
DRM_IOCTL_MODE_MAP_DUMB = 0xb3,
DRM_IOCTL_MODE_GET_CRTC = 0xa1,
DRM_IOCTL_MODE_SET_CRTC = 0xa2,
};
struct drm_mode_crtc {
u32 *set_connectors;
u32 connectors_len;
u32 crtc_id;
u32 fb_id;
u32 x;
u32 y;
u32 gamma_size;
u32 mode_valid;
struct drm_mode_modeinfo mode;
};
static struct drm_mode_crtc *drm_mode_get_crtc(
struct arena *arena,
i32 fd,
u32 crtc_id
) {
struct arena temp_arena = *arena;
struct drm_mode_crtc *crtc = 0;
crtc = alloc(&temp_arena, sizeof(*crtc));
if (crtc == 0) {
return 0;
}
crtc->crtc_id = crtc_id;
i32 error = ioctl(
fd,
IOCTL_RDWR,
IOCTL_DRM,
DRM_IOCTL_MODE_GET_CRTC,
sizeof(*crtc),
(char *)crtc
);
if (error != 0) {
return 0;
}
*arena = temp_arena;
return crtc;
}
static i32 drm_mode_set_crtc(
i32 fd,
struct drm_mode_crtc *crtc,
u32 *connectors,
u32 connectors_len,
u32 fb_id
) {
crtc->set_connectors = connectors;
crtc->connectors_len = connectors_len;
crtc->fb_id = fb_id;
return ioctl(
fd,
IOCTL_RDWR,
IOCTL_DRM,
DRM_IOCTL_MODE_SET_CRTC,
sizeof(*crtc),
(char *)crtc
);
}
enum main_error {
// ...
MAIN_ERROR_DRM_GET_CRTC,
MAIN_ERROR_DRM_SET_CRTC,
};
struct drm_mode_crtc *crtc = drm_mode_get_crtc(
&arena,
card_fd,
enc->crtc_id
);
if (crtc == 0) {
return MAIN_ERROR_DRM_GET_CRTC;
}
crtc->mode = conn->modes[0];
// the kernel only applies the mode when mode_valid is nonzero
crtc->mode_valid = 1;
i32 error = drm_mode_set_crtc(
card_fd,
crtc,
&conn->connector_id,
1,
buf->fb_id
);
if (error != 0) {
return MAIN_ERROR_DRM_SET_CRTC;
}
Finally, we can add our keyboard event loop back in and insert code to write color values to the dumb buffer pixels.
struct timespec last, now;
error = clock_gettime(CLOCK_MONOTONIC, &last);
if (error) {
return MAIN_ERROR_CLOCK_GETTIME;
}
i32 keyboards[32];
i32 keyboards_len = open_keyboards(
arena,
keyboards,
sizeof(keyboards) / sizeof(*keyboards)
);
if (keyboards_len <= 0) {
return MAIN_ERROR_OPEN_KEYBOARD;
}
struct input_event keyboard_events[32];
struct pollfd keyboard_pollfds[32];
for (i32 i = 0; i < keyboards_len; ++i) {
keyboard_pollfds[i].fd = keyboards[i];
keyboard_pollfds[i].events = POLLIN;
}
u32 color = 0;
while (1) {
i64 events = poll(keyboard_pollfds, keyboards_len, 0);
if (events < 0) {
return MAIN_ERROR_POLL;
}
for (i32 i = 0; i < keyboards_len; ++i) {
if (keyboard_pollfds[i].revents == 0) {
continue;
}
i32 keyboard_fd = keyboard_pollfds[i].fd;
i64 len = read(
keyboard_fd,
(char *)keyboard_events,
sizeof(keyboard_events)
);
if (len < 0) {
return MAIN_ERROR_READ_KEYBOARD;
}
for (i32 j = 0; j < len / (i64)sizeof(*keyboard_events); ++j) {
struct input_event *keyboard_event = &keyboard_events[j];
if (keyboard_event->type == 1 && keyboard_event->value == 1) {
switch (keyboard_event->code) {
case KEY_ESC:
return MAIN_ERROR_NONE;
default:
continue;
}
}
}
}
i32 error = clock_gettime(CLOCK_MONOTONIC, &now);
if (error) {
return MAIN_ERROR_CLOCK_GETTIME;
}
if (time_since_ns(&now, &last) >= 100L * 1000L * 1000L) {
last = now;
color += 5;
for (u32 i = 0; i < buf->size; ++i) {
buf->map[i] = color;
}
}
}
The program should fill the screen with a color that slowly changes from black to blue and exit when the user presses ESC.
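The color moves toward blue because a frame buffer added with 32 bits per pixel and a depth of 24 is treated as XRGB8888: the low byte of each `u32` is blue, the next is green, then red, and the top byte is unused. A minimal sketch of packing a color and locating a pixel follows; `pack_xrgb8888` and `pixel_index` are hypothetical helpers, and `stride` is the row length in pixels, as stored in the article's `drm_mode_dumb_buffer`.

```c
typedef unsigned int u32;
typedef unsigned char u8;

/* Packs 8 bit channels into one XRGB8888 pixel (top byte left as zero). */
static u32 pack_xrgb8888(u8 red, u8 green, u8 blue) {
    return ((u32)red << 16) | ((u32)green << 8) | (u32)blue;
}

/* Index of pixel (x, y) in a buffer whose rows are `stride` pixels long.
 * The stride can be larger than the visible width when rows are padded. */
static u32 pixel_index(u32 stride, u32 x, u32 y) {
    return y * stride + x;
}
```

Since the loop adds 5 to a plain counter, the value stays in the blue byte only until it passes 255, after which it carries into the green byte. Writing `pack_xrgb8888(0, 0, level)` with a separate `level` counter would be one way to keep the color purely blue.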
Gaming at last: drawing the game state
We are going to create a game that loosely resembles Snake or the LightCycle game from Tron. The player will constantly be moving in one of the four cardinal directions, leaving a trail behind them. If they ever hit the trail or if they hit the edge of the screen, then the game is over. The direction of movement can be controlled by pressing the W, A, S, D keys. As with all of the programs above, the game will exit if the user presses ESC.
We don’t know the resolution of the screen that our game will run on, so we will use a 90x90 grid of squares for internal game logic. Each grid square will store 0 to indicate that it is empty or 1 to indicate that the player has left a trail there.
The game state will update 30 times per second and on each update the player will move 1 square in the direction of their current velocity. The game will also render to the screen each update. The largest integer scale factor at which the 90x90 grid still fits inside the screen resolution will be computed and then a scaled version of the game board will be drawn into the dumb buffer memory.
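To make the scaling concrete, here is a small sketch of the layout arithmetic as a standalone function. The `board_layout` struct and `compute_layout` function are hypothetical names; the formulas are the same ones used in the main function of this section.

```c
typedef unsigned int u32;

/* Scale factor and top-left corner of the board, in pixels. */
struct board_layout {
    u32 scale;
    u32 x;
    u32 y;
};

static struct board_layout compute_layout(u32 width, u32 height, u32 grid) {
    /* The board is square, so the shorter screen side limits its size. */
    u32 square_len = (height > width) ? width : height;
    u32 scale = square_len / grid;
    /* Drop the leftover pixels that don't fill a whole grid square. */
    u32 board_size = square_len - (square_len % grid);
    struct board_layout layout = {
        .scale = scale,
        .x = (width / 2) - (board_size / 2),
        .y = (height / 2) - (board_size / 2),
    };
    return layout;
}
```

On a 1920x1080 display, `compute_layout(1920, 1080, 90)` gives a scale of 12 with the board starting at x = 420 and y = 0. On a 1366x768 display the scale is 8, leaving a 24 pixel margin above and below the board.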
enum color {
COLOR_BLUE = 0x0000ff,
COLOR_GRAY = 0xededed
};
struct game_state {
i32 x;
i32 y;
i32 vx;
i32 vy;
i32 dead;
char board[90 * 90];
};
static void update_game(struct game_state *state) {
state->board[state->y * 90 + state->x] = 1;
state->y += state->vy;
state->x += state->vx;
if (
state->board[state->y * 90 + state->x] != 0 ||
state->x == 89 ||
state->x == 0 ||
state->y == 89 ||
state->y == 0
) {
state->dead = 1;
}
state->board[state->y * 90 + state->x] = 1;
}
static void draw_game(
struct drm_mode_dumb_buffer *buf,
struct game_state *state,
u32 x,
u32 y,
u32 scale
) {
for (u32 i = 0; i < 90; ++i) {
for (u32 yoff = 0; yoff < scale; ++yoff) {
u32 cy = y + i * scale + yoff;
for (u32 j = 0; j < 90; ++j) {
for (u32 xoff = 0; xoff < scale; ++xoff) {
u32 cx = x + j * scale + xoff;
u32 pixel_index = cy * buf->stride + cx;
if (state->board[i * 90 + j] == 0) {
buf->map[pixel_index] = (u32)COLOR_GRAY;
} else {
buf->map[pixel_index] = (u32)COLOR_BLUE;
}
}
}
}
}
}
enum main_error {
// ...
MAIN_ERROR_OPEN_KEYBOARD,
MAIN_ERROR_READ_KEYBOARD,
MAIN_ERROR_CLOCK_GETTIME,
MAIN_ERROR_POLL,
};
u32 square_len = (buf->height > buf->width) ? buf->width : buf->height;
u32 scale = square_len / 90;
u32 board_size = square_len - (square_len % 90);
u32 board_x = (buf->width / 2) - (board_size / 2);
u32 board_y = (buf->height / 2) - (board_size / 2);
struct game_state game_state;
clear_game(&game_state);
while (1) {
i64 events = poll(keyboard_pollfds, keyboards_len, 0);
if (events < 0) {
return MAIN_ERROR_POLL;
}
for (i32 i = 0; i < keyboards_len; ++i) {
if (keyboard_pollfds[i].revents == 0) {
continue;
}
i32 keyboard_fd = keyboard_pollfds[i].fd;
i64 len = read(
keyboard_fd,
(char *)keyboard_events,
sizeof(keyboard_events)
);
if (len < 0) {
return MAIN_ERROR_READ_KEYBOARD;
}
for (i32 j = 0; j < len / (i64)sizeof(*keyboard_events); ++j) {
struct input_event *keyboard_event = &keyboard_events[j];
if (keyboard_event->type != 1 || keyboard_event->value != 1) {
continue;
}
switch (keyboard_event->code) {
case KEY_ESC:
return MAIN_ERROR_NONE;
case KEY_A:
if (game_state.vx <= 0) {
game_state.vx = -1;
game_state.vy = 0;
}
break;
case KEY_D:
if (game_state.vx >= 0) {
game_state.vx = 1;
game_state.vy = 0;
}
break;
case KEY_W:
if (game_state.vy <= 0) {
game_state.vx = 0;
game_state.vy = -1;
}
break;
case KEY_S:
if (game_state.vy >= 0) {
game_state.vx = 0;
game_state.vy = 1;
}
break;
default:
continue;
}
}
}
error = clock_gettime(CLOCK_MONOTONIC, &now);
if (error != 0) {
return MAIN_ERROR_CLOCK_GETTIME;
}
if (time_since_ns(&now, &last) >= 33L * 1000L * 1000L) {
last = now;
update_game(&game_state);
draw_game(buf, &game_state, board_x, board_y, scale);
if (game_state.dead) {
clear_game(&game_state);
}
}
}
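The loop above calls clear_game, which is not listed in this section. A minimal sketch of what such a reset could look like is shown below; the starting position in the middle of the board and the initial rightward velocity are assumptions for illustration, not necessarily the values used in the finished game. The struct is repeated so that the sketch stands alone.

```c
typedef int i32;

struct game_state {
    i32 x;
    i32 y;
    i32 vx;
    i32 vy;
    i32 dead;
    char board[90 * 90];
};

/* Resets the board and the player. The start position (45, 45) and the
 * initial velocity (1, 0) are assumed values chosen for this sketch. */
static void clear_game(struct game_state *state) {
    for (i32 i = 0; i < 90 * 90; ++i) {
        state->board[i] = 0;
    }
    state->x = 45;
    state->y = 45;
    state->vx = 1;
    state->vy = 0;
    state->dead = 0;
}
```

Starting away from the border matters because update_game indexes the board before checking the edges: the player is marked dead on reaching row or column 0 or 89, so a reset position strictly inside that range keeps every board access in bounds.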
Enjoy DumbCycling!
Avoiding partial updates: double buffering
It’s good practice to draw game updates into a buffer that isn’t currently being displayed to the screen so that a partial screen state is never visible. To accomplish this we simply need to create a second dumb frame buffer and swap the buffers every frame by calling drm_mode_set_crtc.
u32 buf_index = 0;
struct drm_mode_dumb_buffer *bufs[2];
bufs[0] = drm_mode_create_dumb_buffer(
&arena, card_fd, conn->modes[0].hdisplay, conn->modes[0].vdisplay
);
bufs[1] = drm_mode_create_dumb_buffer(
&arena, card_fd, conn->modes[0].hdisplay, conn->modes[0].vdisplay
);
if (bufs[0] == 0 || bufs[1] == 0) {
return MAIN_ERROR_DRM_CREATE_DUMB_BUFFER;
}
i32 error = drm_mode_set_crtc(
card_fd,
crtc,
&conn->connector_id,
1,
bufs[buf_index]->fb_id
);
if (error != 0) {
return MAIN_ERROR_DRM_SET_CRTC;
}
buf_index ^= 1;
u32 width = bufs[0]->width;
u32 height = bufs[0]->height;
u32 square_len = (height > width) ? width : height;
u32 scale = square_len / 90;
u32 board_size = square_len - (square_len % 90);
u32 board_x = (width / 2) - (board_size / 2);
u32 board_y = (height / 2) - (board_size / 2);
if (time_since_ns(&now, &last) >= 33L * 1000L * 1000L) {
last = now;
update_game(&game_state);
draw_game(bufs[buf_index], &game_state, board_x, board_y, scale);
if (game_state.dead) {
clear_game(&game_state);
}
error = drm_mode_set_crtc(
card_fd,
crtc,
&conn->connector_id,
1,
bufs[buf_index]->fb_id
);
if (error != 0) {
return MAIN_ERROR_DRM_SET_CRTC;
}
buf_index ^= 1;
}
Now you can enjoy DumbCycling without screen tears!