My thoughts on writing a Minecraft server from scratch (in Bash)

For the past year or so, I've been thinking about writing a Minecraft server in Bash as a thought excercise. I once tried that before with the Classic protocol (the one from 2009), but I quickly realized there wasn't really a way to properly parse binary data in bash. Take the following code sample:

function a() { read -n 2 uwu echo "$uwu" | xxd }

This would read two bytes into a variable, and then pass them to `xxd`, which should show the hexdump of the data.

Picture 1 - bash's lack of support for nullbytes

Everything's great, until we pass a nullbyte (0x00). Not only does Bash ignore nullbytes in strings, it also doesn't present any way to detect that a nullbyte has occured. Considering that the protocol I'm trying to implement is strictly binary, this can severely mangle the data.




One rainy evening in late January, I've had a revelation. What if I reversed the order of that function? If the binary data never reaches a variable (or, more precisely, a substitution), and just stays inside a pipe, can it pass nullbytes around?

Picture 2 - reading nullbytes with xxd

The answer is yes! After some iterations, I decided to use `dd` passed to `xxd` instead of just `xxd`, because this way I can finetune how many bytes to read.

# the $len variable is assigned earlier, basing on a similar read function a=$(dd count=$len bs=1 status=none | xxd -p)

This gave me a hex string, on which I could do pattern matching, pattern replace, data extraction... and more. Sending out responses could be done analogically, using xxd's Reverse switch.

`ncat` is used for listening on Minecraft's default TCP port. It launches the main shell script (`mc.sh`) after it receives an incoming connection.

The Protocol Is Not Really Good, Actually

Note: the following section contains mostly my ramblings about implementing number conversion routines in Bash; If this does not interest you, feel free to skip it.

The first thing one should implement for a Minecraft server to function would be the Server List Ping packet - not because it's required (heck, your server can just not reply to it properly, and you'd still be able to join the game), but because it's the easiest to tackle first. It helps to familiarize yourself with core protocol concepts, such as data types types:

VarInts and VarLongs

Most data types were trivial to implement, but some gave me more of a fight than others - notably the IEEE754 floating point numbers (more on them later), and so-called VarInt/VarLong numbers. Those may be familar to those acquainted with the MQTT protocol, as they're just a modified version of the LEB128 encoding.

LEB128 is a compression scheme for integers. By splitting a byte into 1 signalling bit and 7 data bits, the scheme stores the number length. If the 1st bit is 0, then this byte is the last one; else, then there's another byte after this one. Great scheme if most of your numbers are either between 0 and 127 or 256 and 16383, otherwise it's `buy one byte, get one free` situation, because numbers that would otherwise fit in a byte get pushed out to the next one by a single bit.

Picture 3 - explanation of basic LEB128 in a drawing form; red bits are signalling bits, green bits are data bits. Input value is 0xFF (256), output value is 0xFF01 # from src/int.sh # int2varint(int) function int2varint() { local a local b local c local out out=$(printf '%02x' "$1") if [[ $1 -lt 128 ]]; then : elif [[ $1 -lt 16384 ]]; then a=$(($1%128)) b=$(($1/128)) out=$(printf "%02x" $((a+128)))$(printf "%02x" $b) elif [[ $1 -lt $((128*128*128)) ]]; then a=$(($1%128)) c=$((($1/128)%128)) b=$(($1/16384)) out=$(printf "%02x" $((a+128)))$(printf "%02x" $((c+128)))$(printf "%02x" $b) fi echo -n "$out" }

I've had problems translating the reference implementation to Bash, so instead I played with the protocol enough to write my own from scratch. I figured out that it was basically a modulo and a division in a trenchcoat, which I used to my advantage in the code snippet above.

I took a more contemporary approach on the decoder, using an AND, and then multiplying the result - similarly to how the reference did it.

LEB128 definitely wasn't the hardest or the most annoying to implement (that one goes to IEEE754 floating point); I still don't like how it is sprinkled in random places inside the protocol, interleaved with regular ints (and longs), and in some cases even signed shorts.

IEEE 754 Floating Point numbers

I'm not a math person. When I see the exponential notation spewed out by Python, I scream and run. This may be the main cause of why I hated implementing these floating point converters. I won't be going too deep into specifics of how this format works - instead, I recommend you check out this wikipedia page.

The basic implementation requires a loop, inside of which there's a negative power applied to the result; Bash doesn't natively support negative powers, which sent me on a trip to find a utility that does.

A suggestion I found while duckduckgoing was to use perl, but I consider that cheating. Alternatively, tried using `bc`, but it seems that either it doesn't support powers at all, or the busybox version does not. Bummer.

When I was about to give up, I got reminded that Kate once made a plot program in awk. Surely, awk has powers? ~~Maybe even super cow powers?~~ It turns out that it does!

$ echo '' | awk '{print (2**-1)}' 0.5

With this knowledge, I scribbled a working implementation and attached it to data decoded from the Player Move packet. In a trial run, the client sent around 50-100 packets like that, each one with three doubles (X, Y, Z). It turned out that the conversion function was so slow, that the server wasn't done with that workload after multiple minutes - something rather unacceptable for a real-time game.

The easiest solution to lowering the response time would be lowering the amount of calls to external binaries, such as awk. As most of my workload was already inside a bash `for` loop, I just moved the loop inside `awk`, which has saved me literally tens of calls to awk.

# (...) asdf=$(cut -c 13- <<< $val | sed -E 's/./,&/g;s/,//' | tr -d 'n' | awk -F , '{ power_count=-1 x=0; for(i=1; i<=NF; i++) { x=(x + ($i * (2 ** power_count))) power_count=power_count-1; } print(x+1) }') # (...)

The conversion is still quite slow (it takes ~10ms on my Xeon E5-2680v2), but this is to be expected with bash scripts. For a cheap comparsion, the old version took around 350ms, but I don't have solid measurements to prove that. ~~still, 35x faster, woo!~~

"Position" data type

Finally, something made up by Mojang themselves! Position is a 64-bit Long value, where three coordinates are stored alongside each other: X gets 26 most significant bits, Z gets 26 middle bits, Y gets 12 least significant bits. I'm not the biggest fan of weird data types like this one, but it was crazy easy to implement in Bash, because it has all the needed bitshift operators.

The worst part about this data type is that it doesn't actually get used much. Around half of the packets store X, Y and Z coordinates as separate Double vaules. This means that:

I kinda see the reasoning as to why it's like that, but I still don't like the current state. Normal server communication uses zlib anyways, and you realistically won't ever need more than two (or maybe three max) digits of decimal precision to describe a position within a block.

Named Binary Tag

Lastly, there's the NBT format, also an internal thing made by Mojang Hatsune Miku herself. NBT is like JSON, but for binary data. Not unlike JSON, it gets abused to store arbitrary data beyond spec - for example, Mojang stores bitstreams of variable length as an array of Longs; if such array isn't long-aligned, or even byte-aligned, the last Long is padded with zeroes.

At one point I've had a NBT parser implementation implemented almost fully, but I decided it was not worth the hassle to finish it. The code is currently lost, due to my extensive use of tmpfs as a project directory, and a system crash.

Writing the actual server

With all the math out of the way, here comes the *actually fun* part. I documented some of my adventures on Twitter, but that thread was merely a glimpse on the actual development process. Also, let's assume that we already have the Server Ping packet out of our way, and it's a matter of actually making the game joinable now.

To allow a client to join your server, it has to complete the handshake process and send a few extra packets (chunks, player position, inventory, join game). Two biggest obstacles on that course were the Join Game packet, and the data structure within the Chunk packets.

Join game

The join game sends some initial metadata: player's entity ID, gamemode, some information about the world and, since ~1.16, a "Dimension codec". This is a NBT Compound, containing the following fields:

Picture 4 - What if a client gets an empty NBT Compound? It helpfully dumps out all the required values

The Dimension codec part was a major pain to implement. For my purposes, I decided to retrieve that NBT field from a vanilla server. It's the only binary blob in this implementation, and while it could be reimplemented, I don't see any reason to reimplement something that I don't necessarily need (or want) to customize.

Chunk Data And Update Light

At first glance, this packet looks huge and scary! If you have the table from the link above open side by side with this article, I invite you to imagine that every one of those huge BitSet fields is actually just `0x00`, and that you don't need to send the Block Entity field at all. This leaves us with X, Y, heightmaps (which are fancily encoded repetitions of `b000000010`, and could be virtually anything), and the ominous Data field. Less scary, right?

What's the Data?

The Data field is actually an array of Chunk Section. A Chunk Section is 16 by 16 by 16 blocks, and multiple sections can be stacked together to create a Chunk. For our purposes, this array only has a single element, just to simplify the code a bit.

A Chunk Section contains a block count, a block states container, and a biome container. Both of these containers use palette structure to encode possible block values - this means that before the real block data, server has to define a mapping from the "local" block IDs, to "global" block IDs. This aims to squish as much data as possible inside a small space - a block definition can be as small as 4 bits.

As in my opinion the wiki page doesn't explain it well enough to quickly comprehend, here's another drawing: