Embedding Binaries in C(++)
Five ways to bake any file as a buffer into your executable

When programming, it can be desirable to embed the data of arbitrary binary files directly in the final executable of your application. This is great to:
- Provide fallbacks for fonts/textures/… in case files can't be accessed
- Simplify code and making it more robust (by omitting file I/O)
- Distribute self-contained executables
Here is a list of some of the ways you can achieve this (comfortably) in C and C++.
Table of Contents
1. Convert to C Code
By converting a binary file to to a properly formatted char
array, we can simply
include the resulting code in the source code. Programs to do this are xxd
,
convert
or bin2h
.
Example
In the case of xxd
, running xxd -i file.ext
will output C code:
unsigned char file_ext[] = { 0x58, 0x61, 0x59, 0x62, 0x58, 0x37, 0x70, 0x78, 0x32, 0x4e, 0x35, 0x70, 0x41, 0x59, 0x56, 0x39, 0x0a, /* ... */ }; unsigned int file_ext_len = 1234;
If you want to have e.g. a folder of binary files always be at the ready for embedding, you could include the following in your build script:
# generate a C char array for all files in the "res" folder for i in $(ls "res") do xxd -i "res/${i}" | sed -e 1d -e '$d' | sed -e '$d' > "inc/${i}" done
By using some sed
operations and by keeping the generated file under the same
name as the binary, we can #include
the file in a way that reflects my inclusion
of glsl shaders in source code:
unsigned char buffer[] = { #include "texture.png" };
Pros & Cons
- ✅ Should work everywhere
- ✅ Binaries can be included with their original name
- ✅ Buffer as a real array (i.e.
sizeof(buffer)
works) - ❌ Adds a build dependency
- ❌ Adds a precompilation step
- ❌ Generated files are larger in size than the binaries
- ❌ Slows down build times
Improving build times with strliteral
Parsing hex-formatted char arrays from a program like xxd
like above can be slow
for large binaries. It turns out that parsing string literals containing escaped
byte values is much faster1, which is the approach of the tool strliteral.
Since it's contained in a single C file, you can also trivially compile it as part of your build system, which eliminates the disadvantage of an external build dependency.
2. Use the linker
Instead of getting the binary into a compilable format, we can go one step further and "compile" it directly. The output is an object file with predefined symbols that can be linked against.
Programs (or linkers) that can do this are ld
, objcopy
or bin2coff
, bin2obj
on
Windows.
Example
The basic usage with ld
:
ld -r -b binary data.bin -o data.o clang -o main main.c data.o
Can be generalized in your build script like this:
OBJECT_FILES=() for i in $(ls "res") do cd "res" OBJECT_FILES+="${i%.*}.o " ld -r -b binary "${i}" -o "../${i%.*}.o" # OR # objcopy --input binary --output elf64-x86-64 "${i}" "../${i%.*}.o" cd ".." done clang -o main main.c ${OBJECT_FILES[*]}
This will generate symbols in the .o
file that can be accessed like this:
extern const unsigned char _binary_file_ext_start[]; extern const unsigned char _binary_file_ext_end[]; extern const unsigned char _binary_file_ext_size; // NOTE: access with (size_t)&_binary_file_ext_size
Both ld
and objcopy
do not include a way to change these symbol names when
generating the object files, so to make usage in your code a bit more
comfortable, you can define some macros to help you:
#define BINARY_INCLUDE(file, ext) \ extern const unsigned char _binary_##file##_##ext##_start[]; \ extern const unsigned char _binary_##file##_##ext##_end[] #define BINARY_BUFFER(file, ext) _binary_##file##_##ext##_start #define BINARY_BUFFER_SIZE(file, ext) _binary_##file##_##ext##_end - _binary_##file##_##ext##_start
Which makes usage look like this:
BINARY_INCLUDE(data, bin); // filename & ext separated by a comma without quotes int main() { unsigned char* my_buffer = BINARY_BUFFER(data, bin); unsigned int my_buffer_size = BINARY_BUFFER_SIZE(data, bin); }
Pros & Cons
- ✅ No added build dependency (since we already depended on having linker)
- ✅ Faster build times than first option
- ✅ Can specify different types (not just char)
- ✅ Smaller filesizes compared to first option
- ✅ Can be cross-platform…
- ❌ …but may require a different tool for each platform
- ❌ Adds a precompilation step (and arguably more complex than first option)
- ❌ Memory always
const
(i.e. needs amemcpy
to mutate it) - ❌ No real array, just a pointer and size (i.e.
sizeof(buffer)
doesn't work) - ❌ No access to
extern
data or size at compile-time (only after linking) - ❌ Arguably worse ergonomics:
MY_INCLUDE(file, ext)
vs.#include "file.ext"
3. Inline Assembly using .incbin
.incbin
is a GNU directive that can be used in asm
blocks to basically perform
the linking step from before inside the application code:
#define BINARY_ASM_INCLUDE(filename, buffername) \ __asm__(".section .rodata\n" \ ".global " #buffername "\n" \ ".type " #buffername ", @object\n" \ ".align 4\n" \ #buffername":\n" \ ".incbin " #filename "\n" \ #buffername"_end:\n" \ ".global "#buffername"_size\n" \ ".type "#buffername"_size, @object\n" \ ".align 4\n" \ #buffername"_size:\n" \ ".int "#buffername"_end - "#buffername"\n"\ ); \ extern const unsigned char buffername []; \ extern const unsigned char* buffername##_end; \ extern int buffername##_size
Usage code becomes:
BINARY_ASM_INCLUDE("image.png", image_buf); int main() { int width, height, nrChannels; unsigned char* image_data = stbi_load_from_memory(image_buf, image_buf_size, &width, &height, &nrChannels, 0); }
Pros & Cons
Same as the linker option, except…
- ✅ Choose names of buffer and size
- ✅ Better ergonomics: Use buffer and size directly
- ✅ No precompilation step
- ❌ Not cross-platform (GCC & Clang support
.incbin
)
4. Use a library
The library incbin actually uses the previous approach by default and aims to be cross-platform. In case of MSVC, it falls back to using the first option by providing a tool that needs to be compiled and included in your build step2.
The usage code looks basically like this:
#define INCBIN_PREFIX // remove prefix from variables #define INCBIN_STYLE INCBIN_STYLE_SNAKE // data instead of Data #include "incbin.h" INCBIN(song, "music.mp3"); // defines song_data, song_end and song_size
Pros & Cons
Same as the .incbin
option, except…
- ✅ Can be cross-platform
- ✅ No precompilation step…
- ❌ …except for MSVC
- ❌ Adds a dependency
5. Using #embed
A new #embed
directive has been introduced to C233 and
C++264.
It's still too early for me to really use this, but usage-wise, it is supposed to be similar to the first approach:
static const unsigned char embedded_texture[] = { #embed "texture.png" };
This would be the best and fastest option, since it does not introduce a new
preprocessing step and skips the code generation and parsing step. However,
implementation of #embed
in current compilers is not yet wide spread, so it may
not be an option for you.
Pros & Cons
- ✅ Fastest & easiest way
- ❌ Requires modern compiler support
6. Resources
Footnotes
Apparently this is due to fact that the MSVC compiler doesn't support
an .incbin
equivalent in its inline assembly
Comments