Hello Worlds
As a first post for this site, I thought it would be fun to go through the process of looking at a few “Hello, world!” programs in detail, staying closer to the “lower” levels of the software stack. I was inspired by a talk I watched a couple of months ago given by Steven Rostedt on Learning the Linux Kernel with Tracing. There’s a lot that you can uncover and learn, even starting with only a “Hello, world!” program.
ARM Assembly
Earlier this summer I wrote a toy Intel 8086 instruction decoder and emulator in Zig, so I’ve become somewhat familiar with the Intel x86 and AMD64 instruction set variants. But I haven’t had the occasion to spend time looking at any ARM assembly. Since my laptop is an Apple M1 Pro MacBook with an ARM processor, I have just the right hardware to play around with this!
A wonderful video came up recently on my YouTube recommendations. Among other topics, it goes over what a minimal “Hello, World!” program in AArch64 assembly on MacOS looks like.
.global _main
.align 4
_main:
    mov X0, #1
    adr X1, greeting
    mov X2, #13
    mov X16, #4
    svc #0x80
    mov X0, #0
    mov X16, #1
    svc #0x80
greeting: .ascii "Hello, world!"
I recommend watching the video for a more in-depth explanation, but given the short length of the program, it is straightforward to walk through.
- The two svc #0x80 instructions in the program are used to make syscalls, so the first four instructions set up the necessary arguments for the first syscall.
- Register X0 stores 1, the file descriptor for standard out.
- Register X1 stores the memory address of the string we’re writing.
- Register X2 stores the length of this string, 13.
- Register X16 indicates which specific syscall to make. The value 4 indicates that this is the write syscall.
- After the first syscall returns, we put the exit status 0 into register X0 and the value 1 into register X16 to indicate that the second syscall, made by the final instruction, will be exit.
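To see what these instructions look like as machine code, here’s a small Python sketch that encodes them by hand. It assumes that each mov with a small immediate assembles to a MOVZ instruction, and that greeting sits directly after the final svc, 28 bytes past the adr instruction. The helper names movz, adr, and svc are my own, not from any library.

```python
import struct

# Every AArch64 instruction is a fixed-width 32-bit word.


def movz(rd: int, imm16: int) -> int:
    """MOVZ Xd, #imm16: what 'mov Xd, #imm' assembles to for small values."""
    assert 0 <= rd < 32 and 0 <= imm16 < (1 << 16)
    return 0xD2800000 | (imm16 << 5) | rd


def adr(rd: int, offset: int) -> int:
    """ADR Xd, label: Xd = address of this instruction + offset (21-bit signed)."""
    assert -(1 << 20) <= offset < (1 << 20)
    imm = offset & 0x1FFFFF
    immlo, immhi = imm & 0b11, imm >> 2
    return 0x10000000 | (immlo << 29) | (immhi << 5) | rd


def svc(imm16: int) -> int:
    """SVC #imm16: supervisor call, i.e. trap into the kernel."""
    return 0xD4000001 | (imm16 << 5)


# Assumes 'greeting' directly follows the last svc: adr is at byte 4 and the
# string starts at byte 32, so the PC-relative offset is 28.
program = [
    movz(0, 1),    # mov X0, #1
    adr(1, 28),    # adr X1, greeting
    movz(2, 13),   # mov X2, #13
    movz(16, 4),   # mov X16, #4
    svc(0x80),     # svc #0x80
    movz(0, 0),    # mov X0, #0
    movz(16, 1),   # mov X16, #1
    svc(0x80),     # svc #0x80
]
# Instructions are stored little-endian, followed by the string bytes.
code = struct.pack(f"<{len(program)}I", *program) + b"Hello, world!"

for word in program:
    print(f"{word:08x}")
print(len(code), "bytes")  # 8 * 4 + 13 = 45
```

Because the words are stored little-endian, mov X0, #1 (0xd2800020) appears in the file as the bytes 20 00 80 d2. If the layout assumption holds, these words should line up with what otool -t hello.o dumps for the text section.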
Aside from some minor differences, like the register names and using svc instead of syscall to make the syscall, this is actually pretty similar to x64 assembly.
To run this on a MacBook, we save it as a file hello.s, then use the assembler as to assemble it and the linker ld to link the object file.
as -o hello.o hello.s
ld -o hello hello.o
The result is the executable hello, which behaves as expected when we run it:
$ ./hello
Hello, world!
What’s actually in the executable? I looked at the file sizes and was a little surprised to see that the executable was over 16KiB, even though the assembly source was only 177B (and its eight instructions assemble to just 32 bytes of machine code).
$ ls -l
total 56
-rwxr-xr-x@ 1 cat staff 16872 Sep 5 22:21 hello
-rw-r--r--@ 1 cat staff 432 Sep 5 22:21 hello.o
-rw-r--r--@ 1 cat staff 177 Sep 5 22:20 hello.s
Looking at the contents with xxd, I noticed that most of the file is full of 0 bytes. Why is that? The format of the file is Mach-O, which I think of as MacOS’s counterpart to Linux/Unix’s ELF file format.
$ file hello
hello: Mach-O 64-bit executable arm64
Unsurprisingly, we can’t just stuff some instructions into a file and expect to be able to execute it. There’s some amount of program metadata that needs to be specified so that the operating system can make sense of things.
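As a starting point for poking at that metadata, here’s a minimal Python sketch that decodes the fixed 32-byte mach_header_64 at the start of a 64-bit Mach-O file and measures how much of the file is zero bytes. The field layout and constant values follow Apple’s mach-o/loader.h and mach/machine.h headers (MH_MAGIC_64, CPU_TYPE_ARM64, MH_OBJECT, MH_EXECUTE); the function names are my own.

```python
import struct
from typing import NamedTuple

MH_MAGIC_64 = 0xFEEDFACF       # 64-bit Mach-O in little-endian byte order
CPU_TYPE_ARM64 = 0x0100000C    # CPU_TYPE_ARM (12) | CPU_ARCH_ABI64 (0x01000000)
FILETYPES = {1: "MH_OBJECT", 2: "MH_EXECUTE"}


class MachHeader64(NamedTuple):
    magic: int
    cputype: int
    cpusubtype: int
    filetype: int
    ncmds: int        # number of load commands that follow the header
    sizeofcmds: int   # total size of those load commands in bytes
    flags: int
    reserved: int


def parse_mach_header(data: bytes) -> MachHeader64:
    """Decode the 32-byte mach_header_64 at the start of the file."""
    header = MachHeader64(*struct.unpack_from("<IiiIIIII", data, 0))
    if header.magic != MH_MAGIC_64:
        raise ValueError(f"not a little-endian 64-bit Mach-O file (magic {header.magic:#x})")
    return header


def zero_fraction(data: bytes) -> float:
    """Fraction of bytes that are 0x00, e.g. padding."""
    return data.count(0) / len(data) if data else 0.0


def describe_file(path: str) -> str:
    """Summarize a Mach-O file's header and how much of it is zero bytes."""
    with open(path, "rb") as f:
        data = f.read()
    h = parse_mach_header(data)
    kind = FILETYPES.get(h.filetype, str(h.filetype))
    arch = "arm64" if h.cputype == CPU_TYPE_ARM64 else hex(h.cputype)
    return (f"{kind} {arch}, {h.ncmds} load commands ({h.sizeofcmds} bytes), "
            f"{zero_fraction(data):.0%} zero bytes")
```

Calling describe_file("hello") on the executable from earlier should report an arm64 MH_EXECUTE file, consistent with what file printed; on hello.o it should report MH_OBJECT instead.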
I was aware of the ELF format prior to this, but didn’t know anything about Mach-O, so I went down a rabbit-hole looking for information. Unfortunately, there doesn’t appear to be a readily available, canonical public specification from Apple, although there are various forms of older documentation on some archive sites.
However, I did find an awesome series of blog posts from Gregory Anders on Exploring Mach-O. In them, he writes a Zig program for parsing the Mach-O format and examines the structure of an even more minimal ARM assembly program (one that just exits without even writing “Hello, World!”). In part 3 of the series he also answered the question I had about why there were so many 0 bytes in the file (page alignment).
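To make the page-alignment explanation concrete, here’s a tiny Python sketch of the usual round-up-to-a-page calculation. It assumes a 16 KiB page size, which is what arm64 macOS uses, and it doesn’t model the actual segment layout of hello; it only shows why a few dozen bytes of code and data can end up occupying a whole page in the file.

```python
PAGE_SIZE = 0x4000  # assumption: 16 KiB pages, as on arm64 macOS


def align_up(n: int, alignment: int = PAGE_SIZE) -> int:
    """Round n up to the next multiple of alignment (a positive power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (n + alignment - 1) & ~(alignment - 1)


# 8 instructions * 4 bytes + the 13-byte string from hello.s
payload = 8 * 4 + len(b"Hello, world!")
print(payload, "->", align_up(payload))  # 45 -> 16384
print(16872 - align_up(payload), "bytes of hello lie beyond the first page")  # 488
```

So, under this assumption, a single padded page accounts for 16384 of the 16872 bytes in hello.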
As part of my search, I also came across this post from Kevin Boone on manually constructing a minimally sized “Hello, World” ELF for x64 Linux, getting it down to 384 bytes! It could be fun to do a similar exercise with the Mach-O format to see how small of an executable we could manually construct.
JVM bytecode
Switching gears, let’s look at a Hello World program in a higher level language.
I’ve been writing Java code professionally for some time now, but haven’t spent any significant time exploring the internals of the Java Virtual Machine. What actually goes on in there and how accessible is it? Fortunately, OpenJDK is open source, and the JVM specification is well documented, so we have the information available to find out how it works.
Let’s start with Java’s famous “Hello, world!” program, saved as the file HelloWorld.java.
class HelloWorld {
    public static void main(String args[])
    {
        System.out.println("Hello, World!");
    }
}
We can then use the Java compiler to produce the HelloWorld.class file.
javac HelloWorld.java
Since the contents are binary data, we use xxd to inspect them.
$ xxd HelloWorld.class
00000000: cafe babe 0000 0042 001d 0a00 0200 0307 .......B........
00000010: 0004 0c00 0500 0601 0010 6a61 7661 2f6c ..........java/l
00000020: 616e 672f 4f62 6a65 6374 0100 063c 696e ang/Object...<in
00000030: 6974 3e01 0003 2829 5609 0008 0009 0700 it>...()V.......
00000040: 0a0c 000b 000c 0100 106a 6176 612f 6c61 .........java/la
00000050: 6e67 2f53 7973 7465 6d01 0003 6f75 7401 ng/System...out.
00000060: 0015 4c6a 6176 612f 696f 2f50 7269 6e74 ..Ljava/io/Print
00000070: 5374 7265 616d 3b08 000e 0100 0d48 656c Stream;......Hel
00000080: 6c6f 2c20 576f 726c 6421 0a00 1000 1107 lo, World!......
00000090: 0012 0c00 1300 1401 0013 6a61 7661 2f69 ..........java/i
000000a0: 6f2f 5072 696e 7453 7472 6561 6d01 0007 o/PrintStream...
000000b0: 7072 696e 746c 6e01 0015 284c 6a61 7661 println...(Ljava
000000c0: 2f6c 616e 672f 5374 7269 6e67 3b29 5607 /lang/String;)V.
000000d0: 0016 0100 0a48 656c 6c6f 576f 726c 6401 .....HelloWorld.
000000e0: 0004 436f 6465 0100 0f4c 696e 654e 756d ..Code...LineNum
000000f0: 6265 7254 6162 6c65 0100 046d 6169 6e01 berTable...main.
00000100: 0016 285b 4c6a 6176 612f 6c61 6e67 2f53 ..([Ljava/lang/S
00000110: 7472 696e 673b 2956 0100 0a53 6f75 7263 tring;)V...Sourc
00000120: 6546 696c 6501 000f 4865 6c6c 6f57 6f72 eFile...HelloWor
00000130: 6c64 2e6a 6176 6100 2000 1500 0200 0000 ld.java. .......
00000140: 0000 0200 0000 0500 0600 0100 1700 0000 ................
00000150: 1d00 0100 0100 0000 052a b700 01b1 0000 .........*......
00000160: 0001 0018 0000 0006 0001 0000 0001 0009 ................
00000170: 0019 001a 0001 0017 0000 0025 0002 0001 ...........%....
00000180: 0000 0009 b200 0712 0db6 000f b100 0000 ................
00000190: 0100 1800 0000 0a00 0200 0000 0400 0800 ................
000001a0: 0500 0100 1b00 0000 0200 1c ...........
This is the binary format that gets loaded by the JVM’s class loader. We can think of this as Java’s analogue to MacOS’s Mach-O and Linux/Unix’s ELF file formats. The class File Format chapter of the JVM specification in the Oracle docs specifies how this data is laid out:
ClassFile {
u4 magic;
u2 minor_version;
u2 major_version;
u2 constant_pool_count;
cp_info constant_pool[constant_pool_count-1];
u2 access_flags;
u2 this_class;
u2 super_class;
u2 interfaces_count;
u2 interfaces[interfaces_count];
u2 fields_count;
field_info fields[fields_count];
u2 methods_count;
method_info methods[methods_count];
u2 attributes_count;
attribute_info attributes[attributes_count];
}
Funnily enough, cafe babe is the hex magic number at the start that identifies this as a Java class file. Some other things are easily visible: 0x0042 (decimal 66) is the major version of Java I used to compile this (Java SE 22). The constant_pool section has a lot of entries (constant_pool_count is 0x001d, or 29, so there are 28 entries), including our Hello, World! string. The methods section contains the bytecode for the JVM instructions, stored in each method’s Code attribute.
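Here’s a small Python sketch of reading the start of that structure: the magic number, the version, and the constant pool. The tag numbers and entry sizes follow the constant pool section (§4.4) of the JVM specification; the function name is my own, and the Utf8 handling glosses over the fact that class files actually use a modified UTF-8 encoding.

```python
import struct

# Body size in bytes for each fixed-size constant pool tag (JVMS §4.4).
# Tag 1 (Utf8) is variable-length and handled separately.
FIXED_SIZES = {
    3: 4, 4: 4,                               # Integer, Float
    5: 8, 6: 8,                               # Long, Double (take two pool slots)
    7: 2, 8: 2, 16: 2, 19: 2, 20: 2,          # Class, String, MethodType, Module, Package
    9: 4, 10: 4, 11: 4, 12: 4, 17: 4, 18: 4,  # Field/Method/InterfaceMethod refs, NameAndType, Dynamic, InvokeDynamic
    15: 3,                                    # MethodHandle
}


def parse_class_start(data: bytes) -> dict:
    """Parse magic, minor/major version and the constant pool of a class file."""
    magic, minor, major, cp_count = struct.unpack_from(">IHHH", data, 0)
    if magic != 0xCAFEBABE:
        raise ValueError("missing 0xCAFEBABE magic number")
    pool = {}          # index -> (tag, value); valid indices are 1 .. cp_count-1
    pos, index = 10, 1
    while index < cp_count:
        tag = data[pos]
        pos += 1
        if tag == 1:   # CONSTANT_Utf8: u2 length followed by the bytes
            (length,) = struct.unpack_from(">H", data, pos)
            pos += 2
            # Class files use "modified UTF-8"; plain UTF-8 is fine for ASCII text.
            pool[index] = (tag, data[pos:pos + length].decode("utf-8", "replace"))
            pos += length
        else:
            size = FIXED_SIZES[tag]
            pool[index] = (tag, data[pos:pos + size])
            pos += size
        index += 2 if tag in (5, 6) else 1
    return {
        "minor": minor,
        "major": major,
        "java_se": major - 44,   # holds for Java SE 1.2 (major 46) and later
        "constant_pool": pool,
        "end_of_pool": pos,      # offset where access_flags begins
    }
```

Run on the real HelloWorld.class, entry 13 should be a String constant (tag 8) whose payload points at entry 14, the Utf8 entry holding Hello, World!. That index should match the #13 operand javap reports for ldc.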
But what are the instructions here? We can use the javap tool to disassemble the class file and see the JVM bytecode operations.
$ javap -c HelloWorld.class
Compiled from "HelloWorld.java"
class HelloWorld {
HelloWorld();
Code:
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return
public static void main(java.lang.String[]);
Code:
0: getstatic #7 // Field java/lang/System.out:Ljava/io/PrintStream;
3: ldc #13 // String Hello, World!
5: invokevirtual #15 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
8: return
}
These are the instructions used in our program, along with their corresponding bytecodes:
JVM Instruction | Hex Bytecode
---|---
aload_0 | 2a
invokespecial | b7
return | b1
getstatic | b2
ldc | 12
invokevirtual | b6
In the HelloWorld.class hex dump above, we can see that the instructions for object initialization (the constructor) occupy bytes 0x0000015a - 0x0000015e. Similarly, the bytecode for the main method occupies bytes 0x00000184 - 0x0000018c, right after its 4-byte code length of 9 at 0x00000180.
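Since each of these instructions is just an opcode byte followed by a fixed number of operand bytes, a tiny Python sketch can decode those byte ranges back into the javap listing. It only knows the six opcodes from the table (enough for this class, not for bytecode in general), and it treats every operand as a big-endian constant pool index.

```python
# Opcode -> (mnemonic, operand bytes) for the six opcodes used by HelloWorld.
# Operands here are constant pool indices: u1 for ldc, u2 for the others.
OPCODES = {
    0x2A: ("aload_0", 0),
    0xB7: ("invokespecial", 2),
    0xB1: ("return", 0),
    0xB2: ("getstatic", 2),
    0x12: ("ldc", 1),
    0xB6: ("invokevirtual", 2),
}


def disassemble(code: bytes) -> list[str]:
    """Return javap-style lines like '3: ldc #13'."""
    lines, pc = [], 0
    while pc < len(code):
        if code[pc] not in OPCODES:
            raise ValueError(f"unsupported opcode {code[pc]:#04x} at {pc}")
        name, width = OPCODES[code[pc]]
        operand = int.from_bytes(code[pc + 1:pc + 1 + width], "big")
        lines.append(f"{pc}: {name} #{operand}" if width else f"{pc}: {name}")
        pc += 1 + width
    return lines


# Code bytes copied from the HelloWorld.class hex dump above.
constructor = bytes.fromhex("2ab70001b1")
main = bytes.fromhex("b20007120db6000fb1")
print("\n".join(disassemble(constructor)))
print("\n".join(disassemble(main)))
```

The output lines up with javap -c: getstatic #7, ldc #13, invokevirtual #15, and return at offsets 0, 3, 5, and 8.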
Looking at the sequence of instructions, it looks quite different from ARM assembly. The getstatic instruction pushes a reference to System.out onto the operand stack, the ldc instruction pushes our “Hello, World!” string constant, and the invokevirtual instruction pops both and calls the println method, which presumably does the work of making the appropriate syscall to write it out.
So the Java Virtual Machine loads this data and executes it, but how does that loading happen? And what’s the ARM machine code that gets executed on the processor? Can I step through that code as it is interpreted?
Stepping through the JVM
OpenJDK’s Hotspot interpreter is written in C++ and lives in the OpenJDK source tree. I want to step through this code (and through the assembly) in a debugger as our program is loaded and executed. Let’s compile the project with debug symbols so we can do that.
The OpenJDK source is on GitHub, so we can clone it with git:
git clone https://github.com/openjdk/jdk.git
In order to get the build to work, I had to install the full Xcode (not just the xcode-select command line tools, although the build documentation claimed otherwise) and also have a recent enough JDK to use as a boot JDK (I used OpenJDK 22 via Homebrew for this purpose). After those requirements were sorted, the standard configure and make steps succeeded without any issues.
bash configure --enable-debug
make
3 minutes 37 seconds later, the build is done. That was easier than expected!
To step through the code, we can use lldb, the LLVM project’s debugger, which is installed with the xcode-select command line tools. We first load the binary and adjust some settings so we can see the disassembly and a suitable number of lines of source code:
$ lldb
(lldb) target create build/macosx-aarch64-server-fastdebug/jdk/bin/java
Current executable set to '/Users/cat/Code/jdk/build/macosx-aarch64-server-fastdebug/jdk/bin/java' (arm64).
(lldb) settings set stop-disassembly-display always
(lldb) settings set stop-disassembly-count 15
(lldb) settings set stop-line-count-before 7
(lldb) settings set stop-line-count-after 8
We then need to find some good breakpoints to stop at. This part was non-trivial for me, since I’m not experienced with reading and navigating a complicated C++ codebase. However, it looks like TemplateInterpreter::initialize_code might be a good entrypoint.
To load and run the class, I copied my HelloWorld.class into the directory and passed the class name as an argument to the lldb run command. We can see the C++ code along with the corresponding assembly, and can step through the code from here. Looking at the stack trace at the end, we can see some of the functions involved in initializing the JVM.
(lldb) b initialize_code
Breakpoint 5: where = libjvm.dylib`TemplateInterpreter::initialize_code() + 20 at templateInterpreter.cpp:60:3, address = 0x000000010327ed30
(lldb) r HelloWorld
Process 98847 launched: '/Users/cat/Code/jdk/build/macosx-aarch64-server-fastdebug/jdk/bin/java' (arm64)
Process 98847 stopped
* thread #3, stop reason = breakpoint 5.1
frame #0: 0x000000010327ed30 libjvm.dylib`TemplateInterpreter::initialize_code() at templateInterpreter.cpp:60:3 [opt]
53 int max_aligned_codelets = 280;
54 int max_aligned_bytes = checked_cast<int>(max_aligned_codelets * (HeapWordSize + CodeEntryAlignment));
55 _code = new StubQueue(new InterpreterCodeletInterface, code_size + max_aligned_bytes, nullptr,
56 "Interpreter");
57 }
58
59 void TemplateInterpreter::initialize_code() {
-> 60 AbstractInterpreter::initialize();
61
62 TemplateTable::initialize();
63
64 // generate interpreter
65 { ResourceMark rm;
66 TraceTime timer("Interpreter generation", TRACETIME_LOG(Info, startuptime));
67 TemplateInterpreterGenerator g;
68 // Free the unused memory not occupied by the interpreter and the stubs
libjvm.dylib`TemplateInterpreter::initialize_code:
-> 0x10327ed30 <+20>: bl 0x102142ac4 ; AbstractInterpreter::initialize at abstractInterpreter.cpp:56
0x10327ed34 <+24>: bl 0x10328cd44 ; TemplateTable::initialize at templateTable.cpp:219
0x10327ed38 <+28>: adrp x0, 1445
0x10327ed3c <+32>: add x0, x0, #0x620 ; Thread::_thr_current
0x10327ed40 <+36>: ldr x8, [x0]
0x10327ed44 <+40>: blr x8
0x10327ed48 <+44>: ldr x2, [x0]
0x10327ed4c <+48>: adrp x20, 1516
0x10327ed50 <+52>: add x20, x20, #0xec0 ; DebuggingContext::_enabled
0x10327ed54 <+56>: ldr w8, [x20]
0x10327ed58 <+60>: cmp w8, #0x0
0x10327ed5c <+64>: ccmp x2, #0x0, #0x0, le
0x10327ed60 <+68>: b.eq 0x10327eeb8 ; <+412> at templateInterpreter.cpp
0x10327ed64 <+72>: mov x19, x0
0x10327ed68 <+76>: ldr x1, [x2, #0x340]
Target 0: (java) stopped.
(lldb) bt
* thread #3, stop reason = breakpoint 5.1
* frame #0: 0x000000010327ed30 libjvm.dylib`TemplateInterpreter::initialize_code() at templateInterpreter.cpp:60:3 [opt]
frame #1: 0x0000000102980940 libjvm.dylib`interpreter_init_code() at interpreter.cpp:142:3 [opt]
frame #2: 0x0000000102951154 libjvm.dylib`init_globals2() at init.cpp:164:3 [opt]
frame #3: 0x00000001032bdec4 libjvm.dylib`Threads::create_vm(args=<unavailable>, canTryAgain=0x000000017008ee97) at threads.cpp:569:12 [opt]
frame #4: 0x0000000102ab88d0 libjvm.dylib`::JNI_CreateJavaVM(JavaVM **, void **, void *) [inlined] JNI_CreateJavaVM_inner(vm=0x000000017008ef40, penv=0x000000017008ef38, args=0x000000017008ef48) at jni.cpp:3582:12 [opt]
frame #5: 0x0000000102ab8880 libjvm.dylib`JNI_CreateJavaVM(vm=0x000000017008ef40, penv=0x000000017008ef38, args=0x000000017008ef48) at jni.cpp:3673:14 [opt]
frame #6: 0x0000000100492508 libjli.dylib`JavaMain [inlined] InitializeJVM(pvm=0x000000017008ef40, penv=0x000000017008ef38, ifn=<unavailable>) at java.c:1592:9 [opt]
frame #7: 0x0000000100492454 libjli.dylib`JavaMain(_args=<unavailable>) at java.c:505:10 [opt]
frame #8: 0x00000001004954b0 libjli.dylib`ThreadJavaMain(args=<unavailable>) at java_md_macosx.m:720:29 [opt]
frame #9: 0x000000018ec49f94 libsystem_pthread.dylib`_pthread_start + 136
I had hoped to find something like an interpreter loop in the code with a big switch statement that dispatches to the handler for each bytecode instruction. That would make things easy to step through, but unsurprisingly things aren’t quite that simple. According to the Hotspot runtime overview documentation, the interpreter’s code is generated at startup from a table of templates, one per bytecode, and loaded into memory, which is done for performance. We could step through this at the assembly level, but it’s quite difficult to figure out what’s going on that way.
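For contrast, here’s roughly what the simple design I was hoping for looks like: a toy Python interpreter whose loop fetches an opcode byte and dispatches through a 256-entry handler table that is filled in once up front. It runs only the four opcodes in HelloWorld.main, and the constant pool, System.out, and println are hypothetical stand-ins. Hotspot instead generates a block of machine code per bytecode at startup, so nothing here reflects its actual implementation.

```python
import io

# Hypothetical stand-ins for the constant pool entries javap showed for main().
CONSTANT_POOL = {
    7: ("Field", "java/lang/System.out"),
    13: ("String", "Hello, World!"),
    15: ("Method", "java/io/PrintStream.println"),
}


class Frame:
    """One method activation: bytecode, program counter and operand stack."""

    def __init__(self, code: bytes, stdout: io.TextIOBase):
        self.code, self.pc, self.stack = code, 0, []
        self.stdout = stdout      # plays the role of the System.out object
        self.running = True

    def u1(self) -> int:
        value = self.code[self.pc]
        self.pc += 1
        return value

    def u2(self) -> int:
        value = int.from_bytes(self.code[self.pc:self.pc + 2], "big")
        self.pc += 2
        return value


def op_getstatic(f):      # push the value of a static field
    _, field = CONSTANT_POOL[f.u2()]
    if field != "java/lang/System.out":
        raise NotImplementedError(field)
    f.stack.append(f.stdout)


def op_ldc(f):            # push a constant from the pool
    _, value = CONSTANT_POOL[f.u1()]
    f.stack.append(value)


def op_invokevirtual(f):  # pop the argument and receiver, call the method
    _, method = CONSTANT_POOL[f.u2()]
    arg, receiver = f.stack.pop(), f.stack.pop()
    if method != "java/io/PrintStream.println":
        raise NotImplementedError(method)
    receiver.write(arg + "\n")


def op_return(f):         # return from a void method
    f.running = False


# The dispatch table, filled in once before anything runs.
HANDLERS = [None] * 256
HANDLERS[0xB2] = op_getstatic
HANDLERS[0x12] = op_ldc
HANDLERS[0xB6] = op_invokevirtual
HANDLERS[0xB1] = op_return


def run(code: bytes, stdout) -> None:
    frame = Frame(code, stdout)
    while frame.running:
        opcode = frame.u1()
        handler = HANDLERS[opcode]
        if handler is None:
            raise NotImplementedError(f"opcode {opcode:#04x}")
        handler(frame)


out = io.StringIO()
run(bytes.fromhex("b20007120db6000fb1"), out)  # HelloWorld.main's bytecode
print(out.getvalue(), end="")                  # Hello, World!
```

A table lookup per opcode is the same basic idea as a switch statement; the template interpreter’s trick is that each table entry is already native code, so there is no loop like this one to step through in C++.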
However, that documentation did point out that there’s a flag you can pass to the JVM that will print out the generated template table! This very nearly corresponds to the machine code that will be executed when we run a given JVM instruction.
# For my jdk that I built
build/macosx-aarch64-server-fastdebug/jdk/bin/java -XX:+PrintInterpreter
# If you aren't running a debug build, you'll probably need this unlock flag
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInterpreter
This prints a large block of generated machine code for each instruction; here are three of the instructions relevant to our HelloWorld class:
----------------------------------------------------------------------
ldc 18 ldc [0x000000010bc2f280, 0x000000010bc2f4c0] 576 bytes
[MachCode]
0x000000010bc2f280: 808e 1ff8 | 0900 0014 | 808e 1fbc | 0700 0014 | 800e 1ffc | 0500 0014 | 9f8e 1ff8 | 808e 1ff8
0x000000010bc2f2a0: 0200 0014 | 808e 1ff8 | c106 4039 | a283 5ef8 | 4208 40f9 | 4204 40f9 | 4008 40f9 | 2310 0091
0x000000010bc2f2c0: 0360 238b | 63fc df08 | 7f90 01f1 | a000 0054 | 7f9c 01f1 | 6000 0054 | 7f1c 00f1 | 6104 0054
0x000000010bc2f2e0: 0100 80d2 | b683 1bf8 | a803 5ff8 | 8800 00b4 | c1d5 bbd4 | 629e e603 | 0100 0000 | e003 1caa
0x000000010bc2f300: 2801 0010 | 88fb 01f9 | 94f7 01f9 | 9dff 01f9 | e833 bfa9 | 08ee 84d2 | 8864 a0f2 | 2800 c0f2
0x000000010bc2f320: 0001 3fd6 | e833 c1a8 | feff ff10 | 9ff7 01f9 | 9fff 01f9 | 9ffb 01f9 | 8807 40f9 | a800 00b4
0x000000010bc2f340: 0810 88d2 | 4878 a1f2 | 2800 c0f2 | 0001 1fd6 | 8023 42f9 | 9f23 02f9 | b6e3 7ba9 | b86f 388b
0x000000010bc2f360: 808e 1ff8 | 5300 0014 | 7f10 00f1 | a100 0054 | 410c 01ab | 2050 40bd | 808e 1fbc | 4d00 0014
0x000000010bc2f380: 7f0c 00f1 | a100 0054 | 410c 01ab | 2050 40b9 | 808e 1ff8 | 4700 0014 | 4102 80d2 | b683 1bf8
0x000000010bc2f3a0: a803 5ff8 | 8800 00b4 | c1d5 bbd4 | 629e e603 | 0100 0000 | e003 1caa | 2801 0010 | 88fb 01f9
0x000000010bc2f3c0: 94f7 01f9 | 9dff 01f9 | e833 bfa9 | 88c7 85d2 | 8864 a0f2 | 2800 c0f2 | 0001 3fd6 | e833 c1a8
0x000000010bc2f3e0: feff ff10 | 9ff7 01f9 | 9fff 01f9 | 9ffb 01f9 | 8807 40f9 | a800 00b4 | 0810 88d2 | 4878 a1f2
0x000000010bc2f400: 2800 c0f2 | 0001 1fd6 | 8023 42f9 | 9f23 02f9 | b6e3 7ba9 | b86f 388b | 8227 42f9 | 9f27 02f9
0x000000010bc2f420: e303 02aa | 633c 0012 | 427c 1c53 | 5f10 0071 | 8100 0054 | 0068 63b8 | 808e 1ff8 | 1d00 0014
0x000000010bc2f440: 5f18 0071 | 8100 0054 | 0068 63bc | 808e 1fbc | 1800 0014 | 5f0c 0071 | 8100 0054 | 0068 a378
0x000000010bc2f460: 808e 1ff8 | 1300 0014 | 5f00 0071 | 8100 0054 | 0068 a338 | 808e 1ff8 | 0e00 0014 | 5f08 0071
0x000000010bc2f480: 8100 0054 | 0068 6378 | 808e 1ff8 | 0900 0014 | 5f04 0071 | 8100 0054 | 0068 a338 | 808e 1ff8
0x000000010bc2f4a0: 0400 0014 | c1d5 bbd4 | 7ae4 f303 | 0100 0000 | c82e 4038 | 0901 2411 | a95a 69f8 | 2001 1fd6
[/MachCode]
----------------------------------------------------------------------
invokevirtual 182 invokevirtual [0x000000010d795200, 0x000000010d7956f0] 1264 bytes
[MachCode]
0x000000010d795200: 808e 1ff8 | 0900 0014 | 808e 1fbc | 0700 0014 | 800e 1ffc | 0500 0014 | 9f8e 1ff8 | 808e 1ff8
0x000000010d795220: 0200 0014 | 808e 1ff8 | c312 4078 | e203 7bb2 | 637c 029b | 421f 40f9 | 4220 0091 | 4260 238b
0x000000010d795240: 535c 0091 | 73fe df08 | 7fda 02f1 | c004 0054 | d316 80d2 | e103 13aa | b683 1bf8 | a803 5ff8
0x000000010d795260: 8800 00b4 | c1d5 bbd4 | 625e 9c05 | 0100 0000 | e003 1caa | 2801 0010 | 88fb 01f9 | 94f7 01f9
0x000000010d795280: 9dff 01f9 | e833 bfa9 | 08ae 92d2 | 489b a0f2 | 2800 c0f2 | 0001 3fd6 | e833 c1a8 | feff ff10
0x000000010d7952a0: 9ff7 01f9 | 9fff 01f9 | 9ffb 01f9 | 8807 40f9 | a800 00b4 | 0810 80d2 | 08af a1f2 | 2800 c0f2
0x000000010d7952c0: 0001 1fd6 | b6e3 7ba9 | b86f 388b | c312 4078 | e203 7bb2 | 637c 029b | 421f 40f9 | 4220 0091
0x000000010d7952e0: 4260 238b | 4354 4039 | 6300 0036 | 4c00 40f9 | 0200 0014 | 4c10 4079 | b683 1bf8 | 4950 4039
0x000000010d795300: 4224 4079 | 886e 228b | 0281 5ff8 | 0877 91d2 | 48b9 a0f2 | 2800 c0f2 | 1e79 69f8 | e30c 0036
0x000000010d795320: 5f00 40f9 | a003 5ef8 | e000 00b4 | 0804 40f9 | 0805 00b1 | 4200 0054 | 0804 00f9 | 00c0 0091
0x000000010d795340: a003 1ef8 | a003 5ef8 | c00a 00b4 | 0800 5d38 | 1f2d 00f1 | 610a 0054 | 0400 40f9 | 8400 00d1
0x000000010d795360: 9f08 00f1 | 0820 0091 | 6b09 0054 | 8409 40f9 | 845c 4079 | 0804 40f9 | 8400 08cb | 8404 00d1
0x000000010d795380: 886e 248b | 0401 40f9 | a400 00b5 | 0808 40f9 | 0801 40b2 | 0808 00f9 | 1600 0014 | 8408 40b9
0x000000010d7953a0: 841c 5bd2 | 0808 40f9 | 8400 08ca | 9ff4 7ef2 | 0002 0054 | e401 0837 | 8801 00b4 | 1f05 00f1
0x000000010d7953c0: 4001 0054 | 8400 08ca | 0808 40f9 | 8400 08ca | 9ff4 7ef2 | e000 0054 | 0808 40f9 | 0801 7fb2
0x000000010d7953e0: 0808 00f9 | 0300 0014 | 0408 00f9 | 84f8 7f92 | 0400 40f9 | 8408 00d1 | 9f08 00f1 | 0860 0091
0x000000010d795400: ab04 0054 | 8409 40f9 | 845c 4079 | 080c 40f9 | 8400 08cb | 8404 00d1 | 886e 248b | 0401 40f9
0x000000010d795420: a400 00b5 | 0810 40f9 | 0801 40b2 | 0810 00f9 | 1600 0014 | 8408 40b9 | 841c 5bd2 | 0810 40f9
0x000000010d795440: 8400 08ca | 9ff4 7ef2 | 0002 0054 | e401 0837 | 8801 00b4 | 1f05 00f1 | 4001 0054 | 8400 08ca
0x000000010d795460: 0810 40f9 | 8400 08ca | 9ff4 7ef2 | e000 0054 | 0810 40f9 | 0801 7fb2 | 0810 00f9 | 0300 0014
0x000000010d795480: 0410 00f9 | 84f8 7f92 | 0400 40f9 | 8410 00d1 | 08a0 0091 | e003 08aa | 000c 048b | a003 1ef8
0x000000010d7954a0: f303 0091 | 8802 1dcb | 08fd 4393 | a803 1ff8 | 8835 40f9 | 0001 1fd6 | 4008 40b9 | 001c 5bd2
0x000000010d7954c0: b803 5ef8 | 3805 00b4 | 030b 40f9 | 1f00 03eb | c100 0054 | 080f 40f9 | 0805 00b1 | 4200 0054
0x000000010d7954e0: 080f 00f9 | 1f00 0014 | 6302 00b4 | 0313 40f9 | 1f00 03eb | c100 0054 | 0817 40f9 | 0805 00b1
0x000000010d795500: 4200 0054 | 0817 00f9 | 1600 0014 | c300 00b4 | 0807 40f9 | 0805 00b1 | 4200 0054 | 0807 00f9
0x000000010d795520: 1000 0014 | 0013 00f9 | e303 40b2 | 0317 00f9 | 0c00 0014 | 0813 40f9 | 1f00 08eb | c100 0054
0x000000010d795540: 0817 40f9 | 0805 00b1 | 4200 0054 | 0817 00f9 | 0400 0014 | 000b 00f9 | e303 40b2 | 030f 00f9
0x000000010d795560: 18c3 0091 | b803 1ef8 | 0c6c 2c8b | 8cf1 40f9 | a303 5ef8 | c30a 00b4 | 6800 5d38 | 1f2d 00f1
0x000000010d795580: 610a 0054 | 6400 40f9 | 8400 00d1 | 9f08 00f1 | 6820 0091 | 6b09 0054 | 8409 40f9 | 845c 4079
0x000000010d7955a0: 6804 40f9 | 8400 08cb | 8404 00d1 | 886e 248b | 0401 40f9 | a400 00b5 | 6808 40f9 | 0801 40b2
0x000000010d7955c0: 6808 00f9 | 1600 0014 | 8408 40b9 | 841c 5bd2 | 6808 40f9 | 8400 08ca | 9ff4 7ef2 | 0002 0054
0x000000010d7955e0: e401 0837 | 8801 00b4 | 1f05 00f1 | 4001 0054 | 8400 08ca | 6808 40f9 | 8400 08ca | 9ff4 7ef2
0x000000010d795600: e000 0054 | 6808 40f9 | 0801 7fb2 | 6808 00f9 | 0300 0014 | 6408 00f9 | 84f8 7f92 | 6400 40f9
0x000000010d795620: 8408 00d1 | 9f08 00f1 | 6860 0091 | ab04 0054 | 8409 40f9 | 845c 4079 | 680c 40f9 | 8400 08cb
0x000000010d795640: 8404 00d1 | 886e 248b | 0401 40f9 | a400 00b5 | 6810 40f9 | 0801 40b2 | 6810 00f9 | 1600 0014
0x000000010d795660: 8408 40b9 | 841c 5bd2 | 6810 40f9 | 8400 08ca | 9ff4 7ef2 | 0002 0054 | e401 0837 | 8801 00b4
0x000000010d795680: 1f05 00f1 | 4001 0054 | 8400 08ca | 6810 40f9 | 8400 08ca | 9ff4 7ef2 | e000 0054 | 6810 40f9
0x000000010d7956a0: 0801 7fb2 | 6810 00f9 | 0300 0014 | 6410 00f9 | 84f8 7f92 | 6400 40f9 | 8410 00d1 | 68a0 0091
0x000000010d7956c0: e303 08aa | 630c 048b | a303 1ef8 | f303 0091 | 8802 1dcb | 08fd 4393 | a803 1ff8 | 8835 40f9
0x000000010d7956e0: 0001 1fd6 | c1d5 bbd4 | e29b 9205 | 0100 0000
[/MachCode]
----------------------------------------------------------------------
return 177 return [0x000000010bc37e00, 0x000000010bc38200] 1024 bytes
[MachCode]
0x000000010bc37e00: 808e 1ff8 | 0900 0014 | 808e 1fbc | 0700 0014 | 800e 1ffc | 0500 0014 | 9f8e 1ff8 | 808e 1ff8
0x000000010bc37e20: 0200 0014 | 808e 1ff8 | bf3a 03d5 | 884b 42f9 | e804 0036 | 88cf 42f9 | ff63 28eb | 6900 0054
0x000000010bc37e40: e803 0091 | 88cf 02f9 | b683 1bf8 | a803 5ff8 | 8800 00b4 | c1d5 bbd4 | 629e e603 | 0100 0000
0x000000010bc37e60: e003 1caa | 2801 0010 | 88fb 01f9 | 94f7 01f9 | 9dff 01f9 | e833 bfa9 | 88ae 9dd2 | 8864 a0f2
0x000000010bc37e80: 2800 c0f2 | 0001 3fd6 | e833 c1a8 | feff ff10 | 9ff7 01f9 | 9fff 01f9 | 9ffb 01f9 | 8807 40f9
0x000000010bc37ea0: a800 00b4 | 0810 88d2 | 4878 a1f2 | 2800 c0f2 | 0001 1fd6 | b6e3 7ba9 | b86f 388b | 88cf 42f9
0x000000010bc37ec0: ff63 28eb | 4300 0054 | 9fcf 02f9 | 884b 42f9 | bf03 08eb | 4800 0054 | ee01 0054 | 0800 0010
0x000000010bc37ee0: 88fb 01f9 | 94f7 01f9 | 9dff 01f9 | e003 1caa | e833 bfa9 | 8821 9ed2 | 8864 a0f2 | 2800 c0f2
0x000000010bc37f00: 0001 3fd6 | e833 c1a8 | 9ff7 01f9 | 9fff 01f9 | 9ffb 01f9 | 8327 5339 | 9f27 1339 | a183 5ef8
0x000000010bc37f20: 2218 40f9 | e20a 2836 | e30f 00b5 | a183 01d1 | 2004 40f9 | 2004 00b5 | b683 1bf8 | a803 5ff8
0x000000010bc37f40: 8800 00b4 | c1d5 bbd4 | 629e e603 | 0100 0000 | e003 1caa | 2801 0010 | 88fb 01f9 | 94f7 01f9
0x000000010bc37f60: 9dff 01f9 | e833 bfa9 | 0891 95d2 | 8864 a0f2 | 2800 c0f2 | 0001 3fd6 | e833 c1a8 | feff ff10
0x000000010bc37f80: 9ff7 01f9 | 9fff 01f9 | 9ffb 01f9 | 8807 40f9 | a800 00b4 | 0810 88d2 | 4878 a1f2 | 2800 c0f2
0x000000010bc37fa0: 0001 1fd6 | b6e3 7ba9 | b86f 388b | c1d5 bbd4 | e2db dc03 | 0100 0000 | b683 1bf8 | 2304 40f9
0x000000010bc37fc0: 3f04 00f9 | 8203 47b9 | 5f40 1c71 | 8a00 0054 | c1d5 bbd4 | 5b70 ed03 | 0100 0000 | 8203 47b9
0x000000010bc37fe0: 4220 0051 | 846b 62f8 | 7f00 04eb | 0103 0054 | 9f6b 22f8 | 8203 07b9 | 4420 0051 | 846b 64f8
0x000000010bc38000: 7f00 04eb | 2002 0054 | 6000 40f9 | 6001 0837 | 8000 0036 | c1d5 bbd4 | 7070 ed03 | 0100 0000
0x000000010bc38020: 0400 40b2 | e803 00aa | 64fc a8c8 | 1f01 00eb | e8d3 00b2 | a000 0054 | 836b 22f8 | 4220 0011
0x000000010bc38040: 8203 07b9 | 0200 0014 | 0a00 0014 | 2304 00f9 | e003 01aa | e833 bfa9 | 881f 95d2 | 8864 a0f2
0x000000010bc38060: 2800 c0f2 | 0001 3fd6 | e833 c1a8 | 0400 0014 | 88d7 42f9 | 0805 00d1 | 88d7 02f9 | b683 5bf8
0x000000010bc38080: a103 5bf8 | a16f 218b | b343 01d1 | 2400 0014 | b683 1bf8 | a803 5ff8 | 8800 00b4 | c1d5 bbd4
0x000000010bc380a0: 629e e603 | 0100 0000 | e003 1caa | 2801 0010 | 88fb 01f9 | 94f7 01f9 | 9dff 01f9 | e833 bfa9
0x000000010bc380c0: 0891 95d2 | 8864 a0f2 | 2800 c0f2 | 0001 3fd6 | e833 c1a8 | feff ff10 | 9ff7 01f9 | 9fff 01f9
0x000000010bc380e0: 9ffb 01f9 | 8807 40f9 | a800 00b4 | 0810 88d2 | 4878 a1f2 | 2800 c0f2 | 0001 1fd6 | b6e3 7ba9
0x000000010bc38100: b86f 388b | c1d5 bbd4 | e2db dc03 | 0100 0000 | 2804 40f9 | e8fb ffb5 | 2140 0091 | 3f00 13eb
0x000000010bc38120: 81ff ff54 | a983 5ff8 | 8823 45b9 | 1f0d 0071 | 6005 0054 | 889b 42f9 | 3f01 08eb | 0905 0054
0x000000010bc38140: e003 1caa | e833 bfa9 | 0865 82d2 | e872 a0f2 | 2800 c0f2 | 0001 3fd6 | e833 c1a8 | b683 1bf8
0x000000010bc38160: a803 5ff8 | 8800 00b4 | c1d5 bbd4 | 629e e603 | 0100 0000 | e003 1caa | 2801 0010 | 88fb 01f9
0x000000010bc38180: 94f7 01f9 | 9dff 01f9 | e833 bfa9 | 88b9 8bd2 | 8864 a0f2 | 2800 c0f2 | 0001 3fd6 | e833 c1a8
0x000000010bc381a0: feff ff10 | 9ff7 01f9 | 9fff 01f9 | 9ffb 01f9 | 8807 40f9 | a800 00b4 | 0810 88d2 | 4878 a1f2
0x000000010bc381c0: 2800 c0f2 | 0001 1fd6 | b6e3 7ba9 | b86f 388b | c1d5 bbd4 | e2db dc03 | 0100 0000 | f403 09aa
0x000000010bc381e0: bf03 0091 | fd7b c1a8 | 9fee 7c92 | c003 5fd6 | c1d5 bbd4 | e2db dc03 | 0100 0000 | 1f20 03d5
[/MachCode]
----------------------------------------------------------------------
It’s nice that we can see this, but it’s not super enlightening on its own. We could try disassembling these bytes into ARM assembly, but even the code for these individual JVM instructions is quite long (576, 1264, and 1024 bytes respectively), which is not at all surprising. The Hotspot interpreter is a very complicated piece of machinery that does many things. But at least I’ve now got a start in poking around at its internals. It’ll be fun to keep exploring how it works and learn more about its low-level features.
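As a first step toward disassembling them, here’s a small Python sketch that turns a [MachCode] line back into 32-bit AArch64 instruction words. It assumes each |-separated group holds four bytes in memory order; since AArch64 code is little-endian, 808e 1ff8 becomes the word 0xf81f8e80. The helper names are my own. As one hand-checkable example, 0900 0014 becomes 0x14000009, whose top six bits 000101 mark an unconditional branch (B) forward by 9 instructions.

```python
def machcode_words(line: str) -> list[int]:
    """Convert one [MachCode] dump line into 32-bit instruction words."""
    _, _, data = line.partition(": ")
    words = []
    for group in data.split("|"):
        raw = bytes.fromhex(group.replace(" ", ""))
        if len(raw) == 4:  # skip empty or partial groups
            words.append(int.from_bytes(raw, "little"))
    return words


def llvm_mc_input(words: list[int]) -> str:
    """One instruction per line as bytes in memory order, e.g. '0x80 0x8e 0x1f 0xf8'."""
    return "\n".join(
        " ".join(f"{b:#04x}" for b in w.to_bytes(4, "little")) for w in words
    )


line = "0x000000010bc2f280: 808e 1ff8 | 0900 0014 | 808e 1fbc | 0700 0014"
words = machcode_words(line)
print([f"{w:#010x}" for w in words])
# ['0xf81f8e80', '0x14000009', '0xbc1f8e80', '0x14000007']
print(llvm_mc_input(words))  # could be piped to: llvm-mc --disassemble -triple=aarch64
```

Feeding the byte lists to a disassembler such as llvm-mc in its --disassemble mode would be one way to get readable ARM assembly for a whole template.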
It would also be fun to do a similar exercise for a more complicated Lua program that exercises features beyond simply printing “Hello, World!”. The official Lua and even LuaJIT codebases are a heck of a lot smaller than the Hotspot interpreter’s, and the Lua language is much simpler than Java.
eBPF virtual machine
The first two Hello World programs are executed in user space, and interact with the operating system kernel to write output via a syscall API. But what if we want our hello world to be triggered from within kernel space?
Fortunately, the eBPF system gives us a reasonably straightforward way to do this. No need to modify and recompile the operating system kernel.
The Linux kernel (and now also Windows, it seems) has a special virtual machine inside it for running programs that can be used to extend the behavior of the kernel at runtime. This is a current hot topic in Linux development, and this functionality ends up being very useful for observability, networking, and security purposes.
I’m executing the following on my x86_64 mini PC running Void Linux. For setup, I installed bcc, libbpf, and bpftool from the Void package repository with xbps. I also needed to mount tracefs:
sudo mount -t tracefs nodev /sys/kernel/tracing
I wanted to compile my eBPF program directly using clang, load it into the kernel, and attach it to kprobe tracing events using the bpftool utility. I was able to do the first two, but unfortunately bpftool doesn’t appear to support the third yet, although you can attach to various other events.
So for the time being, we’ll use BCC, a library that makes writing and running eBPF programs a lot more convenient, to attach the program.
Here’s the code to run:
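#!/usr/bin/python3
from bcc import BPF
# The eBPF program, written in C; BCC compiles it with clang/LLVM when BPF() is constructed
program = r"""
int hello(void *ctx) {
    bpf_trace_printk("Hello World!");
    return 0;
}
"""
b = BPF(text=program)
# Look up the kernel function name for the execve syscall on this architecture
syscall = b.get_syscall_fnname("execve")
# Run hello() on entry to that kernel function
b.attach_kprobe(event=syscall, fn_name="hello")
# Print messages from the kernel trace pipe until interrupted with Ctrl-C
b.trace_print()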
We save this to a file hello-ebpf.py, make it executable, and run it. We’ll see a “Hello World!” message for each execve syscall that is made on the system.
$ sudo ./hello-ebpf.py
b' <...>-5284 [000] ...21 2871675.440787: bpf_trace_printk: Hello World!'
b' <...>-5284 [000] ...21 2871675.441308: bpf_trace_printk: Hello World!'
b' <...>-5285 [008] ...21 2871675.924771: bpf_trace_printk: Hello World!'
b' sh-5286 [010] ...21 2871675.925285: bpf_trace_printk: Hello World!'
b' <...>-5289 [000] ...21 2871676.443473: bpf_trace_printk: Hello World!'
b' <...>-5289 [000] ...21 2871676.443929: bpf_trace_printk: Hello World!'
b' <...>-5290 [008] ...21 2871676.925181: bpf_trace_printk: Hello World!'
b' sh-5291 [000] ...21 2871676.925829: bpf_trace_printk: Hello World!'
b' <...>-5294 [001] ...21 2871677.446174: bpf_trace_printk: Hello World!'
b' runsv-5294 [003] ...21 2871677.446950: bpf_trace_printk: Hello World!'
b' <...>-5295 [000] ...21 2871677.925436: bpf_trace_printk: Hello World!'
b' sh-5296 [003] ...21 2871677.925907: bpf_trace_printk: Hello World!'
b' <...>-5299 [001] ...21 2871678.449200: bpf_trace_printk: Hello World!'
b' <...>-5299 [001] ...21 2871678.449756: bpf_trace_printk: Hello World!'
b' <...>-5300 [003] ...21 2871678.925725: bpf_trace_printk: Hello World!'
b' sh-5301 [008] ...21 2871678.926562: bpf_trace_printk: Hello World!'
b' runsv-5304 [001] ...21 2871679.451916: bpf_trace_printk: Hello World!'
b' runsv-5304 [000] ...21 2871679.452639: bpf_trace_printk: Hello World!'
This is cool! Each time a new program is executed on my system, it logs a line.
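Each of those lines follows the ftrace trace_pipe layout: the task name and PID, the CPU in brackets, a flags field, a timestamp, the bpf_trace_printk label, and the message itself (as far as I know, <...> shows up when the task name isn’t known). Here’s a minimal Python sketch of pulling those fields apart with a regular expression; the pattern is my own and assumes lines shaped like the ones above. BCC’s trace_fields() method offers something similar built in.

```python
import re

# task-pid [cpu] flags timestamp: label: message
TRACE_LINE = re.compile(
    r"^\s*(?P<task>.+?)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+(?P<flags>\S+)\s+"
    r"(?P<ts>\d+\.\d+):\s+(?P<func>[^:]+):\s+(?P<msg>.*)$"
)


def parse_trace_line(line: str) -> dict:
    """Split one trace_pipe line into named fields."""
    m = TRACE_LINE.match(line)
    if m is None:
        raise ValueError(f"unrecognized trace line: {line!r}")
    fields = m.groupdict()
    fields["pid"] = int(fields["pid"])
    fields["cpu"] = int(fields["cpu"])
    fields["ts"] = float(fields["ts"])   # timestamp in seconds
    return fields


print(parse_trace_line(
    "sh-5286 [010] ...21 2871675.925285: bpf_trace_printk: Hello World!"))
```

The lines that trace_print() shows are bytes objects, so they would need a .decode() before being passed in.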
The BCC library did all the heavy lifting of compiling our 4 lines of BPF C in the python string, loading it into the kernel, attaching it to the appropriate event, and reading the messages that are generated. It will even take care of detaching and unloading the program when we quit the python program.
Can we look at the actual program that gets executed in the kernel? While the Python program is still running, we can use the bpftool utility to look into this.
$ sudo bpftool prog list
70: kprobe name hello tag f1db4e564ad5219a gpl
loaded_at 2024-09-02T11:58:27-0700 uid 0
xlated 104B jited 79B memlock 4096B
btf_id 252
So we see there’s a program named hello of type kprobe, along with some information about the size of the program and when it was loaded. The 104B xlated size, for example, corresponds to 13 eBPF instruction slots of 8 bytes each. We can pin the program so that it doesn’t get unloaded when we stop the Python program.
$ sudo bpftool prog pin id 70 /sys/fs/bpf/hello
We can then also look at the eBPF bytecode with bpftool:
$ sudo bpftool prog dump xlated id 70
int hello(void * ctx):
; int hello(void *ctx) {
0: (b7) r1 = 560229490
; ({ char _fmt[] = "Hello World!"; bpf_trace_printk_(_fmt, sizeof(_fmt)); });
1: (63) *(u32 *)(r10 -8) = r1
2: (18) r1 = 0x6f57206f6c6c6548
4: (7b) *(u64 *)(r10 -16) = r1
5: (b7) r1 = 0
6: (73) *(u8 *)(r10 -4) = r1
7: (bf) r1 = r10
;
8: (07) r1 += -16
; ({ char _fmt[] = "Hello World!"; bpf_trace_printk_(_fmt, sizeof(_fmt)); });
9: (b7) r2 = 13
10: (85) call bpf_trace_printk#-91472
; return 0;
11: (b7) r0 = 0
12: (95) exit
Very cool! But what do these mean? The unofficial eBPF spec explains the specific opcodes used and the official documentation tells us more about the usage of the registers and calling convention.
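Each eBPF instruction is a fixed 8-byte record: an opcode byte, a byte holding the destination and source register numbers, a signed 16-bit offset, and a signed 32-bit immediate. Here’s a minimal Python sketch of decoding that layout, assuming a little-endian machine (where the destination register is the low nibble). The two raw encodings are ones I put together by hand from that layout for instructions 0 and 1 of the dump, not bytes read back from the kernel; bpftool’s opcodes option for prog dump xlated should print the real ones.

```python
import struct

# Instruction class: the low 3 bits of the opcode byte.
CLASSES = {0x00: "LD", 0x01: "LDX", 0x02: "ST", 0x03: "STX",
           0x04: "ALU", 0x05: "JMP", 0x06: "JMP32", 0x07: "ALU64"}


def decode_insn(raw: bytes) -> dict:
    """Decode one 8-byte eBPF instruction (little-endian host assumed).

    Layout: u8 opcode, 4-bit dst_reg (low nibble), 4-bit src_reg (high nibble),
    s16 offset, s32 immediate.
    """
    opcode, regs, off, imm = struct.unpack("<BBhi", raw)
    return {
        "opcode": opcode,
        "class": CLASSES[opcode & 0x07],
        "dst": regs & 0x0F,
        "src": regs >> 4,
        "off": off,
        "imm": imm,
    }


# Hand-built encodings of instructions 0 and 1 from the dump above.
print(decode_insn(bytes.fromhex("b7010000726c6421")))  # r1 = 560229490
print(decode_insn(bytes.fromhex("631af8ff00000000")))  # *(u32 *)(r10 -8) = r1
```

For example, b7 has class bits 111 (ALU64, a 64-bit register operation) and 63 has class bits 011 (STX, a store from a register).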
Going through, step-by-step:
- The first instruction (with opcode b7) is a move instruction: it moves the immediate value 560229490 into the destination register r1. bpftool prints this 32-bit immediate in decimal; in hex it is 0x21646c72, whose bytes, stored little-endian, represent the characters “rld!”.
- The second instruction (with opcode 63) stores that 32-bit value on the stack at r10-8 (r10 is the read-only frame pointer, which holds the address of the stack).
- The next pair of instructions is similar: instruction 2 (opcode 18) loads 0x6f57206f6c6c6548, which represents “Hello Wo”, into r1, and instruction 4 (opcode 7b) stores it on the stack at r10-16. This 64-bit immediate load is printed in hex rather than decimal, and it occupies two instruction slots, which is why the numbering skips from 2 to 4.
- Instructions 5 and 6 store the null terminator character on the stack at r10-4 to end the string.
- Instructions 7 and 8 load register r1 with the address of the string on the stack (r10-16), since r1 holds the first parameter to the call.
- The size of the string including its null terminator, 13, is moved into register r2, since that register will hold the second parameter to the bpf_trace_printk call we’ll make.
- We then call bpf_trace_printk.
- When that call returns, the return value of 0 is loaded into register r0 and we exit.
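The string bookkeeping is easy to double-check with a few lines of Python: packing the two immediates little-endian, followed by a NUL byte, reproduces the 13 bytes that end up between r10-16 and r10-4.

```python
import struct

# The two immediates from the dump: a 64-bit one (insn 2) and a 32-bit one (insn 0).
HELLO_WO = 0x6F57206F6C6C6548   # stored at r10-16
RLD_BANG = 560229490            # == 0x21646c72, stored at r10-8

# Lay out the bytes from r10-16 upward the way the program does:
# little-endian u64, then u32, then the u8 NUL terminator, with no padding.
stack_bytes = struct.pack("<QIB", HELLO_WO, RLD_BANG, 0)

print(hex(RLD_BANG))        # 0x21646c72
print(stack_bytes)          # b'Hello World!\x00'
print(len(stack_bytes))     # 13, the value passed in r2
```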
Pretty straightforward!
It turns out that this eBPF bytecode actually gets just-in-time (JIT) compiled when it is loaded into the kernel, rather than interpreted like JVM bytecode (although Hotspot also JIT-compiles JVM bytecode sometimes). We can also look at the disassembly of the JIT-compiled code. Unsurprisingly, it ends up being a nearly direct translation of the eBPF bytecode into x86-64 assembly.
$ sudo bpftool prog dump jited id 70
int hello(void * ctx):
bpf_prog_f1db4e564ad5219a_hello:
; int hello(void *ctx) {
0: endbr64
4: nopl 0x0(%rax,%rax,1)
9: xchg %ax,%ax
b: push %rbp
c: mov %rsp,%rbp
f: endbr64
13: sub $0x10,%rsp
1a: mov $0x21646c72,%edi
; ({ char _fmt[] = "Hello World!"; bpf_trace_printk_(_fmt, sizeof(_fmt)); });
1f: mov %edi,-0x8(%rbp)
22: movabs $0x6f57206f6c6c6548,%rdi
2c: mov %rdi,-0x10(%rbp)
30: xor %edi,%edi
32: mov %dil,-0x4(%rbp)
36: mov %rbp,%rdi
;
39: add $0xfffffffffffffff0,%rdi
; ({ char _fmt[] = "Hello World!"; bpf_trace_printk_(_fmt, sizeof(_fmt)); });
3d: mov $0xd,%esi
42: call 0xfffffffff5fbc6fc
; return 0;
47: xor %eax,%eax
49: leave
4a: jmp 0xfffffffff6c7f327
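To illustrate how direct that translation is, here’s a toy Python sketch that produces the same AT&T-syntax text for a few of the instructions. The register mapping (r0 to rax, r1 to rdi, r2 to rsi, r10 to rbp) is read off the dump above rather than taken from the kernel source, and the real JIT emits machine-code bytes rather than text, so treat this only as an illustration of the correspondence.

```python
# eBPF register -> x86-64 register, as observed in the JIT dump above.
REG64 = {0: "%rax", 1: "%rdi", 2: "%rsi", 10: "%rbp"}
REG32 = {0: "%eax", 1: "%edi", 2: "%esi"}


def mov_imm(dst: int, imm: int) -> str:
    """eBPF 'rD = imm' (opcode b7), for non-negative immediates only.

    Writing a 32-bit register zero-extends into the full 64-bit register,
    and zero is emitted as the shorter xor idiom, matching the dump.
    """
    if imm < 0:
        raise NotImplementedError("negative immediates need sign extension")
    r = REG32[dst]
    return f"xor {r},{r}" if imm == 0 else f"mov ${imm:#x},{r}"


def store32(dst: int, off: int, src: int) -> str:
    """eBPF '*(u32 *)(rD + off) = rS' (opcode 63)."""
    return f"mov {REG32[src]},{off:#x}({REG64[dst]})"


def store64(dst: int, off: int, src: int) -> str:
    """eBPF '*(u64 *)(rD + off) = rS' (opcode 7b)."""
    return f"mov {REG64[src]},{off:#x}({REG64[dst]})"


for text in (
    mov_imm(1, 560229490),   # 0: r1 = 560229490
    store32(10, -8, 1),      # 1: *(u32 *)(r10 -8) = r1
    store64(10, -16, 1),     # 4: *(u64 *)(r10 -16) = r1
    mov_imm(2, 13),          # 9: r2 = 13
    mov_imm(0, 0),           # 11: r0 = 0
):
    print(text)
```

Each line it prints matches one line of the bpftool jited output, which is exactly the one-to-one flavor of the translation.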
When we’re done inspecting this program, we can delete our pinned version.
$ sudo rm /sys/fs/bpf/hello
Of course, this is only a trivial example of what’s possible using eBPF. For much more detailed exposition on what’s possible with eBPF, Liz Rice’s Learning eBPF and Brendan Gregg’s BPF Performance Tools are excellent resources. I also enjoyed the eBPF: Unlocking the Kernel documentary, which does a good job of conveying the significance of the project while also showing the human side of the story from the people involved in its development.
As a next step for exploration, we can try to look into how the eBPF system actually works holistically, and not just how eBPF code runs. There are a lot of interesting things that happen in the system. For example, when the bytecode gets loaded, there’s a verification process that occurs to ensure that the code won’t cause the kernel to crash! How the heck does that work? Lots of rabbit holes to explore!