You may have noticed, I’m currently running experiments where I have a lot of small things. So many, that I computed I had an overhead of 16 bytes apiece (12 bytes per state and 4 bytes for a pointer to it), and was too lazy to improve my handling to reduce that to 8 bytes at the cost of having to implement a hash-table myself.
Today I mused about this and looked at my memory profiler. Much to my surprise (because I’m an idiot and don’t think), I spent much more memory than anticipated. I did some thinking and after a couple minutes figured out, that storing a pointer into a 64 bit memory heap using 4 bytes may be information theory-challenging.
I therefore looked up the sizes of objects in a 64 bit Java VM, and found after some searching this guide, which tells me that everything gets twice as expensive when using 64 bit addresses; that is, a pointer is not (obviously) 64 bits or 8 bytes, causing the overhead of an object to be 16 bytes and the overhead of an array 24 bytes (the link says 16 bytes for the object and 4 bytes for the length, but it is wrong, at least according to YourKit and the newest Hotspot, and the overhead is consistently 8 bytes). In addition objects need to be a multiple of 8 bytes (I’d thought this was 4 bytes in 32-bit architectures and more or less neglected this fact). This makes my overhead instead 24-31 bytes for each object (instead of 12 before).
Additionally, I’d mentally given the hash-table a load-factor of 2, causing it to use 1-2 pointers for each state, but naturally, that is not how load-factors and math works. Instead, we use 2-4 pointers, or 16-32 bytes rather than the 4-8 bytes I mentally used. I did not care too much about this, as I assumed that my improved solution would use the same amount of memory to store integers instead of pointers.
Thus my total overhead is not 20-24 bytes, but rather a discouraging 40-63 bytes. Considering the fact that my data is 26-30 bytes, this is a lot. A lot a lot, and enough to actually look at. I have a chart illustrating how much this is compared to my real data:
This shows the memory consumption per state and overhead as a function of the number of states. The actual consumption grows a little as the number of states increase because of the representation exploiting the actual number of states to represent integers more efficiently when we only have a few of them.
The blue line at the bottom is the actual memory used to represent my states. The red one above it is the amount of memory needed inclusive overhead for storing each state in an array (the 24-31 bytes above). The purple line just above the blue line shows the difference between the two, i.e., the actual overhead per state, which clocks in at 26-27 bytes, or just above the amount of memory used for the actual state!
The green line adds to this 24 bytes (the average between 16 and 32 bytes) for storing the states in a hash-table, and the cyan line crossing the red shows the difference between the blue and green line. Thus, the total overhead is almost twice that of my actual data.
I have an idea for an improved data-structure, where instead of storing data in separate arrays, I instead store them in a single array. Due to the nature of my representation, I do not actually need a length-counter, so this would remove the 24 bytes overhead of a Java array entirely. We could use a 64 bit pointer (really long indexing into the shared array) to address each state, effectively moving the green line down and replacing the red one in the chart above. We can actually be a bit more clever. Instead of using 64 bits for indexing, we can just use 32 bits. This superficially only allows us to address 2 or 4 giga-bytes, but if we align everything at 4 or 8 byte boundaries, we get an additional 2 or 3 bits for addressing, and can use 8-32 giga-bytes depending on which we choose and the actual addressing scheme. This halves the overhead of the hash-table as well, bring us to a situation like this:
Still, the blue one is the total used for the actual data and the green line is the old total amount of memory. The total amount of memory with the new storage scheme is indicated with orange, and the overhead with this scheme is still cyan (i.e., it is the total overhead, the line at around 50 bytes in the first chart). We see than the overhead is now significantly below the actual data, and the gain is an impressive 36 bytes per state (in my calculations before looking into the real data sizes, I neglected the alignment for the old solution but not in the new one and regarded pointers and integers as the same, leading to a saving of just 8 bytes from 16 to 8).
At integers, I spend 12 bytes for the hash-table and 4 for alignment (for a total of 16) and for longs, I spend 24 for the hash table. Increasing the alignment to every 16 bytes and using integers would use around 12 + 16/2 = 20 bytes in the average case. As long as my states are 25-32 bytes, this makes no difference, though, and I can in fact increase the alignment to every 32 bytes with no cost at all, allowing me to address up to 128 giga-bytes using this scheme – without increasing the overhead noticeably as long as my states stay below 32 bytes, which I expect them to do.
Alternatively, I can just switch to using longs for addressing, removing any padding and putting the overhead at an easily controllable 24 bytes per state. Still, quite a lot, but at still around 26-28 bytes better than the current implementation. Now, I just have to see if I really want to implement my own hash-table…
Stupid 64 bit machines, forcing me to think!
Edit: using the -XX:+UseCompressedOops option suggested by spand in the comments, I can cut off around 24 bytes per state, decreasing the gain by moving to another data-structure to 12 bytes, which is too little (at least for now) to be worth the effort.
Time person of the year 2006, Nobel Peace Prize winner 2012.