The GRAPHEMES_UTF8 row-cells getter inferred the type of its
required-byte accumulator from utf8CodepointSequenceLength, which
returns a u3. Multi-scalar clusters longer than seven UTF-8 bytes
could overflow that accumulator before the capacity check, causing
wrong probe sizes and allowing optimized builds to write past a
caller-provided buffer.

Use usize for the required byte count so probing and capacity
checks match the later encode loop. Extend the render C API test
to cover a short combining cluster, an eight-byte flag cluster,
a longer family emoji, exact-size success, and the
cap == needed - 1 no-write boundary.
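
For illustration only, a minimal C sketch of the failure and the fix.
It is not the actual getter: utf8_len, utf8_encode, needed_u3,
needed_usize, and encode_cluster are hypothetical names, the helpers
skip scalar validation, and the u3 overflow is modeled as wrapping
modulo 8, which is an assumption about how it shows up in optimized
builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: UTF-8 length of one scalar value.
 * Simplified: no surrogate or range validation. */
static size_t utf8_len(uint32_t cp) {
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

/* Hypothetical helper: encode one scalar value, return bytes written. */
static size_t utf8_encode(uint32_t cp, unsigned char *out) {
    size_t n = utf8_len(cp);
    switch (n) {
    case 1:
        out[0] = (unsigned char)cp;
        break;
    case 2:
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    case 3:
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    default:
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    }
    return n;
}

/* Models the bug: the sum keeps only its low three bits, so an
 * eight-byte cluster reports 0 bytes needed. */
static size_t needed_u3(const uint32_t *cps, size_t n) {
    uint8_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = (uint8_t)((acc + utf8_len(cps[i])) & 0x7);
    return acc;
}

/* Models the fix: accumulate in a full-width size type. */
static size_t needed_usize(const uint32_t *cps, size_t n) {
    size_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += utf8_len(cps[i]);
    return acc;
}

/* Always returns the required byte count. Writes only when buf is
 * non-NULL and cap >= needed, so cap == needed - 1 leaves buf
 * untouched. */
static size_t encode_cluster(const uint32_t *cps, size_t n,
                             unsigned char *buf, size_t cap) {
    size_t needed = needed_usize(cps, n);
    if (buf == NULL || cap < needed)
        return needed;
    size_t off = 0;
    for (size_t i = 0; i < n; i++)
        off += utf8_encode(cps[i], buf + off);
    assert(off == needed); /* probe size and encode loop agree */
    return needed;
}
```

With the flag cluster U+1F1FA U+1F1F8 (two four-byte scalars),
needed_u3 reports 0 while needed_usize reports 8. A capacity check
against the wrapped value would pass for any buffer, and the encode
loop would then write all eight bytes regardless of cap.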