72 Commits

Author SHA1 Message Date
copilot-swe-agent[bot]
8a1780a800 Fix ARM64 Linux GCC build error caused by simde/arm/neon.h 2026-06-24 01:29:19 +00:00
Wukuyon
9d336be0d3 Combine ascii_mask/counts check with errors vector in simd-string-impl.h
Initialize errors by xoring ascii_mask with the movemask result and
setting the scalar mismatch into the chunk_is_invalid vector.

Removes another conditional branch.
Improves unicode benchmark performance by about 3%.
2025-10-23 22:37:33 -06:00
Wukuyon
1c869d3629 Refine test for overlapping UTF-8 sequences in simd-string-impl.h
Replace a shift_right_by_one_byte call with comparison operations.
Improves unicode benchmark performance by about 1%.
2025-10-23 22:37:33 -06:00
Wukuyon
65890de60d Fix UTF-8 overlong and special range checks in simd-string-impl.h
Modified `start_classification` in `utf8_decode_to_esc` in `simd-string-impl.h`, which now:

Rejects `0xC0`, `0xC1` and `0xF5..0xFF` lead bytes in UTF-8 subsequences.

Enforces special ranges for the second subsequence bytes after `0xE0`, `0xED`, `0xF0` and `0xF4` bytes to prevent overlong sequences, surrogates, and code points above U+10FFFF.

Accumulates UTF-8 validation errors in a single vector to avoid many conditional branches.

Worsens unicode benchmark performance by about 4%.
2025-10-23 22:37:33 -06:00
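The rules listed in the commit above can be illustrated with a scalar sketch. This is not the code from `simd-string-impl.h`, which works on whole vectors; the helper names `is_rejected_lead`, `second_byte_ok` and `pair_is_invalid` are hypothetical and exist only for this illustration. The byte ranges follow the standard definition of well-formed UTF-8.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helper: true for the lead byte values the commit message
// says are rejected (0xC0, 0xC1 and 0xF5..0xFF).
static bool is_rejected_lead(uint8_t lead) {
    return lead == 0xC0 || lead == 0xC1 || lead >= 0xF5;
}

// Hypothetical helper: checks the second byte of a multi-byte sequence
// against the range allowed after the given lead byte.
static bool second_byte_ok(uint8_t lead, uint8_t second) {
    uint8_t lo = 0x80, hi = 0xBF;      // ordinary continuation byte range
    switch (lead) {
        case 0xE0: lo = 0xA0; break;   // lower values would be overlong 3-byte forms
        case 0xED: hi = 0x9F; break;   // higher values would encode surrogates
        case 0xF0: lo = 0x90; break;   // lower values would be overlong 4-byte forms
        case 0xF4: hi = 0x8F; break;   // higher values would exceed U+10FFFF
        default: break;
    }
    return lo <= second && second <= hi;
}

// Accumulate both checks into one error word and test it once at the end,
// a scalar analogue of collecting errors in a single vector.
static bool pair_is_invalid(uint8_t lead, uint8_t second) {
    unsigned errors = 0;
    errors |= (unsigned)is_rejected_lead(lead);
    errors |= (unsigned)!second_byte_ok(lead, second);
    return errors != 0;
}
```

OR-ing the individual results together and testing once is what lets a vectorized version avoid a conditional branch per check.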
Kovid Goyal
d36a64087e Bump Go to 1.23
We need this because Go < 1.23 produces binaries that don't work on
modern OpenBSD because OpenBSD decided to remove syscall() from their
libc. Mad buggers, who removes functions from libc breaking all
binaries!!

Also increase the minimum macOS version to 11.0, as Go 1.23 requires it.
2024-08-24 08:06:02 +05:30
Kovid Goyal
eb07307370 Ignore pedantic warnings from simde headers 2024-04-30 09:54:14 +05:30
Kovid Goyal
393169f79d Fix #7225 2024-03-14 20:55:05 +05:30
Kovid Goyal
daeaf65d7e fix compiler warning 2024-02-25 11:17:26 +05:30
Kovid Goyal
f4f06222d4 ... 2024-02-25 09:57:44 +05:30
Kovid Goyal
ad3ab877f8 Use a fast SIMD implementation to XOR data going into the disk cache 2024-02-25 09:57:43 +05:30
Kovid Goyal
1db7ac5f6b Use our new shift by n functions to improve function to zero last N bytes
Benchmark neutral but cleaner code using one less vector register and equal
number of operations.
2024-02-25 09:57:43 +05:30
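One way a pair of byte shifts can zero the last N bytes of a 128-bit value is sketched below. It uses a plain byte array as a stand-in for a vector register, and the names `vec16`, `shift_up_bytes`, `shift_down_bytes` and `zero_last_n_bytes` are hypothetical. Whether the commit uses exactly this combination of shifts is an assumption.

```c
#include <stdint.h>
#include <string.h>

// Stand-in for a 128-bit vector register (assumption for illustration).
typedef struct { uint8_t b[16]; } vec16;

// Move every byte n positions toward higher indices, zero-filling the low end.
static vec16 shift_up_bytes(vec16 v, unsigned n) {
    vec16 out;
    memset(out.b, 0, sizeof out.b);
    if (n < 16) memcpy(out.b + n, v.b, 16 - n);
    return out;
}

// Move every byte n positions toward lower indices, zero-filling the high end.
static vec16 shift_down_bytes(vec16 v, unsigned n) {
    vec16 out;
    memset(out.b, 0, sizeof out.b);
    if (n < 16) memcpy(out.b, v.b + n, 16 - n);
    return out;
}

// Shifting up by n discards the last n bytes; shifting back down restores the
// remaining bytes to their original positions and leaves zeros at the end.
static vec16 zero_last_n_bytes(vec16 v, unsigned n) {
    return shift_down_bytes(shift_up_bytes(v, n), n);
}
```

Compared with building a mask and AND-ing it in, this form needs no separate mask value, which may be what the "one less vector register" remark refers to.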
Kovid Goyal
e77a970ca1 Also implement arbitrary byte shift for 128 bit registers 2024-02-25 09:57:43 +05:30
Kovid Goyal
a7c06b38e6 We don't actually need vzeroupper at start of function
GCC emits vzeroupper automatically when compiling with native
optimizations but we still need it otherwise
2024-02-25 09:57:43 +05:30
Kovid Goyal
0a1eb038a5 Implement functions for arbitrary byte shifts in vector registers 2024-02-25 09:57:42 +05:30
Kovid Goyal
eb1e3b33b4 Fix test failure on some systems
Broken ass compilers strike again
2024-02-25 09:57:42 +05:30
Kovid Goyal
b021e9b648 Do the default func test last so we can see what the failure is on more explicitly 2024-02-25 09:57:42 +05:30
Kovid Goyal
1acd223f45 ... 2024-02-25 09:57:42 +05:30
Kovid Goyal
f48e4ffd5e Port aligned load based find algorithm to C 2024-02-25 09:57:42 +05:30
Kovid Goyal
36773c09d3 Functions to get bytes to first match ignoring leading bytes 2024-02-25 09:57:42 +05:30
Kovid Goyal
687340003d ... 2024-02-25 09:57:42 +05:30
Kovid Goyal
493fc900e9 Fix build on ARM 2024-02-25 09:57:41 +05:30
Kovid Goyal
f1fe0bf40a Code to easily compare SIMD and scalar decode in a live instance
Also remove -mtune=intel as it fails with clang
2024-02-25 09:57:41 +05:30
Kovid Goyal
d5f34c401d Better vector registers to pre-calculate before the loop 2024-02-25 09:57:41 +05:30
Kovid Goyal
920b8a2496 Use VZEROUPPER in avx functions
See https://www.intel.com/content/dam/develop/external/us/en/documents/11mc12-avoiding-2bavx-sse-2btransition-2bpenalties-2brh-2bfinal-809104.pdf
2024-02-25 09:57:40 +05:30
Kovid Goyal
d4c4805f96 const away to glory 2024-02-25 09:57:40 +05:30
Kovid Goyal
6cdc7ac91d A further 5% speedup for UTF-8 decoding
Achieved by decoding in larger chunks thereby amortizing the cost
of creating various constant vectors over larger chunks.
2024-02-25 09:57:40 +05:30
Kovid Goyal
0bccada9d1 No longer need to abort after dealing with trailing bytes 2024-02-25 09:57:40 +05:30
Kovid Goyal
9cb9373274 Allow unbounded output in UTF8Decoder
This will allow us to eventually decode more than a single
vector's worth in a fast inner loop
2024-02-25 09:57:39 +05:30
Kovid Goyal
d987ffe49a Use unaligned stores
Makes no measurable difference in the benchmark. And will eventually
allow us to process larger chunks of data without need to reset a bunch
of vector registers to constant values each time.
2024-02-25 09:57:39 +05:30
Kovid Goyal
131716da00 Ignore another warning on some compiler versions in simde 2024-02-25 09:57:39 +05:30
Kovid Goyal
4d35fc2928 Use a custom movmask for ARM rather than the one from simde
Supposedly faster, not that I can measure it, but...
Also gives neater code, so keep it.
2024-02-25 09:57:39 +05:30
Kovid Goyal
9bca415af2 Use aligned loads when finding either of two bytes
No measurable performance improvement, but neater algorithm anyway.
2024-02-25 09:57:39 +05:30
Kovid Goyal
60bc8e6c25 ... 2024-02-25 09:57:39 +05:30
Kovid Goyal
8aa1b112b8 Turns out the simde implementation of movemask is not slow enough to cancel out the speed bump from 256 bit 2024-02-25 09:57:39 +05:30
Kovid Goyal
0bd47d8457 Cleanup KITTY_NO_SIMD compilation 2024-02-25 09:57:39 +05:30
Kovid Goyal
fcbda63023 Move finding byte code into separate functions
movemask() is inefficient on ARM64; this will allow us to use a dedicated
implementation for finding bytes on that platform
2024-02-25 09:57:38 +05:30
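For context, a movemask-based byte search works roughly as in the scalar sketch below. The x86 instruction gathers the top bit of every byte into an integer in one step, whereas ARM64 NEON has no single equivalent instruction, which is why a dedicated search routine can pay off there. The names `movemask16` and `find_byte16` are hypothetical and the loops stand in for vector operations.

```c
#include <stdint.h>

// Scalar stand-in for a 16-byte movemask: bit i of the result is the
// top bit of bytes[i].
static unsigned movemask16(const uint8_t bytes[16]) {
    unsigned mask = 0;
    for (unsigned i = 0; i < 16; i++) mask |= (unsigned)(bytes[i] >> 7) << i;
    return mask;
}

// Hypothetical helper: index of the first byte equal to needle, or -1.
static int find_byte16(const uint8_t chunk[16], uint8_t needle) {
    uint8_t eq[16];
    // Byte-wise compare: 0xFF where the bytes match, 0x00 elsewhere.
    for (unsigned i = 0; i < 16; i++) eq[i] = (chunk[i] == needle) ? 0xFF : 0x00;
    unsigned mask = movemask16(eq);
    if (mask == 0) return -1;
    // The position of the lowest set bit is the index of the first match.
    int idx = 0;
    while ((mask & 1u) == 0) { mask >>= 1; idx++; }
    return idx;
}
```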
Kovid Goyal
73342411bc Don't build any SIMD code when the target is neither ARM64 nor x86/amd64 2024-02-25 09:57:38 +05:30
Kovid Goyal
8dd6f9b07c Get universal builds working again
Now we use lipo and build individually so we can pass the correct
compiler flags per arch
2024-02-25 09:57:38 +05:30
Kovid Goyal
7e77a196e6 Build only the SIMD code with SIMD compiler flags 2024-02-25 09:57:38 +05:30
Kovid Goyal
0e4c49a0d6 Fix building on macOS ARM 2024-02-25 09:57:35 +05:30
Kovid Goyal
e783eccc97 fix handling of bits from high byte of 4 byte sequences 2024-02-25 09:57:35 +05:30
Kovid Goyal
7e6459a5e4 DRYer 2024-02-25 09:57:35 +05:30
Kovid Goyal
67d22b0ec6 Avoid multiple branches for checking for trailing sequence 2024-02-25 09:57:34 +05:30
Kovid Goyal
79f99bb3ad Make print_register useable without full debug 2024-02-25 09:57:34 +05:30
Kovid Goyal
fa3579656b More invalid utf-8 tests 2024-02-25 09:57:34 +05:30
Kovid Goyal
8a10fcaf5a More tests 2024-02-25 09:57:34 +05:30
Kovid Goyal
4c8b8caead Handle trailing incomplete sequences 2024-02-25 09:57:34 +05:30
Kovid Goyal
4238fedee7 More tests 2024-02-25 09:57:34 +05:30
Kovid Goyal
b0dcdf74bd More tests and micro-optimize switch to ASCII fast path 2024-02-25 09:57:34 +05:30
Kovid Goyal
a63d62fb4e ... 2024-02-25 09:57:34 +05:30