copilot-swe-agent[bot]
8a1780a800
Fix ARM64 Linux GCC build error caused by simde/arm/neon.h
2026-06-24 01:29:19 +00:00
Wukuyon
9d336be0d3
Combine ascii_mask/counts check with errors vector in simd-string-impl.h
...
Initialize errors by xoring ascii_mask with the movemask result and
setting the scalar mismatch into the chunk_is_invalid vector.
Removes another conditional branch.
Improves unicode benchmark performance by about 3%.
2025-10-23 22:37:33 -06:00
Wukuyon
1c869d3629
Refine test for overlapping UTF-8 sequences in simd-string-impl.h
...
Replace a shift_right_by_one_byte call with comparison operations.
Improves unicode benchmark performance by about 1%.
2025-10-23 22:37:33 -06:00
Wukuyon
65890de60d
Fix UTF-8 overlong and special range checks in simd-string-impl.h
...
Modified `start_classification` in `utf8_decode_to_esc` in `simd-string-impl.h`, which now:
Rejects `0xC0`, `0xC1` and `0xF5..0xFF` lead bytes in UTF-8 subsequences.
Enforces special ranges for the second byte of a subsequence after `0xE0`, `0xED`, `0xF0` and `0xF4` lead bytes to prevent overlong sequences, surrogates, and code points above U+10FFFF.
Accumulates UTF-8 validation errors in a single vector to avoid many conditional branches.
Worsens unicode benchmark performance by about 4%.
2025-10-23 22:37:33 -06:00
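The rules listed in the commit above can be illustrated with a scalar sketch. This is not the code from `simd-string-impl.h`: the helper names `expected_continuations`, `second_byte_range` and `utf8_is_valid` are hypothetical, and the real implementation works on whole vectors. The sketch only shows which lead bytes are rejected, which second-byte ranges are narrowed, and how range errors can be OR-ed into one accumulator that is checked once at the end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of continuation bytes a lead byte expects,
 * or -1 for bytes that can never start a sequence
 * (0x80..0xBF, 0xC0, 0xC1, 0xF5..0xFF). */
static int expected_continuations(uint8_t lead) {
    if (lead < 0x80) return 0;
    if (lead >= 0xC2 && lead <= 0xDF) return 1;
    if (lead >= 0xE0 && lead <= 0xEF) return 2;
    if (lead >= 0xF0 && lead <= 0xF4) return 3;
    return -1;
}

/* Allowed range for the byte that follows `lead`. Four lead bytes narrow
 * the usual 0x80..0xBF continuation range. */
static void second_byte_range(uint8_t lead, uint8_t *lo, uint8_t *hi) {
    *lo = 0x80; *hi = 0xBF;
    switch (lead) {
        case 0xE0: *lo = 0xA0; break;  /* no overlong 3-byte forms */
        case 0xED: *hi = 0x9F; break;  /* no surrogates U+D800..U+DFFF */
        case 0xF0: *lo = 0x90; break;  /* no overlong 4-byte forms */
        case 0xF4: *hi = 0x8F; break;  /* nothing above U+10FFFF */
    }
}

static bool utf8_is_valid(const uint8_t *s, size_t n) {
    assert(s != NULL || n == 0);
    unsigned errors = 0;  /* range errors are accumulated, not branched on */
    size_t i = 0;
    while (i < n) {
        const uint8_t lead = s[i++];
        const int k = expected_continuations(lead);
        /* A bad lead byte or a truncated sequence still exits early here. */
        if (k < 0 || (size_t)k > n - i) return false;
        uint8_t lo, hi;
        second_byte_range(lead, &lo, &hi);
        for (int j = 0; j < k; j++) {
            const uint8_t c = s[i++];
            errors |= (unsigned)(c < lo) | (unsigned)(c > hi);
            lo = 0x80; hi = 0xBF;  /* later bytes use the plain range */
        }
    }
    return errors == 0;
}
```

Calling `utf8_is_valid` on the bytes `E0 80 80` (an overlong form) or `ED A0 80` (a surrogate) returns false, while `E2 82 AC` (the euro sign) returns true.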
Kovid Goyal
d36a64087e
Bump Go to 1.23
...
We need this because Go < 1.23 produces binaries that don't work on
modern OpenBSD because OpenBSD decided to remove syscall() from their
libc. Mad buggers, who removes functions from libc breaking all
binaries!!
Also increase minimum macOS version to 11.0 as Go 1.23 requires that
2024-08-24 08:06:02 +05:30
Kovid Goyal
eb07307370
Ignore pedantic warnings from simde headers
2024-04-30 09:54:14 +05:30
Kovid Goyal
393169f79d
Fix #7225
2024-03-14 20:55:05 +05:30
Kovid Goyal
daeaf65d7e
fix compiler warning
2024-02-25 11:17:26 +05:30
Kovid Goyal
f4f06222d4
...
2024-02-25 09:57:44 +05:30
Kovid Goyal
ad3ab877f8
Use a fast SIMD implementation to XOR data going into the disk cache
2024-02-25 09:57:43 +05:30
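As a rough illustration of XOR-ing a buffer with a repeating key, here is a portable sketch that processes eight bytes per step. It is a stand-in for the idea, not the SIMD code the commit above refers to: the function name `xor_with_key` and the 8-byte key length are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: XOR `data` in place with a repeating 8-byte key.
 * The main loop handles one 64-bit word per iteration; memcpy keeps the
 * accesses alignment-safe and compilers typically turn it into plain loads
 * and stores. */
static void xor_with_key(uint8_t *data, size_t len, const uint8_t key[8]) {
    assert(data != NULL || len == 0);
    uint64_t k;
    memcpy(&k, key, sizeof k);
    size_t i = 0;
    for (; i + sizeof k <= len; i += sizeof k) {
        uint64_t chunk;
        memcpy(&chunk, data + i, sizeof chunk);
        chunk ^= k;  /* XOR is bytewise, so byte order does not matter */
        memcpy(data + i, &chunk, sizeof chunk);
    }
    /* Tail: i is a multiple of 8 here, so key[i % 8] continues the pattern. */
    for (; i < len; i++) data[i] ^= key[i % 8];
}
```

Applying `xor_with_key` twice with the same key restores the original bytes, which is the property a cache that XORs data on write and again on read would rely on.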
Kovid Goyal
1db7ac5f6b
Use our new shift-by-n functions to improve the function that zeroes the last N bytes
...
Benchmark neutral but cleaner code, using one fewer vector register and an
equal number of operations.
2024-02-25 09:57:43 +05:30
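One way a byte shift can be used to zero the last N bytes of a register is sketched below with a plain 16-byte array standing in for a 128-bit vector. The type `vec128` and both function names are hypothetical, and building a mask by shifting an all-ones register is only one possible approach; the commit does not say which sequence of operations the real code uses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a 128-bit vector register (assumption for this sketch). */
typedef struct { uint8_t b[16]; } vec128;

/* Shift the whole register n bytes toward index 0, filling the vacated
 * high bytes with zero (the behaviour of a byte-wise right shift). */
static vec128 shift_right_by_bytes(vec128 v, unsigned n) {
    assert(n <= 16);
    vec128 out = {{0}};
    memcpy(out.b, v.b + n, 16 - n);
    return out;
}

/* Zero the n highest-indexed bytes: shift an all-ones register right by n
 * bytes to get a mask of 16 - n leading 0xFF bytes, then AND it in. */
static vec128 zero_last_n_bytes(vec128 v, unsigned n) {
    vec128 mask;
    memset(mask.b, 0xFF, sizeof mask.b);
    mask = shift_right_by_bytes(mask, n);
    for (int i = 0; i < 16; i++) v.b[i] &= mask.b[i];
    return v;
}
```

With a register holding the bytes 1 through 16, `zero_last_n_bytes(v, 5)` keeps bytes 1 through 11 and clears the remaining five.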
Kovid Goyal
e77a970ca1
Also implement arbitrary byte shift for 128 bit registers
2024-02-25 09:57:43 +05:30
Kovid Goyal
a7c06b38e6
We don't actually need vzeroupper at start of function
...
GCC emits vzeroupper automatically when compiling with native
optimizations but we still need it otherwise
2024-02-25 09:57:43 +05:30
Kovid Goyal
0a1eb038a5
Implement functions for arbitrary byte shifts in vector registers
2024-02-25 09:57:42 +05:30
Kovid Goyal
eb1e3b33b4
Fix test failure on some systems
...
Broken ass compilers strike again
2024-02-25 09:57:42 +05:30
Kovid Goyal
b021e9b648
Do the default func test last so we can see more explicitly what the failure is on
2024-02-25 09:57:42 +05:30
Kovid Goyal
1acd223f45
...
2024-02-25 09:57:42 +05:30
Kovid Goyal
f48e4ffd5e
Port aligned load based find algorithm to C
2024-02-25 09:57:42 +05:30
Kovid Goyal
36773c09d3
Functions to get bytes to first match ignoring leading bytes
2024-02-25 09:57:42 +05:30
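The idea of finding the first match while ignoring some leading bytes can be sketched on a 16-bit match mask, where bit i is set when byte i of a chunk matches. This matters for aligned loads, where the chunk may begin before the region being searched. The function names below are hypothetical, the mask is built with a scalar loop rather than a vector compare, and the bit scan is a simple loop where real code would likely use a count-trailing-zeros instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar stand-in for "compare against two bytes, then movemask":
 * bit i of the result is set when chunk[i] equals a or b. */
static uint32_t match_mask(const uint8_t chunk[16], uint8_t a, uint8_t b) {
    uint32_t mask = 0;
    for (unsigned i = 0; i < 16; i++)
        mask |= (uint32_t)(chunk[i] == a || chunk[i] == b) << i;
    return mask;
}

/* Index (from the start of the chunk) of the first match at or after
 * position `leading`, or -1 when there is none. */
static int first_match_ignoring_leading(uint32_t mask, unsigned leading) {
    assert(leading <= 16);
    mask &= ~((UINT32_C(1) << leading) - 1);  /* drop matches before `leading` */
    if (!mask) return -1;
    int idx = 0;
    while (!(mask & 1)) { mask >>= 1; idx++; }  /* portable trailing-zero count */
    return idx;
}
```

For a chunk with matches at positions 0, 4 and 15, ignoring one leading byte yields 4 and ignoring five leading bytes yields 15.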
Kovid Goyal
687340003d
...
2024-02-25 09:57:42 +05:30
Kovid Goyal
493fc900e9
Fix build on ARM
2024-02-25 09:57:41 +05:30
Kovid Goyal
f1fe0bf40a
Code to easily compare SIMD and scalar decode in a live instance
...
Also remove -mtune=intel as it fails with clang
2024-02-25 09:57:41 +05:30
Kovid Goyal
d5f34c401d
Better vector registers to pre-calculate before the loop
2024-02-25 09:57:41 +05:30
Kovid Goyal
920b8a2496
Use VZEROUPPER in avx functions
...
See https://www.intel.com/content/dam/develop/external/us/en/documents/11mc12-avoiding-2bavx-sse-2btransition-2bpenalties-2brh-2bfinal-809104.pdf
2024-02-25 09:57:40 +05:30
Kovid Goyal
d4c4805f96
const away to glory
2024-02-25 09:57:40 +05:30
Kovid Goyal
6cdc7ac91d
A further 5% speedup for UTF-8 decoding
...
Achieved by decoding in larger chunks thereby amortizing the cost
of creating various constant vectors over larger chunks.
2024-02-25 09:57:40 +05:30
Kovid Goyal
0bccada9d1
No longer need to abort after dealing with trailing bytes
2024-02-25 09:57:40 +05:30
Kovid Goyal
9cb9373274
Allow unbounded output in UTF8Decoder
...
This will allow us to eventually decode more than a single
vector's worth in a fast inner loop
2024-02-25 09:57:39 +05:30
Kovid Goyal
d987ffe49a
Use unaligned stores
...
Makes no measurable difference in the benchmark. And will eventually
allow us to process larger chunks of data without needing to reset a bunch
of vector registers to constant values each time.
2024-02-25 09:57:39 +05:30
Kovid Goyal
131716da00
Ignore another warning on some compiler versions in simde
2024-02-25 09:57:39 +05:30
Kovid Goyal
4d35fc2928
Use a custom movmask for ARM rather than the one from simde
...
Supposedly faster, not that I can measure it, but...
Also gives neater code, so keep it.
2024-02-25 09:57:39 +05:30
Kovid Goyal
9bca415af2
Use aligned loads when finding either of two bytes
...
No measurable performance improvement, but neater algorithm anyway.
2024-02-25 09:57:39 +05:30
Kovid Goyal
60bc8e6c25
...
2024-02-25 09:57:39 +05:30
Kovid Goyal
8aa1b112b8
Turns out the simde implementation of movemask is not slow enough to compensate for the speed bump from 256 bit
2024-02-25 09:57:39 +05:30
Kovid Goyal
0bd47d8457
Cleanup KITTY_NO_SIMD compilation
2024-02-25 09:57:39 +05:30
Kovid Goyal
fcbda63023
Move finding byte code into separate functions
...
movemask() is inefficient on ARM64; this will allow us to use a dedicated
implementation for finding bytes on that platform
2024-02-25 09:57:38 +05:30
Kovid Goyal
73342411bc
Don't build any SIMD code when the target is neither ARM64 nor x86/amd64
2024-02-25 09:57:38 +05:30
Kovid Goyal
8dd6f9b07c
Get universal builds working again
...
Now we use lipo and build individually so we can pass the correct
compiler flags per arch
2024-02-25 09:57:38 +05:30
Kovid Goyal
7e77a196e6
Build only the SIMD code with SIMD compiler flags
2024-02-25 09:57:38 +05:30
Kovid Goyal
0e4c49a0d6
Fix building on macOS ARM
2024-02-25 09:57:35 +05:30
Kovid Goyal
e783eccc97
fix handling of bits from high byte of 4 byte sequences
2024-02-25 09:57:35 +05:30
Kovid Goyal
7e6459a5e4
DRYer
2024-02-25 09:57:35 +05:30
Kovid Goyal
67d22b0ec6
Avoid multiple branches for checking for trailing sequence
2024-02-25 09:57:34 +05:30
Kovid Goyal
79f99bb3ad
Make print_register useable without full debug
2024-02-25 09:57:34 +05:30
Kovid Goyal
fa3579656b
More invalid utf-8 tests
2024-02-25 09:57:34 +05:30
Kovid Goyal
8a10fcaf5a
More tests
2024-02-25 09:57:34 +05:30
Kovid Goyal
4c8b8caead
Handle trailing incomplete sequences
2024-02-25 09:57:34 +05:30
Kovid Goyal
4238fedee7
More tests
2024-02-25 09:57:34 +05:30
Kovid Goyal
b0dcdf74bd
More tests and micro-optimize switch to ASCII fast path
2024-02-25 09:57:34 +05:30
Kovid Goyal
a63d62fb4e
...
2024-02-25 09:57:34 +05:30