kovidgoyal-kitty

mirror of https://github.com/kovidgoyal/kitty.git synced 2026-07-03 11:12:30 +08:00

Author	SHA1	Message	Date
copilot-swe-agent[bot]	8a1780a800	Fix ARM64 Linux GCC build error caused by simde/arm/neon.h	2026-06-24 01:29:19 +00:00
Wukuyon	9d336be0d3	Combine ascii_mask/counts check with errors vector in simd-string-impl.h. Initialize errors by XORing ascii_mask with the movemask result and setting the scalar mismatch into the chunk_is_invalid vector. Removes another conditional branch. Improves unicode benchmark performance by about 3%.	2025-10-23 22:37:33 -06:00
Wukuyon	1c869d3629	Refine test for overlapping UTF-8 sequences in simd-string-impl.h. Replace a shift_right_by_one_byte call with comparison operations. Improves unicode benchmark performance by about 1%.	2025-10-23 22:37:33 -06:00
Wukuyon	65890de60d	Fix UTF-8 overlong and special range checks in simd-string-impl.h. Modified `start_classification` in `utf8_decode_to_esc` in `simd-string-impl.h`, which now: rejects `0xC0`, `0xC1` and `0xF5..0xFF` lead bytes in UTF-8 subsequences; enforces special ranges for the second subsequence bytes after `0xE0`, `0xED`, `0xF0` and `0xF4` bytes to prevent overlong sequences, surrogates, and code points above U+10FFFF; accumulates UTF-8 validation errors in a single vector to avoid many conditional branches. Worsens unicode benchmark performance by about 4%.	2025-10-23 22:37:33 -06:00
Kovid Goyal	d36a64087e	Bump Go to 1.23. We need this because Go < 1.23 produces binaries that don't work on modern OpenBSD because OpenBSD decided to remove syscall() from their libc. Mad buggers, who removes functions from libc breaking all binaries!! Also increase minimum macOS version to 11.0, as Go 1.23 requires that.	2024-08-24 08:06:02 +05:30
Kovid Goyal	eb07307370	Ignore pedantic warnings from simde headers	2024-04-30 09:54:14 +05:30
Kovid Goyal	393169f79d	Fix #7225	2024-03-14 20:55:05 +05:30
Kovid Goyal	daeaf65d7e	fix compiler warning	2024-02-25 11:17:26 +05:30
Kovid Goyal	f4f06222d4	...	2024-02-25 09:57:44 +05:30
Kovid Goyal	ad3ab877f8	Use a fast SIMD implementation to XOR data going into the disk cache	2024-02-25 09:57:43 +05:30
Kovid Goyal	1db7ac5f6b	Use our new shift by n functions to improve function to zero last N bytes. Benchmark neutral but cleaner code using one less vector register and equal number of operations.	2024-02-25 09:57:43 +05:30
Kovid Goyal	e77a970ca1	Also implement arbitrary byte shift for 128 bit registers	2024-02-25 09:57:43 +05:30
Kovid Goyal	a7c06b38e6	We don't actually need vzeroupper at start of function. GCC emits vzeroupper automatically when compiling with native optimizations, but we still need it otherwise.	2024-02-25 09:57:43 +05:30
Kovid Goyal	0a1eb038a5	Implement functions for arbitrary byte shifts in vector registers	2024-02-25 09:57:42 +05:30
Kovid Goyal	eb1e3b33b4	Fix test failure on some systems. Broken ass compilers strike again.	2024-02-25 09:57:42 +05:30
Kovid Goyal	b021e9b648	Do the default func test last so we can see what the failure is on more explicitly	2024-02-25 09:57:42 +05:30
Kovid Goyal	1acd223f45	...	2024-02-25 09:57:42 +05:30
Kovid Goyal	f48e4ffd5e	Port aligned load based find algorithm to C	2024-02-25 09:57:42 +05:30
Kovid Goyal	36773c09d3	Functions to get bytes to first match ignoring leading bytes	2024-02-25 09:57:42 +05:30
Kovid Goyal	687340003d	...	2024-02-25 09:57:42 +05:30
Kovid Goyal	493fc900e9	Fix build on ARM	2024-02-25 09:57:41 +05:30
Kovid Goyal	f1fe0bf40a	Code to easily compare SIMD and scalar decode in a live instance. Also remove -mtune=intel as it fails with clang.	2024-02-25 09:57:41 +05:30
Kovid Goyal	d5f34c401d	Better vector registers to pre-calculate before the loop	2024-02-25 09:57:41 +05:30
Kovid Goyal	920b8a2496	Use VZEROUPPER in avx functions. See https://www.intel.com/content/dam/develop/external/us/en/documents/11mc12-avoiding-2bavx-sse-2btransition-2bpenalties-2brh-2bfinal-809104.pdf	2024-02-25 09:57:40 +05:30
Kovid Goyal	d4c4805f96	const away to glory	2024-02-25 09:57:40 +05:30
Kovid Goyal	6cdc7ac91d	A further 5% speedup for UTF-8 decoding. Achieved by decoding in larger chunks, thereby amortizing the cost of creating various constant vectors over larger chunks.	2024-02-25 09:57:40 +05:30
Kovid Goyal	0bccada9d1	No longer need to abort after dealing with trailing bytes	2024-02-25 09:57:40 +05:30
Kovid Goyal	9cb9373274	Allow unbounded output in UTF8Decoder. This will allow us to eventually decode more than a single vector's worth in a fast inner loop.	2024-02-25 09:57:39 +05:30
Kovid Goyal	d987ffe49a	Use unaligned stores. Makes no measurable difference in the benchmark, and will eventually allow us to process larger chunks of data without needing to reset a bunch of vector registers to constant values each time.	2024-02-25 09:57:39 +05:30
Kovid Goyal	131716da00	Ignore another warning on some compiler versions in simde	2024-02-25 09:57:39 +05:30
Kovid Goyal	4d35fc2928	Use a custom movmask for ARM rather than the one from simde. Supposedly faster, not that I can measure it, but... Also gives neater code, so keep it.	2024-02-25 09:57:39 +05:30
Kovid Goyal	9bca415af2	Use aligned loads when finding either of two bytes. No measurable performance improvement, but neater algorithm anyway.	2024-02-25 09:57:39 +05:30
Kovid Goyal	60bc8e6c25	...	2024-02-25 09:57:39 +05:30
Kovid Goyal	8aa1b112b8	Turns out the simde implementation of movemask is not slow enough to compensate for the speed bump from 256 bit	2024-02-25 09:57:39 +05:30
Kovid Goyal	0bd47d8457	Cleanup KITTY_NO_SIMD compilation	2024-02-25 09:57:39 +05:30
Kovid Goyal	fcbda63023	Move finding byte code into separate functions. movemask() is inefficient on ARM64; this will allow us to use a dedicated implementation for finding bytes on that platform.	2024-02-25 09:57:38 +05:30
Kovid Goyal	73342411bc	Don't build any SIMD code when the target is neither ARM64 nor x86/amd64	2024-02-25 09:57:38 +05:30
Kovid Goyal	8dd6f9b07c	Get universal builds working again. Now we use lipo and build individually so we can pass the correct compiler flags per arch.	2024-02-25 09:57:38 +05:30
Kovid Goyal	7e77a196e6	Build only the SIMD code with SIMD compiler flags	2024-02-25 09:57:38 +05:30
Kovid Goyal	0e4c49a0d6	Fix building on macOS ARM	2024-02-25 09:57:35 +05:30
Kovid Goyal	e783eccc97	fix handling of bits from high byte of 4 byte sequences	2024-02-25 09:57:35 +05:30
Kovid Goyal	7e6459a5e4	DRYer	2024-02-25 09:57:35 +05:30
Kovid Goyal	67d22b0ec6	Avoid multiple branches for checking for trailing sequence	2024-02-25 09:57:34 +05:30
Kovid Goyal	79f99bb3ad	Make print_register useable without full debug	2024-02-25 09:57:34 +05:30
Kovid Goyal	fa3579656b	More invalid utf-8 tests	2024-02-25 09:57:34 +05:30
Kovid Goyal	8a10fcaf5a	More tests	2024-02-25 09:57:34 +05:30
Kovid Goyal	4c8b8caead	Handle trailing incomplete sequences	2024-02-25 09:57:34 +05:30
Kovid Goyal	4238fedee7	More tests	2024-02-25 09:57:34 +05:30
Kovid Goyal	b0dcdf74bd	More tests and micro-optimize switch to ASCII fast path	2024-02-25 09:57:34 +05:30
Kovid Goyal	a63d62fb4e	...	2024-02-25 09:57:34 +05:30
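
The range rules listed in commit 65890de60d can be illustrated with a plain scalar sketch. This is not the SIMD code from simd-string-impl.h: the function names `second_byte_range` and `utf8_is_valid` are hypothetical, and the byte ranges are the standard well-formed UTF-8 ranges that the commit message refers to (no `0xC0`/`0xC1`/`0xF5..0xFF` lead bytes, narrowed second-byte ranges after `0xE0`, `0xED`, `0xF0` and `0xF4`).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of kitty. For a lead byte, report the
 * sequence length and the allowed range [lo, hi] of the second byte.
 * Returns false when the byte can never start a valid sequence. */
static bool second_byte_range(uint8_t lead, uint8_t *lo, uint8_t *hi, unsigned *len) {
    *lo = 0x80; *hi = 0xBF;                 /* default continuation range */
    if (lead < 0x80) { *len = 1; return true; }
    if (lead < 0xC2) return false;          /* stray continuation, or overlong 0xC0/0xC1 */
    if (lead < 0xE0) { *len = 2; return true; }
    if (lead < 0xF0) {
        *len = 3;
        if (lead == 0xE0) *lo = 0xA0;       /* below 0xA0 would be overlong */
        if (lead == 0xED) *hi = 0x9F;       /* above 0x9F would be a surrogate */
        return true;
    }
    if (lead < 0xF5) {
        *len = 4;
        if (lead == 0xF0) *lo = 0x90;       /* below 0x90 would be overlong */
        if (lead == 0xF4) *hi = 0x8F;       /* above 0x8F exceeds U+10FFFF */
        return true;
    }
    return false;                           /* 0xF5..0xFF are never valid */
}

/* Scalar validity check over a whole buffer, one sequence at a time. */
bool utf8_is_valid(const uint8_t *s, size_t n) {
    size_t i = 0;
    while (i < n) {
        uint8_t lo, hi;
        unsigned len = 0;
        if (!second_byte_range(s[i], &lo, &hi, &len)) return false;
        if (len > n - i) return false;      /* truncated trailing sequence */
        if (len > 1 && (s[i + 1] < lo || s[i + 1] > hi)) return false;
        for (unsigned k = 2; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return false;
        }
        i += len;
    }
    return true;
}
```

A vectorized decoder would apply the same ranges to many bytes at once and, as the commit message describes, fold the results into a single error vector instead of branching per byte; the branchy form here is only meant to make the ranges easy to read.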


72 Commits