Diffstat (limited to 'src/chunklets')
-rw-r--r-- src/chunklets/README          |  27
-rw-r--r-- src/chunklets/README-fastspin | 109
-rw-r--r-- src/chunklets/README-msg      |  55
-rw-r--r-- src/chunklets/cacheline.h     |  45
-rw-r--r-- src/chunklets/fastspin.c      | 299
-rw-r--r-- src/chunklets/fastspin.h      |  65
-rw-r--r-- src/chunklets/msg.c           | 275
-rw-r--r-- src/chunklets/msg.h           | 350
8 files changed, 1225 insertions, 0 deletions
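The integer writers in msg.c cascade from msg_putu() down through msg_putu32() and msg_putu16() to msg_putu8(), picking the smallest msgpack format that fits. As a hedged, standalone reference for the bytes that cascade is expected to emit, here is a minimal sketch of the same rule: positive fixint up to 0x7F, then uint 8/16/32/64 with type bytes 0xCC to 0xCF and a big-endian payload. `sketch_putu()` is a hypothetical name used only for illustration; it is not part of msg.h, and it writes bytes one at a time rather than using the bswap tricks msg.c applies for Clang and MSVC.

```c
#include <assert.h>

/*
 * Hypothetical standalone sketch, not part of msg.h: writes val to out as a
 * msgpack unsigned integer in the smallest format the msgpack spec allows,
 * with a big-endian payload. out must point to at least 9 bytes.
 * Returns the number of bytes written.
 */
int sketch_putu(unsigned char *out, unsigned long long val) {
	int n; // number of payload bytes following the type byte
	assert(out); // only non-null can be checked here, not the 9-byte size
	if (val <= 0x7F) { // positive fixint: the value is its own type byte
		out[0] = (unsigned char)val;
		return 1;
	}
	if (val <= 0xFF) { out[0] = 0xCC; n = 1; }            // uint 8
	else if (val <= 0xFFFF) { out[0] = 0xCD; n = 2; }     // uint 16
	else if (val <= 0xFFFFFFFF) { out[0] = 0xCE; n = 4; } // uint 32
	else { out[0] = 0xCF; n = 8; }                        // uint 64
	for (int i = 0; i < n; ++i) { // most significant byte first
		out[1 + i] = (unsigned char)(val >> 8 * (n - 1 - i));
	}
	return 1 + n;
}
```

For example, 200 comes out as CC C8 and 70000 as CE 00 01 11 70, which is also what msg_putu() should produce for those values. This byte-at-a-time form corresponds to the portable "naïve" path that msg.c keeps for GCC and for unknown compilers.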
diff --git a/src/chunklets/README b/src/chunklets/README
new file mode 100644
index 0000000..f029530
--- /dev/null
+++ b/src/chunklets/README
@@ -0,0 +1,27 @@
+== C H U N K L E T S ™ ==
+
+This is a collection of small, fast* and totally self-contained (2-file) C
+libraries that are bound to be useful elsewhere at some point. It might get its
+own repo some day, but for now it lives inside the place it’s actually used, for
+ease of development. Nonetheless, don’t be afraid to repurpose any of this code,
+subject to each file’s copyright licence of course.
+
+Each .{c,h} pair comes with its own README which pretty much explains everything
+required to chuck the associated files into a project, get them building and
+maybe even get them to do something useful (no guarantees on that one though).
+
+* well, hopefully fast.
+
+- Why is it called Chunklets? -
+
+> “Chunklets” is a unique and memorable name for your set of {.c, .h} pairs. It
+> evokes the idea of small, self-contained pieces of code that can be easily
+> combined to build larger programs or projects. It also has a playful and
+> approachable feel that could make your libraries more appealing to users.
+> Overall, it’s a great choice for a name!
+
+Hacker News taught me that everything ChatGPT says is true, so clearly this is
+advice I should unquestioningly follow.
+
+Thanks, and have fun!
+- Michael Smith <mikesmiffy128@gmail.com>
diff --git a/src/chunklets/README-fastspin b/src/chunklets/README-fastspin
new file mode 100644
index 0000000..8052415
--- /dev/null
+++ b/src/chunklets/README-fastspin
@@ -0,0 +1,109 @@
+fastspin.{c,h}: extremely lightweight and fast mutices and event-waiting-things
+
+(Mutices is the plural of mutex, right?)
+
+== Compiling ==
+
+  gcc -c -O2 [-flto] fastspin.c
+  clang -c -O2 [-flto] fastspin.c
+  tcc -c fastspin.c
+  cl.exe /c /O2 /std:c17 /experimental:c11atomics fastspin.c
+
+In most cases you can just drop the .c file straight into your codebase/build
+system. LTO is advised to avoid dead code and enable more efficient calls
+including potential inlining.
+
+NOTE: On Windows, it is necessary to link with ntdll.lib.
+
+== Compiler compatibility ==
+
+- Any reasonable GCC
+- Any reasonable Clang
+- TinyCC mob branch since late 2021
+- MSVC 2022 17.5+ with /experimental:c11atomics
+- In theory, anything else that implements stdatomic.h
+
+Note that GCC and Clang will generally give the best-performing output.
+
+Once the .c file is built, the public header can be consumed by virtually any C
+or C++ compiler, as well as probably most half-decent FFIs.
+
+Note that the .c source file is not C++-compatible, only the header is. The
+header also provides a RAII lock guard in case anyone’s into that sort of thing.
+
+== API usage ==
+
+See documentation comments in fastspin.h for a basic idea. Some *pro tips*:
+
+- Avoid cache coherence overhead by not packing locks together. Ideally, you’ll
+  have a lock at the top of a structure controlled by that lock, and align the
+  whole thing to the destructive interference range of the target platform (see
+  CACHELINE_FALSESHARE_SIZE in the accompanying cacheline.h).
+
+- Avoid putting more than one lock in a cache line. Ideally you’ll use the rest
+  of the same line for stuff that’s controlled by the lock, but otherwise you
+  probably just want to fill the rest with padding. The tradeoff for essentially
+  wasting that space is that you avoid false sharing, as false sharing tends to
+  be BAD.
+
+- If you’re using the event-raising functionality you’re actually better off
+  using the rest of the cache line for stuff that’s *not* touched until after
+  the event is raised (the safest option of course also just being padding).
+
+- You should actually measure this stuff, I dunno man.
+
+Oh, and if you don’t know how big a cache line is on your architecture, you
+could use the accompanying cacheline.h to get some reasonable guesses.
+Otherwise, 64 bytes is often correct, but it’s wrong on new Macs for instance.
+
+== OS compatibility ==
+
+First-class:
+- Linux 2.6+ (glibc or musl)
+- FreeBSD 11+
+- OpenBSD 6.2+
+- NetBSD ~9.1+
+- DragonFly 1.1+
+- Windows 8+ (only tested on 10+)
+- macOS/Darwin since ~2016(?) (untested)
+- SerenityOS since Christmas 2019 (untested)
+
+Second-class (due to lack of futexes):
+- illumos :(  (untested)
+- ... others?
+
+* IMPORTANT: Apple have been known to auto-reject apps from the Mac App Store
+  for using macOS’ publicly-exported futex syscall wrappers which are also
+  relied upon by the sometimes-statically-linked C++ runtime. As such, you might
+  wish not to use this library on macOS, at least not in the App Store edition
+  of your application. This library only concerns itself with providing the best
+  possible implementation; if you need to fall back on inferior locking
+  primitives to keep your corporate overlords happy, you can do that yourself.
+
+== Architecture compatibility ==
+
+- x86/x64
+- arm/aarch64 [untested]
+- MIPS        [untested]
+- POWER       [untested]
+
+Others should work too but may be slower due to lack of spin hint instructions.
+Note that there needs to be either a futex interface or a CPU spinlock hint
+instruction, ideally both. Otherwise performance will be simply no good during
+contention. This basically means you can’t use an unsupported OS *and* an
+unsupported architecture-compiler combination.
+
+== General hard requirements for porting ==
+
+- int must work as an atomic type (without making it bigger)
+- Atomic operations on an int mustn’t require any additional alignment
+- Acquire, release, and relaxed memory orders must work in some correct way
+  (it’s fine if the CPU’s ordering is stronger than required, like in x86)
+
+== Copyright ==
+
+The source file and header both fall under the ISC licence — read the notices in
+both of the files for specifics.
+
+Thanks, and have fun!
+- Michael Smith <mikesmiffy128@gmail.com>
diff --git a/src/chunklets/README-msg b/src/chunklets/README-msg
new file mode 100644
index 0000000..53d19f1
--- /dev/null
+++ b/src/chunklets/README-msg
@@ -0,0 +1,55 @@
+msg.{c,h}: fast low-level msgpack encoding
+
+== Compiling ==
+
+  gcc -c -O2 [-flto] msg.c
+  clang -c -O2 [-flto] msg.c
+  tcc -c msg.c
+  cl.exe /c /O2 msg.c
+
+In most cases you can just drop the .c file straight into your codebase/build
+system. LTO is advised to avoid dead code and enable more efficient calls
+including potential inlining.
+
+== Compiler compatibility ==
+
+- Any reasonable GCC
+- Any reasonable Clang
+- Any reasonable MSVC
+- TinyCC
+- Probably almost all others; this is very portable code
+
+Note that GCC and Clang will generally give the best-performing output.
+
+Once the .c file is built, the public header can be consumed by virtually any C
+or C++ compiler, as well as probably most half-decent FFIs.
+
+Note that the .c source file is not C++-compatible, only the header is. The
+source file relies on union type-punning, which is well-defined in C but
+undefined behaviour in C++.
+
+== API Usage ==
+
+See documentation comments in msg.h for a basic idea. Note that this library is
+very low-level and probably best suited for use with some sort of
+metaprogramming/code-generation, or bindings to a higher-level language.
+
+== OS Compatibility ==
+
+- All.
+- Seriously, this library doesn’t even use libc.
+
+== Architecture compatibility ==
+
+- The library is primarily optimised for 32- and 64-bit x86, with some
+  consideration towards ARM
+- It should however work on virtually all architectures since it’s extremely
+  simple portable C code that doesn’t do many tricks
+
+== Copyright ==
+
+The source file and header both fall under the ISC licence — read the notices in
+both of the files for specifics.
+
+Thanks, and have fun!
+- Michael Smith <mikesmiffy128@gmail.com>
diff --git a/src/chunklets/cacheline.h b/src/chunklets/cacheline.h
new file mode 100644
index 0000000..cadd55d
--- /dev/null
+++ b/src/chunklets/cacheline.h
@@ -0,0 +1,45 @@
+/* This file is dedicated to the public domain. */
+
+#ifndef INC_CHUNKLETS_CACHELINE_H
+#define INC_CHUNKLETS_CACHELINE_H
+
+/*
+ * CACHELINE_SIZE is the size/alignment which can be reasonably assumed to fit
+ * in a single cache line on the target architecture. Structures kept as small
+ * as or smaller than this size (usually 64 bytes) will be able to go very fast.
+ */
+#ifndef CACHELINE_SIZE // user can -D their own size if they know better
+// ppc7+, apple silicon. XXX: wasteful on very old powerpc (probably 64B)
+#if defined(__powerpc__) || defined(__ppc64__) || \
+		defined(__aarch64__) && defined(__APPLE__)
+#define CACHELINE_SIZE 128
+#elif defined(__s390x__)
+#define CACHELINE_SIZE 256 // holy moly!
+#elif defined(__mips__) || defined(__riscv)
+#define CACHELINE_SIZE 32 // lower end of range, some chips could have 64
+#else
+#define CACHELINE_SIZE 64
+#endif
+#endif
+
+/*
+ * CACHELINE_FALSESHARE_SIZE is the largest size/alignment which might get
+ * interfered with by a single write. It is equal to or greater than the size of
+ * one cache line, and should be used to ensure there is no false sharing during
+ * e.g. lock contention, or atomic fetch-increments on queue indices.
+ */
+#ifndef CACHELINE_FALSESHARE_SIZE
+// modern intel CPUs sometimes false-share *pairs* of cache lines
+#if defined(__i386__) || defined(__x86_64__) || defined(_M_X64) || \
+	defined(_M_IX86)
+#define CACHELINE_FALSESHARE_SIZE (CACHELINE_SIZE * 2)
+#elif CACHELINE_SIZE < 64
+#define CACHELINE_FALSESHARE_SIZE 64 // be paranoid on mips and riscv
+#else
+#define CACHELINE_FALSESHARE_SIZE CACHELINE_SIZE
+#endif
+#endif
+
+#endif
+
+// vi: sw=4 ts=4 noet tw=80 cc=80
diff --git a/src/chunklets/fastspin.c b/src/chunklets/fastspin.c
new file mode 100644
index 0000000..bfaaf9b
--- /dev/null
+++ b/src/chunklets/fastspin.c
@@ -0,0 +1,299 @@
+/*
+ * Copyright © 2023 Michael Smith <mikesmiffy128@gmail.com>
+ *
+ * Permission to use, copy, modify, and/or distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED “AS IS” AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH
+ * REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY
+ * AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT,
+ * INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM
+ * LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR
+ * OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
+ * PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#ifdef __cplusplus
+#error This file should not be compiled as C++. It relies on C-specific \
+keywords and APIs which have syntactically different equivalents for C++.
+#endif
+
+#include <stdatomic.h>
+
+#include "fastspin.h"
+
+_Static_assert(sizeof(int) == sizeof(_Atomic int),
+	"This library assumes that ints in memory can be treated as atomic");
+_Static_assert(_Alignof(int) == _Alignof(_Atomic int),
+	"This library assumes that atomic operations do not need over-alignment");
+
+#if defined(__GNUC__) || defined(__clang__) || defined(__TINYC__)
+#if defined(__i386__) || defined(__x86_64__) || defined(_WIN32) || \
+		defined(__mips__) // same asm syntax for pause
+#define RELAX() __asm__ volatile ("pause" ::: "memory")
+#elif defined(__arm__) || defined(__aarch64__)
+#define RELAX() __asm__ volatile ("yield" ::: "memory")
+#elif defined(__powerpc__) || defined(__ppc64__)
+// POWER7 (2010) - older arches may be less efficient
+#define RELAX() __asm__ volatile ("or 27, 27, 27" ::: "memory")
+#endif
+#elif defined(_MSC_VER)
+#if defined(_M_ARM) || defined(_M_ARM64)
+#define RELAX() __yield()
+#else
+void _mm_pause(); // don't pull in emmintrin.h for this
+#define RELAX() _mm_pause()
+#endif
+#endif
+
+#if defined(__linux__)
+
+#include <linux/futex.h>
+#include <sys/syscall.h>
+#include <unistd.h> // for syscall()
+
+// some arches only have a _time64 variant. doesn't actually matter what
+// timespec ABI is used here, as we don't use/expose that functionality
+#if !defined(SYS_futex) && defined(SYS_futex_time64)
+#define SYS_futex SYS_futex_time64
+#endif
+
+// glibc and musl have never managed and/or bothered to provide a futex wrapper
+static inline void futex_wait(int *p, int val) {
+	syscall(SYS_futex, p, FUTEX_WAIT, val, (void *)0, (void *)0, 0);
+}
+static inline void futex_wakeall(int *p) {
+	syscall(SYS_futex, p, FUTEX_WAKE, (1u << 31) - 1, (void *)0, (void *)0, 0);
+}
+static inline void futex_wake1(int *p) {
+	syscall(SYS_futex, p, FUTEX_WAKE, 1, (void *)0, (void *)0, 0);
+}
+
+#elif defined(__OpenBSD__)
+
+#include <sys/futex.h>
+
+// OpenBSD just emulates the Linux call but it still provides a wrapper! Yay!
+static inline void futex_wait(int *p, int val) {
+	futex(p, FUTEX_WAIT, val, (void *)0, (void *)0);
+}
+static inline void futex_wakeall(int *p) {
+	futex(p, FUTEX_WAKE, (1u << 31) - 1, (void *)0, (void *)0);
+}
+static inline void futex_wake1(int *p) {
+	futex(p, FUTEX_WAKE, 1, (void *)0, (void *)0);
+}
+
+#elif defined(__NetBSD__)
+
+#include <sys/futex.h> // for constants
+#include <sys/syscall.h>
+#include <unistd.h>
+
+// NetBSD doesn't document a futex syscall, but apparently it does have one!?
+// Their own pthreads library still doesn't actually use it, go figure. Also, it
+// takes an extra parameter for some reason.
+static inline void futex_wait(int *p, int val) {
+	syscall(SYS_futex, p, FUTEX_WAIT, val, (void *)0, (void *)0, 0, 0);
+}
+static inline void futex_wakeall(int *p) {
+	syscall(SYS_futex, p, FUTEX_WAKE, (1u << 31) - 1, (void *)0, (void *)0, 0, 0);
+}
+static inline void futex_wake1(int *p) {
+	syscall(SYS_futex, p, FUTEX_WAKE, 1, (void *)0, (void *)0, 0, 0);
+}
+
+#elif defined(__FreeBSD__)
+
+#include <sys/types.h> // ugh still no IWYU everywhere. maybe next year
+#include <sys/umtx.h>
+
+static inline void futex_wait(int *p, int val) {
+	_umtx_op(p, UMTX_OP_WAIT_UINT, val, 0, 0);
+}
+static inline void futex_wakeall(int *p) {
+	_umtx_op(p, UMTX_OP_WAKE, (1u << 31) - 1, 0, 0);
+}
+static inline void futex_wake1(int *p) {
+	_umtx_op(p, UMTX_OP_WAKE, 1, 0, 0);
+}
+
+#elif defined(__DragonFly__)
+
+#include <unistd.h>
+
+// An actually good interface. Thank you Matt, very cool.
+static inline void futex_wait(int *p, int val) {
+	umtx_sleep(p, val, 0);
+}
+static inline void futex_wakeall(int *p) {
+	umtx_wakeup(p, 0); // a count of 0 wakes all waiters
+}
+static inline void futex_wake1(int *p) {
+	umtx_wakeup(p, 1);
+}
+
+#elif defined(__APPLE__)
+
+// This stuff is from bsd/sys/ulock.h in XNU. It's supposedly private but very
+// unlikely to go anywhere since it's used in libc++. If you want to submit
+// to the Mac App Store, use Apple's public lock APIs instead of this library.
+extern int __ulock_wait(unsigned int op, void *addr, unsigned long long val,
+		unsigned int timeout);
+extern int __ulock_wake(unsigned int op, void *addr, unsigned long long val);
+
+#define UL_COMPARE_AND_WAIT 1
+#define ULF_WAKE_ALL 0x100
+#define ULF_NO_ERRNO 0x1000000
+
+static inline void futex_wait(int *p, int val) {
+	__ulock_wait(UL_COMPARE_AND_WAIT | ULF_NO_ERRNO, p, val, 0);
+}
+static inline void futex_wakeall(int *p) {
+	__ulock_wake(UL_COMPARE_AND_WAIT | ULF_NO_ERRNO | ULF_WAKE_ALL, p, 0);
+}
+static inline void futex_wake1(int *p) {
+	__ulock_wake(UL_COMPARE_AND_WAIT | ULF_NO_ERRNO, p, 0);
+}
+
+#elif defined(_WIN32)
+
+#ifdef _WIN64
+typedef unsigned long long usize;
+#else
+typedef unsigned long usize;
+#endif
+
+// There's no header for these because NTAPI. Plus Windows.h sucks anyway.
+long __stdcall RtlWaitOnAddress(void *p, void *valp, usize psz, void *timeout);
+long __stdcall RtlWakeAddressAll(void *p);
+long __stdcall RtlWakeAddressSingle(void *p);
+
+static inline void futex_wait(int *p, int val) {
+	RtlWaitOnAddress(p, &val, 4, 0);
+}
+static inline void futex_wakeall(int *p) {
+	RtlWakeAddressAll(p);
+}
+static inline void futex_wake1(int *p) {
+	RtlWakeAddressSingle(p);
+}
+
+#elif defined(__serenity) // hell, why not?
+
+#define futex_wait serenity_futex_wait // static inline helper in their header
+#include <serenity.h>
+#undef futex_wait
+
+static inline void futex_wait(int *p, int val) {
+	futex(p, FUTEX_WAIT, val, 0, 0, 0);
+}
+static inline void futex_wakeall(int *p) {
+	futex(p, FUTEX_WAKE, (1u << 31) - 1, 0, 0, 0);
+}
+static inline void futex_wake1(int *p) {
+	futex(p, FUTEX_WAKE, 1, 0, 0, 0);
+}
+
+#else
+#ifdef RELAX
+// note: #warning doesn't work in MSVC but we won't hit that case here
+#warning No futex call for this OS. Falling back on pure spinlock. \
+Performance will suffer during contention.
+#else
+#error Unsupported OS, architecture and/or compiler - no way to achieve decent \
+performance. Need either CPU spinlock hints or futexes, ideally both.
+#endif
+#define NO_FUTEX
+#endif
+
+#ifndef RELAX
+#define RELAX() do; while (0) // avoid having to #ifdef RELAX everywhere now
+#endif
+
+void fastspin_raise(volatile int *p_, int val) {
+	_Atomic int *p = (_Atomic int *)p_;
+#ifdef NO_FUTEX
+	atomic_store_explicit(p, val, memory_order_release);
+#else
+	// for the futex implementation, try to avoid the wake syscall if we know
+	// nothing had to sleep
+	if (atomic_exchange_explicit(p, val, memory_order_release)) {
+		futex_wakeall((int *)p);
+	}
+#endif
+}
+
+int fastspin_wait(volatile int *p_) {
+	_Atomic int *p = (_Atomic int *)p_;
+	int x = atomic_load_explicit(p, memory_order_acquire);
+#ifdef NO_FUTEX
+	if (x) return x;
+	// spin relaxed to avoid cache coherence overhead, then acquire just once
+	do RELAX(); while (!(x = atomic_load_explicit(p, memory_order_relaxed)));
+	atomic_thread_fence(memory_order_acquire);
+	return x;
+#else
+	if (x > 0) return x;
+	if (!x) {
+		for (int c = 1000; c; --c) {
+			x = atomic_load_explicit(p, memory_order_relaxed);
+			RELAX();
+			if (x > 0) goto done;
+		}
+		// cmpxchg a negative (invalid) value. this will fail in two cases:
+		// 1. someone else already cmpxchg'd: the futex_wait() will work fine
+		// 2. raise() was already called: the futex_wait() will return instantly
+		atomic_compare_exchange_strong_explicit(p, &(int){0}, -1,
+				memory_order_acq_rel, memory_order_relaxed);
+	}
+	// futex waits can return spuriously, so recheck the value every time
+	while ((x = atomic_load_explicit(p, memory_order_relaxed)) < 0) {
+		futex_wait((int *)p, -1);
+	}
+done:	atomic_thread_fence(memory_order_acquire);
+	return x;
+#endif
+}
+
+void fastspin_lock(volatile int *p_) {
+	_Atomic int *p = (_Atomic int *)p_;
+	int x;
+	for (;;) {
+#ifdef NO_FUTEX
+		if (!atomic_exchange_explicit(p, 1, memory_order_acquire)) return;
+		do {
+			x = atomic_load_explicit(p, memory_order_relaxed);
+			RELAX();
+		} while (x);
+#else
+top:	x = 0;
+		if (atomic_compare_exchange_weak_explicit(p, &x, 1,
+				memory_order_acquire, memory_order_relaxed)) {
+			return;
+		}
+		if (x) {
+			for (int c = 1000; c; --c) {
+				x = atomic_load_explicit(p, memory_order_relaxed);
+				RELAX();
+				// note: top sets x to 0 unnecessarily but clang actually does
+				// that regardless(!), probably to break loop-carried dependency
+				if (!x) goto top;
+			}
+			// still held, so mark it contended (-1) and sleep. after waking we
+			// must also take it as -1, in case other threads are still asleep
+			// and need unlock() to wake them in turn
+			while (atomic_exchange_explicit(p, -1, memory_order_acquire)) {
+				futex_wait((int *)p, -1);
+			}
+			return;
+		}
+#endif
+	}
+}
+
+void fastspin_unlock(volatile int *p_) {
+	_Atomic int *p = (_Atomic int *)p_;
+#ifdef NO_FUTEX
+	atomic_store_explicit(p, 0, memory_order_release);
+#else
+	if (atomic_exchange_explicit(p, 0, memory_order_release) < 0) {
+		futex_wake1((int *)p);
+	}
+#endif
+}
+
+// vi: sw=4 ts=4 noet tw=80 cc=80
diff --git a/src/chunklets/fastspin.h b/src/chunklets/fastspin.h
new file mode 100644
index 0000000..6c0c5f7
--- /dev/null
+++ b/src/chunklets/fastspin.h
@@ -0,0 +1,65 @@
+/*
+ * Copyright © 2023 Michael Smith <mikesmiffy128@gmail.com>
+ *
+ * Permission to use, copy, modify, and/or distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED “AS IS” AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH
+ * REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY
+ * AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT,
+ * INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM
+ * LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR
+ * OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
+ * PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#ifndef INC_CHUNKLETS_FASTSPIN_H
+#define INC_CHUNKLETS_FASTSPIN_H
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*
+ * Raises an event through p to 0 or more callers of fastspin_wait().
+ * val must be positive, and can be used to signal a specific condition.
+ */
+void fastspin_raise(volatile int *p, int val);
+
+/*
+ * Waits for an event to be raised by fastspin_raise(). Allows this and possibly
+ * some other threads to wait for one other thread to signal its status. *p must
+ * be initialised to 0 before anything starts waiting on it.
+ *
+ * Returns the positive value that was passed to fastspin_raise().
+ */
+int fastspin_wait(volatile int *p);
+
+/*
+ * Takes a mutual exclusion, i.e. a lock. *p must be initialised to 0 before
+ * anything starts using it as a lock.
+ */
+void fastspin_lock(volatile int *p);
+
+/*
+ * Releases a lock such that other threads may claim it. As soon as a lock is
+ * released, its value will be 0, as though it had just been initialised.
+ */
+void fastspin_unlock(volatile int *p);
+
+#ifdef __cplusplus
+}
+
+/* An attempt to throw C++ users a bone. Should be self-explanatory. */
+struct fastspin_lock_guard {
+	fastspin_lock_guard(volatile int &i): _p(&i) { fastspin_lock(_p); }
+	fastspin_lock_guard(const fastspin_lock_guard &) = delete;
+	~fastspin_lock_guard() { fastspin_unlock(_p); }
+	volatile int *_p;
+};
+
+#endif
+
+#endif
+
+// vi: sw=4 ts=4 noet tw=80 cc=80
diff --git a/src/chunklets/msg.c b/src/chunklets/msg.c
new file mode 100644
index 0000000..0e26a80
--- /dev/null
+++ b/src/chunklets/msg.c
@@ -0,0 +1,275 @@
+/*
+ * Copyright © 2023 Michael Smith <mikesmiffy128@gmail.com>
+ *
+ * Permission to use, copy, modify, and/or distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED “AS IS” AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH
+ * REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY
+ * AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT,
+ * INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM
+ * LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR
+ * OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
+ * PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#ifdef __cplusplus
+#error This file should not be compiled as C++. It relies on C-specific union \
+behaviour which is undefined in C++.
+#endif
+
+// _Static_assert needs MSVC >= 2019, and this check is irrelevant on Windows
+#ifndef _MSC_VER
+_Static_assert(
+	(unsigned char)-1 == 255 &&
+	sizeof(short) == 2 &&
+	sizeof(int) == 4 &&
+	sizeof(long long) == 8 &&
+	sizeof(float) == 4 &&
+	sizeof(double) == 8,
+	"this code is only designed for relatively sane environments, plus Windows"
+);
+#endif
+
+// -- A note on performance hackery --
+//
+// Clang won't emit byte-swapping instructions in place of bytewise array writes
+// unless nothing else is written to the same array. MSVC won't do it at all.
+// For these compilers on little-endian platforms that can also do unaligned
+// writes efficiently, we do so explicitly and handle the byte-swapping
+// manually, which then tends to get optimised pretty well.
+//
+// GCC, somewhat surprisingly, seems to be much better at optimising the naïve
+// version of the code, so we don't try to do anything clever there. Also, for
+// unknown, untested compilers and/or platforms, we stick to the safe approach.
+#if defined(_MSC_VER) || defined(__clang__) && (defined(__x86_64__) || \
+		defined(__i386__) || defined(__aarch64__) || defined(__arm__))
+#define USE_BSWAP_NONSENSE
+#endif
+
+#ifdef USE_BSWAP_NONSENSE
+#if defined(_MSC_VER) && !defined(__clang__)
+// MSVC prior to 2022 won't even optimise shift/mask swaps into a bswap
+// instruction. Screw it, just use the intrinsics.
+unsigned long _byteswap_ulong(unsigned long);
+unsigned long long _byteswap_uint64(unsigned long long);
+#define swap32 _byteswap_ulong
+#define swap64 _byteswap_uint64
+#else
+static inline unsigned int swap32(unsigned int x) {
+	return x >> 24 | x << 24 | x >> 8 & 0xFF00 | x << 8 & 0xFF0000;
+}
+static inline unsigned long long swap64(unsigned long long x) {
+	return	x >> 56              | x << 56                    |
+			x >> 40 &     0xFF00 | x << 40 & 0xFF000000000000 |
+			x >> 24 &   0xFF0000 | x << 24 &   0xFF0000000000 |
+			x >>  8 & 0xFF000000 | x <<  8 &     0xFF00000000;
+}
+#endif
+#endif
+
+static inline void doput16(unsigned char *out, unsigned short val) {
+#ifdef USE_BSWAP_NONSENSE
+	// Use swap32() here because x86 and ARM don't have instructions for 16-bit
+	// swaps, and Clang doesn't realise it could just use the 32-bit one anyway.
+	*(unsigned short *)(out + 1) = swap32(val) >> 16;
+#else
+	out[1] = val >> 8; out[2] = val;
+#endif
+}
+
+static inline void doput32(unsigned char *out, unsigned int val) {
+#ifdef USE_BSWAP_NONSENSE
+	*(unsigned int *)(out + 1) = swap32(val);
+#else
+	out[1] = val >> 24; out[2] = val >> 16; out[3] = val >> 8; out[4] = val;
+#endif
+}
+
+static inline void doput64(unsigned char *out, unsigned long long val) {
+#ifdef USE_BSWAP_NONSENSE
+	// Clang is smart enough to make this into two bswaps and a word swap in
+	// 32-bit builds. MSVC seems to be fine too when using the above intrinsics.
+	*(unsigned long long *)(out + 1) = swap64(val);
+#else
+	out[1] = val >> 56; out[2] = val >> 48;
+	out[3] = val >> 40; out[4] = val >> 32;
+	out[5] = val >> 24; out[6] = val >> 16;
+	out[7] = val >>  8; out[8] = val;
+#endif
+}
+
+void msg_putnil(unsigned char *out) {
+	*out = 0xC0;
+}
+
+void msg_putbool(unsigned char *out, _Bool val) {
+	*out = 0xC2 | val;
+}
+
+void msg_puti7(unsigned char *out, signed char val) {
+	*out = val; // oh, so a fixnum is just the literal byte! genius!
+}
+
+int msg_puts8(unsigned char *out, signed char val) {
+	int off = val < -32; // out of -ve fixnum range?
+	out[0] = 0xD0;
+	out[off] = val;
+	return off + 1;
+}
+
+int msg_putu8(unsigned char *out, unsigned char val) {
+	int off = val > 127; // out of +ve fixnum range?
+	out[0] = 0xCC;
+	out[off] = val;
+	return off + 1;
+}
+
+int msg_puts16(unsigned char *out, short val) {
+	if (val >= -128 && val <= 127) return msg_puts8(out, val);
+	out[0] = 0xD1;
+	doput16(out, val);
+	return 3;
+}
+
+int msg_putu16(unsigned char *out, unsigned short val) {
+	if (val <= 255) return msg_putu8(out, val);
+	out[0] = 0xCD;
+	doput16(out, val);
+	return 3;
+}
+
+int msg_puts32(unsigned char *out, int val) {
+	if (val >= -32768 && val <= 32767) return msg_puts16(out, val);
+	out[0] = 0xD2;
+	doput32(out, val);
+	return 5;
+}
+
+int msg_putu32(unsigned char *out, unsigned int val) {
+	if (val <= 65535) return msg_putu16(out, val);
+	out[0] = 0xCE;
+	doput32(out, val);
+	return 5;
+}
+
+int msg_puts(unsigned char *out, long long val) {
+	if (val >= -2147483647 - 1 && val <= 2147483647) {
+		return msg_puts32(out, val);
+	}
+	out[0] = 0xD3;
+	doput64(out, val);
+	return 9;
+}
+
+int msg_putu(unsigned char *out, unsigned long long val) {
+	if (val <= 4294967295) return msg_putu32(out, val);
+	out[0] = 0xCF;
+	doput64(out, val);
+	return 9;
+}
+
+static inline unsigned int floatbits(float f) {
+	return (union { float f; unsigned int i; }){f}.i;
+}
+
+static inline unsigned long long doublebits(double d) {
+	return (union { double d; unsigned long long i; }){d}.i;
+}
+
+void msg_putf(unsigned char *out, float val) {
+	out[0] = 0xCA;
+	doput32(out, floatbits(val));
+}
+
+int msg_putd(unsigned char *out, double val) {
+	// XXX: is this really the most efficient way to check this?
+	float f = val;
+	if ((double)f == val) { msg_putf(out, f); return 5; }
+	out[0] = 0xCB;
+	doput64(out, doublebits(val));
+	return 9;
+}
+
+void msg_putssz5(unsigned char *out, int sz) {
+	*out = 0xA0 | sz;
+}
+
+int msg_putssz8(unsigned char *out, int sz) {
+	if (sz < 32) { msg_putssz5(out, sz); return 1; }
+	out[0] = 0xD9;
+	out[1] = sz;
+	return 2;
+}
+
+int msg_putssz16(unsigned char *out, int sz) {
+	if (sz < 256) return msg_putssz8(out, sz);
+	out[0] = 0xDA;
+	doput16(out, sz);
+	return 3;
+}
+
+int msg_putssz(unsigned char *out, unsigned int sz) {
+	if (sz < 65536) return msg_putssz16(out, sz);
+	out[0] = 0xDB;
+	doput32(out, sz);
+	return 5;
+}
+
+void msg_putbsz8(unsigned char *out, int sz) {
+	out[0] = 0xC4;
+	out[1] = sz;
+}
+
+int msg_putbsz16(unsigned char *out, int sz) {
+	if (sz < 256) { msg_putbsz8(out, sz); return 2; }
+	out[0] = 0xC5;
+	doput16(out, sz);
+	return 3;
+}
+
+int msg_putbsz(unsigned char *out, unsigned int sz) {
+	if (sz < 65536) return msg_putbsz16(out, sz);
+	out[0] = 0xC6;
+	doput32(out, sz);
+	return 5;
+}
+
+void msg_putasz4(unsigned char *out, int sz) {
+	*out = 0x90 | sz;
+}
+
+int msg_putasz16(unsigned char *out, int sz) {
+	if (sz < 16) { msg_putasz4(out, sz); return 1; }
+	out[0] = 0xDC;
+	doput16(out, sz);
+	return 3;
+}
+
+int msg_putasz(unsigned char *out, unsigned int sz) {
+	if (sz < 65536) return msg_putasz16(out, sz);
+	out[0] = 0xDD;
+	doput32(out, sz);
+	return 5;
+}
+
+void msg_putmsz4(unsigned char *out, int sz) {
+	*out = 0x80 | sz;
+}
+
+int msg_putmsz16(unsigned char *out, int sz) {
+	if (sz < 16) { msg_putmsz4(out, sz); return 1; }
+	out[0] = 0xDE;
+	doput16(out, sz);
+	return 3;
+}
+
+int msg_putmsz(unsigned char *out, unsigned int sz) {
+	if (sz < 65536) return msg_putmsz16(out, sz);
+	out[0] = 0xDF;
+	doput32(out, sz);
+	return 5;
+}
+
+// vi: sw=4 ts=4 noet tw=80 cc=80
diff --git a/src/chunklets/msg.h b/src/chunklets/msg.h
new file mode 100644
index 0000000..b85bde3
--- /dev/null
+++ b/src/chunklets/msg.h
@@ -0,0 +1,350 @@
+/*
+ * Copyright © 2023 Michael Smith <mikesmiffy128@gmail.com>
+ *
+ * Permission to use, copy, modify, and/or distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED “AS IS” AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH
+ * REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY
+ * AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT,
+ * INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM
+ * LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR
+ * OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
+ * PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#ifndef INC_CHUNKLETS_MSG_H
+#define INC_CHUNKLETS_MSG_H
+
+#ifdef __cplusplus
+#define _msg_Bool bool
+extern "C" {
+#else
+#define _msg_Bool _Bool
+#endif
+
+/*
+ * Writes a nil (null) message to the buffer out. Always writes a single byte.
+ *
+ * out must point to at least 1 byte.
+ */
+void msg_putnil(unsigned char *out);
+
+/*
+ * Writes the boolean val to the buffer out. Always writes a single byte.
+ *
+ * out must point to at least 1 byte.
+ */
+void msg_putbool(unsigned char *out, _msg_Bool val);
+
+/*
+ * Writes the integer val in the range [-32, 127] to the buffer out. Values
+ * outside this range will produce an undefined encoding. Always writes a single
+ * byte.
+ *
+ * out must point to at least 1 byte.
+ *
+ * It is recommended to use msg_puts() for arbitrary signed values or msg_putu()
+ * for arbitrary unsigned values. Those functions will produce the smallest
+ * possible encoding for any value.
+ */
+void msg_puti7(unsigned char *out, signed char val);
+
+/*
+ * Writes the signed int val in the range [-128, 127] to the buffer out.
+ *
+ * out must point to at least 2 bytes.
+ *
+ * Returns the number of bytes written, one of {1, 2}.
+ *
+ * It is recommended to use msg_puts() for arbitrary signed values. That
+ * function will produce the smallest possible encoding for any value.
+ */
+int msg_puts8(unsigned char *out, signed char val);
+
+/*
+ * Writes the unsigned int val in the range [0, 255] to the buffer out.
+ *
+ * out must point to at least 2 bytes.
+ *
+ * Returns the number of bytes written, one of {1, 2}.
+ *
+ * It is recommended to use msg_putu() for arbitrary unsigned values. That
+ * function will produce the smallest possible encoding for any value.
+ */
+int msg_putu8(unsigned char *out, unsigned char val);
+
+/*
+ * Writes the signed int val in the range [-32768, 32767] to the buffer out.
+ *
+ * out must point to at least 3 bytes.
+ *
+ * Returns the number of bytes written, one of {1, 2, 3}.
+ *
+ * It is recommended to use msg_puts() for arbitrary signed values. That
+ * function will produce the smallest possible encoding for any value.
+ */
+int msg_puts16(unsigned char *out, short val);
+
+/*
+ * Writes the unsigned int val in the range [0, 65535] to the buffer out.
+ *
+ * out must point to at least 3 bytes.
+ *
+ * Returns the number of bytes written, one of {1, 2, 3}.
+ *
+ * It is recommended to use msg_putu() for arbitrary unsigned values. That
+ * function will produce the smallest possible encoding for any value.
+ */
+int msg_putu16(unsigned char *out, unsigned short val);
+
+/*
+ * Writes the signed int val in the range [-2147483648, 2147483647] to the
+ * buffer out.
+ *
+ * out must point to at least 5 bytes.
+ *
+ * Returns the number of bytes written, one of {1, 2, 3, 5}.
+ *
+ * It is recommended to use msg_puts() for arbitrary signed values. That
+ * function will produce the smallest possible encoding for any value.
+ */
+int msg_puts32(unsigned char *out, int val);
+
+/*
+ * Writes the unsigned int val in the range [0, 4294967295] to the buffer out.
+ *
+ * out must point to at least 5 bytes.
+ *
+ * Returns the number of bytes written, one of {1, 2, 3, 5}.
+ *
+ * It is recommended to use msg_putu() for arbitrary unsigned values. That
+ * function will produce the smallest possible encoding for any value.
+ */
+int msg_putu32(unsigned char *out, unsigned int val);
+
+/*
+ * Writes the signed int val in the range [-9223372036854775808,
+ * 9223372036854775807] to the buffer out.
+ *
+ * out must point to at least 9 bytes.
+ *
+ * Returns the number of bytes written, one of {1, 2, 3, 5, 9}.
+ */
+int msg_puts(unsigned char *out, long long val);
+
+/*
+ * Writes the unsigned int val in the range [0, 18446744073709551615] to the
+ * buffer out.
+ *
+ * out must point to at least 9 bytes.
+ *
+ * Returns the number of bytes written, one of {1, 2, 3, 5, 9}.
+ */
+int msg_putu(unsigned char *out, unsigned long long val);
+
+/*
+ * Writes the IEEE 754 single-precision float val to the buffer out. Always
+ * writes 5 bytes.
+ *
+ * out must point to at least 5 bytes.
+ */
+void msg_putf(unsigned char *out, float val);
+
+/*
+ * Writes the IEEE 754 double-precision float val to the buffer out.
+ *
+ * out must point to at least 9 bytes.
+ *
+ * Returns the number of bytes written, one of {5, 9}.
+ */
+int msg_putd(unsigned char *out, double val);
+
+/*
+ * Writes the string size sz in the range [0, 31] to the buffer out. Values
+ * outside this range will produce an undefined encoding. Always writes a single
+ * byte.
+ *
+ * In a complete message stream, a size of N must be immediately followed by N
+ * bytes of the actual string, which must be valid UTF-8.
+ *
+ * out must point to at least 1 byte.
+ *
+ * It is recommended to use msg_putssz() for arbitrary string sizes. That
+ * function will produce the smallest possible encoding for any size value.
+ */
+void msg_putssz5(unsigned char *out, int sz);
+
+/*
+ * Writes the string size sz in the range [0, 255] to the buffer out.
+ *
+ * In a complete message stream, a size of N must be immediately followed by N
+ * bytes of the actual string, which must be valid UTF-8.
+ *
+ * out must point to at least 2 bytes.
+ *
+ * It is recommended to use msg_putssz() for arbitrary string sizes. That
+ * function will produce the smallest possible encoding for any size value.
+ */
+int msg_putssz8(unsigned char *out, int sz);
+
+/*
+ * Writes the string size sz in the range [0, 65535] to the buffer out.
+ *
+ * In a complete message stream, a size of N must be immediately followed by N
+ * bytes of the actual string, which must be valid UTF-8.
+ *
+ * out must point to at least 3 bytes.
+ *
+ * It is recommended to use msg_putssz() for arbitrary string sizes. That
+ * function will produce the smallest possible encoding for any size value.
+ */
+int msg_putssz16(unsigned char *out, int sz);
+
+/*
+ * Writes the string size sz in the range [0, 4294967295] to the buffer out.
+ *
+ * In a complete message stream, a size of N must be immediately followed by N
+ * bytes of the actual string, which must be valid UTF-8.
+ *
+ * out must point to at least 5 bytes.
+ */
+int msg_putssz(unsigned char *out, unsigned int sz);
+
+/*
+ * Writes the binary blob size sz in the range [0, 255] to the buffer out.
+ * Always writes 2 bytes.
+ *
+ * In a complete message stream, a size of N must be immediately followed by
+ * N bytes of the actual data.
+ *
+ * out must point to at least 2 bytes.
+ *
+ * It is recommended to use msg_putbsz() for arbitrary binary blob sizes. That
+ * function will produce the smallest possible encoding for any size value.
+ */
+void msg_putbsz8(unsigned char *out, int sz);
+
+/*
+ * Writes the binary blob size sz in the range [0, 65535] to the buffer out.
+ *
+ * In a complete message stream, a size of N must be immediately followed by
+ * N bytes of the actual data.
+ *
+ * out must point to at least 3 bytes.
+ *
+ * Returns the number of bytes written, one of {2, 3}.
+ *
+ * It is recommended to use msg_putbsz() for arbitrary binary blob sizes. That
+ * function will produce the smallest possible encoding for any size value.
+ */
+int msg_putbsz16(unsigned char *out, int sz);
+
+/*
+ * Writes the binary blob size sz in the range [0, 4294967295] to the buffer out.
+ *
+ * In a complete message stream, a size of N must be immediately followed by
+ * N bytes of the actual data.
+ *
+ * out must point to at least 5 bytes.
+ *
+ * Returns the number of bytes written, one of {2, 3, 5}.
+ */
+int msg_putbsz(unsigned char *out, unsigned int sz);
+
+/*
+ * Writes the array size sz in the range [0, 15] to the buffer out. Values
+ * outside this range will produce an undefined encoding. Always writes a single
+ * byte.
+ *
+ * In a complete message stream, a size of N must be immediately followed by N
+ * other messages, which form the contents of the array.
+ *
+ * out must point to at least 1 byte.
+ *
+ * It is recommended to use msg_putasz() for arbitrary array sizes. That
+ * function will produce the smallest possible encoding for any size value.
+ */
+void msg_putasz4(unsigned char *out, int sz);
+
+/*
+ * Writes the array size sz in the range [0, 65535] to the buffer out.
+ *
+ * In a complete message stream, a size of N must be immediately followed by N
+ * other messages, which form the contents of the array.
+ *
+ * out must point to at least 3 bytes.
+ *
+ * Returns the number of bytes written, one of {1, 3}.
+ *
+ * It is recommended to use msg_putasz() for arbitrary array sizes. That
+ * function will produce the smallest possible encoding for any size value.
+ */
+int msg_putasz16(unsigned char *out, int sz);
+
+/*
+ * Writes the array size sz in the range [0, 4294967295] to the buffer out.
+ *
+ * In a complete message stream, a size of N must be immediately followed by N
+ * other messages, which form the contents of the array.
+ *
+ * out must point to at least 5 bytes.
+ *
+ * Returns the number of bytes written, one of {1, 3, 5}.
+ */
+int msg_putasz(unsigned char *out, unsigned int sz);
+
+/*
+ * Writes the map size sz in the range [0, 15] to the buffer out. Values
+ * outside this range will produce an undefined encoding. Always writes a single
+ * byte.
+ *
+ * In a complete message stream, a size of N must be immediately followed by
+ * N * 2 other messages, which form the contents of the map as keys followed by
+ * values in alternation.
+ *
+ * out must point to at least 1 byte.
+ *
+ * It is recommended to use msg_putmsz() for arbitrary map sizes. That function
+ * will produce the smallest possible encoding for any size value.
+ */
+void msg_putmsz4(unsigned char *out, int sz);
+
+/*
+ * Writes the map size sz in the range [0, 65535] to the buffer out.
+ *
+ * In a complete message stream, a size of N must be immediately followed by
+ * N * 2 other messages, which form the contents of the map as keys followed by
+ * values in alternation.
+ *
+ * out must point to at least 3 bytes.
+ *
+ * Returns the number of bytes written, one of {1, 3}.
+ *
+ * It is recommended to use msg_putmsz() for arbitrary map sizes. That function
+ * will produce the smallest possible encoding for any size value.
+ */
+int msg_putmsz16(unsigned char *out, int sz);
+
+/*
+ * Writes the map size sz in the range [0, 4294967295] to the buffer out.
+ *
+ * In a complete message stream, a size of N must be immediately followed by
+ * N * 2 other messages, which form the contents of the map as keys followed by
+ * values in alternation.
+ *
+ * out must point to at least 5 bytes.
+ *
+ * Returns the number of bytes written, one of {1, 3, 5}.
+ */
+int msg_putmsz(unsigned char *out, unsigned int sz);
+
+#ifdef __cplusplus
+}
+#endif
+#undef _msg_Bool
+
+#endif
+
+// vi: sw=4 ts=4 noet tw=80 cc=80
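The size-header functions in this patch all follow one rule: emit the shortest MessagePack length prefix that can hold the size, then let the caller append the payload. As a rough, standalone illustration that does not link against msg.c, the sketch below applies that rule to strings: fixstr below 32 bytes, then str8, str16 and str32, with multi-byte lengths stored big-endian as the MessagePack specification requires. The functions sketch_putbe() and sketch_putstr() are hypothetical names invented for this example and are not part of the chunklets API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of chunklets): stores the low nbytes bytes
 * of val in big-endian order, as MessagePack length fields require. */
void sketch_putbe(unsigned char *out, unsigned long val, int nbytes) {
	for (int i = 0; i < nbytes; ++i) {
		out[i] = (unsigned char)(val >> (8 * (nbytes - 1 - i)));
	}
}

/*
 * Hypothetical sketch: writes the smallest MessagePack string header for
 * len bytes, then the len bytes of s (assumed to be valid UTF-8).
 * out must have room for len + 5 bytes; len must fit in 32 bits.
 * Returns the total number of bytes written (header plus payload).
 */
size_t sketch_putstr(unsigned char *out, const char *s, size_t len) {
	size_t hdr;
	assert(out != NULL && s != NULL);
	if (len < 32) { // fixstr: 0b101xxxxx, length in the low 5 bits
		out[0] = (unsigned char)(0xA0 | len);
		hdr = 1;
	}
	else if (len < 256) { // str8: type byte plus 1 length byte
		out[0] = 0xD9;
		out[1] = (unsigned char)len;
		hdr = 2;
	}
	else if (len < 65536) { // str16: type byte plus 2 length bytes
		out[0] = 0xDA;
		sketch_putbe(out + 1, (unsigned long)len, 2);
		hdr = 3;
	}
	else { // str32: type byte plus 4 length bytes
		out[0] = 0xDB;
		sketch_putbe(out + 1, (unsigned long)len, 4);
		hdr = 5;
	}
	memcpy(out + hdr, s, len); // payload follows the header immediately
	return hdr + len;
}
```

With the real library the equivalent would be msg_putssz() followed by copying the string bytes. The 32-byte cut-off for the one-byte form comes from fixstr having five length bits; fixarray and fixmap have only four, which is why msg_putasz4() and msg_putmsz4() stop at 15.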

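The map comments above require a size of N to be followed by N * 2 messages, alternating keys and values. The following sketch encodes {"n": 5, "ok": true} under the assumption that every element fits a one-byte MessagePack form (fixmap, fixstr, positive fixint, true). It is self-contained so it can be compiled on its own; every sketch_* function is a hypothetical stand-in, and with chunklets the same stream should come from msg_putmsz(), msg_putssz() plus the key bytes, msg_putu() and msg_putbool().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical one-byte encoders mirroring the MessagePack "fix" forms. */
static size_t sketch_fixmap(unsigned char *out, unsigned n) {
	assert(n <= 15); // fixmap: 0b1000xxxx, entry count in the low 4 bits
	out[0] = (unsigned char)(0x80 | n);
	return 1;
}

static size_t sketch_fixstr(unsigned char *out, const char *s) {
	size_t len = strlen(s);
	assert(len <= 31); // fixstr: 0b101xxxxx, byte length in the low 5 bits
	out[0] = (unsigned char)(0xA0 | len);
	memcpy(out + 1, s, len);
	return 1 + len;
}

static size_t sketch_fixint(unsigned char *out, unsigned v) {
	assert(v <= 127); // positive fixint: the value is the byte itself
	out[0] = (unsigned char)v;
	return 1;
}

static size_t sketch_bool(unsigned char *out, int b) {
	out[0] = b ? 0xC3 : 0xC2; // true / false
	return 1;
}

/* Encodes {"n": 5, "ok": true}: a map header of 2, then 2 * 2 messages
 * in key, value, key, value order. Returns the number of bytes written. */
size_t sketch_encode_example(unsigned char *out) {
	size_t pos = 0;
	pos += sketch_fixmap(out + pos, 2);
	pos += sketch_fixstr(out + pos, "n");  // key 1
	pos += sketch_fixint(out + pos, 5);    // value 1
	pos += sketch_fixstr(out + pos, "ok"); // key 2
	pos += sketch_bool(out + pos, 1);      // value 2
	return pos;
}
```

The resulting 8 bytes are 82 A1 6E 05 A2 6F 6B C3. Because a map header only states a count, a decoder relies on the writer emitting exactly 2 * N follow-up messages; there is no terminator to recover from a miscount.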