Projects
Below is a curated list of my personal projects and open source contributions.
Personal Projects
strcase
Case-insensitive, Unicode-aware implementation of the Go standard library’s strings and bytes packages that is fast, accurate, and never allocates memory. Unicode simple folding is used for all matching.
Highly optimized for amd64 (x86-64) and arm64 with assembly implementations of common functions. Vastly outperforms the standard library due to its use of multiplicative lookup tables (at the cost of a slightly larger package / DATA section).
This project has been an absolute labor of love (and a complete monster). Work on it has driven two changes to the Go standard library, which improved the performance of searching for a rune (CL 539116) and of looking up the Unicode-defined simple case fold of a rune (CL 454958).
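To make “simple folding” concrete: Go’s unicode.SimpleFold steps through a rune’s fold orbit, for example 'K' → 'k' → U+212A KELVIN SIGN → 'K', so a case-insensitive match must accept any member of that orbit. The sketch below is a naive, allocation-free case-insensitive substring search built only on the standard library. It is an illustration under those assumptions, not strcase’s implementation or API; equalFoldRune, hasPrefixFold and indexFold are hypothetical names, and the search is O(n·m) with none of strcase’s optimizations.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// equalFoldRune reports whether a and b are equal under Unicode simple
// case folding by walking the fold orbit of a (e.g. K -> k -> U+212A -> K).
func equalFoldRune(a, b rune) bool {
	if a == b {
		return true
	}
	for r := unicode.SimpleFold(a); r != a; r = unicode.SimpleFold(r) {
		if r == b {
			return true
		}
	}
	return false
}

// hasPrefixFold reports whether s starts with prefix under simple folding
// and, if so, how many bytes of s the match consumed. This can differ from
// len(prefix): "k" (1 byte) matches U+212A (3 bytes).
func hasPrefixFold(s, prefix string) (int, bool) {
	i := 0
	for _, pr := range prefix {
		if i >= len(s) {
			return 0, false
		}
		sr, size := utf8.DecodeRuneInString(s[i:])
		if !equalFoldRune(sr, pr) {
			return 0, false
		}
		i += size
	}
	return i, true
}

// indexFold returns the byte index of the first case-insensitive match of
// substr in s, or -1. It tries every rune boundary and never allocates.
func indexFold(s, substr string) int {
	for i := 0; i <= len(s); {
		if _, ok := hasPrefixFold(s[i:], substr); ok {
			return i
		}
		if i == len(s) {
			break
		}
		_, size := utf8.DecodeRuneInString(s[i:])
		i += size
	}
	return -1
}

func main() {
	fmt.Println(indexFold("Hello WORLD", "world")) // 6
	fmt.Println(indexFold("temp: 5\u212A", "k"))   // 7
	fmt.Println(equalFoldRune('k', '\u212A'))      // true
}
```

Note that a match can consume a different number of bytes than the pattern: “k” matches the three-byte Kelvin sign, which is one reason a fast implementation cannot simply compare fixed-width windows.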
fastwalk
The fastest directory traversal library for Go.
Package fastwalk provides a fast parallel version of filepath.WalkDir that is ~2x faster on macOS, ~4x faster on Linux, ~6x faster on Windows, allocates 50% less memory, and requires 25% fewer memory allocations.
As of now, this is my most widely used library; it is used by the popular fzf CLI (64k stars on GitHub, for whatever that is worth).
panics
Package panics allows for panics to be safely handled and notified on. It provides helper functions that can safely handle and recover from unhandled panics and an API similar to os/signal for panic notifications. It additionally provides the same stringent notification guarantees as the os/signal package.
The goal of this package is to give programs a way to centralize panic handling and thus coordinate an orderly shutdown once a panic is detected. It also allows programs to control how panics and their associated stack traces are logged.
NOTE: I’m undecided as to whether this package is actually a good idea. It was created to replace a significantly more dangerous panic-handling system at a prior employer. That said, it is well engineered and gracefully handles the case of a panic occurring at the moment a listener is deregistered, and therefore I’m choosing to showcase it here.
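For readers unfamiliar with the os/signal model: signal.Notify registers a channel, and the package never blocks when sending to it, so callers must size the channel’s buffer appropriately. The sketch below applies that model to panics. Notify, Stop and Capture are hypothetical names chosen for illustration; this is not the panics package’s actual API, and it leaves out stack-trace capture and logging.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	mu        sync.RWMutex
	listeners = map[chan<- any]struct{}{}
)

// Notify registers c to receive recovered panic values. Like signal.Notify,
// delivery never blocks: if c's buffer is full, the value is dropped for c.
func Notify(c chan<- any) {
	mu.Lock()
	defer mu.Unlock()
	listeners[c] = struct{}{}
}

// Stop unregisters c. Delivery holds the read lock, so once Stop returns
// no further values will be sent on c.
func Stop(c chan<- any) {
	mu.Lock()
	defer mu.Unlock()
	delete(listeners, c)
}

// Capture runs fn, recovers any panic, and broadcasts the panic value to
// all registered listeners. It reports whether fn panicked.
func Capture(fn func()) (panicked bool) {
	defer func() {
		if v := recover(); v != nil {
			panicked = true
			mu.RLock()
			defer mu.RUnlock()
			for c := range listeners {
				select {
				case c <- v:
				default: // listener not ready: drop rather than block
				}
			}
		}
	}()
	fn()
	return false
}

func main() {
	ch := make(chan any, 1)
	Notify(ch)
	defer Stop(ch)
	Capture(func() { panic("boom") })
	fmt.Println("recovered:", <-ch) // recovered: boom
}
```

Holding the read lock while broadcasting means Stop waits for any in-flight delivery, so no value is sent on a channel after Stop returns; that is one simple way to handle a panic arriving while a listener is being deregistered.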
Open Source Contributions
Below is a curated list of my contributions to open source projects.
Golang
- bytes, strings: improve IndexRune performance by ~45%: 539116
- This change improved the performance of searching for a UTF-8 encoded rune (Unicode code point) by leveraging the fact that the last byte of a multi-byte rune is much more varied than the first byte, which has a 78% chance of being 240, 243, or 244.
- This change was motivated by discoveries made while working on my strcase library.
- unicode: improve SimpleFold performance by 2x for non-foldable code points: 454958
- This change improved performance by combining the binary searches for the upper- and lowercase conversions. It was also motivated by my work on my strcase library.
- bytes, strings: add ASCII fast path to EqualFold: 425459
Mongo Driver
The following PRs were created while I was an employee of MongoDB and were motivated by my frustration that our library for encoding/decoding BSON was up to 6x slower than encoding/decoding JSON using the Go standard library’s encoding/json package, which has to do considerably more work. During this work I also discovered that the Mongo driver was accidentally serializing all requests when zlib compression was being used (which is fairly common).
Related to this, I feel that developers use DB driver performance and code quality when judging the merits of the DB itself, and that encoding/decoding performance can have an outsized effect on the overall performance of the system.
- bson: remove use of reflect.Value.MethodByName: #1308
- Any use of reflect.Value.MethodByName prevents the linker in the Go gc toolchain from performing dead code elimination on exported methods, which can lead to significantly larger binaries.
- bson: improve marshal/unmarshal performance by ~58% and ~29%: #1313
- The result of identifying lock contention as a significant bottleneck.
- bsoncodec/bsonrw: eliminate encoding allocations: #1323
- Eliminated allocations during encoding (matching the encoding/json package) and fixed a few leaks related to the pooling of reusable buffers.
- bson/primitive: improve DateTime and ObjectID JSON performance: #1529
- x/mongo/driver: enable parallel zlib compression and improve zstd decompression: #1320
- Fixed a bug where zlib compression was serialized across all goroutines, and improved zstd decompression by using a pool of zstd decoders.
FZF
I’m one of the top 10 contributors to the popular fzf CLI. I have worked closely with its owner to improve performance and to help fzf move from walking directories with external programs (find, dir) to using my fastwalk library.
- Increased the performance of parsing ANSI terminal input by ~7.5x: #2368
- This increased parsing throughput from ~25MB/s to ~200MB/s, which made fzf significantly more usable with large ANSI-colored inputs.
- Sped up ANSI escape sequence parsing by ~20%: #2927
- Further improvement on the above change.
- Enable cpu, mem, block, and mutex profiling: #2813
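A common way to make this kind of parsing fast is to avoid looking at every byte: jump between escape characters with strings.IndexByte, copy the plain text in between in bulk, and parse only at escape positions. The sketch below strips CSI sequences (ESC '[', parameter and intermediate bytes, then a final byte in 0x40-0x7E), which covers color codes. It illustrates the technique and is not fzf’s parser from #2368 or #2927; stripANSI and csiLen are hypothetical names, and other escape types such as OSC are left untouched.

```go
package main

import (
	"fmt"
	"strings"
)

// stripANSI removes CSI escape sequences (ESC [ params... final) from s.
// Plain text between escapes is copied in bulk; only ESC positions are parsed.
func stripANSI(s string) string {
	i := strings.IndexByte(s, '\x1b')
	if i < 0 {
		return s // fast path: no escapes, no allocation
	}
	var b strings.Builder
	b.Grow(len(s))
	for i >= 0 {
		b.WriteString(s[:i])
		s = s[i:]
		n := csiLen(s)
		if n == 0 {
			// Not a complete CSI sequence: keep the ESC byte as text.
			b.WriteByte(s[0])
			n = 1
		}
		s = s[n:]
		i = strings.IndexByte(s, '\x1b')
	}
	b.WriteString(s)
	return b.String()
}

// csiLen returns the length of the CSI sequence at the start of s, or 0.
func csiLen(s string) int {
	if len(s) < 2 || s[0] != '\x1b' || s[1] != '[' {
		return 0
	}
	for j := 2; j < len(s); j++ {
		c := s[j]
		switch {
		case c >= 0x40 && c <= 0x7e: // final byte ends the sequence
			return j + 1
		case c < 0x20 || c > 0x3f: // not a parameter/intermediate byte
			return 0
		}
	}
	return 0 // unterminated
}

func main() {
	fmt.Printf("%q\n", stripANSI("\x1b[31mred\x1b[0m text")) // "red text"
}
```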
HTOP
- Darwin: lazily set process TTY name: #93
- Fetching the TTY name accounted for 95% of htop’s CPU usage on my Mac, even when it was not being displayed.
- ProcessList: fix quadratic process removal when scanning: #939
- Discovered this while working on a now abandoned rewrite of htop’s hashmap implementation.
CoreDNS
The changes below were the result of investigating some DNS performance issues while working at Lyft.