Parsing and Performance: Combinators

Episode #128 • Dec 7, 2020 • Subscriber-Only

We convert some of our substring parsers to work on lower-level String abstractions, and unlock huge performance gains. Even better, thanks to our generalized parser we can piece together multiple parsers that work on different abstraction levels, maximizing performance in the process.

Introduction

So there really are quite substantial performance gains to be had by dropping to lower and lower abstraction levels. The biggest gain comes from just using substring over string, but even the gains from Unicode scalars and UTF-8 are big enough to consider using them if possible.

Now how do we apply what we’ve learned to parsers?

Well, so far all of our string parsers have been defined on Substring, and we’ve used lots of string APIs such as removeFirst, prefix and range subscripting. As we have just seen in very clear terms, these operations can be a little slow on Substring because of the extra work that must be done to properly handle traversing over grapheme clusters and normalized characters. The time differences may not seem huge, measured in just a few microseconds, but if you are parsing a multi-megabyte file that can really add up.
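To make the extra work concrete, here is a small sketch (not code from the episode) that looks at the same text through Swift's character, Unicode scalar, and UTF-8 views. The constant names are made up for illustration. Comparing at the character level has to treat canonically equivalent sequences as equal, while comparing at the UTF-8 level is just a comparison of bytes.

```swift
// Two spellings of "é": a base letter plus a combining accent, and a single scalar.
let decomposed = "e\u{301}"
let precomposed = "\u{E9}"

// One grapheme cluster either way, but different amounts of underlying data.
let characterCount = decomposed.count               // 1 Character
let scalarCount = decomposed.unicodeScalars.count   // 2 Unicode scalars
let byteCount = decomposed.utf8.count               // 3 UTF-8 code units

// String equality uses canonical equivalence, so these compare as equal...
let sameCharacters = decomposed == precomposed

// ...while the UTF-8 views are compared byte for byte and differ.
let sameBytes = decomposed.utf8.elementsEqual(precomposed.utf8)
```

The character-level answers are the "correct" ones for human-readable text, which is exactly why they cost more to compute than the byte-level ones.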

So, let’s see what kind of performance gains can be had by switching some of our parsers to work with UTF-8 instead of Substring.
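As a rough illustration of what such a switch might look like, here are two standalone functions that consume a leading integer from their input, one on Substring and one on Substring.UTF8View. The names parseIntSubstring and parseIntUTF8 are hypothetical, and these are plain functions rather than the Parser type used in this series, so treat this as a sketch of the idea and not the episode's implementation.

```swift
// Hypothetical helper: consumes leading ASCII digits at the Character level.
func parseIntSubstring(_ input: inout Substring) -> Int? {
    let digits = input.prefix(while: { $0.isASCII && $0.isNumber })
    // Int's string initializer fails on empty input and on overflow.
    guard let value = Int(digits) else { return nil }
    input.removeFirst(digits.count)
    return value
}

// Hypothetical helper: consumes leading ASCII digits at the UTF-8 code unit level.
func parseIntUTF8(_ input: inout Substring.UTF8View) -> Int? {
    let zero = UInt8(ascii: "0")
    let nine = UInt8(ascii: "9")
    let digits = input.prefix(while: { zero <= $0 && $0 <= nine })
    guard !digits.isEmpty else { return nil }

    // Accumulate the value byte by byte, bailing out on overflow.
    var value = 0
    for byte in digits {
        let (product, didOverflowMultiply) = value.multipliedReportingOverflow(by: 10)
        let (sum, didOverflowAdd) = product.addingReportingOverflow(Int(byte - zero))
        if didOverflowMultiply || didOverflowAdd { return nil }
        value = sum
    }
    input.removeFirst(digits.count)
    return value
}

var text = "42 apples"[...]
let fromSubstring = parseIntSubstring(&text)

var bytes = "42 apples"[...].utf8
let fromUTF8 = parseIntUTF8(&bytes)
let restOfBytes = Substring(bytes)
```

Both versions use the same prefix and removeFirst operations, but the UTF-8 version only ever inspects individual bytes, so it never has to work out grapheme cluster boundaries.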

Benchmarking a simple parser

