Parsing and Performance: Strings

Episode #127 • Nov 30, 2020 • Subscriber-Only

We want to explore the performance of composable parsers, but to do so we must first take a deep dive into the Swift string API. There are multiple abstractions of strings in Swift, each with its own benefits and performance characteristics. We will benchmark them in order to get a scientific basis for comparison, and will describe how to properly write a benchmark.

Previous episode
Parsing and Performance: Strings
Next episode
Locked

Unlock This Episode

Our Free plan includes 1 subscriber-only episode of your choice, plus weekly updates from our newsletter.

Sign in with GitHub

Introduction

Over the past many weeks we have built up a pretty impressive parser library. Our parser is a very general type that allows you to parse any kind of nebulous input into any kind of well-structured output. It supports lots of interesting forms of composition that allow you to break large problems down into smaller ones.

On top of all of that we were able to build up an impressive library of parsers and higher-order parsers that work on strings. They allowed us to scan values off the fronts of strings in an efficient manner, such as characters, numbers, prefixes and more. And these parsers dealt with a lower-level API than just plain String, called Substring. A Substring is like a view into a string. It’s not the actual string itself, but rather just a representation of a portion of a base string that is stored somewhere else. This means we can consume little bits off the front of a substring while only changing our view of the underlying string, which is a very lightweight thing to do. If we were dealing with raw Strings then we would need to make a whole copy of the string, which can be a very expensive operation.

So it seems that maybe we are ready to close the book on parsers and open source it! But not so fast. There is a very important topic to consider, especially when it comes to parsers, which is performance. Sometimes we need to parse megabytes or even gigabytes of data, and we need to be as efficient as possible when it comes to scanning the input. And although we have taken a huge step by using Substrings instead of Strings, it turns out there is a lot more we can do.

We want to start off by giving everyone a quick deep dive into Swift strings and their performance characteristics. It’s a tricky subject, and there are a few subtle edge cases to think about, but once that’s done we will find that there is an even lower level representation of strings for which parsing is even more efficient than Substring. It’s really quite amazing to see, but it also kind of opens up a whole can of worms that requires more work to wrangle in.

String vs. substring


References

Downloads

Get started with our free plan

Our free plan includes 1 subscriber-only episode of your choice, access to 68 free episodes with transcripts and code samples, and weekly updates from our newsletter.

View plans and pricing