Unlock This Episode
Our Free plan includes 1 subscriber-only episode of your choice, plus weekly updates from our newsletter.
Introduction
Over the past many weeks we have built up a pretty impressive parser library. Our parser is a very general type that allows you to parse any kind of nebulous input into any kind of well-structured output. It supports lots of interesting forms of composition that allow you to break large problems down into smaller ones.
On top of all of that we were able to build up an impressive library of parsers and higher-order parsers that work on strings. They allowed us to scan values off the fronts of strings in an efficient manner, such as characters, numbers, prefixes and more. And these parsers dealt with a lower-level API than just plain String
, called Substring
. A Substring
is like a view into a string. It’s not the actual string itself, but rather just a representation of a portion of a base string that is stored somewhere else. This means we can consume little bits off the front of a substring while only changing our view of the underlying string, which is a very lightweight thing to do. If we were dealing with raw String
s then we would need to make a whole copy of the string, which can be a very expensive operation.
So it seems that maybe we are ready to close the book on parsers and open source it! But not so fast. There is a very important topic to consider, especially when it comes to parsers, which is performance. Sometimes we need to parse megabytes or even gigabytes of data, and we need to be as efficient as possible when it comes to scanning the input. And although we have taken a huge step by using Substring
s instead of String
s, it turns out there is a lot more we can do.
We want to start off by giving everyone a quick deep dive into Swift strings and their performance characteristics. It’s a tricky subject, and there are a few subtle edge cases to think about, but once that’s done we will find that there is an even lower level representation of strings for which parsing is even more efficient than Substring
. It’s really quite amazing to see, but it also kind of opens up a whole can of worms that requires more work to wrangle in.
Subscribe to Point-Free
Access this episode, plus all past and future episodes when you become a subscriber.
Already a subscriber? Log in
References
swift-benchmark
Google • Friday Mar 13, 2020A Swift library for benchmarking code snippets, similar to google/benchmark.
UTF-8
Michael Ilseman • Wednesday Mar 20, 2019Swift 5 made a fundamental change to the String API, making the preferred encoding UTF-8 instead of UTF-16. This brings many usability and performance improves to Swift strings.
Strings in Swift 4
Ole Begemann • Monday Nov 27, 2017An excerpt from the Advanced Swift that provides a deep discussion of the low-level representations of Swift strings. Although it pre-dates the transition of strings to UTF-8 in Swift 5 it is still a factually correct accounting of how to work with code units in strings.
Swiftʼs Collection Types
Harshil Shah • Wednesday Aug 5, 2020In this episode we explored different representations of strings and their subsequences (i.e. Substring
, UnicodeScalarView
, and UTF8View
), but more generally there are collections and slices. This article gives a nice accounting of the zoo of types in Swift’s collections API.
Improve performance of Collection.removeFirst(_:) where Self == SubSequence
Stephen Celis • Tuesday Jul 28, 2020While researching the string APIs for this episode we stumbled upon a massive inefficiency in how Swift implements removeFirst
on certain collections. This PR fixes the problem and turns the method from an O(n)
operation (where n
is the length of the array) to an O(k)
operation (where k
is the number of elements being removed).