Episode #128: Parsing and Performance: Combinators

Episode #128 • Dec 7, 2020 • Subscriber-Only

We convert some of our substring parsers to work on lower-level String abstractions, and unlock huge performance gains. Even better, thanks to our generalized parser we can piece together multiple parsers that work on different abstraction levels, maximizing performance in the process.

Introduction
0:05
Benchmarking a simple parser
1:13
Benchmarking a complex parser
17:30
Parsing multiple abstraction levels
33:46
Parsing across abstraction levels
40:47
Benchmarking an even more complex parser
46:28
Next time: even more performance?
49:50

Introduction

So there really are quite substantial performance gains to be had by dropping to lower and lower abstraction levels. The biggest gain comes from simply using Substring over String, but the gains from unicode scalars and UTF-8 are also big enough that they are worth considering whenever possible.
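To make the abstraction levels concrete, here is a minimal sketch using only standard library string views. The sample string is our own choice for illustration and does not come from the episode. It shows that the same text has a different length at each level, and that only the Character level treats canonically equivalent text as equal.

```swift
// "é" written as "e" followed by a combining acute accent (U+0301).
let decomposed = "e\u{301}"
// "é" written as the single precomposed scalar U+00E9.
let precomposed = "\u{E9}"

// Character level: grapheme clusters are found by walking the whole string.
let characterCount = decomposed.count

// Unicode scalar level: the base letter and the accent are separate elements.
let scalarCount = decomposed.unicodeScalars.count

// UTF-8 level: raw code units, the cheapest view to traverse.
let utf8Count = decomposed.utf8.count

// String equality respects canonical equivalence, so these compare equal...
let equalAsStrings = decomposed == precomposed

// ...but their UTF-8 code units differ, which is the correctness trade-off
// that comes with dropping to the lower level.
let equalAsUTF8 = decomposed.utf8.elementsEqual(precomposed.utf8)
```

The lower views are cheaper precisely because they skip the grapheme and normalization work, so they are a safe substitute only when the input is known not to depend on that work, for example when matching ASCII.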

Now how do we apply what we’ve learned to parsers?

Well, so far all of our string parsers have been defined on Substring, and we’ve used lots of string APIs such as removeFirst, prefix and range subscripting. As we have just seen in very clear terms, these operations can be a little slow on Substring because of the extra work that must be done to properly handle traversing over grapheme clusters and normalized characters. The time differences may not seem huge, measured in just a few microseconds, but if you are parsing a multi-megabyte file that can really add up.
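As a rough illustration of the difference, the sketch below consumes a literal prefix at both levels using the same standard library operations (starts(with:) and removeFirst). The function names consumeSubstring and consumeUTF8 are hypothetical helpers made up for this example, not code from the episode.

```swift
// Character-level version: every comparison works on grapheme clusters.
func consumeSubstring(_ literal: String, from input: inout Substring) -> Bool {
    guard input.starts(with: literal) else { return false }
    input.removeFirst(literal.count)
    return true
}

// UTF-8 version: compares and removes raw code units.
func consumeUTF8(_ literal: String, from input: inout Substring.UTF8View) -> Bool {
    let bytes = literal.utf8
    guard input.starts(with: bytes) else { return false }
    input.removeFirst(bytes.count)
    return true
}

var characters = Substring("Hello, Blob")
let matchedCharacters = consumeSubstring("Hello", from: &characters)

var codeUnits = Substring("Hello, Blob").utf8
let matchedCodeUnits = consumeUTF8("Hello", from: &codeUnits)

// The two levels can disagree on non-ASCII input: "e" plus a combining
// accent is a single Character that is not equal to "e", but its first
// UTF-8 byte is the byte for "e".
var accentedCharacters = Substring("e\u{301}x")
let accentedMatchesCharacters = consumeSubstring("e", from: &accentedCharacters)

var accentedCodeUnits = Substring("e\u{301}x").utf8
let accentedMatchesCodeUnits = consumeUTF8("e", from: &accentedCodeUnits)
```

For ASCII input both versions behave the same. The last two calls show why the UTF-8 version is only a drop-in replacement when the parser does not rely on grapheme-aware matching.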

So, let’s see what kind of performance gains can be had by switching some of our parsers to work with UTF-8 instead of Substring.
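A minimal sketch of what such a switch could look like follows. It assumes a parser type that is generic over its input, which is our guess at the shape of the "generalized parser" mentioned in the episode summary. The Parser struct and the intUTF8 name are illustrative and not taken from the episode's sample code, and the parser handles only unsigned decimal integers.

```swift
// Assumed shape of a parser generalized over its input: it tries to
// consume from the front of the input and returns nil on failure.
struct Parser<Input, Output> {
    let run: (inout Input) -> Output?
}

// An integer parser that works directly on UTF-8 code units.
let intUTF8 = Parser<Substring.UTF8View, Int> { input in
    // Collect the leading run of ASCII digit bytes.
    let digits = input.prefix(while: { (UInt8(ascii: "0")...UInt8(ascii: "9")).contains($0) })
    // Int's initializer returns nil for an empty string or on overflow,
    // in which case the input is left untouched.
    guard let value = Int(String(decoding: digits, as: UTF8.self)) else { return nil }
    input.removeFirst(digits.count)
    return value
}

var remaining = Substring("128 episodes").utf8
let parsed = intUTF8.run(&remaining)
let rest = String(decoding: remaining, as: UTF8.self)

var noDigits = Substring("episodes").utf8
let failed = intUTF8.run(&noDigits)
```

Because the input type is a generic parameter, the same struct can describe parsers over Substring, its unicode scalar view, or its UTF-8 view. Measuring whether this version is actually faster requires a benchmark.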

Benchmarking a simple parser

References

swift-benchmark
Google • Mar 13, 2020
A Swift library for benchmarking code snippets, similar to google/benchmark.
http://github.com/google/swift-benchmark
UTF-8
Michael Ilseman • Mar 20, 2019
Swift 5 made a fundamental change to the String API, making the preferred encoding UTF-8 instead of UTF-16. This brings many usability and performance improvements to Swift strings.
https://swift.org/blog/utf8-string/
Strings in Swift 4
Ole Begemann • Nov 27, 2017
An excerpt from the book Advanced Swift that provides a deep discussion of the low-level representations of Swift strings. Although it pre-dates the transition of strings to UTF-8 in Swift 5, it is still a factually correct account of how to work with code units in strings.
https://oleb.net/blog/2017/11/swift-4-strings/
Improve performance of Collection.removeFirst(_:) where Self == SubSequence
Stephen Celis • Jul 28, 2020
While researching the string APIs for this episode we stumbled upon a massive inefficiency in how Swift implements removeFirst on certain collections. This PR fixes the problem and turns the method from an O(n) operation (where n is the length of the array) into an O(k) operation (where k is the number of elements being removed).
https://github.com/apple/swift/pull/32451

Downloads

Sample code

0128-parsing-performance-pt2
