Episode #127: Parsing and Performance: Strings

Introduction

Over the past many weeks we have built up a pretty impressive parser library. Our parser is a very general type that allows you to parse any kind of nebulous input into any kind of well-structured output. It supports lots of interesting forms of composition that allow you to break large problems down into smaller ones.

0:24

On top of all of that, we were able to build up an impressive library of parsers and higher-order parsers that work on strings. They allowed us to efficiently scan values off the front of a string, such as characters, numbers, prefixes and more. And these parsers dealt with a lower-level API than just plain String, called Substring. A Substring is like a view into a string. It’s not the actual string itself, but rather just a representation of a portion of a base string that is stored somewhere else. This means we can consume little bits off the front of a substring while only changing our view of the underlying string, which is a very lightweight thing to do. If we were dealing with raw Strings, then each time we consumed a prefix we would need to copy the remaining characters into a new string, which can be very expensive for large inputs.
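To make the "view" idea concrete, here is a minimal sketch of consuming a value off the front of a Substring. The `parseInt` helper and the `source` and `rest` names are hypothetical and are not the parser type built in this series; the sketch only shows that consuming input shrinks the view while the base string stays as it was.

```swift
// The base string; the Substring below is a view onto its contents.
let source = "42 Hello, World"

// Slicing the whole string gives us a Substring to consume from.
var rest = source[...]

// Hypothetical helper (not the series' parser API): scans leading ASCII
// digits off the front of the view and returns them as an Int.
func parseInt(_ input: inout Substring) -> Int? {
    let digits = input.prefix(while: { $0.isASCII && $0.isNumber })
    guard let value = Int(digits) else { return nil }
    // Drop the consumed characters from the front of the view.
    // The base string `source` is not modified by this.
    input.removeFirst(digits.count)
    return value
}

let number = parseInt(&rest)
// `rest` now covers only the unconsumed remainder; `source` is unchanged.
```

When parsing fails, the helper returns nil before touching the input, so a failed parse leaves the view exactly where it was.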

1:10

So it seems that maybe we are ready to close the book on parsers and open source the library! But not so fast. There is a very important topic to consider, especially when it comes to parsers, which is performance. Sometimes we need to parse megabytes or even gigabytes of data, and we need to be as efficient as possible when it comes to scanning the input. And although we have taken a huge step by using Substrings instead of Strings, it turns out there is a lot more we can do.

1:37

We want to start off by giving everyone a quick deep dive into Swift strings and their performance characteristics. It’s a tricky subject, and there are a few subtle edge cases to think about, but once that’s done we will find that there is an even lower-level representation of strings for which parsing is even more efficient than Substring. It’s really quite amazing to see, but it also kind of opens up a whole can of worms that takes more work to rein in.


