Generalized Parsing: Part 3

Episode #126 • Nov 23, 2020 • Subscriber-Only

Generalizing the parser type has allowed us to parse more types of inputs, but that is only scratching the surface. It also unlocks many new things that were previously impossible to see, including the ability to parse a stream of inputs and stream its output, making our parsers much more performant.

Previous episode
Generalized Parsing: Part 3
Next episode
Locked

Unlock This Episode

Our Free plan includes 1 subscriber-only episode of your choice, plus weekly updates from our newsletter.

Sign in with GitHub

Introduction

This is pretty impressive. We add just 70 additional lines of parsers to our base library and we have unlocked the ability to very succinctly and expressively parse incoming requests so that we can route them to different parts of our application or web site. There are entire libraries out there devoted to this functionality, and yet here we have discovered it to be just a small corollary to having powerful, generalized parsing library available to us.

The only thing missing from this routing micro-library is a few more combinators for parsing the headers and body of the request, as well as a way to turn a nebulous URLRequest value into one of these RequestData values, but we will leave both of those things as exercises for the viewer.

So I think this is pretty incredible. We have massively generalized our parsing library, all of the parsers we previously wrote still compile and run just like before, but we can now perform parsing tasks on all new types of input that would have previously been impossible.

But as cool as all of that is, we still want to ask the all-important question that we ask at the end of every series of episodes on Point-Free: what’s the point? Because although we have generalized parsing we have also made it a little more complex. Not only do we have to think a bit harder when it comes to writing a general parser and have to have a bit of knowledge of generics and the Collection protocol, but we also sometimes have to give the compiler some extra hints in order for it to figure out the types.

So, is it worth this extra complexity? Instead of generalizing parsers should we have spent a little more time creating more robust parsers that perhaps could have handled the complexities of parsing a raw URLRequest rather than inventing the RequestData type and trying to parse it?

And we of course think its absolutely worth this extra complexity. Generalizing the parser signature has allowed us to parse all new types of input, but that’s only the beginning. The very act of generalizing has opened up all new possibilities that were previously impossible to see. For example:

  • With zero changes to our core parser type we can create a new parser operator that allows us to incrementally parse a stream of data coming in from an outside source, such as standard input, and even incrementally stream output to an outside source, such as a file, standard output, or anything. This can dramatically improve the performance of parsers that need to work on large data sets for which it is unreasonable to bring a large chunk of data into memory and parse it into a large array of data for processing.

  • So that’s incredible, but it gets better. By generalizing we can now see an all new form of composition that sits right next to our beloved map, zip and flatMap operators. This operator has the same shape that we have discovered on Point-Free time and time again, and it will be instrumental in allowing us to take a parser that works on a small piece of data and transform it into a parser that works on a larger, more complex piece of data.

  • And if that weren’t enough, things get even better. This new form of composition turns out to be the key to unlock a new tier of performance in our parsers. We can increase the performance of some of our parsers by as much as 5-10x with minimal changes to the parsers themselves, which makes their performance competitive with hand-rolled parsers and even beats the performance of Apple’s parser helpers, such as the Scanner type.

These are some really big claims we are making. We are saying that by simply generalizing the input type of our parsers we can unlock the ability to stream input into our parsers, uncover new forms of composition, and immediately improve the performance of our parsers, basically for free.

Parsing in a memory-efficient manner


Downloads

Get started with our free plan

Our free plan includes 1 subscriber-only episode of your choice, access to 68 free episodes with transcripts and code samples, and weekly updates from our newsletter.

View plans and pricing