Invertible Parsing: The Problem

Episode #178 • Feb 21, 2022 • Subscriber-Only

We’ve spent many episodes discussing parsing, which turns nebulous blobs of data into well-structured data, but sometimes we need the “inverse” process to turn well-structured data back into nebulous data. This is called “printing” and can be useful for serialization, URL routing and more. This week we begin a journey to build a unified, composable framework for parsers and printers.

Previous episode
Invertible Parsing: The Problem
Next episode
Locked

Unlock This Episode

Our Free plan includes 1 subscriber-only episode of your choice, plus weekly updates from our newsletter.

Sign in with GitHub

Introduction

We are extremely excited to start this next series of episodes. This topic is more than 4 years in the making. We first started experimenting with these concepts before even launching Point-Free and we’ve been developing, re-developing and refining them over and over and over ever since.

Our series of episodes on parsers is some of our most popular content, second only to our Composable Architecture episodes. People in the community have built some really amazing tools with the parsing library we open sourced over a year ago, including music notation parsers, math expression parsers, XML parsers, and even full-on programming language parsers.

Although we have gone really deep into parser topics, including composition, performance, low-level string processing, API ergonomics, error handling, and more, there’s still a huge topic that we haven’t touched upon at all. When building a parser we spend a lot of time getting to know all of the subtleties and edge cases of the input we are trying to parse. This can actually be quite a significant amount of work, and when it’s all done it is quite satisfying to see it turn nebulous blobs of input values into well-structured output values.

But, often you need to turn your well-structured data back into the nebulous blob of data. We call this inverse process “printing”.

For example, after much work you may come up with a way to parse a custom textual format into some well-structured Swift data type that is used in your application. But then, at a later time, you decide to add some editing capabilities to the application. The instant we do this we are confronted with the problem of how to turn values of your first-class Swift data type back into an unstructured string so that we can save that data back to disk or send it to a server.

Many of Apple’s parsers work this way too. Not only can you decode nebulous JSON data into a first-class type, but you can also encode values of the type back into JSON data. Further, not only can date and number formatters parse a string into a date or number, but they can also turn those values back into a formatted string. So even Apple’s APIs recognize that once you have accomplished a parsing problem there is a corresponding printing or formatting problem left to be solved.

So, the problem of trying to reverse the process of parsing is important enough that we should definitely discuss it. Turns out, we can develop the theory of printing in parallel to the theory of parsing. It allows us to tackle the problem of printing in small, bite-sized units and then piece together all the parts to form a large, complex printer.

That already sounds cool, but the most amazing part of this story is that we can build parsers and printers simultaneously. If you are careful enough, then the very act of describing your parser will simultaneously provide you with a printer such that if you parse and then print you get roughly the “same” thing you started with, and similarly if you print and then parse.

The need for printers


References

  • Invertible syntax descriptions: Unifying parsing and pretty printing
    Tillmann Rendel and Klaus Ostermann • Sep 30, 2010
    Note

    Parsers and pretty-printers for a language are often quite similar, yet both are typically implemented separately, leading to redundancy and potential inconsistency. We propose a new interface of syntactic descriptions, with which both parser and pretty-printer can be described as a single program using this interface. Whether a syntactic description is used as a parser or as a pretty-printer is determined by the implementation of the interface. Syntactic descriptions enable programmers to describe the connection between concrete and abstract syntax once and for all, and use these descriptions for parsing or pretty-printing as needed. We also discuss the generalization of our programming technique towards an algebra of partial isomorphisms.

    This publication (from 2010!) was the initial inspiration for our parser-printer explorations, and a much less polished version of the code was employed on the Point-Free web site on day one of our launch!

  • Unified Parsing and Printing with Prisms
    Fraser Tweedale • Apr 29, 2016
    Note

    Parsers and pretty printers are commonly defined as separate values, however, the same essential information about how the structured data is represented in a stream must exist in both values. This is therefore a violation of the DRY principle – usually quite an obvious one (a cursory glance at any corresponding FromJSON and ToJSON instances suffices to support this fact). Various methods of unifying parsers and printers have been proposed, most notably Invertible Syntax Descriptions due to Rendel and Ostermann (several Haskell implementations of this approach exist).

    Another approach to the parsing-printing problem using a construct known as a “prism” (a construct Point-Free viewers and library users may better know as a “case path”).

Downloads

Get started with our free plan

Our free plan includes 1 subscriber-only episode of your choice, access to 68 free episodes with transcripts and code samples, and weekly updates from our newsletter.

View plans and pricing