Table of contents

  1. Parser Monad
    1. Auto Backtracked Alternative
  2. Builder Monad
    1. Text formatting with Builder

Parser Monad

The Parser from Z.Data.Parser is designed for high-performance, resumable binary parsing and simple textual parsing, for example network protocols or JSON. Write a parser by combining the basic parsers from Z.Data.Parser, such as takeWhile and int.

import qualified Z.Data.Parser as P
import Z.Data.ASCII 

data Date = Date { year :: Int, month :: Int, day :: Int } deriving Show

dateParser :: P.Parser Date
dateParser = do
    y <- P.int
    P.word8 HYPHEN 
    m <- P.int
    P.word8 HYPHEN 
    d <- P.int
    return $ Date y m d

Parser in Z works directly on Bytes:

> P.parse' dateParser "2020-12-12"
Right (Date {year = 2020, month = 12, day = 12})
> P.parse' dateParser "2020-JAN-12"
Left ["Z.Data.Parser.Numeric.int","Z.Data.Parser.Base.takeWhile1: no satisfied byte at [74,65,78,45,49,50]"]
> P.parse dateParser "2020-12-12, 08:00"
([44,32,48,56,58,48,48], Right (Date {year = 2020, month = 12, day = 12}))
> P.parseChunk dateParser "2020-"
Partial _
> let (P.Partial f) = P.parseChunk dateParser "2020-"
> let (P.Partial f') = f "05-05"    -- incrementally provide input
> f' ""                             -- push empty chunk to signal EOF
Success Date {year = 2020, month = 5, day = 5}

Binary protocols can use decodePrim/decodePrimLE/decodePrimBE together with the TypeApplications extension. Suppose you want to implement a parser for the MessagePack str format:

{-# LANGUAGE BangPatterns, TypeApplications #-}

import           Data.Bits
import           Data.Word
import qualified Z.Data.Parser as P
import qualified Z.Data.Text   as T

msgStr :: P.Parser T.Text
msgStr = do
    tag <- P.anyWord8
    case tag of
        t | t .&. 0xE0 == 0xA0 -> str (t .&. 0x1F)
        0xD9 -> str =<< P.anyWord8
        0xDA -> str =<< P.decodePrimBE @Word16
        0xDB -> str =<< P.decodePrimBE @Word32
        _    -> P.fail' "unknown tag"
  where
    str !l = do
        bs <- P.take (fromIntegral l)
        case T.validateMaybe bs of
            Just t -> return t    -- msgStr returns T.Text, so return the validated text itself
            _  -> P.fail' "illegal UTF8 Bytes"

Compared to parsec or megaparsec, Parser in Z provides limited error reporting and cannot be used as a monad transformer. It does provide a PrimMonad instance, which allows some limited effects such as mutable variables and array operations.

Auto Backtracked Alternative

Similar to attoparsec, Parser in Z always backtracks when used with <|> (the Alternative instance). This means a failed branch does not consume any input, and you do not need to do anything special to get this behavior:

import Control.Applicative
...
p = fooParser <|> barParser <|> quxParser

In the code above, if a parser fails, the next parser is tried from the beginning of the input. Backtracking is not always needed, though: if the syntax or protocol can be parsed as an LL(1) grammar, using peek or peekMaybe is recommended, since it is faster than backtracking.
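To make the difference concrete, here is a minimal sketch built on a toy list-based parser that uses only base. It is not Z's Parser and does not use Z's API: `Toy`, `byte`, `peekByte`, `viaBacktrack` and `viaPeek` are all hypothetical names introduced for illustration. It shows the same two-branch grammar ("ab" or "ac") written once with an auto-backtracking `<|>` and once as LL(1) with a one-byte lookahead.

```haskell
import Control.Applicative (Alternative (..))
import Data.Word (Word8)

-- A toy parser over a list of bytes: Nothing on failure, otherwise the
-- result together with the remaining input. This is NOT Z's Parser.
newtype Toy a = Toy { runToy :: [Word8] -> Maybe (a, [Word8]) }

instance Functor Toy where
    fmap f (Toy p) = Toy $ \s -> fmap (\(a, r) -> (f a, r)) (p s)

instance Applicative Toy where
    pure a = Toy $ \s -> Just (a, s)
    Toy pf <*> Toy pa = Toy $ \s -> case pf s of
        Nothing      -> Nothing
        Just (f, s') -> case pa s' of
            Nothing       -> Nothing
            Just (a, s'') -> Just (f a, s'')

instance Monad Toy where
    Toy p >>= k = Toy $ \s -> case p s of
        Nothing      -> Nothing
        Just (a, s') -> runToy (k a) s'

-- Auto-backtracking alternative: the second branch sees the ORIGINAL input.
instance Alternative Toy where
    empty = Toy (const Nothing)
    Toy p <|> Toy q = Toy $ \s -> case p s of
        Nothing -> q s
        ok      -> ok

-- Consume one specific byte, or fail.
byte :: Word8 -> Toy ()
byte w = Toy $ \s -> case s of
    (x:xs) | x == w -> Just ((), xs)
    _               -> Nothing

-- Look at the next byte without consuming it; fail at end of input.
peekByte :: Toy Word8
peekByte = Toy $ \s -> case s of
    (x:_) -> Just (x, s)
    []    -> Nothing

-- "ab" or "ac" with backtracking: on "ac" the first branch consumes 'a',
-- fails on 'b', and the second branch parses 'a' a second time.
viaBacktrack :: Toy Char
viaBacktrack = (byte 97 >> byte 98 >> pure 'b')
           <|> (byte 97 >> byte 99 >> pure 'c')

-- The same grammar as LL(1): parse the shared prefix once, then peek one
-- byte to choose the branch, so no input is parsed twice.
viaPeek :: Toy Char
viaPeek = do
    byte 97
    next <- peekByte
    case next of
        98 -> byte 98 >> pure 'b'
        99 -> byte 99 >> pure 'c'
        _  -> empty
```

Both parsers accept the same inputs; the lookahead version simply avoids re-running the shared prefix, which is the saving the paragraph above refers to.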

Builder Monad

The Builder from Z.Data.Builder is the reverse of parsing: it writes Haskell data types to Bytes, much like a Writer monad. Its usage is very similar to Parser:

import qualified Z.Data.Builder as B
import Z.Data.ASCII 

data Date = Date { year :: Int, month :: Int, day :: Int } deriving Show

dataBuilder :: Date -> B.Builder ()
dataBuilder (Date y m d) = do
    int' y
    B.word8 HYPHEN 
    int' m
    B.word8 HYPHEN 
    int' d
  where
    int' x | x >= 10   = B.int x
           | otherwise = B.word8 DIGIT_0 >> B.int x

Under the hood, a Builder records a buffer-writing function, so Builders can be composed quickly. Use build or buildText to run a Builder; they produce Bytes and Text respectively:

> B.build (dataBuilder $ Date 2020 11 1)
[50,48,50,48,45,49,49,45,48,49]
> B.buildText (dataBuilder $ Date 2020 11 1)
"2020-11-01"

A binary Builder can be constructed with encodePrim/encodePrimLE/encodePrimBE. Taking the MessagePack str format as an example again:

import           Data.Bits
import           Data.Word
import qualified Z.Data.Builder as B
import qualified Z.Data.Text    as T
import qualified Z.Data.Vector  as V

msgStr :: T.Text -> B.Builder ()
msgStr t = do
    let bs = T.getUTF8Bytes t
    case V.length bs of
        len | len <= 31      ->  B.word8 (0xA0 .|. fromIntegral len)
            | len < 0x100    ->  B.encodePrim (0xD9 :: Word8, fromIntegral len :: Word8)
            | len < 0x10000  ->  B.encodePrim (0xDA :: Word8, BE (fromIntegral len :: Word16))
            | otherwise      ->  B.encodePrim (0xDB :: Word8, BE (fromIntegral len :: Word32))
    B.bytes bs

Note that we directly use the Unalign a, Unalign b => Unalign (a, b) instance to write several primitive types in a row. The Unalign class provides basic facilities for reading primitive types from, and writing them to, raw bytes at possibly unaligned offsets.
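To see which bytes the header part of the builder above is expected to emit, here is a small sketch that computes them as a plain list, using only base. `strHeader` is a hypothetical helper written for this illustration and is not part of Z; it mirrors the tag values (0xA0, 0xD9, 0xDA, 0xDB) and the big-endian length fields used in the example.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Word (Word8)

-- Header bytes that precede a MessagePack str payload of the given length
-- (assumes 0 <= len < 2^32).
strHeader :: Int -> [Word8]
strHeader len
    | len <= 31     = [0xA0 .|. fromIntegral len]   -- fixstr: length in the low 5 bits
    | len < 0x100   = [0xD9, fromIntegral len]      -- str 8:  one length byte
    | len < 0x10000 = 0xDA : bigEndian 2            -- str 16: two length bytes
    | otherwise     = 0xDB : bigEndian 4            -- str 32: four length bytes
  where
    -- The n lowest bytes of len, most significant byte first.
    bigEndian :: Int -> [Word8]
    bigEndian n =
        [ fromIntegral ((len `shiftR` (8 * i)) .&. 0xFF)
        | i <- [n - 1, n - 2 .. 0]
        ]
```

For instance, a 5-byte string gets the single header byte 0xA5, while a 300-byte string (0x012C) gets 0xDA 0x01 0x2C.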

Text formatting with Builder

Unlike other standard libraries, which usually provide printf or something similar, Z recommends formatting text directly with Builder:

-- Similar to printf("The result are %f, %f", x, y)
-- If you can ensure all Builders will write UTF-8 encoded bytes,
-- you can use unsafeBuildText to save a validation

B.unsafeBuildText $
    "The result are " >> B.double x >> ", " >> B.double y

-- Or use do syntax

B.unsafeBuildText $ do
    "The result are " 
    B.double x 
    ", " 
    B.double y
...

The strength of the monadic Builder is that you can reuse all the control structures from Control.Monad, such as conditionals and loops. Builder () has an IsString instance, so string literals can be written directly; they are written in UTF-8 encoding, with some modifications:

  • \NUL will be written as \xC0\x80.
  • \xD800 ~ \xDFFF will be encoded in three bytes as normal UTF-8 codepoints.
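The two modifications listed above can be sketched as a per-character encoder using only base. `encodeLiteralChar` is a hypothetical function written for this illustration, not Z's implementation; it encodes a Char as ordinary UTF-8, except that \NUL becomes the two-byte sequence \xC0\x80 and surrogate code points are encoded as plain three-byte sequences.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Encode one character following the rules described above.
encodeLiteralChar :: Char -> [Word8]
encodeLiteralChar c
    | n == 0      = [0xC0, 0x80]                      -- \NUL special case
    | n < 0x80    = [fromIntegral n]                  -- ASCII, one byte
    | n < 0x800   = [0xC0 .|. hi 6, cont 0]           -- two bytes
    | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]  -- three bytes, surrogates included
    | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord c

    -- Leading payload bits: the code point shifted right by s bits.
    hi :: Int -> Word8
    hi s = fromIntegral (n `shiftR` s)

    -- Continuation byte: 10xxxxxx carrying six bits starting at bit s.
    cont :: Int -> Word8
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)
```

Both special cases produce byte sequences that strict UTF-8 validation rejects (an overlong encoding and an encoded surrogate), which is why such literals matter when validation is skipped.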

It’s safe to put a string literal inside unsafeBuildText as long as it does not contain \0 or \55296 ~ \57343.
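As an illustration of reusing Control.Monad combinators with Builder, here is a minimal sketch that formats a list of Ints as a bracketed, comma-separated list. `listBuilder` is a hypothetical helper; the sketch assumes the OverloadedStrings extension is enabled and relies only on B.int, B.buildText and the IsString instance described in this section.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Control.Monad  (forM_, when)
import qualified Z.Data.Builder as B

-- Write the elements separated by ", " and wrapped in square brackets.
listBuilder :: [Int] -> B.Builder ()
listBuilder xs = do
    "["
    forM_ (zip [0 :: Int ..] xs) $ \(i, x) -> do
        when (i > 0) ", "   -- separator before every element except the first
        B.int x
    "]"
```

Running it with `B.buildText (listBuilder [1, 2, 3])` goes through the validating path; since only ASCII literals and decimal digits are written here, unsafeBuildText would also be an option.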