Table of contents
Parser Monad
The Parser from Z.Data.Parser is designed for high performance resumable binary parsing and simple textual parsing, such as network protocols, JSON, etc. Write a parser by using basic parsers from Z.Data.Parser such as takeWhile, int, etc.
import qualified Z.Data.Parser as P
import Z.Data.ASCII
data Date = Date { year :: Int, month :: Int, day :: Int } deriving Show
dateParser :: P.Parser Date
dateParser = do
y <- P.int
P.word8 HYPHEN
m <- P.int
P.word8 HYPHEN
d <- P.int
return $ Date y m d
Parser in Z works directly on Bytes:
> P.parse' dateParser "2020-12-12"
Date 2020 12 12
> P.parse' dateParser "2020-JAN-12"
Left ["Z.Data.Parser.Numeric.int","Z.Data.Parser.Base.takeWhile1: no satisfied byte at [74,65,78,45,49,50]"]
> P.parse dateParser "2020-12-12, 08:00"
([44,32,48,56,58,48,48], Right (Date {year = 2020, month = 12, day = 12}))
> P.parseChunk dateParser "2020-"
Partial _
> let (P.Partial f) = P.parseChunk dateParser "2020-"
> let (P.Partial f') = f "05-05" -- incrementally provide input
> f' "" -- push empty chunk to signal EOF
Success Date {year = 2020, month = 5, day = 5}
Binary protocol can use decodePrim/decodePrimLE/decodePrimBE with TypeApplications extension, let’s say you want to implement a MessagePack str format parser:
import Data.Bits
import Data.Word
import qualified Z.Data.Parser as P
import qualified Z.Data.Text as T
msgStr :: P.Parser T.Text
msgStr = do
tag <- P.anyWord8
case tag of
t | t .&. 0xE0 == 0xA0 -> str (t .&. 0x1F)
0xD9 -> str =<< P.anyWord8
0xDA -> str =<< P.decodePrimBE @Word16
0xDB -> str =<< P.decodePrimBE @Word32
_ -> P.fail' "unknown tag"
where
str !l = do
bs <- P.take (fromIntegral l)
case T.validateMaybe bs of
Just t -> return (Str t)
_ -> P.fail' "illegal UTF8 Bytes"
Comparing to parsec or megaparsec, Parser in Z provides limited error reporting, and do not support using as a monad transformer. But provides an instance of PrimMonad, which allows some limited effects, such as mutable variables and array operations.
Auto Backtracked Alternative
Similar to attoparsec, Parser in Z always backtrack when used with <|> (Alternative instance), that means the failed branch will not consume any input without doing anything special:
import Control.Applicative
...
p = fooParser <|> barParser <|> quxParser
In above code, if any parser failed, the next parser is retried from the beginning of the input. Backtracking is not always needed though, it recommended to use peek or peekMaybe if the syntax or protocol can be parsed as LL(1) grammer since it’s faster than backtracking.
Builder Monad
The Builder from Z.Data.Builder is the reverse process of parsing, i.e. writing Haskell data types to Bytes, aka Writer monad. The usage is very similiar to Parser:
import qualified Z.Data.Builder as B
import Z.Data.ASCII
data Date = Date { year :: Int, month :: Int, day :: Int } deriving Show
dataBuilder :: Date -> B.Builder ()
dataBuilder (Date y m d) = do
int' y
B.word8 HYPHEN
int' m
B.word8 HYPHEN
int' d
where
int' x | x > 10 = B.int x
| otherwise = B.word8 DIGIT_0 >> B.int x
Underhood a Builder records a buffer writing function, thus can be composed quickly. Use build/buildText to run a Builder, which produces Bytes and Text respectively:
> B.build (dataBuilder $ Date 2020 11 1)
[50,48,50,48,45,49,49,45,48,49]
> B.buildText (dataBuilder $ Date 2020 11 1)
"2020-11-01"
Binary Builder can be constructed with encodePrim/encodePrimLE/encodePrimBE, let’s still take MessagePack str format as an example:
import Data.Bits
import Data.Word
import qualified Z.Data.Builder as B
import qualified Z.Data.Text as T
import qualified Z.Data.Vector as V
msgStr :: T.Text -> B.Builder ()
msgStr t = do
let bs = T.getUTF8Bytes t
case V.length bs of
len | len <= 31 -> B.word8 (0xA0 .|. fromIntegral len)
| len < 0x100 -> B.encodePrim (0xD9 :: Word8, fromIntegral len :: Word8)
| len < 0x10000 -> B.encodePrim (0xDA :: Word8, BE (fromIntegral len :: Word16))
| otherwise -> B.encodePrim (0xDB :: Word8, BE (fromIntegral len :: Word32))
B.bytes bs
Note that we directly use Unalign a, Unalign b => Unalign (a, b) instance to write serveral primitive types in a row, The Unalign class provide basic reading and writing facilities to read primitive types from and to raw bytes(with unaligned offset).
Text formatting with Builder
Different from other standard libraries which usually provide printf or similar, in Z directly using Builder to format text is recommended:
-- Similar to print("The result are %d, %d", x, y)
-- If you can ensure all Builders will write UTF-8 encoded bytes,
-- you can use unsafeBuildText to save a validation
B.unsafeBuildText $ do
"The result are " >> B.double x >> ", " >> B.double y
-- Or use do syntax
B.unsafeBuildText $ do
"The result are "
B.double x
", "
B.double y
...
The strength of monadic Builder is that you can reuse all control structure from Control.Monad, such as conditions, loops, etc. Builder () has an IsString instance which can wrap writing literals in UTF-8 encoding, with some modifications:
\NULwill be written as\xC0\x80.\xD800~\xDFFFwill be encoded in three bytes as normal UTF-8 codepoints.
It’s safe to put an string literal inside a unsafeBuildText as long as you don’t write \0 or \55296 ~ \57343.