LLM docs

Parser

This package implements a basic parser combinator library for Roc, useful for transforming raw input into a more structured form.

Example

Say we want to parse the string in below into the structured value out:

in = "Game 1: 3 blue, 4 red; 1 red, 2 green, 6 blue; 2 green"
out =
    {
        id: 1,
        requirements: [
            [Blue 3, Red 4],
            [Red 1, Green 2, Blue 6],
            [Green 2],
        ]
    }

We could do this using the following:

Requirement : [ Green U64, Red U64, Blue U64 ]
RequirementSet : List Requirement
Game : { id: U64, requirements: List RequirementSet }

parseGame : Str -> Result Game [ParsingError]
parseGame = \s ->
    green = const Green |> keep digits |> skip (string " green")
    red = const Red |> keep digits |> skip (string " red")
    blue = const Blue |> keep digits |> skip (string " blue")

    requirementSet : Parser _ RequirementSet
    requirementSet = (oneOf [green, red, blue]) |> sepBy (string ", ")

    requirements : Parser _ (List RequirementSet)
    requirements = requirementSet |> sepBy (string "; ")

    game : Parser _ Game
    game =
        const (\id -> \r -> { id, requirements: r })
        |> skip (string "Game ")
        |> keep digits
        |> skip (string ": ")
        |> keep requirements

    when parseStr game s is
        Ok g -> Ok g
        Err (ParsingFailure _) | Err (ParsingIncomplete _) -> Err ParsingError

Parser input a

Opaque type for a parser that will try to parse an a from an input.

As such, a parser can be considered a recipe for a function of the type

input -> Result {val: a, input: input} [ParsingFailure Str]

How a parser is implemented internally is not important and may change between versions, for instance to improve efficiency or the error messages produced on parsing failures.

ParseResult input a

ParseResult input a : Result { val : a, input : input } [ParsingFailure Str]

buildPrimitiveParser : (input -> ParseResult input a) -> Parser input a

Write a custom parser without using the provided combinators.
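As a minimal sketch of what a hand-written primitive might look like, the following defines a hypothetical anyByte parser (the name is illustrative and not part of the package) that consumes a single element of a List U8 input. It assumes buildPrimitiveParser and parsePartial are in scope.

```roc
# Hypothetical primitive parser: consumes exactly one byte, whatever it is.
anyByte : Parser (List U8) U8
anyByte =
    buildPrimitiveParser \input ->
        when input is
            # Succeed with the first byte and hand back the rest as leftover input.
            [first, .. as rest] -> Ok { val: first, input: rest }
            # Nothing to consume: report a failure with a descriptive message.
            [] -> Err (ParsingFailure "expected a byte, but the input was empty")

expect parsePartial anyByte [1, 2, 3] == Ok { val: 1, input: [2, 3] }
```

The function passed to buildPrimitiveParser has exactly the shape of ParseResult: a value plus the remaining input on success, or a ParsingFailure with a message.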

parsePartial : Parser input a, input -> ParseResult input a

Most general way of running a parser.

Can be thought of as turning the recipe of a parser into its actual parsing function and running this function on the given input.

Most parsers consume part of the input when they succeed. This allows you to string parsers together so that they run one after the other: the part of the input that the first parser did not consume is used by the next parser. This is why a parser returns, on success, both the resulting value and the leftover part of the input.

This is mostly useful when creating your own internal parsing building blocks.
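The hand-off of leftover input can be sketched as follows. The example assumes string and digits (from the package's String module, as used in the example at the top) are in scope; gamePrefix is a hypothetical name.

```roc
# Hypothetical parser for a fixed prefix.
gamePrefix = string "Game "

# The prefix is consumed; the leftover input starts at the game number.
expect parsePartial gamePrefix (Str.toUtf8 "Game 1") == Ok { val: "Game ", input: Str.toUtf8 "1" }

# That leftover can then be handed to the next parser, which consumes the rest.
expect parsePartial digits (Str.toUtf8 "1") == Ok { val: 1, input: [] }
```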

parse : Parser input a, input, (input -> Bool) -> Result a [ ParsingFailure Str, ParsingIncomplete input ]

Runs a parser on the given input, expecting it to fully consume the input.

The input -> Bool parameter is used to check whether parsing has 'completed', i.e. how to determine if all of the input has been consumed.

For most input types, a parsing run that leaves some unparsed input behind should be considered an error.
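A short sketch of the completion check, assuming a List U8 input for which "completed" means "no bytes left", so List.isEmpty can serve as the input -> Bool parameter. digits is assumed to be in scope from the String module.

```roc
# All input consumed: the parsed value is returned.
expect parse digits (Str.toUtf8 "123") List.isEmpty == Ok 123

# Trailing bytes remain, so the run is reported as incomplete together with the leftover input.
expect parse digits (Str.toUtf8 "123abc") List.isEmpty == Err (ParsingIncomplete (Str.toUtf8 "abc"))
```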

fail : Str -> Parser * *

Parser that can never succeed, regardless of the given input. It will always fail with the given error message.

This is mostly useful as a 'base case' if all other parsers in a oneOf or alt have failed, to provide some more descriptive error message.
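One way this might look in practice is a oneOf whose last entry is a fail carrying a domain-specific message. The parser name colorOrFail and the message text are hypothetical. The exact wording of the final error is not asserted here, since how messages from earlier alternatives are combined is an implementation detail.

```roc
# Hypothetical parser: two alternatives plus a descriptive base case.
colorOrFail : Parser (List U8) [Red, Green]
colorOrFail =
    oneOf [
        const Red |> skip (string "red"),
        const Green |> skip (string "green"),
        # Reached only if every alternative above has failed.
        fail "expected a color: red or green",
    ]

expect parseStr colorOrFail "green" == Ok Green
expect parseStr colorOrFail "purple" |> Result.isErr
```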

const : a -> Parser * a

Parser that will always produce the given a, without looking at the actual input. This is useful as a basic building block, especially in combination with map and apply.

parseU32 : Parser (List U8) U32
parseU32 =
    const Num.toU32
    |> keep digits

expect parseStr parseU32 "123" == Ok 123u32

alt : Parser input a, Parser input a -> Parser input a

Try the first parser and (only) if it fails, try the second parser as fallback.
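As an illustrative sketch (yes, no and yesOrNo are hypothetical names), alt can combine two parsers that produce the same type:

```roc
yes = const Bool.true |> skip (string "yes")
no = const Bool.false |> skip (string "no")

# Try yes first; fall back to no only if yes fails.
yesOrNo : Parser (List U8) Bool
yesOrNo = alt yes no

expect parseStr yesOrNo "yes" == Ok Bool.true
expect parseStr yesOrNo "no" == Ok Bool.false
```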

apply : Parser input (a -> b), Parser input a -> Parser input b

Runs a parser building a function, then a parser building a value, and finally returns the result of calling the function with the value.

This is useful if you are building up a structure that requires more parameters than there are variants of map, map2, map3 etc. for.

For instance, the following two are the same:

map3 String.digits String.digits String.digits (\x, y, z -> Triple x y z)

const (\x -> \y -> \z -> Triple x y z)
|> apply String.digits
|> apply String.digits
|> apply String.digits

Indeed, this is how map, map2, map3 etc. are implemented under the hood.

Currying

Be aware that when using apply, you need to explicitly 'curry' the parameters to the construction function. This means that instead of writing \x, y, z -> ... you'll need to write \x -> \y -> \z -> .... This is because the parameters of the function will be applied one by one as parsing continues.
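A small runnable sketch of the curried style, assuming digits and codeunit from the String module are in scope; the parser name pair is hypothetical:

```roc
# Hypothetical parser for input such as "3,4".
pair : Parser (List U8) { x : U64, y : U64 }
pair =
    # Curried constructor: one parameter is supplied per apply step.
    const (\x -> \y -> { x, y })
    |> apply digits
    |> skip (codeunit ',')
    |> apply digits

expect parseStr pair "3,4" == Ok { x: 3, y: 4 }
```

After the first apply, the parser holds the partially applied function \y -> { x, y }; the second apply supplies y and completes the record.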

oneOf : List (Parser input a) -> Parser input a

Try a list of parsers in turn, until one of them succeeds.

color : Parser Utf8 [Red, Green, Blue]
color =
    oneOf [
        const Red |> skip (string "red"),
        const Green |> skip (string "green"),
        const Blue |> skip (string "blue"),
    ]

expect parseStr color "green" == Ok Green

map : Parser input a, (a -> b) -> Parser input b

Transforms the result of parsing into something else, using the given transformation function.
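For instance, a sketch that doubles a parsed number (doubled is a hypothetical name; digits is assumed to be in scope):

```roc
doubled : Parser (List U8) U64
doubled = digits |> map \n -> n * 2

expect parseStr doubled "21" == Ok 42
```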

map2 : Parser input a, Parser input b, (a, b -> c) -> Parser input c

Transforms the result of parsing into something else, using the given two-parameter transformation function.

map3 : Parser input a, Parser input b, Parser input c, (a, b, c -> d) -> Parser input d

Transforms the result of parsing into something else, using the given three-parameter transformation function.

If you need transformations with more inputs, take a look at apply.
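A sketch of map2 combining two parsed numbers, following the signature above (parsers first, transformation function last). The name sum is hypothetical; digits and codeunit are assumed to be in scope.

```roc
# Parses input such as "2+3" and adds the two numbers.
sum : Parser (List U8) U64
sum =
    map2
        (digits |> skip (codeunit '+'))
        digits
        (\a, b -> a + b)

expect parseStr sum "2+3" == Ok 5
```

map3 works the same way with three parsers and a three-parameter function.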

flatten : Parser input (Result a Str) -> Parser input a

Removes a layer of Result from running the parser.

Use this to map functions that return a result over the parser, where errors are turned into ParsingFailures.

# Parse a number from a List U8
u64 : Parser Utf8 U64
u64 =
    chompWhile (\b -> b >= '0' && b <= '9')
    |> map \bytes ->
        val = strFromUtf8 bytes
        when Str.toU64 val is
            Ok num -> Ok num
            Err _ -> Err "$(val) is not a U64."
    |> flatten

lazy : ({} -> Parser input a) -> Parser input a

Runs a parser lazily.

This is (only) useful when dealing with a recursive structure. For instance, consider a type Comment : { message: Str, responses: List Comment }. Without lazy, you would ask the compiler to build an infinitely deep parser, which results in a compiler error.

maybe : Parser input a -> Parser input (Result a [Nothing])
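As a sketch of how maybe might be used, assuming (as the signature suggests) that it yields Ok with the parsed value when the inner parser succeeds, and Err Nothing without consuming input when it does not. optionalSign is a hypothetical name.

```roc
# Hypothetical parser for an optional leading minus sign.
optionalSign : Parser (List U8) (Result U8 [Nothing])
optionalSign = maybe (codeunit '-')

# Sign present: it is consumed and returned.
expect parsePartial optionalSign (Str.toUtf8 "-5") == Ok { val: Ok '-', input: Str.toUtf8 "5" }

# Sign absent: the parser still succeeds and leaves the input untouched.
expect parsePartial optionalSign (Str.toUtf8 "5") == Ok { val: Err Nothing, input: Str.toUtf8 "5" }
```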

many : Parser input a -> Parser input (List a)

A parser which runs the element parser zero or more times on the input, returning a list containing all the parsed elements.

Also see Parser.oneOrMore.

oneOrMore : Parser input a -> Parser input (List a)

A parser which runs the element parser one or more times on the input, returning a list containing all the parsed elements.

Also see Parser.many.
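The difference between many and oneOrMore shows up on input with no matching elements. A brief sketch, assuming codeunit is in scope:

```roc
# many accepts zero matches and yields an empty list.
expect parseStr (many (codeunit 'a')) "" == Ok []

# oneOrMore collects every match...
expect parseStr (oneOrMore (codeunit 'a')) "aa" == Ok ['a', 'a']

# ...but fails when there is not at least one.
expect parseStr (oneOrMore (codeunit 'a')) "" |> Result.isErr
```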

between : Parser input a, Parser input open, Parser input close -> Parser input a

Runs a parser for an 'opening' delimiter, then your main parser, then the 'closing' delimiter, and only returns the result of your main parser.

Useful for recognizing structures surrounded by delimiters (such as braces, parentheses, or quotes).

betweenBraces = \parser -> parser |> between (scalar '[') (scalar ']')
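A complete sketch with a test, using codeunit for the delimiters (as in the sepBy example) and digits as the main parser. bracketed is a hypothetical name.

```roc
# Parses a number wrapped in square brackets and returns only the number.
bracketed : Parser (List U8) U64
bracketed = digits |> between (codeunit '[') (codeunit ']')

expect parseStr bracketed "[42]" == Ok 42

# A missing closing delimiter makes the whole parser fail.
expect parseStr bracketed "[42" |> Result.isErr
```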

sepBy1 : Parser input a, Parser input sep -> Parser input (List a)

sepBy : Parser input a, Parser input sep -> Parser input (List a)

parseNumbers : Parser (List U8) (List U64)
parseNumbers =
    digits |> sepBy (codeunit ',')

expect parseStr parseNumbers "1,2,3" == Ok [1,2,3]
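The two variants differ on empty input. The sketch below assumes that sepBy accepts zero elements while sepBy1 requires at least one, which is what the names suggest; digits and codeunit are assumed to be in scope.

```roc
# sepBy: an empty input yields an empty list.
expect parseStr (digits |> sepBy (codeunit ',')) "" == Ok []

# sepBy1: at least one element is required, so empty input is an error.
expect parseStr (digits |> sepBy1 (codeunit ',')) "" |> Result.isErr

# With one or more elements the two behave alike.
expect parseStr (digits |> sepBy1 (codeunit ',')) "1,2" == Ok [1, 2]
```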

ignore : Parser input a -> Parser input {}

keep : Parser input (a -> b), Parser input a -> Parser input b

skip : Parser input a, Parser input * -> Parser input a
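These three are typically used together in a pipeline: keep feeds a parsed value into the function being built up, skip runs a parser but discards its result, and ignore (judging by its signature) replaces a parser's result with {}. A minimal sketch, with pixels as a hypothetical name:

```roc
# Keep the number, skip the unit suffix.
pixels : Parser (List U8) U64
pixels =
    const (\n -> n)
    |> keep digits
    |> skip (string "px")

expect parseStr pixels "16px" == Ok 16

# ignore still consumes the input but throws the value away.
expect parseStr (ignore pixels) "16px" == Ok {}
```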

chompUntil : a -> Parser (List a) (List a) where a implements Eq

Match zero or more codeunits until it reaches the given codeunit. The given codeunit is not included in the match.

This can be used with Parser.skip to ignore text.

ignoreText : Parser (List U8) U64
ignoreText =
    const (\d -> d)
    |> skip (chompUntil ':')
    |> skip (codeunit ':')
    |> keep digits

expect parseStr ignoreText "ignore preceding text:123" == Ok 123

This can be used with Parser.keep to capture a list of U8 codeunits.

captureText : Parser (List U8) (List U8)
captureText =
    const (\codeunits -> codeunits)
    |> keep (chompUntil ':')
    |> skip (codeunit ':')

expect parseStr captureText "Roc:" == Ok ['R', 'o', 'c']

Use String.strFromUtf8 to turn the results into a Str.

Also see Parser.chompWhile.

chompWhile : (a -> Bool) -> Parser (List a) (List a) where a implements Eq

Match zero or more codeunits until the check returns false. The codeunit that returned false is not included in the match. Note: a chompWhile parser always succeeds!

This can be used with Parser.skip to ignore text. This is useful for chomping whitespace or variable names.

ignoreNumbers : Parser (List U8) Str
ignoreNumbers =
    const (\str -> str)
    |> skip (chompWhile \b -> b >= '0' && b <= '9')
    |> keep (string "TEXT")

expect parseStr ignoreNumbers "0123456789876543210TEXT" == Ok "TEXT"

This can be used with Parser.keep to capture a list of U8 codeunits.

captureNumbers : Parser (List U8) (List U8)
captureNumbers =
    const (\codeunits -> codeunits)
    |> keep (chompWhile \b -> b >= '0' && b <= '9')
    |> skip (string "TEXT")

expect parseStr captureNumbers "123TEXT" == Ok ['1', '2', '3']

Use String.strFromUtf8 to turn the results into a Str.

Also see Parser.chompUntil.