diff --git a/CHANGELOG.md b/CHANGELOG.md index 204c4a9..c6f056d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,10 +8,19 @@ Breaking changes: New features: +- `Parser.String.rest` (#140 by @jamesdbrock) +- `Parser.String.takeN` (#140 by @jamesdbrock) +- `Parser.Token.eof` (#140 by @jamesdbrock) + Bugfixes: +- `Parser.String.eof` Set consumed on success so that this parser combines + correctly with `notFollowedBy eof`. Added a test for this. (#140 by @jamesdbrock) + Other improvements: +- Documentation. (#140 by @jamesdbrock) + ## [v8.1.0](https://github.com/purescript-contrib/purescript-parsing/releases/tag/v8.1.0) - 2022-01-10 Other improvements: README Quick start monadic parsing tutorial diff --git a/README.md b/README.md index e4178f4..217c120 100644 --- a/README.md +++ b/README.md @@ -6,7 +6,8 @@ [![Maintainer: jamesdbrock](https://img.shields.io/badge/maintainer-jamesdbrock-teal.svg)](https://github.com/jamesdbrock) [![Maintainer: robertdp](https://img.shields.io/badge/maintainer-robertdp-teal.svg)](https://github.com/robertdp) -A monadic parser combinator library based on Haskell's [Parsec](https://hackage.haskell.org/package/parsec). +A monadic parser combinator library based on Haskell’s +[Parsec](https://hackage.haskell.org/package/parsec). ## Installation @@ -22,26 +23,41 @@ Here is a basic tutorial introduction to monadic parsing with this package. ### Parsers -A parser turns a string into a data structure. Parsers in this library have the type `Parser s a`, where `s` is the type of the input string, and `a` is the type of the data which the parser will produce on success. `Parser s a` is a monad. It’s defined in the module `Text.Parsing.Parser`. +A parser turns a string into a data structure. Parsers in this library have the type `Parser s a`, where `s` is the type of the input string, and `a` is the type of the data which the parser will produce on success. `Parser s` is a monad. It’s defined in the module `Text.Parsing.Parser`. -Monads can be used to provide context for a computation, and that’s how we use them in monadic parsing. The context provided by the `Parser` monad is *the parser’s current location in the input string*. Parsing starts at the beginning of the input string. +Monads can be used to provide context for a computation, and that’s how we use them in monadic parsing. +The context provided by the `Parser s` monad is __the parser’s current location in the input string__. +Parsing starts at the beginning of the input string. -Parsing requires two more capabilities: *choice* and *failure*. +Parsing requires two more capabilities: __alternative__ and __failure__. -We need *choice* to be able to make decisions about what kind of thing we’re parsing depending on the input which we encouter. This is provided by the `Alt` typeclass instance of the `Parser` monad, particularly the `<|>` operator. That operator will first try the left parser and if that fails, then it will backtrack the input string and try the right parser. +We need __alternative__ to be able to choose what kind of thing we’re parsing depending +on the input which we encounter. This is provided by the `<|>` “alt” +operator of the `Alt` typeclass instance of the `Parser s` monad. +The expression `p_left <|> p_right` will first try the `p_left` parser and if that fails +__and consumes no input__ then it will try the `p_right` parser. -We need *failure* in case the input stream is not parseable. This is provided by the `fail` function, which calls the `throwError` function of the `MonadThrow` typeclass instance of the `Parser` monad. The result of running a parser has type `Either ParseError a`, so if the parse succeeds then the result is `Right a` and if the parse fails then the result is `Left ParseError`. +We need __failure__ in case the input stream is not parseable. This is provided by the `fail` +function, which calls the `throwError` function of the `MonadThrow` typeclass instance of +the `Parser s` monad. - -### Running a parser - -To run a parser, call the function `runParser :: s -> Parser s a -> Either ParseError a` in the `Text.Parsing.Parser` module, and supply it with an input string and a parser. +To run a parser, call the function `runParser :: s -> Parser s a -> Either ParseError a` in +the `Text.Parsing.Parser` module, and supply it with an input string and a parser. +If the parse succeeds then the result is `Right a` and if the parse fails then the +result is `Left ParseError`. ### Primitive parsers -Each type of input string needs primitive parsers. Primitive parsers for input string type `String` are in the `Text.Parsing.Parser.String` module. We can use these primitive parsers to write other `String` parsers. +Each type of input string needs primitive parsers. +Primitive parsers for input string type `String` are in the `Text.Parsing.Parser.String` module. +We can use these primitive parsers to write other `String` parsers. -Here is a parser `ayebee :: Parser String Boolean` which will accept only two input strings: `"ab"` or `"aB"`. It will return `true` if the `b` character is uppercase. It will return `false` if the `b` character is lowercase. It will fail with a `ParseError` if the input string is anything else. This parser is written in terms of the primitive parser `char :: Parser String Char`. +Here is a parser `ayebee :: Parser String Boolean` which will accept only two input +strings: `"ab"` or `"aB"`. +It will return `true` if the `b` character is uppercase. +It will return `false` if the `b` character is lowercase. +It will fail with a `ParseError` if the input string is anything else. +This parser is written in terms of the primitive parser `char :: Parser String Char`. ```purescript ayebee :: Parser String Boolean @@ -61,24 +77,33 @@ and then the parser will succeed and return `Right true`. #### [✨ Run the `ayebee` parser in your browser on *Try PureScript!*](https://try.purescript.org/?github=/purescript-contrib/purescript-parsing/main/docs/examples/QuickStart.purs) -When you write a real parser you will usually want to return a more complicated data structure than a single `Boolean`. See [*Parse, don't validate*](https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/). - ### More parsers There are other `String` parsers in the module `Text.Parsing.Parser.Token`, for example the parser `letter :: Parser String Char` which will accept any single alphabetic letter. ### Parser combinators -A parser combinator is a function which takes a parser as an argument and returns a new parser. The `many` combinator, for example, will repeat a parser as many times as it can. So the parser `many letter` will have type `Parser String (Array Char)`. Parser combinators are in this package in the module `Text.Parsing.Parser.Combinators`. +A parser combinator is a function which takes a parser as an argument and returns a new parser. The `many` combinator, for example, will repeat a parser as many times as it can. So the parser `many letter` will have type `Parser String (Array Boolean)`. Running that parser + +```purescript +runParser "aBabaB" (many ayebee) +``` + +will return `Right [true, false, true]`. + +Parser combinators are in this package in the module `Text.Parsing.Parser.Combinators`. ## Further reading -Here is the original short classic [FUNCTIONAL PEARLS *Monadic Parsing in Haskell*](https://www.cs.nott.ac.uk/~pszgmh/pearl.pdf) by Graham Hutton and Erik Meijer. +Here is the original short classic [FUNCTIONAL PEARLS *Monadic Parsing in Haskell*](https://www.cs.nott.ac.uk/~pszgmh/pearl.pdf) by Graham Hutton and Erik Meijer. [*Revisiting Monadic Parsing in Haskell*](https://vaibhavsagar.com/blog/2018/02/04/revisiting-monadic-parsing-haskell/) by Vaibhav Sagar is a reflection on the Hutton, Meijer FUNCTIONAL PEARL. [*Parse, don't validate*](https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/) by Alexis King is about what it means to “parse” something, without any mention of monads. +[*Parsec: “try a <|> b” considered harmful*](http://blog.ezyang.com/2014/05/parsec-try-a-or-b-considered-harmful/) by Edward Z. Yang is about how to decide when to backtrack +from a failed alternative. + There are lots of other great monadic parsing tutorials on the internet. ## Related Packages diff --git a/src/Text/Parsing/Parser.purs b/src/Text/Parsing/Parser.purs index aafd63e..98cfee1 100644 --- a/src/Text/Parsing/Parser.purs +++ b/src/Text/Parsing/Parser.purs @@ -51,6 +51,18 @@ derive instance ordParseError :: Ord ParseError -- | Contains the remaining input and current position. data ParseState s = ParseState s Position Boolean +-- ParseState constructor has three parameters, +-- s: the remaining input +-- Position: the current position +-- Boolean: the consumed flag. +-- +-- The consumed flag is used to implement the rule for `alt` that +-- * If the left parser fails *without consuming any input*, then backtrack and try the right parser. +-- * If the left parser fails and consumes input, then fail immediately. +-- +-- https://hackage.haskell.org/package/parsec/docs/Text-Parsec.html#v:try +-- +-- http://blog.ezyang.com/2014/05/parsec-try-a-or-b-considered-harmful/ -- | The Parser monad transformer. -- | @@ -105,12 +117,25 @@ derive newtype instance monadStateParserT :: Monad m => MonadState (ParseState s derive newtype instance monadThrowParserT :: Monad m => MonadThrow ParseError (ParserT s m) derive newtype instance monadErrorParserT :: Monad m => MonadError ParseError (ParserT s m) +-- | The alternative `Alt` instance provides the `alt` combinator `<|>`. +-- | +-- | The expression `p_left <|> p_right` will first try the `p_left` parser and if that fails +-- | __and consumes no input__ then it will try the `p_right` parser. +-- | +-- | While we are parsing down the `p_left` branch we may reach a point where +-- | we know this is the correct branch, but we cannot parse further. At +-- | that point we want to fail the entire parse instead of trying the `p_right` +-- | branch. To control the point at which we commit to the `p_left` branch +-- | use the `try` combinator. +-- | +-- | The `alt` combinator works this way because it gives us good localized +-- | error messages while also allowing an efficient implementation. instance altParserT :: Monad m => Alt (ParserT s m) where alt p1 p2 = (ParserT <<< ExceptT <<< StateT) \(s@(ParseState i p _)) -> do - Tuple e s'@(ParseState _ _ c') <- runStateT (runExceptT (unwrap p1)) (ParseState i p false) + Tuple e s'@(ParseState _ _ consumed) <- runStateT (runExceptT (unwrap p1)) (ParseState i p false) case e of Left _ - | not c' -> runStateT (runExceptT (unwrap p2)) s + | not consumed -> runStateT (runExceptT (unwrap p2)) s _ -> pure (Tuple e s') instance plusParserT :: Monad m => Plus (ParserT s m) where @@ -147,4 +172,3 @@ failWithPosition message pos = throwError (ParseError message pos) -- | `region` as the parser backs out the call stack. region :: forall m s a. Monad m => (ParseError -> ParseError) -> ParserT s m a -> ParserT s m a region context p = catchError p $ \err -> throwError $ context err - diff --git a/src/Text/Parsing/Parser/Combinators.purs b/src/Text/Parsing/Parser/Combinators.purs index 10e977e..ba31162 100644 --- a/src/Text/Parsing/Parser/Combinators.purs +++ b/src/Text/Parsing/Parser/Combinators.purs @@ -1,24 +1,25 @@ --- | Combinators for creating parsers. +-- | Parser combinators. A parser combinator is a function which takes some +-- | parsers as arguments and returns a new parser. -- | --- | ### Notes --- | --- | A few of the known combinators from Parsec are missing in this module. That +-- | A few of the well-known combinators from Haskell Parsec are missing in this module. That -- | is because they have already been defined in other libraries. -- | --- | ```purescript --- | Text.Parsec.many = Data.(Array|List).many --- | Text.Parsec.(<|>) = Control.Alt.alt (<|>) --- | ``` +-- | * `Text.Parsec.many` ⟶ `Data.Array.many` or `Data.List.many` +-- | * `Text.Parsec.(<|>)` ⟶ `Control.Alt.(<|>)` -- | --- | Because Strings are not Char Arrays in PureScript `many` and `some` on Char Parsers need to --- | be used in conjunction with `Data.String.CodeUnits.fromCharArray` to achieve "Parsec-like" results. +-- | Note that `Data.(Array|List).(many|some)` are not stack safe. If you need to parse +-- | large numbers of items then consider +-- | using `Data.(Array|List).(manyRec|someRec)` instead. -- | --- | ```purescript --- | Text.Parsec.many (char 'x') <=> fromCharArray <$> Data.Array.many (char 'x') +-- | Because `String`s are not `Char` arrays in PureScript, `many` and `some` on `Char` parsers need to +-- | be used in conjunction with `Data.String.CodeUnits.fromCharArray` to +-- | construct a `String`. +-- | +-- | ``` +-- | fromCharArray <$> Data.Array.many (char 'x') -- | ``` -- | --- | Note that `Data.(Array|List).(many|some)` are not stack safe. If you need to parse --- | large numbers of items then consider using `Data.List.(manyRec|someRec)` instead. +-- | To repeat a parser exactly *N* times use `Data.Unfoldable.replicateA`. module Text.Parsing.Parser.Combinators where @@ -87,7 +88,19 @@ optional p = void p <|> pure unit optionMaybe :: forall m s a. Monad m => ParserT s m a -> ParserT s m (Maybe a) optionMaybe p = option Nothing (Just <$> p) --- | In case of failure, reset the stream to the unconsumed state. +-- | If the parser fails then reset the stream to the unconsumed state. +-- | +-- | One use for this combinator is to ensure that the right parser of an +-- | alternative will always be tried when the left parser fails. +-- | ``` +-- | >>> runParser "ac" ((char 'a' *> char 'b') <|> (char 'a' *> char 'c')) +-- | Left (ParseError "Expected 'b'" (Position { line: 1, column: 2 })) +-- | ``` +-- | +-- | ``` +-- | >>> runParser "ac" (try (char 'a' *> char 'b') <|> (char 'a' *> char 'c')) +-- | Right 'c' +-- | ``` try :: forall m s a. Monad m => ParserT s m a -> ParserT s m a try p = (ParserT <<< ExceptT <<< StateT) \(s@(ParseState _ _ consumed)) -> do Tuple e s'@(ParseState input position _) <- runStateT (runExceptT (unwrap p)) s @@ -95,7 +108,19 @@ try p = (ParserT <<< ExceptT <<< StateT) \(s@(ParseState _ _ consumed)) -> do Left _ -> pure (Tuple e (ParseState input position consumed)) _ -> pure (Tuple e s') --- | Like `try`, but will reannotate the error location to the `try` point. +-- | If the parser fails then reset the stream to the unconsumed state. +-- | +-- | Like `try`, but will relocate the error to the `try` point. +-- | +-- | ``` +-- | >>> runParser "ac" (try (char 'a' *> char 'b')) +-- | Left (ParseError "Expected 'b'" (Position { line: 1, column: 2 })) +-- | ``` +-- | +-- | ``` +-- | >>> runParser "ac" (tryRethrow (char 'a' *> char 'b')) +-- | Left (ParseError "Expected 'b'" (Position { line: 1, column: 1 })) +-- | ``` tryRethrow :: forall m s a. Monad m => ParserT s m a -> ParserT s m a tryRethrow p = (ParserT <<< ExceptT <<< StateT) \(s@(ParseState _ position consumed)) -> do Tuple e s'@(ParseState input' position' _) <- runStateT (runExceptT (unwrap p)) s @@ -113,7 +138,9 @@ lookAhead p = (ParserT <<< ExceptT <<< StateT) \s -> do many1 :: forall m s a. Monad m => ParserT s m a -> ParserT s m (NonEmptyList a) many1 p = NEL.cons' <$> p <*> many p --- | Stack-safe version of `many1` at the expense of `MonadRec` constraint +-- | Match one or more times. +-- | +-- | Stack-safe version of `many1` at the expense of `MonadRec` constraint. many1Rec :: forall m s a. MonadRec m => ParserT s m a -> ParserT s m (NonEmptyList a) many1Rec p = NEL.cons' <$> p <*> manyRec p @@ -127,7 +154,9 @@ many1Rec p = NEL.cons' <$> p <*> manyRec p sepBy :: forall m s a sep. Monad m => ParserT s m a -> ParserT s m sep -> ParserT s m (List a) sepBy p sep = map NEL.toList (sepBy1 p sep) <|> pure Nil --- | Stack-safe version of `sepBy` at the expense of `MonadRec` constraint +-- | Parse phrases delimited by a separator. +-- | +-- | Stack-safe version of `sepBy` at the expense of `MonadRec` constraint. sepByRec :: forall m s a sep. MonadRec m => ParserT s m a -> ParserT s m sep -> ParserT s m (List a) sepByRec p sep = map NEL.toList (sepBy1Rec p sep) <|> pure Nil @@ -138,7 +167,9 @@ sepBy1 p sep = do as <- many $ sep *> p pure (NEL.cons' a as) --- | Stack-safe version of `sepBy1` at the expense of `MonadRec` constraint +-- | Parse phrases delimited by a separator, requiring at least one match. +-- | +-- | Stack-safe version of `sepBy1` at the expense of `MonadRec` constraint. sepBy1Rec :: forall m s a sep. MonadRec m => ParserT s m a -> ParserT s m sep -> ParserT s m (NonEmptyList a) sepBy1Rec p sep = do a <- p @@ -149,7 +180,9 @@ sepBy1Rec p sep = do sepEndBy :: forall m s a sep. Monad m => ParserT s m a -> ParserT s m sep -> ParserT s m (List a) sepEndBy p sep = map NEL.toList (sepEndBy1 p sep) <|> pure Nil --- | Stack-safe version of `sepEndBy` at the expense of `MonadRec` constraint +-- | Parse phrases delimited and optionally terminated by a separator. +-- | +-- | Stack-safe version of `sepEndBy` at the expense of `MonadRec` constraint. sepEndByRec :: forall m s a sep. MonadRec m => ParserT s m a -> ParserT s m sep -> ParserT s m (List a) sepEndByRec p sep = map NEL.toList (sepEndBy1Rec p sep) <|> pure Nil @@ -163,7 +196,9 @@ sepEndBy1 p sep = do pure (NEL.cons' a as) ) <|> pure (NEL.singleton a) --- | Stack-safe version of `sepEndBy1` at the expense of `MonadRec` constraint +-- | Parse phrases delimited and optionally terminated by a separator, requiring at least one match. +-- | +-- | Stack-safe version of `sepEndBy1` at the expense of `MonadRec` constraint. sepEndBy1Rec :: forall m s a sep. MonadRec m => ParserT s m a -> ParserT s m sep -> ParserT s m (NonEmptyList a) sepEndBy1Rec p sep = do a <- p @@ -184,7 +219,9 @@ sepEndBy1Rec p sep = do endBy1 :: forall m s a sep. Monad m => ParserT s m a -> ParserT s m sep -> ParserT s m (NonEmptyList a) endBy1 p sep = many1 $ p <* sep --- | Stack-safe version of `endBy1` at the expense of `MonadRec` constraint +-- | Parse phrases delimited and terminated by a separator, requiring at least one match. +-- | +-- | Stack-safe version of `endBy1` at the expense of `MonadRec` constraint. endBy1Rec :: forall m s a sep. MonadRec m => ParserT s m a -> ParserT s m sep -> ParserT s m (NonEmptyList a) endBy1Rec p sep = many1Rec $ p <* sep @@ -192,7 +229,9 @@ endBy1Rec p sep = many1Rec $ p <* sep endBy :: forall m s a sep. Monad m => ParserT s m a -> ParserT s m sep -> ParserT s m (List a) endBy p sep = many $ p <* sep --- | Stack-safe version of `endBy` at the expense of `MonadRec` constraint +-- | Parse phrases delimited and terminated by a separator. +-- | +-- | Stack-safe version of `endBy` at the expense of `MonadRec` constraint. endByRec :: forall m s a sep. MonadRec m => ParserT s m a -> ParserT s m sep -> ParserT s m (List a) endByRec p sep = manyRec $ p <* sep @@ -206,6 +245,8 @@ endByRec p sep = manyRec $ p <* sep chainr :: forall m s a. Monad m => ParserT s m a -> ParserT s m (a -> a -> a) -> a -> ParserT s m a chainr p f a = chainr1 p f <|> pure a +-- | Parse phrases delimited by a right-associative operator. +-- | -- | Stack-safe version of `chainr` at the expense of `MonadRec` constraint. chainrRec :: forall m s a. MonadRec m => ParserT s m a -> ParserT s m (a -> a -> a) -> a -> ParserT s m a chainrRec p f a = chainr1Rec p f <|> pure a @@ -220,6 +261,8 @@ chainrRec p f a = chainr1Rec p f <|> pure a chainl :: forall m s a. Monad m => ParserT s m a -> ParserT s m (a -> a -> a) -> a -> ParserT s m a chainl p f a = chainl1 p f <|> pure a +-- | Parse phrases delimited by a left-associative operator. +-- | -- | Stack-safe version of `chainl` at the expense of `MonadRec` constraint. chainlRec :: forall m s a. MonadRec m => ParserT s m a -> ParserT s m (a -> a -> a) -> a -> ParserT s m a chainlRec p f a = chainl1Rec p f <|> pure a @@ -237,6 +280,8 @@ chainl1 p f = do chainl1' (f' a a') ) <|> pure a +-- | Parse phrases delimited by a left-associative operator, requiring at least one match. +-- | -- | Stack-safe version of `chainl1` at the expense of `MonadRec` constraint. chainl1Rec :: forall m s a. MonadRec m => ParserT s m a -> ParserT s m (a -> a -> a) -> ParserT s m a chainl1Rec p f = do @@ -265,6 +310,8 @@ chainr1 p f = do pure $ f' a a' ) <|> pure a +-- | Parse phrases delimited by a right-associative operator, requiring at least one match. +-- | -- | Stack-safe version of `chainr1` at the expense of `MonadRec` constraint. chainr1Rec :: forall m s a. MonadRec m => ParserT s m a -> ParserT s m (a -> a -> a) -> ParserT s m a chainr1Rec p f = do @@ -316,6 +363,8 @@ choice = foldl (<|>) empty skipMany :: forall s a m. Monad m => ParserT s m a -> ParserT s m Unit skipMany p = skipMany1 p <|> pure unit +-- | Skip many instances of a phrase. +-- | -- | Stack-safe version of `skipMany` at the expense of `MonadRec` constraint. skipManyRec :: forall s a m. MonadRec m => ParserT s m a -> ParserT s m Unit skipManyRec p = skipMany1Rec p <|> pure unit @@ -327,6 +376,8 @@ skipMany1 p = do _ <- skipMany p pure unit +-- | Skip at least one instance of a phrase. +-- | -- | Stack-safe version of `skipMany1` at the expense of `MonadRec` constraint. skipMany1Rec :: forall s a m. MonadRec m => ParserT s m a -> ParserT s m Unit skipMany1Rec p = p *> tailRecM go unit @@ -346,6 +397,8 @@ manyTill p end = scan xs <- scan pure (x : xs) +-- | Parse several phrases until the specified terminator matches. +-- | -- | Stack-safe version of `manyTill` at the expense of `MonadRec` constraint manyTillRec :: forall s a m e. MonadRec m => ParserT s m a -> ParserT s m e -> ParserT s m (List a) manyTillRec p end = tailRecM go Nil @@ -362,6 +415,8 @@ many1Till p end = do xs <- manyTill p end pure (NEL.cons' x xs) +-- | Parse several phrases until the specified terminator matches, requiring at least one match. +-- | -- | Stack-safe version of `many1Till` at the expense of `MonadRec` constraint many1TillRec :: forall s a m e. MonadRec m => ParserT s m a -> ParserT s m e -> ParserT s m (NonEmptyList a) many1TillRec p end = NEL.cons' <$> p <*> manyTillRec p end diff --git a/src/Text/Parsing/Parser/String.purs b/src/Text/Parsing/Parser/String.purs index 96fb8a2..e5e0421 100644 --- a/src/Text/Parsing/Parser/String.purs +++ b/src/Text/Parsing/Parser/String.purs @@ -1,5 +1,10 @@ -- | Primitive parsers for working with an input stream of type `String`. -- | +-- | All of these primitive parsers will consume their input when they succeed. +-- | +-- | All of these primitive parsers will consume no input when they +-- | fail. +-- | -- | The behavior of these primitive parsers is based on the behavior of the -- | `Data.String` module in the __strings__ package. -- | In most JavaScript runtime environments, the `String` @@ -17,11 +22,13 @@ module Text.Parsing.Parser.String ( string , eof + , rest , anyChar , anyCodePoint , satisfy , satisfyCodePoint , char + , takeN , whiteSpace , skipSpaces , oneOf @@ -39,19 +46,29 @@ import Data.Char (fromCharCode) import Data.CodePoint.Unicode (isSpace) import Data.Foldable (elem) import Data.Maybe (Maybe(..)) -import Data.String (CodePoint, Pattern(..), null, singleton, stripPrefix, uncons) +import Data.String (CodePoint, Pattern(..), length, null, singleton, splitAt, stripPrefix, uncons) import Data.String.CodeUnits as SCU import Data.Tuple (Tuple(..), fst) -import Text.Parsing.Parser (ParseState(..), ParserT, fail) +import Text.Parsing.Parser (ParseState(..), ParserT, consume, fail) import Text.Parsing.Parser.Combinators (skipMany, tryRethrow, (), (<~?>)) import Text.Parsing.Parser.Pos (Position(..)) import Unsafe.Coerce (unsafeCoerce) --- | Match end-of-file. +-- | Match “end-of-file,” the end of the input stream. eof :: forall m. Monad m => ParserT String m Unit eof = do ParseState input _ _ <- get - unless (null input) (fail "Expected EOF") + if (null input) + -- We must consume so this combines correctly with notFollowedBy + then consume + else (fail "Expected EOF") + +-- | Match the entire rest of the input stream. Always succeeds. +rest :: forall m. Monad m => ParserT String m String +rest = do + ParseState input position _ <- get + put $ ParseState "" (updatePosString position input) true + pure input -- | Match the specified string. string :: forall m. Monad m => String -> ParserT String m String @@ -106,12 +123,22 @@ satisfyCodePoint f = tryRethrow do char :: forall m. Monad m => Char -> ParserT String m Char char c = satisfy (_ == c) show c +-- | Match a `String` exactly *N* characters long. +takeN :: forall m. Monad m => Int -> ParserT String m String +takeN n = do + ParseState input position _ <- get + let { before, after } = splitAt n input + if length before == n then do + put $ ParseState after (updatePosString position before) true + pure before + else fail ("Could not take " <> show n <> " characters") + -- | Match zero or more whitespace characters satisfying --- | `Data.CodePoint.Unicode.isSpace`. +-- | `Data.CodePoint.Unicode.isSpace`. Always succeeds. whiteSpace :: forall m. Monad m => ParserT String m String whiteSpace = fst <$> match skipSpaces --- | Skip whitespace characters. +-- | Skip whitespace characters and throw them away. Always succeeds. skipSpaces :: forall m. Monad m => ParserT String m Unit skipSpaces = skipMany (satisfyCodePoint isSpace) diff --git a/src/Text/Parsing/Parser/Token.purs b/src/Text/Parsing/Parser/Token.purs index 7d75fe3..3241817 100644 --- a/src/Text/Parsing/Parser/Token.purs +++ b/src/Text/Parsing/Parser/Token.purs @@ -4,6 +4,7 @@ module Text.Parsing.Parser.Token ( token , when , match + , eof , LanguageDef , GenLanguageDef(LanguageDef) , unGenLanguageDef @@ -23,7 +24,7 @@ module Text.Parsing.Parser.Token import Prelude hiding (between, when) import Control.Lazy (fix) -import Control.Monad.State (gets, modify_) +import Control.Monad.State (get, gets, modify_) import Control.MonadPlus (guard, (<|>)) import Data.Array as Array import Data.Char (fromCharCode, toCharCode) @@ -43,12 +44,12 @@ import Data.String.CodeUnits as SCU import Data.String.Unicode as Unicode import Data.Tuple (Tuple(..)) import Math (pow) -import Text.Parsing.Parser (ParseState(..), ParserT, fail) +import Text.Parsing.Parser (ParseState(..), ParserT, consume, fail) import Text.Parsing.Parser.Combinators (between, choice, notFollowedBy, option, sepBy, sepBy1, skipMany, skipMany1, try, tryRethrow, (), ()) import Text.Parsing.Parser.Pos (Position) import Text.Parsing.Parser.String (char, noneOf, oneOf, satisfy, satisfyCodePoint, string) --- | Create a parser which Returns the first token in the stream. +-- | A parser which returns the first token in the stream. token :: forall m a. Monad m => (a -> Position) -> ParserT (List a) m a token tokpos = do input <- gets \(ParseState input _ _) -> input @@ -59,7 +60,7 @@ token tokpos = do ParseState tail (tokpos head) true pure head --- | Create a parser which matches any token satisfying the predicate. +-- | A parser which matches any token satisfying the predicate. when :: forall m a. Monad m => (a -> Position) -> (a -> Boolean) -> ParserT (List a) m a when tokpos f = tryRethrow do a <- token tokpos @@ -70,6 +71,15 @@ when tokpos f = tryRethrow do match :: forall a m. Monad m => Eq a => (a -> Position) -> a -> ParserT (List a) m a match tokpos tok = when tokpos (_ == tok) +-- | Match the “end-of-file,” the end of the input stream. +eof :: forall a m. Monad m => ParserT (List a) m Unit +eof = do + ParseState input _ _ <- get + if (List.null input) + -- We must consume so this combines correctly with notFollowedBy + then consume + else (fail "Expected EOF") + type LanguageDef = GenLanguageDef String Identity -- | The `GenLanguageDef` type is a record that contains all parameterizable diff --git a/test/Main.purs b/test/Main.purs index aa5784f..4d5d525 100644 --- a/test/Main.purs +++ b/test/Main.purs @@ -18,13 +18,14 @@ import Data.Tuple (Tuple(..)) import Effect (Effect) import Effect.Console (logShow) import Test.Assert (assert') -import Text.Parsing.Parser (ParseError(..), Parser, ParserT, parseErrorMessage, parseErrorPosition, region, runParser) -import Text.Parsing.Parser.Combinators (between, chainl, chainl1Rec, chainlRec, chainr1Rec, chainrRec, endBy1, endBy1Rec, endByRec, many1Rec, many1TillRec, manyTillRec, optionMaybe, sepBy1, sepBy1Rec, sepByRec, sepEndBy1Rec, sepEndByRec, skipMany1Rec, skipManyRec, try) +import Text.Parsing.Parser (ParseError(..), Parser, ParserT, parseErrorMessage, parseErrorPosition, position, region, runParser) +import Text.Parsing.Parser.Combinators (between, chainl, chainl1Rec, chainlRec, chainr1Rec, chainrRec, endBy1, endBy1Rec, endByRec, many1Rec, many1TillRec, manyTillRec, notFollowedBy, optionMaybe, sepBy1, sepBy1Rec, sepByRec, sepEndBy1Rec, sepEndByRec, skipMany1Rec, skipManyRec, try) import Text.Parsing.Parser.Expr (Assoc(..), Operator(..), buildExprParser) import Text.Parsing.Parser.Language (haskellDef, haskellStyle, javaStyle) import Text.Parsing.Parser.Pos (Position(..), initialPos) -import Text.Parsing.Parser.String (anyChar, anyCodePoint, char, eof, noneOfCodePoints, oneOfCodePoints, satisfy, string, whiteSpace) +import Text.Parsing.Parser.String (anyChar, anyCodePoint, char, eof, noneOfCodePoints, oneOfCodePoints, rest, satisfy, string, takeN, whiteSpace) import Text.Parsing.Parser.Token (TokenParser, letter, makeTokenParser, match, token, when) +import Text.Parsing.Parser.Token as Parser.Token parens :: forall m a. Monad m => ParserT String m a -> ParserT String m a parens = between (string "(") (string ")") @@ -574,6 +575,17 @@ main = do parseTest "1*2+3/4-5" (-3) exprTest parseTest "ab?" "ab" manySatisfyTest + parseTest "ab" unit (char 'a' *> notFollowedBy (char 'a')) + + parseTest "rest" "rest" rest + parseTest "rest" unit (rest *> eof) + parseTest "rest\nrest" (Position { line: 2, column: 5 }) (rest *> position) + + parseErrorTestPosition + (rest *> notFollowedBy eof) + "aa\naa" + (Position { column: 3, line: 2 }) + parseErrorTestPosition anyChar "𝅘𝅥𝅯" @@ -591,6 +603,11 @@ main = do one <- Array.many $ oneOfCodePoints $ SCP.toCodePointArray "🤔💯✅" pure $ SCP.fromCodePointArray <$> [ none, one ] + parseTest "abcd" "ab" $ takeN 2 + parseTest "abcd" "" $ takeN 0 + parseErrorTestPosition (takeN 10) "abcd" (Position { column: 1, line: 1 }) + parseErrorTestPosition (takeN (-1)) "abcd" (Position { column: 1, line: 1 }) + parseErrorTestMessage (noneOfCodePoints $ SCP.toCodePointArray "❓✅") "❓" @@ -617,6 +634,8 @@ main = do parseTest (fromFoldable [ B ]) B (match tokpos B) parseTest (fromFoldable [ A, B ]) A (match tokpos A) + parseTest (fromFoldable []) unit Parser.Token.eof + parseErrorTestPosition (string "abc") "bcd" (Position { column: 1, line: 1 }) parseErrorTestPosition (string "abc" *> eof) "abcdefg" (Position { column: 4, line: 1 }) parseErrorTestPosition (string "a\nb\nc\n" *> eof) "a\nb\nc\nd\n" (Position { column: 1, line: 4 })