- Scan the input string *linearly*, without backtracking
- Peek at most one character (Unicode code point) ahead
- Store data (unescaped string literals and chars, identifiers, integers, floats) inside Token
- This unfortunately makes tokens non-Copy (see the sketch below)
- Refactor Parser to accommodate these changes
- On the bright side, Parser no longer needs a reference to the text!
- Write a new set of lexer tests
- TODO: write a new set of token tests using `tokendata`
Every day, we get closer to parsing `dummy.cl`!
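As a rough sketch of the new shape (the names `Token`, `Lexer`, and `scan_token` here are illustrative stand-ins, not the actual API): a `Peekable<Chars>` gives exactly one code point of lookahead, and owning the unescaped data inside the token is what costs `Copy`:

```rust
use std::{iter::Peekable, str::Chars};

// Owning the decoded data makes Token Clone-but-not-Copy.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Identifier(String),
    Integer(u128),
    // ... string/char literals, floats, operators, etc.
}

struct Lexer<'t> {
    iter: Peekable<Chars<'t>>, // at most one code point of lookahead
}

impl<'t> Lexer<'t> {
    fn new(text: &'t str) -> Self {
        Self { iter: text.chars().peekable() }
    }

    /// Scans one token strictly left to right: peek a single code point
    /// to pick a branch, then consume characters without backtracking.
    fn scan_token(&mut self) -> Option<Token> {
        match self.iter.peek()? {
            c if c.is_ascii_digit() => {
                let mut n: u128 = 0;
                while let Some(d) = self.iter.peek().and_then(|c| c.to_digit(10)) {
                    n = n * 10 + d as u128;
                    self.iter.next();
                }
                Some(Token::Integer(n))
            }
            c if c.is_alphabetic() || *c == '_' => {
                let mut id = String::new();
                while let Some(&c) = self.iter.peek() {
                    if !(c.is_alphanumeric() || c == '_') {
                        break;
                    }
                    id.push(c);
                    self.iter.next();
                }
                Some(Token::Identifier(id))
            }
            _ => None, // remaining token kinds elided from the sketch
        }
    }
}

fn main() {
    let mut lexer = Lexer::new("hello");
    assert_eq!(lexer.scan_token(), Some(Token::Identifier("hello".into())));
}
```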
- Unified math operations into a single self-referential enum
- Walk now visits the children of a node, rather than the node itself
- The old behavior was super confusing and led to numerous stack overflows (see the sketch below).
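A sketch of the new contract, using stand-in types (the real node type is surely richer than this `Expr`): one self-referential enum holds the math operations, and `walk` hands the callback each child rather than the node itself, so a visitor that recurses can't accidentally re-enter the node it already holds.

```rust
// Stand-in AST: one self-referential enum covers every math operation.
#[derive(Debug)]
enum Expr {
    Value(i64),
    Binary(Op, Box<Expr>, Box<Expr>),
}

#[derive(Debug, Clone, Copy)]
enum Op { Add, Sub, Mul, Div }

/// Visits the children of `node`, never `node` itself; the caller
/// handles the node it already holds.
fn walk(node: &Expr, visit: &mut impl FnMut(&Expr)) {
    if let Expr::Binary(_, lhs, rhs) = node {
        for child in [lhs, rhs] {
            visit(child);
            walk(child, visit);
        }
    }
}

fn main() {
    // 1 + 2 * 3, then count the nodes below the root.
    let expr = Expr::Binary(
        Op::Add,
        Box::new(Expr::Value(1)),
        Box::new(Expr::Binary(
            Op::Mul,
            Box::new(Expr::Value(2)),
            Box::new(Expr::Value(3)),
        )),
    );
    let mut count = 0;
    walk(&expr, &mut |_| count += 1);
    assert_eq!(count, 4); // the root itself is not visited
}
```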
Things left to consider:
- The `token::Type` enum is getting fairly large;
breaking it up could incur substantial code bloat
- Compound operators might make more sense at the parser level
- Compound-assign operators are ripe for syntactic desugaring,
but there must be some reason they're handled separately in other languages (see the sketch after this list).
- Operators like FatArrow may still make sense at the tokenizer level, regardless.
- What is a lexer? A miserable pile of parsers!
- Operator overloading, or user-defined operators? Hmm...
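On the desugaring point above, a minimal sketch with hypothetical AST types (not the project's actual ones): the parser rewrites `lhs op= rhs` into `lhs = lhs op rhs`, so nothing downstream needs to know about compound assignment. One likely reason other languages keep the forms separate, noted in the comment below, is that naively duplicating `lhs` evaluates its side effects twice.

```rust
#[derive(Debug, Clone)]
enum Expr {
    Name(String),
    Binary(Op, Box<Expr>, Box<Expr>),
    Assign(Box<Expr>, Box<Expr>),
}

#[derive(Debug, Clone, Copy)]
enum Op { Add, Sub, Mul, Div }

/// Rewrites `lhs op= rhs` as `lhs = lhs op rhs` at parse time.
/// Safe only while `lhs` is a simple name; an lvalue with side effects
/// (e.g. an index expression) would be evaluated twice.
fn desugar_compound_assign(lhs: Expr, op: Op, rhs: Expr) -> Expr {
    Expr::Assign(
        Box::new(lhs.clone()),
        Box::new(Expr::Binary(op, Box::new(lhs), Box::new(rhs))),
    )
}
```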
- Renamed literal Types to reflect their literal nature
- This allows for consistent naming across future non-literal Types (see the sketch below)
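For illustration only (the actual variant names may differ), the scheme presumably looks something like:

```rust
// Hypothetical before/after: a literal-specific prefix keeps literal
// variants from colliding with future non-literal ones.
enum Type {
    LitInteger, // was: Integer
    LitFloat,   // was: Float
    LitString,  // was: String
    Identifier, // non-literal variants keep plain names
}
```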
- Complicated lexer Rules have been split into composable sub-rules,
and moved into the Rule struct.
- This improves modularity and allows sub-rules to be shared across rules (see the sketch below).
- Documented each lexer rule with (at least) a one-line blurb
describing its function
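In spirit, the split might look like this sketch (the function names are guesses, not the real Rule API): each sub-rule reports how many bytes of input it matched, so full rules compose by chaining sub-rules, and common pieces are shared.

```rust
/// Stand-in for the project's Rule struct: every sub-rule consumes a
/// prefix of `text` and returns how many bytes it matched.
struct Rule;

impl Rule {
    /// Generic sub-rule: match a run of characters satisfying `pred`.
    fn take_while(text: &str, pred: impl Fn(char) -> bool) -> usize {
        text.chars().take_while(|&c| pred(c)).map(char::len_utf8).sum()
    }

    /// Shared sub-rule: leading whitespace (now skipped by the lexer).
    fn whitespace(text: &str) -> usize {
        Self::take_while(text, char::is_whitespace)
    }

    /// Full rule composed from sub-rules: optional leading whitespace,
    /// then an identifier.
    fn identifier(text: &str) -> usize {
        let ws = Self::whitespace(text);
        ws + Self::take_while(&text[ws..], |c| c.is_alphanumeric() || c == '_')
    }
}

fn main() {
    assert_eq!(Rule::identifier("  foo_bar  "), 9); // 2 spaces + 7 chars
}
```

With this shape, adding a new rule is mostly a matter of composing existing sub-rules.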
- Lexer now ignores leading whitespace
- Rule now has shorter, clearer function names
- Tests for comment lexing are now consolidated into a module
- Tests using the `assert_has_type_and_len` wrapper can now specify
an expected length (see the sketch below)
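A guess at the wrapper's shape, built on stand-in types rather than the project's actual lexer or `token::Type`:

```rust
#[derive(Debug, PartialEq)]
enum Type { LitInteger, Identifier }

struct Token { ty: Type, len: usize }

/// Stand-in lexer: recognizes just enough to exercise the helper.
fn lex_first(text: &str) -> Option<Token> {
    let first = text.chars().next()?;
    let (ty, pred): (Type, fn(char) -> bool) = if first.is_ascii_digit() {
        (Type::LitInteger, |c| c.is_ascii_digit())
    } else {
        (Type::Identifier, |c| c.is_alphanumeric() || c == '_')
    };
    let len = text.chars().take_while(|&c| pred(c)).count();
    Some(Token { ty, len })
}

/// The wrapper itself: one call asserts both the token's type and its
/// length in the source text.
fn assert_has_type_and_len(text: &str, ty: Type, len: usize) {
    let token = lex_first(text).expect("expected a token");
    assert_eq!(token.ty, ty, "wrong type for {text:?}");
    assert_eq!(token.len, len, "wrong length for {text:?}");
}

#[test]
fn integer_literal() {
    assert_has_type_and_len("1234", Type::LitInteger, 4);
}
```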