v0.3.0 #1

Merged
j merged 12 commits from v0.3.0 into main 2024-02-01 20:11:02 +00:00

Summary

New

Expressions!

The new expression engine can do wrapping arithmetic on immediate values and absolute addresses! (A minimal evaluation sketch follows the list below.)

  • Supported operations:
    • Binary:
      • a * b: Multiplication
      • a / b: Division
      • a % b: Remainder/Modulus
      • a + b: Addition
      • a - b: Subtraction
      • a << b: Left bit-shift
      • a >> b: Right bit-shift
      • a & b: Bitwise and
      • a | b: Bitwise or
      • a ^ b: Bitwise xor
    • Unary:
      • -a: Two's complement negation
      • !a: Bitwise not (Inversion)
      • *a: *.org-based address dereference
    • AddrOf:
      • &ident: *Resolves the absolute address of an identifier
    • Group:
      • (a): Sub-expression grouping
  • Commas are now required between source and destination, to disambiguate between expressions :(
  • Span locations are now reported according to byte index within the file
  • Token spans are preserved throughout the entire assembly process, for better error messages

* Operations on the assembled binary are deferred until the late evaluation stage
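
A minimal evaluation sketch, assuming a hypothetical Expr tree over 16-bit values (this is not the crate's real AST, just an illustration of wrapping semantics for the operators listed above):

```rust
enum Expr {
    Num(u16),
    Neg(Box<Expr>),                  // -a
    Not(Box<Expr>),                  // !a
    Bin(char, Box<Expr>, Box<Expr>), // binary operators, keyed by a single char
}

fn eval(e: &Expr) -> u16 {
    match e {
        Expr::Num(n) => *n,
        Expr::Neg(a) => eval(a).wrapping_neg(),
        Expr::Not(a) => !eval(a),
        Expr::Bin(op, a, b) => {
            let (a, b) = (eval(a), eval(b));
            match op {
                '*' => a.wrapping_mul(b),
                '/' => a.wrapping_div(b),
                '%' => a.wrapping_rem(b),
                '+' => a.wrapping_add(b),
                '-' => a.wrapping_sub(b),
                '<' => a.wrapping_shl(b as u32), // stands in for <<
                '>' => a.wrapping_shr(b as u32), // stands in for >>
                '&' => a & b,
                '|' => a | b,
                '^' => a ^ b,
                _ => unreachable!("unknown operator"),
            }
        }
    }
}

fn main() {
    // 0xFFFF + 1 wraps around to 0 in 16-bit arithmetic.
    let e = Expr::Bin('+', Box::new(Expr::Num(0xFFFF)), Box::new(Expr::Num(1)));
    assert_eq!(eval(&e), 0);
    // -1 and !0 both evaluate to 0xFFFF (two's complement / bitwise inversion).
    assert_eq!(eval(&Expr::Neg(Box::new(Expr::Num(1)))), 0xFFFF);
    assert_eq!(eval(&Expr::Not(Box::new(Expr::Num(0)))), 0xFFFF);
}
```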

Licensing

Relicensed the project under the MIT license, except those parts already licensed under the GPL.

Updated

The vast majority of the application has been rewritten

  • Syntax errors now highlight the precise location of the invalid token
  • External dependencies in libmsp430 have been changed:
    • Removed regex: lexer is now fully manual
    • Added unicode-ident to accept valid unicode identifiers
  • External dependencies in msp430-asm have been changed:
    • anes has been removed - it manages to make ANSI escapes more clunky than they usually are for no material benefit. It has been replaced with a couple hand-rolled macros and constants.
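
Not the actual replacement code, but a rough sketch of the kind of hand-rolled constants and macro meant here:

```rust
// Standard ANSI SGR escape sequences.
const RED: &str = "\x1b[31m";
const BOLD: &str = "\x1b[1m";
const RESET: &str = "\x1b[0m";

/// Wraps formatted text in a style code and a trailing reset.
macro_rules! styled {
    ($style:expr, $($arg:tt)*) => {
        format!("{}{}{}", $style, format_args!($($arg)*), RESET)
    };
}

fn main() {
    eprintln!("{}", styled!(RED, "error: {}", "unexpected token"));
    eprintln!("{}", styled!(BOLD, "1 error emitted"));
}
```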

Lexer

  • Hand-written from scratch! Token has been rewritten from the ground up to reflect this.
  • Preprocesses integer literals, including integer base conversion, in the following formats (see the sketch after this list):
    • 90 | 0d90 | 090: Decimal (90)
    • 0xf0: Hexadecimal (240)
    • 0o70: Octal (56)
    • 0b10: Binary (2)
  • Currently contains dormant string-unescaping code borrowed from Conlang. Character escaping is performed for the (currently unused) character literal syntax. I may get around to unescaping strings soon! 🦈
  • No longer spans a half-dozen files filled with arbitrarily-chosen abstractions.
  • Comments are still not discarded
  • Still returns Option<Token>
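
A sketch of the literal formats listed above, as a hypothetical standalone helper rather than the lexer's real tokenization code:

```rust
fn parse_int_literal(lit: &str) -> Option<u16> {
    // Detect an explicit base prefix; anything else is treated as decimal,
    // including leading-zero forms like `090`.
    let (radix, digits) = match lit.as_bytes() {
        [b'0', b'x', rest @ ..] => (16, rest),
        [b'0', b'o', rest @ ..] => (8, rest),
        [b'0', b'b', rest @ ..] => (2, rest),
        [b'0', b'd', rest @ ..] => (10, rest),
        _ => (10, lit.as_bytes()),
    };
    u16::from_str_radix(std::str::from_utf8(digits).ok()?, radix).ok()
}

fn main() {
    for (lit, val) in [("90", 90), ("0d90", 90), ("090", 90),
                       ("0xf0", 240), ("0o70", 56), ("0b10", 2)] {
        assert_eq!(parse_int_literal(lit), Some(val));
    }
}
```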

Preprocessor

Rewritten from the ground up. It does pretty much the same thing as the previous iterator implementation, but in a simpler way, without all the iterator nonsense.

  • Accepts (without discarding) .define statements
  • Stores a map from token lexemes to their replacement token streams
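
A sketch of that lexeme-to-replacement map, with placeholder Token and Preprocessor types standing in for the real ones:

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
struct Token(String); // placeholder: just a lexeme

#[derive(Default)]
struct Preprocessor {
    defines: HashMap<String, Vec<Token>>, // lexeme -> replacement token stream
}

impl Preprocessor {
    /// Records a `.define NAME ...` replacement (the statement itself is kept).
    fn define(&mut self, name: &str, replacement: Vec<Token>) {
        self.defines.insert(name.to_string(), replacement);
    }

    /// Expands a token into its replacement stream, or passes it through unchanged.
    fn expand(&self, tok: &Token) -> Vec<Token> {
        self.defines
            .get(&tok.0)
            .cloned()
            .unwrap_or_else(|| vec![tok.clone()])
    }
}

fn main() {
    let mut pp = Preprocessor::default();
    pp.define("val", vec![Token("42".into())]);
    assert_eq!(pp.expand(&Token("val".into())), vec![Token("42".into())]);
    assert_eq!(pp.expand(&Token("r15".into())), vec![Token("r15".into())]);
}
```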

Parser

  • The top-down structure and recursive descent strategy are similar to the old one, but far more maintainable.
  • Parsing is implemented in terms of a completely restructured Parsable trait, which now takes only the parser by mut reference (see the sketch after this list).
    • This greatly simplifies the addition of new features, like the...
  • Brand new expression parsing rules, mentioned above!
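
A sketch of what a Parsable trait taking only a mutable parser reference could look like; Parser, ParseError, and Register are placeholders here, not the crate's real definitions:

```rust
struct Parser { /* token stream, cursor, ... */ }

#[derive(Debug)]
struct ParseError;

/// Anything that can be parsed out of the token stream.
trait Parsable: Sized {
    fn parse(p: &mut Parser) -> Result<Self, ParseError>;
}

struct Register(u8);

impl Parsable for Register {
    fn parse(_p: &mut Parser) -> Result<Self, ParseError> {
        // Hypothetical: a real impl would consume an `rN` token here.
        Ok(Register(15))
    }
}

fn main() {
    let mut parser = Parser {};
    assert!(Register::parse(&mut parser).is_ok());
}
```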

Canonicalizer

  • Desugars "emulated" instructions into their TwoArg counterparts (a desugaring sketch follows this list)
    • e.g. nop => mov cg, cg; tst.b r15 => cmp.b #0, r15
  • Eagerly evaluates numeric expressions, terminating if it encounters either late-expression form
    • This could probably be improved by continuing to eagerly evaluate nested expressions before returning? Thoughts to think about.
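
A sketch of the desugaring step using illustrative string-based names; the real canonicalizer presumably operates on AST types such as TwoArg rather than raw strings:

```rust
fn desugar(mnemonic: &str, args: &[&str]) -> (String, Vec<String>) {
    match (mnemonic, args) {
        // nop => mov cg, cg
        ("nop", []) => ("mov".into(), vec!["cg".into(), "cg".into()]),
        // tst.b rN => cmp.b #0, rN
        ("tst.b", [dst]) => ("cmp.b".into(), vec!["#0".into(), dst.to_string()]),
        // Already a two-operand (or otherwise non-emulated) instruction: pass through.
        _ => (
            mnemonic.to_string(),
            args.iter().map(|a| a.to_string()).collect(),
        ),
    }
}

fn main() {
    assert_eq!(desugar("nop", &[]).0, "mov");
    assert_eq!(desugar("tst.b", &["r15"]).1, vec!["#0", "r15"]);
}
```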

Assembler

  • Now supports the .org directive in a meaningful way!
  • Defers jump label resolution and expression evaluation until backpatch time.
    • Usually, this deferral will be fairly cheap, but it is allowed to perform an arbitrary number of heap allocations/deallocations as expressions are cloned out of the AST and stored in the table.
    • This overhead could be avoided, at the cost of higher code complexity.
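
A sketch of deferred label resolution via a backpatch table; the flat Vec<u16> image and all names here are illustrative, not the crate's actual structures:

```rust
use std::collections::HashMap;

struct Assembler {
    words: Vec<u16>,              // assembled output, one word per entry
    labels: HashMap<String, u16>, // label -> resolved absolute address
    fixups: Vec<(usize, String)>, // (index into `words`, label to patch in later)
}

impl Assembler {
    /// Emits a placeholder word and records a fixup to fill in at backpatch time.
    fn emit_label_ref(&mut self, label: &str) {
        self.fixups.push((self.words.len(), label.to_string()));
        self.words.push(0);
    }

    /// Backpatch pass: writes each resolved label address over its placeholder.
    fn backpatch(&mut self) {
        for (idx, label) in &self.fixups {
            if let Some(&addr) = self.labels.get(label) {
                self.words[*idx] = addr;
            }
        }
    }
}

fn main() {
    let mut asm = Assembler {
        words: Vec::new(),
        labels: HashMap::new(),
        fixups: Vec::new(),
    };
    asm.emit_label_ref("main");
    asm.labels.insert("main".into(), 0xC000);
    asm.backpatch();
    assert_eq!(asm.words[0], 0xC000);
}
```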

Questions and Todos

  • Is late expression evaluation expensive, compared to canonicalization? Profiling memory usage, and copies in particular, could be an avenue for optimization.
  • Is it smart to continue early-evaluating nested binary subexpressions after the first failure? It may reduce indirection during late evaluation.
j added the enhancement label 2024-02-01 19:51:52 +00:00
j added 12 commits 2024-02-01 19:51:52 +00:00
- Everything has been rewritten
- Modularity is improved somewhat
  - No dependency injection in preprocessor/parser, though
- There are now early and late constant evaluation engines
  - This engine allows for by-value access to already-assembled code
  - Performs basic math operations, remainder, bitwise logic, bit shifts, negation, and bit inversion
  - Also allows for indexing into already-generated code using pointer-arithmetic syntax: `*(&main + 10)`. This is subject to change? It's clunky, and only allows word-aligned access. However, this rewrite is taking far too long, so I'll call the bikeshedding here.
  - Pretty sure this constant evaluation is computationally equivalent to Deadfish?
- TODO: allow embedding unicode characters as numerics in expressions
- ANSI escape codes are stupid simple, and really don't warrant an external dependency
j merged commit dbc5a5fb69 into main 2024-02-01 20:11:02 +00:00