v0.3.0 #1

Merged
j merged 12 commits from v0.3.0 into main 2024-02-01 20:11:02 +00:00

Summary

New

Expressions!

The new expression engine can do wrapping arithmetic on immediate values and absolute addresses! (A minimal evaluation sketch follows the list below.)

  • Supported operations:
    • Binary:
      • a * b: Multiplication
      • a / b: Division
      • a % b: Remainder/Modulus
      • a + b: Addition
      • a - b: Subtraction
      • a << b: Left bit-shift
      • a >> b: Right bit-shift
      • a & b: Bitwise and
      • a | b: Bitwise or
      • a ^ b: Bitwise xor
    • Unary:
      • -a: Two's complement negation
      • !a: Bitwise not (Inversion)
      • *a: *.org-based address dereference
    • AddrOf:
      • &ident: *Resolves the absolute address of an identifier
    • Group:
      • (a): Sub-expression grouping
  • Commas are now required between source and destination, to disambiguate between expressions :(
  • Span locations are now reported according to byte index within the file
  • Token spans are preserved throughout the entire assembly process, for better error messages

* Operations on the assembled binary are deferred until the late evaluation stage
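
A minimal evaluation sketch, assuming a hypothetical Expr tree over 16-bit values (this is not the crate's real AST, just an illustration of wrapping semantics for the operators listed above):

```rust
enum Expr {
    Num(u16),
    Neg(Box<Expr>),                  // -a
    Not(Box<Expr>),                  // !a
    Bin(char, Box<Expr>, Box<Expr>), // binary operators, keyed by a single char
}

fn eval(e: &Expr) -> u16 {
    match e {
        Expr::Num(n) => *n,
        Expr::Neg(a) => eval(a).wrapping_neg(),
        Expr::Not(a) => !eval(a),
        Expr::Bin(op, a, b) => {
            let (a, b) = (eval(a), eval(b));
            match op {
                '*' => a.wrapping_mul(b),
                '/' => a.wrapping_div(b),
                '%' => a.wrapping_rem(b),
                '+' => a.wrapping_add(b),
                '-' => a.wrapping_sub(b),
                '<' => a.wrapping_shl(b as u32), // stands in for <<
                '>' => a.wrapping_shr(b as u32), // stands in for >>
                '&' => a & b,
                '|' => a | b,
                '^' => a ^ b,
                _ => unreachable!("unknown operator"),
            }
        }
    }
}

fn main() {
    // 0xFFFF + 1 wraps around to 0 in 16-bit arithmetic.
    let e = Expr::Bin('+', Box::new(Expr::Num(0xFFFF)), Box::new(Expr::Num(1)));
    assert_eq!(eval(&e), 0);
    // -1 and !0 both evaluate to 0xFFFF (two's complement / bitwise inversion).
    assert_eq!(eval(&Expr::Neg(Box::new(Expr::Num(1)))), 0xFFFF);
    assert_eq!(eval(&Expr::Not(Box::new(Expr::Num(0)))), 0xFFFF);
}
```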

Licensing

Relicensed the project under the MIT license, except those parts already licensed under the GPL.

Updated

The vast majority of the application has been rewritten

  • Syntax errors now highlight the precise location of the invalid token
  • External dependencies in libmsp430 have been changed:
    • Removed regex: lexer is now fully manual
    • Added unicode-ident to accept valid unicode identifiers
  • External dependencies in msp430-asm have been changed:
    • anes has been removed - it manages to make ANSI escapes more clunky than they usually are for no material benefit. It has been replaced with a couple hand-rolled macros and constants.
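
Not the actual replacement code, but a rough sketch of the kind of hand-rolled constants and macro meant here:

```rust
// Standard ANSI SGR escape sequences.
const RED: &str = "\x1b[31m";
const BOLD: &str = "\x1b[1m";
const RESET: &str = "\x1b[0m";

/// Wraps formatted text in a style code and a trailing reset.
macro_rules! styled {
    ($style:expr, $($arg:tt)*) => {
        format!("{}{}{}", $style, format_args!($($arg)*), RESET)
    };
}

fn main() {
    eprintln!("{}", styled!(RED, "error: {}", "unexpected token"));
    eprintln!("{}", styled!(BOLD, "1 error emitted"));
}
```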

Lexer

  • Hand-written from scratch! Token has been rewritten from the ground up to reflect this.
  • Preprocesses integer literals, including integer base conversion, in the following formats (see the sketch after this list):
    • 90 | 0d90 | 090: Decimal (90)
    • 0xf0: Hexadecimal (240)
    • 0o70: Octal (56)
    • 0b10: Binary (2)
  • Currently contains dormant string-unescaping code borrowed from Conlang. Character escaping is performed for the (currently unused) character literal syntax. I may get around to unescaping strings soon! 🦈
  • No longer spans a half-dozen files filled with arbitrarily-chosen abstractions.
  • Comments are still not discarded
  • Still returns Option<Token>
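
A sketch of the literal formats listed above, as a hypothetical standalone helper rather than the lexer's real tokenization code:

```rust
fn parse_int_literal(lit: &str) -> Option<u16> {
    // Detect an explicit base prefix; anything else is treated as decimal,
    // including leading-zero forms like `090`.
    let (radix, digits) = match lit.as_bytes() {
        [b'0', b'x', rest @ ..] => (16, rest),
        [b'0', b'o', rest @ ..] => (8, rest),
        [b'0', b'b', rest @ ..] => (2, rest),
        [b'0', b'd', rest @ ..] => (10, rest),
        _ => (10, lit.as_bytes()),
    };
    u16::from_str_radix(std::str::from_utf8(digits).ok()?, radix).ok()
}

fn main() {
    for (lit, val) in [("90", 90), ("0d90", 90), ("090", 90),
                       ("0xf0", 240), ("0o70", 56), ("0b10", 2)] {
        assert_eq!(parse_int_literal(lit), Some(val));
    }
}
```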

Preprocessor

Rewritten from the ground up. It does pretty much the same thing as the previous iterator implementation, but in a simpler way, without all the iterator nonsense.

  • Accepts (without discarding) .define statements
  • Stores a map from token lexemes to their replacement token streams
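
A sketch of that lexeme-to-replacement map, with placeholder Token and Preprocessor types standing in for the real ones:

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
struct Token(String); // placeholder: just a lexeme

#[derive(Default)]
struct Preprocessor {
    defines: HashMap<String, Vec<Token>>, // lexeme -> replacement token stream
}

impl Preprocessor {
    /// Records a `.define NAME ...` replacement (the statement itself is kept).
    fn define(&mut self, name: &str, replacement: Vec<Token>) {
        self.defines.insert(name.to_string(), replacement);
    }

    /// Expands a token into its replacement stream, or passes it through unchanged.
    fn expand(&self, tok: &Token) -> Vec<Token> {
        self.defines
            .get(&tok.0)
            .cloned()
            .unwrap_or_else(|| vec![tok.clone()])
    }
}

fn main() {
    let mut pp = Preprocessor::default();
    pp.define("val", vec![Token("42".into())]);
    assert_eq!(pp.expand(&Token("val".into())), vec![Token("42".into())]);
    assert_eq!(pp.expand(&Token("r15".into())), vec![Token("r15".into())]);
}
```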

Parser

  • The top-down structure and recursive descent strategy are similar to the old one, but far more maintainable.
  • Parsing is implemented in terms of a completely restructured Parsable trait, which now takes only the parser by mut reference (see the sketch after this list).
    • This greatly simplifies the addition of new features, like the...
  • Brand new expression parsing rules, mentioned above!
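
A sketch of what a Parsable trait taking only a mutable parser reference could look like; Parser, ParseError, and Register are placeholders here, not the crate's real definitions:

```rust
struct Parser { /* token stream, cursor, ... */ }

#[derive(Debug)]
struct ParseError;

/// Anything that can be parsed out of the token stream.
trait Parsable: Sized {
    fn parse(p: &mut Parser) -> Result<Self, ParseError>;
}

struct Register(u8);

impl Parsable for Register {
    fn parse(_p: &mut Parser) -> Result<Self, ParseError> {
        // Hypothetical: a real impl would consume an `rN` token here.
        Ok(Register(15))
    }
}

fn main() {
    let mut parser = Parser {};
    assert!(Register::parse(&mut parser).is_ok());
}
```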

Canonicalizer

  • Desugars "emulated" instructions into their TwoArg counterparts (a desugaring sketch follows this list)
    • e.g. nop => mov cg, cg; tst.b r15 => cmp.b #0, r15
  • Eagerly evaluates numeric expressions, terminating if it encounters either late-expression form
    • This could probably be improved by continuing to eagerly evaluate nested expressions before returning? Thoughts to think about.
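
A sketch of the desugaring step using illustrative string-based names; the real canonicalizer presumably operates on AST types such as TwoArg rather than raw strings:

```rust
fn desugar(mnemonic: &str, args: &[&str]) -> (String, Vec<String>) {
    match (mnemonic, args) {
        // nop => mov cg, cg
        ("nop", []) => ("mov".into(), vec!["cg".into(), "cg".into()]),
        // tst.b rN => cmp.b #0, rN
        ("tst.b", [dst]) => ("cmp.b".into(), vec!["#0".into(), dst.to_string()]),
        // Already a two-operand (or otherwise non-emulated) instruction: pass through.
        _ => (
            mnemonic.to_string(),
            args.iter().map(|a| a.to_string()).collect(),
        ),
    }
}

fn main() {
    assert_eq!(desugar("nop", &[]).0, "mov");
    assert_eq!(desugar("tst.b", &["r15"]).1, vec!["#0", "r15"]);
}
```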

Assembler

  • Now supports the .org directive in a meaningful way!
  • Defers jump label resolution and expression evaluation until backpatch time.
    • Usually, this deferral will be fairly cheap, but it is allowed to perform an arbitrary number of heap allocations/deallocations as expressions are cloned out of the AST and stored in the table.
    • This overhead could be avoided, at the cost of higher code complexity.
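
A sketch of deferred label resolution via a backpatch table; the flat Vec<u16> image and all names here are illustrative, not the crate's actual structures:

```rust
use std::collections::HashMap;

struct Assembler {
    words: Vec<u16>,              // assembled output, one word per entry
    labels: HashMap<String, u16>, // label -> resolved absolute address
    fixups: Vec<(usize, String)>, // (index into `words`, label to patch in later)
}

impl Assembler {
    /// Emits a placeholder word and records a fixup to fill in at backpatch time.
    fn emit_label_ref(&mut self, label: &str) {
        self.fixups.push((self.words.len(), label.to_string()));
        self.words.push(0);
    }

    /// Backpatch pass: writes each resolved label address over its placeholder.
    fn backpatch(&mut self) {
        for (idx, label) in &self.fixups {
            if let Some(&addr) = self.labels.get(label) {
                self.words[*idx] = addr;
            }
        }
    }
}

fn main() {
    let mut asm = Assembler {
        words: Vec::new(),
        labels: HashMap::new(),
        fixups: Vec::new(),
    };
    asm.emit_label_ref("main");
    asm.labels.insert("main".into(), 0xC000);
    asm.backpatch();
    assert_eq!(asm.words[0], 0xC000);
}
```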

Questions and Todos

  • Is late expression evaluation expensive, compared to canonicalization? Profiling memory usage, and copies in particular, could be an avenue for optimization.
  • Is it smart to continue early-evaluating nested binary subexpressions after the first failure? It may reduce indirection during late evaluation.
j added the enhancement label 2024-02-01 19:51:52 +00:00
j added 12 commits 2024-02-01 19:51:52 +00:00
- Everything has been rewritten
- Modularity is improved somewhat
  - No dependency injection in preprocessor/parser, though
- There are now early and late constant evaluation engines
  - This engine allows for by-value access to already-assembled code
  - Performs basic math operations, remainder, bitwise logic, bit shifts, negation, and bit inversion
  - Also allows for indexing into already-generated code using pointer-arithmetic syntax: `*(&main + 10)`. This is subject to change? It's clunky, and only allows word-aligned access. However, this rewrite is taking far too long, so I'll call the bikeshedding here.
  - Pretty sure this constant evaluation is computationally equivalent to Deadfish?
- TODO: allow embedding unicode characters as numerics in expressions
- ANSI escape codes are stupid simple, and really don't warrant an external dependency
j merged commit dbc5a5fb69 into main 2024-02-01 20:11:02 +00:00