-
-
Notifications
You must be signed in to change notification settings - Fork 439
Improve lark-cython compatibility #1528
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
return self.postlex.process(tokens) | ||
|
||
def __copy__(self): | ||
return type(self)(self.lexer, copy(self.state), self.postlex) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure that the postlexer shouldn't be copied too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would say so? i see it as some kind of processor, it holds no data per se about the current states. i could be wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it does hold a state, e.g. https://github.com/lark-parser/lark/blob/master/lark/indenter.py#L33
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although, tbh, those shouldn't be instance attributes, but local variables inside of process
. I think this postlexer design should already be broken on the current main if this copy method is called since lexer=PostLexConnector
doesn't get copied either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see, i agree those don't really make sense as instance attributes. it makes sense that a postlexer instance should be agnostic to a stream of tokens. this matter is probably beyond the scope of this PR though.
…tting, and support for a dark theme
Docs: Updated link of DSL article to a new version
The "Python type check" test should now work properly if you rebase over master. |
Overall looks good, but I have a few questions -
Also, I still think copy() should make a copy of the postlex too, since it's not an object we have control of. (and if there's no data to copy, the cost is negligible anyway) |
This PR addresses two significant compatibility issues when using lark-cython:
1. Fix Token handling for lark-cython
In the standard lark implementation,
Token
inherits fromstr
, allowing string methods to be called directly onToken
objects. However, lark-cython implementsToken
differently - it doesn't inherit fromstr
, which means string methods likersplit()
won't work on token objects.Changes:
Token
instances with calls ontoken.value
instead2. Refactor postlexer handling architecture
The original implementation used a
PostLexConnector
class that acted as a wrapper around a lexer but only implemented part of theLexer
interface (thelex()
method but notnext_token()
). This partial implementation worked in standard lark through duck typing but caused errors in lark-cython's more strict implementation.Changes:
PostLexConnector
classself.postlex
attribute to store the postlexer separatelyPostLex
while ensuring proper interface implementationThis approach is more consistent with the architectural intent of postlexers as stream processors, while avoiding type compatibility issues in lark-cython.