-
-
Notifications
You must be signed in to change notification settings - Fork 439
Rethinking the import system #1516
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Hi @Niki4tap ! It's possible that the current design isn't ideal. IIRC when I made the choice, I was trying to match the behavior of importing terminals, and I didn't take nested transformers into consideration. The question remains, whether both behaviors are useful, or only the name spaced behavior is correct. Because if the latter is true, we should just change the behavior of import, instead of introducing a new keyword. Regarding the implementation, I think it might be a bit over complicated. Unless I'm missing something, "use" can use the existing import mechanism, with the only difference of renaming the imported rule at the end. @MegaIng thoughts? |
Hi,
Yeah I wasn't sure about this, hence the new keyword, and even if that's the case, I'm not sure about backwards compatibility, maybe something depends on the current behavior. The renaming should allow for switching back to old behavior if we go with changing the
Yes, that's what I was trying to do, but I didn't find a way to rename the rule in the current grammar, as changing I'll look around some more but if you have any specific pointers, it would be appreciated. Also I'll look at the CI fails in a bit. |
Had to work around types for a bit, but I think CI should be green now. |
I think we should at some point reconsider the entire import system. There are a few issues I can think of:
For the last part, I think something like this would be good:
And whatever renaming scheme we decide upon is handled internally. I haven't read the code of this PR, but I don't like the general idea adding an additional directive - I think having two directives with similar behavior is a bad idea. Instead we should consider how we can make a breaking change here. |
In the past we made breaking changes by adding a switch, such as I agree we should fix problems with diamond-shaped imports, but it doesn't have to be in this PR. The biggest "fix" I'd like is for %ignore to be importable (by being local to rules), but that's a big and complicated task. |
Actually, I don't think I've noticed any?
Again, maybe the luck has spared me, but I think I have a couple in my project, and I haven't had any issues, except aforementioned namespacing troubles.
That's how it would look like if the proposed namespacing changes are applied: class LeafGrammarTransformer(Transformer): ...
class SubGrammarTransformer(Transformer): ...
class MainGrammarTransformer(Transformer): ...
def make_transformer(...) -> Transformer:
leaf_grammar_transformer = LeafGrammarTransformer()
sub_grammar_transformer = merge_transformers(SubGrammarTransformer(), leaf = leaf_grammar_transformer)
return merge_transformers(MainGrammarTransformer(), sub = sub_grammar_transformer, leaf = leaf_grammar_transformer) And mostly how it looks like right now if your grammar is 1 level deep, if it's more, you still have to do this, but also mixin imported implementations, and manually namespace in the grammar.
I'm completely fine with changing current
Sounds fine!
I can take a peek at it, if pointed to the problem directly, since I haven't encountered one in the wild.
Yeah, this sounds like a good idea, though I'm not sure how it would work, since I can change the implementation of Also would be nice to see what tests fail or trip up on this, since we probably want to consider those usecases. |
Dove a bit deeper into the implementation, made the change to Simple stuff works, but a few tests are failing ( I'll try to fix them as soon as I'll have some more time, but for now feel free to look at the implementation (I probably missed something important) Failing test cases:
|
We do want to be backwards compatible, and changing the tests kind of beats the purpose of having them in the first place :) |
Oh yeah, I forgot about that, we probably want to test the new implementation as well though, so I might need to do the same thing as with the lexers/parsers, but with old and new imports |
A couple updates: 1. Added an option, and refactored some of the code (again), to allow for legacy imports. 2. Hacked together a solution to run tests that involve importing with both legacy imports and new ones. With that, only 1 old test is failing, the cache one, I'll look into it hopefully tomorrow. Not sure why, but it seems like it's still trying to load the grammar with 3. Well, I've got good news and bad ones: Imagine a grammar layout like this: # lib.lark
a: "b" # nested.lark
%import lib.a # main.lark
start: a
%import nested.a Important thing to note is that So the new import system would expect definitions to look like:
But what happens in reality is:
And it throws an error with From here I see 2 potential solutions: I personally prefer the second one (although it might be a little complicated), but I'm not exactly sure about it, and I want to hear some opinions on it. Related: This is also why I added P. S. I've also added |
4368cfe
to
61c12c5
Compare
Hello again, I think I've fixed all of the tests now. Turns out cache test wasn't an issue, but output was different due to defining the Interactive parser test just needed the namespacing. And I worked around the re-import issue by doing none of what I have proposed before, but linking new names to old definitions (override now overwrites the existing definition instead of creating a new one and replacing the old one). With that all the tests are passing on my machine now and pre-commit also doesn't error so I think the CI should be green again. Nothing to be done on my side for now, I think this should be ready for a review. cc @erezsh |
What a weird type error, I don't think I even touched that code in this pr, but should be fixed anyways. |
Before I go deeply into the implementation, can you explain in a few words why it took changing so many lines? I mean, if the only difference in behavior is renaming the imported rule, why was it necessary to update so many different parts? |
I've split up the changes across commits, so it would be more reviewable, if you're looking just at the diff number github gives you in the pr it hides a lot of things. I think most of the changes are test-related, the commit with the actual implementation has the diff numbers of |
For now, I'm talking mostly about the changes in load_grammar.py
That's part of what I'm asking. Why did you need to duplicate it? |
Ah I guess I didn't necessarily need to, when I was starting out I thought it would be littered with |
Yes, minimal code is preferred.. |
Unified the function and rebased. Also set |
Hello Lark developers!
I've tried using this library in a personal project, and really liked it, thank you for it!
The problem
...except one thing, which was grammar composition. Lark docs make you think it has a nice module system where you can just import rules and terminals and everything magically works... but for some reason it doesn't.
Rules don't compose well since when you
%import
them, the grammar module that imported the rule is now responsible for parsing it since the rule has namespace of the current grammar module assigned to it, and funnily enough, every rule that it depends on still gets the imported grammar module prefix.So if you had a layout like this:
Parse tree for "b" would look like:
which makes approximately 0 sense to me 😄 (this has been discussed before in lark, and in my opinion the "official solution" came out looking as more of a workaround than something to be used and recommended to others)
Main grammar is supposed to parse imported terminal, but none of its children. Which is a very odd way of making the import system work. For example, if multiple grammar modules import the same terminal, they would all be responsible for parsing it, but not its children, which in transformer implementation terms means that importing transformers would have to inherit the implementation for parsing the imported terminal, and then on top of that get
merge_transformers
'd with thelib
transformer (which also wouldn't be responsible for parsing one of its own rules?).All of this makes working with multiple grammars very inconvenient.
The solution
%use
directive.More is explained in added docs, but here is a quick excerpt:
Changing
%import
to%use
in the example above, and parsing same "b" will now result in:This allows to simply compose transformers with
merge_transformers
and not worry about inheritance, mixins, or anything else for such a simple example!Compatibility with
%import
%use
is currently incompatible with%import
, i. e. you cannot do this:or in the other direction, but you can import/use on different modules:
The implementation
I am not going to lie, the implementation could use some design work, but right now it works as is.
Any comments or ideas on how to make it more robust will be appreciated as I am new to this codebase.
Currently it creates new group of aliases called
use_aliases
and uses that to rewrite all_define
s, and that's about it. I am a bit worried it has some hidden bugs I haven't considered, but I hope for them to get ironed out during the code review :)Other concerns
Of course, all of these issues could be solved with
But in my opinion, lark shouldn't impose a hell of mixins, inheritance, and a ton of other OOP boilerplate on its users just to do such a simple thing as basic grammar composition, especially when it can be "just fixed" in a hundred lines or so in lark itself, for basically no cost.
Another option would be to enforce manually prefixing imported rules, which doesn't scale. Like at all.
And I guess you could always just string together all files and implement all rules and terminals in one transformer, but then you would have to manually namespace all rules and terminals, and their uses, not just
%import
s.Lastly...
This isn't supposed to be final, I'm open to comments and suggestions! The tests aren't supposed to be final either, I can write some more, the only one I added is just supposed to show how it all works :)
P.S.: I've also added minimal
.editorconfig
to the repository since my editor always wanted to convert everything to tabs, maybe it will also help other contributorsFootnotes
https://github.com/lark-parser/lark/issues/531#issuecomment-593422234 ↩