feat: variable length pattern matching#685
Conversation
|
JFYI, we have a configured pre-commit hook |
|
@SemyonSinchenko Thanks for letting me know. It seems that it failed due to the formatting. I pushed the formatted result by the |
We have quite a strict rules for scalafix... Just add something like case _ => throw new GraphFramesUnreachableException(), should fix it. |
SemyonSinchenko
left a comment
There was a problem hiding this comment.
LGTM! Thanks @goungoun !
|
Thanks! @SemyonSinchenko. |
rjurney
left a comment
There was a problem hiding this comment.
Please add one more unit test.
|
There are two ways to handle quantified path patterns. Everyone wants to use the pattern such as |
This is excellent. Do you think you know how to implement variable length patterns in another PR? That is a highly desirable, requested feature! |
|
@rjurney I think |
|
I didn't implement the parser side, still tricky, but it is better to see the code to validate what it looks like. |
|
The variable pattern relies on the dataframe |
|
I've added more tests on the edge cases and checked the error message. Not supported: |
|
@rjurney @SemyonSinchenko, There were several commits to cover more cases and edge cases. It is not easy to track changes from the original branch. How can I make it easier for your review? Is it better to open a new PR with a new branch? |
At least if we merge the first PR from you, I won't need anymore to approve each CI run :) |
|
@goungoun I would like to mention it again, that we have a |
|
@SemyonSinchenko Thanks for your help. By the way, the named edge pattern from the Issue #539 is not supported. Not Supported: |
|
@goungoun Tbh Im fine with it. There will be another and more Cypher-like API built on top of the PropertyGraphFrame that will cover that. I would like to leave the GraphFrames pattern matching more for motiffs finding instead of MATCH-like queries. |
|
@rjurney what do you think? |
|
@SemyonSinchenko I referenced the Issue #539 and support named edge. If the edge is named e, it will generate _e1, _e2, _e3. Supported: Not Supported: |
| withClue("Failed to catch parse error with unsupported variable length pattern") { | ||
| intercept[InvalidParseException] { | ||
| Pattern.parse("(u)-[*2..]->(v)") | ||
| } |
There was a problem hiding this comment.
Ahhh thank you for this clarification! After we merge this PR, want to take on variable lengths as well? :)
|
@goungoun please let us know if you need any support for more contributions! Thanks! |
|
@rjurney @SemyonSinchenko It is merged to the master! Thank you so much! |
Thanks for the contribution! |
What changes were proposed in this pull request?
This PR implements variable-length pattern matching. e.g.
(u)-[*3]->(v),(u)-[*1..3]->(v)https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf (page 15)
Why are the changes needed?
Users must manually write a long pattern without the feature. After the change, the parser can handle the pattern by creating intermediate vertices and edges. For example
(u)-[*3]->(v)is converted to(u)-[]->(_v1);(_v1)-[]->(_v2);(_v2)-[]->(v);internally. It also supports named edge(u)-[e*3]->(v)creating _e1, _e2, _e3.The variable length pattern is implemented on top of the fixed-length pattern. It is split into fixed-length pattern. For example
(u)-[*2..5]->(v)is(u)-[*2]->(v)or(u)-[*3]->(v)or(u)-[*4]->(v)or(u)-[*5]->(v). Then union all the results.