add another foaf to demonstrate failures#105
add another foaf to demonstrate failures#105neonstalwart wants to merge 5 commits intolevelgraph:masterfrom
Conversation
test/navigator_spec.js
Outdated
There was a problem hiding this comment.
this test case is the one which i'm concerned about. here's my reasoning for how i arrive at a solution. for reference, here is a table with all the friend relationships
| subject | object |
|---|---|
| matteo | daniele |
| daniele | matteo |
| daniele | davide |
| daniele | marco |
| lucio | matteo |
| lucio | marco |
| marco | davide |
my mental model for the navigator API is that each step follows the predicate from each of the nodes in the previous result set and then those nodes become the starting set for the next step. below is what i expect to be the working set after each step in db.nav('davide').archIn('friend').archIn('friend').archOut('friend')
db.nav('davide') |
.archIn('friend') |
.archIn('friend') |
.archOut('friend') |
|---|---|---|---|
['davide'] |
[ 'marco', 'daniele' ] |
['daniele', 'lucio', 'matteo'] |
['matteo', 'davide', 'marco', 'matteo', 'marco', 'daniele'] with duplicates removed is ['matteo', 'davide', 'marco', 'daniele'] |
i've walked through the navigator code and it essentially produces the same search query:
db.search([
{
subject: db.v('x'),
predicate: 'friend',
object: 'davide'
},
{
subject: db.v('y'),
predicate: 'friend',
object: db.v('x')
},
{
subject: db.v('y'),
predicate: 'friend',
object: db.v('z')
}
], function (err, results) {
var solutions = results.map(function (result) { return result.z; });
// ...
})for some reason though, whether you use the navigator api or the search, the result being produced by the code currently is [ 'daniele' ] which doesn't seem right.
as i walked through the code, i noticed that the query planner was sorting the search queries based on the approximate size. it seems that the sorting may put the queries into an order that can break my assumption that the output for one query produces the starting nodes for the next query. do i have the wrong mental model of how queries are applied?
There was a problem hiding this comment.
You are definitely right about the bug, but you are wrong about the query planner.
Each step is running in parallel if it is sort-joinable (with solutions that flows in a pipeline), and in series if it is a basic join (with solutions that flows in a pipeline). The model is definitely complex but fast. The difference between the two is that the first only activate a single ReadStream from LevelUp, while the second creates a ReadStream for each partial solution. As you can imagine, the first is way faster. However, the first cannot be applied to all cases, and the query planner job is to check which type of join to use. The ordering based on size is non relevant, as all steps should be created to run in whatever order because a Graph pattern is declarative.
Can you please verify that passing { joinAlgorithm: 'basic' } to force the basic join, so we can check if it's the query planner or worse.
There was a problem hiding this comment.
thanks for the explanation. 51e26a7 forces { joinAlgorithm: 'basic' } but no luck - it's still just giving one result.
|
another failure introduced by this commit was for |
|
It can definitely be the same bug, but let's solve one a time. |
|
The Can you please add the same test to the |
70cfe38 to
5393225
Compare
|
good news - the tests pass in the version using fwiw, it looks like maybe the broken version is returning the results from only one stream rather than merging the results together? i'm not sure where to dig in the code to confirm if that's what's happening but maybe that triggers an idea in your mind. this thought is based on the following... with the original data:
by coincidence, in contrast, adding davide as a friend of daniele gives us:
the broken result is just |
|
So, it seems a query planner issue to me, it is trying to use the SortJoin algorithm while it cannot do it. Can you check if the sort_join tests fails now? Travis highlight only the basic one that is passing. |
sure, this PR was just for discussion but i can do that.
i'll look into it |
|
i added a sort_join fails - it produces 1 result (the one with daniele) when it should be 6 results. |
|
Exactly what I originally thought. The key problem is that the sort-join can only be used on a limited subset of possible queries, basically the ones that flows in one direction of the graph. What you are doing in that query is that you are 'returning back', e.g. the last BTW, would you like to become one of the LevelGraph maintainers? I definitely need help. Check the CONTIRIBUTING.md and let me know if you are ok with that :). |
|
i'm fine with the CONTRIBUTING.md. right now i'm in a crunch for time but then again everyone is always in a crunch for time so that's irrelevant. after looking at a number of graph databases, i've gone with levelgraph because the alternatives i looked at had issues and i am confident that i can hack on levelgraph if i have any issues that need to be resolved. given that i'm depending on this project, i'd be glad to do what i can to help maintain it. if you want to give me some pointers about what i might need to look at to address this issue, i don't mind digging in deeper and reporting back what i can find. |
|
The culprit is https://github.com/mcollina/levelgraph/blob/master/lib/queryplanner.js#L93. As it seems, https://github.com/mcollina/levelgraph/blob/master/lib/queryplanner.js#L133-L135 is missing a check for this particular case, as you want to stop using a SortJoin when you hit this particular case. Does it make sense? |
in trying to build a failing test for #104, i added another foaf so that the foaf queries in the tests would match 2 friends rather than just one. in adding this extra triple, i think i've uncovered a number of other issues. i've opened this PR as a way to have a discussion around whether or not my expectations are correct. i'll make more specific notes on the commit to help give context for the discussion.