Skip to content

add another foaf to demonstrate failures#105

Open
neonstalwart wants to merge 5 commits intolevelgraph:masterfrom
neonstalwart:more-tests
Open

add another foaf to demonstrate failures#105
neonstalwart wants to merge 5 commits intolevelgraph:masterfrom
neonstalwart:more-tests

Conversation

@neonstalwart
Copy link
Copy Markdown
Collaborator

in trying to build a failing test for #104, i added another foaf so that the foaf queries in the tests would match 2 friends rather than just one. in adding this extra triple, i think i've uncovered a number of other issues. i've opened this PR as a way to have a discussion around whether or not my expectations are correct. i'll make more specific notes on the commit to help give context for the discussion.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test case is the one which i'm concerned about. here's my reasoning for how i arrive at a solution. for reference, here is a table with all the friend relationships

subject object
matteo daniele
daniele matteo
daniele davide
daniele marco
lucio matteo
lucio marco
marco davide

my mental model for the navigator API is that each step follows the predicate from each of the nodes in the previous result set and then those nodes become the starting set for the next step. below is what i expect to be the working set after each step in db.nav('davide').archIn('friend').archIn('friend').archOut('friend')

db.nav('davide') .archIn('friend') .archIn('friend') .archOut('friend')
['davide'] [ 'marco', 'daniele' ] ['daniele', 'lucio', 'matteo'] ['matteo', 'davide', 'marco', 'matteo', 'marco', 'daniele'] with duplicates removed is ['matteo', 'davide', 'marco', 'daniele']

i've walked through the navigator code and it essentially produces the same search query:

db.search([
  {
    subject: db.v('x'),
    predicate: 'friend',
    object: 'davide'
  }, 
  {
    subject: db.v('y'),
    predicate: 'friend',
    object: db.v('x')
  }, 
  {
    subject: db.v('y'),
    predicate: 'friend',
    object: db.v('z')
  }
], function (err, results) {
    var solutions = results.map(function (result) { return result.z; });
    // ...
})

for some reason though, whether you use the navigator api or the search, the result being produced by the code currently is [ 'daniele' ] which doesn't seem right.

as i walked through the code, i noticed that the query planner was sorting the search queries based on the approximate size. it seems that the sorting may put the queries into an order that can break my assumption that the output for one query produces the starting nodes for the next query. do i have the wrong mental model of how queries are applied?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are definitely right about the bug, but you are wrong about the query planner.

Each step is running in parallel if it is sort-joinable (with solutions that flows in a pipeline), and in series if it is a basic join (with solutions that flows in a pipeline). The model is definitely complex but fast. The difference between the two is that the first only activate a single ReadStream from LevelUp, while the second creates a ReadStream for each partial solution. As you can imagine, the first is way faster. However, the first cannot be applied to all cases, and the query planner job is to check which type of join to use. The ordering based on size is non relevant, as all steps should be created to run in whatever order because a Graph pattern is declarative.

Can you please verify that passing { joinAlgorithm: 'basic' } to force the basic join, so we can check if it's the query planner or worse.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the explanation. 51e26a7 forces { joinAlgorithm: 'basic' } but no luck - it's still just giving one result.

@neonstalwart
Copy link
Copy Markdown
Collaborator Author

another failure introduced by this commit was for sort join algorithm - should do a join with three conditions. for some reason now it only returns 2 results when it should return 4.

@mcollina
Copy link
Copy Markdown
Collaborator

It can definitely be the same bug, but let's solve one a time.

@mcollina
Copy link
Copy Markdown
Collaborator

The joinAlgorithm option is a db-wide setup: https://github.com/mcollina/levelgraph/blob/master/test/abstract_join_algorithm.js#L10.

Can you please add the same test to the test/abstract_join_algorithn.js, so we know exactly if the basic join can solve the problem, and it is a planner error.

@neonstalwart
Copy link
Copy Markdown
Collaborator Author

good news - the tests pass in test/abstract_join_algorithm.js 🎉

the version using db.search includes the duplicates and the db.nav version removes duplicates - all as expected.

fwiw, it looks like maybe the broken version is returning the results from only one stream rather than merging the results together? i'm not sure where to dig in the code to confirm if that's what's happening but maybe that triggers an idea in your mind.

this thought is based on the following...

with the original data:

subject object
matteo daniele
daniele matteo
daniele marco
lucio matteo
lucio marco
marco davide
db.nav('davide') .archIn('friend') .archIn('friend') .archOut('friend')
['davide'] ['marco'] ['daniele', 'lucio'] ['matteo', 'marco', 'matteo', 'marco'] with duplicates removed is ['matteo', 'marco']

by coincidence, db.nav('daniele').archOut('friend') and db.nav('lucio').archOut('friend') give the same results and so if there was a problem with merging those 2 then it wouldn't be found.

in contrast, adding davide as a friend of daniele gives us:

db.nav('davide') .archIn('friend') .archIn('friend') .archOut('friend')
['davide'] [ 'marco', 'daniele' ] ['daniele', 'lucio', 'matteo'] ['matteo', 'davide', 'marco', 'matteo', 'marco', 'daniele'] with duplicates removed is ['matteo', 'davide', 'marco', 'daniele']

the broken result is just ['daniele'] which is the same as db.nav('matteo').archOut('friend') which leads me to suspect that maybe the issue is that the db.nav('daniele').archOut('friend') and db.nav('lucio').archOut('friend') results are not being merged.

@mcollina
Copy link
Copy Markdown
Collaborator

So, it seems a query planner issue to me, it is trying to use the SortJoin algorithm while it cannot do it. Can you check if the sort_join tests fails now? Travis highlight only the basic one that is passing.

@neonstalwart
Copy link
Copy Markdown
Collaborator Author

please move this test out of basic_algorithm, as I want the navigator to have its own tests

sure, this PR was just for discussion but i can do that.

Can you check if the sort_join tests fails now?

i'll look into it

@neonstalwart
Copy link
Copy Markdown
Collaborator Author

i added a mocha npm script that doesn't bail on the first error so that running npm run mocha will show all failures.

sort_join fails - it produces 1 result (the one with daniele) when it should be 6 results.

@mcollina
Copy link
Copy Markdown
Collaborator

Exactly what I originally thought. The key problem is that the sort-join can only be used on a limited subset of possible queries, basically the ones that flows in one direction of the graph. What you are doing in that query is that you are 'returning back', e.g. the last archOut has in the possible space of solutions the same stuff that is in one of the previous step.

BTW, would you like to become one of the LevelGraph maintainers? I definitely need help. Check the CONTIRIBUTING.md and let me know if you are ok with that :).

@neonstalwart
Copy link
Copy Markdown
Collaborator Author

i'm fine with the CONTRIBUTING.md. right now i'm in a crunch for time but then again everyone is always in a crunch for time so that's irrelevant. after looking at a number of graph databases, i've gone with levelgraph because the alternatives i looked at had issues and i am confident that i can hack on levelgraph if i have any issues that need to be resolved. given that i'm depending on this project, i'd be glad to do what i can to help maintain it.

if you want to give me some pointers about what i might need to look at to address this issue, i don't mind digging in deeper and reporting back what i can find.

@mcollina
Copy link
Copy Markdown
Collaborator

The culprit is https://github.com/mcollina/levelgraph/blob/master/lib/queryplanner.js#L93.

As it seems, https://github.com/mcollina/levelgraph/blob/master/lib/queryplanner.js#L133-L135 is missing a check for this particular case, as you want to stop using a SortJoin when you hit this particular case. Does it make sense?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants