add another foaf to demonstrate failures by neonstalwart · Pull Request #105 · levelgraph/levelgraph

neonstalwart · 2014-12-16T17:19:27Z

in trying to build a failing test for #104, i added another foaf so that the foaf queries in the tests would match 2 friends rather than just one. in adding this extra triple, i think i've uncovered a number of other issues. i've opened this PR as a way to have a discussion around whether or not my expectations are correct. i'll make more specific notes on the commit to help give context for the discussion.

neonstalwart · 2014-12-16T17:46:43Z

test/navigator_spec.js

this test case is the one which i'm concerned about. here's my reasoning for how i arrive at a solution. for reference, here is a table with all the friend relationships

subject object

matteo daniele

daniele matteo

daniele davide

daniele marco

lucio matteo

lucio marco

marco davide

my mental model for the navigator API is that each step follows the predicate from each of the nodes in the previous result set and then those nodes become the starting set for the next step. below is what i expect to be the working set after each step in db.nav('davide').archIn('friend').archIn('friend').archOut('friend')

db.nav('davide') .archIn('friend') .archIn('friend') .archOut('friend')

['davide'] [ 'marco', 'daniele' ] ['daniele', 'lucio', 'matteo'] ['matteo', 'davide', 'marco', 'matteo', 'marco', 'daniele'] with duplicates removed is ['matteo', 'davide', 'marco', 'daniele']

i've walked through the navigator code and it essentially produces the same search query:

db.search([ { subject: db.v('x'), predicate: 'friend', object: 'davide' }, { subject: db.v('y'), predicate: 'friend', object: db.v('x') }, { subject: db.v('y'), predicate: 'friend', object: db.v('z') } ], function (err, results) { var solutions = results.map(function (result) { return result.z; }); // ... })

for some reason though, whether you use the navigator api or the search, the result being produced by the code currently is [ 'daniele' ] which doesn't seem right.

as i walked through the code, i noticed that the query planner was sorting the search queries based on the approximate size. it seems that the sorting may put the queries into an order that can break my assumption that the output for one query produces the starting nodes for the next query. do i have the wrong mental model of how queries are applied?

You are definitely right about the bug, but you are wrong about the query planner.

Each step is running in parallel if it is sort-joinable (with solutions that flows in a pipeline), and in series if it is a basic join (with solutions that flows in a pipeline). The model is definitely complex but fast. The difference between the two is that the first only activate a single ReadStream from LevelUp, while the second creates a ReadStream for each partial solution. As you can imagine, the first is way faster. However, the first cannot be applied to all cases, and the query planner job is to check which type of join to use. The ordering based on size is non relevant, as all steps should be created to run in whatever order because a Graph pattern is declarative.

Can you please verify that passing { joinAlgorithm: 'basic' } to force the basic join, so we can check if it's the query planner or worse.

thanks for the explanation. 51e26a7 forces { joinAlgorithm: 'basic' } but no luck - it's still just giving one result.

neonstalwart · 2014-12-16T17:53:06Z

another failure introduced by this commit was for sort join algorithm - should do a join with three conditions. for some reason now it only returns 2 results when it should return 4.

mcollina · 2014-12-16T19:02:12Z

It can definitely be the same bug, but let's solve one a time.

mcollina · 2014-12-17T06:41:36Z

The joinAlgorithm option is a db-wide setup: https://github.com/mcollina/levelgraph/blob/master/test/abstract_join_algorithm.js#L10.

Can you please add the same test to the test/abstract_join_algorithn.js, so we know exactly if the basic join can solve the problem, and it is a planner error.

neonstalwart · 2014-12-17T15:22:21Z

good news - the tests pass in test/abstract_join_algorithm.js 🎉

the version using db.search includes the duplicates and the db.nav version removes duplicates - all as expected.

fwiw, it looks like maybe the broken version is returning the results from only one stream rather than merging the results together? i'm not sure where to dig in the code to confirm if that's what's happening but maybe that triggers an idea in your mind.

this thought is based on the following...

with the original data:

subject	object
matteo	daniele
daniele	matteo
daniele	marco
lucio	matteo
lucio	marco
marco	davide

`db.nav('davide')`	`.archIn('friend')`	`.archIn('friend')`	`.archOut('friend')`
`['davide']`	`['marco']`	`['daniele', 'lucio']`	`['matteo', 'marco', 'matteo', 'marco']` with duplicates removed is `['matteo', 'marco']`

by coincidence, db.nav('daniele').archOut('friend') and db.nav('lucio').archOut('friend') give the same results and so if there was a problem with merging those 2 then it wouldn't be found.

in contrast, adding davide as a friend of daniele gives us:

`db.nav('davide')`	`.archIn('friend')`	`.archIn('friend')`	`.archOut('friend')`
`['davide']`	`[ 'marco', 'daniele' ]`	`['daniele', 'lucio', 'matteo']`	`['matteo', 'davide', 'marco', 'matteo', 'marco', 'daniele']` with duplicates removed is `['matteo', 'davide', 'marco', 'daniele']`

the broken result is just ['daniele'] which is the same as db.nav('matteo').archOut('friend') which leads me to suspect that maybe the issue is that the db.nav('daniele').archOut('friend') and db.nav('lucio').archOut('friend') results are not being merged.

mcollina · 2014-12-17T16:09:39Z

So, it seems a query planner issue to me, it is trying to use the SortJoin algorithm while it cannot do it. Can you check if the sort_join tests fails now? Travis highlight only the basic one that is passing.

neonstalwart · 2014-12-17T16:11:32Z

please move this test out of basic_algorithm, as I want the navigator to have its own tests

sure, this PR was just for discussion but i can do that.

Can you check if the sort_join tests fails now?

i'll look into it

neonstalwart · 2014-12-17T16:20:56Z

i added a mocha npm script that doesn't bail on the first error so that running npm run mocha will show all failures.

sort_join fails - it produces 1 result (the one with daniele) when it should be 6 results.

mcollina · 2014-12-17T16:25:21Z

Exactly what I originally thought. The key problem is that the sort-join can only be used on a limited subset of possible queries, basically the ones that flows in one direction of the graph. What you are doing in that query is that you are 'returning back', e.g. the last archOut has in the possible space of solutions the same stuff that is in one of the previous step.

BTW, would you like to become one of the LevelGraph maintainers? I definitely need help. Check the CONTIRIBUTING.md and let me know if you are ok with that :).

neonstalwart · 2014-12-17T16:31:33Z

i'm fine with the CONTRIBUTING.md. right now i'm in a crunch for time but then again everyone is always in a crunch for time so that's irrelevant. after looking at a number of graph databases, i've gone with levelgraph because the alternatives i looked at had issues and i am confident that i can hack on levelgraph if i have any issues that need to be resolved. given that i'm depending on this project, i'd be glad to do what i can to help maintain it.

if you want to give me some pointers about what i might need to look at to address this issue, i don't mind digging in deeper and reporting back what i can find.

mcollina · 2014-12-17T16:54:47Z

The culprit is https://github.com/mcollina/levelgraph/blob/master/lib/queryplanner.js#L93.

As it seems, https://github.com/mcollina/levelgraph/blob/master/lib/queryplanner.js#L133-L135 is missing a check for this particular case, as you want to stop using a SortJoin when you hit this particular case. Does it make sense?

add another foaf to demonstrate failures

1ad8d7c

neonstalwart reviewed Dec 16, 2014
View reviewed changes

neonstalwart mentioned this pull request Dec 16, 2014

filterTwoArguments not working #104

Open

force basic join algorithm

51e26a7

add tests to abstract join algorithm tests

5393225

neonstalwart force-pushed the more-tests branch from 70cfe38 to 5393225 Compare December 17, 2014 15:04

neonstalwart added 2 commits December 17, 2014 10:17

add mocha npm script that doesn't bail

fd2ff92

remove navigator test from abstract join algorithm

c715ec6

Conversation

neonstalwart commented Dec 16, 2014

Uh oh!

neonstalwart Dec 16, 2014

Choose a reason for hiding this comment

Uh oh!

mcollina Dec 16, 2014

Choose a reason for hiding this comment

Uh oh!

neonstalwart Dec 16, 2014

Choose a reason for hiding this comment

Uh oh!

neonstalwart commented Dec 16, 2014

Uh oh!

mcollina commented Dec 16, 2014

Uh oh!

mcollina commented Dec 17, 2014

Uh oh!

neonstalwart commented Dec 17, 2014

Uh oh!

mcollina commented Dec 17, 2014

Uh oh!

neonstalwart commented Dec 17, 2014

Uh oh!

neonstalwart commented Dec 17, 2014

Uh oh!

mcollina commented Dec 17, 2014

Uh oh!

neonstalwart commented Dec 17, 2014

Uh oh!

mcollina commented Dec 17, 2014

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants