And a quad, let's say:
\n<http://sq/> <http://pq/> <http://oq/> <http://fq/> .
If I run a query without asking for a graph
\nSELECT * WHERE { ?s ?p ?o . }
I am only getting the triple
\nTo get the quad I need to also add the graph pattern:
\nSELECT * WHERE { GRAPH ?g { ?s ?p ?o . } }
Then I only get the quad
\nIn both queries I would expect to get the triple and the quad returned (the triple should be returned with the default graph URI as ?g
)
I am not sure what the stance of the SPARQL spec is on this, but from a user point of view I find it quite confusing and unexpected (I think most other triplestores I have used always returned the stored quads even when I skipped the graph pattern).
\nI expect we should be able to query quads directly with triple patterns, without needing to add the graph pattern (which makes it easier to query triples across multiple graphs without knowing the exact graph composition).
\nAccording to the Oxigraph internal design docs, 9 tables are created to index the triples and quads: 3 for the default graph triples and 6 for the nquads.
\nIt seems that, depending on whether the user provides the graph pattern, either the 3 tables for the default graph or the 6 tables for the quads are used.
\nA solution could be to make Oxigraph work only with nquads by default, with a default graph URI that can be set at startup. When triples are added without specifying the graph, they go to this graph.
\nThis would reduce the total number of tables from 9 to 6, and not using the graph pattern keyword could be interpreted as select *
on the graph column.
And there could be a switch to start Oxigraph in triple mode, without graphs, which would use only 3 tables.
\nFrom a user and an admin point of view, I personally think it is normal that a triple store supports either nquads or triples, as long as it is indicated in the service description (I think there is metadata for this).
\nBy default it would support quads, and even if I add triples they would be converted to quads going to the default graph. But if I don't need graphs, and I value efficiency and performance for querying my triples more, then I can start the triplestore in triple mode.
\nNot sure if that makes sense. @Tpt, you have probably already thought about this, and I might be missing some information on the implementation. But you mentioned wanting to reduce the number of combinations; maybe we can also reduce the number of tables ;)
","upvoteCount":1,"answerCount":1,"acceptedAnswer":{"@type":"Answer","text":"Hey! Thank you so much for this detailed proposal. It's very appreciated.
\nAccording to the SPARQL spec, by default triple patterns only query the default graph: Outside the use of GRAPH, matching is done using the default graph.
in the spec. A lot of triple stores consider the default graph to be the union of the default graph and all named graphs, hence also returning <http://sq/> <http://pq/> <http://oq/>
when using the first query.
In Oxigraph I have made the choice of only querying the default graph by default (to enable use cases like named graphs representing non-asserted facts) but provide an option for the \"default graph is the union of all graphs\" behavior. This is the set_default_graph_as_union
Rust option, the use_default_graph_as_union=True
option of the query method in Python, the brand new use_default_graph_as_union: true
option of the query method in JS... The fact that you have not seen it makes me think this option is maybe not highlighted enough. Feel free to point me to places in the documentation where you think it would be great to talk about it (PRs also welcome, of course).
About indexes: triples in the default graph are added to the 3 triple indexes and triples in named graphs are added to the 6 quad indexes. This has been done to transparently use only 3 tables for triples in the default graph and so provide nearly as good performance as a \"triple-only mode\" without having to set a configuration option explicitly. If I understand your proposal correctly, applying your change would mean 1. creating a configuration option that must be set before loading data and is hard to change; 2. storing triples in the default graph in 6 indexes (the quad ones) and not only 3 (the default graph ones), hence increasing the memory footprint significantly in the use case \"a lot of triples in the default graph and few in named graphs\" while only making query evaluation slightly faster (hopefully slightly fewer data chunks to merge when doing a scan).
\nI hope it helps. Maybe there is something I am missing in your proposal. Thank you again!
","upvoteCount":2,"url":"https://github.com/oxigraph/oxigraph/discussions/823#discussioncomment-8756461"}}}-
Currently Oxigraph stores and queries the ntriples and nquads completely separately in the same triplestore.

If I add a triple, let's say:

    <http://st/> <http://pt/> <http://ot/> .

And a quad, let's say:

    <http://sq/> <http://pq/> <http://oq/> <http://fq/> .

If I run a query without asking for a graph:

    SELECT * WHERE { ?s ?p ?o . }

I only get the triple.

To get the quad I also need to add the graph pattern:

    SELECT * WHERE { GRAPH ?g { ?s ?p ?o . } }

Then I only get the quad.

In both queries I would expect to get the triple and the quad returned (the triple should be returned with the default graph URI as ?g).

I am not sure what the stance of the SPARQL spec is on this, but from a user point of view I find it quite confusing and unexpected (I think most other triplestores I have used always returned the stored quads even when I skipped the graph pattern).

I expect we should be able to query quads directly with triple patterns, without needing to add the graph pattern (which makes it easier to query triples across multiple graphs without knowing the exact graph composition).

A potential solution?

According to the Oxigraph internal design docs, 9 tables are created to index the triples and quads: 3 for the default graph triples and 6 for the nquads.

It seems that, depending on whether the user provides the graph pattern, either the 3 tables for the default graph or the 6 tables for the quads are used.

A solution could be to make Oxigraph work only with nquads by default, with a default graph URI that can be set at startup. When triples are added without specifying the graph, they go to this graph.

This would reduce the total number of tables from 9 to 6, and not using the graph pattern keyword could be interpreted as select * on the graph column.

And there could be a switch to start Oxigraph in triple mode, without graphs, which would use only 3 tables.

From a user and an admin point of view, I personally think it is normal that a triple store supports either nquads or triples, as long as it is indicated in the service description (I think there is metadata for this).

By default it would support quads, and even if I add triples they would be converted to quads going to the default graph. But if I don't need graphs, and I value efficiency and performance for querying my triples more, then I can start the triplestore in triple mode.

Not sure if that makes sense. @Tpt, you have probably already thought about this, and I might be missing some information on the implementation. But you mentioned wanting to reduce the number of combinations; maybe we can also reduce the number of tables ;)
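For readers who want to see the reported behavior in isolation, here is a minimal pure-Python sketch of a dataset with one default graph and some named graphs. It only models the matching rules described in the post; it is not Oxigraph code, and `ToyDataset`, `match_default` and `match_named` are hypothetical names invented for this illustration.

```python
from collections import defaultdict

DEFAULT = None  # marker for the default graph in this toy model


class ToyDataset:
    """Toy RDF dataset: one default graph plus any number of named graphs."""

    def __init__(self):
        self.graphs = defaultdict(set)  # graph name -> set of (s, p, o)

    def add(self, s, p, o, g=DEFAULT):
        self.graphs[g].add((s, p, o))

    def match_default(self):
        """Models SELECT * WHERE { ?s ?p ?o . }: default graph only."""
        return sorted(self.graphs[DEFAULT])

    def match_named(self):
        """Models SELECT * WHERE { GRAPH ?g { ?s ?p ?o . } }: named graphs only."""
        return sorted(
            (g, s, p, o)
            for g, triples in self.graphs.items()
            if g is not DEFAULT
            for (s, p, o) in triples
        )


ds = ToyDataset()
ds.add("http://st/", "http://pt/", "http://ot/")                # the triple
ds.add("http://sq/", "http://pq/", "http://oq/", "http://fq/")  # the quad

print(ds.match_default())  # only the triple
print(ds.match_named())    # only the quad, with its graph name first
```

In this model the two query shapes never overlap: a statement is visible either through the bare triple pattern or through the GRAPH pattern, which is the behavior the post describes.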
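Hey! Thank you so much for this detailed proposal. It's much appreciated.

According to the SPARQL spec, by default triple patterns only query the default graph: outside the use of GRAPH, matching is done using the default graph. A lot of triple stores consider the default graph to be the union of the default graph and all named graphs, hence also returning <http://sq/> <http://pq/> <http://oq/> when using the first query.

In Oxigraph I have made the choice of only querying the default graph by default (to enable use cases like named graphs representing non-asserted facts) but provide an option for the "default graph is the union of all graphs" behavior. This is the set_default_graph_as_union Rust option, the use_default_graph_as_union=True option of the query method in Python, the brand new use_default_graph_as_union: true option of the query method in JS... The fact that you have not seen it makes me think this option is maybe not highlighted enough. Feel free to point me to places in the documentation where you think it would be great to talk about it (PRs also welcome, of course).

About indexes: triples in the default graph are added to the 3 triple indexes and triples in named graphs are added to the 6 quad indexes. This has been done to transparently use only 3 tables for triples in the default graph and so provide nearly as good performance as a "triple-only mode" without having to set a configuration option explicitly. If I understand your proposal correctly, applying your change would mean:
1. creating a configuration option that must be set before loading data and is hard to change;
2. storing triples in the default graph in 6 indexes (the quad ones) and not only 3 (the default graph ones), hence increasing the memory footprint significantly in the use case "a lot of triples in the default graph and few in named graphs" while only making query evaluation slightly faster (hopefully slightly fewer data chunks to merge when doing a scan).

I hope it helps. Maybe there is something I am missing in your proposal. Thank you again!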
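The difference between the two default-graph behaviors mentioned in the answer can be sketched in a few lines of plain Python. This is a toy model, not the Oxigraph or pyoxigraph API: `evaluate_triple_pattern` and its `default_graph_as_union` parameter are hypothetical names that only mirror the idea behind the option named in the answer.

```python
def evaluate_triple_pattern(default_graph, named_graphs, default_graph_as_union=False):
    """Return the triples visible to a bare `?s ?p ?o` pattern.

    default_graph: set of (s, p, o) tuples
    named_graphs: dict mapping graph name -> set of (s, p, o) tuples
    """
    visible = set(default_graph)
    if default_graph_as_union:
        # Union behavior: named-graph triples are also seen by the bare pattern.
        for triples in named_graphs.values():
            visible |= triples
    return sorted(visible)


default_graph = {("http://st/", "http://pt/", "http://ot/")}
named_graphs = {"http://fq/": {("http://sq/", "http://pq/", "http://oq/")}}

strict = evaluate_triple_pattern(default_graph, named_graphs)
union = evaluate_triple_pattern(default_graph, named_graphs, default_graph_as_union=True)

print(strict)  # default graph only
print(union)   # default graph plus all named graphs
```

With the flag off, only the default-graph triple is returned; with it on, the named-graph triple is returned as well, which is the result the original post expected from the first query.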
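The index trade-off raised in the answer can be put into rough numbers. The sketch below counts index entries only, assuming every entry costs about the same, and uses a made-up workload of one million default-graph triples and one thousand named-graph quads; `index_entries` is a hypothetical helper, not part of Oxigraph.

```python
TRIPLE_INDEXES = 3  # indexes holding default-graph triples, per the answer
QUAD_INDEXES = 6    # indexes holding named-graph quads, per the answer


def index_entries(default_triples, named_quads, quads_only=False):
    """Count index entries under the current layout or the proposed quads-only one."""
    if quads_only:
        # Proposal: every statement, default graph included, goes to the 6 quad indexes.
        return QUAD_INDEXES * (default_triples + named_quads)
    # Current layout: default-graph triples use 3 indexes, named-graph quads use 6.
    return TRIPLE_INDEXES * default_triples + QUAD_INDEXES * named_quads


# Hypothetical workload: many default-graph triples, few named-graph quads.
current = index_entries(1_000_000, 1_000)
proposed = index_entries(1_000_000, 1_000, quads_only=True)

print(current, proposed, proposed / current)
```

Under these assumptions the quads-only layout needs roughly twice as many index entries for a default-graph-heavy dataset, which is the memory footprint concern stated in the answer.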