intermine.bio2rdf.org
A QLever SPARQL endpoint for InterMine databases
François Belleau
ISMB 2024 BOSC 12 July 2024
Contents
• InterMine project
• Semantic Web concepts
• Data transformation process from REST to RDF
• InterMine SPARQL endpoint demo
• Concluding remarks
InterMine Project
• InterMine is an open source data warehouse system,
licensed under the LGPL 2.1. It is used to create
databases of biological data that are accessed through
sophisticated web queries.
• InterMine includes a user-friendly web interface that
works 'out of the box' and can be easily customised.
• InterMine makes it easy to integrate multiple data
sources into a single data warehouse.
• There are 19 InterMine databases available.
https://en.wikipedia.org/wiki/InterMine
FlyMine query:
Pathway identifier and name for the selected gene with FlyMine
https://www.flymine.org/flymine
InterMine: programming API available
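As a rough illustration of what calling the programming API can look like, the sketch below builds (but does not send) a REST request for the pathway identifier and name of one gene. It assumes the usual InterMine web-service conventions: a `/service/query/results` path, a `query` parameter holding PathQuery XML, and `Gene.pathways.*` paths in the FlyMine model. The gene symbol `zen` is an arbitrary example, and the helper name `build_pathquery_url` is hypothetical. Check the details against the InterMine documentation before relying on them.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr


def build_pathquery_url(service_root: str, gene_symbol: str) -> str:
    """Build a URL for an InterMine PathQuery; nothing is sent over the network."""
    # Assumed PathQuery XML: select two pathway fields for a looked-up gene.
    query_xml = (
        '<query model="genomic" '
        'view="Gene.pathways.identifier Gene.pathways.name">'
        f'<constraint path="Gene" op="LOOKUP" value={quoteattr(gene_symbol)}/>'
        "</query>"
    )
    # Assumed parameter names: "query" for the XML, "format" for the output type.
    params = urlencode({"query": query_xml, "format": "json"})
    return f"{service_root}/query/results?{params}"


# Service root assumed from the FlyMine URL shown on the slide.
url = build_pathquery_url("https://www.flymine.org/flymine/service", "zen")
print(url)
```

The resulting URL can be pasted into a browser or passed to any HTTP client; the JSON it returns is the kind of REST output that the conversion pipeline later turns into RDF.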
Semantic Web concepts
• The Semantic Web is an extension of the World Wide Web
that creates a web of data in which machines can
understand the meaning of things and the relationships
between them.
• RDF provides a standard way to represent information
in the form of triples: (subject, predicate, object).
• SPARQL is a query language specifically designed
for retrieving and manipulating data stored in RDF
format in a triplestore.
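To make the triple and query ideas concrete, here is a minimal in-memory sketch: a list of (subject, predicate, object) tuples and a pattern matcher in which `None` plays the role of a SPARQL variable. All identifiers (`gene:zen`, `prop:pathways`, and so on) are hypothetical and are not taken from the actual endpoint; a real triplestore such as QLever does the same kind of matching at scale with indexes.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical data: every identifier below is made up for illustration.
TRIPLES = [
    ("gene:zen", "rdf:type", "class:Gene"),
    ("gene:zen", "prop:symbol", "zen"),
    ("gene:zen", "prop:pathways", "pathway:P1"),
    ("pathway:P1", "prop:name", "Example pathway"),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard (a variable)."""
    for triple in triples:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


# Two chained patterns, the equivalent of a two-line SPARQL WHERE clause:
#   gene:zen prop:pathways ?pathway .  ?pathway prop:name ?name .
names = [
    name
    for (_, _, pathway) in match(TRIPLES, s="gene:zen", p="prop:pathways")
    for (_, _, name) in match(TRIPLES, s=pathway, p="prop:name")
]
print(names)
```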
20 years of Semantic Web evolution in life science
[Timeline figure; labels: ISMB 2005, ISMB 2008, Linked Data 2005, Linked Data 2009, 2014, Linked Data 2024 for life science]
SPARQL endpoints today… and many more
• UniProt: https://sparql.uniprot.org/
• MeSH: https://id.nlm.nih.gov/mesh/query
• KEGG: https://www.genome.jp/sparql/linkdb
• DO: https://disease-ontology.org/do-kb/sparql
• PDBj: https://rdfportal.org/dataset/pdbj
• Wikidata: https://query.wikidata.org/
InterMine SPARQL endpoints are missing.
MO-LD project: first attempt at InterMine RDF conversion
https://github.com/mo-ld
DÉRASPE, Maxime, BINKLEY, Gail, BUTANO, Daniela, et al. Making linked data SPARQL with the
InterMine biological data warehouse. In: CEUR Workshop Proceedings. Rheinisch-Westfaelische
Technische Hochschule Aachen, Lehrstuhl Informatik V, 2016.
Data transformation process from REST to RDF
[Pipeline diagram; labels grouped by kind]
• DB collection: FlyMine, WormMine, YeastMine
• Source access: Python InterMine REST API
• Scripts and tools: intermine2linkml.py, intermine2os.py, linkml2es.py, elasticdump, json_gz2nt_gz.py, qlever index
• Files: linkml-DB.yaml; DB-relation.ndjson.gz, DB-object.ndjson.gz; DB-relation.nt.gz, DB-object.nt.gz
• Index names: intermine-linkml-classe, intermine-linkml-field, intermine-DB-relation, Intermine-DB-object
• Services: http://es.kibio.science; QLever triplestore; SPARQL endpoints - QLever UI (http://intermine.bio2rdf.org)
• Data and scripts: https://huggingface.co/datasets/bio2rdf/intermine
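The JSON-to-RDF step of such a pipeline can be sketched as follows. This is not the project's json_gz2nt_gz.py, only a minimal illustration of the idea: each key of a JSON record becomes a predicate and each value a plain literal in N-Triples. The namespace `http://example.org/intermine/`, the `objectId` key, and the sample record are assumptions, and the sketch presumes that keys are safe to embed in an IRI.

```python
import json
from typing import Dict, List

BASE = "http://example.org/intermine/"  # hypothetical namespace


def escape_literal(value: str) -> str:
    """Escape the characters that N-Triples forbids inside a quoted literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def record_to_ntriples(record: Dict[str, object], id_key: str = "objectId") -> List[str]:
    """Turn one flat JSON record into N-Triples lines (all values as plain literals)."""
    subject = f"<{BASE}object/{record[id_key]}>"
    lines = []
    for key, value in record.items():
        if key == id_key or value is None:
            continue  # the id becomes the subject; null fields carry no triple
        predicate = f"<{BASE}field/{key}>"
        lines.append(f'{subject} {predicate} "{escape_literal(str(value))}" .')
    return lines


# One line of a hypothetical .ndjson file.
ndjson_line = '{"objectId": 1000001, "class": "Gene", "symbol": "zen", "name": null}'
for line in record_to_ntriples(json.loads(ndjson_line)):
    print(line)
```

Writing every value as a plain string literal keeps the sketch short; a fuller converter would emit IRIs for references between objects and typed literals for numbers.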
http://intermine.bio2rdf.org:7000
Show the Pathway identifier(s) and name for the selected gene
SPARQL query
InterMine query
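The SPARQL side of this comparison might look roughly like the sketch below, which builds the query text and a GET request URL following the SPARQL protocol's `query` parameter, without sending anything. The endpoint URL, the `f:` namespace, and the predicate names (`symbol`, `pathways`, `identifier`, `name`) are hypothetical placeholders; the real vocabulary has to be read from the endpoint itself.

```python
from urllib.parse import urlencode

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint URL
FIELD_NS = "http://example.org/intermine/field/"  # hypothetical namespace


def pathway_query(gene_symbol: str) -> str:
    """Return a SPARQL query for the pathways of one gene (assumed predicates)."""
    escaped = gene_symbol.replace("\\", "\\\\").replace('"', '\\"')
    return f"""PREFIX f: <{FIELD_NS}>
SELECT ?identifier ?name WHERE {{
  ?gene f:symbol "{escaped}" .
  ?gene f:pathways ?pathway .
  ?pathway f:identifier ?identifier .
  ?pathway f:name ?name .
}}"""


def request_url(query: str) -> str:
    """Encode the query as a SPARQL-protocol GET request URL."""
    params = urlencode({"query": query})
    return f"{ENDPOINT}?{params}"


query = pathway_query("zen")
print(query)
print(request_url(query))
```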
Concluding remarks
• The Semantic Web has evolved with new technologies:
  • JSON-LD (https://json-ld.org/)
  • LinkML (https://linkml.io/)
  • QLever triplestore (https://qlever.cs.uni-freiburg.de)
• Converting JSON from a REST API to RDF is a simple
approach
• Future work:
  • Other InterMine SPARQL endpoints will be added
  • We will explore SPARQL query generation with LLMs
Acknowledgments
• Collaborators
• Gos Micklem (Cambridge University)
• Deepak Unni (SIB, Swiss Institute of Bioinformatics)
• Arnaud Droit (ADLab)
• Funding
• BioHackathon 2023 organizers
• BOSC 2024 Organizing Committee
Try it: http://intermine.bio2rdf.org:7000
Get data and scripts: https://huggingface.co/datasets/bio2rdf/intermine
francois.belleau@gmail.com
