Exploring Semantic Web Data
                     and particularly Linked Data

                               Roberto García


                              AIC Seminar Series
                  SRI International, Menlo Park, August 14th 2012



Human-Computer Interaction and Data Integration Research Group
Universitat de Lleida, Spain
Who
• Associate Professor, Universitat de Lleida, Spain
• Visiting Associate Professor, Stanford University
   – Stanford HCI Group

• 12+ years of Semantic Web research
   – 1999 MSc Thesis: Knowledge Management using
      RDF plus reasoning (SiLRI)
   – 2006 PhD Thesis: A Semantic Web approach to DRM
   – 2006- Copyright Ontology
   – 2007- Lleida HCI Group, Semantic Web User
      Interfaces
What is Open Data?
“Open data is data that can be freely used, reused
 and redistributed by anyone - subject only, at most,
 to the requirement to attribute and sharealike”
                 Open Knowledge Foundation

• Make your data OPEN
   – Available online with open license
      • For instance Creative Commons CC-BY
   – At no more than reproduction cost
   – In any format
Open Data Worldwide
• 169 initiatives
    – City (40), Country, Region or State (125),
      Supranational (4)




                     http://datos.fundacionctic.org/sandbox/catalog/faceted/
Welcome to Data.CA.Gov
Open Data Formats
           • However, encourage formats that facilitate
             reuse and interoperability
                   – Tim Berners-Lee's 5-star classification




http://5stardata.info
★ Open Data
           • Make data available on the Web under an
             open license
                   – Data licenses:
                        • Public Domain Dedication and License (PDDL), Open Data
                          Commons Attribution License (ODC-by) or Creative Commons
                          Public Domain Dedication (CC0)

           • Whatever format
                   – Example: PDF
           • But… data is locked up in a document
                   – Hard to get data out; requires custom scrapers
http://5stardata.info
★★ Open Data
           • Make it available as structured data
                   – Example: Excel instead of image scan of a table




           • But… data is still locked up
                   – You depend on proprietary software

http://5stardata.info
★★★ Open Data
           • Use non-proprietary formats
                   – Example: CSV instead of Excel

                           "Temperature forecast for Galway",

                           "Day","Lowest Temperature (C)"
                           "Saturday, 13 November 2010",2
                           "Sunday, 14 November 2010",4
                           "Monday, 15 November 2010",7

           • But… data on the Web and not data in the Web
                   – What does “Galway” mean? Is it a temperature?
                     What is the unit? Local time?...


http://5stardata.info
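To make the 3-star limitation concrete, here is a short Python sketch (standard library only) that parses the CSV above. Every cell comes back as a plain string: the unit survives only inside a human-readable header label, and "Galway" is just a substring of the title, not an identifier a program could follow.

```python
import csv
import io

# The 3-star CSV from the slide, copied verbatim.
CSV_TEXT = '''"Temperature forecast for Galway",
"Day","Lowest Temperature (C)"
"Saturday, 13 November 2010",2
"Sunday, 14 November 2010",4
"Monday, 15 November 2010",7
'''

def parse_forecast(text):
    """Split the file into title, header row and data rows (all plain strings)."""
    rows = list(csv.reader(io.StringIO(text)))
    return rows[0][0], rows[1], rows[2:]

title, header, data = parse_forecast(CSV_TEXT)
for day, low in data:
    # `low` is the string "2", "4", ...: nothing machine-readable says it is
    # a temperature in Celsius, nor which Galway or time zone is meant.
    print(f"{day}: {low!r}")
```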
Galway (disambiguation)
• Places
   – Ireland
        • Galway
        • County Galway
        • Galway Bay
   – Sri Lanka
        • Galway's Land National Park
   – United States
        • Galway (town), New York
        • Galway (village), New York
• Things
   –   Galway (sheep), a breed of sheep that originated in Galway, Ireland
   –   Galway harp, a type of harp
   –   Galway Hooker, a type of sailing boat
   –   Galway or Claddagh Ring, a type of wedding ring made in Galway
• …
★★★★ Open Data
• Use URIs to identify things,
  so that people can point at your stuff
   – Example: RDF¹ (but also Atom, OData, JSON-LD,…)

  # Vocabularies (ontologies)
  @prefix meteo: <http://purl.org/ns/meteo#> .
  @prefix galweather: <http://5stardata.info/galweather#> .
  @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

  <http://example.org/Galway> meteo:forecast
         [ meteo:predicted "2010-11-13T12:00:00Z"^^xsd:dateTime ;
           meteo:temperature [ meteo:celsius "2"^^xsd:decimal ] ] .

• But… what if we (humans or computers) don’t
  know what http://example.org/Galway means?

¹ Resource Description Framework, http://www.w3.org/RDF/
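The Turtle above boils down to four subject–predicate–object triples, with blank nodes (the `[ … ]` parts) standing in for the unnamed forecast and temperature resources. The Python sketch below uses hand-written tuples instead of a real RDF parser, and `_:forecast1` / `_:temp1` are made-up blank-node labels. It shows how a program can follow the triples from the Galway URI to a typed Celsius value.

```python
from decimal import Decimal

# Hand-written triples equivalent to the Turtle snippet; names stay as CURIE
# strings. "_:forecast1" and "_:temp1" are illustrative blank-node labels.
TRIPLES = [
    ("<http://example.org/Galway>", "meteo:forecast", "_:forecast1"),
    ("_:forecast1", "meteo:predicted", ("2010-11-13T12:00:00Z", "xsd:dateTime")),
    ("_:forecast1", "meteo:temperature", "_:temp1"),
    ("_:temp1", "meteo:celsius", ("2", "xsd:decimal")),
]

def objects(subject, predicate, triples=TRIPLES):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def forecast_temperatures(place):
    """Follow place -> forecast -> (predicted time, temperature in Celsius)."""
    results = []
    for forecast in objects(place, "meteo:forecast"):
        times = [lexical for lexical, _dt in objects(forecast, "meteo:predicted")]
        for temp in objects(forecast, "meteo:temperature"):
            for lexical, datatype in objects(temp, "meteo:celsius"):
                # The datatype tells the program how to read the lexical form.
                value = Decimal(lexical) if datatype == "xsd:decimal" else lexical
                results.append((times[0] if times else None, value))
    return results

print(forecast_temperatures("<http://example.org/Galway>"))
```

The unit and datatype are now explicit. The identifier `<http://example.org/Galway>` is still opaque, though, which is exactly the gap the fifth star addresses.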
★★★★★ Linked Open Data
• Link your data to other data to provide context
  (semantics, meaning)
   – Example: HTTP GET http://dbpedia.org/resource/Galway
        @prefix dbpedia: <http://dbpedia.org/resource/> .
        ...
        dbpedia:Galway a <http://dbpedia.org/ontology/Place>, <http://dbpedia.org/ontology/PopulatedPlace>,
        <http://dbpedia.org/ontology/Settlement>;
            rdfs:label "Galway"@en;
            dbp:populationBlank "Galwegian, Tribesman"@en;
            dbp:populationTotal "75529"^^xsd:int;
            dbp:populationUrban "76778"^^xsd:int;
            dcterms:subject <http://dbpedia.org/resource/Category:Cities_in_the_Republic_of_Ireland>,
                <http://dbpedia.org/resource/Category:County_towns_in_the_Republic_of_Ireland>,
                <http://dbpedia.org/resource/Category:Port_cities_and_towns_in_the_Republic_of_Ireland>,
                <http://dbpedia.org/resource/Category:University_towns>;
            rdfs:comment "Galway or City of Galway (Cathair na Gaillimhe) is a city on the west coast of
        Ireland. It is located on the River Corrib between Lough Corrib and Galway Bay and is surrounded by
        County Galway. It is the third largest city within the state, though if the wider urban area is
        included then it falls into fourth place behind Limerick. The population of Galway city at the 2011
        census was 75,529, rising to 76,778 across the entire urban area."@en;
            geo:lat 53.2719;
            geo:long -9.04889;
            foaf:homepage <http://www.galwaycity.ie> .




   … and also dbp:PopulationTotal, dct:subject,…
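Dereferencing the DBPedia URI is plain HTTP with content negotiation: the client says which RDF serialization it wants in the `Accept` header. Below is a Python standard-library sketch. It assumes the server honours `Accept: text/turtle` (DBPedia does content negotiation, but its redirects and exact behaviour may change over time). The helper names are hypothetical, and the network call is left to the caller.

```python
from urllib.request import Request, urlopen

def linked_data_request(uri, media_type="text/turtle"):
    """Build a GET request asking for an RDF serialization of `uri`."""
    return Request(uri, headers={"Accept": media_type}, method="GET")

def fetch_description(uri, media_type="text/turtle", timeout=10):
    """Dereference `uri`; urlopen follows 30x redirects (e.g. 303 See Other)."""
    with urlopen(linked_data_request(uri, media_type), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)

req = linked_data_request("http://dbpedia.org/resource/Galway")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Calling `fetch_description("http://dbpedia.org/resource/Galway")` should return Turtle along the lines of the listing above, assuming the endpoint is reachable.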
Network Effect
  ~31 billion
  statements




http://linkeddata.org
Fine for computers… but people?
                                    C. Warren
                                    (blogger)

                                    I’m writing
                                    about “Films I
                                    Like”.
                                    Can I reuse
                                    LinkedMDB?


                                       M. Harper
                                       (developer)

                                       I’m developing
                                       a bird watching
                                       application.
                                       Can I reuse
                                       DBPedia?
http://linkeddata.org
User Testing
• Users' typical questions:
  –   Where do I start?
  –   Where do I go now?
  –   What is this data about?
  –   How do I find this?
  –   …

• What do Linked Data user interfaces offer?
DBPedia Scenario
• Linked Data version of Wikipedia
  – 3.5 million things described
     • Ontology: 257 classes and 1276 properties
Target Technical Users
• DBPedia main page
Semantic Query Languages
• SPARQL:
  – SELECT ?c (COUNT(?i) AS ?n)
    WHERE { ?i a ?c } GROUP BY ?c ORDER BY DESC(?n)

       ?c                                            ?n
       http://www.w3.org/2002/07/owl#Thing           1668503
       http://www.w3.org/2004/02/skos/core#Concept   632607
       http://www.opengis.net/gml/_Feature           571764
       http://dbpedia.org/ontology/Place             462349
       http://dbpedia.org/ontology/Person            363751
       http://dbpedia.org/ontology/Work              355100
       http://dbpedia.org/ontology/PopulatedPlace    340443
       http://xmlns.com/foaf/0.1/Person              296595
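The class-count query groups `?i a ?c` patterns by class, counts instances, and sorts descending. The same computation over an in-memory list of triples takes a few lines of Python. The sample triples are invented for illustration; the counts in the table come from the DBPedia endpoint, not from this code.

```python
from collections import Counter

def class_counts(triples):
    """Python analogue of:
    SELECT ?c (COUNT(?i) AS ?n) WHERE { ?i a ?c } GROUP BY ?c ORDER BY DESC(?n)
    """
    # An RDF graph is a set of triples, so duplicates are dropped first;
    # only rdf:type triples ("?i a ?c") take part.
    counts = Counter(o for _s, p, o in set(triples) if p == "rdf:type")
    # Sort by count descending; ties broken by class name for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

sample = [  # invented triples, not DBPedia data
    ("dbpedia:Galway", "rdf:type", "dbo:Place"),
    ("dbpedia:Galway", "rdf:type", "dbo:Settlement"),
    ("dbpedia:Lleida", "rdf:type", "dbo:Place"),
    ("dbpedia:Lleida", "rdf:type", "dbo:Place"),        # duplicate triple
    ("dbpedia:Woody_Allen", "rdf:type", "dbo:Person"),
    ("dbpedia:Galway", "rdfs:label", "Galway"),         # not a type triple
]
for cls, n in class_counts(sample):
    print(cls, n)
```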
Text Search
• What to type? A URI? A URI label?
• How can we take advantage of the semantics?
Semantic Query UIs
• iSPARQL
  http://dbpedia.org/isparql/
Proposal
• Automatic UI generation from ontologies and dataset structure
• Information Architecture components [Morville] matched to
  interaction patterns for data analysis [Shneiderman]:
  – Overview: Menus, Sitemaps,…
  – Zoom & Filter: Facets
  – Details: Lists, Maps, Timelines…
IA Components. Menus
– From dataset ontologies and thesauri
   • For each class/topic
      – URI, label, # instances/uses, subclasses/subtopics
– Flatten to the desired # of entries and subentries
   • While there is room for more entries or subentries,
     split the class/topic with the most instances
   • While there are too many, group those with the fewest
      – “Other” is the generic group
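One possible reading of these flattening rules, as a Python sketch. The `Topic` tree, the instance counts and the exact split/group policy are assumptions for illustration, not the rhizomik implementation. In particular, instances typed only with a parent class are not tracked once that parent is split.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Topic:
    label: str
    count: int                                   # number of instances (or uses)
    children: list[Topic] = field(default_factory=list)

def flatten_menu(topics, max_entries):
    """Flatten a class/topic hierarchy into at most `max_entries` entries."""
    entries = list(topics)
    # While there is room, split the largest topic whose subtopics still fit.
    while True:
        fits = [t for t in entries
                if t.children and len(entries) - 1 + len(t.children) <= max_entries]
        if not fits:
            break
        largest = max(fits, key=lambda t: t.count)
        entries.remove(largest)
        entries.extend(largest.children)
    entries.sort(key=lambda t: t.count, reverse=True)
    # When there are too many, fold the smallest entries into a generic "Other".
    if len(entries) > max_entries:
        keep, rest = entries[:max_entries - 1], entries[max_entries - 1:]
        entries = keep + [Topic("Other", sum(t.count for t in rest), rest)]
    return entries

hierarchy = [  # invented counts, loosely inspired by the DBPedia table
    Topic("Place", 462, [Topic("PopulatedPlace", 340), Topic("Building", 50)]),
    Topic("Person", 363, [Topic("Artist", 100), Topic("Athlete", 150)]),
    Topic("Work", 355, [Topic("Film", 60), Topic("Album", 90), Topic("Book", 40)]),
    Topic("Species", 200),
]
print([t.label for t in flatten_menu(hierarchy, 5)])
print([t.label for t in flatten_menu(hierarchy, 3)])
```

With room for five entries, "Place" is split into its subclasses. With room for only three, "Work" and "Species" end up under "Other".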
IA Components. Menus

7 menus with 10 submenus, automatically generated
DEMO
                                     http://rhizomik.net/dbpedia/

   IA Components. Menus
Menus provide a DBPedia overview…
     …but what about 12,334 birds?
IA Components. Facets
• Pre-computed list of facets per class or topic
  – From ontologies or thesauri + instance data
  – Facet metrics:
     • frequency, # values, most common value and
       its cardinality…
• DBPedia Birds class:
  – 226 properties
     • dbo:kingdom: 100%, 3 values,
       6846 (Animalia),…
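The facet metrics listed above can be pre-computed in one pass over the instance data of a class. The Python sketch below assumes instances arrive as (property, value) pairs. The bird data is invented, so the numbers only mimic the shape of the dbo:kingdom example.

```python
from collections import Counter, defaultdict

def facet_metrics(instances):
    """Per-property facet metrics for the instances of one class.

    `instances` maps an instance URI to its (property, value) pairs.
    """
    total = len(instances)
    having = defaultdict(set)      # property -> instances that use it
    values = defaultdict(Counter)  # property -> value frequencies
    for inst, pairs in instances.items():
        for prop, value in pairs:
            having[prop].add(inst)
            values[prop][value] += 1
    metrics = {}
    for prop, insts in having.items():
        top_value, top_count = values[prop].most_common(1)[0]
        metrics[prop] = {
            "frequency": len(insts) / total,        # share of instances with it
            "distinct_values": len(values[prop]),
            "most_common": (top_value, top_count),  # value and its cardinality
        }
    # Most frequent facets first: the best candidates for the facet panel.
    return dict(sorted(metrics.items(), key=lambda kv: -kv[1]["frequency"]))

birds = {  # hypothetical instance data
    "ex:bird1": [("dbo:kingdom", "Animalia"), ("dbo:family", "Columbidae")],
    "ex:bird2": [("dbo:kingdom", "Animalia"), ("dbo:family", "Corvidae")],
    "ex:bird3": [("dbo:kingdom", "Animalia"), ("dbo:family", "Columbidae")],
    "ex:bird4": [("dbo:kingdom", "Animal")],
}
for prop, m in facet_metrics(birds).items():
    print(prop, m)
```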
DEMO
                   http://rhizomik.net/dbpedia/

Scenario DBPedia
DEMO
                     http://rhizomik.net/linkedmdb

Scenario LinkedMDB
Testing LinkedMDB
• Evaluation with lay users as part of a RITE¹
  development process
  – Iteration test with 6 users
  – LinkedMDB (Linked Data version of IMDb)

                    User Task:
                    “Find three films where
                    Woody Allen is director and
                    also actor”.

   ¹ Rapid Iterative Testing and Evaluation
Evaluation Results
• Seemed easy but…
  no user completed the task without help
• Really, just 1 issue:
  – Users started from “Actor” instead of from
    “Film”, and got lost from there
• User interaction is too constrained by the
  underlying “explicit” data structure
• Lack of context while browsing the graph
New Features
• Facets for all inverse properties
  (explicit or implicit)
  – Film –actor→ Actor:
     • Actor has facet “is actor of Film”
• Breadcrumbs show “query” built so far
  – Click Film, then for facet “Actor”
    search “Woody Allen”:
     • “Showing Film has actor
        where actor name is Woody Allen”
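Inverse facets can be derived from the data itself. Every triple `film actor person` gives each class of `person` a facet “is actor of” the class of `film`, whether or not the ontology declares an inverse property. The Python sketch below uses hypothetical instances and plain property names.

```python
from collections import defaultdict

def inverse_facets(triples, types):
    """Facets each class gets from properties pointing *at* its instances.

    triples: (subject, property, object) with object-valued properties only.
    types:   instance -> set of class names.
    """
    facets = defaultdict(set)
    for subj, prop, obj in triples:
        for target_cls in types.get(obj, ()):
            for source_cls in types.get(subj, ()):
                facets[target_cls].add(f"is {prop} of {source_cls}")
    return facets

triples = [  # hypothetical LinkedMDB-like data
    ("ex:Manhattan", "actor", "ex:Woody_Allen"),
    ("ex:Manhattan", "director", "ex:Woody_Allen"),
    ("ex:Heat", "actor", "ex:Al_Pacino"),
]
types = {
    "ex:Manhattan": {"Film"}, "ex:Heat": {"Film"},
    "ex:Woody_Allen": {"Actor", "Director"}, "ex:Al_Pacino": {"Actor"},
}
facets = inverse_facets(triples, types)
print(sorted(facets["Actor"]))
```

Because Woody Allen is typed as both Actor and Director, the Actor view also gets an implicit “is director of Film” facet.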
New Features
• What about getting from Actors to Films to
  restrict by director?

• Add Actor facet “directed by”?
  – DANGER: facet explosion
     • Director → Film → Country → Continent
       Director facet:
       “continents of countries where films directed”!
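The explosion is easy to quantify. If every property path from a class becomes a facet, the number of facets grows roughly as the schema's branching factor raised to the path depth. The Python sketch below enumerates such paths over a tiny hypothetical schema. The property names are illustrative and not LinkedMDB's exact vocabulary.

```python
# Hypothetical schema: class -> [(property or inverse property, target class)]
SCHEMA = {
    "Director": [("is director of", "Film")],
    "Actor": [("is actor of", "Film")],
    "Film": [("director", "Director"), ("actor", "Actor"), ("country", "Country")],
    "Country": [("continent", "Continent"), ("is country of", "Film")],
}

def path_facets(schema, start, max_depth):
    """All property paths of length 1..max_depth starting at class `start`."""
    paths = []
    frontier = [((), start)]
    for _ in range(max_depth):
        next_frontier = []
        for path, cls in frontier:
            for prop, target in schema.get(cls, []):
                new_path = path + (prop,)
                paths.append(new_path)
                next_frontier.append((new_path, target))
        frontier = next_frontier
    return paths

paths = path_facets(SCHEMA, "Director", 3)
print(len(paths))  # 1 + 3 + 4 paths of length 1, 2 and 3
```

The depth-3 path `("is director of", "country", "continent")` is the “continents of countries where films directed” facet, one of many that a realistic schema would generate.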
New Features
• Pivoting: switch from a faceted view to a
  related faceted view (keeping filters)
  – E.g.: from Actors facets move to Films facets
    through the “is Actor of Film” facet
• For each class facet, also compute:
  – The most specific class of the target instances
     • Actor “is Actor of” Film and TV Episode
       → Audiovisual Work
  – Pivot that facet to get:
     • Faceted view for target class… + filters so far
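Finding the most specific class that covers all target instances amounts to a lowest-common-ancestor lookup in the class hierarchy. The Python sketch below assumes single inheritance (one parent per class) and uses hypothetical class names. With multiple inheritance there may be several equally specific candidates.

```python
# Hypothetical single-inheritance hierarchy: class -> parent class.
PARENT = {
    "Film": "AudiovisualWork",
    "TVEpisode": "AudiovisualWork",
    "AudiovisualWork": "Work",
    "Work": "Thing",
    "Person": "Thing",
}

def ancestors(cls, parent=PARENT):
    """The class itself followed by its superclasses, most specific first."""
    chain = [cls]
    while cls in parent:
        cls = parent[cls]
        chain.append(cls)
    return chain

def most_specific_common_class(classes, parent=PARENT):
    """Lowest common superclass of `classes`, or None if there is none."""
    chains = [ancestors(c, parent) for c in classes]
    common = set(chains[0]).intersection(*chains[1:])
    # Walking the first chain upwards, the first shared class is the lowest one.
    return next((c for c in chains[0] if c in common), None)

print(most_specific_common_class(["Film", "TVEpisode"]))
```

Pivoting the “is Actor of” facet would then open the faceted view for this class while carrying over the filters selected so far.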
DEMO
http://rhizomik.net/linkedmdb/
Next Round Evaluation
• Semantic Web Exploration Tools
  Quality in Use Model:
  – Task success, Task time, Satisfaction,…
  – UI Component Efficiency, Task Flexibility, Layout
    Flexibility,…
• Task: “Films where Woody Allen is director and actor”
  – Task time:
                 Pre-pivot   Pivot      Reduction
    Minimum        1.05      0.89         15%
    Maximum        5.23      2.23         57%
    Mean           2.41      1.69         30%
    St. Dev.       1.49      0.57         62%
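The Reduction column is the relative decrease (pre-pivot - pivot) / pre-pivot. Recomputing it from the reported values gives a quick consistency check. The values are copied from the table. The table does not state the time unit, and this sketch does not need one.

```python
TASK_TIME = {  # statistic: (pre-pivot, pivot), as reported in the table
    "Minimum": (1.05, 0.89),
    "Maximum": (5.23, 2.23),
    "Mean": (2.41, 1.69),
    "St. Dev.": (1.49, 0.57),
}

def reduction(before, after):
    """Relative reduction, in percent, rounded to the nearest integer."""
    return round((before - after) / before * 100)

for stat, (pre, post) in TASK_TIME.items():
    print(f"{stat:9} {reduction(pre, post)}%")
```

All four recomputed percentages (15, 57, 30, 62) agree with the table.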
Summary
• Menus
  – Dataset classes (topics) overview
• Facets
  – Filter class using properties and values
• Pivoting
  – Switch faceted views, carrying filters
DEMO
                                          http://rhizomik.net/linkedmdb/


   Conclusions
• Users build queries without knowing SPARQL or
  the dataset structure
• Example:
  – Who has directed films in Oceania?
  PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
  SELECT DISTINCT ?r1 WHERE {
    ?r1 a movie:Director .
    ?r2 movie:director ?r1 .
    ?r2 a movie:Film .
    ?r2 movie:country ?r3 .
    ?r3 movie:country_continent ?r3var0 .
    FILTER(str(?r3var0)="Oceania") }
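The example query follows a regular pattern: the selected class, one triple pattern per facet or pivot step (reversed for inverse properties), and a filter on the final literal. The Python sketch below generates text of that shape from such a path. The function and its parameters are hypothetical and not the tool's actual code, and the string escaping is minimal.

```python
def facet_path_to_sparql(start_class, steps, literal_prop, value):
    """Build a SPARQL query from a facet path.

    steps: (property, is_inverse, class_or_None) hops away from start_class.
    """
    lines = [f"?r1 a {start_class} ."]
    current = "?r1"
    for i, (prop, inverse, cls) in enumerate(steps, start=2):
        nxt = f"?r{i}"
        # Inverse facets ("is director of") reverse the triple pattern.
        lines.append(f"{nxt} {prop} {current} ." if inverse
                     else f"{current} {prop} {nxt} .")
        if cls:
            lines.append(f"{nxt} a {cls} .")
        current = nxt
    var = f"{current}var0"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    lines.append(f"{current} {literal_prop} {var} .")
    lines.append(f'FILTER(str({var})="{escaped}")')
    body = "\n  ".join(lines)
    return f"SELECT DISTINCT ?r1 WHERE {{\n  {body}\n}}"

query = facet_path_to_sparql(
    "movie:Director",
    [("movie:director", True, "movie:Film"), ("movie:country", False, None)],
    "movie:country_continent",
    "Oceania",
)
print(query)
```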
Work in Progress
• Interaction design
  – Explore the best way to make pivoting, and unpivoting,
    evident to users
  – Improve “breadcrumbs”
• Specialized facets:
  – Range-dependent: histogram for numbers,
    calendar for dates,…
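A range-dependent facet for a numeric property could bucket the values into bins and show their counts. The Python sketch below uses equal-width bins. The bin count and the equal-width strategy are assumptions for illustration, and quantile bins would suit skewed data such as population figures better.

```python
def histogram(values, bins=5):
    """Equal-width histogram of a non-empty list: [((low, high), count), ...]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value would fall just past the last bin; clamp it in.
        i = min(int((v - lo) / width), bins - 1)
        counts[i] += 1
    return [((lo + i * width, lo + (i + 1) * width), c)
            for i, c in enumerate(counts)]

for (low, high), count in histogram([1, 2, 2, 3, 9, 10], bins=3):
    print(f"[{low:g}, {high:g}): {count}")
```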
Work in Progress
Integrate
RDF2SVG
Work in Progress
   • Object-Action interaction paradigm
      – Object properties determine actions
      – Actions: pluggable Semantic Web Services
      – Example trigger properties: lat, long, point…;
        time, date, start, end…
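In an object-action interface, the properties an object has decide which actions are offered. The Python sketch below uses a hypothetical registry that maps action names to trigger properties. Tying geo properties to a "map" action and temporal ones to a "timeline" action is an assumption for illustration, not the listed services.

```python
# Hypothetical registry: action -> properties that make it applicable.
ACTIONS = {
    "map": {"lat", "long", "point"},
    "timeline": {"time", "date", "start", "end"},
}

def available_actions(properties, registry=ACTIONS):
    """Actions whose trigger properties overlap the object's properties."""
    props = set(properties)
    return sorted(name for name, triggers in registry.items() if triggers & props)

print(available_actions(["label", "lat", "long"]))
```

New actions plug in by adding a registry entry, which mirrors the idea of pluggable services behind each action.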
DEMO
                                  http://lodvisualization.appspot.com



   Work in Progress
• Other IA components: sitemaps
DEMO
                                       http://rhizomik.net/apollo/

    Work in Progress
• Interactively select data and configure
  visualizations
Data Quality
Assisted Editing (and Trust)




   WebID




    http://www.w3.org/2008/09/msnws/papers/foaf+ssl.html
Thanks for your attention

                              Roberto García
                              http://rhizomik.net/~roberto
                                roberto.garcia@udl.cat





Editor's Notes

  • #27 Faceted view for Species > Bird. Looking for pigeons (Columbidae) in “Mediterranean Countries”… Filter on direct properties like Familia = Columbidae