Spatial Queries Entity Recognition and Disambiguation
BY: EHSAN HAMZEI
Table of contents
1- Introduction
2- Query Processing (Related Works)
3- State of the Art
4- Our approach
5- Conclusion
Introduction
December 1990 >> First search engine (W3Catalog) >> Entirely indexed by hand
September 1993 >> WebCrawler >> Finds pages automatically
…
January 1994 >> Yahoo!
September 1997 >> Google
Introduction (Spatial Search Engine)
New sources on the web:
◦ New search engines for images and videos
◦ New search engines for geospatial data (Google Maps, Bing Maps)
February 2005 >> Google Maps
December 2010 >> Bing Maps
Query Processing
(What is Query Processing?)
A search engine has two major processes:
◦ 1- Offline (crawling and collecting data)
◦ 2- Online (starts with the user’s query and ends with returning the results)
Where is Query Processing?
What does Query Processing bring us?
Query Processing and Related Works
NLP >> Natural Language Processing
ER >> Entity Recognition
Related Works:
◦ Guo et al. (2009) address the problem of Named Entity Recognition in Query (NERQ)
◦ …
◦ Dalvi et al. (2014) developed a four-step algorithm, the Topic-specific Language Model (TLM) method, for entity recognition and disambiguation in search queries.
Query Processing (State of the Art)
An example of two queries with the same structure in Google Maps:
1- Intersection of Shariati and Resalat
2- Intersection of Valiasr and Enqelab
Proposed Approach (Definition)
Spatial Query = Combination of:
◦ 1- Location Name
◦ 2- Location Type
◦ 3- Spatial Relationship
◦ Example: Hospitals around Resalat Square
Based on NLP (ER), we can recognize and tag these types for further processing
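As an illustration of what a tagged query could look like, here is a minimal sketch in Python. The names `Tag`, `TaggedPart` and `describe` are hypothetical and do not come from the presentation, and the example query is tagged by hand rather than by any recognizer.

```python
from dataclasses import dataclass
from enum import Enum


class Tag(Enum):
    """The three kinds of information that make up a spatial query."""
    LOCATION_NAME = "location name"
    LOCATION_TYPE = "location type"
    SPATIAL_RELATIONSHIP = "spatial relationship"


@dataclass(frozen=True)
class TaggedPart:
    """One sub-query together with the tag assigned to it."""
    text: str
    tag: Tag


# Hand-tagged version of the slide's example query.
example = [
    TaggedPart("Hospitals", Tag.LOCATION_TYPE),
    TaggedPart("around", Tag.SPATIAL_RELATIONSHIP),
    TaggedPart("Resalat Square", Tag.LOCATION_NAME),
]


def describe(parts):
    """Render tagged parts as a readable one-line summary."""
    return ", ".join(f"{p.text!r} -> {p.tag.value}" for p in parts)


print(describe(example))
```

Keeping the tag set as a closed enumeration mirrors the three predefined categories on the slide.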
Proposed Approach (Algorithm)
1- Input Query > Segmentation (Top-Down)
2- Tag Candidates
◦ 2-1 Location Name
◦ 2-2 Location Type
◦ 2-3 Spatial Relationship
3- Validate the Result
◦ 3-1 Check that the query is fully understood
◦ 3-2 Check the conceptual criteria
◦ 3-3 Check the logical criteria
4- Return the result
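The steps above can be sketched roughly as follows. This is only an illustration under several assumptions, not the author's implementation: the set `KNOWN_PLACES` is a hypothetical stand-in for a geocoder lookup, the dictionaries use exact matching instead of a similarity measure, the function names are invented, and only checks 3-1 and 3-2 are modelled (the logical check 3-3 is left out).

```python
# Hypothetical stand-ins for the geocoder and the gazetteer dictionaries.
KNOWN_PLACES = {"resalat square", "shariati", "resalat", "valiasr", "enqelab"}
LOCATION_TYPES = {"hospital", "hospitals", "school", "schools"}
RELATIONSHIPS = {"around", "near", "intersection"}


def classify(text):
    """Return a tag for a sub-query, or None if nothing matches."""
    key = text.lower()
    if key in KNOWN_PLACES:  # stands in for a geocoder call that returns results
        return "location name"
    if key in LOCATION_TYPES:
        return "location type"
    if key in RELATIONSHIPS:
        return "spatial relationship"
    return None


def segment_and_tag(query):
    """Top-down segmentation: try the longest sub-queries first, then shorter ones."""
    words = query.split()
    taken = [False] * len(words)
    parts = []  # (start index, text, tag)
    for size in range(len(words), 0, -1):
        start = 0
        while start + size <= len(words):
            text = " ".join(words[start:start + size])
            # Skip windows that overlap an already tagged sub-query.
            tag = None if any(taken[start:start + size]) else classify(text)
            if tag is None:
                start += 1
                continue
            parts.append((start, text, tag))
            taken[start:start + size] = [True] * size
            start += size
    tagged = [(text, tag) for _, text, tag in sorted(parts)]
    # Collect runs of consecutive words that received no tag.
    leftovers, run = [], []
    for word, used in zip(words, taken):
        if used:
            if run:
                leftovers.append(run)
            run = []
        else:
            run.append(word)
    if run:
        leftovers.append(run)
    return tagged, leftovers


def validate(tagged, leftovers):
    """Simplified checks 3-1 (fully processed) and 3-2 (conceptual criteria)."""
    fully_processed = all(len(run) == 1 for run in leftovers)
    names = [text for text, tag in tagged if tag == "location name"]
    needs_two = any(text.lower() == "intersection" for text, _ in tagged)
    conceptually_ok = len(names) >= 2 if needs_two else True
    return fully_processed and conceptually_ok


tagged, leftovers = segment_and_tag("Intersection of Shariati and Resalat")
print(tagged)
print(leftovers, validate(tagged, leftovers))
```

In this sketch a sub-query that has been tagged is never split again, and single leftover words such as "of" and "and" do not prevent the query from counting as fully processed.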
Proposed Approach (Evaluation)
Two kinds of evaluation are possible:
1- Disambiguation:
◦ The average disambiguation for 100 spatial queries: 89.45%
2- Queries answered out of 100 spatial queries, compared to Google Maps:
◦ Google Maps: 72
◦ Our Approach: 91
Conclusion
Changing the perspective from textual to spatial
Taking the spatial relationship into account
◦ Makes such queries answerable in general
◦ Uses the relationship for disambiguation
Future Work:
◦ Using a combination of geocoding APIs
◦ Developing a more sophisticated algorithm (two or more spatial relationships)
Thanks For Your Attention

Editor's Notes

  • #2 This presentation introduces a new approach to spatial query processing. As the title suggests, the technique can also be used to disambiguate the results.
  • #3 As the table of contents shows, we begin with a brief introduction and then discuss query processing and some related work. The next topic is the state of the art, which includes some examples from Google Maps. Finally, we discuss the proposed method and the conclusion.
  • #4 No one can deny the importance of the Internet in modern daily life; the Internet and the web play a major role in many aspects of our lives. Because of the huge number of websites and the amount of data on the Internet, a critical need arose for finding data and resources on the World Wide Web. The result was the first generation of search engines, which were at first indexed by hand. IT researchers then invented mechanisms for finding and indexing data automatically. As this slide shows, the first search engine was born in December 1990, and search engines have developed dramatically since then, as the high performance of modern search engines such as Google shows.
  • #5 As the web grew rapidly, new sources of data appeared: images, videos, and even spatial data and maps. Because of the great demand for spatial data, and the need for a place to answer users’ spatial queries, new search engines evolved, such as Google Maps and Bing Maps. These search engines handle many spatial tasks, such as finding places and facilities and even route finding. As this slide shows, Google Maps launched in February 2005 and Bing Maps started in December 2010.
  • #6 A simplified search engine must contain two significant processes: the offline process and the online process. The offline process is responsible for finding new data or websites on the web and for tracking data that is already indexed. It always runs in the background of the search engine, transparently to the user. The online process starts with the user’s query and ends with returning the results. Query processing is the first step in the online process. To clarify its importance: if we think of the online process as answering a question, then query processing is understanding the question. This simple analogy shows how important the step is: if it is not done well, the results will not match what the user wants.
  • #7 Natural language processing (NLP) is one of the challenging areas of computer science. It covers all tasks that can be performed automatically on textual information, for example finding related topics in text, summarizing text automatically, and part-of-speech tagging. One of the main parts of NLP is entity recognition (ER), which is responsible for tagging and classifying pieces of text into predefined categories. ER was first used for query processing by Guo et al. in 2009, and other researchers then developed this work further to obtain better query processing. In 2014, Dalvi et al. developed a new algorithm for entity recognition in search engine queries and used it for disambiguation and for limiting the results. However, all of these efforts target general search engines; our approach proposes a method for spatial search engines.
  • #8 This slide shows two queries with a similar structure, "Intersection of ...", but the results are very different. If we search for the places themselves, Resalat or Shariati, Google can easily find them, but Google does not process the query from a spatial perspective. This raises a question: why does Google answer the second query? The reason is that Google has a tag on that intersection, and its attached information contains the text "the intersection of Enqelab and Valiasr". So Google does not understand the spatial relationship; it depends only on its textual data. We call this approach the textual perspective: it depends on textual information without any further processing.
  • #9 The first question that must be answered for query processing is: what is a spatial query? A spatial query is a combination of three types of information: location name, location type, and spatial relationship. As an example … So if we can find the sub-queries, tag them with these three types, and understand the relations between them, we can execute the query appropriately.
  • #10 As the algorithm shows, the input query is first segmented from the top down. In the first iteration there is one sub-query (the whole query), and the iterations continue down to word-by-word segments unless the process stops earlier because the requirements are met.
    In the next phase, we tag candidate sub-queries according to the three predefined categories. First we suppose that the sub-query is a location name and check it with the Google Geocoding API. If the API returns results, we tag the sub-query as a location name and do not break it into smaller parts in later iterations. The Google Geocoding API is available as a web service that takes a string as input and returns an array of results, if any exist, in JSON or XML format, depending on the request. Each member of this array contains geospatial information (longitude and latitude, bounding box) and a hierarchical address. Location type and spatial relationship are checked against a gazetteer list or predefined dictionary. The location type dictionary is stored in a hierarchical structure because location types are inherently hierarchical. If the sub-query is sufficiently similar to one of these predefined elements, it is tagged.
    After each iteration there is a validation step that checks whether the tagging is meaningful. First we check whether the query is fully processed: it is fully processed if the whole query is tagged, or if every unprocessed sub-query contains only one word. Otherwise it is returned for further segmentation.
    If the query is fully processed and it contains a spatial relationship, we check the conceptual and logical criteria. For the conceptual criterion, we check whether the spatial relationship is meaningful given the other parts. For example, an intersection relationship in the tag list requires at least two location names; without them the relationship is not conceptually applicable, so the query goes through another iteration. If the last criterion is passed, we apply the spatial relationship to our data and return the logical results. For example, suppose we have the intersection of two location names and the Google API returns four different locations for each name. That gives 16 possible answers. Applying the intersection analysis to these 16 candidates easily finds the result, because most of the time the relationship does not hold for most of them. We call this disambiguation, because it eliminates the undesirable results. Finally, we return the result if the whole validation phase is passed.
  • #11 For evaluation we consider two measures. The first is the average disambiguation, which reflects how effective the approach is at reaching the desired result. The second compares our approach with a modern, state-of-the-art search engine, Google Maps. As the slide shows, the method achieves about 89 percent disambiguation, and out of 100 spatial queries it answers 91, while Google Maps answers 72.
  • #12 In conclusion, for any domain-specific search engine we can model the query in order to obtain better results. We call this changing the perspective. With this approach, more sophisticated queries become answerable and many undesirable results are eliminated. For future work, we suggest combining several geocoding APIs for better performance and building a more sophisticated algorithm that supports more complex queries.
  • #13 Thank you for listening. If there are any questions, I would be pleased to answer them.
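
The disambiguation example in note #10 (two location names with four geocoder candidates each, giving 16 possible pairs) can be sketched as below. The notes do not say how the intersection analysis itself is carried out, so this sketch assumes a simple distance threshold between candidate points as a stand-in. The coordinates are invented for illustration, and `haversine_m`, `disambiguate` and `max_distance_m` are hypothetical names.

```python
import math
from itertools import product

EARTH_RADIUS_M = 6_371_000


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def disambiguate(candidates_a, candidates_b, max_distance_m=200.0):
    """Build every candidate pair and keep only pairs close enough to intersect.

    The distance threshold is an assumption standing in for a real
    intersection analysis on street geometries.
    """
    pairs = list(product(candidates_a, candidates_b))
    kept = [(a, b) for a, b in pairs if haversine_m(a, b) <= max_distance_m]
    return pairs, kept


# Invented candidate coordinates (lat, lon), four per location name.
shariati = [(35.7340, 51.4480), (35.7000, 51.4400), (36.3000, 59.6000), (32.6500, 51.6700)]
resalat = [(35.7345, 51.4485), (35.7300, 51.5200), (38.0800, 46.2900), (29.6000, 52.5300)]

pairs, kept = disambiguate(shariati, resalat)
print(f"{len(pairs)} candidate pairs, {len(kept)} kept")
for a, b in kept:
    print(a, b, round(haversine_m(a, b)), "m apart")
```

Most candidate pairs lie far apart, so the spatial relationship removes them without any extra textual information.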