Search Engines 09
Query Processing
July 21, 2009
Alexander Lüders | Kai Schlichting
Query Processing Overview
Submit Query
Parse Query
Process Query
Spell Checking
Display Result
Search Engines 09 - Query Processing | Alexander Lüders, Kai Schlichting | 21.07.09
Query Parsing (1/2)
■ Boolean query language
□ AND (implicit), OR, NOT, phrases
□ Example: “albert einstein” OR schweitzer
■ Extensible query language
□ Grammar specification (JJTree parser generator)
□ Class model
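
The slide names the operators and a class model but not the grammar itself, so the following is only a minimal sketch of how such a query could be parsed into a small class model. It is written as a hand-rolled recursive-descent parser in Python rather than with JJTree. The class names (`Term`, `Phrase`, `Not`, `And`, `Or`), the function `parse_query`, and the precedence (NOT binds tightest, then implicit or explicit AND, then OR) are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Term:
    text: str


@dataclass
class Phrase:
    words: List[str]


@dataclass
class Not:
    operand: "Node"


@dataclass
class And:
    operands: List["Node"]


@dataclass
class Or:
    operands: List["Node"]


Node = Union[Term, Phrase, Not, And, Or]

# A token is either a double-quoted phrase or a run of non-whitespace characters.
TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def tokenize(query: str) -> list:
    tokens = []
    for match in TOKEN.finditer(query):
        if match.group(1) is not None:
            tokens.append(("PHRASE", match.group(1).split()))
        elif match.group(2) in ("AND", "OR", "NOT"):
            tokens.append((match.group(2), None))
        else:
            tokens.append(("TERM", match.group(2)))
    return tokens


class Parser:
    def __init__(self, tokens: list):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def parse_or(self) -> Node:
        operands = [self.parse_and()]
        while self.peek() == "OR":
            self.pos += 1
            operands.append(self.parse_and())
        return operands[0] if len(operands) == 1 else Or(operands)

    def parse_and(self) -> Node:
        operands = [self.parse_not()]
        # AND is implicit: any operand that follows directly is conjoined.
        while self.peek() in ("AND", "NOT", "TERM", "PHRASE"):
            if self.peek() == "AND":
                self.pos += 1
            operands.append(self.parse_not())
        return operands[0] if len(operands) == 1 else And(operands)

    def parse_not(self) -> Node:
        if self.peek() == "NOT":
            self.pos += 1
            return Not(self.parse_not())
        kind = self.peek()
        if kind not in ("TERM", "PHRASE"):
            raise ValueError(f"unexpected token {kind!r} at position {self.pos}")
        value = self.tokens[self.pos][1]
        self.pos += 1
        return Term(value) if kind == "TERM" else Phrase(value)


def parse_query(query: str) -> Node:
    return Parser(tokenize(query)).parse_or()
```

With this sketch, `parse_query('"albert einstein" OR schweitzer')` yields an `Or` node holding a `Phrase` and a `Term`. New operators could be added by introducing another node class and one more parsing method, which is one way to read "extensible" here.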
Query Parsing (2/2)
■ Query preprocessing
□ Remove invalid characters (valid: a-z, A-Z, 0-9, -, _, ")
□ Remove quotation marks if the query contains an odd number of them
□ Stemming (Porter)
□ Lower case
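
The preprocessing steps listed above can be sketched roughly as follows. Several details are assumptions, because the slide does not specify them: whitespace is treated as valid so that terms stay separated, all quotation marks are dropped when their count is odd, the operator keywords `AND`, `OR` and `NOT` are left in upper case, and lower-casing is applied before stemming. The `stem` parameter is a hypothetical hook that defaults to the identity function; a real Porter stemmer would have to be supplied by the caller.

```python
import re
from typing import Callable

# Everything except a-z, A-Z, 0-9, '-', '_', '"' and whitespace is dropped.
INVALID_CHARS = re.compile(r'[^a-zA-Z0-9_"\s-]')
OPERATORS = {"AND", "OR", "NOT"}  # assumption: operators are case-sensitive


def preprocess(query: str, stem: Callable[[str], str] = lambda word: word) -> str:
    cleaned = INVALID_CHARS.sub("", query)

    # An odd number of quotes cannot form balanced phrases, so drop them all.
    if cleaned.count('"') % 2 == 1:
        cleaned = cleaned.replace('"', "")

    result = []
    for token in cleaned.split():
        if token in OPERATORS:
            result.append(token)
            continue
        # Keep phrase quotes out of the stemmer and re-attach them afterwards.
        lead = '"' if token.startswith('"') else ""
        trail = '"' if len(token) > 1 and token.endswith('"') else ""
        core = token.strip('"').lower()
        result.append(lead + stem(core) + trail)
    return " ".join(result)
```

For example, `preprocess('Albert "Einstein OR Schweitzer!')` removes the exclamation mark and the unbalanced quote and lower-cases the terms, while leaving `OR` intact.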
Process Query
■ Term-at-a-time algorithm
□ Faster due to sequential disk access
□ Increased memory consumption (one score accumulator per candidate document)
■ IndexReader: Interface to the index
■ Ranking: BM25 algorithm
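
To illustrate how term-at-a-time processing and BM25 fit together, here is a minimal sketch under several assumptions. A plain dictionary from term to postings `(doc_id, term_frequency)` stands in for the IndexReader, the parameters `k1 = 1.2` and `b = 0.75` are common defaults rather than values from the slides, and the IDF uses the non-negative variant `ln((N - n + 0.5) / (n + 0.5) + 1)`. Boolean filtering (AND, NOT, phrases) is left out, so any document containing at least one query term receives a score.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Postings = List[Tuple[int, int]]  # (doc_id, term frequency in that document)


def bm25_term_at_a_time(
    query_terms: List[str],
    index: Dict[str, Postings],
    doc_lengths: Dict[int, int],
    k1: float = 1.2,
    b: float = 0.75,
    top_k: int = 10,
) -> List[Tuple[int, float]]:
    n_docs = len(doc_lengths)
    avg_len = sum(doc_lengths.values()) / n_docs

    # One accumulator per candidate document: this is the memory cost
    # of the term-at-a-time strategy.
    accumulators: Dict[int, float] = defaultdict(float)

    # Each inverted list is read once, front to back, before moving on
    # to the next term (hence the sequential access pattern).
    for term in query_terms:
        postings = index.get(term, [])
        if not postings:
            continue
        n_t = len(postings)
        idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5) + 1.0)
        for doc_id, tf in postings:
            norm = k1 * (1.0 - b + b * doc_lengths[doc_id] / avg_len)
            accumulators[doc_id] += idf * tf * (k1 + 1.0) / (tf + norm)

    # Highest score first; ties are broken by document id.
    ranked = sorted(accumulators.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]
```

A document-at-a-time strategy would instead walk all inverted lists in parallel and need no accumulator table, at the price of a less sequential access pattern.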
Spell Checking
Demo
Performance Analysis
■ 30 – 500 ms per query
■ Connection pooling
□ Establishing and closing a connection is very slow
□ A connection pool opens and holds a limited number of connections
■ Caching of inverted lists
□ Fast results for queries whose terms are already cached
□ Fast “next page” access
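
The connection pooling idea can be sketched as follows. The slides do not say what kind of connection is pooled, so `connect` is a hypothetical factory passed in by the caller, and the class name `ConnectionPool` and the default size are likewise assumptions.

```python
import queue
from contextlib import contextmanager
from typing import Callable, Optional


class ConnectionPool:
    """Opens a fixed number of connections up front and lends them out."""

    def __init__(self, connect: Callable[[], object], size: int = 4):
        self._idle: queue.Queue = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())  # pay the slow setup cost only once

    @contextmanager
    def connection(self, timeout: Optional[float] = None):
        # Blocks while all connections are in use; raises queue.Empty
        # if a timeout is given and expires.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand the connection back instead of closing it
```

A query handler would then wrap its index access in `with pool.connection() as conn:` so that no connection is opened or closed per query.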
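
Caching of inverted lists might look like the sketch below. The eviction policy is not stated in the slides, so least-recently-used eviction is an assumption, as are the class name `InvertedListCache` and the `load_postings` callback that stands in for the actual index read.

```python
from collections import OrderedDict
from typing import Callable, List, Tuple

Postings = List[Tuple[int, int]]  # (doc_id, term frequency)


class InvertedListCache:
    def __init__(self, load_postings: Callable[[str], Postings], capacity: int = 1000):
        self._load = load_postings
        self._capacity = capacity
        self._lists: "OrderedDict[str, Postings]" = OrderedDict()

    def get(self, term: str) -> Postings:
        if term in self._lists:
            self._lists.move_to_end(term)  # mark as most recently used
            return self._lists[term]
        postings = self._load(term)        # slow path: read from the index
        self._lists[term] = postings
        if len(self._lists) > self._capacity:
            self._lists.popitem(last=False)  # evict the least recently used list
        return postings
```

A "next page" request repeats the same query terms, so under this scheme all of its inverted lists would normally be served from memory.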
Conclusion
■ Features
□ Ranking
□ Spell checking
□ Extensible boolean query language
□ Replaceable index layer
■ Further performance optimizations are possible, but not strictly necessary
Thank You For Your Attention!
Questions?