
Let's meet Pollux, our new Search Index

Jan Burse, created Jul 01. 2017

Dear All,

This is just to let you know that our website features a new search index. The search index is code-named pollux, and it is an n-gram index.

- Search without Index:
  The old search without index is still available under "found2.jsp" as a fallback. This old search had the disadvantage that it simply scanned all available documents. For a query such as "Artificial Intelligence" we got the result:

  Search results - Artificial Intelligence
  Results 1 - 6 of 6 in 13744 ms. 

- Search with Index, Debug mode:
  The new search uses an index. We currently offer a debug mode that shows the n-grams that are looked up for a particular query. This mode might go away in the future, and the index handling might also change. It can be invoked via "found.jsp?debug=true".

  Search results - Artificial Intelligence
  pregram=art, union=739
  pregram=ifi, union=433
  pregram=cia, union=596
  pregram=l, union=1786
  specimen res=209
  union res=209
  pregram=int, union=1083
  pregram=ell, union=592
  pregram=ige, union=69
  pregram=nce, union=812
  specimen res=21
  union res=21
  inter res=14

- Search with Index, Normal mode:
  As seen above, the pollux index works with pregrams, that is, n-grams that are matched either exactly or by prefix. For example, the pregram "l" matches all n-grams that start with "l". The normal mode omits the debugging information and is what the search button currently directs to. Document retrieval is much faster, roughly 25 times for the same query:

  Search results - Artificial Intelligence
  Results 1 - 6 of 6 in 546 ms. 
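
To make the pregram idea more concrete, here is a minimal sketch in Python. It is not the pollux implementation: the names NGramIndex, ngrams, pregrams, lookup and candidates are hypothetical, the trigram size is an assumption, and the chunking of a query word into consecutive pieces is only inferred from the debug output above (art, ifi, cia, l). The index maps every n-gram of a document word to the documents containing it, and a pregram lookup unions the entries of all n-grams that start with the pregram.

```python
from collections import defaultdict


def ngrams(word, n=3):
    """All overlapping n-grams of a word; a shorter word yields itself."""
    word = word.lower()
    if len(word) <= n:
        return {word}
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def pregrams(word, n=3):
    """Cut a query word into consecutive chunks of length n (assumption).

    The last chunk may be shorter than n and is then prefix matched.
    """
    word = word.lower()
    return [word[i:i + n] for i in range(0, len(word), n)]


class NGramIndex:
    """Hypothetical in-memory index: n-gram -> set of document ids."""

    def __init__(self, n=3):
        self.n = n
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for word in text.split():
            for gram in ngrams(word, self.n):
                self.postings[gram].add(doc_id)

    def lookup(self, pregram):
        """Union of the documents of all n-grams starting with pregram."""
        result = set()
        for gram, docs in self.postings.items():
            if gram.startswith(pregram):
                result |= docs
        return result

    def candidates(self, query):
        """Intersect the pregram lookups over all words of the query."""
        result = None
        for word in query.split():
            for pregram in pregrams(word, self.n):
                docs = self.lookup(pregram)
                result = docs if result is None else result & docs
        return result or set()


index = NGramIndex()
index.add(1, "Artificial Intelligence")
index.add(2, "Artificial flowers")
index.add(3, "Military intelligence")

print(pregrams("Artificial"))  # ['art', 'ifi', 'cia', 'l']
print(sorted(index.candidates("Artificial Intelligence")))  # [1]
```

The linear scan in lookup keeps the sketch short; a sorted key structure would let a prefix lookup avoid visiting every n-gram. The candidate set can contain false positives, since the pregrams of a word may occur in a document without the word itself, so a search built this way would presumably still check the candidates against the actual text.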

We would like to thank Guy Castagnoli for helpful discussions and for repeatedly showing us KIWIX, a pocket version of Wikipedia which also features a search index.

Best Regards