[Drupal Con AMS 2005] Steven Wittens - Search Module Demystified

Roland Tanglao
created on Tue, 2005-10-18 11:48
  • several layers of hooks
  • evolved out of pre-4.6 search
  • actually a layer on top of stuff in node and user module
  • it's not a standalone module
  • the gateway is the hook_search() hook
  • accessible through /search
  • search_view() displays the search page, search_data() fetches the data
  • clean permalink for each query; returns an array of results; results are themed with theme_search_item
  • advantages of centralized search: consistent look and feel across different searches; can do fancy things such as automatically running a search from the 404 page when a URL doesn't exist
  • HTML indexer: the second part of search
  • query is implemented in SQL using a two-pass system
  • query is extensible
  • Unicode-aware
  • language specific processing: accent removal, stemming, word splitting
  • has stemming module that will go to contrib
  • algorithm discussion on ranking: rare words score better than common words
  • doesn't use MySQL fulltext because it is database-specific, doesn't work across multiple tables, and doesn't understand HTML or links
  • node_search uses the HTML indexer to index entire nodes, supports Google-style query syntax, and extends search ranking with extra factors
  • upload indexer modules - hooks search into upload
  • content search results: nodeapi('search result') is used to add extra info, plus a highlighted snippet
  • Search module is the control module; much of the logic that implements search lives in other modules
  • drupal.org search: large database, lots of noise, too many results, and no one goes to the 2nd page; stemming as a pre-processing step reduces index size by 30%; drupal.org will hopefully be re-indexed this week
  • stemming can lose context info
  • Vlado: how about using n-grams?
  • what was wrong with 4.6: HTML tag recognition got confused, wildcards were destroyed, no advanced matching, coefficients not as well optimised
  • inspired by trip search
  • core search is better overall, but trip search is better for smaller sites
  • why not Google?
  • Google only sees public content, doesn't understand Drupal's node structure, and the Google API limits the number of queries
  • should it be a module?
  • needs to be examined!
  • improvements: examine search patterns, determine guidelines for module developers to add search functionality
  • lots of questions: people should use the other room :-) !
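
To make the ranking bullet concrete ("rare words score better than common words"), here is a minimal Python sketch, not Drupal code. It assumes an inverse-frequency weight of the form log10(1 + 1/count); the notes do not record the exact formula discussed in the talk. The function names (normalize, build_index, search) and the sample documents are hypothetical, and the preprocessing is only a simplified stand-in for the accent removal and word splitting mentioned above (no stemming).

```python
import math
import re
import unicodedata
from collections import Counter


def normalize(text):
    """Lowercase, strip accents, and split into words (simplified preprocessing)."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.findall(r"\w+", stripped)


def build_index(docs):
    """Return per-document word counts and a weight for every indexed word."""
    counts = {doc_id: Counter(normalize(body)) for doc_id, body in docs.items()}
    totals = Counter()
    for words in counts.values():
        totals.update(words)
    # Assumed weighting: the rarer a word is across the index, the higher its weight.
    weights = {word: math.log10(1 + 1 / total) for word, total in totals.items()}
    return counts, weights


def search(query, counts, weights):
    """Rank documents by the summed weight of the query words they contain."""
    terms = normalize(query)
    scores = {}
    for doc_id, words in counts.items():
        score = sum(words[term] * weights.get(term, 0.0) for term in terms)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical sample data, for illustration only.
docs = {
    1: "Drupal search module and the Drupal indexer",
    2: "Café menu for the Drupal sprint",
    3: "The search results page",
}
counts, weights = build_index(docs)
ranking = search("cafe drupal", counts, weights)
```

In this sample, "cafe" appears once in the whole index while "drupal" appears three times, so a single match on "cafe" contributes more to a document's score than a single match on "drupal".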