[Drupal Con AMS 2005] Steven Wittens - Search Module Demystified
created on Tue, 2005-10-18 11:48
- several layers of hooks
- evolved out of pre-4.6 search
- actually a layer on top of stuff in node and user module
- it's not a standalone module
- entry point is the hook_search() hook
- accessible through /search
- search_view() is the page callback; search_data() fetches the data
- clean permalink for each query; returns an array of results; results themed with theme_search_item
- advantages of centralized search: consistent look and feel across different searches; allows fancy stuff like running an automagic search from the 404 page when a URL doesn't exist
- HTML indexer: 2nd part of search
- query implemented in SQL using a 2-pass system
- query is extensible
- Unicode aware
- language specific processing: accent removal, stemming, word splitting
- has stemming module that will go to contrib
- algorithm discussion on ranking: rare words score better than common words
- doesn't use MySQL fulltext because it's database specific, doesn't work on multiple tables, and doesn't understand HTML or links
- node_search uses the HTML indexer to index entire nodes, has Google-style syntax, search ranking extended with extra factors
- upload indexer modules: hook search into upload
- content search results: nodeapi('search result') used to add extra info, highlighted snippet
- search module is the control module; lots of the logic that implements search lives in other modules
- drupal.org search: large database, lots of noise, too many results, no one goes to the 2nd page; pre-processing with stemming reduces index size by 30%; drupal.org will hopefully be re-indexed this week
- can lose context info using stemming
- Vlado: how about using n-grams?
- what was wrong with 4.6: HTML tag recognition got confused, wildcards destroyed, no advanced matching, coefficients not as optimised
- inspired by trip search
- core search is better overall, but trip search is better for smaller sites
- why not Google?
- only sees public content, doesn't understand Drupal node structure, Google API is limited in # of queries
- should it be a module?
- needs to be examined!
- improvements: examine search patterns, determine guidelines for module developers to add search functionality
- lots of questions: people should use the other room :-) !
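
A rough sketch of two points from the notes above: the pre-processing step (accent removal, word splitting) and the ranking idea that rare words score better than common words. This is not the search module's code. It is written in Python rather than PHP, the names normalize, build_index and search are hypothetical, and the log-based weight is only an assumed stand-in, since the notes do not give the actual formula. Stemming and HTML handling are left out.

```python
import math
import re
import unicodedata
from collections import Counter, defaultdict


def normalize(text):
    """Lowercase, strip accents, and split into words (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    # Drop the combining marks left behind by decomposition (e.g. the accent on "é").
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.findall(r"\w+", stripped)


def build_index(docs):
    """Return (word -> {doc_id: occurrences}, word -> total occurrences)."""
    index = defaultdict(Counter)
    totals = Counter()
    for doc_id, text in docs.items():
        for word in normalize(text):
            index[word][doc_id] += 1
            totals[word] += 1
    return index, totals


def search(query, index, totals):
    """Rank doc ids so that matches on rare words count for more."""
    scores = Counter()
    for word in normalize(query):
        if word not in index:
            continue
        # Assumed weight: shrinks as the word becomes more common in the index.
        weight = math.log10(1 + 1 / totals[word])
        for doc_id, count in index[word].items():
            scores[doc_id] += count * weight
    return [doc_id for doc_id, _ in scores.most_common()]


docs = {
    1: "Café search module",
    2: "search search search",
    3: "module hooks",
}
index, totals = build_index(docs)
ranked = search("cafe search", index, totals)
```

Here document 1 matches the rare word "cafe" once and "search" once, while document 2 only repeats the common word "search", so document 1 ranks first under this weighting.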