Site Search Using Whoosh

| tags: random, programming

I tried Google Custom Search for my updated site, but I was unhappy with how little control I had over what got indexed, so I decided to implement my own.

A quick search revealed the awesome Whoosh search library. It is very well documented and super easy to use.

I wrote a 50-line Python script (buildIndex.py) to walk the _site directory and index the generated HTML files. It would be cool to integrate this into the Blogofile build process as a filter, but I knew I could get a separate indexer going more quickly.
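For anyone curious what such an indexer might look like, here is a minimal sketch, not the actual buildIndex.py. The schema fields (path, title, content), the "index" directory name, and the helper names extract and build_index are all assumptions made for illustration. It pulls the title and visible text out of each page with the standard library's HTMLParser and hands them to Whoosh.

```python
import os
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the <title> and the visible text of one HTML page."""

    SKIP = {"script", "style"}  # tags whose contents should not be indexed

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract(html):
    """Return (title, text) for an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), " ".join(parser.chunks)


def build_index(site_dir="_site", index_dir="index"):
    """Walk site_dir and index every .html file (hypothetical layout)."""
    # Whoosh is imported here so extract() can be used without it installed.
    from whoosh import index
    from whoosh.fields import ID, TEXT, Schema

    # Assumed schema: store path and title for display, index the body text.
    schema = Schema(path=ID(stored=True, unique=True),
                    title=TEXT(stored=True),
                    content=TEXT)
    os.makedirs(index_dir, exist_ok=True)
    ix = index.create_in(index_dir, schema)  # starts a fresh index each run
    writer = ix.writer()
    for root, _dirs, files in os.walk(site_dir):
        for name in files:
            if not name.endswith(".html"):
                continue
            full = os.path.join(root, name)
            with open(full, encoding="utf-8", errors="replace") as fh:
                title, text = extract(fh.read())
            # Store the site-relative URL so results can link to the page.
            url = "/" + os.path.relpath(full, site_dir).replace(os.sep, "/")
            writer.add_document(path=url, title=title, content=text)
    writer.commit()
```

Calling build_index() from the site root would rebuild the whole index from scratch, which is the simplest approach for a site of a few hundred pages.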

I've got a 40-line Python CGI script (searchIndex.cgi) that implements a JSONP service to search the index. It is called by 26 lines of JavaScript (search.js) on the search page to display the results.
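The search side could be sketched roughly like this. Again, this is not the real searchIndex.cgi: the query parameter names (q and callback), the response shape, the "index" directory, and the function names jsonp, search, and respond are assumptions for illustration, and the index is assumed to have stored path and title fields and an indexed content field.

```python
import json
import re
from urllib.parse import parse_qs

# Only allow callback names that look like plain JavaScript identifiers,
# since the name is echoed straight back into executable script.
CALLBACK_RE = re.compile(r"[A-Za-z_$][\w$.]*")


def jsonp(callback, payload):
    """Wrap a JSON payload in a call to the client-supplied callback."""
    if not CALLBACK_RE.fullmatch(callback):
        raise ValueError("unsafe callback name")
    return "%s(%s);" % (callback, json.dumps(payload))


def search(index_dir, text, limit=10):
    """Run a query against the Whoosh index and return the stored fields."""
    # Imported here so jsonp() can be used without Whoosh installed.
    from whoosh import index
    from whoosh.qparser import QueryParser

    ix = index.open_dir(index_dir)
    with ix.searcher() as searcher:
        query = QueryParser("content", ix.schema).parse(text)
        return [{"path": hit["path"], "title": hit["title"]}
                for hit in searcher.search(query, limit=limit)]


def respond(query_string, index_dir="index"):
    """Build the full CGI response text for a raw query string."""
    params = parse_qs(query_string)
    text = params.get("q", [""])[0]
    callback = params.get("callback", ["callback"])[0]
    # Skip the index entirely when there is nothing to search for.
    hits = search(index_dir, text) if text.strip() else []
    body = jsonp(callback, {"query": text, "results": hits})
    return "Content-Type: application/javascript\r\n\r\n" + body
```

A CGI script would write respond(os.environ.get("QUERY_STRING", "")) to standard output. Validating the callback name matters because a JSONP response is executed as script by the browser.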

I'm very satisfied with the result and with how easy it was to get going. I simply added the buildIndex script to the Post-Receive Service Hook that automatically updates the site when I push to GitHub. It takes about 15 seconds to index my 350 pages. Searches take milliseconds.