Seth Keiper

Sphider: a FOSS search engine

by Seth Keiper on Jul.27, 2009, under Coding and Programming, Left Brain, MySQL, Overall, PHP

Of course, you could just write your own spider/search engine; it is not overly complex or deeply involved. But what if you need it *now*?

Enter Sphider. Sphider is a lightweight web spider and search engine written in PHP, using MySQL as its back-end database. Its HTML output is templatable, too. And the kicker: it is free and open source, released under the GPL.

How do you set up this screaming application? Here are very simple instructions for GNU/Linux, run from your web server's document root:

  1. wget http://www.sphider.eu/sphider-1.3.4.zip
  2. unzip sphider-1.3.4.zip
  3. mv sphider-1.3.4/ search/
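  4. Edit search/settings/database.php to use your database credentials. Then, in the MySQL console, run: CREATE DATABASE `sphider`;
  5. Make the config file writable: chmod 666 search/settings/conf.php
  6. As a personal touch, I like to add an index.php that points at the search page: ln -s search.php search/index.php (a relative link target is resolved from the link's own directory, search/, so the target is search.php rather than search/search.php)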
  7. You will need to edit the username and password defaults in search/admin/auth.php on lines 3 and 4, respectively.
  8. Then point your favorite web browser to http://yoursite.com/search/admin/install.php. This creates the SQL tables in the sphider database.
  9. Follow the link to: http://yoursite.com/search/admin/admin.php.
  10. Log in with your credentials.
  11. Select the Settings tab.
  12. Some of the key items to change on this tab are:

    • Administrator e-mail address
    • Temporary directory (this must be writable by the user your web server runs as, so check it if that user is not www-data)
    • Indexing settings:

      • PDF:

        1. Check the Index PDF files
        2. Enter /usr/bin/pdftotext as the Full executable path to PDF converter (check that you have it with: which pdftotext)
      • Microsoft DOC:

        1. Check the Index DOC files
        2. Enter /usr/bin/catdoc as the Full executable path to catdoc converter (check that you have it with: which catdoc)
      • XLS:

        1. Check the Index XLS files
        2. Enter /usr/bin/xls2csv as the Full executable path to XLS converter (check that you have it with: which xls2csv)
      • PPT:

        1. Check the Index PPT files
        2. Enter /usr/bin/catppt as the Full executable path to PPT converter (check that you have it with: which catppt)
    • Change the User agent string to something of your own. I usually set it to my domain name or a landing URL, so people who see the spider in their logs know where it came from.
    • Do not forget to click the Save settings button.
  13. Select the Index tab.
  14. Enter your site's domain name. Remember, www.yoursite.com is not the same as subdomain.yoursite.com; each has to be indexed separately.
  15. Full indexing seems to work best, so that is what I use, personally.
  16. Start Indexing!
  17. Now you have a search engine that you can templatize. Take the search form and put it anywhere on your site, or pull it into every page with PHP's include().
  18. More information is available in the Sphider documentation.
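Step 4's credential edit can be scripted if you would rather not open an editor. This is a minimal sketch under a few assumptions:

- database.php holds one-line, double-quoted assignments such as $mysql_user = "root";. I'd expect $database, $mysql_user, $mysql_password and $mysql_host in 1.3.4, but check your copy.
- set_php_var is a hypothetical helper, not part of Sphider.
- It relies on GNU sed's -i and -E options.

```shell
# set_php_var FILE NAME VALUE  (hypothetical helper, not part of Sphider)
# Rewrites a one-line PHP assignment such as   $mysql_user = "root";
# into                                         $mysql_user = "VALUE";
# Assumes a double-quoted value on a single line; VALUE must not contain '"'.
set_php_var() {
    file=$1
    name=$2
    # Escape the characters that are special in a sed replacement when | is
    # the delimiter: backslash, the delimiter itself, and &.
    esc=$(printf '%s' "$3" | sed -e 's/[\\|&]/\\&/g')
    # Keep everything up to the opening quote (group 1) and swap the quoted value.
    sed -i -E "s|^([[:space:]]*\\\$$name[[:space:]]*=[[:space:]]*)\"[^\"]*\";|\\1\"$esc\";|" "$file"
}
```

Run it from the directory that contains search/, for example set_php_var search/settings/database.php mysql_user 'youruser', then do the same for mysql_password. You can also create the database without opening the console: mysql -u root -p -e 'CREATE DATABASE `sphider`;'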
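For step 6, keep in mind that a relative symlink target is resolved from the directory that holds the link, not from the directory where you ran ln. Here is a minimal sketch of creating the link correctly. link_index is a hypothetical helper name.

```shell
# link_index DIR  (hypothetical helper)
# Creates DIR/index.php as a symlink to search.php in the same directory.
# The stored target "search.php" is resolved relative to DIR, where the link lives.
# Without -f, ln refuses to overwrite an index.php that already exists.
link_index() {
    ln -s search.php "$1/index.php"
}
```

After running link_index search, ls -l search/index.php should show index.php -> search.php. By contrast, ln -s search/search.php search/index.php stores the target search/search.php, which is looked up as search/search/search.php, so that link dangles.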
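Before editing auth.php in step 7, it helps to confirm what actually sits on lines 3 and 4. This sketch assumes the defaults are still on those lines in your copy, as step 7 says. show_lines is a hypothetical helper built on standard awk.

```shell
# show_lines FILE FIRST LAST  (hypothetical helper)
# Prints lines FIRST..LAST of FILE, each prefixed with its line number.
show_lines() {
    awk -v first="$2" -v last="$3" \
        'NR >= first + 0 && NR <= last + 0 { printf "%d: %s\n", NR, $0 }' "$1"
}
```

Usage: show_lines search/admin/auth.php 3 4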
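You can do the converter checks from step 12 in one pass. This sketch uses the POSIX command -v in place of which, and covers the four tools named in the Settings tab list. check_converters is a hypothetical helper.

```shell
# check_converters  (hypothetical helper)
# Prints "tool: /full/path" for each converter found on PATH, or
# "tool: not found" otherwise; each path goes into the matching Settings field.
check_converters() {
    for tool in pdftotext catdoc xls2csv catppt; do
        if path=$(command -v "$tool"); then
            printf '%s: %s\n' "$tool" "$path"
        else
            printf '%s: not found\n' "$tool"
        fi
    done
}
```

On Debian-style systems, catdoc, xls2csv and catppt typically come from the catdoc package, and pdftotext comes from poppler-utils.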
