Sphider: a FOSS search engine
by Seth Keiper on Jul.27, 2009, under Coding and Programming, Left Brain, MySQL, Overall, PHP
Of course, you could just write your own spider and search engine; it is not overly complex or deeply involved. But what if you need it *now*?
Enter Sphider. Sphider is a lightweight web spider and search engine written in PHP, using MySQL as its back-end database. The HTML is also templatable. And the kicker: it is free and open source under the GPL.
How do you set this screaming application up? Here are very simple instructions for GNU/Linux:
- wget http://www.sphider.eu/sphider-1.3.4.zip
- unzip sphider-1.3.4.zip
- mv sphider-1.3.4/ search/
- Edit search/settings/database.php to use your database credentials, then in the mysql console run: CREATE DATABASE `sphider`;
- Make the config file writable: chmod 666 search/settings/conf.php
- As a personal touch, I like to do: ln -s search.php search/index.php (the link target is resolved relative to the link's own directory, so it is search.php rather than search/search.php)
- Change the default username and password in search/admin/auth.php, on lines 3 and 4, respectively.
- Then in your favorite web browser, point to: http://yoursite.com/search/admin/install.php. This creates the SQL tables for the sphider database.
- Follow the link to: http://yoursite.com/search/admin/admin.php.
- Log in with your credentials.
- Select the Settings tab.
Some of the key items to change on this tab are:
- Administrator e-mail address
- Temporary directory (this will need to be writable if you are not running under the www-data user)
Indexing settings:
PDF:
- Check the Index PDF files option
- Input /usr/bin/pdftotext for Full executable path to PDF converter (check if you have this by doing: which pdftotext)
Microsoft DOC:
- Check the Index DOC files option
- Input /usr/bin/catdoc for Full executable path to catdoc converter (check if you have this by doing: which catdoc)
XLS:
- Check the Index XLS files option
- Input /usr/bin/xls2csv for Full executable path to XLS converter (check if you have this by doing: which xls2csv)
PPT:
- Check the Index PPT files option
- Input /usr/bin/catppt for Full executable path to PPT converter (check if you have this by doing: which catppt)
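Rather than running which four times, the converter checks above can be done in one pass. This is only a rough sketch: check_converters is a made-up helper name, and it assumes a POSIX shell. It prints whatever path your system reports, which may not be under /usr/bin.

```shell
#!/bin/sh
# check_converters is a hypothetical helper, not part of Sphider.
# It prints the full path of each converter, or flags it as missing.
check_converters() {
    for tool in "$@"; do
        if found=$(command -v "$tool"); then
            printf '%s: %s\n' "$tool" "$found"
        else
            printf '%s: not found\n' "$tool"
        fi
    done
}

# The four converters named in the indexing settings.
check_converters pdftotext catdoc xls2csv catppt
```

Anything reported as not found needs to be installed before its Index option will do any good; paste the reported paths into the matching settings fields.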
User agent:
- Change the User agent string to your own desired setting. Most of the time, I set this to the domain name or landing URL, so that people who see your search engine in their logs know where it came from.
- Do not forget to select the Save settings submit button
- Select the Index tab
- Input your site's domain name. Remember, www.yoursite.com is not the same as subdomain.yoursite.com. Both have to be indexed separately.
- Full indices seem to be the best, so I use that, personally.
- Start Indexing!
- Now you have a search engine that you can template. Take the search form and put it anywhere on your site, or include it through PHP's include() on every page.
- More information is available for sphider in their documentation.
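If you set Sphider up more than once, the permission and symlink steps from the list above can be wrapped in a small function. This is a minimal sketch, assuming the archive is already unpacked and moved to search/; prepare_sphider is a made-up name, not something Sphider ships.

```shell
#!/bin/sh
# prepare_sphider is a hypothetical helper for the post-unpack steps.
# Usage: prepare_sphider [directory]   (directory defaults to "search")
prepare_sphider() {
    dir=${1:-search}
    # The admin Settings tab saves to conf.php, so it has to be writable.
    chmod 666 "$dir/settings/conf.php" || return 1
    # A relative link target is resolved from the link's own directory,
    # so the bare file name is what points index.php at search.php.
    ln -sf search.php "$dir/index.php" || return 1
}

# Only act when a search/ directory exists in the current directory.
if [ -d search ]; then
    prepare_sphider search
fi
```

The database credentials and the admin username and password still have to be edited by hand, since those are specific to your site.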