Sphider 2.0.0 nearing release

Sphider 2.0.0 is undergoing final testing and will probably be released by mid-October.

Virtually every file has undergone at least some alteration. The features of Sphider 2.0.0 are:
– Better page charset handling to ensure that the database receives only UTF-8 input. UTF-8 encoding of web pages already in UTF-8 format is avoided to eliminate garbled entries.
– Phrase searches have been improved.
– This version is PHP 7.1 ready.
– Integrated indexing of images, with the option to NOT index images. An image search page is also provided.
– RSS content may also be indexed and searched.
– jQuery has been updated to a more recent version.
– While not fully PSR-2 compliant when it comes to PHP coding standards, the code is a LOT closer than it ever has been. This involved the renaming of many functions and the elimination of a few functions which were found to be obsolete (and thus, unused). Coding style had to be changed in virtually every module. This is why so much code has been altered, affecting nearly every Sphider PHP code segment.
– The search page is integrated for legacy, RSS, and image searches. Knowing that RSS and images are something not every user will be interested in, an updated (as in 2.0.x compliant) version of the 1.6.x search page is provided. The revised 1.6.x search form will work fine with 2.0.x, but it will need to be renamed to replace the provided search.php.
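
The charset point in the list above (convert to UTF-8, but never re-encode a page that is already UTF-8) can be illustrated with a minimal sketch. It is written in Python purely for illustration; Sphider itself is PHP, and the function name `to_utf8_text` and the Latin-1 fallback are assumptions, not Sphider's actual code.

```python
def to_utf8_text(raw, declared_charset=None):
    """Decode raw page bytes into text that can be stored as UTF-8.

    Hypothetical helper: the name and the fallback order are assumptions.
    """
    # If the bytes are already valid UTF-8, use them as-is. Converting
    # them again from another charset is what produces garbled entries.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass

    # Otherwise trust the charset the page declared, if it is usable.
    if declared_charset:
        try:
            return raw.decode(declared_charset)
        except (UnicodeDecodeError, LookupError):
            pass

    # Last resort: Latin-1 maps every byte to a character, so it never fails.
    return raw.decode("iso-8859-1")


# A UTF-8 page wrongly labeled as Latin-1 is still read correctly.
page = "caf\u00e9".encode("utf-8")
print(to_utf8_text(page, "iso-8859-1"))
```

Checking for valid UTF-8 first means a wrong charset declaration on the page cannot cause a second, damaging conversion.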

Also, finding that porting PDO to databases other than MySQL was messier than anticipated (too many DB specific requirements for each), Sphider 2.0.0 will actually have 4 flavors. The “kits” for PostgreSQL and SQLite were too cumbersome and confusing.
1) The legacy Sphider, using the MySQL database (or MariaDB) and using MySQLi and MySQLnd.
2) PDO Sphider, also using the MySQL database (or MariaDB), but using a PDO implementation (for installations lacking MySQLnd support).
3) PostgreSQL version, using a PostgreSQL database accessed via PDO.
4) SQLite version, using a SQLite database accessed via PDO.

All flavors are testing well and it seems no more coding changes will be needed, after working out some “peculiarities” for each. Now each version must have a final full set of operations performed to ensure everything works. This includes new installation via PHP script, installation using SQL queries, upgrade installation, adding sites, indexing sites, deleting sites, and adding, editing, and deleting categories. The same is done for RSS indexing. The search functions need to be tested for various situations. We have found a few websites which have, uh…, what you might call “unusual” methods resulting in unusual problems. (Ever seen an image “alt” tag with text running in excess of 1000 characters? We have!)

Future considerations for Sphider (but not guarantees)

I’ve been giving thought to just what should come next for Sphider.

Integrating the Sphider Image Indexing functions with the main Sphider, thus making content and image indexing a single operation is a rather obvious improvement.

The ability to index and search RSS feeds would also be a nice addition. I actually have an alpha of this running on both Linux and Windows machines. Since the spidering operations can be done from a command prompt, a simple cron is keeping the feeds updated on the Linux box. The Windows task scheduler is being a bit more stubborn, mainly because of a pesky PHP error I haven’t solved yet. PHP is fine in a browser, but the command prompt is giving trouble. It works, but I keep getting an error that DEMANDS a response! I’ll figure it out.

Since searching for content is different from searching for images, which in turn is different from searching for RSS feeds, three different sets of search and results pages are needed. To a user, the only obvious difference is the search page, as the results portion is integrated. So I am giving thought to a possible “unified” search page with tabs so that the appropriate search form (and corresponding results) can be presented to the user. This is not definite yet, just a thought.

These are all ideas for the future. For now, version 1.6 remains the latest. If the need arises, minor release improvements/fixes are not out of the question.

Anything you would like to see in the Sphider of the future? Give me your ideas and … well, who knows? It might be a very good, very doable idea!

Sphider 1.6.0 Released

Sphider 1.6.0 and Sphider 1.6.0 PDO version have been released.

Also released is the Sphider Image Indexer, a companion add-on to Sphider allowing the user to index and search images from a website.

And finally, there is also a conversion kit which will allow the PDO version of Sphider to work with SQLite databases in place of MySQL.

Sphider 1.6 Release Status

The regular version of Sphider 1.6.0 and the associated Sphider Image Indexer are completed, tested, and ready to go. Since I want to release the PDO version in tandem, that is the only holdup.

The PDO version and associated Image Indexer are also essentially completed, but undergoing further testing due to some last-minute code changes. These changes involve code portability between database types. The release, as usual, targets MySQL (and presumably, MariaDB). There will also be a small set of four replacement modules (install.php, database.php, db_main.php, and db_backup.php) available targeting SQLite users! It is anticipated that a similar set will soon be introduced for PostgreSQL users. The power of PDO will finally come to be realized.

As soon as everything has been more thoroughly tested, the appropriate zips will be posted in the Downloads section.

Preview of the OPTIONAL Sphider Image Indexer search results

Work has progressed to the testing phase of both Sphider 1.6 and the OPTIONAL* Sphider Image Indexer. This is a screenshot of the results of an image search during testing. To get these results, the PHP installation needs to have the imagick module installed. The search will still work without it, but the thumbnail previews will be absent. The rest of the results will remain. The search can target the image name, the image URL, or the alt tag contents. It can cover all indexed sites or be site specific.

The target release date is mid-July.


* – Sphider 1.6 will work normally without the Sphider Image Indexer and will automatically detect when it has been installed. Image indexing is integrated into Sphider.

What’s next for Sphider?

Work is proceeding with Sphider 1.6!

What will be new in 1.6?

  • The ability to truncate selected tables from the database tab
  • The ability to clear all site data without deleting the site
  • The ability to crawl a site using a sitemap.xml, provided one exists
  • The option to preview pages from the results listing
  • An issue with resuming suspended indexing has finally been resolved
  • Support for an optional Sphider Image Indexer
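
As a rough illustration of the sitemap.xml item above, a crawler can read the page list straight from the sitemap instead of discovering it link by link. This is a sketch in Python, not Sphider's PHP code; the function name `urls_from_sitemap` and the sample URLs are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Namespace used by the standard sitemap protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(xml_text):
    """Return every <loc> URL listed in a sitemap document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


# Made-up sample sitemap for demonstration only.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about.html</loc></url>
</urlset>"""

for url in urls_from_sitemap(SAMPLE):
    print(url)
```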

At this point, the changes have been made in both the vanilla and PDO versions of 1.6 and testing is ongoing.

And what? An optional Sphider Image Indexer? This is an add-on that will work with Sphider 1.6. You will be able to build a catalog of images from sites where you have previously indexed the pages. Currently, the indexer itself is being tested, with excellent results. Work has begun on an image search function, but that is still in the VERY early stages and nowhere near being a viable tool. While the indexer required some modification of the core Sphider, the search function will not.

What this means is that once testing of the vanilla and PDO versions of 1.6 is complete, it can be released. The Image Indexer add-on still has to have the search function completed, then both the indexer and search function ported to PDO, and finally fully tested. At that time it will be released as version 0.99.

Since the search function of the add-on is in the very early stages of development, input as to how you would like to see it operate would be considered.

Just what IS this Sphider, anyway?

Sphider is a program designed to visit a web site in an ordered fashion to find the information necessary to create an index for a search engine. This, in turn, allows the site to be searched for pages containing certain keywords or phrases. Spidering programs are also called web crawlers or bots. They operate by following the hyperlinks on each page.
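
To make the idea of "following the hyperlinks on each page" concrete, here is a minimal Python sketch of the link-gathering step. It is not taken from Sphider (which is written in PHP); the class `LinkCollector`, the function `same_site_links`, and the example URLs are all assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def same_site_links(base_url, html):
    """Return unique links that stay on the same host, in page order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlparse(base_url).netloc
    unique = []
    for link in collector.links:
        if urlparse(link).netloc == host and link not in unique:
            unique.append(link)
    return unique


# Made-up page content for demonstration only.
PAGE = ('<a href="/a.html">A</a> <a href="b.html">B</a> '
        '<a href="https://other.example/">elsewhere</a> <a href="/a.html">A again</a>')
print(same_site_links("https://example.com/dir/index.html", PAGE))
```

A crawler would repeat this step for each newly found link, keeping a list of pages already visited so that it does not loop.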

The crawlers which build major internet search sites (Google, Yahoo, Bing, etc.) are quite sophisticated and can find not only keywords and phrases, but images and other content as well. The ranking system of these crawlers is equally sophisticated. Not only are keywords considered, but so are keyword location and density, relevancy, traffic patterns, TLD names, page design, and domain registration length. In fact, Google has a list of over 200 page ranking factors.

Sphider is much simpler. Pages are ranked solely on keyword weighting. Keyword weighting is calculated by word position and frequency, and the user has a level of control over the weighting process. Images are not indexed and relevancy is not a factor (although better word position and greater frequency DO indicate higher relevance). While Sphider can index practically any website, the main purpose of the application is for the user to index his or her own website so that an internal search can be made available to site visitors.
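
One way to picture weighting by word position and frequency is the sketch below. The formula is an assumption chosen for illustration and is not Sphider's actual calculation; the function name `keyword_weights` and the `position_bonus` parameter (standing in for the user's control over weighting) are likewise hypothetical. Python is used only for brevity.

```python
from collections import defaultdict


def keyword_weights(words, position_bonus=1.0):
    """Score each word by how often and how early it appears.

    Illustrative formula (an assumption): every occurrence adds 1.0 for
    frequency, plus a bonus that shrinks the later the word appears.
    """
    total = len(words)
    weights = defaultdict(float)
    for index, word in enumerate(words):
        early_share = (total - index) / total  # 1.0 for the first word
        weights[word.lower()] += 1.0 + position_bonus * early_share
    return dict(weights)


weights = keyword_weights(["search", "engine", "search", "index"])
# Words that occur more often and earlier receive a larger weight.
for word, weight in sorted(weights.items(), key=lambda item: -item[1]):
    print(word, weight)
```

Setting `position_bonus` to 0 in this sketch reduces the score to a plain frequency count.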

There are a number of Sphider flavors. The original Sphider (version 1.3.6) can be found at http://www.sphider.eu. It is free, but has the disadvantages of being insecure and badly outdated. It is no longer maintained and will start throwing errors on any system running PHP 5.6 or greater. It will not function at all on PHP 7.

Sphider-Plus (http://www.sphider-plus.eu) and Sphider-Pro (http://www.sphiderpro.eu) are both paid versions of the original and do have added features. I cannot speak as to security or support. Sphider-Pro is at version 3.3, which has a date of 2013, so that may not speak well as to its status. For a small website, many of the enhancements provided by these variations may be overkill.

Then there is the Sphider located here on our Downloads page. It, too, is based upon the original, but has been updated. It functions without error with PHP 5.5 or greater, even with PHP 7. It is much more secure. All SQL queries are made using prepared statements to avoid the risk of SQL injection. Other security measures have also been taken. We even have a variation (PDO) which can not only operate in environments lacking MySQLnd support, but can be used with databases other than MySQL (with some tweaking). It can work with SQLite, PostgreSQL (port kits available for both), ODBC, Microsoft SQL Server, and others. Both the normal and PDO variations are supported. And best of all, they are still free!
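
For readers unfamiliar with prepared statements, the sketch below shows the general idea: the query text and the user's input travel separately, so the input is never interpreted as SQL. It uses Python's built-in sqlite3 module purely for illustration; Sphider does this in PHP through MySQLi or PDO, and the `pages` table, its columns, and the `find_pages` function are assumptions, not Sphider's schema.

```python
import sqlite3


def find_pages(conn, keyword):
    """Return URLs whose title contains the keyword (hypothetical helper)."""
    # The "?" placeholder keeps the keyword out of the SQL text itself.
    cursor = conn.execute(
        "SELECT url FROM pages WHERE title LIKE ? ORDER BY url",
        ("%" + keyword + "%",),
    )
    return [row[0] for row in cursor.fetchall()]


# Throwaway in-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO pages (url, title) VALUES (?, ?)",
    [
        ("https://example.com/news.html", "Sphider news"),
        ("https://example.com/contact.html", "Contact"),
    ],
)

print(find_pages(conn, "Sphider"))
# A classic injection attempt is treated as ordinary text and matches nothing.
print(find_pages(conn, "' OR '1'='1"))
```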

Sphider 1.5.4 and Sphider 1.5.4 PDO may not have installed properly

If you did an upgrade, the regular and PDO versions of Sphider 1.5.4 may not have installed properly. You can check whether or not you are affected by checking the Settings tab on the Sphider admin page. If a version other than 1.5.4 (or 1.5.4 PDO) is reported, there is a problem. The settings table in your database is missing a column. Any downloads from this point on will not be affected.

The issue can be easily fixed and is addressed on this Sphider forum post.

Sphider 1.5.3 has a similar defect and can be repaired the same way, by editing update_rollup.php and re-running it. However, 1.5.3 is not so critical, as no changes to the settings table take place except for the version number update.