Considering another Sphider improvement

The original version of Sphider had very erratic support for indexing HTTPS pages, and wouldn’t even look at the robots.txt file on an HTTPS site. That failing has never been addressed, and even the latest version, 1.5.2, behaves the same way when it comes to HTTPS. This has never really been an issue for me before, and even now it is more annoyance than issue as I can work around it.

Still, the “problem” does seem intriguing. After a bit of experimenting, a fix may not be all that difficult. (Famous last words, right?)

I am debating now whether or not to continue investigating alternatives and make more code changes which would improve HTTPS support in Sphider, not only to ensure more reliable connectivity but to enable the robots.txt to be utilized as well. I don’t know that there is that big of a need. We’ve never received any complaints or comments on the issue…

Anyway, at this point there is a POSSIBILITY, but no definite plans one way or the other.

*******************************

UPDATE (Apr 6): I was able to get the robots.txt file read from a https site. First problem, regardless of http or https, the parsing of allowed or disallowed user agents and disallowed files/directories was iffy. If the robots.txt file had lines like “user-agent” or “disallow”, it was parsed, but “User-agent” or “Disallow” was not. It was a case issue. That is now fixed (on my side, not published yet). Second problem, now that I know the file IS being read and parsed, Sphider will STILL index some files in disallowed directories!
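To illustrate the case issue, here is a minimal sketch in Python (Sphider itself is PHP, and this is NOT Sphider's actual code). The function names, the agent name "sphider" and the simple prefix matching are all assumptions made for the example. The point is only that field names get lowercased before they are compared.

```python
def parse_robots(text, agent="sphider"):
    """Return the Disallow paths that apply to `agent` or to '*'.

    `agent` is a hypothetical crawler name used only for this sketch.
    """
    disallowed = []
    applies = False     # does the current group apply to our agent?
    in_agents = False   # are we inside a run of User-agent lines?
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and padding
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        field = field.strip().lower()   # "User-agent" and "user-agent" now match
        value = value.strip()
        if field == "user-agent":
            if not in_agents:           # first User-agent line starts a new group
                applies = False
                in_agents = True
            if value == "*" or value.lower() == agent.lower():
                applies = True
        elif field == "disallow":
            in_agents = False
            if applies and value:       # an empty Disallow means "allow everything"
                disallowed.append(value)
        else:
            in_agents = False
    return disallowed


def is_allowed(path, disallowed):
    """Simple prefix check: a path is blocked if it starts with any rule."""
    return not any(path.startswith(rule) for rule in disallowed)
```

With this, a file that mixes “User-agent”, “disallow” and “USER-AGENT” is parsed the same way regardless of capitalization. Parsing is only half the job, of course. The rules still have to be consulted before each page is fetched.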

If you have any files or directories listed as “url_not_inc” in your settings, that will work, but not the robots.txt disallows, even though that SHOULD be the case. Well, this situation certainly has gotten my interest!

*******************************

UPDATE (Apr 7): I have begun the process of troubleshooting the code to see what is going awry and where. Working alone and having other things to do in life, this can be both time consuming and frustrating. So far, I do know the robots.txt is read and parsed properly. Just where and why the instructions are not acted upon is another matter. At least the question of whether or not I will be attempting another modification has been answered!

*******************************

UPDATE (Apr 8): GOT IT! Preliminary tests show robots.txt is now being followed in both http and https. More testing to follow (found a couple other misc issues and fixed them). Once everything is validated, there will be a 1.5.3. Stay tuned.

Sphider 1.5.2 and 1.5.2.1 (the PDO version) have been released

The newest version(s) of the Sphider search tool have been released and are available from the Downloads tab above. While there isn’t really anything NEW in these releases, they do address a couple of problems encountered. Of most importance, the problem of having Sphider exit during indexing due to web page coding errors on the site being indexed has been addressed. Instead of issuing a fatal error and stopping, only warnings are generated and indexing continues on its merry way. A potential database error when updating the settings has also been thwarted.

Also, the previous PDO version had a bug in which descriptions could disappear from search results listings. This has been fixed.
If you had the previous PDO version (1.5.1.1) and have lost the descriptions, you will need to restore them after upgrading to 1.5.2.1: go into the Settings tab, go down to the “Search settings” section where it says “Maximum length of page summary displayed in search results”, change the selection to 250, and click “Save settings”. (In the previous version, updating the settings would change this from the default 250 to either 0 or 1!)

Happy Holidays and Happy indexing!

Sphider 1.5.2 – coming soon

The next version of the Sphider search tool is now in testing. Sphider 1.5.2 (and its companion PDO version, 1.5.2.1) is not very different from the previous version, except for a couple of minor fixes on the Settings tab and the fact that the indexing portion has been toned down to issue warnings only when an improperly coded web page is encountered. Sphider 1.5.1 exits with a fatal error instead of continuing to index the site. While improper coding in a web page (commonly having to do with some offbeat special character the database has no idea how to interpret) is rare, it sure was a monkey wrench when it came to indexing a web site. A couple of other page conditions which could have produced a fatal exit now simply issue warnings (like the url exceeding the length the database could store).
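The general idea of "warn and keep going" can be sketched like this in Python (again, Sphider is PHP, and this is not its actual code). The 255-character limit, the `store` callback and the use of `ValueError` for a page the database cannot handle are assumptions for illustration only.

```python
import warnings

MAX_URL_LENGTH = 255  # hypothetical column width, not Sphider's real limit


def index_site(pages, store):
    """Index (url, html) pairs, warning on bad pages instead of stopping.

    `store` is a hypothetical callback that saves one page and raises
    ValueError when the page contains something the database rejects.
    Returns the number of pages successfully indexed.
    """
    indexed = 0
    for url, html in pages:
        if len(url) > MAX_URL_LENGTH:
            warnings.warn(f"Skipping {url[:40]}...: url too long to store")
            continue
        try:
            store(url, html)
        except ValueError as err:
            # Previously this kind of failure would have ended the whole run.
            warnings.warn(f"Skipping {url}: {err}")
            continue
        indexed += 1
    return indexed
```

One bad page costs one warning, and the rest of the site still gets indexed.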

At any rate, both the PDO and non-PDO varieties are now being tested to make sure the intended fixes work properly, and that we haven’t introduced any new problems. Expected arrival at this time is early December.

Blue Origin does it yet again. One booster, three launches, three landings.

On April 2, Blue Origin launched its New Shepard booster for the third successful West Texas landing after a suborbital flight. Previous landings of the same booster took place on November 23, 2015, and January 22, 2016.

The crew capsule successfully landed by parachute shortly after the booster landed.

SpaceX has been successful only once (so far), but it has to be noted that the Falcon 9 is larger and, being orbital, has a greater velocity to contend with. SpaceX hopes to be able to recover and reuse a booster sometime in 2016.

Whether it is Blue Origin or SpaceX, recovering a booster is no simple matter. It is, after all, rocket science!

PDO version of Sphider

Sphider 1.5.1 has proven to be a good, stable version of Sphider. HOWEVER, it seems some people can’t use it because their host chooses not to support MySQLnd, typically for shared hosting. It isn’t because it can’t be done, but because they don’t want to do it. In those instances, if you want MySQLnd, you have to upgrade to VPS, at an additional charge of course. Sphider users in that scenario now have an option.

We have taken Sphider 1.5.1 and converted the sql to PDO (PHP Data Objects). PDO support is virtually guaranteed. The PDO version is referred to as Sphider 1.5.1.1. PDO has some advantages over MySQLi/MySQLnd, but there are also disadvantages.

MySQLi/MySQLnd is SPECIFIC to a MySQL database, whereas PDO is a generic layer supporting a variety of databases, one of which is MySQL. That generality comes with some overhead. For Sphider, we STILL prefer the MySQLnd prepared statement methodology over PDO prepared statements. Reality dictates a PDO version be made available. Our recommendation is that you install the PDO version only if the standard MySQLi/MySQLnd option is not available. If you already have a working Sphider 1.5.1, DO NOT install 1.5.1.1.

One issue encountered was that PDO has no need to use the real_escape_string function…. EXCEPT WHERE IT IS NEEDED!!! The backup and restore functions failed without it. All research indicated “You don’t need real_escape_string, just use PDO prepared statements!” Dogmatic statements like that can come back to bite you. Well, our scenario wasn’t executing sql, it was CREATING sql, specifically, an sql string. Real_escape_string was necessary to create a valid string, and a prepared statement was not possible. We had ALREADY run a query, now we were manipulating the queried data to create a string for LATER use in a different kind of query. So we had to create an emulation for real_escape_string, which was a bit of trial and error. So much for “PDO NEVER needs real_escape_string”.
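For the curious, here is roughly what such an emulation has to do, sketched in Python rather than PHP. This is NOT the code that went into Sphider. The function names and the `links` table in the usage note are made up for the example. The character list (NUL, newline, carriage return, backslash, single quote, double quote, Ctrl-Z) is the set that MySQL's real_escape_string escapes. The real function also takes the connection character set into account, which this sketch ignores.

```python
# Characters escaped by MySQL's real_escape_string, and their replacements.
_ESCAPES = {
    "\\": "\\\\",
    "\0": "\\0",
    "\n": "\\n",
    "\r": "\\r",
    "'": "\\'",
    '"': '\\"',
    "\x1a": "\\Z",
}


def escape_string(value):
    """Escape one value so it can sit inside a quoted SQL string literal.

    Working character by character avoids escaping a backslash twice.
    """
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def insert_statement(table, row):
    """Build an INSERT statement as text, e.g. for a backup file.

    `table` is trusted here; only the row values are escaped.
    Every value is written as a quoted string, NULLs are not handled.
    """
    values = ", ".join("'" + escape_string(str(v)) + "'" for v in row)
    return f"INSERT INTO {table} VALUES ({values});"
```

So `insert_statement("links", [1, "it's"])` yields `INSERT INTO links VALUES ('1', 'it\'s');`, a line that can be written to a backup file and replayed later. No prepared statement can do that, because nothing is being executed at the time the string is built.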

Working beta, Sphider for WordPress

I now have a working beta version of Sphider for WordPress. You can see the beta in action by clicking on the Search tab. This isn’t a very large blog, so there isn’t much to search for, but you can get an idea.

Suggestions STILL do not work. Accessing the suggest mechanism in test mode shows it IS responding and building a proper json, but is not being passed on as in the normal implementation. Suspect it is something to do with a collision with a WordPress json?

There are probably still some rough edges, but that is what a beta is for… to find those rough edges and smooth them out. Even rough, it functions, which is something the last Sphider for WordPress no longer does!

If you want to give it a whirl, drop us a line and we’ll get the files to you. And, yes, instructions…

THE MORNING AFTER: After a couple false starts, I finally got a package assembled with everything you need. The first package was done late at night and didn’t include everything it should have. I rushed a second version with an addition. There should have been additions!!!


How do you drop us a line? Use the Contact Us form on WorldSpaceflight.com home page, found in Links to the left.

Sphider 1.5.1 released

Sphider 1.5.0 was a major departure from older versions of Sphider in that it incorporated prepared statements, adding significantly to the security of Sphider. It performed very nicely.

But we did not like the database backup and restore procedures. Backup was quick enough, but restore was S-L-O-W! The larger the database, the worse it got. There had to be a better way. There was, and we found it. We grew our database to include:

    10 sites
    10 categories (5 top level, 5 sub-categories)
    10,641 links (pages)
    70,317 keywords
    40,006 kb of cached text
    171,495 kb total size

A backup, producing a gzip file of 14,079 kb, was accomplished in 16 seconds.
A total restoration took 32 seconds. This was a definite improvement over the 6 1/2 HOURS for a smaller database.

Also, as we were no longer looking for coding errors, we began concentrating on the results (or outcomes of admin actions) looking for anything that just was not exactly what we expected to see. We found several bugs which were repaired and tested. Nothing earth-shattering, but bugs nonetheless. Sphider 1.5.1 is the result.

Since Sphider 1.5.1 seems to be the achievement of what we originally set out to do, namely, dispensing with deprecated code, improving security, fixing a few bugs in the original releases, etc., this will probably be the last release for a while. In the event of some operational problem of immediate concern, a simple patch should be sufficient instead of a whole new release.

Now despite the hours of testing and line-by-line code reviews and results analysis, Murphy’s Law still reigns. We’ll leave it at that.

Sphider for WordPress?

Several years ago, there was a Sphider for WordPress introduced. It was based on the 1.3.4 version of Sphider. Time moved forward, Sphider for WordPress did not. You can still find it. It just most probably isn’t going to work.

A few months back we tried to update it. THAT was a lost cause! So now we have taken our newest Sphider and have started to convert it. It does work, mostly. Still having a few issues, such as suggest doesn’t work and we aren’t sure why not. Also having trouble getting the search integrated into WordPress, although there has been some progress there.

Naturally, since this is a tiny blog, there isn’t much we can thoroughly test it on. Give us a bit more time to get the integration part down and maybe we’ll put it out as a beta, even without suggest working. But maybe we’ll find the problem there, too.

That would be nice, a working version of Sphider for WordPress.


UPDATE: December 15. Integration with WordPress has been accomplished. Suggestions still are not working. Being able to spider and search from WordPress is still a significant achievement. The MAJOR components have been tested and are functional. Still need all the minor branches to be tested.


UPDATE: December 23. Suggestions STILL not working, but Sphider now does a re-index when a post is added or edited. Duplicate domains are being entered in the domains table, but that should be an easy fix. Getting closer to being generally usable.

TLD Mania

ICANN is issuing new top level domains faster than I can create new spam filters to stop the trash coming from the likes of .top, .download, .mobi, .date, .xyz, .click, .rocks, .wang ….

Supposedly, there really are legitimate web sites using these TLDs, but I personally have yet to actually SEE any of them. The ONLY reason I even know most of these exist is the sudden appearance of a ton of spam from each one of them.

I administer the filters for a number of email addresses, and some of the owners have not been as careful as others and their email address has gotten on some spammer’s list. And once you are on one, you will shortly be on scores of them.

I have yet to see a legitimate email from ANY of the new TLDs.

ICANN thought all these new TLDs would be a great idea, a real boon to the internet. Well, they certainly have been a boon to ICANN and a bunch of shady businesses which are making mucho dinero off their creations! For the rest of us, they are more of a… I’ll be polite and say nuisance.

It’s time for ICANN to pull in the reins, maybe even start phasing out the use of some of the new TLDs. For those legitimate websites (if there really are any) with one of these fad TLDs, my advice would be to give it up and get a real TLD. Then I might actually visit you.

Sphider 1.5.0 Search Tool is now live

Find the new Sphider 1.5.0 on our Downloads page.

UPDATE: 1 December 2015, 18:25 UTC. If you downloaded 1.5.0 before this time, the auto-suggest may not work if you installed Sphider to any directory NOT named /sphider. The current posting DOES work.

If you are affected, you do not need to re-download. Simply find “autocomplete.js” in the js_suggest directory, and edit line 7 from:
$.get( "/sphider/js_suggest/suggest.php", { keyword: keyword } )
to
$.get( "js_suggest/suggest.php", { keyword: keyword } )
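Why does dropping the leading “/sphider/” help? A path starting with a slash is resolved from the root of the site, while a path without one is resolved relative to the page making the request. A quick sketch using Python's standard `urljoin` shows the difference. The example.com host and the /search/ install directory are made up for the illustration.

```python
from urllib.parse import urljoin


def resolve(page_url, request_path):
    """Return the URL a browser would request from `page_url`."""
    return urljoin(page_url, request_path)


# Hypothetical install in /search/ instead of /sphider/
page = "http://example.com/search/search.php"

# Root-relative path: always points at /sphider/, wherever Sphider lives.
old = resolve(page, "/sphider/js_suggest/suggest.php")

# Page-relative path: follows the directory Sphider is actually installed in.
new = resolve(page, "js_suggest/suggest.php")
```

Here `old` comes out as http://example.com/sphider/js_suggest/suggest.php (which does not exist on this install), while `new` comes out as http://example.com/search/js_suggest/suggest.php.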