PHP and MySqlnd (to make Sphider function)

Time to revisit the issue of enabling mysqlnd in PHP in order to make Sphider function. I have given instructions in the past on how to enable mysqlnd if it isn’t already. I believe those instructions were either unclear or incomplete.

A bit of history… At one time, mysqlnd was a separate module from mysqli. That is no longer the case. Mysqlnd (MySQL Native Driver) is built into PHP. However, some hosting companies, particularly where shared hosting is concerned, have the installation configured so as to NOT enable mysqlnd. Why this is I do not know.  Why not just have it enabled from the get-go, since it is a NATIVE driver and is already part of the package?

Fortunately, most users do have access to cPanel, which allows the user to change the configuration. But this can get interesting as the method is a bit counterintuitive.

To enable mysqlnd, you need to disable mysqli (which you really aren’t doing!), but then you also need to enable nd_mysqli! Are you confused yet? Trust me, this works. I have two screenshots of cPanel showing the CORRECT settings to get mysqlnd working on your system.

UNTICK mysqli
TICK mysqlnd
TICK nd_mysqli

Save configuration.
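
To confirm the change took effect, a quick check can be run from any PHP page on the account. This is only a sketch: the function name mysqlndStatus is made up for illustration and is not part of Sphider. It leans on the PHP manual's statement that mysqli_stmt_get_result() is available only when mysqli is built on mysqlnd.

```php
<?php
// Hypothetical helper (not part of Sphider): reports which MySQL pieces PHP has loaded.
function mysqlndStatus(): array
{
    $hasMysqli  = extension_loaded('mysqli');
    $hasMysqlnd = extension_loaded('mysqlnd');

    // Per the PHP manual, mysqli_stmt_get_result() exists only with mysqlnd,
    // so its presence suggests mysqli is really the nd_mysqli flavour.
    $mysqliOnNd = $hasMysqli && function_exists('mysqli_stmt_get_result');

    return [
        'mysqli'       => $hasMysqli,
        'mysqlnd'      => $hasMysqlnd,
        'mysqli_on_nd' => $mysqliOnNd,
    ];
}

foreach (mysqlndStatus() as $name => $enabled) {
    echo $name, ': ', ($enabled ? 'yes' : 'no'), PHP_EOL;
}
```

If all three lines say "yes", the settings above have most likely done their job.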

Trouble indexing a website with Sphider

Some websites just don’t index very well. Here are some examples, and a solution if there is one.

You are trying to index “” and you get an initial 301 error and get no further.  You try the hack for the fake 301 errors (my last post), but that doesn’t work. Cause? There just might be a REAL 301 error. A browser will take the “” and follow the redirect to “”! Sphider isn’t that smart.  The fix is to change Sphider to look for https instead of http.
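
One way to tell a real 301 from a fake one is to look at the raw response headers yourself. The sketch below is not Sphider code and the function name redirectTarget is hypothetical. It takes header lines in the list format that PHP's get_headers() returns and reports where the first 301 points, if anywhere.

```php
<?php
// Hypothetical helper: given raw response header lines (the list format that
// get_headers() returns), report the Location of the first 301, or null.
function redirectTarget(array $headerLines): ?string
{
    $saw301 = false;
    foreach ($headerLines as $line) {
        // A status line starts a new response; remember whether it is a 301.
        if (preg_match('#^HTTP/\S+\s+(\d{3})#i', $line, $m)) {
            $saw301 = ($m[1] === '301');
            continue;
        }
        if ($saw301 && preg_match('/^Location:\s*(.+)$/i', $line, $m)) {
            return trim($m[1]);
        }
    }
    return null; // no 301, or a 301 with no Location header
}

// Sample header lines, typed in by hand for illustration.
$sample = [
    'HTTP/1.1 301 Moved Permanently',
    'Location: https://example.com/',
    'HTTP/1.1 200 OK',
];
var_dump(redirectTarget($sample));
```

With a live site, the list would come from get_headers('http://example.com/'). A target that differs from the start URL only by https is the real 301 described above.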

You try to index “”, the initial page indexes, but no further pages are found, or some pages are found but not others. The likely cause is that the website uses https and http interchangeably. That might work for a browser, but not for Sphider. As a Sphider user, there isn’t much you can do except hope the website owner does some editing and makes his/her references consistent.

After the first page, some sites will not index very well. A possible cause is a heavy use of JavaScript in forming the pages, particularly the references (links). Sphider does not index JavaScript.  As a Sphider user, there is nothing you can do.

You are indexing a site and you get a lot of garbage results.  Sphider is built for full four-byte UTF-8. Not all websites have UTF-8 pages, and that is fine because Sphider knows that and performs conversions if needed. Not every web page tells Sphider what encoding it does use, and that is okay, too. Sphider is pretty good at figuring these things out. But sometimes, though thankfully not often, a web page will be written with one encoding but explicitly state that it is a different encoding. For example, a page written in Windows-1252 but declaring it is UTF-8 isn’t going to be converted to UTF-8 because Sphider has been led to believe it already is! The result is going to be some strange index results. Even worse, a page is UTF-8 but says it is something else… Believe me, converting UTF-8 to UTF-8 is going to be a mess! As a Sphider user, there is nothing you can do about a poorly written web page.
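
To make the two failure cases concrete, here is a small sketch of a "trust the bytes, not the label" check. It is NOT how Sphider decides on an encoding. The function name toUtf8 and the Windows-1252 fallback are my own assumptions, and it needs the mbstring extension. It is also only a heuristic: bytes that happen to be valid UTF-8 could, in rare cases, be legitimate text in some other encoding.

```php
<?php
// Hypothetical helper, not Sphider's actual logic: trust the bytes over the label.
// Requires the mbstring extension.
function toUtf8(string $body, string $declared, string $fallback = 'Windows-1252'): string
{
    // Bytes that already form valid UTF-8 are left alone, whatever the page
    // claims. This avoids the "convert UTF-8 to UTF-8" mess.
    if (mb_check_encoding($body, 'UTF-8')) {
        return $body;
    }

    // The page claims UTF-8 but the bytes say otherwise: assume the fallback.
    $source = (strcasecmp($declared, 'UTF-8') === 0) ? $fallback : $declared;

    return mb_convert_encoding($body, 'UTF-8', $source);
}

// "café" stored as Windows-1252 (é is the single byte 0xE9) but labelled UTF-8.
echo toUtf8("caf\xE9", 'UTF-8'), PHP_EOL;
```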

Another scenario is that you are indexing away and Sphider suddenly quits. You investigate and finally find it exited with a PHP exhausted memory error. After looking further, you see the error occurred on a file that Sphider shouldn’t even be processing. In one instance, I had PHP crash while trying to index a .swf (flash) file. Sphider SHOULD have reported a .swf file as “Not text or html” and gone on to the next page. I tore my hair out trying to see what the issue was, and it turns out the website was sending an erroneous header reporting the WRONG “Content-Type:”.  I had other strange halts with PHP errors on that same website, and the cause each time was an incorrect header stating the content type. Short of writing a huge function to determine content type from the file extension instead of reading the headers sent, the only thing a user can do is identify all the problem URLs and put them in the “Must not” section of the site settings. As an aside, I am clueless as to how a website can send the wrong file headers. Anyone out there have insight on this?
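
For what it is worth, a very small version of that "go by the extension" idea might look like the sketch below. It is not in Sphider. The function name looksBinaryByExtension is hypothetical and the extension list is only illustrative, nowhere near complete.

```php
<?php
// Hypothetical helper: guess from the URL alone whether a link points at a
// binary file, without trusting the Content-Type header. The extension list
// is illustrative, not exhaustive.
function looksBinaryByExtension(string $url): bool
{
    $binary = ['swf', 'zip', 'gz', 'exe', 'jpg', 'jpeg', 'png', 'gif', 'mp3', 'mp4'];

    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return false; // no path component, or an unparsable URL
    }

    // The query string is already excluded, so only the file name is examined.
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    return in_array($ext, $binary, true);
}

var_dump(looksBinaryByExtension('http://example.com/media/intro.swf'));
var_dump(looksBinaryByExtension('http://example.com/about.html'));
```

A check like this could also be used to build the “Must not” list by hand, rather than to replace the header check.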

Sphider and Sphiderlite — and 301’s!

Sphider 4.2.0 and Sphiderlite 2.2.0 have recently been released. These editions corrected a few issues which had slowly crept in. Stray white space was interfering with phrase searches. Some MySQL installations (or was it PHP?) were causing mysqli errors which resulted in dropped connections. We discovered some new code deprecations in PHP 8.1. Filters started to cause corruption of certain Unicode characters.

Well, these recent releases corrected those issues. And even though these releases are stable, we have more improvements on the way! Sphider 4.2.1 and Sphiderlite 2.2.1 pre-identified  some code deprecation from the not-yet-released PHP 8.2. We also improved identification of web page encoding. On rare occasions, a web page would throw an error during indexing due to a wrong interpretation of the page encoding. The odds of that happening have been greatly reduced. (NEVER say it can’t happen!) Also, the size of a spidering log is now displayed in the spidering log list.  Look for these releases very soon!

One “issue” that remains is that SOME websites, typically WordPress sites, just refuse to be indexed! MOST WordPress sites do fine … some don’t. The very first page comes back with a “301” (relocated) error, no other pages are found, and the indexing run halts with nothing being indexed. Upon investigation, the 301 is bogus. There is no redirection. We thought maybe it is something with WordPress, but now doubt that is the case. We really don’t have a clue as to the cause. Our latest thought is MAYBE it is something done intentionally to ward off indexing by small potatoes, like Sphider?

If anyone out there knows the cause of these phony 301 errors being given to Sphider, let us know!

At any rate, those stubborn pages CAN be indexed by Sphider/Sphiderlite, using a hack. And a hack is exactly what it is … not something you would want as a normal part of Sphider.  The hack can be found on the Sphider forum.

(There are other reasons for web sites that won’t index or won’t totally index, but that is for another post.)

EDIT: 7/15/2022
Found another possible cause of “fake” 301 errors! It may be that some websites do not like or recognize the User Agent string and block the crawl with a 301 error. Changing the User Agent string (in Settings) may help!
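
To see whether a site answers differently depending on the User Agent, something along these lines can be run outside of Sphider. It is only a sketch: both function names are hypothetical, the User Agent strings are up to you, and statusLineFor() makes a real network request.

```php
<?php
// Hypothetical helper: build a stream context that sends a chosen User-Agent.
function contextWithUserAgent(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'user_agent'      => $userAgent,
            'follow_location' => 0,  // report the first response, do not chase redirects
            'timeout'         => 10,
        ],
    ]);
}

// Hypothetical helper: first status line a site returns for a given User-Agent,
// or null if the request fails. This performs a real network request.
function statusLineFor(string $url, string $userAgent): ?string
{
    $headers = @get_headers($url, false, contextWithUserAgent($userAgent));

    return ($headers === false) ? null : $headers[0];
}

// Example use (commented out because it goes out to the network):
// echo statusLineFor('https://example.com/', 'ExampleBot/1.0'), PHP_EOL;
```

If one User Agent string gets a 200 and another gets a 301, the User Agent is the likely culprit.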

Where does Space begin? And should the definition be changed?

The Fédération Aéronautique Internationale (FAI) was founded on 14 October 1905. In the 1960’s, the FAI established the Kármán line as the beginning of space.  The line was named after Theodore von Kármán,  an engineer and physicist with interests in aeronautics and astronautics.  Although von Kármán wrote about and discussed a range of values, and never himself came out and said “100 km”, that is the value which was generally agreed upon. Note that I say “generally”… NOT universally!

The Kármán line is widely recognized as the boundary of space, but not all countries or entities agree. In the United States, the United States Air Force defined space as beginning at 50 miles altitude (approximately 80 km). The FAA, in turn, agrees. Several pilots of the X-15 were awarded astronaut wings based on this definition. (A couple of X-15 flights did meet the FAI definition.)

Now comes the day when space tourism has arrived, and a couple of companies are already offering sub-orbital flights to space. Blue Origin, using its New Shepard vehicle, is reaching 100+ km in altitude. Virgin Galactic and its SpaceShipTwo are reaching above 50 miles, but falling short of 100 km (approximately 62 miles). Competition leads one to declare they are the real deal, while the other also claims legitimacy.  (No matter who is “right”, either one is, in my opinion, one hell of a ride!)

There has been discussion among many (not JUST Virgin Galactic) that the definition needs to be changed, and the most common argument is for the 50 mile definition.  One argument, and a strong one at that, is that 50 miles  is the top of the mesosphere. The Kármán line, at 100 km,  is in, but not on a boundary of, the thermosphere.

Most definitions of the mesosphere give it a lower limit of 31 miles (50 km) and an upper limit of 53 miles (85 km). In actuality, the limits and thickness have much to do with latitude and time of year.  The upper limit may actually be 53 to 62 miles (85 to 100 km), and some sources even say 74 miles (120 km).

Fifty miles altitude was chosen by the USAF because it is at that altitude that aerodynamic lift of aircraft becomes negligible.

Here are other things to consider:
– lowest limit of low Earth orbit is about 160 km
– lowest 1 day orbit without reboost is 200 km (120 miles)
– lowest single orbit before reentry is about 125 km (80 miles)
– NASA determined that the space shuttle began to “feel” aerodynamic drag at 76 miles (122 km)
– already mentioned, lift disappears at 50 miles (80 km)
– University of Calgary in 2009 found that the behavior of ions changes at 73.3 miles (118 km)

Given that there are many ways to define the “beginning of space”, and that the mesosphere really can’t be absolutely defined, and that the Kármán line is also somewhat arbitrary, should we leave things alone (and let people argue about who is “really” an astronaut), or pick a definition and stick to it?

If we are to choose one, which one? I for one think we need a single LEGAL definition. Also, because the Kármán line is, in reality, somewhat arbitrary, it should NOT be it.  The X-15 pilots were heroes of mine, and I would hate to see them lose their astronaut status.

But let’s be practical. The border between the mesosphere and thermosphere would be nice, but that isn’t a FIRM altitude. It COULD be as high as 120 km. The lowest possible single orbit without reentry is about 125 km.  The space shuttle began to “feel” atmospheric drag at 122 km. Ion behavior changes at 118 km.

I guess I would have to vote for 120 km (74.56 miles) to be where space should begin, at least where manned spaceflight is concerned.

Sphider 4.1.0, SphiderLite 2.1.0 are coming soon!

The next releases for Sphider are just around the corner.  There are two major changes in these releases.

The first change is the REMOVAL of the re-index restart ability. WHAT? WHY? Re-index restart was introduced because a re-index run sometimes gets interrupted. This was an attempt to be able to do another re-index, picking up where the last one stopped. The process worked, kind of, and not in all circumstances. The first issue was that the restart HAD to be the very next thing done, and certain steps HAD to be followed. So it wasn’t user friendly. Secondly, the restart HAD to be during the SAME session, a condition which was often not met and was totally out of the user’s control. The reason a re-index run stops is often because the session ended! In other words, IF the restart worked, it worked nicely. But when it didn’t work, which was often, it left a bigger mess than the original incomplete index.

For those who feel the process was something they can’t live without, Sphider 4.0.2 and SphiderLite 2.0.2 will remain available for download upon special request until such time as I can add instructions for duplicating the restart functionality to the Sphider MODS board at

The second change has to do with sitemaps. Sphider has had the ability to index using a sitemap, but with a caveat: the sitemap had to be a simple sitemap.xml with a list of links to pages. Many larger sites have a sitemap.xml which consists of links to other sitemaps. Sphider 4.1.0 and SphiderLite 2.1.0 can handle this. One thing to be aware of is that with a larger site, it might take a very significant amount of time for Sphider to digest these maps! Sphider may appear frozen for a while as it works in the background. Just watch in the browser tab for signs of activity.

Sphider: Indexing from sitemaps

Sphider can index using a sitemap, PROVIDED it is a traditional sitemap of URLs and not a sitemap directory listing additional sitemaps (which contain the URLs). The directory style is popular on larger websites.

Well, we have been playing with a mod that can change that! Initial tests show that it just might actually work! We have found one instance that can mess up the process and have disarmed it. The question is, are there other instances that can derail us? Only extensive testing will tell.

We will post the mod in the Sphider Help Forum, but will also provide it here.

In spiderfuncs.php, find the function getSiteMap(). Modify the function as follows:

function getSiteMap($input_file)
{
    $links = '';
    $sitemap = simplexml_load_file($input_file);
    if ($sitemap != '') {
        $links = array ();
        foreach ($sitemap as $url) {
            // For some reason, wlwmanifest.xml interferes with the recursion
            // Therefore, let's ignore it
            if (!preg_match("/wlwmanifest\.xml$/i", $url->loc)) {
                if (preg_match("/\.xml$/i", $url->loc)) {
                    // This entry is itself a sitemap, so load it and collect its links
                    $submap = $url->loc;
                    foreach ($submap as $input2) {
                        $sitemap2 = simplexml_load_file($input2);
                        if ($sitemap2 != '') {
                            foreach ($sitemap2 as $url2) {
                                $links[] = ($url2->loc);
                            }
                        }
                    }
                } else {
                    // An ordinary page link
                    $links[] =($url->loc);
                }
            }
        }
        // Flatten the SimpleXML objects into plain strings
        $links = explode(",", (implode(",", $links)));
    }
    return $links;
}

Let us know if you try this, and ESPECIALLY if there are issues!
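
For anyone testing, it may help to check by hand which kind of sitemap a site serves. The sketch below is separate from the mod above and both function names are hypothetical. It relies on the root element names from the sitemaps.org protocol: "sitemapindex" for a directory of sitemaps, "urlset" for a plain list of pages.

```php
<?php
// Hypothetical helper: tell a sitemap index apart from a plain URL list by
// looking at the root element name defined by the sitemaps.org protocol.
function sitemapKind(string $xmlText): string
{
    $xml = @simplexml_load_string($xmlText);
    if ($xml === false) {
        return 'invalid';
    }
    switch ($xml->getName()) {
        case 'sitemapindex':
            return 'index';   // entries are <sitemap><loc>
        case 'urlset':
            return 'urlset';  // entries are <url><loc>
        default:
            return 'unknown';
    }
}

// Hypothetical helper: collect every <loc> one level below the root as plain strings.
function sitemapLocs(string $xmlText): array
{
    $xml  = @simplexml_load_string($xmlText);
    $locs = [];
    if ($xml !== false) {
        foreach ($xml as $entry) {
            $locs[] = trim((string) $entry->loc);
        }
    }
    return $locs;
}

// A tiny hand-written sitemap index, for illustration only.
$index = '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
       . '<sitemap><loc>https://example.com/sitemap-posts.xml</loc></sitemap>'
       . '<sitemap><loc>https://example.com/sitemap-pages.xml</loc></sitemap>'
       . '</sitemapindex>';

echo sitemapKind($index), PHP_EOL;
print_r(sitemapLocs($index));
```

The (string) cast does the same job as the implode/explode step in the mod: it turns the SimpleXML objects into ordinary strings.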

Is Sphider obsolete?

IS Sphider obsolete!?

NO! At least not yet. Sphider is on the road to obsolescence, but it’s not there just yet. Before going into further detail, I wish to point out a few things.

The intended use for Sphider is for a website to have an internal search feature for that particular site. Sphider is not, and never was, intended to be a personal Google, Bing, Yahoo, Yandex, or any other search engine. Yes, it is capable of indexing more than a single website, but even there it is intended for indexing perhaps a family of related sites.

Next, keep in mind that Sphider first debuted when the web was a much simpler place. Websites consisted of a series of files and sub-directories (some might refer to them as folders). A website had a home page, often named “index.html”, or “index.php”, or “index.aspx”.  There might be a directory named “products” and files in that directory like “product1.htm” and “product2.htm”. You would access these pages from a browser with something like “http://bigfactory/products/product1.htm”. For many websites today, this is still a valid scenario. Maybe “https” has largely replaced “http”, but it is still the same concept.

The reality, though, is that the web is changing. Take this blog, for example. It uses WordPress, and in a pretty basic, almost primitive, way. There are quite a number of pages. There is even a contact page, which judging from what appears at the top of the browser, is located in a directory named “contact”. But you know what? There is no such directory! There is an “index.php”, but it doesn’t contain anything like what you see on the home page of this blog. Since this blog is not very complex in the way it is laid out, Sphider can index it, although the results are rather messy! That is okay, since WordPress has its own search functionality if the user wishes to implement it.

You will notice that the “downloads” page of this blog has a url of “”. There is no name in the traditional sense, no page extension (htm, php, etc.). It looks like it is a directory, so the default would be “index.php” or something? NOPE!

This isn’t just a WordPress thing. This is the future of the internet. As time goes by, more and more websites are going to become like this. Cpanel settings, htaccess settings, iframes, api’s, server configurations… These all are evolving.

So what does all this have to do with Sphider? Sphider uses old technology, technology which is still in large use today. But that use is diminishing. Sphider is going to try to index some websites and immediately end with a “Relocation: 301” message and never get a step further. So why can’t Sphider simply follow the 301 and start indexing that page? Because it is a 301 that only Sphider can see. It isn’t a REAL 301. There is no redirect header, no redirect in htaccess (Apache servers). This is all in configuration. Sphider needs a file name, and increasingly there just simply is no file name. It’s a modern website using features Sphider is not equipped to handle.

So is Sphider dead? No. Is Sphider dying? No, Sphider is not dying, but the universe in which it works is definitely shrinking. As websites evolve, the number of websites able to utilize Sphider is going to decrease.

So what about Sphider now? What is its future?

I don’t see any feature changes or additions in the future. Sphider will continue to be supported and updated to keep up with the technology it does use. Sphider works with PHP 8.1 and MySQL 8. As PHP evolves, Sphider will keep pace. The same goes for MySQL. Sphider will keep up. If any hidden flaws are found in the code, they will be corrected. If security issues are detected, we will attempt to address them.

As the web evolves, there may come a time in five or ten years, when Sphider becomes a quality buggy whip in a Tesla world. Even then you will still be able to find it residing in some antique software repository. But it isn’t quite ready to hang it all up just yet.

– Broken pages!

It seems that quite a number of pages in the Astronauts ‘n’ Cosmonauts section of went blank! What the heck happened?

Well, the site is database driven and the database server got upgraded. That is a good thing, but… one of our table names became a reserved word!

We changed the table name and have updated the queries for the new name. The pages are working again.
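
Renaming the table, as we did, is the cleaner fix. For the record, MySQL also lets you keep a reserved word as a name if you wrap it in backticks in every query. Here is a small sketch of that alternative. The function name quoteIdentifier is hypothetical, and "rank" is just an example of a word that became reserved in MySQL 8.0, not necessarily the table involved here.

```php
<?php
// Hypothetical helper: wrap a MySQL identifier in backticks so that a reserved
// word can still be used as a table or column name. Embedded backticks are doubled.
function quoteIdentifier(string $name): string
{
    return '`' . str_replace('`', '``', $name) . '`';
}

// "rank" is only an example; it became a reserved word in MySQL 8.0.
$sql = 'SELECT * FROM ' . quoteIdentifier('rank') . ' WHERE id = ?';
echo $sql, PHP_EOL;
```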

All of them? I THINK so, but may have missed a query somewhere, so if you see an empty page, let us know.


Eric Jones, Spammer Extraordinaire

If you have a website with a contact form, you probably know of the “beloved” Eric Jones. But it doesn’t have to be that way.

If your site is WordPress, get an Akismet account and API key and install the Akismet Antispam plugin. If your contact form is Contact Form 7, these lines in the form will block this prolific spammer:

<label> Your Email (required)
[email* your-email akismet:author_email] </label>

I believe Akismet will also work with other contact forms, such as Gravity.

If your website is not WordPress, but can handle PHP pages, a custom contact form can easily include a filter to exclude his email id. Here is a sample snippet of code:

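function ValidateMail($Email)
{
    global $HTTP_HOST;
    $result = array();

    // Compare against the spammer's address (the address itself is not shown here)
    if (strcmp($Email, "") == 0) {
        $result[1] = "Known spammer";
        return $result;
    }

    // ... any other validation checks go here ...
    return $result;
}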
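
If more than one address needs blocking, the same idea can be sketched with a list. This is not the code from my own form. The function name isRejectedSender is hypothetical and the addresses are placeholders.

```php
<?php
// Hypothetical helper: true when the address is malformed or on the blocklist.
function isRejectedSender(string $email, array $blocklist): bool
{
    $email = strtolower(trim($email));

    if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        return true; // not a syntactically valid address
    }

    // Compare case-insensitively against every blocked address.
    $blocked = array_map('strtolower', $blocklist);

    return in_array($email, $blocked, true);
}

// Placeholder addresses only; substitute the ones that plague your own form.
$blocklist = ['spammer@example.com'];

var_dump(isRejectedSender('Spammer@Example.com', $blocklist)); // bool(true)
var_dump(isRejectedSender('friend@example.org', $blocklist));  // bool(false)
```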


Sphider 4.0.0-MB and SphiderLite 2.0.0 released

The backup and restore utilities have been reworked to use MySQL directly. This is more dependable than relying on PHP.  Also, a limited ability to resume a re-index process which has been interrupted has been introduced. The process to determine page character set has been enhanced. Language file conversion to Unicode has been completed. Obsolete versions of code have been removed and general code cleanup done. Further safeguards against indexing of illegal characters have been implemented. SphiderLite has had more remnants of the full version removed.