Emojis and Sphider

Quite some time back, Sphider had an indexing issue when emojis were encountered on a web page. The SQL errors would fly! The solution at that time was to filter out emojis before storing text in the database. This solution has been working just fine, but admittedly the filter has not been updated and there are ALWAYS new emojis making their appearance.

While even the new emojis themselves have not been an issue, there was a very curious case of an emoji-free site in which the filter was clearing the entire full text of pages and storing… NOTHING! Well, that isn’t good. The workaround for that site was to disable the emoji removal function. Not an ideal fix, but very doable. WHY the function has this effect on that particular site is still a mystery.

But now may be the time to revisit the need for the filter in the first place. At the time the filter was installed, Sphider used the default MySQL utf8 scheme, which is 3-byte. Some emojis are 3-byte, but the vast majority are 4-byte, with even a few 8-byte emojis. You see the problem, don’t you? MySQL is not going to be happy when you try to stick a 4-byte character into 3 bytes!
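To make the size problem concrete, here is a minimal sketch in Python. Sphider itself is PHP, and this is NOT Sphider's actual filter. It simply measures how many bytes a character needs in UTF-8 and drops anything that needs four, which is the general idea behind filtering text for a 3-byte column. The function names are made up for illustration.

```python
def utf8_width(ch: str) -> int:
    """Return the number of bytes one code point occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def strip_four_byte(text: str) -> str:
    """Drop every code point that needs 4 bytes in UTF-8 (anything above U+FFFF)."""
    return "".join(ch for ch in text if utf8_width(ch) < 4)


SMILEY_3 = "\u263a"      # WHITE SMILING FACE: fits in 3 bytes
GRIN_4 = "\U0001F600"    # GRINNING FACE: needs 4 bytes

sample = "ok " + SMILEY_3 + " " + GRIN_4 + " done"
cleaned = strip_four_byte(sample)

print(utf8_width(SMILEY_3), utf8_width(GRIN_4))  # 3 4
print(cleaned == "ok " + SMILEY_3 + "  done")     # True
```

Note that the 3-byte smiley survives while the 4-byte one is removed, leaving the two spaces that surrounded it.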

Since that time, however, Sphider has moved to utf8mb4, which IS 4-byte. This means that the troublesome 4-byte characters WILL fit into the database. As to those 8-byte emojis, well, they are commonly composed of TWO 4-byte characters, which means… NO PROBLEM!
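The "two 4-byte characters" point can be checked with a short Python sketch. The US flag emoji is used here as the example; it is a pair of regional indicator characters. The variable names are just for illustration.

```python
# The US flag is two code points: REGIONAL INDICATOR U + REGIONAL INDICATOR S.
FLAG_US = "\U0001F1FA\U0001F1F8"

# Width in bytes of each individual code point when encoded as UTF-8.
widths = [len(cp.encode("utf-8")) for cp in FLAG_US]

# Width in bytes of the whole emoji as it appears on the page.
total = len(FLAG_US.encode("utf-8"))

print(widths)  # [4, 4]
print(total)   # 8
```

The database stores the emoji one character at a time, so what matters is the widest single character (4 bytes), not the 8-byte total.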

The next version of Sphider, 2.4, is VERY near release. The emoji filter remains in place. But after serious thought and consideration, and some testing, this filter may be removed in the following release. It is logical, but how will it test out?

Contact Us has been fixed

Well, it seems yet ANOTHER WordPress plugin “updated” itself into being useless. We found out our Contact Us page wasn’t working. The cause? An “updated” plugin. We rolled the “update” back two versions and the form is working again. Reading more about the issue, I found that the developer does just like Microsoft… instead of taking responsibility for the issue, they pass the blame, in this case, to whoever developed the theme! How many times have I heard: “There’s nothing wrong with our app. It must be your setup.”

I guess they are following the old Microsoft adage:
Update it until it breaks. We won’t be happy until you aren’t.

What to expect in Sphider 2.4.0

Sphider 2.4.0 is on track for an April 10th release. For the user, the changes are focused on cosmetics. Up until this point, search results ALWAYS had a result number and, after the description, a text URL to the page containing the search result. In 2.4.0, you will have the option either to display or not to display those items. Also, the option to display the page’s indexing date has been added.

As to search templates, what were probably seven of the crappiest, lamest templates to have ever seen the light of day have been scrapped. Seven NEW templates are being introduced. Depending on your tastes, you might consider some of them crappy, too, but at least they have a bit of style to them. The “newspaper” template was introduced in an earlier post. Here are the other six:

“black” template
“green” template
“grey” template
“simple” template
“terminal” template
“yellow” template

The “green” style is, well, VERY GREEN! The purpose isn’t so much for actual use as to demonstrate the ability and flexibility of CSS in creating your own templates, even using an image as a border.

The “yellow” template features a bit of simple artwork in the upper left corner. This artwork is “logo.png”, located in the templates/yellow directory. The size is 150×150 and has a transparent background. By creating your own similarly sized logo/picture/artwork, and replacing “logo.png”, this template can be customized for your website.

Since everyone has different tastes, different needs, and every website is somewhat unique, these templates can serve as guides in customizing your own templates. With all the above, the ONLY thing different is the CSS.  Start with a copy of the “standard” template and start tweaking away! The basic Sphider modules remain the same.

Additionally in Sphider 2.4.0, the ‘settings’ table has been completely reworked. While this change is transparent to the user, it will make life much easier on the developer as Sphider moves forward.

Besides some minor fixes and tweaks, the only other big change is in the word stemming process. While the majority of Sphider users probably never use word stemming, those who do will be pleased to learn that the algorithm (for English) has been updated to Porter2. Completely new is the ability to use stemming for ten other languages!

The next Sphider is in the pipeline

Sphider 2.3.1 is brand new, but work has already begun on 2.4.0.

Among the features already being implemented is the ability to hide the result number when displaying search results. Also, for the regular text search, the option to display the index date is being added. (This will not be available for the image or RSS searches.) The RSS and image searches will have the option to turn off the advanced search features.

A new template is being added. Unlike nearly all the current templates, this one has some class. Here is a screen shot:

The Newspaper template

In the sample above, in “settings” the result number is turned off, the index date is turned on, and the description length has been increased to 1000.

Probably the biggest change will be transparent to the user. The “settings” table is being reworked. As Sphider has changed, so has the table, with new columns being appended on a regular basis. Now, while the position of columns within a table is totally immaterial to functionality, after a while it can be really confusing for the developer having to bounce all over the place to gather data. This change will organize the data in a regular flow which will be much easier to maintain going forward.

Other improvements are also being considered, but whether or not they are implemented at this time is yet to be determined. No release date has been set.

When 2.4.0 is released, whenever that may be, the downloads for the SQLite and PostgreSQL versions will likely be removed due to lack of demand.

Also, earlier thoughts of adding audio (mp3, wav, ogg) indexing support to Sphider have been dropped, also due to lack of demand. The actual indexing algorithm has been proven and sketched out, but there is no rationale for implementing it other than “Gee, that’s a neat feature.”

Amazon pulls Anti-Vaccine Videos

Amazon has pulled several anti-vaccination videos from their offerings.

Now, I’m not saying vaccines are good, nor am I saying vaccines are bad. But I WILL say that YOU have a right to get information, pro OR con, and make the decision for yourself. Amazon wants to deny you that right in the name of political correctness.

Here are other links to the three videos banned that I know of:
We Don’t Vaccinate!
Shoot ‘Em Up: The Truth About Vaccines
Vaxxed: From Cover-Up to Catastrophe

Note to Amazon: I’m not big on censorship. Let people judge and decide for themselves. Don’t be a Nanny.

Sphider 2.3.1 Released

Sphider 2.3.0 principally addressed security concerns, but it also was intended to bring Sphider into PHP 7.2 compliance by removing any use of the deprecated each() function. The function was used extensively, and the majority of the code replacement was run-of-the-mill and straightforward. In four places, however, the usage was atypical. Substitute code was put in place and tested. All seemed to work well, as many sites were indexed and searches performed as expected.

Well! It seems indexing and searching were being done properly… but only for words composed of Western characters. Words utilizing non-Western characters were not being indexed! And any searches for those words not only returned as “not found” (expected since they weren’t indexed), those searches also complained of gibberish characters/words being either too short or too common.

Investigation of the issue led to three of the four code segments replacing the non-standard usage of the deprecated each() function. The code replacements themselves have been replaced in 2.3.1. Testing on the problem sites now shows that all words are being indexed, those containing Western characters as well as those containing non-Western characters. The search anomalies are gone and searches in non-Western languages are yielding expected results. If a search word really IS too short or too common, it is reported as such, and not as gibberish. Sphider is now truly PHP 7.2 compliant.

Both editions of Sphider 2.3.1, legacy and PDO, are available for download on this blog’s download page, or from the Sphider Home page.

Sphider – PDO vs MySQLi

There are TWO editions of Sphider… the classic edition using MySQLi and the PDO edition.

Why are there two versions? The classic edition uses MySQLi and prepared statements. While MySQLi, by itself, does support prepared statements, there are a couple of functions used in Sphider that require MySQLnd (the “nd” stands for “native driver”). These functions are used because they are the most efficient way of doing things.

MySQLnd has been the default driver since PHP 5.4. If you install a modern version of PHP and want MySQLi, you are going to get MySQLnd. Yet SOME hosting companies DISABLE MySQLnd for those using shared hosting. (I suppose they want people to shell out a few more bucks to get VPS or Dedicated hosting.) In those situations, the classic edition just ain’t gonna work! So, there is the PDO edition.

There are those who will tell you that PDO is what you should be using anyway. They will tout how versatile PDO is, how it can do anything MySQLi can do, only better. It is true that PDO IS versatile. It can work with many different databases, not just MySQL. But there ARE some things PDO just can’t do, at least not efficiently. And there is overhead. And memory requirements.

With PDO:     PHP <==> PDO <==> Your data
With MySQLi:  PHP <==> Your data

The classic version of Sphider is the better, more capable edition! The PDO edition is capable enough PROVIDED you aren’t trying to build your personal version of an internet search engine. It IS possible to tax the PDO edition to the point it chokes. (It is probably possible to choke the classic edition as well, but it takes more effort.)

Remember, the intent of Sphider was/is to index a web site for the benefit of that site’s visitors. It can be used to index a number of related sites for the same purpose. An individual may stretch Sphider for personal use to index MANY sites… but it is STILL just a small indexing tool and not a Google replacement!

NOW… the final point. If you REALLY need Sphider to stretch its capabilities to the absolute limit, maybe you should be using the classic edition and not PDO. If that is the case, shell out a couple extra bucks to your host so you can get access to MySQLnd. Don’t try to pull a 20′ travel trailer with a Honda Civic.

Relativity Space and 3D printed launchers

It was recently announced by NASA that Launch Complex 16 at Cape Canaveral (unused since 1988) is to be turned over to Relativity Space. They intend to use the complex to launch their methane/LOX Terran 1 launcher. This two-stage rocket stands 100 feet tall and can launch a 2750 lb payload into low-Earth orbit.

But the real kicker here is that the Terran 1, including its engines, tanks, and other structures, will be produced by a 3D printer! 3D printing has come a long way. The Terran 1 isn’t going to be plastic… at least not pure plastic. Their printer is a metal 3D printer. It doesn’t actually produce pure metal, but a metal mixed with a plasticizing agent. This leads to higher strength and greater durability.

I don’t doubt the ingenuity of the folks at Relativity Space. But maybe I’m from Missouri. I want to see this thing fly a few times before I put much stock in its reliability. In my mind, a metal/plastic mix is… well… plastic!

Beware of the Podamibe Custom User Gravatar

I had been using a plugin, “Podamibe Custom User Gravatar”, for quite some time. After a very recent “update” of this plugin, I was locked out of my own account!

My host support investigated and found Podamibe was the cause and disabled it for me. I was then able to log in, at which point I UNINSTALLED this misbehaving piece of software. I found another, less intrusive substitute.

Just thought you would like to know….