Content-Type meta tags and HTTP response headers

How many of us have used a meta tag to define content type and default character sets? The tag may appear something like this:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

But do we REALLY understand what is going on? This tag is important when a webpage is being opened locally. It instructs the browser as to what character encoding to use to display the page. This may override the platform default.

But what about when a page is being viewed over HTTP? Well, the tag is important if the HTTP response headers being sent fail to designate a default character encoding. What if the response headers DO include a default character set? AHHH! Then the meta tag is (are you ready for this?)… IGNORED!

Let’s say you designate a page, via meta tag, to have a character set of UTF-8, but your web server is sending a response header setting the default as Windows-1252. Your page is going to display in Windows-1252!

And guess what? Your page, viewed over HTTP, just may still appear correct, giving you the impression that the meta tag is working! Then you force your browser to actually display in UTF-8 and that beautiful page suddenly becomes what is referred to as “mojibake”!
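To see what a mismatch does to the actual bytes, here is a minimal Python sketch. It is only an illustration: the sample text is made up, and Python's "cp1252" codec stands in for the browser's Windows-1252 handling. It shows both directions of the mix-up, plus the reason a page can look fine by accident: plain ASCII is encoded identically in both character sets.

```python
# Hypothetical sample text ("Cafe naive" with accents); any non-ASCII text behaves the same way.
text = "Caf\u00e9 na\u00efve"

# Case 1: the file is saved as UTF-8, but the browser is told Windows-1252.
utf8_as_cp1252 = text.encode("utf-8").decode("cp1252")

# Case 2: the file is saved as Windows-1252, but the browser is forced to UTF-8.
# Bytes that are not valid UTF-8 are shown as the replacement character U+FFFD.
cp1252_as_utf8 = text.encode("cp1252").decode("utf-8", errors="replace")

# Plain ASCII is byte-for-byte identical in both encodings, so it survives either way.
ascii_round_trip = "Hello, world!".encode("utf-8").decode("cp1252")

print(utf8_as_cp1252)
print(cp1252_as_utf8)
print(ascii_round_trip)
```

A page that is mostly ASCII will therefore look nearly perfect under the wrong character set, with only the accented letters, curly quotes and symbols giving the game away.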

There are at least a couple ways to get this all sorted out. If you are coding in PHP, one way would be to set the response header in the code for each page. Here is an example PHP header:
header('Content-Type: text/html; charset=utf-8');
This needs to appear in the PHP BEFORE any output, even a single character of HTML, is sent.

Another way, if you have access to your server settings, is to specify a default character set there.

Still another way, with Apache servers, is to specify a default character set in your .htaccess file:
AddDefaultCharset UTF-8

So… knowing all this, just HOW do you go about confirming that the character set you want is the character set actually being used? With Firefox/Waterfox/SeaMonkey, bring up the page in question. Up in the URL display area, to the left of the URL, click on the little circle with the upside-down “!”. There will be information on whether or not the connection is secure, then a “>”. Click on that. Click on “More information”, then the “General” tab. This will display the text encoding AND the meta tags. If they don’t agree, the response header being sent isn’t what you want it to be. This applies to Waterfox in Linux, also.
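The same comparison can be sketched in a few lines of Python. This is a rough illustration, not a full implementation of what browsers do: the function names are made up for this example, the meta tag is found with a simple regular expression, and other signals a browser may consider (such as a byte order mark) are ignored.

```python
import re
from email.message import Message


def charset_from_header(content_type):
    """Return the charset named in a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased, or None if absent


def charset_from_meta(html):
    """Return the charset named in the first matching meta tag, or None."""
    match = re.search(r"<meta[^>]+charset\s*=\s*[\"']?([\w-]+)", html, re.IGNORECASE)
    return match.group(1).lower() if match else None


def effective_charset(content_type, html):
    """Header wins when it names a charset; otherwise fall back to the meta tag."""
    return charset_from_header(content_type) or charset_from_meta(html)


# Hypothetical page and header values, for illustration only.
page = '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />'
print(effective_charset("text/html; charset=Windows-1252", page))
print(effective_charset("text/html", page))
```

For a live page, the header value itself can be read with Python's standard urllib.request module: the response's headers object offers the same get_content_charset() method.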

Google Chrome USED to allow the option to see what the default character set REALLY is, but they removed it. Fortunately, there is an extension that does it for you. The extension is simply named “Charset”, and it allows you not only to see what the actual character set is for a given page, but also to change it. The results may be an eye opener. BTW, this applies to Linux Chromium as well.

What about IE/Edge? You’re on your own! I won’t touch those monstrosities! LOL!!

The future of the PDO edition of Sphider…

Sphider comes in two editions, the legacy version and a PDO version. The legacy version is definitely the more stable, faster, easier to maintain version. The PDO version exists primarily for those who are restricted by their shared hosting providers.

Shared hosting has its advantages in that it is very cost effective (cheap) and very simple to use. It is great for personal use or for small businesses or organizations just getting started on the web.

But shared hosting has its downsides, too. It isn’t nearly as efficient, isn’t as secure, suffers from limited resources, and has limited functionality. One of the features commonly lacking in shared hosting is MySQLnd. Thus the need for PDO.

There are quite a few users of the PDO edition, and to simply drop PDO would be a great disservice. On the other hand, keeping the PDO edition in sync with the legacy edition is getting harder and requires a great deal of time and effort.

The PDO version, as it stands, is quite usable. It is PHP 7.3 compliant, so it should be reasonably set for a while, as the majority of shared hosting plans are still at least a few versions behind 7.3!

The thought is that the time has come for legacy and PDO to part paths, with most future effort going into the legacy edition. Because of the user base, PDO version 2.4.0 would remain and receive hot fixes as needed.

No decision has been made and feedback will be given consideration.

Emojis and Sphider

Quite some time back, Sphider had an indexing issue when emojis were encountered on a web page. The SQL errors would fly! The solution at that time was to filter out emojis before storing in the database. This solution was working just fine, but admittedly the filter has not been updated and there are ALWAYS new emojis making their appearance.

While even the new emojis themselves have not been an issue, there was a very curious case of an emoji-free site in which the filter was clearing the entire full text of pages and storing — NOTHING! Well, that isn’t good. The workaround for that site was to disable the emoji removal function. Not an ideal fix, but very doable. As to WHY the function has this effect on that particular site is still a mystery.

But now may be the time to revisit the need for the filter in the first place. At the time the filter was installed, Sphider used the default MySQL utf8 scheme, which is 3-byte. Some emojis are 3-byte, but the vast majority are 4-byte, with even a few 8-byte emojis. You see the problem, don’t you? MySQL is not going to be happy when you try to stick a 4-byte character into 3 bytes!
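The byte counts are easy to check. The Python sketch below measures how many bytes a few characters need in UTF-8 and then strips anything that needs four. The function names are made up for this example, and the filter is only a guess at the general idea, NOT Sphider's actual emoji filter.

```python
def utf8_width(ch):
    """Number of bytes one character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def strip_four_byte(text):
    """Drop every character that needs 4 bytes, i.e. would not fit a 3-byte scheme."""
    return "".join(ch for ch in text if utf8_width(ch) < 4)


# A few sample characters, written as escapes so the file encoding does not matter.
samples = {
    "letter A": "A",
    "e with acute accent (U+00E9)": "\u00e9",
    "smiling face (U+263A)": "\u263a",
    "grinning face (U+1F600)": "\U0001F600",
}
for name, ch in samples.items():
    print(name, "->", utf8_width(ch), "byte(s)")

# The 4-byte grinning face is removed; the 3-byte smiling face is kept.
cleaned = strip_four_byte("Hi \U0001F600 there \u263a")
print(cleaned)
```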

Since that time, however, Sphider has moved to utf8mb4, which IS 4-byte. This means that the troublesome 4-byte characters WILL fit into the database. As to those 8-byte emojis, well, they are commonly composed of TWO 4-byte characters, which means NO PROBLEM!
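As a quick check on that arithmetic, here is a small Python sketch using one example of an “8-byte emoji”: a flag built from two regional indicator characters. The flag is just a sample chosen for this illustration. The limit that matters is the widest single character, not the total length of the emoji.

```python
# A flag emoji made of two characters: REGIONAL INDICATOR U + REGIONAL INDICATOR S.
flag = "\U0001F1FA\U0001F1F8"

total_bytes = len(flag.encode("utf-8"))                   # bytes for the whole emoji
bytes_per_char = [len(ch.encode("utf-8")) for ch in flag]  # bytes for each character

# A 4-byte-per-character scheme can hold the emoji if no single character is wider than 4.
fits_four_byte_scheme = max(bytes_per_char) <= 4

print(total_bytes, bytes_per_char, fits_four_byte_scheme)
```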

The next version of Sphider, 2.4, is VERY near release. The emoji filter remains in place. But after serious thought and consideration, and some testing, this filter may be removed in the following release. It is logical, but how will it test out?

Contact Us has been fixed

Well, it seems yet ANOTHER WordPress plugin “updated” itself into being useless. We found out our Contact Us page wasn’t working. The cause? An “updated” plugin. We rolled the “update” back two versions and the form is working again. Reading more about the issue, I found that the developer does just like Microsoft… instead of taking responsibility for the issue, they pass the blame, in this case, to whoever developed the theme! How many times have I heard: “There’s nothing wrong with our app. It must be your setup.”

I guess they are following the old Microsoft adage:
Update it until it breaks. We won’t be happy until you aren’t.

What to expect in Sphider 2.4.0

Sphider 2.4.0 is on track for an April 10th release. For the user, the changes are focused on cosmetics. Up until this point, search results ALWAYS had a result number and, after the description, a text url to the page containing the search result. In 2.4.0, you will have the option to either display or not to display those items. Also, the option to display the page’s indexing date has been added.

As to search templates, what were probably seven of the crappiest, lamest templates to have ever seen the light of day have been scrapped. Seven NEW templates are being introduced. Depending on your tastes, you might consider some of them crappy, too, but at least they have a bit of style to them. The “newspaper” template was introduced in an earlier post. Here are the other six:

“black” template
“green” template
“grey” template
“simple” template
“terminal” template
“yellow” template

The “green” style is, well, VERY GREEN! The purpose isn’t so much for actual use as to demonstrate the ability and flexibility of CSS in creating your own templates, even using an image as a border.

The “yellow” template features a bit of simple artwork in the upper left corner. This artwork is “logo.png”, located in the templates/yellow directory. The size is 150×150 and has a transparent background. By creating your own similarly sized logo/picture/artwork, and replacing “logo.png”, this template can be customized for your website.

Since everyone has different tastes, different needs, and every website is somewhat unique, these templates can serve as guides in customizing your own templates. With all the above, the ONLY thing different is the CSS.  Start with a copy of the “standard” template and start tweaking away! The basic Sphider modules remain the same.

Additionally in Sphider 2.4.0, the ‘settings’ table has been completely reworked. While this change is transparent to the user, it will make life much easier on the developer as Sphider moves forward.

Besides some minor fixes and tweaks, the only other big change is in the word stemming process. While the majority of Sphider users probably never use word stemming, those who do will be pleased to learn that the algorithm (for English) has been updated to Porter2. Completely new is the ability to use stemming for ten other languages!

The next Sphider is in the pipeline

Sphider 2.3.1 is brand new, but work has already begun on 2.4.0.

Among the features already being implemented is the ability to hide the result number when displaying search results. Also, for the regular text search, the option to display the index date is being added. (This will not be available for the image or RSS searches.) The RSS and image searches will have the option to turn off the advanced search features.

A new template is being added. Unlike nearly all the current templates, this one has some class. Here is a screen shot:

The Newspaper template

In the sample above, in “settings” the result number is turned off, the index date is turned on, and the description length has been increased to 1000.

Probably the biggest change will be transparent to the user. The “settings” table is being reworked. As Sphider has changed, so has the table, with new columns being appended on a regular basis. Now, while the position of columns within a table is totally immaterial to functionality, after a while it can be really confusing for the developer having to bounce all over the place to gather data. This change will organize the data in a regular flow which will be much easier to maintain going forward.

Other improvements are also being considered, but whether or not they are implemented at this time is yet to be determined. No release date has been set.

When 2.4.0 is released, whenever that may be, the downloads for the SQLite and PostgreSQL versions will likely be removed due to lack of demand.

Also, earlier thoughts of adding audio (mp3, wav, ogg) indexing support to Sphider have been dropped, also due to lack of demand. The actual indexing algorithm has been proven and sketched out, but there is no rationale for implementing it other than “Gee, that’s a neat feature.”

Amazon pulls Anti-Vaccine Videos

Amazon has pulled several anti-vaccination videos from their offerings.

Now, I’m not saying vaccines are good, nor am I saying vaccines are bad. But I WILL say that YOU have a right to get information, pro OR con, and make the decision for yourself. Amazon wants to deny you that right in the name of political correctness.

Here are alternative links to the three banned videos that I know of:
We Don’t Vaccinate!
Shoot ‘Em Up: The Truth About Vaccines
Vaxxed: From Cover-Up to Catastrophe

Note to Amazon: I’m not big on censorship. Let people judge and decide for themselves. Don’t be a Nanny.

Sphider 2.3.1 Released

Sphider 2.3.0 principally addressed security concerns, but it was also intended to bring Sphider into PHP 7.2 compliance by removing any use of the deprecated each() function. The function was used extensively, and the majority of the code replacement was straightforward. In four places, however, the usage was atypical. Substitute code was put in place and tested. All seemed to work well, as many sites were indexed and searches performed as expected.

Well! It seems indexing and searching were being done properly, but only for words composed of Western characters. Words utilizing non-Western characters were not being indexed! And any searches for those words not only came back as “not found” (expected, since they weren’t indexed), but also complained of gibberish characters/words being either too short or too common.

Investigation of the issue led to three of the four code segments replacing the non-standard usage of the deprecated each() function. The code replacements themselves have been replaced in 2.3.1. Testing on the problem sites now shows that all words are being indexed, those containing Western characters as well as those containing non-Western characters. The search anomalies are gone and searches in non-Western languages are yielding expected results. If a search word really IS too short or too common, it is reported as such, and not as gibberish. Sphider is now truly PHP 7.2 compliant.

Sphider 2.3.1, in both legacy and PDO editions, is available for download on this blog’s download page, or from the Sphider Home page.