The original Sphider was built with the English language in mind. While preparing to store indexed words, it performs a check to ensure that each word is composed of alphabetic characters. Since non-English sites might also be indexed, this check was modified to accommodate non-English, but still Western, characters. The function that handled this eventually came to be named removeAccents(), and it actually worked quite well. As Sphider advanced, it took on still more languages, particularly those with non-Western alphabets. Surprisingly, Sphider kept chugging along, although just how is a bit of a mystery.
Well, in recognition of some of the shortcomings of removeAccents(), the function was “improved” in Sphider 3.1.0. And it WAS improved… in some instances. Unfortunately, it was discovered (after it had been released!! 🙁 ) that Arabic words would no longer index. Other languages were probably similarly affected, but just one is enough to raise red flags.
Rather than patch a fix that was originally intended to cover a minor flaw, it seemed logical to update the initial check on the word to conform to the realities of Unicode. The check that a word is alphabetic now truly covers the many alphabets, rather than trying to patch a check designed for the ASCII alphabet.
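To illustrate the difference between an ASCII-only check and a Unicode-aware one, here is a minimal sketch in Python. Sphider itself is written in PHP, so this is not its actual code, and the function names `is_ascii_alpha` and `is_unicode_word` are hypothetical. The ASCII test accepts only A to Z and a to z. The Unicode-aware test accepts any character in a Unicode letter category (L*) plus combining marks (M*), such as Arabic vowel signs, which Python's own `str.isalpha()` would reject. Treating combining marks as part of a word is an assumption about what should count as "alphabetic".

```python
import unicodedata


def is_ascii_alpha(word: str) -> bool:
    """English-centric test: every character must be in A-Z or a-z."""
    return bool(word) and all("a" <= c <= "z" or "A" <= c <= "Z" for c in word)


def is_unicode_word(word: str) -> bool:
    """Unicode-aware test (hypothetical helper, not Sphider's code).

    Accepts letters from any script (general category L*) plus combining
    marks (M*), such as Arabic vowel signs, as long as the word starts
    with a letter.
    """
    if not word:
        return False
    # The first character must be a letter, not a stray combining mark.
    if not unicodedata.category(word[0]).startswith("L"):
        return False
    # Every remaining character must be a letter or a combining mark.
    return all(unicodedata.category(c)[0] in "LM" for c in word[1:])


if __name__ == "__main__":
    samples = ["search", "caf\u00e9", "\u0645\u0631\u062d\u0628\u0627",
               "\u0643\u064e\u062a\u064e\u0628\u064e", "abc123"]
    for w in samples:
        # word, ASCII test, Unicode test, Python's built-in isalpha()
        print(w, is_ascii_alpha(w), is_unicode_word(w), w.isalpha())
```

In PHP, a comparable check could be written with a Unicode-aware regular expression such as `preg_match('/^\p{L}[\p{L}\p{M}]*$/u', $word)`; this pattern is an illustration, not necessarily what Sphider uses.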