Filtering of Content

Friday, Nov 24, 2017

When creating content (found under the publish tab), you might find that some characters, namely backticks " ` " and backslashes " \ ", are removed from the content you enter. The following explains why and how this works.

There are four mechanisms that clean strings:

  • PHP's filter functions, which are set to remove backticks and some other characters
  • TinyMCE, which makes sure the HTML is pure
  • HTML Tidy, which makes sure the HTML is pure
  • PHP Data Objects (PDO) for MySQL
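
As a rough illustration of the first mechanism, the sketch below removes a fixed set of characters from a string. It is written in Python rather than PHP, and the function name and the character list are assumptions made for the example; the characters actually removed depend on how the filter is configured.

```python
# Hypothetical list of characters to strip: a backtick and a backslash.
# The real list is determined by the filter configuration, not by this sketch.
STRIPPED_CHARACTERS = "`\\"


def strip_characters(text: str, stripped: str = STRIPPED_CHARACTERS) -> str:
    """Return text with every character listed in `stripped` removed."""
    # str.maketrans with a third argument maps each listed character to None,
    # so translate() deletes it and leaves everything else untouched.
    table = str.maketrans("", "", stripped)
    return text.translate(table)


cleaned = strip_characters("a`b\\c")
print(cleaned)
```

Note that the characters are dropped rather than escaped, which is why they simply disappear from the saved content.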

What is removed from the text may change, depending on the security requirements of the third-party frameworks involved. I don’t think there is any need to cover what they do in our tests: among other things, it’s not our code, it’s either PHP itself or a third-party framework. Backticks in particular are removed to make it harder to enter SQL into the content.
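
To show why the database layer matters here, the following sketch stores content through a parameterized query, so that the text is treated as data and never as part of the SQL statement. It uses Python's built-in sqlite3 module as a stand-in for PDO and MySQL. The table, the column and the `save_content` helper are hypothetical, and the sketch assumes content is bound as a parameter, which this note does not state explicitly.

```python
import sqlite3


def save_content(conn: sqlite3.Connection, body: str) -> None:
    """Insert body using a bound parameter instead of string concatenation."""
    # The "?" placeholder keeps the value separate from the SQL text.
    conn.execute("INSERT INTO content (body) VALUES (?)", (body,))
    conn.commit()


# Hypothetical in-memory table used only for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE content (id INTEGER PRIMARY KEY, body TEXT)")

# Text that looks like SQL is stored verbatim and is not executed.
save_content(conn, "x'); DROP TABLE content; --")
rows = conn.execute("SELECT body FROM content").fetchall()
print(rows)
```

Removing backticks adds a further layer on top of this, since it limits what can be typed into the content in the first place.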

Together, these cleaning steps ensure that other third-party systems, such as the PDF generator, work with the HTML used to generate the PDF file.

These systems generally work well, but strange things can occasionally happen when they don’t clean the content completely. Problems are most likely when you copy content from a word processing or spreadsheet application, or when the content contains characters encoded in an unsupported format (we encode in UTF-8). We suggest that you don’t copy directly from those applications; paste the content into a plain text editor first, then copy it from there.
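
For anyone who wants to automate that step, here is a minimal sketch of what pasting through a plain text editor roughly achieves. It is not part of the system described above. The replacement table and both function names are assumptions made for illustration, and the table covers only a few characters that word processors commonly insert.

```python
import unicodedata

# Hypothetical mapping of word-processor characters to plain equivalents.
WORD_PROCESSOR_CHARS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # non-breaking space
}


def to_plain_text(pasted: str) -> str:
    """Replace common word-processor characters and drop control characters."""
    text = unicodedata.normalize("NFC", pasted)
    for fancy, plain in WORD_PROCESSOR_CHARS.items():
        text = text.replace(fancy, plain)
    # Keep newlines and tabs, remove every other control character.
    return "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )


def decode_utf8(raw: bytes) -> str:
    """Decode bytes as UTF-8, marking undecodable bytes with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


sample = to_plain_text("\u201cHello\u201d\u00a0world\u2026")
print(sample)
```

A U+FFFD replacement character in the output of `decode_utf8` is a sign that the source was not UTF-8 and should be re-saved in that encoding before it is pasted in.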