I just spent a lot of time researching encoding because I had a problem placing content into an XML file after it had been created with CKEditor and saved to the database. The problem turned out not to be encoding-related at all, but for a while I was certain it was CKEditor’s fault. I use the terms HtmlBlockEditor and CKEditor interchangeably here. Here are the issues I think need to be improved:
- CKEditor does not use UTF-8 by default. A lot of people are setting its entities_latin option to false to (partially?) resolve this. It seems to convert certain characters to HTML entities on its own, but that isn’t clear to anyone using EWL.
- No attention has been paid to what type of text an HtmlBlockEditor expects when loading it, or what it produces when you retrieve its text. This is not responsible.
- At a minimum, the documentation for HtmlBlockEditor should state clearly whether or not you should HtmlEncode the content before you store it in your database, and whether the output is UTF-8. It should probably also outline typical usage scenarios and the best practices for making the storage, loading/editing, and display of HTML content safe and convenient for everyone involved.
To a lesser extent, this information should also be specified explicitly for other controls such as EwfTextBox. There is nothing stopping markup from getting in there, or is there? ASP.NET can block it, but I think EWL disables that. EWL should say so, and it should also give guidelines on how to safely store, display, and load and edit this content.
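
To make the kind of guideline I am asking for concrete, here is a rough sketch of one possible convention: store the editor's HTML exactly as it comes back, and escape only at the point where it is written somewhere else (an XML file, or a page that should show it as text). This is Python rather than EWL code, and the function names, the element names, and the sample value are all made up for illustration. It is not how EWL or CKEditor actually behave.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical value, standing in for whatever the editor hands back.
editor_html = "<p>Caf&eacute; &amp; bar</p>"


def to_xml_document(block_html: str) -> bytes:
    """Embed the stored HTML as the text of an XML element.

    The "page" and "body" element names are invented for this sketch.
    ElementTree escapes &, < and > itself when serializing, so the stored
    value does not need to be encoded beforehand.
    """
    root = ET.Element("page")
    body = ET.SubElement(root, "body")
    body.text = block_html
    return ET.tostring(root, encoding="utf-8")


def from_xml_document(data: bytes) -> str:
    """Read the HTML block back out; the parser undoes the XML escaping."""
    return ET.fromstring(data).find("body").text


def display_plain_text(user_text: str) -> str:
    """Escape text-box input at display time so any markup in it is inert."""
    return html.escape(user_text)


if __name__ == "__main__":
    data = to_xml_document(editor_html)
    print(data.decode("utf-8"))
    print(from_xml_document(data) == editor_html)
    print(display_plain_text("<b>hi</b>"))
```

The point of the sketch is only that the database value is never encoded twice: each output format applies its own escaping once, at the boundary. Whether that is the right convention for HtmlBlockEditor and EwfTextBox is exactly what the documentation should spell out.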