EW 1: Fixing the Funny Characters

BOM Issues with UTF-8 in Expression Web version 1

Updated with new options to kill the BOM in Expression Web version 1.

Version 2 allows you to turn off the BOM based on file extension using Tools > Page Editor Options > Authoring. I recommend turning it off for php, html, htm and css at a minimum.

By default Expression Web (EW) uses the W3C-recommended UTF-8. While it will be rare for me to recommend something different from the W3C, character set is one of those instances.

What Is UTF-8?

UTF-8 (8-bit Unicode Transformation Format) is a variable-length character encoding to represent any universal character in the Unicode standard, yet is backwards compatible with ASCII. This is the reason the W3C recommends using UTF-8 as the preferred encoding for what is an international medium *.

A Unicode file may begin with a particular combination of bytes that indicates that the text contained in the file is Unicode. This combination of bytes is known as a signature or Byte Order Mark (BOM). In UTF-8 the BOM is the three bytes EF BB BF.
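As a rough illustration, the check for a UTF-8 signature can be sketched in Python. The function names has_utf8_bom and file_has_utf8_bom are hypothetical helpers made up for this example, not part of Expression Web:

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 signature EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)


def file_has_utf8_bom(path: str) -> bool:
    """Read only the first three bytes of a file and test them for the BOM."""
    with open(path, "rb") as handle:
        return has_utf8_bom(handle.read(3))
```

Calling file_has_utf8_bom on a page saved by EW with default settings should tell you whether the signature was written, assuming the file is readable from where the script runs.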

The problem with using UTF-8 as your page encoding occurs in PHP files, CSS files and some browsers. When the UTF-8 BOM is not supported properly you will see ï»¿ at the top of your browser, above the rendered page. In some cases you may see extra space that should not be there according to your code (HTML and/or CSS) and that cannot be removed except by changing the character encoding.

If this is a PHP page your scripting may fail (sessions and cookies in particular) since PHP does not support or recognize the BOM in Unicode. The BOM bytes are sent to the browser as output before any of your code runs, and session and cookie headers must be sent before any output. As a result, unless you need support for non-western European languages, I recommend against using UTF-8 in EWD as your default.
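To see where the funny characters come from, here is a small Python sketch. It reads the three BOM bytes as ISO-8859-1, which is roughly what software does when it does not recognize the signature. The variable names are only for illustration:

```python
import codecs

# The UTF-8 signature: bytes EF BB BF.
bom = codecs.BOM_UTF8

# Read as ISO-8859-1, each byte becomes one visible character.
as_latin1 = bom.decode("iso-8859-1")
print(as_latin1)  # ï»¿

# A BOM-aware decoder ("utf-8-sig") drops the signature instead.
page = bom + b"<html>"
print(page.decode("utf-8-sig"))  # <html>
```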

charset=iso-8859-1

The character set used in this site is charset=iso-8859-1, which is my Expression Web default. To change the default character set in EW, open the Site Settings dialog box from the Site menu.

On the language tab you will see UTF-8 as the default (1 in image):

language tab

Use the drop-down arrow next to UTF-8 and select US/Western European (ISO) as shown in the screen shot above (2 select). Before saving your changes, check the box to ignore the keyboard when deciding the encoding of new pages (3 checked in screen shot), since you want the more generic ISO encoding.

Generated Pages

Unfortunately pages created by the EWD wizards such as site templates will still use the encoding created by the template or wizard creator. For that reason creating a code snippet with the ISO encoding is recommended. Use the following code to create your snippet:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />

To replace the default:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
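Changing the meta tag only changes what the page declares; the bytes in the file need to match. For a generated page that was already saved as UTF-8, a conversion could be sketched in Python as below. This is an assumption-laden example, not something EW does for you: the function name utf8_page_to_latin1 is hypothetical, and it assumes the page declares its charset with the meta tag shown above.

```python
import re

# Matches the "charset=utf-8" part of a declaration, in any letter case.
META_CHARSET = re.compile(r"(charset\s*=\s*)utf-8", re.IGNORECASE)


def utf8_page_to_latin1(data: bytes) -> bytes:
    """Convert UTF-8 page bytes (with or without a BOM) to ISO-8859-1."""
    # "utf-8-sig" decodes UTF-8 and drops a leading BOM if one is present.
    text = data.decode("utf-8-sig")
    # Point the first declared charset at the new encoding.
    text = META_CHARSET.sub(r"\g<1>iso-8859-1", text, count=1)
    # Characters outside ISO-8859-1 become numeric references such as &#8364;.
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")
```

Work on a copy of the file first, since the conversion rewrites every non-ASCII character in the page.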

Alternative Methods for Include Files

select encoding

For those using PHP include files, you can use a file extension other than php, such as .inc. Then right-click in code view and select Encoding from the right-click menu (at right).

This will launch the Text File Encoding dialog box below. By default the box for adding the BOM is checked. Removing this checkmark will stop Expression Web from adding the BOM to UTF-8 pages.

This method is very effective at preventing a BOM from being added but is not recommended if your PHP code contains sensitive information such as connection strings for your databases, because a server that is not set up to process that extension as PHP may show the file's source to anyone who requests it directly. If you are simply including menus or other files that change frequently, using an inc or other non-standard extension is safe.

Christoph Schneegans has posted a VBA Macro in the MS public newsgroup to remove the BOM before publishing.

Steve Easton's Bomb the BOM method has you replace the default text and html template files with new versions that do not have the BOM embedded in the page. See http://www.95isalive.com/expression/index.html for how to use his methods and the caveats.
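If you would rather clean up files after the fact, the same idea of removing the BOM before publishing can be sketched as a short Python script. This is not Christoph Schneegans' macro or Steve Easton's method, just a minimal illustration; the names EXTENSIONS, strip_utf8_bom and strip_bom_in_tree are hypothetical, and the extension list is an assumption you should adjust for your own site.

```python
import codecs
from pathlib import Path
from typing import List

# File types to check (an assumed list; edit to suit your site).
EXTENSIONS = {".php", ".html", ".htm", ".css", ".inc"}


def strip_utf8_bom(path: Path) -> bool:
    """Remove a leading UTF-8 BOM from one file. Return True if it was changed."""
    data = path.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    path.write_bytes(data[len(codecs.BOM_UTF8):])
    return True


def strip_bom_in_tree(root: str) -> List[Path]:
    """Walk a site folder and strip the BOM from every matching file."""
    changed = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in EXTENSIONS:
            if strip_utf8_bom(path):
                changed.append(path)
    return changed
```

Run it against a copy of your site folder first, for example strip_bom_in_tree("my_site_copy"), and check the returned list of changed files before publishing.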


