Specifying a character set
A character set is a table of mappings from numerical codes to text characters. As you probably know, computers represent text using a distinct numeric code for each character, be it a letter, a numeral, a punctuation mark, or something else. However, there are many ways to map characters to codes. For example, in ISO-8859-1, the standard Internet character set for computers running the U.S. English version of Windows (also known as Western European (ISO)), the decimal code 92 represents a backslash (“\”), but in the EUC-KR character set, it represents the Korean currency symbol (₩).
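The idea that one numeric code can stand for different characters can be sketched in a few lines of Python. This is only an illustration, not something the editor does: it uses a different code (decimal 233) and two character sets chosen for the example, ISO-8859-1 and ISO-8859-5, and the variable names are made up.

```python
# One numeric code, interpreted through two different character sets.
# The code 233 and both character sets are illustrative choices.
code = 233
raw = bytes([code])

# In ISO-8859-1 (Western European) the code is a Latin letter.
western = raw.decode("iso-8859-1")

# In ISO-8859-5 (Cyrillic) the very same code is a Cyrillic letter.
cyrillic = raw.decode("iso-8859-5")

print(code, "->", western, "in ISO-8859-1")
print(code, "->", cyrillic, "in ISO-8859-5")
```

The stored value never changes; only the table used to look it up does.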
Because of the international nature of the Web, the character set in which a document is viewed by a particular user may not be the same as the character set in which it was written. If you live in the United States and compose in English, your documents probably use an extension of ASCII, such as ISO-8859-1. But if your document is viewed by a user in Korea using the Korean version of Windows, her browser probably expects the document to use a Korean character set. As a result, some characters may not appear correctly on the user’s screen.
To avoid this problem as much as possible, it’s a good idea to specify the character set that a document uses. Doing so tells browsers to display your document in that character set. If you don’t specify a character set, browsers will usually display your document in the default character set specified by the user, which may be very different from the character set the document was authored in.
Note that specifying a character set doesn’t actually change the character codes in a document. It just tells browsers how they should interpret those codes.
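As a rough sketch of this point, the Python snippet below decodes the same stored bytes in two ways: once with the character set the author used, and once with a different default. The sample text and the choice of EUC-KR and ISO-8859-1 are assumptions made for the example; real browsers apply their own detection and fallback rules.

```python
# Korean sample text, stored by the author in the EUC-KR character set.
original = "한국어"
stored_bytes = original.encode("euc_kr")

# A reader that is told the correct character set recovers the text.
as_declared = stored_bytes.decode("euc_kr")

# A reader that assumes a Western European default misreads the same bytes.
as_default = stored_bytes.decode("iso-8859-1")

print("Declared EUC-KR:   ", as_declared)
print("Assumed ISO-8859-1:", as_default)

# The codes in the document are identical in both cases.
print("Bytes unchanged:", as_default.encode("iso-8859-1") == stored_bytes)
```

In both cases the document's bytes are the same; only their interpretation differs, which is why declaring the character set matters.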
To specify a character set for a document
- On the Document menu, go to Property, click Document, and then click the General tab.
- Do one of the following:
  - Click the left Character set box and select a character set by its title (for example, Western European (ISO)). The right box will show the code name of the selected character set. The value User-defined means “Do not specify a character set.”
  - Click the right Character set box and select a character set by its code name (for example, iso-8859-1). You can only select an item in this box if the selection in the left box is User-defined, and only a few character sets can be selected this way.
If you compose documents in a non-Western European language, such as Korean, users whose computers do not have fonts that support that language’s characters will not be able to view your documents correctly, even if you specify the character set correctly.
Related topics
Changing character sets throughout a site