Unicode and HTML

HTML 4.0 uses Unicode as its official character set. One way to obtain the numeric code for a character outside Latin-1 is to use the Mozilla browser: after text is typed into a text edit box and the page is saved, such characters are automatically converted into their Unicode numeric codes.

Usually, though, a document is stored in an 8-bit character encoding that can represent only a small slice of this set. Characters from the whole of Unicode can still appear in an HTML document through a numeric character reference of the form &#N;, where N is either the decimal number of the Unicode code point or its hexadecimal number prefixed by x (as in &#xN;).
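As a minimal sketch of this mechanism, the following Python function replaces every character that a target 8-bit encoding cannot represent with a numeric character reference. The function name and its parameters are illustrative assumptions; the decimal case relies on Python's built-in "xmlcharrefreplace" error handler. The sketch does not escape ampersands already present in the text.

```python
def to_numeric_references(text, encoding="latin-1", hexadecimal=False):
    """Replace characters the encoding cannot represent with &#N; or &#xN;."""
    if not hexadecimal:
        # Python's built-in error handler emits decimal references.
        return text.encode(encoding, errors="xmlcharrefreplace").decode(encoding)
    pieces = []
    for ch in text:
        try:
            ch.encode(encoding)          # representable: keep the character as-is
            pieces.append(ch)
        except UnicodeEncodeError:
            pieces.append(f"&#x{ord(ch):X};")  # hexadecimal form, prefixed by x
    return "".join(pieces)


if __name__ == "__main__":
    print(to_numeric_references("Straße Δ"))                    # Straße &#916;
    print(to_numeric_references("Straße Δ", hexadecimal=True))  # Straße &#x394;
```

Here ß survives unchanged because Latin-1 contains it, while Δ lies outside the encoding and is written as a reference.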

Support for hexadecimal references is more recent, so older browsers might have problems displaying those characters, although they will probably have trouble displaying Unicode characters outside the 8-bit range in the first place. For this reason it is still common practice to convert the hexadecimal code point into a decimal value (e.g. &#9824; instead of &#x2660;).
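The hexadecimal-to-decimal conversion can be automated. The sketch below, in which the function and pattern names are assumptions chosen for illustration, rewrites every hexadecimal reference in a piece of markup as its decimal equivalent and leaves everything else untouched.

```python
import re

# Matches hexadecimal references such as &#x2660; or &#X2660;
HEX_REFERENCE = re.compile(r"&#[xX]([0-9A-Fa-f]+);")


def hex_references_to_decimal(markup):
    """Rewrite &#xH; references as decimal &#N; references."""
    return HEX_REFERENCE.sub(lambda m: f"&#{int(m.group(1), 16)};", markup)


if __name__ == "__main__":
    print(hex_references_to_decimal("spade: &#x2660;"))  # spade: &#9824;
```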

In the Unicode standard, each code point is expressed in the notation U+hhhh, where hhhh are the hexadecimal digits.
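For illustration, the U+hhhh notation can be produced from a character and parsed back again as follows. Both helper names are hypothetical, and the sketch assumes the label always starts with "U+".

```python
def u_notation(ch):
    """Return the U+hhhh label for a single character (at least four hex digits)."""
    return f"U+{ord(ch):04X}"


def from_u_notation(label):
    """Return the character named by a label such as 'U+2660'."""
    if not label.upper().startswith("U+"):
        raise ValueError(f"not a U+ label: {label!r}")
    return chr(int(label[2:], 16))


if __name__ == "__main__":
    print(u_notation("A"))            # U+0041
    print(u_notation("\u2660"))       # U+2660
    print(ord(from_u_notation("U+2660")))  # 9824, the decimal value used in &#9824;
```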

There is also a standard set of named character entity references for commonly used symbols that are missing from some character encodings. One can write &mdash;, for example, to represent an em dash—like this—in text even if the character encoding used doesn't contain that character.
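Python's standard library ships the HTML 4 entity tables in html.entities, which allows a short sketch of the lookup in both directions. The helper name named_reference is an assumption; it falls back to a decimal numeric reference when a character has no name.

```python
from html import unescape
from html.entities import codepoint2name, name2codepoint


def named_reference(ch):
    """Prefer a named entity reference; fall back to a decimal numeric one."""
    name = codepoint2name.get(ord(ch))
    return f"&{name};" if name else f"&#{ord(ch)};"


if __name__ == "__main__":
    print(named_reference("\u2014"))   # &mdash;
    print(named_reference("\u1250"))   # &#4688; (no named entity in HTML 4)
    print(name2codepoint["mdash"])     # 8212
    print(unescape("&mdash;") == "\u2014")  # True
```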

Many browsers, though, are only capable of displaying a small subset of the full UCS-2 repertoire. Here is how your browser displays various Unicode code points:

Code      Description                                      What your browser displays
&#65;     Latin capital letter "A"                         A
&#223;    Latin small letter "Sharp S"                     ß
&#254;    Latin small letter "Thorn"                       þ
&#916;    Greek capital letter "Delta"                     Δ
&#1049;   Cyrillic capital letter "Short I"                Й
&#1511;   Hebrew letter "Qof"                              ק
&#1605;   Arabic letter "Meem"                             م
&#3671;   Thai digit 7                                     ๗
&#4688;   Ethiopic syllable "Qha"                          ቐ
&#12354;  Japanese Hiragana "A"                            あ
&#21494;  Simplified Chinese "Leaf"                        叶
&#33865;  Traditional Chinese "Leaf"                       葉
&#45307;  Korean Hangeul syllable "Nieun Yae Rieulhieuh"   냻
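The numeric codes for such a table can be derived from the characters themselves. The following sketch uses Python's unicodedata module; the function name table_row and the output layout are assumptions, and the official Unicode character names it prints differ from the informal descriptions used above.

```python
import unicodedata


def table_row(ch):
    """Return (decimal reference, U+ label, official Unicode name, character)."""
    code = ord(ch)
    return (f"&#{code};", f"U+{code:04X}", unicodedata.name(ch, "(unnamed)"), ch)


if __name__ == "__main__":
    # Sample characters: A, sharp s, Greek Delta, Thai digit seven, Hiragana A
    for ch in "A\u00df\u0394\u0e57\u3042":
        reference, label, name, shown = table_row(ch)
        print(f"{reference:<10}{label:<9}{name:<32}{shown}")
```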

Some multilingual web browsers dynamically merge the required font sets on demand, e.g., Microsoft's Internet Explorer 5.5 on Windows, or the cross-platform Mozilla/Netscape 6. These browsers are capable of displaying all the Unicode characters on this page simultaneously once the appropriate "text display support packs" have been downloaded. MSIE 5.5 prompts the user through its "install on demand" feature whenever a new font is needed.

Other browsers, such as Netscape Navigator 4.77, can only display text supported by the current font associated with the character encoding of the page. With a browser of this type, it is unlikely that your computer has all of the required fonts, nor can the browser use all available fonts on the same page. As a result, the browser will not display all of the text above correctly, though it may display a subset of it.

Because the characters are encoded according to the standard, though, they will display correctly on any system that is compliant and does have the characters available. Further, characters that have been given names for use in named entity references are likely to be more commonly available than others.

External links