HTML Character Sets
To display an HTML page correctly, the browser must know what character-set
to use.
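As a rough illustration of why the character-set matters, the Python sketch below encodes a short string as UTF-8 and then decodes the same bytes as ISO-8859-1, which is approximately what happens when a browser assumes the wrong character-set. The sample text is an arbitrary choice, not something taken from this page.

```python
# Text containing one non-ASCII character (an arbitrary sample).
text = "café"

# Suppose the page is sent as UTF-8 bytes.
raw_bytes = text.encode("utf-8")

# Decoding with the correct character-set recovers the text.
correct = raw_bytes.decode("utf-8")

# Decoding the same bytes as ISO-8859-1 garbles the accented letter,
# because the two bytes of "é" are read as two separate characters.
garbled = raw_bytes.decode("iso-8859-1")

print(correct)  # café
print(garbled)  # cafÃ©
```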
The character-set of the early World Wide Web was ASCII. ASCII
supports the digits 0-9, the uppercase and lowercase English alphabet, and some special
characters.
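The limits of ASCII can be checked with a few lines of Python. This is only a sketch; the helper name `is_ascii_encodable` and the sample strings are made up for the illustration.

```python
# ASCII assigns the codes 0-127; digits and English letters fall in that range.
print(ord("0"), ord("9"))  # 48 57
print(ord("A"), ord("Z"))  # 65 90
print(ord("a"), ord("z"))  # 97 122


def is_ascii_encodable(text):
    """Return True if every character of text exists in ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(is_ascii_encodable("Hello, world!"))  # True
print(is_ascii_encodable("Søren"))          # False
```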
Since many countries use characters that are not part of ASCII, browsers historically defaulted to ISO-8859-1 (the default in HTML 4). HTML5 documents are expected to use UTF-8.
If a web page uses a different character-set than ISO-8859-1, it should be specified in the
<meta> tag.
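To show where such a declaration lives, here is a minimal Python sketch that reads the character-set out of a page's <meta> tag using the standard library's `html.parser`. The class name `MetaCharsetFinder`, the function `declared_charset`, and the sample page are assumptions made for this example; a real browser follows a more involved detection procedure.

```python
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Remember the first character-set declared in a <meta> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attributes = dict(attrs)
        if attributes.get("charset"):
            # Short form: <meta charset="...">
            self.charset = attributes["charset"].strip().lower()
        elif (attributes.get("http-equiv") or "").lower() == "content-type":
            # Long form: <meta http-equiv="Content-Type"
            #                  content="text/html; charset=...">
            content = attributes.get("content") or ""
            for part in content.split(";"):
                name, _, value = part.strip().partition("=")
                if name.strip().lower() == "charset" and value:
                    self.charset = value.strip().lower()


def declared_charset(html_text):
    """Return the declared character-set, or None if there is none."""
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    return finder.charset


# Hypothetical page declaring ISO-8859-2 in the long form.
page = (
    "<html><head>"
    '<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-2">'
    "</head><body></body></html>"
)
print(declared_charset(page))  # iso-8859-2
```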
ISO Character Sets
The International Organization for Standardization (ISO) defines
the standard character-sets for different alphabets and languages.
The different character-sets being used around the world are listed below:
Character set | Description | Covers
ISO-8859-1 | Latin alphabet part 1 | North America, Western Europe, Latin America, the Caribbean, Canada, Africa
ISO-8859-2 | Latin alphabet part 2 | Eastern Europe
ISO-8859-3 | Latin alphabet part 3 | SE Europe, Esperanto, miscellaneous others
ISO-8859-4 | Latin alphabet part 4 | Scandinavia/Baltics (and others not in ISO-8859-1)
ISO-8859-5 | Latin/Cyrillic alphabet part 5 | The languages that use a Cyrillic alphabet, such as Bulgarian, Belarusian, Russian and Macedonian
ISO-8859-6 | Latin/Arabic alphabet part 6 | The languages that use the Arabic alphabet
ISO-8859-7 | Latin/Greek alphabet part 7 | The modern Greek language as well as mathematical symbols derived from the Greek
ISO-8859-8 | Latin/Hebrew alphabet part 8 | The languages that use the Hebrew alphabet
ISO-8859-9 | Latin 5 alphabet part 9 | The Turkish language. Same as ISO-8859-1 except Turkish characters replace Icelandic ones
ISO-8859-10 | Latin 6 Lappish, Nordic, Eskimo | The Nordic languages
ISO-8859-15 | Latin 9 (aka Latin 0) | Similar to ISO-8859-1 but replaces some less common symbols with the euro sign and some other missing characters
ISO-2022-JP | Latin/Japanese alphabet part 1 | The Japanese language
ISO-2022-JP-2 | Latin/Japanese alphabet part 2 | The Japanese language
ISO-2022-KR | Latin/Korean alphabet part 1 | The Korean language
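The character-sets in the table above assign different characters to the same byte values. The Python sketch below decodes one byte under three of the ISO-8859 parts, and a second byte under ISO-8859-1 and ISO-8859-15. The byte values were picked for the illustration only.

```python
# One byte value, interpreted under three ISO-8859 parts.
byte = bytes([0xE4])

print(byte.decode("iso-8859-1"))  # ä (Latin alphabet part 1)
print(byte.decode("iso-8859-5"))  # ф (Cyrillic)
print(byte.decode("iso-8859-7"))  # δ (Greek)

# ISO-8859-15 reuses the 0xA4 position of ISO-8859-1 for the euro sign.
currency = bytes([0xA4])
print(currency.decode("iso-8859-1"))   # ¤
print(currency.decode("iso-8859-15"))  # €
```

Without a declared character-set, a program receiving these bytes has no reliable way to tell which of these characters was meant.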
The Unicode Standard
Because the character-sets listed above are
limited in size, and are not compatible in multilingual environments, the
Unicode Consortium developed the Unicode Standard.
The Unicode Standard aims to cover the characters, punctuation marks, and symbols of
all the world's writing systems.
Unicode enables processing, storage and interchange of text data no matter what
the platform, no matter what the program, no matter what the language.
The Unicode Consortium
The Unicode Consortium develops the Unicode Standard. Its goal is to replace the existing character-sets with Unicode and its standard Unicode
Transformation Formats (UTF).
The Unicode Standard has become a success and is implemented in XML, Java,
ECMAScript (JavaScript), LDAP, CORBA 3.0, WML, etc. The Unicode standard is also
supported in many operating systems and all modern browsers.
The Unicode Consortium cooperates with the leading standards development
organizations, like ISO, W3C, and ECMA.
Unicode can be implemented by different character encodings. The most commonly used
encodings are UTF-8 and UTF-16:
Character-set | Description
UTF-8 | A character in UTF-8 can be from 1 to 4 bytes long. UTF-8 can represent any character in the Unicode standard. UTF-8 is backwards compatible with ASCII. UTF-8 is the preferred encoding for e-mail and web pages
UTF-16 | 16-bit Unicode Transformation Format is a variable-length character encoding for Unicode, capable of encoding the entire Unicode repertoire. UTF-16 is used in major operating systems and environments, like Microsoft Windows 2000/XP/2003/Vista/CE and the Java and .NET byte code environments
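The variable lengths described in the table can be observed directly. The following Python sketch encodes four sample characters (chosen here only for illustration) in both encodings and counts the bytes. The `utf-16-le` codec is used so that no byte order mark is added to the count.

```python
samples = ["A", "é", "€", "😀"]

for ch in samples:
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-le")  # little-endian, no byte order mark
    print(f"U+{ord(ch):04X}: {len(utf8)} byte(s) in UTF-8, "
          f"{len(utf16)} byte(s) in UTF-16")

# ASCII text produces identical bytes in ASCII and in UTF-8.
print("Hello".encode("utf-8") == "Hello".encode("ascii"))  # True
```

The four samples take 1, 2, 3 and 4 bytes in UTF-8, and 2, 2, 2 and 4 bytes in UTF-16.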
Tip: The first 256 characters of Unicode character-sets correspond to the 256
characters of ISO-8859-1.
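This correspondence can be checked with a short Python sketch that decodes every possible byte value as ISO-8859-1 and compares the result with the Unicode character at the same code point.

```python
# Decode each of the 256 byte values as ISO-8859-1 and compare it with
# the Unicode character that has the same code point.
matches = all(bytes([i]).decode("iso-8859-1") == chr(i) for i in range(256))
print(matches)  # True

# Example: byte 0xE9 is "é" in ISO-8859-1, and U+00E9 is "é" in Unicode.
print(bytes([0xE9]).decode("iso-8859-1"), chr(0xE9))  # é é
```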
Tip: All HTML 4 processors already support UTF-8, and all XHTML and XML processors support
UTF-8 and UTF-16!