Wish List Another Beginning
Nigel Clarke
pmmail@rpglink.com
Thu, 16 Dec 1999 06:47:01 -0500 (EST)
On Thu, 16 Dec 1999 04:37:56 +0200, Cristian Secara wrote:
snip
>A few days ago I wrote a message in German, using German
>national characters (ä, ö, ü, ß).
>After signing and sending that message, I found by chance that my
>(already sent) message had all national characters corrupted, showing
>something like =84, =81, =E1 instead.
>Further testing revealed that the only way to keep the characters
>unchanged was PMMail -> Properties -> Encoding format = Quoted
>printable AND the 'Do not perform character set translation' box left
>unchecked (off? I suspect this switch acts inversely). This was the only
>valid combination out of the four possible.
>The corruption also differed between tests, depending on the
>combination of the above settings.
>
>That message and all tests used PMMail -> Properties -> Default
>character set = ISO 8859-2 (Latin 2).
>Without signing, national characters were not corrupted in any way.
>
I identified this problem in 1997 and, with Mats Dufberg, wrote a small
monograph on it, which I attach. For a long time it was available by
mail from me, and I passed it on to the SouthSide team when I took a
less active role in PMMail support.
Nigel
PMMail 1.9x, Encoding and ISO-8859 character sets Setup Information
Version 1.5 19 June 1997
Compiled by :-
Nigel Clarke <nclarke@bda-hp.bda.nasa.gov>
and Mats Dufberg <mats.dufberg@abc.se>.
PMMail Settings
===============
PMMail ->Settings ->General ->Locale Settings ->Default Character Set
The Default Character Set as installed is US ASCII. Western European
language users should change this setting to ISO 8859-1 (Latin 1) in
most cases. This is also normally used by Scandinavian users in
preference to ISO 8859-10. See below for more information about
character sets. If you use the ISO 8859-1 character set, you must
ensure that your primary code page setting in the config.sys file is
850 (for example CODEPAGE=850).
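The reason the code page matters is that a PC code page and an ISO 8859 character set store the same letter under different byte values, so text has to be translated between them. The Python sketch below illustrates this with Python's built-in `cp850` and `iso-8859-1` codecs. It is an illustration added for this purpose, not part of PMMail, and assumes Python 3.9 or later.

```python
# Illustration only: compare the byte values that code page 850 and
# ISO 8859-1 use for the same characters (standard library only).
SAMPLE = "äöüß"


def byte_values(text: str, codec: str) -> list[str]:
    """Encode `text` with `codec` and return each byte as two-digit hex."""
    return [f"{b:02X}" for b in text.encode(codec)]


if __name__ == "__main__":
    for codec in ("cp850", "iso-8859-1"):
        print(f"{codec:>10}: {' '.join(byte_values(SAMPLE, codec))}")
```

Code page 850 stores these letters as 84 94 81 E1, whereas ISO 8859-1 uses E4 F6 FC DF. Note that 84, 81 and E1 are the code page 850 values for ä, ü and ß, the same values that appear in the corrupted message in the quoted report above.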
PMMail ->Settings ->General ->Locale Settings ->Encoding Format
Make sure that this option is set to Quoted-Printable rather than 8 Bit
unless you have a specific instruction to set 8 Bit here. Many US mail
gateways will mangle your message if you don't. See below for more
information.
PMMail ->Settings ->General ->Locale Settings
->Do NOT Perform Character Set Translation
Note that no help is available in PMMail 1.92 should you search on
'Character Set Translation'.
In the opinion of a well-informed Scandinavian PMMail user (Mats),
this should never be set to on (the box should never be checked), as it
produces mail that does not comply with the MIME standard (RFC 2045). See
below for more information.
Checking your PMMail Setup
==========================
You can use the MIME test service at
<mime-test@relay.surfnet.nl> by sending it a message with the Subject
"iso-8859-1" (without the quotes).
The service should send you a reply containing the following (or similar).
"This is an example of a text/plain; charset=iso-8859-1 message.
Norwegian characters:
Æ (ae ligature), Ø (o slash), Å (a ring); lowercase: æ ø å
The complete ISO 8859-1 character set:
 32:   ! " # $ % & ' ( ) * + , - . /
 48: 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
 64: @ A B C D E F G H I J K L M N O
 80: P Q R S T U V W X Y Z [ \ ] ^ _
 96: ` a b c d e f g h i j k l m n o
112: p q r s t u v w x y z { | } ~
160:   ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ ­ ®
176: ° ± ² ³ ´ ¶ · ¹ º » ¼ ½ ¾ ¿
192: À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï
208: Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß
224: à á â ã ä å æ ç è é ê ë ì í î ï
240: ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ
(Thanks to Harald Alvestrand of Uninett)
SURFnet EH'95"
If you have complete support for ISO-8859-1, every cell of the table
should contain one printable character (16 cells per row), with the
following exceptions: 32 and 160 are space characters (regular and
non-breaking, respectively), and 127 has no printable character.
PMMail does not seem to be able to display codes 175, 181 and 184.
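If the test service is not available, a similar table can be produced locally and compared with what PMMail shows. The Python sketch below is a hypothetical helper written for illustration: it prints the printable rows of ISO 8859-1 in the same layout as the test reply, decoding each value with Python's `latin-1` codec and leaving code 127 empty.

```python
# Illustration only: rebuild the ISO 8859-1 test table locally.
ROW_STARTS = [32, 48, 64, 80, 96, 112, 160, 176, 192, 208, 224, 240]


def table_row(start: int) -> str:
    """Return one 16-cell row of the table, labelled with its first code."""
    cells = []
    for code in range(start, start + 16):
        # Code 127 (DEL) has no printable character, so leave its cell empty.
        cells.append("" if code == 127 else bytes([code]).decode("latin-1"))
    return f"{start:>3}: " + " ".join(cells)


if __name__ == "__main__":
    for start in ROW_STARTS:
        print(table_row(start))
```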
More locations for testing your MIME setup
==========================================
From Part 3 of the MIME FAQ
"11.3) Where can I get some sample MIME messages?
Here are two sources:
ftp://thumper.bellcore.com/pub/nsb/samples/
http://www-dsed.llnl.gov/documents/tests/email.html
Here're more sources:
[ Patrik Faltstrom <paf@bunyip.com> 13-Dec-1994 ]
At 12:55 AM 12/11/94, Richard Willis wrote:
>Could someone tell me what the address of the person in Sweden
>is who kindly provided a set of MIME-conformancy tests via
>listserver...
My address is paf@bunyip.com, and the address of the listserver
is mimeback@bunyip.com. Send the command (actually the name of the
file you want) as the subject in the message. Start with the command
"HELP".
---------------------------------------------------------
These diagrams should help clarify some of the options available when
setting up support for PMMail users outside the US and for users
writing in languages other than English.
Char. Set: = Character Set setting in mail header.
             (Where iso-8859-1 is used, any of the iso-8859 options can
             be substituted.)
Bit Range: = Actual bit range of characters used in the body of the
             message.
Encoding:  = Encoding required by RFC 2045.
             (Base64 is an acceptable alternative to Q-P.)
CTE:       = Content-Transfer-Encoding setting in mail header.
Typical path for mail from English writing US based users
=========================================================
/--------------------------\                  /---------------------\
| Char. set: ASCII         | <-> Internet <-> |                     |
| Bit range: 7 bit         |                  | Any mail reader     |
| Encoding: not necessary  |                  | can read the mail   |
| CTE: 7bit                |                  |                     |
\--------------------------/                  \---------------------/
This equates to the following PMMail Settings
=============================================
Default Character Set ->US ASCII
Encoding Format ->Q-P
and generates the outgoing PMMail or MIME mailer header
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
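To see the same labels produced by a modern MIME library, the Python sketch below uses the standard `email.message.EmailMessage` class. It is an illustration only, not how PMMail generates its headers: given pure ASCII text and the `us-ascii` charset, Python chooses `7bit` by itself.

```python
# Illustration only: let Python's email package label a pure-ASCII body.
from email.message import EmailMessage


def ascii_message(body: str) -> EmailMessage:
    """Build a text/plain message declared as us-ascii."""
    msg = EmailMessage()
    # With no cte given, Python picks the transfer encoding itself;
    # for short ASCII-only lines it chooses 7bit.
    msg.set_content(body, charset="us-ascii")
    return msg


if __name__ == "__main__":
    msg = ascii_message("Hello from a US English writer.\n")
    print("Content-Type:", msg["Content-Type"])
    print("Content-Transfer-Encoding:", msg["Content-Transfer-Encoding"])
```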
Typical path for mail from non native English writing users
===========================================================
1) The PMMail user is connected to a traditional mail
(SMTP) server that cannot (reliably) handle high octets
(This case is usually found in the US)
/-------------------------------\
| MIME compliant mailer (PMMail)|
| Char. set: iso 8859-1         |
| Bit range: 8 bit              | --> SMTP server -->
| Encoding: Quoted-Printable    |
| CTE: Quoted-printable         |
\-------------------------------/
            /---------------------------------------\
    SMTP    | The mail can be read by any MIME      |
--> server  | compliant mail reader (if it supports |
    POP --> | iso 8859-1). Receiving program will   |
    server  | decode quoted-printable to correct    |
            | characters in iso 8859-1.             |
            \---------------------------------------/
This equates to the following PMMail Settings
=============================================
Default Character Set ->ISO-8859-1
Encoding Format ->Q-P
and generates the outgoing PMMail or MIME mailer header
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
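The same settings can be reproduced with Python's standard `email` package as an illustration (this is not PMMail's code). Asking for ISO 8859-1 with quoted-printable encoding yields the header pair above, and the accented letters in the body become `=XX` sequences that a MIME reader decodes again.

```python
# Illustration only: an ISO 8859-1 body sent as quoted-printable.
from email.message import EmailMessage


def latin1_qp_message(body: str) -> EmailMessage:
    """Build a text/plain message in ISO 8859-1, quoted-printable encoded."""
    msg = EmailMessage()
    msg.set_content(body, charset="iso-8859-1", cte="quoted-printable")
    return msg


if __name__ == "__main__":
    msg = latin1_qp_message("Grüße aus München\n")
    print(msg.as_string())      # headers plus the encoded body
    print(msg.get_content())    # the body decoded back to text
```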
2) The PMMail user is connected to a modern mail (SMTP)
server that can reliably handle high octets.
(This case is most commonly found in European countries)
/-------------------------------\
| MIME compliant mailer (PMMail)|
| Char. set: iso 8859-1         |
| Bit range: 8 bit              | --->
| Encoding: not necessary       |
| CTE: 8bit                     |
\-------------------------------/
     /------------------------------------\
---> | ESMTP server with 8BITMIME support |
     | The server will convert to quoted- | --> (E)SMTP -->
     | printable if next (E)SMTP server   |
     | doesn't support 8BITMIME.          |
     \------------------------------------/
(Note: The conversion usually generates a line in the mail header.
Example:-
X-Mime-Autoconverted: from 8bit to quoted-printable by aaa.bbb.ccc)
            /---------------------------------------\
    SMTP    | The mail can be read by any MIME      |
--> server  | compliant mail reader (if it supports |
    POP --> | iso 8859-1). Receiving program will   |
    server  | decode any quoted-printable to correct|
            | characters in iso 8859-1.             |
            \---------------------------------------/
This equates to the following PMMail Settings
=============================================
Default Character Set ->ISO-8859-1
Encoding Format ->8 Bit
and generates the outgoing PMMail or MIME mailer header
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 8bit
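As an illustration of the 8 Bit case, this Python sketch uses the standard `email` package (not PMMail) to build a message labelled `iso-8859-1` and `8bit`. No encoding is applied, so the high octets are carried as raw bytes in the serialised message.

```python
# Illustration only: an ISO 8859-1 body sent unencoded as 8bit.
from email.message import EmailMessage


def latin1_8bit_message(body: str) -> EmailMessage:
    """Build a text/plain message in ISO 8859-1 with an 8bit transfer encoding."""
    msg = EmailMessage()
    msg.set_content(body, charset="iso-8859-1", cte="8bit")
    return msg


if __name__ == "__main__":
    raw = latin1_8bit_message("Grüße aus München\n").as_bytes()
    # The accented letters appear as single high octets, e.g. 0xFC for "ü".
    print(raw)
```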
Note on non-MIME-compliant receiving mail readers: if
the receiving mail reader can handle 8-bit mail and
no encoding has been done, the program can display the
mail correctly. We cannot, however, guarantee that
the mail will not take a different route that will
trigger encoding.
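The conversion performed by an 8BITMIME-capable relay in case 2 can be modelled with Python's standard `quopri` module. This is a simplified sketch, not the code of any real mail server: the helper name, the use of a plain dictionary for the headers, and the default relay name `aaa.bbb.ccc` (taken from the example header above) are illustrative assumptions, and a real relay must also deal with multipart messages and header folding. It assumes Python 3.9 or later.

```python
# Simplified model of an 8BITMIME relay downgrading a body for a 7-bit hop.
import quopri


def downgrade_to_qp(headers: dict[str, str], body: bytes,
                    relay: str = "aaa.bbb.ccc") -> tuple[dict[str, str], bytes]:
    """Re-encode an 8bit body as quoted-printable and relabel the headers.

    Hypothetical helper: headers are a plain dict, not a real message object.
    """
    if headers.get("Content-Transfer-Encoding", "").lower() != "8bit":
        return headers, body  # nothing to convert
    new_headers = dict(headers)
    new_headers["Content-Transfer-Encoding"] = "quoted-printable"
    new_headers["X-Mime-Autoconverted"] = (
        f"from 8bit to quoted-printable by {relay}")
    return new_headers, quopri.encodestring(body)


if __name__ == "__main__":
    hdrs = {"Content-Type": 'text/plain; charset="iso-8859-1"',
            "Content-Transfer-Encoding": "8bit"}
    new_hdrs, new_body = downgrade_to_qp(hdrs, "Grüße\n".encode("iso-8859-1"))
    for name, value in new_hdrs.items():
        print(f"{name}: {value}")
    print(new_body.decode("ascii"))
```

The charset label is left alone: only the transfer encoding changes, so a MIME reader still recovers the original ISO 8859-1 text.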
3) The PMMail user is connected to a mail (SMTP) server
that can reliably handle high octets locally (non-ESMTP).
(This case is usually found in Europe, where older or non-MIME
mail systems with 8-bit support for non-English languages are found.)
The situation is like case 2, with the difference that
if the mail takes a route via a traditional SMTP
server, there won't be any encoding, and all high octets
will be corrupted.
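The text does not say exactly how the octets are corrupted. One common form, assumed here purely for illustration, is a 7-bit relay clearing the eighth bit of every octet, which turns each accented letter into an unrelated ASCII character. A minimal Python model of that behaviour:

```python
# Model of a 7-bit SMTP relay that clears the high bit of every octet.
# Assumption for illustration: real relays could also drop or replace octets.


def strip_high_bit(data: bytes) -> bytes:
    """Return `data` with bit 7 cleared in every octet (value & 0x7F)."""
    return bytes(b & 0x7F for b in data)


if __name__ == "__main__":
    original = "Grüße aus Köln".encode("iso-8859-1")
    damaged = strip_high_bit(original)
    print(damaged.decode("ascii"))  # prints: Gr|_e aus Kvln
```

Nothing in the damaged text tells the reader what the original letters were, which is why this route cannot be repaired at the receiving end.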
--------------------------------------------------------------
The material that follows goes into the subject of code pages, encoding
and character sets in considerable detail and can be considered for
reference only.
Character Sets
==============
US ASCII was designed to transmit US English characters and as such
uses 7-bit binary sequences, corresponding to the decimal numbers 0
through 127, to represent those characters.
The ASCII set consists of a number of control characters, a space
character (value 32) and the following printable characters with
values between 33 and 126.
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ
[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
In order to represent the characters used in languages other than US
English, IBM and Microsoft developed the concept of code pages containing
character sets that use the numbers above 127 to represent additional
letters. Different code pages use these values, in conjunction with the
correct country setting, to produce the appropriate characters on screen.
Common code pages are 850, a generic international code page, and 437
for US English.
ISO 8859 (International Organisation for Standardisation standard 8859)
lists character sets corresponding to the appropriate national
languages. These are normally registered via ECMA (European Computer
Manufacturers Association) with ISO.
The character sets available in PMMail 1.92 are:-
US ASCII Used in the United States
ISO 8859-1 (Latin 1) Used for Western Europe and Latin America
ISO 8859-2 (Latin 2) Used for Eastern Europe
ISO 8859-3 (Latin 3) Used for Southern Europe
ISO 8859-4 (Latin 4) Used for Scandinavia (also 8859-1 and 10)
ISO 8859-5 (Cyrillic)
ISO 8859-6 (Arabic)
ISO 8859-7 (Modern Greek)
ISO 8859-8 (Hebrew)
ISO 8859-9 (Latin 5) Used for Turkey (also 8859-3)
ISO 8859-10 (Latin 6) Used for Scandinavia (also 8859-1 and 4)
KOI8-R (Russian)
If you are writing in Arabic, Hebrew or Greek, the choice is
obvious. I believe that KOI8-R is the preferred character set for use on
a Russian computer. Complete support for Arabic, Greek and Hebrew (as
well as Thai and DBCS code pages) is only found in those versions of Warp
that are specifically designed for these countries.
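The charset label in the mail header matters because the same octet stands for a different letter in each of these character sets. The Python sketch below decodes the single value 0xE4 with Python's built-in codecs for several of the sets listed above; Python's codec names are used, and the choice of octet is arbitrary.

```python
# Decode one octet under several of the character sets offered by PMMail.
OCTET = bytes([0xE4])
CHARSETS = ["iso-8859-1", "iso-8859-2", "iso-8859-5", "iso-8859-7", "koi8-r"]


def interpretations(octet: bytes, charsets: list[str]) -> dict[str, str]:
    """Map each charset name to the character that `octet` decodes to."""
    return {name: octet.decode(name) for name in charsets}


if __name__ == "__main__":
    for name, char in interpretations(OCTET, CHARSETS).items():
        print(f"{name:>10}: {char}")
```

It prints ä for ISO 8859-1 and 8859-2, ф for ISO 8859-5, δ for ISO 8859-7 and Д for KOI8-R: the same byte, five different readings.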
ISO 8859-1 (Latin 1) supports the following languages:
Afrikaans, Albanian, Catalan, Danish, Dutch, English, Faeroese,
Finnish, French, German, Galician, Irish, Icelandic, Italian, Norwegian,
Portuguese, Spanish and Swedish. It doesn't cover Welsh (and possibly
Breton).
ISO 8859-2 (Latin 2) supports the following languages:
Albanian, Croatian, Czech, German, Hungarian, Polish, Romanian, Slovak
and Slovenian.
ISO 8859-3 (Latin 3) supports the following languages:
Esperanto, Galician, Maltese and Turkish.
ISO 8859-4 (Latin 4) supports the following languages:
Estonian, Latvian and Lithuanian.
It is an incomplete precursor of the ISO 8859-10 (Latin 6) set.
ISO 8859-5 (Cyrillic) supports the following languages:
Bulgarian, Byelorussian, Macedonian, Serbian and Ukrainian.
ISO 8859-9 (Latin 5) replaces the rarely used Icelandic letters from
ISO 8859-1 (Latin 1) with Turkish letters.
ISO 8859-10 (Latin 6) adds the last letters from Greenlandic and Lapp
which were missing in ISO 8859-4 (Latin 4) and thereby covers all
Scandinavia.
Code Page Setting's Effect on Displayed Characters
==================================================
So far our experience is limited to code pages 850 and 437. We welcome
reports of experience with other code page settings, and with the use of
character sets other than US-ASCII and ISO-8859-1.
If 850 is selected as the primary code page, almost all ISO 8859-1
characters are displayed correctly (characters with decimal codes 175,
181 and 184 are not displayed).
If 850 is the primary code page, PMMail does not seem to be able to
handle ISO 8859-2, -3, -4 and -6 correctly. The OS/2 help on the
CodePage command shows that 852 is the correct code page for Latin 2
(ISO 8859-2), 857 for Latin 3 (ISO 8859-3), and 921 or 922 for Latin 4
(ISO 8859-4).
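One way to see why code page 850 cannot cope with Latin 2 is to list the ISO 8859-2 characters that code page 850 has no place for. The Python sketch below does this with Python's built-in codecs, which we assume match the OS/2 tables closely enough for illustration; it also runs the same check against code page 852 for comparison.

```python
# List the upper-half ISO 8859-2 characters that a PC code page cannot hold.


def missing_in_codepage(charset: str, codepage: str) -> list[str]:
    """Return characters from bytes 0xA0-0xFF of `charset` absent from `codepage`."""
    missing = []
    for code in range(0xA0, 0x100):
        char = bytes([code]).decode(charset)
        try:
            char.encode(codepage)
        except UnicodeEncodeError:
            missing.append(char)
    return missing


if __name__ == "__main__":
    for cp in ("cp850", "cp852"):
        gaps = missing_in_codepage("iso-8859-2", cp)
        print(f"{cp}: {len(gaps)} missing: {''.join(gaps)}")
```

Letters such as ł and ő appear in the cp850 list: they simply do not exist in code page 850, so no translation setting can display them correctly.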
------------------------------------------------------------------
Encoding Format - Quoted Printable
==================================
Quoting from RFC 2045
"The Quoted-Printable encoding is intended to represent data that
largely consists of octets that correspond to printable characters in
the US-ASCII character set. It encodes the data in such a way that
the resulting octets are unlikely to be modified by mail transport.
If the data being encoded are mostly US-ASCII text, the encoded form
of the data remains largely recognizable by humans. A body which is
entirely US-ASCII may also be encoded in Quoted-Printable to ensure
the integrity of the data should the message pass through a
character-translating, and/or line-wrapping gateway."
NOTE: PMMail only uses Quoted-Printable encoding if characters with a
numeric value greater than 127 are present in the message.
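The rule in this note can be written down as a short Python sketch. It is our reading of PMMail's observed behaviour, not its actual code, and the function name is hypothetical: quoted-printable is used only when some octet exceeds 127, and the body is otherwise sent as 7bit. It assumes Python 3.9 or later.

```python
# Sketch of the observed rule: quoted-printable only when high octets occur.
import quopri


def encode_like_pmmail(body: bytes) -> tuple[str, bytes]:
    """Return (Content-Transfer-Encoding, encoded body) for a text body.

    Hypothetical helper modelling the behaviour described above.
    """
    if any(octet > 127 for octet in body):
        return "quoted-printable", quopri.encodestring(body)
    return "7bit", body


if __name__ == "__main__":
    for text in ("Hello\n", "Grüße\n"):
        cte, data = encode_like_pmmail(text.encode("iso-8859-1"))
        print(cte, data)
```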
and also
"NOTE: The quoted-printable encoding represents something of a
compromise between readability and reliability in transport. Bodies
encoded with the quoted-printable encoding will work reliably over
most mail gateways, but may not work perfectly over a few gateways,
notably those involving translation into EBCDIC. A higher level of
confidence is offered by the base64 Content-Transfer-Encoding."
Mats Dufberg <mats.dufberg@abc.se> wrote:
"If your mail is quoted-printable encoded and the receiving
party has an MIME compliant mail reader (and support for
the character set ISO 8859-1) the encoded characters will
be translated back into their original shape. That is,
you won't even notice the encoding.
If the receiving party does not have an MIME compliant
mail reader, the characters will be presented in their
encoded form which means "=FC=DC=E4=C4=F6=D6" for "üÜäÄöÖ"
(the German Umlauts).
Transliterating them into ue, ae and oe has nothing to do
with MIME, but could be done automatically in PMMail
with some REXX program."
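Mats's example can be reproduced with Python's standard `quopri` module, used here only to illustrate the RFC 2045 encoding and not PMMail's internals. Encoding the six umlauts in ISO 8859-1 gives exactly the string a non-MIME reader would show, and decoding restores the original letters.

```python
# Quoted-printable round trip for the German umlauts in ISO 8859-1.
import quopri

UMLAUTS = "üÜäÄöÖ"


def to_qp(text: str, charset: str = "iso-8859-1") -> str:
    """Encode `text` in `charset`, then quoted-printable encode the bytes."""
    return quopri.encodestring(text.encode(charset)).decode("ascii")


def from_qp(encoded: str, charset: str = "iso-8859-1") -> str:
    """Undo to_qp: decode quoted-printable, then decode the character set."""
    return quopri.decodestring(encoded.encode("ascii")).decode(charset)


if __name__ == "__main__":
    encoded = to_qp(UMLAUTS)
    print(encoded)           # =FC=DC=E4=C4=F6=D6 (what a non-MIME reader shows)
    print(from_qp(encoded))  # üÜäÄöÖ (what a MIME-compliant reader shows)
```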
------------------------------------------------------------------
Character Set Translation
=========================
Mats Dufberg <mats.dufberg@abc.se> wrote:
"The behavior of PMMail 1.91 is a little bit strange when it comes
to the character set translation setting. You'll actually have
three choices, "8bit", "quoted-printable" and "no translation".
When "no translation" is selected "8bit" or "quoted-printable"
gives the same result.
Let's say you'll compose an email with high octets with ISO 8859-1
as the selected character set. The result will be the following:
CHOICE   CHARSET     ENCODING     Characters  High     Legal?
         value       value        encoded?    octets?
8bit     iso-8859-1  8bit         no          yes      YES
QP       iso-8859-1  quoted-pr.   yes         no       YES
no tr.   us-ascii    7bit         no          yes      NO
That is, when "no translation" is selected the mail header
says "all characters are ascii and 7bit" even if the mail
body contains high octets.
By selecting "no translation" PMMail produces email that are
NOT compliant with the MIME protocol. I hope it's a bug, not a
feature. :-) As it is it should not be used."
RFC2045 Section 6.2 states:-
"The proper Content-Transfer-Encoding label must always be used.
Labelling unencoded data containing 8bit characters as "7bit" is not
allowed, nor is labelling unencoded non-line-oriented data as
anything other than "binary" allowed."
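The labelling error that RFC 2045 forbids is easy to detect after the fact. The Python sketch below is a hypothetical helper using the standard `email` parser, for single-part messages only. It flags a message whose declared Content-Transfer-Encoding is 7bit, explicitly or by default, while its body contains octets above 127, which is what the "no translation" setting produces.

```python
# Flag single-part messages whose body has high octets but is labelled 7bit.
from email import message_from_bytes
from email.policy import default


def mislabelled_as_7bit(raw: bytes) -> bool:
    """Return True if the declared CTE is 7bit but the body has octets > 127.

    Assumes a single-part message; multipart bodies are not inspected.
    """
    msg = message_from_bytes(raw, policy=default)
    # RFC 2045: if the header is absent, 7bit is assumed.
    cte = str(msg.get("Content-Transfer-Encoding", "7bit")).strip().lower()
    # For a 7bit or 8bit label, decode=True returns the original octets.
    body = msg.get_payload(decode=True) or b""
    return cte == "7bit" and any(octet > 127 for octet in body)
```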
------------------------------------------------------------------
More Information on Character Sets and Electronic Mail
======================================================
The following URL is very informative on the problems with 8-bit
characters and mail, although it was written specifically for UNIX users.
<http://www.ioc.ee/home/tarvi/mime_pem/FAQ-ISO-8859-1.html>
You can find out more about ISO 8859 character sets at :-
<http://www.isoc.org:8080/codage/iso8859/jeuxiso.en.htm> in English
<http://www.isoc.org:8080/codage/iso8859/jeuxiso.fr.htm> in French
And also another site from Mats for European users:-
<http://www.uni-passau.de/~ramsch/iso8859-1.html>
The Online RFC web site contains RFC 1345 which defines various
character sets.
<http://info.internet.isi.edu/1s/in-notes/rfc/files>
The RFCs are also available for European users from:-
<http://ftp.sunet.se/pub/Internet-documents/rfc/>
Other character sets you may come across are:-
VISCII 8 bit Latin and Vietnamese
ISO-2022-JP Latin and Japanese
ISO-2022-KR Latin and Korean
UNICODE-1-1 Unicode
UNICODE-1-1-UTF-7 Mail-safe Unicode
ISO-2022-JP-2 Multilingual
As of PMMail 1.92 there is no support for these character sets.
---------------------------------------------------------------
Definitions
===========
RFC 2045 defines 7bit Data as:
" "7bit data" refers to data that is all represented as relatively
short lines with 998 octets or less between CRLF line separation
sequences [RFC-821]. No octets with decimal values greater than 127
are allowed and neither are NULs (octets with decimal value 0). CR
(decimal value 13) and LF (decimal value 10) octets only occur as
part of CRLF line separation sequences."
and 8bit Data as:
" "8bit data" refers to data that is all represented as relatively
short lines with 998 octets or less between CRLF line separation
sequences [RFC-821]), but octets with decimal values greater than 127
may be used. As with "7bit data" CR and LF octets only occur as part
of CRLF line separation sequences and no NULs are allowed."
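These definitions translate directly into a check. The Python sketch below is a hypothetical helper written for illustration: it classifies a body as "7bit", "8bit" or "binary" by applying the rules quoted above (lines of at most 998 octets between CRLF pairs, no NULs, CR and LF only as part of CRLF, and, for 7bit, no octets above 127).

```python
# Classify a body according to the RFC 2045 definitions quoted above.
MAX_LINE = 998  # octets allowed between CRLF line separators


def classify(body: bytes) -> str:
    """Return '7bit', '8bit' or 'binary' for a body with CRLF line breaks."""
    for line in body.split(b"\r\n"):
        # A NUL, or a CR or LF left over after splitting on CRLF, is not allowed.
        if b"\x00" in line or b"\r" in line or b"\n" in line:
            return "binary"
        if len(line) > MAX_LINE:
            return "binary"
    if any(octet > 127 for octet in body):
        return "8bit"
    return "7bit"
```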
--
Nigel J. Clarke