Report forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#99324; Package debian-policy.

Subject: Bug#99324: Default charset should be UTF-8
Reply-To: Cesar Eduardo Barros, 99324@bugs.debian.org
Resent-From: Cesar Eduardo Barros
Orignal-Sender: Cesar Eduardo Barros
Resent-To: debian-bugs-dist@lists.debian.org
Resent-CC: Debian Policy List
Resent-Date: Thu, 31 May 2001 02:19:17 GMT
Resent-Message-ID:
Resent-Sender: owner@bugs.debian.org
X-Debian-PR-Message: report 99324
X-Debian-PR-Package: debian-policy
X-Debian-PR-Keywords:
X-Loop: owner@bugs.debian.org
Received: via spool by submit@bugs.debian.org id=B.99127508413807 (code B ref -1); Thu, 31 May 2001 02:19:17 GMT
From: Cesar Eduardo Barros
To: Debian Bug Tracking System
X-Reportbug-Version: 1.16
X-Mailer: reportbug 1.16
Date: Wed, 30 May 2001 23:11:20 -0300
Message-Id:
Sender: Cesar Eduardo Barros
Delivered-To: submit@bugs.debian.org

Package: debian-policy
Version: 3.5.4.0
Severity: wishlist

I think Debian should start to move into using UTF-8 by default everywhere.

Rationale:

The current 'standard' default character set is ISO-8859-1. This works fine
most of the time; however, it causes some problems. For instance, most of the
documentation is in ISO-8859-1, except the documents in languages that need
another charset. A user who is fluent in two languages that need incompatible
charsets would have to keep switching charsets all the time. To avoid this, a
single charset that works for all languages should be used by default.

Using UTF-8 by default in Debian would also have the side benefit of forcing
all programs to properly handle variable-length multibyte charsets.

It would also be possible to add a UTF-8 encoded version of every locale
(this could be a simple matter of changing the commented defaults in the list
of locales).

Having a single charset means that people who currently use wildly different
charsets would be able to read and edit the same text files without having to
recode them.

Using UTF-8 in file names means that nobody's file names would look like
gibberish to someone else just because the charsets being used are different.

Last but not least, someone has to take the lead and be the first to do it.
Debian has done it before in things like consistent keyboard handling. The
structure of Debian makes it the ideal place for initiatives like this one,
which touch a lot of things.

For people who aren't using UTF-8, if /usr/doc is consistently coded in UTF-8,
it would be possible to use Apache tricks to force its encoding to UTF-8 when
viewed through the default http://localhost/doc/ URL.

-- System Information
Debian Release: testing/unstable
Architecture: i386
Kernel: Linux flower 2.4.5 #1 Tue May 29 18:09:30 BRT 2001 i686
Locale: LANG=en_US.ISO8859-1, LC_CTYPE=C

Versions of packages debian-policy depends on:
ii  fileutils  4.1-2  GNU file management utilities.

Acknowledgement sent to Cesar Eduardo Barros <cesarb@nitnet.com.br>:
New Bug report received and forwarded. Copy sent to Debian Policy List <debian-policy@lists.debian.org>.

From: owner@bugs.debian.org (Debian Bug Tracking System)
To: Cesar Eduardo Barros
Subject: Bug#99324: Acknowledgement (Default charset should be UTF-8)
Message-ID:
In-Reply-To:
References:
X-Debian-PR-Message: ack 99324

Thank you for the problem report you have sent regarding Debian.
This is an automatically generated reply, to let you know your message has
been received. It is being forwarded to the developers mailing list for
their attention; they will reply in due course.

Your message has been sent to the package maintainer(s):
 Debian Policy List

If you wish to submit further information on your problem, please send
it to 99324@bugs.debian.org (and *not* to submit@bugs.debian.org).

Please do not reply to the address at the top of this message,
unless you wish to report a problem with the Bug-tracking system.

Darren Benham
(administrator, Debian Bugs database)

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#99324; Package debian-policy.

Subject: Bug#99324: Default charset should be UTF-8
Reply-To: Anthony Towns, 99324@bugs.debian.org
Resent-From: Anthony Towns
Resent-To: debian-bugs-dist@lists.debian.org
Resent-CC: Debian Policy List
Resent-Date: Thu, 31 May 2001 03:18:08 GMT
Resent-Message-ID:
Resent-Sender: owner@bugs.debian.org
X-Debian-PR-Message: report 99324
X-Debian-PR-Package: debian-policy
X-Debian-PR-Keywords:
X-Loop: owner@bugs.debian.org
Received: via spool by 99324-submit@bugs.debian.org id=B99324.99127872919471 (code B ref 99324); Thu, 31 May 2001 03:18:08 GMT
Date: Thu, 31 May 2001 13:11:58 +1000
To: Cesar Eduardo Barros, 99324@bugs.debian.org
Message-ID: <20010531131158.B27395@azure.humbug.org.au>
References:
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-md5; protocol="application/pgp-signature"; boundary="TakKZr9L6Hm6aLOc"
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: ; from cesarb@nitnet.com.br on Wed, May 30, 2001 at 11:11:20PM -0300
Organisation: Lacking
X-PGP: http://azure.humbug.org.au/~aj/aj_key.asc
From: Anthony Towns
Delivered-To: 99324@bugs.debian.org

--TakKZr9L6Hm6aLOc
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, May 30, 2001 at 11:11:20PM -0300, Cesar Eduardo Barros wrote:
> Package: debian-policy
> Version: 3.5.4.0
> Severity: wishlist
>
> I think Debian should start to move into using UTF-8 by default everywhere.

What, exactly, does this involve?

(Now's probably a bad time, too)

Cheers,
aj

-- 
Anthony Towns
I don't speak for anyone save myself. GPG signed mail preferred.

``_Any_ increase in interface difficulty, in exchange for a benefit you
  do not understand, cannot perceive, or don't care about, is too much.''
      -- John S. Novak, III (The Humblest Man on the Net)

--TakKZr9L6Hm6aLOc
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.4 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iQCVAwUBOxW2fuRRvX9xctrtAQFhqgP+OqhYDBOl8REQT3Jomcju/MaMN2A69pd5
BPdunebV3DLVzuCQPf9An/qTOkTxxXn1SymYiat3BdIyqr7AYucY8/xL+fRN+rYg
7anEHW/ibrmv6mVjRcVK/5J2DluceHjlQOvFB7g7dDGpmSVJbu2kTXa4Y6cndQab
QytnLcw9OYU=
=Wv4r
-----END PGP SIGNATURE-----

--TakKZr9L6Hm6aLOc--

Acknowledgement sent to Anthony Towns <aj@azure.humbug.org.au>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>.

From: owner@bugs.debian.org (Debian Bug Tracking System)
To: Anthony Towns
Subject: Bug#99324: Info received (was Bug#99324: Default charset should be UTF-8)
Message-ID:
In-Reply-To: <20010531131158.B27395@azure.humbug.org.au>
References: <20010531131158.B27395@azure.humbug.org.au>
X-Debian-PR-Message: ack-info-maintonly 99324

Thank you for the additional information you have supplied regarding
this problem report. It has been forwarded to the developer(s) and
to the developers mailing list to accompany the original report.

Your message has been sent to the package maintainer(s):
 Debian Policy List

If you wish to continue to submit further information on your problem,
please send it to 99324@bugs.debian.org, as before.

Please do not reply to the address at the top of this message,
unless you wish to report a problem with the Bug-tracking system.

Darren Benham
(administrator, Debian Bugs database)

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#99324; Package debian-policy.

Subject: Bug#99324: Default charset should be UTF-8
Reply-To: Cesar Eduardo Barros, 99324@bugs.debian.org
Resent-From: Cesar Eduardo Barros
Orignal-Sender: Cesar Eduardo Barros
Resent-To: debian-bugs-dist@lists.debian.org
Resent-CC: Debian Policy List
Resent-Date: Thu, 31 May 2001 03:33:29 GMT
Resent-Message-ID:
Resent-Sender: owner@bugs.debian.org
X-Debian-PR-Message: report 99324
X-Debian-PR-Package: debian-policy
X-Debian-PR-Keywords:
X-Loop: owner@bugs.debian.org
Received: via spool by 99324-submit@bugs.debian.org id=B99324.99127953020475 (code B ref 99324); Thu, 31 May 2001 03:33:29 GMT
Date: Thu, 31 May 2001 00:25:19 -0300
To: Anthony Towns
Cc: 99324@bugs.debian.org
Message-ID: <20010531002519.A4183@flower.cesarb>
References: <20010531131158.B27395@azure.humbug.org.au>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20010531131158.B27395@azure.humbug.org.au>
User-Agent: Mutt/1.3.18i
From: Cesar Eduardo Barros
Sender: Cesar Eduardo Barros
Delivered-To: 99324@bugs.debian.org

On Thu, May 31, 2001 at 01:11:58PM +1000, Anthony Towns wrote:
> On Wed, May 30, 2001 at 11:11:20PM -0300, Cesar Eduardo Barros wrote:
> > Package: debian-policy
> > Version: 3.5.4.0
> > Severity: wishlist
> >
> > I think Debian should start to move into using UTF-8 by default everywhere.
>
> What, exactly, does this involve?

- Making sure everything works with UTF-8 charset
- Adding UTF-8 charset for every locale
- Converting (in debian/rules) documentation files to UTF-8
- Selecting en_US.UTF-8 (or something like that) as the default for LANG=
- Echoing some magic sequence on every getty to convert the kernel mode to UTF8

> (Now's probably a bad time, too)

I know. I just wanted to present the idea while it was fresh. I too think it
would only become possible after the next release. In fact, it's good timing
since there's plenty of time for discussion.

Making sure everything works with UTF-8 can be started already. Adding the
UTF-8 charset for every locale might be as simple as editing a config file (I
don't know libc6 very well, but I hope it would recode the message catalogs if
that is done).

-- 
Cesar Eduardo Barros
cesarb@nitnet.com.br
cesarb@dcc.ufrj.br

Acknowledgement sent to Cesar Eduardo Barros <cesarb@nitnet.com.br>:
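
The documentation-conversion step listed in the message above could look roughly like the following. This is only a minimal Python sketch, not something proposed in the thread: the helper name `recode_to_utf8` is hypothetical, and it assumes the existing files are ISO-8859-1 unless told otherwise. A real package would more likely call a recoding tool from debian/rules.

```python
from pathlib import Path


def recode_to_utf8(path: Path, source_charset: str = "iso-8859-1") -> bool:
    """Rewrite `path` in UTF-8; return True if the file was changed.

    Hypothetical helper: `source_charset` is an assumption about the
    file's current encoding, since plain text files do not record it.
    """
    raw = path.read_bytes()
    try:
        # Bytes that already decode as UTF-8 (including plain ASCII)
        # are left untouched, so running the helper twice is harmless.
        raw.decode("utf-8")
        return False
    except UnicodeDecodeError:
        pass
    text = raw.decode(source_charset)
    path.write_bytes(text.encode("utf-8"))
    return True
```

The check for valid UTF-8 is a heuristic: a legacy file whose bytes happen to form valid UTF-8 would be skipped, so the source charset should really be known per package rather than guessed.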
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>.

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#99324; Package debian-policy.

Subject: Bug#99324: Default charset should be UTF-8
Reply-To: Brian May, 99324@bugs.debian.org
Resent-From: Brian May
Orignal-Sender: bam@snoopy.apana.org.au
Resent-To: debian-bugs-dist@lists.debian.org
Resent-CC: Debian Policy List
Resent-Date: Thu, 31 May 2001 04:18:10 GMT
Resent-Message-ID:
Resent-Sender: owner@bugs.debian.org
X-Debian-PR-Message: report 99324
X-Debian-PR-Package: debian-policy
X-Debian-PR-Keywords:
X-Loop: owner@bugs.debian.org
Received: via spool by 99324-submit@bugs.debian.org id=B99324.99128188825476 (code B ref 99324); Thu, 31 May 2001 04:18:10 GMT
Sender: bam@snoopy.apana.org.au
To: Cesar Eduardo Barros
Cc: 99324@bugs.debian.org, Anthony Towns
References: <20010531131158.B27395@azure.humbug.org.au> <20010531002519.A4183@flower.cesarb>
From: Brian May
X-Home-Page: http://snoopy.apana.org.au/~bam/
Date: 31 May 2001 14:04:39 +1000
In-Reply-To: <20010531002519.A4183@flower.cesarb>
Message-ID: <84d78qrwo8.fsf@snoopy.apana.org.au>
Lines: 31
User-Agent: Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.1 (GTK)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Delivered-To: 99324@bugs.debian.org

>>>>> "Cesar" == Cesar Eduardo Barros writes:

    Cesar> - Making sure everything works with UTF-8 charset

Biggest problem for me, here (unless that has changed in the past
month or so) is xemacs. Probably the same for emacs too, not
sure. Once I opened a message, and Gnus had heart failure when it said
it couldn't find the UTF-8 charset inside xemacs (actually, the
message was ISO-8859-1, so it doesn't entirely make sense),

(I don't have a UTF-8 font installed, but I haven't looked hard, yet;
there are probably plenty of choices already packaged for Debian;
however, I doubt that is the reason for xemacs not supporting it).

    Cesar> - Adding UTF-8 charset for every locale
    Cesar> - Converting (in debian/rules) documentation files to UTF-8
    Cesar> - Selecting en_US.UTF-8 (or something like that) as the default for LANG=
    Cesar> - Echoing some magic sequence on every getty to convert the kernel mode to UTF8

These sound like time consuming tasks, so the sooner we start, the
better. Just don't expect to finish for a while (eg. aim for
woody+1).

First priority should be to ensure that all programs work with
UTF-8. Ideally, this should be done for woody (but may not be
possible).

How do tools (eg. debconf) know what coding set to use when reading a
file (eg. templates file)? Or, is ISO-8859-1 assumed?
-- 
Brian May

Acknowledgement sent to Brian May <bam@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>.

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#99324; Package debian-policy.

Subject: Bug#99324: Default charset should be UTF-8
Reply-To: Fumitoshi UKAI, 99324@bugs.debian.org
Resent-From: Fumitoshi UKAI
Resent-To: debian-bugs-dist@lists.debian.org
Resent-CC: Debian Policy List
Resent-Date: Thu, 31 May 2001 14:33:25 GMT
Resent-Message-ID:
Resent-Sender: owner@bugs.debian.org
X-Debian-PR-Message: report 99324
X-Debian-PR-Package: debian-policy
X-Debian-PR-Keywords:
X-Loop: owner@bugs.debian.org
Received: via spool by 99324-submit@bugs.debian.org id=B99324.991318824781 (code B ref 99324); Thu, 31 May 2001 14:33:25 GMT
Date: Thu, 31 May 2001 23:18:38 +0900
Message-ID: <873d9lbo01.wl@mistral.ukai.org>
From: Fumitoshi UKAI
To: Brian May, 99324@bugs.debian.org
Cc: Cesar Eduardo Barros, Anthony Towns
In-Reply-To: <84d78qrwo8.fsf@snoopy.apana.org.au>
References: <20010531131158.B27395@azure.humbug.org.au> <20010531002519.A4183@flower.cesarb> <84d78qrwo8.fsf@snoopy.apana.org.au>
User-Agent: Wanderlust/2.4.1 (Stand By Me) SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.3 Emacs/21.0.103 (i386-debian-linux-gnu) MULE/5.0 (SAKAKI)
Organization: Debian JP Project
MIME-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya")
Content-Type: text/plain; charset=US-ASCII
Delivered-To: 99324@bugs.debian.org

At 31 May 2001 14:04:39 +1000,
Brian May wrote:
> >>>>> "Cesar" == Cesar Eduardo Barros writes:
>
> Cesar> - Making sure everything works with UTF-8 charset
>
> Biggest problem for me, here (unless that has changed in the past
> month or so) is xemacs. Probably the same for emacs too, not
> sure. Once I opened a message, and Gnus had heart failure when it said
> it couldn't find the UTF-8 charset inside xemacs (actually, the
> message was ISO-8859-1, so it doesn't entirely make sense),

AFAIK, emacsen can handle UTF-8 with the mule-ucs package.
If policy is going to require that everything works with the UTF-8
charset, should mule-ucs be merged into each emacsen?

> Cesar> - Adding UTF-8 charset for every locale
> Cesar> - Converting (in debian/rules) documentation files to UTF-8
> Cesar> - Selecting en_US.UTF-8 (or something like that) as the default for LANG=
> Cesar> - Echoing some magic sequence on every getty to convert the kernel mode to UTF8

Does the kernel support UTF-8 for characters other than Latin ones?

> These sound like time consuming tasks, so the sooner we start, the
> better. Just don't expect to finish for a while (eg. aim for
> woody+1).
>
> First priority should be to ensure that all programs work with
> UTF-8. Ideally, this should be done for woody (but may not be
> possible).

I don't think it's possible for woody.

> How do tools (eg. debconf) know what coding set to use when reading a
> file (eg. templates file)? Or, is ISO-8859-1 assumed?

debconf doesn't assume any encoding, does it?
We're usually using EUC-JP charset for debconf.

Regards,
Fumitoshi UKAI

Acknowledgement sent to Fumitoshi UKAI <ukai@debian.or.jp>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>.

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#99324; Package debian-policy.

Subject: Bug#99324: Default charset should be UTF-8
Reply-To: Cesar Eduardo Barros, 99324@bugs.debian.org
Resent-From: Cesar Eduardo Barros
Orignal-Sender: Cesar Eduardo Barros
Resent-To: debian-bugs-dist@lists.debian.org
Resent-CC: Debian Policy List
Resent-Date: Thu, 31 May 2001 20:04:48 GMT
Resent-Message-ID:
Resent-Sender: owner@bugs.debian.org
X-Debian-PR-Message: report 99324
X-Debian-PR-Package: debian-policy
X-Debian-PR-Keywords:
X-Loop: owner@bugs.debian.org
Received: via spool by 99324-submit@bugs.debian.org id=B99324.99133913031703 (code B ref 99324); Thu, 31 May 2001 20:04:48 GMT
Date: Thu, 31 May 2001 16:58:43 -0300
To: Fumitoshi UKAI
Cc: Brian May, 99324@bugs.debian.org, Anthony Towns
Message-ID: <20010531165843.A518@flower.cesarb>
References: <20010531131158.B27395@azure.humbug.org.au> <20010531002519.A4183@flower.cesarb> <84d78qrwo8.fsf@snoopy.apana.org.au> <873d9lbo01.wl@mistral.ukai.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <873d9lbo01.wl@mistral.ukai.org>
User-Agent: Mutt/1.3.18i
From: Cesar Eduardo Barros
Sender: Cesar Eduardo Barros
Delivered-To: 99324@bugs.debian.org

On Thu, May 31, 2001 at 11:18:38PM +0900, Fumitoshi UKAI wrote:
> > How do tools (eg. debconf) know what coding set to use when reading a
> > file (eg. templates file)? Or, is ISO-8859-1 assumed?
>
> debconf doesn't assume any encoding, does it?
> We're usually using EUC-JP charset for debconf.
>

I think debconf should use UTF-8 for the templates and recode on the fly.
There's nothing worse than having gibberish in ten different charsets in the
same template file.

Of course, using UTF-8 would make things much harder for translators unless
Debian used UTF-8 by default... :-|

-- 
Cesar Eduardo Barros
cesarb@nitnet.com.br
cesarb@dcc.ufrj.br

Acknowledgement sent to Cesar Eduardo Barros <cesarb@nitnet.com.br>:
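
The "recode on the fly" idea in the message above can be illustrated with a short Python sketch. It is not how debconf works or was later implemented; the function name `recode_for_display` and the choice to substitute unrepresentable characters are assumptions made purely for illustration.

```python
def recode_for_display(template_bytes: bytes, target_charset: str) -> bytes:
    """Decode a UTF-8 template and re-encode it for the user's charset.

    Hypothetical helper: assumes the template is stored as UTF-8 and
    that `target_charset` names the charset of the user's locale.
    """
    text = template_bytes.decode("utf-8")
    # Characters the target charset cannot represent are replaced
    # with "?" rather than aborting the display of the whole template.
    return text.encode(target_charset, errors="replace")
```

For example, a Japanese template stored as UTF-8 could be shown on an EUC-JP terminal with `recode_for_display(data, "euc_jp")`, while the same bytes sent to an ISO-8859-1 terminal would degrade to question marks. The target charset could plausibly come from `locale.getpreferredencoding()`.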
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>.

From: owner@bugs.debian.org (Debian Bug Tracking System)
To: Cesar Eduardo Barros
Subject: Bug#99324: Info received (was Bug#99324: Default charset should be UTF-8)
Message-ID:
In-Reply-To: <20010531165843.A518@flower.cesarb>
References: <20010531165843.A518@flower.cesarb>
X-Debian-PR-Message: ack-info-maintonly 99324

Thank you for the additional information you have supplied regarding this problem report. It has been forwarded to the developer(s) and to the developers mailing list to accompany the original report.

Your message has been sent to the package maintainer(s): Debian Policy List

If you wish to continue to submit further information on your problem, please send it to 99324@bugs.debian.org, as before.

Please do not reply to the address at the top of this message, unless you wish to report a problem with the Bug-tracking system.

Darren Benham (administrator, Debian Bugs database)

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#99324; Package debian-policy.

debian-bugs-dist@lists.debian.org, Debian Policy List
Subject: Bug#99324: Default charset should be UTF-8
Reply-To: Christian Kurz, 99324@bugs.debian.org
Resent-From: Christian Kurz
Resent-To: debian-bugs-dist@lists.debian.org
Resent-CC: Debian Policy List
Resent-Date: Thu, 31 May 2001 22:06:59 GMT
Resent-Message-ID:
Resent-Sender: owner@bugs.debian.org
X-Debian-PR-Message: report 99324
X-Debian-PR-Package: debian-policy
X-Debian-PR-Keywords:
X-Loop: owner@bugs.debian.org
Received: via spool by 99324-submit@bugs.debian.org id=B99324.99134642729532 (code B ref 99324); Thu, 31 May 2001 22:06:59 GMT
Date: Thu, 31 May 2001 21:06:10 +0200
From: Christian Kurz
To: Cesar Eduardo Barros, 99324@bugs.debian.org
Message-ID: <20010531210610.A762@seteuid.getuid.de>
References:
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="9jxsPFA5p3P2qPhR"
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.3.18i
Organization: set_eugid(getuid(), getgid())
Mail-Copies-To: never
Delivered-To: 99324@bugs.debian.org

--9jxsPFA5p3P2qPhR
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On 01-05-30 Cesar Eduardo Barros wrote:
> Package: debian-policy
> Version: 3.5.4.0
> Severity: wishlist
>
> I think Debian should start to move into using UTF-8 by default everywhere.

May I ask why we want to choose UTF-8 instead of UTF-5 or UTF-16? And why exactly should we switch to Unicode? How many real-world systems or applications currently support and/or use Unicode?

Christian
-- 
Debian Developer (http://www.debian.org)
1024/26CC7853 31E6 A8CA 68FC 284F 7D16 63EC A9E6 67FF 26CC 7853

--9jxsPFA5p3P2qPhR
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.5a (GNU/Linux)
Comment: For info see http://www.gnupg.org

iEYEARECAAYFAjsWliIACgkQqeZn/ybMeFOpZQCfWqYfpbn65eN9Q0BMtolJBWqr
gP4An3Q77vYjJKJYClTF2RILkN0gUeOG
=kNXb
-----END PGP SIGNATURE-----

--9jxsPFA5p3P2qPhR--

Acknowledgement sent to Christian Kurz <shorty@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>.

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#99324; Package debian-policy.

debian-bugs-dist@lists.debian.org, Debian Policy List
Subject: Bug#99324: Default charset should be UTF-8
Reply-To: Cesar Eduardo Barros, 99324@bugs.debian.org
Resent-From: Cesar Eduardo Barros
Orignal-Sender: Cesar Eduardo Barros
Resent-To: debian-bugs-dist@lists.debian.org
Resent-CC: Debian Policy List
Resent-Date: Thu, 31 May 2001 22:33:06 GMT
Resent-Message-ID:
Resent-Sender: owner@bugs.debian.org
X-Debian-PR-Message: report 99324
X-Debian-PR-Package: debian-policy
X-Debian-PR-Keywords:
X-Loop: owner@bugs.debian.org
Received: via spool by 99324-submit@bugs.debian.org id=B99324.9913483799060 (code B ref 99324); Thu, 31 May 2001 22:33:06 GMT
Date: Thu, 31 May 2001 19:32:52 -0300
To: Christian Kurz
Cc: 99324@bugs.debian.org
Message-ID: <20010531193252.A1735@flower.cesarb>
References: <20010531210610.A762@seteuid.getuid.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20010531210610.A762@seteuid.getuid.de>
User-Agent: Mutt/1.3.18i
From: Cesar Eduardo Barros
Sender: Cesar Eduardo Barros
Delivered-To: 99324@bugs.debian.org

On Thu, May 31, 2001 at 09:06:10PM +0200, Christian Kurz wrote:
> On 01-05-30 Cesar Eduardo Barros wrote:
> > Package: debian-policy
> > Version: 3.5.4.0
> > Severity: wishlist
> >
> > I think Debian should start to move into using UTF-8 by default everywhere.
>
> May I ask why we want to choose UTF-8 instead of UTF-5 or UTF-16? And
> why should we exactly switch to Unicode? How many real world system or
> applications currently support and/or use unicode?

Ask the IETF. They seem to like UTF8 a lot.

Ask Linus too. UTF8 support has been in the kernel since, what, 2.0.x?

Seriously, though:

- UTF8 preserves the meaning of the whole 0-127 range, so there are no problems with special chars and escapes (as long as you use only ASCII you don't even notice it's there). The other UTF encodings weren't designed for that. The RFC which describes UTF8 (RFC 2279) says:

# UTF-8, the object of this memo, has
# the characteristic of preserving the full US-ASCII range, providing
# compatibility with file systems, parsers and other software that rely
# on US-ASCII values but are transparent to other values.

- UTF8 allows the full range of Unicode (which means that any charset should be a subset of it. All other UTFs do it too.)

- Where did you find that UTF-5? I only knew of UTF-7, UTF-8 and UTF-16... UTF7 is a standard which doesn't use the high bit, so if an app isn't aware of it, it might misinterpret the high characters as a string of normal chars. UTF16, AFAIK, is a standard for encoding the 32-bit UCS4 in 16-bit words.

As to why to switch to UTF8: first, it's only a default (Debian is all about choice) and a standard charset for /usr/share/doc to be in. Second, there is a general tendency to move away from having to switch charsets all the time and towards using Unicode.

Applications which currently use Unicode that I know of are Mozilla (since UTF8 is the default charset of XML IIRC, and used in a lot of places on the net), and the Win32 systems (including Wine. M$ seems to have done it in a horribly messy way, but I can't judge since I'm not a Windows programmer). I'm sure there are many more. But support is not a problem; we have the source, don't we?

Using UTF8 as a default is more of a long-term suggestion, to be done in 1 or 2 Debian releases (which probably means at least four full years =) )

-- 
Cesar Eduardo Barros
cesarb@nitnet.com.br
cesarb@dcc.ufrj.br

Acknowledgement sent to Cesar Eduardo Barros <cesarb@nitnet.com.br>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>.

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#99324; Package debian-policy.

debian-bugs-dist@lists.debian.org, Debian Policy List
Subject: Bug#99324: Default charset should be UTF-8
Reply-To: Marco d'Itri, 99324@bugs.debian.org
Resent-From: Marco d'Itri
Resent-To: debian-bugs-dist@lists.debian.org
Resent-CC: Debian Policy List
Resent-Date: Fri, 01 Jun 2001 02:03:48 GMT
Resent-Message-ID:
Resent-Sender: owner@bugs.debian.org
X-Debian-PR-Message: report 99324
X-Debian-PR-Package: debian-policy
X-Debian-PR-Keywords:
X-Loop: owner@bugs.debian.org
Received: via spool by 99324-submit@bugs.debian.org id=B99324.99135996917041 (code B ref 99324); Fri, 01 Jun 2001 02:03:48 GMT
Date: Fri, 1 Jun 2001 03:31:53 +0200
From: Marco d'Itri
To: 99324@bugs.debian.org
Message-ID: <20010601033152.A6231@wonderland.linux.it>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20010531193252.A1735@flower.cesarb>
User-Agent: Mutt/1.3.18i
Delivered-To: 99324@bugs.debian.org

On Jun 01, Cesar Eduardo Barros wrote:

>Ask the IETF. They seem to like UTF8 a lot.
Because it's ASCII-compatible. This is not relevant.

>Ask Linus too. The UTF8 support is in the kernel since, what, 2.0.x?
Because it's ASCII-compatible. This is not relevant.

UTF-8 may be useful for things like debconf templates as long as it's able to recode on the fly to the $LC_CTYPE encoding, but don't you dare fucking with the locales for languages and countries you don't know about.
Making sure applications can deal with UTF-8 is ok, things like recoding the documentation are not.
Most people (with the possible exception of part of the CJK community) do not want to use unicode yet, deal with it.

-- 
ciao,
Marco

Acknowledgement sent to Marco d'Itri <md@Linux.IT>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>.

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#99324; Package debian-policy.

debian-bugs-dist@lists.debian.org, Debian Policy List
Subject: Bug#99324: Default charset should be UTF-8
Reply-To: Joey Hess, 99324@bugs.debian.org
Resent-From: Joey Hess
Orignal-Sender: Joey Hess
Resent-To: debian-bugs-dist@lists.debian.org
Resent-CC: Debian Policy List
Resent-Date: Fri, 01 Jun 2001 03:33:03 GMT
Resent-Message-ID:
Resent-Sender: owner@bugs.debian.org
X-Debian-PR-Message: report 99324
X-Debian-PR-Package: debian-policy
X-Debian-PR-Keywords:
X-Loop: owner@bugs.debian.org
Received: via spool by 99324-submit@bugs.debian.org id=B99324.99136581630837 (code B ref 99324); Fri, 01 Jun 2001 03:33:03 GMT
Date: Thu, 31 May 2001 23:27:41 -0400
From: Joey Hess
To: Fumitoshi UKAI, 99324@bugs.debian.org
Message-ID: <20010531232741.C12862@kitenet.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <873d9lbo01.wl@mistral.ukai.org>
User-Agent: Mutt/1.3.18i
Sender: Joey Hess
Delivered-To: 99324@bugs.debian.org

Fumitoshi UKAI wrote:
> debconf doesn't assume any encoding, does it?
> We're usually using EUC-JP charset for debconf.

No, debconf knows about as little about l10n and i18n as I do. I'm glad to hear the Japanese stuff works btw.

-- 
see shy jo

Acknowledgement sent to Joey Hess <joeyh@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>.

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#99324; Package debian-policy.

debian-bugs-dist@lists.debian.org, Debian Policy List
Subject: Bug#99324: Default charset should be UTF-8
Reply-To: Joey Hess, 99324@bugs.debian.org
Resent-From: Joey Hess
Orignal-Sender: Joey Hess
Resent-To: debian-bugs-dist@lists.debian.org
Resent-CC: Debian Policy List
Resent-Date: Fri, 01 Jun 2001 03:33:07 GMT
Resent-Message-ID:
Resent-Sender: owner@bugs.debian.org
X-Debian-PR-Message: report 99324
X-Debian-PR-Package: debian-policy
X-Debian-PR-Keywords:
X-Loop: owner@bugs.debian.org
Received: via spool by 99324-submit@bugs.debian.org id=B99324.99136596431105 (code B ref 99324); Fri, 01 Jun 2001 03:33:07 GMT
Date: Thu, 31 May 2001 23:30:07 -0400
From: Joey Hess
To: Cesar Eduardo Barros, 99324@bugs.debian.org
Message-ID: <20010531233007.D12862@kitenet.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20010531165843.A518@flower.cesarb>
User-Agent: Mutt/1.3.18i
Sender: Joey Hess
Delivered-To: 99324@bugs.debian.org

Cesar Eduardo Barros wrote:
> I think debconf should use UTF-8 for the templates and recode on the fly.

Well, if you send in a patch, I will consider it.

> There's nothing worse than having gibberish in ten different charsets in the
> same template file.

This is why the template file is the "compiled" form. You're intended to merge together a bunch of files that just have one (or 2, with English) languages in them. See debconf-mergetemplate(1).

-- 
see shy jo

Acknowledgement sent to Joey Hess <joeyh@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>.

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#99324; Package debian-policy.

debian-bugs-dist@lists.debian.org, Debian Policy List
Subject: Bug#99324: Default charset should be UTF-8
Reply-To: Sam TH, 99324@bugs.debian.org
Resent-From: Sam TH
Resent-To: debian-bugs-dist@lists.debian.org
Resent-CC: Debian Policy List
Resent-Date: Fri, 01 Jun 2001 07:37:25 GMT
Resent-Message-ID:
Resent-Sender: owner@bugs.debian.org
X-Debian-PR-Message: report 99324
X-Debian-PR-Package: debian-policy
X-Debian-PR-Keywords:
X-Loop: owner@bugs.debian.org
Received: via spool by 99324-submit@bugs.debian.org id=B99324.9913805024363 (code B ref 99324); Fri, 01 Jun 2001 07:37:25 GMT
Date: Fri, 1 Jun 2001 02:46:43 -0500
From: Sam TH
To: Marco d'Itri, 99324@bugs.debian.org
Message-ID: <20010601024643.E11115@uchicago.edu>
References: <20010531193252.A1735@flower.cesarb> <20010601033152.A6231@wonderland.linux.it>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="NtwzykIc2mflq5ck"
Content-Disposition: inline
User-Agent: Mutt/1.3.17i
In-Reply-To: <20010601033152.A6231@wonderland.linux.it>; from md@Linux.IT on Fri, Jun 01, 2001 at 03:31:53AM +0200
Delivered-To: 99324@bugs.debian.org

--NtwzykIc2mflq5ck
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Jun 01, 2001 at 03:31:53AM +0200, Marco d'Itri wrote:
> Most people (with the possible exception of part of the CJK community)
> do not want to use unicode yet, deal with it.

Actually, most people who aren't using a Latin or Cyrillic alphabet want Unicode. Which is most people, period.

Bad i18n is a bug, and a serious one. "Deal with it" is not a reasonable response.

sam th --- sam@uchicago.edu --- http://www.abisource.com/~sam/
OpenPGP Key: CABD33FC --- http://samth.dyndns.org/key
DeCSS: http://samth.dyndns.org/decss

--NtwzykIc2mflq5ck
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.5 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE7F0hit+kM0Mq9M/wRAuj7AKC/6KhWt+XsIJzjChDQTvK3cmCG+gCePujD
mJzyWx6qzuGzcj1xaF3RweU=
=j4Cr
-----END PGP SIGNATURE-----

--NtwzykIc2mflq5ck--

Acknowledgement sent to Sam TH <sam@uchicago.edu>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>.

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#99324; Package debian-policy.

Subject: Bug#99324: Default charset should be UTF-8
Reply-To: Radovan Garabik, 99324@bugs.debian.org
Resent-From: Radovan Garabik
Resent-To: debian-bugs-dist@lists.debian.org
Resent-CC: Debian Policy List
Resent-Date: Fri, 01 Jun 2001 11:18:02 GMT
Resent-Message-ID:
Resent-Sender: owner@bugs.debian.org
X-Debian-PR-Message: report 99324
X-Debian-PR-Package: debian-policy
X-Debian-PR-Keywords:
X-Loop: owner@bugs.debian.org
Received: via spool by 99324-submit@bugs.debian.org id=B99324.99139427312543 (code B ref 99324); Fri, 01 Jun 2001 11:18:02 GMT
Date: Fri, 1 Jun 2001 13:17:43 +0200
From: Radovan Garabik
To: Joey Hess, 99324@bugs.debian.org
Cc: debian-policy@lists.debian.org
Message-ID: <20010601131743.B26998@melkor.dnp.fmph.uniba.sk>
References: <20010531165843.A518@flower.cesarb> <20010531233007.D12862@kitenet.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.3.17i
In-Reply-To: <20010531233007.D12862@kitenet.net>; from joeyh@debian.org on Thu, May 31, 2001 at 11:30:07PM -0400
Delivered-To: 99324@bugs.debian.org

On Thu, May 31, 2001 at 11:30:07PM -0400, Joey Hess wrote:
> Cesar Eduardo Barros wrote:
> > I think debconf should use UTF-8 for the templates and recode on the fly.
>
> Well, if you send in a patch, I will consider it.

Probably libc ought to support it (when there is all the i18n stuff built in). Maybe it already does, who knows...

> > There's nothing worse than having gibberish in ten different charsets in the
> > same template file.
>
> This is why the template file is the "compiled" form. You're intended to
> merge together a bunch of files that just have one (or 2, with English)
> languages in them. See debconf-mergetemplate(1).

even so...
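A minimal sketch of what "recode on the fly" could mean for template text stored as UTF-8, assuming Python's standard codecs. The helper name `recode_for_display` and the choice to substitute `?` for characters the target encoding lacks are illustrative assumptions, not debconf's actual behaviour.

```python
import locale


def recode_for_display(utf8_data: bytes, target_encoding: str = "") -> bytes:
    """Recode UTF-8 bytes into target_encoding (hypothetical helper).

    If no target is given, fall back to the encoding of the current locale.
    Characters the target cannot represent are replaced with '?'.
    """
    if not target_encoding:
        target_encoding = locale.getpreferredencoding(False)
    text = utf8_data.decode("utf-8")
    return text.encode(target_encoding, errors="replace")


template = "na\u00efve".encode("utf-8")           # i-diaeresis stored as UTF-8
print(recode_for_display(template, "iso8859_1"))  # representable in Latin-1
print(recode_for_display(template, "iso8859_2"))  # absent from Latin-2, becomes '?'
```

Storing one UTF-8 master copy and degrading only at display time keeps the loss local to the user whose terminal cannot show a given character.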
For the beginning I would propose something like this to go into policy:

Documentation of Debian packages, if written in a language requiring characters outside the 7-bit ASCII range, should use either the well-established encoding for the given language (such as ISO-8859-2 for some central and eastern European languages, KOI8-R for Russian, JIS for Japanese, etc.) or UTF-8. Maintainers are encouraged to use UTF-8, bearing in mind the general tendency toward a unified character encoding.

Original upstream documentation, if in an encoding other than UTF-8 _or_ the well-established encoding for the particular language, should be converted either to UTF-8 or to the well-established encoding. The choice between UTF-8 and another encoding is left to the maintainer's discretion; however, one package should (must?) have all its documentation in one consistent encoding.

Names of maintainers, upstream authors and other data in packages' descriptions and related data files (such as debian/changelog, debian/control), as well as in English-language documentation, should be either transliterated or transcribed to ASCII, or written in UTF-8, again at the discretion of the maintainer. However, for names in scripts based on non-Latin alphabets, an ASCII (or suitable Latin-script) version should be provided along with the original name.

--
-----------------------------------------------------------
| Radovan Garabik http://melkor.dnp.fmph.uniba.sk/~garabik/ |
| __..--^^^--..__ garabik @ melkor.dnp.fmph.uniba.sk |
-----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!

Acknowledgement sent to Radovan Garabik <garabik@melkor.dnp.fmph.uniba.sk>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>.

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#99324; Package debian-policy.

Subject: Bug#99324: Default charset should be UTF-8
Reply-To: Radovan Garabik, 99324@bugs.debian.org
Resent-From: Radovan Garabik
Resent-To: debian-bugs-dist@lists.debian.org
Resent-CC: Debian Policy List
Resent-Date: Fri, 01 Jun 2001 12:03:35 GMT
Resent-Message-ID:
Resent-Sender: owner@bugs.debian.org
X-Debian-PR-Message: report 99324
X-Debian-PR-Package: debian-policy
X-Debian-PR-Keywords:
X-Loop: owner@bugs.debian.org
Received: via spool by 99324-submit@bugs.debian.org id=B99324.99139623818888 (code B ref 99324); Fri, 01 Jun 2001 12:03:35 GMT
Date: Fri, 1 Jun 2001 13:50:24 +0200
From: Radovan Garabik
To: Marco d'Itri, 99324@bugs.debian.org
Cc: debian-policy@lists.debian.org
Message-ID: <20010601135024.C26998@melkor.dnp.fmph.uniba.sk>
References: <20010531193252.A1735@flower.cesarb> <20010601033152.A6231@wonderland.linux.it>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.3.17i
In-Reply-To: <20010601033152.A6231@wonderland.linux.it>; from md@Linux.IT on Fri, Jun 01, 2001 at 03:31:53AM +0200
Delivered-To: 99324@bugs.debian.org

On Fri, Jun 01, 2001 at 03:31:53AM +0200, Marco d'Itri wrote:
> On Jun 01, Cesar Eduardo Barros wrote:
>
> >Ask the IETF. They seem to like UTF8 a lot.
> Because it's ASCII-compatible. This is not relevant.
>
> >Ask Linus too. The UTF8 support is in the kernel since, what, 2.0.x?
> Because it's ASCII-compatible. This is not relevant.

This IS relevant. We just CANNOT forget ASCII, as much as we would like to. And since we already want something (mostly) upwards compatible with ASCII _and_ serving as the one and only unified encoding, why not choose UTF-8, since it already IS a standard?
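The ASCII-compatibility point can be checked directly. The sketch below, assuming Python's standard codecs, shows that pure ASCII text has identical bytes in ASCII and UTF-8, and that a non-ASCII character becomes a multibyte sequence whose bytes all have the high bit set. Both helper names are made up for illustration.

```python
def utf8_is_ascii_superset(text: str) -> bool:
    """True if ASCII-only text encodes to identical bytes in ASCII and UTF-8."""
    return text.encode("ascii") == text.encode("utf-8")


def all_high_bit(data: bytes) -> bool:
    """True if every byte has its high bit set (i.e. lies outside ASCII)."""
    return all(byte >= 0x80 for byte in data)


print(utf8_is_ascii_superset("Package: debian-policy"))
encoded = "\u00ef".encode("utf-8")  # i-diaeresis
print(len(encoded), all_high_bit(encoded))
```

Because no byte of a multibyte sequence falls in the ASCII range, tools that only care about ASCII delimiters (newline, slash, colon) keep working on UTF-8 text unchanged.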
>
> UTF-8 may be useful for things like debconf templates as long as it's
> able to recode on the fly to the $LC_CTYPE encoding, but don't you dare
> fucking with the locales for languages and countries you don't know
> about.

All the locales should be in UTF-8. Period. Otherwise you are just contributing to the great mess.

Recently I wanted to set up a computer for one particular user, who speaks Lithuanian, Slovak and Russian (in this order), but only a bit of English. So my naive[*] guess was to try something like

export LANGUAGE=lt:sk:cs:ru

Guess what? IT CAN NEVER WORK! The lt, sk and ru locales use three different encodings.

[*] I'd like to type naive properly, with i-diaeresis, but I just cannot, since it is not in the ISO-8859-2 encoding my console is switched to.

I have a user who speaks Slovak and Italian... again no luck; she cannot even write mails properly with diacritics in both languages, since that requires changing the console font, changing the keymap, AND EDITING the /etc/Muttrc file.

> Making sure applications can deal with UTF-8 is ok, things like recoding
> the documentation are not.

I stumbled across a package, uhm, more of them actually: one had documentation in Polish, one in Russian... now if I had not known before that Polish uses (fortunately) the same ISO-8859-2 encoding as Slovak, and that the Russian documentation is most[**] likely written in KOI8-R (and even more fortunately I've already set up Russian fonts), I would have had a difficult time reading the documentation.

[**] It happens to me more often than I would like: I download a program, with documentation in (guess) Russian. Switch my console to KOI-8, type less... uhm... gibberish. Use konwert to convert from CP1251 to koi8-r, pipe to less... still gibberish... try ISO-8859-5, still nothing.. Some head scratches, ls /usr/share/konwert/filters, pondering about what could be other Russian encodings... trying CP866, still gibberish...
giving up in disgust, maybe it was Japanese or Chinese, god knows.

Several days later I found out it was in ECMA-cyrillic.

> Most people (with the possible exception of part of the CJK community)
> do not want to use unicode yet, deal with it.

Most people do not care about the encoding, they just want to USE the computer productively. THEY CANNOT because of the mess in encodings. I gave up the diacritics in my name for the purpose of communicating (yes, there ARE diacritics in my name). How would you feel if you HAD TO write your name Marco Ditri, just because there happened to be no common encoding where the apostrophe has the same numerical representation?

I recently needed to set up a database of some people... guess what, most of them had A LOT of Slovak diacritics in their names, but some of them were French, with French diacritics... UTF-8 was about the only way, even if it was a bit painful (and the later need to add Cyrillic names went perfectly without glitches, thanks to UTF-8).

Americans have it easy. People using languages falling into one ISO Latin group have it a bit difficult, but acceptable... as long as they do not want to communicate with people from some OTHER ISO Latin group. Then it turns into hell.

--
-----------------------------------------------------------
| Radovan Garabik http://melkor.dnp.fmph.uniba.sk/~garabik/ |
| __..--^^^--..__ garabik @ melkor.dnp.fmph.uniba.sk |
-----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!

Acknowledgement sent to Radovan Garabik <garabik@melkor.dnp.fmph.uniba.sk>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>.

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#99324; Package debian-policy.

Subject: Bug#99324: Default charset should be UTF-8
Reply-To: Marco d'Itri, 99324@bugs.debian.org
Resent-From: Marco d'Itri
Resent-To: debian-bugs-dist@lists.debian.org
Resent-CC: Debian Policy List
Resent-Date: Fri, 01 Jun 2001 12:18:24 GMT
Resent-Message-ID:
Resent-Sender: owner@bugs.debian.org
X-Debian-PR-Message: report 99324
X-Debian-PR-Package: debian-policy
X-Debian-PR-Keywords:
X-Loop: owner@bugs.debian.org
Received: via spool by 99324-submit@bugs.debian.org id=B99324.99139732628665 (code B ref 99324); Fri, 01 Jun 2001 12:18:24 GMT
Date: Fri, 1 Jun 2001 13:56:42 +0200
From: Marco d'Itri
To: 99324@bugs.debian.org
Message-ID: <20010601135642.B1987@wonderland.linux.it>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20010601133112.B7815@cibalia.gkvk.hr>
User-Agent: Mutt/1.3.18i
Delivered-To: 99324@bugs.debian.org

On Jun 01, Josip Rodin wrote:

>Nice things these general tendencies... in my country we still have problems
>using ISO 8859-2 because Windows 1250 has polluted everything. Adding
>another one to the pile is likely to screw things up even more.

This is the reason we can't just switch the terminals to UTF-8: there are way too many programs which can't correctly recode ISO-8859-* text, because they are broken or because the charset is unlabeled. Let's first fix the software, then we'll talk about using UTF-8 by default for everybody.

--
ciao,
Marco

Acknowledgement sent to Marco d'Itri <md@Linux.IT>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>.

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#99324; Package debian-policy.

Subject: Bug#99324: Default charset should be UTF-8
Reply-To: Marco d'Itri, 99324@bugs.debian.org
Resent-From: Marco d'Itri
Resent-To: debian-bugs-dist@lists.debian.org
Resent-CC: Debian Policy List
Resent-Date: Fri, 01 Jun 2001 12:19:34 GMT
Resent-Message-ID:
Resent-Sender: owner@bugs.debian.org
X-Debian-PR-Message: report 99324
X-Debian-PR-Package: debian-policy
X-Debian-PR-Keywords:
X-Loop: owner@bugs.debian.org
Received: via spool by 99324-submit@bugs.debian.org id=B99324.99139732628669 (code B ref 99324); Fri, 01 Jun 2001 12:19:34 GMT
Date: Fri, 1 Jun 2001 14:07:59 +0200
From: Marco d'Itri
Cc: 99324@bugs.debian.org
Message-ID: <20010601140759.D1987@wonderland.linux.it>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20010601135024.C26998@melkor.dnp.fmph.uniba.sk>
User-Agent: Mutt/1.3.18i
Delivered-To: 99324@bugs.debian.org

On Jun 01, Radovan Garabik wrote:

>[*] I'd like to type naive properly, with i-diaeresis, but I just cannot,
>since it is not in ISO-8859-2 encoding my console is switched to

I'm not arguing about this. I agree that in a perfect world everybody would be using unicode, encoded as UTF-8 or UTF-16. My point is that there is too much broken software to switch now to UTF-8.

>[**] It happens to me more often than I would like: I download a program,
>with documentation in (guess) Russian. Switch my console to KOI-8, type less...
>uhm... gibberish. Use konwert to convert from CP1251 to koi8-r, pipe to less...
>still gibberish... try ISO-8859-5, still nothing..
>Some head scratches, ls /usr/share/konwert/filters, pondering about what could be
>other russian encodings... trying CP866, still gibberish... giving up in disgust,
>maybe it was japanese or chinese, god knows
>
>Several days later found out it was in ECMA-cyrillic.
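The trial-and-error in the quoted footnote can be sketched as a small loop that decodes the same unlabeled bytes under several candidate encodings, so a reader can see which one yields sensible text. This assumes Python's standard codecs; the candidate list and the helper name `decode_with_candidates` are illustrative and far from exhaustive.

```python
# Candidate single-byte Cyrillic encodings (an illustrative, incomplete list).
CANDIDATE_ENCODINGS = ["koi8_r", "cp1251", "iso8859_5", "cp866"]


def decode_with_candidates(raw: bytes, encodings=CANDIDATE_ENCODINGS) -> dict:
    """Decode raw under every candidate; undecodable bytes become U+FFFD."""
    return {enc: raw.decode(enc, errors="replace") for enc in encodings}


# Pretend this arrived as an unlabeled file; it is really CP1251.
unlabeled = "Привет".encode("cp1251")
for encoding, text in decode_with_candidates(unlabeled).items():
    print(f"{encoding:10} {text}")
```

Only one row reads as Russian; the others show the gibberish the footnote describes. A file labeled with its charset, or stored as UTF-8, makes this guessing unnecessary.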
This is a good point, but it's something which should be discussed among Russian developers (do we have any?). I'm not going to dictate to them the encoding they should use. Maybe right now what they really need is a patched less which can invoke iconv, I don't know.

--
ciao,
Marco

Acknowledgement sent to Marco d'Itri <md@Linux.IT>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>.

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#99324; Package debian-policy.

Subject: Bug#99324: Default charset should be UTF-8
Reply-To: Marco d'Itri, 99324@bugs.debian.org
Resent-From: Marco d'Itri
Resent-To: debian-bugs-dist@lists.debian.org
Resent-CC: Debian Policy List
Resent-Date: Fri, 01 Jun 2001 12:19:43 GMT
Resent-Message-ID:
Resent-Sender: owner@bugs.debian.org
X-Debian-PR-Message: report 99324
X-Debian-PR-Package: debian-policy
X-Debian-PR-Keywords:
X-Loop: owner@bugs.debian.org
Received: via spool by 99324-submit@bugs.debian.org id=B99324.99139732628670 (code B ref 99324); Fri, 01 Jun 2001 12:19:43 GMT
Date: Fri, 1 Jun 2001 14:00:34 +0200
From: Marco d'Itri
To: 99324@bugs.debian.org
Message-ID: <20010601140034.C1987@wonderland.linux.it>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <87y9rc5yxv.fsf@cachemir.echo-net.net>
User-Agent: Mutt/1.3.18i
Delivered-To: 99324@bugs.debian.org

On Jun 01, Roland Mas wrote:

>> Most people (with the possible exception of part of the CJK
>> community) do not want to use unicode yet, deal with it.
>
>Excuse me? "With the possible exception of the CJK community"? What
>about people speaking (and writing/typing) Arabic, Hebrew, Greek,
>Russian and whatnot?

I don't know about Arabic and Hebrew, but Russian people don't like unicode and do not want to switch from the KOI-8 and KOI-8r encodings.

> I gather you're Italian, so you might need some
>accents (I remember seeing some "è").

Yes, and I already have a correctly working national encoding, thank you.

> In *my* experience, most people *do* want to use Unicode. I would

Just don't force it on everybody else. I'd love to be able to use unicode for everything, but the reality is that the software is not yet good enough to support it to the same level as the national encodings.

--
ciao,
Marco

Acknowledgement sent to Marco d'Itri <md@Linux.IT>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>.

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#99324; Package debian-policy.

Subject: Bug#99324: Default charset should be UTF-8
Reply-To: Radovan Garabik, 99324@bugs.debian.org
Resent-From: Radovan Garabik
Resent-To: debian-bugs-dist@lists.debian.org
Resent-CC: Debian Policy List
Resent-Date: Fri, 01 Jun 2001 12:49:02 GMT
Resent-Message-ID:
Resent-Sender: owner@bugs.debian.org
X-Debian-PR-Message: report 99324
X-Debian-PR-Package: debian-policy
X-Debian-PR-Keywords:
X-Loop: owner@bugs.debian.org
Received: via spool by 99324-submit@bugs.debian.org id=B99324.9913995272170 (code B ref 99324); Fri, 01 Jun 2001 12:49:02 GMT
Date: Fri, 1 Jun 2001 14:45:12 +0200
From: Radovan Garabik
To: Marco d'Itri, 99324@bugs.debian.org
Message-ID: <20010601144512.A28881@melkor.dnp.fmph.uniba.sk>
References: <20010601133112.B7815@cibalia.gkvk.hr> <20010601135642.B1987@wonderland.linux.it>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.3.17i
In-Reply-To: <20010601135642.B1987@wonderland.linux.it>; from md@Linux.IT on Fri, Jun 01, 2001 at 01:56:42PM +0200
Delivered-To: 99324@bugs.debian.org

On Fri, Jun 01, 2001 at 01:56:42PM +0200, Marco d'Itri wrote:
> On Jun 01, Josip Rodin wrote:
>
> >Nice things these general tendencies... in my country we still have problems
> >using ISO 8859-2 because Windows 1250 has polluted everything. Adding
> >another one to the pile is likely to screw things up even more.
> This is the reason we can't just switch the terminals to UTF-8, there
> are way too many programs which can't correctly recode ISO-8859-* text,
> because they are broken or because the charset is unlabeled.

So we first make them work with ISO-8859-*,
then work on making applications work with UTF-8,
then work on making those terminals display UTF-8?
I can see a shortcut here...

> Let's first fix the software, then we'll talk about using UTF-8 by
> default for everybody.

But fix it which way? To support CP1250? ISO-8859-2? CP852?
Or KOI8-R? CP1251? ECMA? CP866?
Better to concentrate on UTF-8, it does indeed solve many problems.

Better to let the old terminals die... if we had not let the old
Czechoslovak Kamenicky encoding die, but focused instead on fixing
the software, we would have a much bigger mess than there is now.

All the i18n stuff in glibc is a bit flawed... it assumes you NEVER want to
change the default locale while the program is running, and it assumes
everybody has a correct terminal.
Have you seen the konwert package? It is really nice, glibc's
iconv(3) should have been like this and there would be one problem less...

Ideal (under the circumstances) would be:

have glibc work internally in UTF-8 unconditionally
    Output is transliterated according to terminal charset (ideally UTF-8,
    so no conversion is necessary). Terminal charset can be switched over
    _on the fly_, maybe via SIG-SOMETHING to glibc
    locales are in UTF-8 unconditionally
    isprint(3) returns 1 for UTF-8 characters (fuzzy here... but it
    definitely should not be tied to locale), actual displaying the
    character is handled by konwert-like output routine
    You want your ISO-8859-1 console with locales? No problem, do
    export OUTPUTCHARSER=ISO-8859-1
    and glibc will transliterate any Russian fortunes into Latin
    script... and strip diacritics from Slovak names.

readline, stty & co. are UTF-8 aware
    input can be recoded to UTF-8 if necessary, but ideally it is already
    coming in UTF-8 (the biggest problems are text editors here, maybe
    use filterm(1) for old applications)
    Input encoding, too, can be changed on the fly.

have X work in UTF-8
    xkb sends UTF-8, default font encoding is UTF-8

allow UTF-8 in /etc/passwd.
    Damn. I was bitten by this a few days ago.
    8-bit chars in GECOS behave unpredictably

-- 
 -----------------------------------------------------------
| Radovan Garabik http://melkor.dnp.fmph.uniba.sk/~garabik/ |
| __..--^^^--..__    garabik @ melkor.dnp.fmph.uniba.sk     |
 -----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!

Acknowledgement sent to Radovan Garabik <garabik@melkor.dnp.fmph.uniba.sk>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>.

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#99324; Package debian-policy.

Subject: Bug#99324: Default charset should be UTF-8
Reply-To: Radovan Garabik, 99324@bugs.debian.org
Resent-From: Radovan Garabik
Resent-To: debian-bugs-dist@lists.debian.org
Resent-CC: Debian Policy List
Resent-Date: Fri, 01 Jun 2001 14:09:43 GMT
Resent-Message-ID:
Resent-Sender: owner@bugs.debian.org
X-Debian-PR-Message: report 99324
X-Debian-PR-Package: debian-policy
X-Debian-PR-Keywords:
X-Loop: owner@bugs.debian.org
Received: via spool by 99324-submit@bugs.debian.org id=B99324.9913998953191 (code B ref 99324); Fri, 01 Jun 2001 14:09:43 GMT
Date: Fri, 1 Jun 2001 14:51:20 +0200
From: Radovan Garabik
To: Marco d'Itri, 99324@bugs.debian.org
Message-ID: <20010601145120.B28881@melkor.dnp.fmph.uniba.sk>
References: <87y9rc5yxv.fsf@cachemir.echo-net.net> <20010601140034.C1987@wonderland.linux.it>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.3.17i
In-Reply-To: <20010601140034.C1987@wonderland.linux.it>; from md@Linux.IT on Fri, Jun 01, 2001 at 02:00:34PM +0200
Delivered-To: 99324@bugs.debian.org

On Fri, Jun 01, 2001 at 02:00:34PM +0200, Marco d'Itri wrote:
> On Jun 01, Roland Mas wrote:
>
> >> Most people (with the possible exception of part of the CJK
> >> community) do not want to use unicode yet, deal with it.
> >
> >Excuse me? "With the possible exception of the CJK community"? What
> >about people speaking (and writing/typing) Arabic, Hebrew, Greek,
> >Russian and whatnot?
> I don't know about Arabic and Hebrew, but russian people don't like
> unicode and do not want to switch from the KOI-8 and KOI-8r encodings.
>

UNLESS they need to communicate with people from other charset groups.
(It directly concerns me - I need(ed) to work with both Russian and Slovak.)

> > I gather you're Italian, so you might need some
> >accents (I remember seeing some "?").
> Yes, and I already have a correctly working national encoding, thank you.
>

Do you? Just send me a mail in Italian... and we'll see how many of its
accents will display as '?'
And I already have a correctly working national encoding, too.

> > In *my* experience, most people *do* want to use Unicode. I would
> Just don't force it on everybody else.

I agree... as long as you do not force ISO-8859-1 (or anything) on
everybody, as is currently the case in Debian. Remember, Debian is international.

>
> I'd love to be able to use unicode for everything, but the reality is
> that the software is not yet good enough to support it to the same level
> of the national encodings.
>

So let's fix the software. All my programs are, if not Unicode-aware,
then at least not Unicode-hostile.

-- 
 -----------------------------------------------------------
| Radovan Garabik http://melkor.dnp.fmph.uniba.sk/~garabik/ |
| __..--^^^--..__    garabik @ melkor.dnp.fmph.uniba.sk     |
 -----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!

Acknowledgement sent to Radovan Garabik <garabik@melkor.dnp.fmph.uniba.sk>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>.   -t  From: owner@bugs.debian.org (Debian Bug Tracking System) To: Radovan Garabik Subject: Bug#99324: Info received (was Bug#99324: Default charset should be UTF-8) Message-ID: In-Reply-To: <20010601145120.B28881@melkor.dnp.fmph.uniba.sk> References: <20010601145120.B28881@melkor.dnp.fmph.uniba.sk> X-Debian-PR-Message: ack-info-maintonly 99324 Thank you for the additional information you have supplied regarding this problem report. It has been forwarded to the developer(s) and to the developers mailing list to accompany the original report. Your message has been sent to the package maintainer(s): Debian Policy List If you wish to continue to submit further information on your problem, please send it to 99324@bugs.debian.org, as before. Please do not reply to the address at the top of this message, unless you wish to report a problem with the Bug-tracking system. Darren Benham (administrator, Debian Bugs database)   Received: (at 99324) by bugs.debian.org; 1 Jun 2001 12:51:35 +0000 From garabik@melkor.dnp.fmph.uniba.sk Fri Jun 01 07:51:35 2001 Return-path: Received: from atlas15.dnp.fmph.uniba.sk (melkor.dnp.fmph.uniba.sk) [158.195.25.215] (mail) by master.debian.org with esmtp (Exim 3.12 1 (Debian)) id 155oOt-0000pO-00; Fri, 01 Jun 2001 07:51:35 -0500 Received: from garabik by melkor.dnp.fmph.uniba.sk with local (Exim 3.22 #1 (Debian)) id 155oOe-0007j4-00; Fri, 01 Jun 2001 14:51:20 +0200 Date: Fri, 1 Jun 2001 14:51:20 +0200 From: Radovan Garabik To: Marco d'Itri , 99324@bugs.debian.org Subject: Re: Bug#99324: Default charset should be UTF-8 Message-ID: <20010601145120.B28881@melkor.dnp.fmph.uniba.sk> References: <87y9rc5yxv.fsf@cachemir.echo-net.net> <20010601140034.C1987@wonderland.linux.it> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.17i In-Reply-To: 
<20010601140034.C1987@wonderland.linux.it>; from md@Linux.IT on Fri, Jun 01, 2001 at 02:00:34PM +0200 Delivered-To: 99324@bugs.debian.org On Fri, Jun 01, 2001 at 02:00:34PM +0200, Marco d'Itri wrote: > On Jun 01, Roland Mas wrote: > > >> Most people (with the possible exception of part of the CJK > >> community) do not want to use unicode yet, deal with it. > > > >Excuse me? "With the possible exception of the CJK community"? What > >about people speaking (and writing/typing) Arabic, Hebrew, Greek, > >Russian and whatnot? > I don't know about Arabic and Hebrew, but russian people don't like > unicode and do not want to switch from the KOI-8 and KOI-8r encodings. > UNLESS they need to communicate with people from other charset groups (It directly concerns me - I need(ed) to work with both russian and slovak) > > I