DSL Ideas and Suggestions :: Damn Small Characters for Interlanguage Interchang



Damn Small Characters for Interlingual Interchange (DaSCII)

-- a proposal --

THE PROBLEM

The major languages, together spoken by over 50% of people worldwide, are (in order): Chinese (962 million), English (322m), Spanish (266m), Russian (170m), Portuguese (170m), Japanese (125m), German (98m), Bengali (189m) & Hindi (182m).  Neither 7-bit nor 8-bit systems provide enough characters to directly cover these languages.

THE SOLUTION

By examining the transliteration systems for each of the languages, the number of unique characters can be drastically reduced.  The Indo-Aryan languages (Bengali, Hindi, et cetera) present the greatest difficulty due to a great number of diacritical and other marks.  Therefore:

1. Start with the IBM 8-bit character set.

2. Insert characters 177 - 250 from the Indian Script Code for Information Interchange.

3. Add the following characters:

  158 - € (the Euro symbol)
  166 - z (with a tail underneath, used in Arabic transliteration)
  167 - t (with a tail underneath, used in Arabic transliteration)
  169 - e rising tone (the other rising-tone characters are covered)
  170 - a falling-rising tone
  235 - e falling-rising tone
  236 - i falling-rising tone
  237 - o falling-rising tone
  238 - u falling-rising tone
  251 - s (with a tail underneath, used in Arabic transliteration)
  252 - d (with a tail underneath, used in Arabic transliteration)
  253 - o (with a shallow u-shaped mark above, used in Korean transliteration)
  254 - u (with a shallow u-shaped mark above, used in Korean transliteration)
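
For anyone who wants to experiment, the three steps above can be sketched in Python.  This is only a sketch under assumptions: it takes "the IBM 8-bit character set" to mean code page 437, leaves the ISCII slots as empty placeholders because Python ships no ISCII codec, and guesses a Unicode character for each added slot (cedilla for the "tail underneath", caron for the falling-rising tone, breve for the Korean vowels).  The name build_dascii_table is made up for illustration.

```python
def build_dascii_table():
    """Return a 256-entry list: slot number -> character (or None if unfilled)."""
    # Step 1: start from the IBM 8-bit set (assumed here to be code page 437).
    table = [bytes([code]).decode("cp437") for code in range(256)]

    # Step 2: reserve slots 177-250 for ISCII characters.
    # Placeholder only: the actual ISCII glyphs are not filled in by this sketch.
    for code in range(177, 251):
        table[code] = None

    # Step 3: the added characters. Every Unicode choice below is an assumption.
    additions = {
        158: "\u20ac",   # Euro symbol
        166: "z\u0327",  # z with a tail underneath (assumed: combining cedilla)
        167: "\u0163",   # t with a tail underneath (assumed: cedilla)
        169: "\u00e9",   # e rising tone (assumed: acute accent)
        170: "\u01ce",   # a falling-rising tone (assumed: caron)
        235: "\u011b",   # e falling-rising tone
        236: "\u01d0",   # i falling-rising tone
        237: "\u01d2",   # o falling-rising tone
        238: "\u01d4",   # u falling-rising tone
        251: "\u015f",   # s with a tail underneath (assumed: cedilla)
        252: "\u1e11",   # d with a tail underneath (assumed: cedilla)
        253: "\u014f",   # o with a shallow u-shaped mark (assumed: breve)
        254: "\u016d",   # u with a shallow u-shaped mark (assumed: breve)
    }
    # Applied last, so slots 235-238 override the ISCII reservation from step 2.
    for code, char in additions.items():
        table[code] = char
    return table


table = build_dascii_table()
```

Note that the order of the steps matters: slots 235 - 238 fall inside the 177 - 250 range, so step 3 has to be applied after step 2.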

USAGE

1. Where a character is found in DaSCII, use it.
2. Where a character is not found in DaSCII, use transliterations, for example:

  German - Use diphthongs suggested in the BGN/PCGN 2000 Agreement
  Russian - Use diphthongs suggested in the BGN/PCGN 1947 Agreement
  Romaji - Use the umlauted character for the high tone.
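
The two usage rules amount to a lookup with a fallback, roughly as in the sketch below.  Assumptions: code page 437 stands in for the finished DaSCII table, the FALLBACK map is a small illustrative excerpt of Russian letter combinations rather than a full transliteration table, and the names in_charset and to_dascii_text are invented for this example.

```python
# Illustrative excerpt only: a few Cyrillic letters and their Latin letter pairs.
FALLBACK = {
    "ж": "zh", "х": "kh", "ц": "ts", "ч": "ch",
    "ш": "sh", "щ": "shch", "ю": "yu", "я": "ya",
}


def in_charset(char):
    """Rule 1 test: is the character in the 8-bit set? (cp437 as a stand-in)"""
    try:
        char.encode("cp437")
        return True
    except UnicodeEncodeError:
        return False


def to_dascii_text(text):
    """Keep characters the set has; otherwise fall back to a transliteration."""
    out = []
    for char in text:
        if in_charset(char):
            out.append(char)            # rule 1: the character is found, use it
        elif char in FALLBACK:
            out.append(FALLBACK[char])  # rule 2: use the transliteration
        else:
            out.append("?")             # not covered by this small sketch
    return "".join(out)
```

For example, to_dascii_text("шя") gives "shya", while a word such as "über" passes through unchanged because the 8-bit set already has the umlauted u.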

== This will allow transliterations of the languages spoken by over 50% of people with _one_ font ==

what about the other 50% ?

why can't everyone communicate with just one language ?

why add yet another system ?

Quote (humpty @ Mar. 11 2008,22:41)
what about the other 50% ?

why can't everyone communicate with just one language ?

why add yet another system ?

Answering your questions in order:

1. Actually, the 50% figure came from looking at population statistics.  Looking at the actual transliteration systems, the figure will be far greater than 50%.

2. Social evolution.

3. It's _not_ "another system."  What I am proposing is to maximise the usefulness of the system we have, the 8-bit character set.

Remember, I am _not_ proposing a _language_ font, but a _transliteration_ font.  Its purpose is to make Damn Small Linux accessible to the greatest number of people and, therefore, maximally successful world-wide.

I've looked at the Anglo/American systems, the United Nations systems and the ex-Soviet Bloc systems.  The difficulty comes from the addition of marks to the roman consonants - the increase is explosive!  Use of diphthongs decreases the necessity for extra consonants.

The political choice is between a so-so solution that includes all the vowel variants but leaves South Asia in the lurch, and one that includes the South Asian characters and uses diphthongs to cover the rest of the systems.

Ultimately, there is a physical limit: an 8-bit byte allows only 256 characters.
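
To put rough numbers on that limit, here is a small count of how the proposal spends the 256 slots.  It only uses the slot numbers given in the proposal (177 - 250 for ISCII, plus the thirteen added characters); the variable names are made up for this sketch.

```python
TOTAL_SLOTS = 2 ** 8  # an 8-bit byte can hold 256 distinct values

# Slots taken from ISCII in step 2 of the proposal (177 through 250 inclusive).
iscii_slots = set(range(177, 251))

# Slots assigned to the extra characters in step 3 of the proposal.
added_slots = {158, 166, 167, 169, 170, 235, 236, 237, 238, 251, 252, 253, 254}

# Slots 235-238 appear in both groups, so count the union, not the sum.
changed_slots = iscii_slots | added_slots
kept_from_ibm = TOTAL_SLOTS - len(changed_slots)

print(len(iscii_slots), len(added_slots), len(changed_slots), kept_from_ibm)
```

So 83 slots change and 173 keep their IBM assignments, which is all the room there is for the plain Latin letters, digits, punctuation and the accented vowels already in the set.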

It *IS* another system. How many people who don't know English or use a Latin or Cyrillic alphabet use transliterated Latin characters to communicate?

Those already familiar with this particular alphabet most likely already speak the (pardon me) lingua franca of the Internet and of most programming, English. I don't see what's so bloody important about this subject that it requires at least two polls and yet another thread.

"Social evolution" isn't tied to transliteration but literacy and actual translation. If you want more people who don't speak English (or use Latin/Cyrillic alphabets) to use DSL, perhaps you can help add the characters they actually know and use.

Quote (lucky13 @ Mar. 12 2008,12:29)
It *IS* another system. How many people who don't know English or use a Latin or Cyrillic alphabet use transliterated Latin characters to communicate?

Those already familiar with this particular alphabet most likely already speak the (pardon me) lingua franca of the Internet and of most programming, English. I don't see what's so bloody important about this subject that it requires at least two polls and yet another thread.

"Social evolution" isn't tied to transliteration but literacy and actual translation. If you want more people who don't speak English (or use Latin/Cyrillic alphabets) to use DSL, perhaps you can help add the characters they actually know and use.

Actually, a lot of people learn transliterations, for the purpose of access to computers as a means to communicate with others.  For example, the following is a true statement about myself:  "Wo shi zhung wen xue shung."  It is meaningless without the vowel marks.  With the vowel marks, it is understandable to millions that "I am a Chinese language student."
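
A quick way to see why the marks matter: the textbook pinyin syllables below differ only in their tone mark, and stripping the marks collapses them into the same two letters.  This is just an illustration using Python's standard unicodedata module; the helper name strip_marks is made up.

```python
import unicodedata


def strip_marks(text):
    """Remove combining marks (tone marks, accents) from the text."""
    decomposed = unicodedata.normalize("NFD", text)  # split letter + mark
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Four different pinyin syllables: mother, hemp, horse, scold.
syllables = ["m\u0101", "m\u00e1", "m\u01ce", "m\u00e0"]

# Without the marks, all four become the same string.
stripped = {strip_marks(s) for s in syllables}
print(stripped)
```

All four syllables come out as plain "ma", which is the ambiguity a font with the marked vowels avoids.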

Your use of "lingua franca" illustrates the issue:  French was promoted as an "international" language when France was an economic power.  Loss of economic power reduced linguistic dominance to the historic relic of lingua franca.

The US has been declining as a percentage of the gross international product since 1970.  Soon, lingua yankee may be all that is left of our claim to linguistic dominance.

Ultimately, the issue is neither nations nor languages, but the limits of the 8-bit byte, which looks to be more durable than nations or languages.

It would have been much easier if we had gone with the PDP-8 and 12-bit bytes.  4096 characters could have phonetically covered the world.

At this point, this is becoming one of those "lighter-more filling" arguments that DSL seems to inspire.  I come down on the "lighter" side (8-bit font) for initial access to DSL.  Once one has discovered that DSL is really useful, one can switch to the "more filling" camp and use Unicode.
