Russkaja Latinica Specification
Copyright (c) 1995 Alexy Khrabrov and Serge Winitzki
This is the complete specification of Russkaja Latinica. For a less
formal description, see the Overview and the User's Guide.
General
All Russian letters are represented by one or two Latin characters, with
the exception of "tshcha", which can be represented as either "q", "w", or
"shch".
To each Russian letter in a given context, there corresponds at least one
Latin combination, and to each Latin letter or combination there
corresponds only one Russian letter. The defined two-letter Latin
combinations are always interpreted as one Russian letter.
Latin characters are either "robust" or "malleable". Robust ones always
translate to the same Russian characters. Malleable ones are translated to
different Russian characters depending on the word context. All two-letter
Latin combinations are robust. All Latin letters are robust. Only the
following symbols are malleable:
' ` ~ @
Sometimes more than one combination of Latin letters corresponds to the
same Russian letter. One can therefore choose among these combinations
according to personal preference.
Specification
When converting from a native encoding to Latinica, the Russian
alphabet is transliterated as follows:
Uppercase:
A, B, V, G, D, E, YO, ZH, Z, I, J, K, L, M, N, O, P, R, S, T, U, F, X,
C, CH, SH, W, ~, Y, ', E', YU, YA
Lowercase:
a, b, v, g, d, e, yo, zh, z, i, j, k, l, m, n, o, p, r, s, t, u, f, x,
c, ch, sh, w, ~, y, ', e', yu, ya
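The table above can be sketched as a lookup in Python. This is a minimal illustration, not a reference implementation: the names CYRILLIC_LOWER, LATINICA_LOWER, TO_LATINICA and to_latinica are assumptions made for this example, and the input is assumed to be a Unicode string rather than a legacy native encoding. The sketch converts letter by letter and does not insert the \ digraph breaker, so adjacent letters whose Latin spellings happen to form a defined combination are not protected.

```python
# Default Latinica spellings for the 33 lowercase Russian letters,
# in the alphabetical order used by the table above.
CYRILLIC_LOWER = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"
LATINICA_LOWER = [
    "a", "b", "v", "g", "d", "e", "yo", "zh", "z", "i", "j", "k", "l", "m",
    "n", "o", "p", "r", "s", "t", "u", "f", "x", "c", "ch", "sh", "w", "~",
    "y", "'", "e'", "yu", "ya",
]

TO_LATINICA = dict(zip(CYRILLIC_LOWER, LATINICA_LOWER))
# Uppercase letters use the uppercase spellings (YO, ZH, ...).
# The symbols ~ and ' are unchanged by upper(), matching the table.
TO_LATINICA.update({c.upper(): s.upper() for c, s in TO_LATINICA.items()})


def to_latinica(text: str) -> str:
    """Transliterate letter by letter; other characters pass through."""
    return "".join(TO_LATINICA.get(ch, ch) for ch in text)
```

With these definitions, to_latinica("хорошо") gives "xorosho".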
When translating to Latinica, there are several alternatives for
certain letters in the alphabet above. The letter "tshcha" is rendered by a
single letter q or w, or by the combination sj or
shch. There is also an option to make w synonymous with
v instead of "tshcha", and an option to switch off sj.
The soft sign is ' or `, and the hard sign is ~. The
letter "E oborotnoe" can be aliased by @.
Malleable symbols
The malleable symbols @, ', ` and ~ (meaning
"e'" and the soft and the hard sign) are lowercase or uppercase depending
on the case of surrounding letters, according to these rules:
- A symbol is considered to be surrounded by lowercase letters
if at least one of its neighbor letters is lowercase. A symbol is
surrounded by uppercase letters if all of its neighbor
letters are uppercase.
- If a malleable symbol is surrounded by lowercase letters or used
alone, it's lowercase; if surrounded by uppercase letters, it's uppercase.
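The two rules above can be sketched as a small Python function. This is an illustration under stated assumptions: the name malleable_is_upper is hypothetical, and "neighbor letters" is taken to mean the characters immediately before and after the symbol, counted only when they are letters.

```python
def malleable_is_upper(text: str, index: int) -> bool:
    """Return True if the malleable symbol at text[index] is uppercase."""
    # Collect the immediately adjacent characters that are letters.
    neighbors = []
    if index > 0 and text[index - 1].isalpha():
        neighbors.append(text[index - 1])
    if index + 1 < len(text) and text[index + 1].isalpha():
        neighbors.append(text[index + 1])
    # A symbol used alone (no neighbor letters) is lowercase.
    if not neighbors:
        return False
    # Uppercase only if every neighbor letter is uppercase; a single
    # lowercase neighbor makes the symbol lowercase.
    return all(ch.isupper() for ch in neighbors)
```

For example, the ' in "DEN'" has one uppercase neighbor and is treated as uppercase, while the ' in "El'cin" sits between lowercase letters and is lowercase.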
The malleable symbol ` stands for the soft sign with the same
uppercase or lowercase variations, unless it is used at the beginning of
the word; in this case it is translated as the back quote (i.e. not
changed).
Combining ^ or _ with a malleable symbol fixes the rendering of that
symbol as uppercase or lowercase, respectively.
Aliased combinations
The following two-letter combinations are defined to be identical:
- Ch = CH (as in "CHechnya")
- Sh = SH (as in "SHahmaty")
- Zh = ZH (as in "ZHiguli")
- jo = yo (as in "prijom")
- ja = ya (as in "zajac")
- ju = yu (as in "v'juga")
- YO = Yo = Jo = JO
- YA = Ya = Ja = JA
- YU = Yu = Ju = JU
- JE = Je = E (as in "Jel'cin")
- je = e (but not "ye"!)
- E` = E'
- e` = e' (as in "e`poha")
- J' = J` = J
- j' = j` = j (as in "maj`or" or "j'oga")
- SJ = Sj = Q = W
The following digraphs have special meaning:
- \' = ' [single quote]
- \` = ` [single back quote]
- \@ = @
- \~ = ~
- \\ = \ [single backslash]
- \ [backslash + space] switches between English and Russian
modes.
- ^ with malleable symbols (@ ` ' ~) fixes their case (see "Malleable symbols")
- _ with malleable symbols (@ ` ' ~) fixes their case (see "Malleable symbols")
The following characters are also defined:
- \ = [no character output] - used to break digraphs
- @ = e' or E' (it's a malleable symbol)
- h = x (as in "xorosho") unless preceded by C, S, Z, c, s, z.
- H = X unless preceded by C, S, Z.
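The h and H rules above can be sketched in Python as a pass that rewrites every free-standing h as x and leaves the digraphs ch, sh and zh alone. The function name normalize_h is an assumption made for this example. The sketch looks only at the immediately preceding character and does not handle the other escapes defined in this section.

```python
# Letters that, when they precede h or H, make it part of a digraph.
BEFORE_LOWER_H = {"c", "s", "z", "C", "S", "Z"}
BEFORE_UPPER_H = {"C", "S", "Z"}


def normalize_h(text: str) -> str:
    """Replace h with x and H with X unless they complete a digraph."""
    out = []
    for i, ch in enumerate(text):
        prev = text[i - 1] if i > 0 else ""
        if ch == "h" and prev not in BEFORE_LOWER_H:
            out.append("x")
        elif ch == "H" and prev not in BEFORE_UPPER_H:
            out.append("X")
        else:
            out.append(ch)
    return "".join(out)
```

For example, normalize_h("horosho") yields "xorosho": the first h becomes x, while the h in "sh" is kept.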
By default, the letter w is defined as "tshcha":
- w = q = sj [lowercase "tshcha"]
- W = Q = SJ [uppercase "tshcha"]
However, there is an option to make it an alias for v.
Combining English and Russian text
To include a portion of English text, enclose it in \ signs
(backslash with a following space). After a \ , no translation is
done until another \ is encountered. The \ combination
itself produces no output.
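The mode switching described above can be sketched in Python by splitting the input at every backslash-plus-space combination. This is a simplified illustration: the name split_modes is hypothetical, the text is assumed to start in Russian mode, and an escaped backslash followed by a space (\\ and then a space) is not treated specially, although a full implementation would have to handle it.

```python
def split_modes(text: str) -> list[tuple[bool, str]]:
    """Return (is_russian, segment) pairs for a mixed-language text.

    Each backslash+space toggles the mode and produces no output itself.
    Even-numbered segments are Russian (to be translated), odd-numbered
    segments are English (to be copied unchanged).
    """
    parts = text.split("\\ ")
    return [(i % 2 == 0, part) for i, part in enumerate(parts) if part]
```

A converter would then translate only the segments flagged as Russian and copy the rest verbatim.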
Emphasizing parts of text
The symbols * (star) and _ (underscore) are not used by
Russkaja Latinica; they may be used for emphasizing words or parts
of text. Capitalization of words or sentences is considered bad NETiquette
and is not advised.