How to make tr aware of non-ASCII (Unicode) characters?

I'm trying to remove some characters from a UTF-8 file. I'm using tr for this purpose:

tr -cs '[[:alpha:][:space:]]' ' ' <testdata.dat

The file contains some foreign characters (like "Латвийская" or "àé"). tr doesn't seem to understand them: it treats them as non-alphabetic and removes them too.

I've tried changing some of my locale settings:

LC_CTYPE=C LC_COLLATE=C tr -cs '[[:alpha:][:space:]]' ' ' <testdata.dat
LC_CTYPE=ru_RU.UTF-8 LC_COLLATE=C tr -cs '[[:alpha:][:space:]]' ' ' <testdata.dat
LC_CTYPE=ru_RU.UTF-8 LC_COLLATE=ru_RU.UTF-8 tr -cs '[[:alpha:][:space:]]' ' ' <testdata.dat

Unfortunately, none of these worked.

How can I make tr understand Unicode?

edited Sep 9 '15 at 14:53 by Toby Speight
asked Sep 9 '15 at 12:57 by MatthewRock

linux text-processing unicode tr

1 Answer

That's a known limitation of the GNU implementation of tr.

It's not so much that it doesn't support foreign, non-English or non-ASCII characters as that it doesn't support multi-byte characters.

Those Cyrillic characters would be treated OK, if written in the iso8859-5 (single-byte per character) character set (and your locale was using that charset), but your problem is that you're using UTF-8 where non-ASCII characters are encoded in 2 or more bytes.
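To see the multi-byte encoding concretely, a small sketch can count the bytes UTF-8 uses per character. The helper name byte_len is made up for illustration, and the octal escapes are the UTF-8 byte sequences of é (U+00E9) and Л (U+041B):

```shell
# byte_len is a hypothetical helper: it prints how many bytes its argument occupies.
byte_len() {
  printf '%s' "$1" | wc -c | tr -d '[:space:]'
}

byte_len 'a'; echo                      # ASCII letter: 1 byte
byte_len "$(printf '\303\251')"; echo   # é in UTF-8: 2 bytes
byte_len "$(printf '\320\233')"; echo   # Л in UTF-8: 2 bytes
```

A tr that works byte by byte sees each of those two bytes as a separate character, and neither of them is alphabetic on its own.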

GNU has a plan to fix that, and work is under way, but it's not there yet.

The FreeBSD and Solaris implementations of tr don't have this problem.

In the meantime, for most use cases of tr, you can use GNU sed or GNU awk, which do support multi-byte characters (they are written gsed and gawk below; on many Linux systems GNU sed is installed as plain sed).

For instance, your:

tr -cs '[[:alpha:][:space:]]' ' '

could be written:

gsed -E 's/( |[^[:space:][:alpha:]])+/ /g'

or:

gawk -v RS='( |[^[:space:][:alpha:]])+' '{printf "%s", sep $0; sep=" "}'

To convert between lower and upper case (tr '[:upper:]' '[:lower:]'):

gsed 's/[[:upper:]]/\l&/g'

(that l is a lowercase L, not the digit 1).

or:

gawk '{print tolower($0)}'

For portability, perl is another alternative:

perl -Mopen=locale -pe 's/([^[:space:][:alpha:]]| )+/ /g'
perl -Mopen=locale -pe '$_=lc$_'
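A quick way to sanity-check the first perl command on sample input is sketched below. It swaps -Mopen=locale for perl's -CS switch, which treats standard input, output and error as UTF-8 regardless of the locale; that swap, the function name squeeze_non_alpha and the sample text are assumptions made for this illustration, not part of the commands above:

```shell
# squeeze_non_alpha is a hypothetical wrapper name.
# -CS makes perl decode stdin and encode stdout/stderr as UTF-8,
# so the result does not depend on the current locale.
squeeze_non_alpha() {
  perl -CS -pe 's/([^[:space:][:alpha:]]| )+/ /g'
}

# Punctuation, digits and runs of spaces collapse into single spaces;
# the Cyrillic and accented letters are kept.
printf 'Латвийская, àé!!  42 ok\n' | squeeze_non_alpha
```

If the input is known to be UTF-8, this form avoids surprises when the script runs under a C or single-byte locale.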

If you know the data can be represented in a single-byte character set, then you can process it in that charset:

(export LC_ALL=ru_RU.iso88595
 iconv -f utf-8 |
 tr -cs '[:alpha:][:space:]' ' ' |
 iconv -t utf-8) < Russian-file.utf8
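That pipeline only works if the single-byte locale is actually installed. A minimal check, assuming the system provides the standard locale -a command, could look like this (have_locale is a hypothetical helper name, and the exact spelling of locale names varies between systems):

```shell
# have_locale is a hypothetical helper: it succeeds if the given name
# appears (case-insensitively, as a whole line) in the output of `locale -a`.
have_locale() {
  locale -a 2>/dev/null | grep -qixF -- "$1"
}

if have_locale ru_RU.iso88595; then
  echo "ru_RU.iso88595 is available"
else
  echo "ru_RU.iso88595 is not installed; pick another single-byte locale from: locale -a"
fi
```

If the locale is missing, any other installed locale whose charset covers the data can be substituted in LC_ALL.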

edited 10 hours ago
answered Sep 9 '15 at 13:47 by Stéphane Chazelas

I've accepted your answer because of the information about tr. I've solved the problem and removed the part of the question about how to solve it (so people looking for tr will find only information about tr, not some arbitrary problem). If you could please remove the solution too, since it's no longer needed, I'd be thankful.

– MatthewRock
Sep 9 '15 at 14:41

@MatthewRock I've kept it but reworded it and made it more generic, as giving a workaround would be useful to people with the same problem.

– Stéphane Chazelas
Sep 9 '15 at 15:22

Where do you get the idea that Cyrillic is (customarily) encoded in ISO 8859-5? Did you ever see a Russian text in anything but Unicode?

– Incnis Mrsi
Sep 9 '15 at 16:35

@IncnisMrsi, all that matters here is that ISO 8859-5 is one of those single-byte charsets that has those Cyrillic characters. Whether it's in widespread use or not is irrelevant here. If you have a locale with the KOI8-R or windows-1251 charset, by all means, use it instead.

– Stéphane Chazelas
Sep 9 '15 at 16:43

@IncnisMrsi Russian on the web is almost always encoded in UTF-8 (or occasionally in Windows-1251), but only because we’ve felt the pain of many single-byte encodings early on. Here’s an ancient (circa 1998) web page with a (non-functional) encoding switcher: sch57.ru/collect.

– Alex Shpilkin
Apr 18 '18 at 21:06
