ctnx.misc module

ctnx.misc.normalize(text: str) → str

Converts combining Unicode characters to their equivalent precomposed characters.
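To illustrate what composing means, the sketch below uses the standard library's unicodedata.normalize with form NFC. This is an assumption about the general technique, not a statement of how ctnx.misc.normalize is implemented.

```python
import unicodedata

# "e" followed by COMBINING CIRCUMFLEX ACCENT and COMBINING ACUTE ACCENT
decomposed = "e\u0302\u0301"

# NFC merges a base letter and its combining marks into one precomposed
# character wherever Unicode defines one
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))  # 3 1
print(composed == "\u1ebf")            # True (U+1EBF, "ế")
```

Both strings render identically, but only the composed one compares equal to text typed with a Vietnamese keyboard that emits precomposed characters.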

ctnx.misc.normalize_confusables(text: str) → str

Converts text containing confusable characters to its likely intended form.

Replaces similar-looking characters and homoglyphs with their equivalent Vietnamese characters. Small-capital letters are converted to lowercase.
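The general idea of folding homoglyphs can be sketched with a translation table. The three-entry mapping and the helper name fold_confusables_sketch are hypothetical and chosen only for illustration; the table that ctnx actually uses is not shown in this reference.

```python
# Hypothetical mapping for illustration only, not the table used by ctnx
CONFUSABLES = str.maketrans({
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u1d00": "a",  # LATIN LETTER SMALL CAPITAL A
})


def fold_confusables_sketch(text: str) -> str:
    """Replace each mapped look-alike with its plain Latin counterpart."""
    return text.translate(CONFUSABLES)


# The middle letter is Cyrillic, although it looks like a Latin "a"
print(fold_confusables_sketch("b\u0430n"))  # ban
```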

ctnx.misc.remove_diacritics(text: str) → str

Removes all diacritics from text.

Replaces each character that carries diacritics with its equivalent ASCII character.
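A rough equivalent can be written with the standard library alone. The helper strip_diacritics_sketch below is hypothetical and only approximates the documented behaviour; it assumes that decomposing to NFD and dropping combining marks is sufficient, with "đ" handled separately.

```python
import unicodedata


def strip_diacritics_sketch(text: str) -> str:
    """Hypothetical stand-in for illustration; not the ctnx implementation."""
    # "đ" and "Đ" have no canonical decomposition, so map them explicitly
    text = text.replace("đ", "d").replace("Đ", "D")
    # NFD splits each letter into a base character plus combining marks
    decomposed = unicodedata.normalize("NFD", text)
    # combining() returns a non-zero class for combining marks; drop those
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics_sketch("Tiếng Việt đẹp"))  # Tieng Viet dep
```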

ctnx.misc.remove_tones(text: str) → str

Removes tone marks from text.

Replaces each character that carries a tone mark with its equivalent non-toned character. Other diacritics are kept.
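The difference from removing all diacritics is that only the five tone marks are dropped, so letters such as "ê" and "ư" survive. The sketch below assumes this can be done by filtering specific combining code points; strip_tones_sketch is a hypothetical name, not a ctnx function.

```python
import unicodedata

# Combining code points for the five marked Vietnamese tones:
# acute, grave, hook above, tilde, dot below
TONE_MARKS = {"\u0301", "\u0300", "\u0309", "\u0303", "\u0323"}


def strip_tones_sketch(text: str) -> str:
    """Hypothetical stand-in for illustration; not the ctnx implementation."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    # Recompose so remaining marks (circumflex, breve, horn) rejoin their letters
    return unicodedata.normalize("NFC", kept)


print(strip_tones_sketch("Tiếng Việt"))  # Tiêng Viêt
```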

ctnx.misc.sep_tone_from_char(char: str)

Extracts the tone mark from a character.

The returned tone is denoted as follows:
  • ‘’: unmarked (ngang)
  • ‘/’: acute accent (sắc)
  • ‘\’: grave accent (huyền)
  • ‘?’: hook above (hỏi)
  • ‘~’: tilde (ngã)
  • ‘.’: dot below (nặng)

Parameters:

char (str) – The character from which the tone will be extracted

Returns:

A tuple containing the character without its tone mark, followed by the tone

Return type:

tuple
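The shape of the returned pair can be pictured with a standard-library sketch. split_tone_sketch is a hypothetical helper, and the backslash used for the grave accent is an assumption based on the symbols listed above; the real function may differ in both respects.

```python
import unicodedata
from typing import Tuple

# Combining tone mark -> symbol, following the tone list above
TONE_SYMBOLS = {
    "\u0301": "/",   # acute accent (sắc)
    "\u0300": "\\",  # grave accent (huyền); backslash is assumed
    "\u0309": "?",   # hook above (hỏi)
    "\u0303": "~",   # tilde (ngã)
    "\u0323": ".",   # dot below (nặng)
}


def split_tone_sketch(char: str) -> Tuple[str, str]:
    """Hypothetical stand-in for illustration; not the ctnx implementation."""
    tone = ""  # empty string means unmarked (ngang)
    rest = []
    for ch in unicodedata.normalize("NFD", char):
        if ch in TONE_SYMBOLS:
            tone = TONE_SYMBOLS[ch]
        else:
            rest.append(ch)
    # Recompose so non-tone diacritics stay attached to the base letter
    return unicodedata.normalize("NFC", "".join(rest)), tone


print(split_tone_sketch("ế"))  # ('ê', '/')
print(split_tone_sketch("a"))  # ('a', '')
```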

ctnx.misc.separate_tone(text: str, all=False)

Extracts the tone mark from text.

The returned tone is denoted as follows:
  • ‘’: unmarked (ngang)
  • ‘/’: acute accent (sắc)
  • ‘\’: grave accent (huyền)
  • ‘?’: hook above (hỏi)
  • ‘~’: tilde (ngã)
  • ‘.’: dot below (nặng)

Parameters:
  • text (str) – The text from which the tone will be extracted

  • all (bool, default: False) – If set to True, the last tone will be returned instead of the first one

Returns:

A tuple containing the text without tone marks, followed by the tone

Return type:

tuple
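For text holding more than one toned character, the choice between the first and the last tone can be sketched as below. split_text_tone_sketch and its last flag are hypothetical; the flag mirrors the documented description of the all parameter, and the backslash for the grave accent is again an assumption.

```python
import unicodedata
from typing import Tuple

# Combining tone mark -> symbol, following the tone list above
TONE_SYMBOLS = {
    "\u0301": "/",   # acute accent (sắc)
    "\u0300": "\\",  # grave accent (huyền); backslash is assumed
    "\u0309": "?",   # hook above (hỏi)
    "\u0303": "~",   # tilde (ngã)
    "\u0323": ".",   # dot below (nặng)
}


def split_text_tone_sketch(text: str, last: bool = False) -> Tuple[str, str]:
    """Hypothetical stand-in for illustration; not the ctnx implementation."""
    tones = []
    rest = []
    for ch in unicodedata.normalize("NFD", text):
        if ch in TONE_SYMBOLS:
            tones.append(TONE_SYMBOLS[ch])
        else:
            rest.append(ch)
    stripped = unicodedata.normalize("NFC", "".join(rest))
    if not tones:
        return stripped, ""  # unmarked (ngang)
    return stripped, (tones[-1] if last else tones[0])


print(split_text_tone_sketch("tiếng"))                  # ('tiêng', '/')
print(split_text_tone_sketch("tiếng việt", last=True))  # ('tiêng viêt', '.')
```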