ctnx.misc module
- ctnx.misc.normalize(text: str) -> str
Converts base characters followed by combining Unicode marks into their equivalent precomposed characters.
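For intuition, the closest standard-library analogue of this conversion is Unicode NFC normalization. The sketch below assumes `normalize` behaves like NFC on Vietnamese text. That is an assumption, not a description of the actual implementation, and `precompose` is a hypothetical name.

```python
import unicodedata


def precompose(text: str) -> str:
    """Hypothetical stand-in for ctnx.misc.normalize, assuming NFC semantics."""
    # NFC puts combining marks in canonical order, then recombines each base
    # letter with its marks into a single precomposed code point where one exists.
    return unicodedata.normalize("NFC", text)


# "Việt" typed with combining marks: e + U+0302 (circumflex) + U+0323 (dot below).
decomposed = "Vie\u0302\u0323t"
composed = precompose(decomposed)
print(len(decomposed), len(composed))  # 6 4
```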
- ctnx.misc.normalize_confusables(text: str) -> str
Converts text containing confusable characters into (potentially) normal text.
Replace similar-looking characters and homoglyphs with their equivalent Vietnamese characters. Small capital letters are converted to lowercase.
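A minimal sketch of the general technique, assuming a character-to-character lookup table followed by recomposition. The table `_CONFUSABLES` and the function `fold_confusables` are hypothetical and cover only a handful of Cyrillic homoglyphs and small capitals; the library's own table and approach are not documented here.

```python
import unicodedata

# Hypothetical, deliberately tiny confusables table (illustrative only).
_CONFUSABLES = str.maketrans({
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0421": "C",  # CYRILLIC CAPITAL LETTER ES
    "\u1d1b": "t",  # LATIN LETTER SMALL CAPITAL T -> lowercase t
    "\u026a": "i",  # LATIN LETTER SMALL CAPITAL I -> lowercase i
    "\u0274": "n",  # LATIN LETTER SMALL CAPITAL N -> lowercase n
    "\u0262": "g",  # LATIN LETTER SMALL CAPITAL G -> lowercase g
})


def fold_confusables(text: str) -> str:
    """Hypothetical analogue of normalize_confusables."""
    # Swap each look-alike for its Latin counterpart, then recompose so a
    # swapped base letter merges with any combining marks that follow it.
    return unicodedata.normalize("NFC", text.translate(_CONFUSABLES))


print(fold_confusables("\u0421\u0430\u043e"))  # Cao
print(fold_confusables("\u1d1b\u026a\u1ebf\u0274\u0262"))  # tiếng
```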
- ctnx.misc.remove_diacritics(text: str) -> str
Remove all diacritics from text.
Replace characters that carry diacritics with their equivalent ASCII characters.
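A common way to do this for Vietnamese is to decompose to NFD and drop every combining mark. The sketch below assumes that approach (the library may work differently) and uses the hypothetical name `strip_diacritics`. It special-cases đ/Đ because Unicode encodes their stroke as part of the letter, so decomposition cannot separate it. It only handles Latin letters with combining diacritics, so letters such as ø from other languages would pass through unchanged.

```python
import unicodedata

# đ/Đ do not decompose under NFD, so map them explicitly.
_STROKED = str.maketrans({"đ": "d", "Đ": "D"})


def strip_diacritics(text: str) -> str:
    """Hypothetical analogue of remove_diacritics for Vietnamese text."""
    # NFD splits each precomposed letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text.translate(_STROKED))
    # Keep only characters whose canonical combining class is 0 (i.e. not marks).
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("Tiếng Việt đẹp"))  # Tieng Viet dep
```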
- ctnx.misc.remove_tones(text: str) -> str
Remove tone marks from text.
Replace characters that carry tone marks with their equivalent untoned characters. Other diacritics (such as the circumflex, breve and horn) are kept.
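One way to remove only the tones is to decompose, drop the five combining tone marks, and recompose. This is a sketch of that technique under the assumption that the input is Vietnamese; `remove_tone_marks` is a hypothetical name, not the library's implementation. Note that it would also strip these marks from non-Vietnamese letters (e.g. the tilde in ñ).

```python
import unicodedata

# The five Vietnamese tone marks as combining characters. Circumflex (U+0302),
# breve (U+0306) and horn (U+031B) are vowel diacritics, not tones, so they stay.
_TONE_MARKS = {"\u0300", "\u0301", "\u0303", "\u0309", "\u0323"}


def remove_tone_marks(text: str) -> str:
    """Hypothetical analogue of remove_tones."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in _TONE_MARKS)
    # Recompose so that e.g. "e" + circumflex becomes the single code point "ê".
    return unicodedata.normalize("NFC", kept)


print(remove_tone_marks("Tiếng Việt"))  # Tiêng Viêt
```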
- ctnx.misc.sep_tone_from_char(char: str)
Extract the tone mark from a character.
The returned tone is denoted as follows:
  - '': unmarked (ngang)
  - '/': acute accent (sắc)
  - '\': grave accent (huyền)
  - '?': hook above (hỏi)
  - '~': tilde (ngã)
  - '.': dot below (nặng)
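The sketch below illustrates how a tone mark can be split off a single character using this notation. The grave-accent symbol is assumed to be a backslash. The return shape, a `(untoned_char, tone)` tuple, is also an assumption since the return type is not documented, and `split_tone` is a hypothetical name.

```python
import unicodedata

# Combining tone marks mapped to the tone notation (grave assumed to be "\").
_TONE_SYMBOLS = {
    "\u0301": "/",   # acute (sắc)
    "\u0300": "\\",  # grave (huyền)
    "\u0309": "?",   # hook above (hỏi)
    "\u0303": "~",   # tilde (ngã)
    "\u0323": ".",   # dot below (nặng)
}


def split_tone(char: str) -> tuple[str, str]:
    """Hypothetical analogue of sep_tone_from_char returning (untoned char, tone)."""
    tone = ""  # unmarked (ngang) unless a tone mark is found
    rest = []
    for ch in unicodedata.normalize("NFD", char):
        if ch in _TONE_SYMBOLS:
            tone = _TONE_SYMBOLS[ch]
        else:
            rest.append(ch)
    # Recompose the base letter with any remaining vowel diacritics.
    return unicodedata.normalize("NFC", "".join(rest)), tone


print(split_tone("ệ"))  # ('ê', '.')
print(split_tone("a"))  # ('a', '')
```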