Unicodedata – Unicode Database in Python:

The Unicode Character Database (UCD) is defined by Unicode Standard Annex #44, which specifies the character properties of all Unicode characters. Python's unicodedata module gives access to the UCD and uses the same names and symbols as the UCD itself.
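As a quick check of which UCD release a given Python build ships with, the module exposes a version string. This is only a minimal sketch; the variable name ucd_version is chosen here for illustration.

```python
import unicodedata

# unidata_version is the version of the Unicode database bundled with
# this Python build; the exact value depends on the interpreter version.
ucd_version = unicodedata.unidata_version
print("Unicode database version:", ucd_version)
```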

Functions defined by the module:
  • unicodedata.lookup(name) – Looks up a character by name. If a character with the given name is found, that character is returned; otherwise a KeyError is raised.
  • unicodedata.name(chr[, default]) – Returns the name assigned to the character chr as a string. If no name is defined, default is returned; if default is not given, a ValueError is raised.
  • unicodedata.decimal(chr[, default]) – Returns the decimal value assigned to the character chr as an integer. If no such value is defined, default is returned; if default is not given, a ValueError is raised.
  • unicodedata.digit(chr[, default]) – Returns the digit value assigned to the character chr as an integer. If no such value is defined, default is returned; if default is not given, a ValueError is raised.
  • unicodedata.numeric(chr[, default]) – Returns the numeric value assigned to the character chr as a float. If no such value is defined, default is returned; if default is not given, a ValueError is raised.
  • unicodedata.category(chr) – Returns the general category assigned to the character chr as a string. Example: it returns ‘Lu’ for an uppercase letter and ‘Ll’ for a lowercase letter.
  • unicodedata.bidirectional(chr) – Returns the bidirectional class assigned to the character chr as a string. If no such value is defined, an empty string is returned.
  • unicodedata.normalize(form, unistr) –

    Returns the normal form for the Unicode string unistr. Valid values for form are ‘NFC’, ‘NFKC’, ‘NFD’, and ‘NFKD’.
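
The functions listed above can be tried out in a short session. The following is a minimal sketch rather than a complete tour; the sample characters and variable names are picked here purely for illustration.

```python
import unicodedata

# lookup() and name() are inverses for named characters.
omega = unicodedata.lookup("GREEK CAPITAL LETTER OMEGA")
omega_name = unicodedata.name(omega)

# Control characters have no name; passing a default avoids ValueError.
newline_name = unicodedata.name("\n", "<unnamed>")

# decimal() and digit() return int, numeric() returns float.
nine = unicodedata.decimal("9")
superscript_two = unicodedata.digit("\u00b2")    # SUPERSCRIPT TWO
one_half = unicodedata.numeric("\u00bd")         # VULGAR FRACTION ONE HALF

# SUPERSCRIPT TWO has a digit value but no decimal value,
# so decimal() falls back to the supplied default.
no_decimal = unicodedata.decimal("\u00b2", None)

# General categories: Lu = uppercase letter, Ll = lowercase letter,
# Nd = decimal digit.
categories = [unicodedata.category(c) for c in "Aa9"]

# Bidirectional classes: L = left-to-right, R = right-to-left.
latin_bidi = unicodedata.bidirectional("A")
hebrew_bidi = unicodedata.bidirectional("\u05d0")  # HEBREW LETTER ALEF

# NFD splits a precomposed character into base + combining mark,
# NFC puts it back together, NFKC also replaces compatibility
# characters such as ligatures.
composed = "\u00e9"                                # e with acute accent
decomposed = unicodedata.normalize("NFD", composed)
recomposed = unicodedata.normalize("NFC", decomposed)
ligature = unicodedata.normalize("NFKC", "\ufb01")  # LATIN SMALL LIGATURE FI

print(omega, omega_name, newline_name)
print(nine, superscript_two, one_half, no_decimal)
print(categories, latin_bidi, hebrew_bidi)
print(len(decomposed), recomposed == composed, ligature)
```

Comparing strings only after normalizing both sides to the same form is a common reason to reach for normalize() when cleaning text data.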

To learn more about Unicode in Python for data science, you can check this and this as well.
