Regular Expression in Python – Set 1:

A regular expression (RE) specifies a set of strings that match it. Python provides regular expressions through its built-in re module, which we use in data science.

To understand how regular expressions work, it is important to know the metacharacters: characters that have a special meaning inside a pattern.

There are 14 metacharacters in total: . ^ $ * + ? { } [ ] \ | ( )

Regular Expression in Python for Data Science – Set 1 - PST Analytics
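The short sketch below exercises a few of these metacharacters with re.findall() and re.search(). The sample strings are made up purely for illustration.

```python
import re

text = "cat cot cut coat"  # hypothetical sample text

# '.' matches any single character (except a newline)
dot_matches = re.findall(r"c.t", text)

# '|' separates alternatives
alt_matches = re.findall(r"cat|cut", text)

# '*' repeats the previous item zero or more times
star_matches = re.findall(r"co*t", "ct cot coot")

# '^' and '$' anchor a match to the start and end of the string
starts_with_cat = re.search(r"^cat", text) is not None
ends_with_coat = re.search(r"coat$", text) is not None

print(dot_matches)                      # ['cat', 'cot', 'cut']
print(alt_matches)                      # ['cat', 'cut']
print(star_matches)                     # ['ct', 'cot', 'coot']
print(starts_with_cat, ends_with_coat)  # True True
```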

  • Function compile():

    Regular expressions are compiled into pattern objects, which have methods for various operations such as searching for pattern matches and performing string substitutions.
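As a minimal sketch (the date pattern and log string are hypothetical examples), compiling a pattern once and reusing the resulting pattern object could look like this:

```python
import re

# Compile the pattern once into a Pattern object, then reuse it.
date_pattern = re.compile(r"\d{4}-\d{2}-\d{2}")

log = "Run started 2023-01-15, finished 2023-01-16."  # hypothetical sample

# The Pattern object offers the same operations as the module-level functions.
first = date_pattern.search(log)             # first match, as a Match object
all_dates = date_pattern.findall(log)        # every match, as a list of strings
redacted = date_pattern.sub("<date>", log)   # substitution

print(first.group())  # 2023-01-15
print(all_dates)      # ['2023-01-15', '2023-01-16']
print(redacted)       # Run started <date>, finished <date>.
```

Compiling is mainly a convenience when the same pattern is used many times; the module-level functions such as re.search() also accept a pattern string directly.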

The backslash metacharacter ‘\’ is very important because it introduces various special sequences (for example, \s for whitespace). To match a literal backslash without any special meaning, we escape it in the pattern as ‘\\’.

The character class [\s,.] matches any whitespace character, a comma ‘,’, or a period ‘.’.
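A small illustration of both points, assuming a made-up Windows-style path and sentence. Raw strings (r"...") are used so that Python does not process the backslashes before the regex engine sees them:

```python
import re

path = r"C:\data\raw"  # hypothetical sample containing two literal backslashes

# The pattern \\ (written as the raw string r"\\") matches one literal backslash.
backslashes = re.findall(r"\\", path)

sentence = "Hello, world. Bye"  # hypothetical sample

# [\s,.] matches a whitespace character, a comma, or a period.
separators = re.findall(r"[\s,.]", sentence)

print(len(backslashes))  # 2
print(separators)        # [',', ' ', '.', ' ']
```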


  • Function split():

    split() splits a string at each occurrence of a character or pattern and returns the remaining pieces of the string as a list.

Syntax:

re.split(pattern, string, maxsplit=0, flags=0)

Here the first parameter, ‘pattern’, is the regular expression. The ‘string’ parameter is the string that is searched for the pattern and then split. If ‘maxsplit’ is not given, it defaults to zero, which means there is no limit on the number of splits; if a nonzero value is given, at most that many splits occur and the rest of the string is returned as the final element of the list. The optional ‘flags’ parameter changes how the pattern is matched (for example, re.IGNORECASE for case-insensitive matching) and can help shorten the code, but it is not required.
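A sketch of the three parameters in use; the input strings are hypothetical samples. Note that maxsplit and flags are passed as keyword arguments, which is the clearer style:

```python
import re

record = "alice, bob;carol  dave"  # hypothetical sample
separators = r"[,;\s]+"            # a comma, semicolon or whitespace, repeated

# Split at every run of separators.
all_parts = re.split(separators, record)

# maxsplit=2 allows at most two splits; the rest stays in the last item.
limited_parts = re.split(separators, record, maxsplit=2)

# flags=re.IGNORECASE makes the lowercase pattern "x" also match "X".
ci_parts = re.split(r"x", "1x2X3", flags=re.IGNORECASE)

print(all_parts)      # ['alice', 'bob', 'carol', 'dave']
print(limited_parts)  # ['alice', 'bob', 'carol  dave']
print(ci_parts)       # ['1', '2', '3']
```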

  • Function sub():

Syntax:

re.sub(pattern, repl, string, count=0, flags=0)

The name ‘sub’ stands for substitute. The function searches ‘string’ for the regular expression ‘pattern’ and replaces each match with ‘repl’. The ‘count’ parameter sets the maximum number of replacements; when it is 0 (the default), every occurrence is replaced.
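A minimal sketch, using a made-up sentence, that shows the effect of ‘count’:

```python
import re

text = "colour, flavour and humour"  # hypothetical sample

# Replace every "our" at the end of a word (count=0, the default, means no limit).
all_replaced = re.sub(r"our\b", "or", text)

# count=1 stops after the first replacement.
one_replaced = re.sub(r"our\b", "or", text, count=1)

print(all_replaced)  # color, flavor and humor
print(one_replaced)  # color, flavour and humour
```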


  • Function subn():

Syntax:

re.subn(pattern, repl, string, count=0, flags=0)

The functions sub() and subn() are similar but differ in their output: subn() returns a tuple containing the new string and the total number of replacements made, rather than just the new string.
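A short sketch of unpacking the tuple returned by subn(); the input string is a hypothetical sample:

```python
import re

text = "2 apples and 10 oranges"  # hypothetical sample

# subn() returns (new_string, number_of_replacements) instead of just the string.
new_text, n_replaced = re.subn(r"\d+", "#", text)

print(new_text)    # # apples and # oranges
print(n_replaced)  # 2
```

The count is handy when you need to know whether, or how often, a cleaning step actually changed the data.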


  • Function escape():

Syntax:

re.escape(string)

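It returns the string with its special characters escaped by backslashes (in Python 3.7 and later, only characters that have a special meaning in a regular expression are escaped; earlier versions escaped almost all non-alphanumeric characters). This is very useful for matching an arbitrary literal string that may contain regular expression metacharacters.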
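The sketch below, built on made-up strings, contrasts searching for a literal piece of text with and without re.escape():

```python
import re

# A literal string that happens to contain metacharacters: '(', ')', '.', '+'
needle = "price (USD) 1.5+tax"              # hypothetical sample
haystack = "Total: price (USD) 1.5+tax = 3"  # hypothetical sample

escaped = re.escape(needle)

# Unescaped, '(' and ')' form a group and '.' and '+' keep their special
# meaning, so the pattern no longer matches the literal text.
unescaped_result = re.search(needle, haystack)

# Escaped, every character is matched literally.
escaped_result = re.search(escaped, haystack)

print(unescaped_result)        # None
print(escaped_result.group())  # price (USD) 1.5+tax
```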

To learn more about regular expressions in Python for data science, you can check this and this as well.
