A regular expression (RE) specifies a set of strings that match it. In Python, regular expressions are provided by the ‘re’ module, which we use frequently in data science.
To understand regular expressions, it is important to understand metacharacters: characters that have a special meaning inside a pattern instead of matching themselves literally.
There are a total of 14 metacharacters: . ^ $ * + ? { } [ ] \ | ( )
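The short sketch below exercises each of the 14 metacharacters once with the standard ‘re’ functions findall(), search() and fullmatch(). The sample strings are made up for illustration, and patterns are written as raw strings (r"...") so that Python does not interpret the backslashes first.

```python
import re

text = "cat cot cut coat"

# '.' matches any single character except a newline
dot = re.findall(r"c.t", text)                        # ['cat', 'cot', 'cut']

# '[ ]' defines a set: only 'a' or 'o' may appear between 'c' and 't'
char_set = re.findall(r"c[ao]t", text)                # ['cat', 'cot']

# '?' makes the preceding item optional
optional = re.findall(r"colou?r", "color colour")     # ['color', 'colour']

# '( )' groups alternatives and '|' means "or"
grouped = re.fullmatch(r"c(o|oa)t", "coat") is not None   # True

# '^' anchors at the start of the string, '$' at the end
anchored = re.search(r"^cat.*coat$", text) is not None    # True

# '+' means one or more, '*' means zero or more, '{m}' means exactly m times
plus = re.findall(r"o+", "fo foo fooo")               # ['o', 'oo', 'ooo']
star = re.findall(r"ab*", "a ab abbb")                # ['a', 'ab', 'abbb']
braces = re.findall(r"\b\d{3}\b", "12 345 6789")      # ['345']

# '\' escapes a metacharacter so that it is matched literally
literal_dot = re.findall(r"\.", "a.b.c")              # ['.', '.']
```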
Regular expressions are compiled into pattern objects, which have methods for various operations such as searching for pattern matches or performing string substitutions.
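As a minimal sketch of this, re.compile() turns a pattern into a reusable pattern object whose search(), findall() and sub() methods are then called directly. The date pattern and the log string are hypothetical examples chosen for illustration.

```python
import re

# Compile the pattern once into a pattern object
date_pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

log = "started 2023-01-15, finished 2023-02-01"

# search() returns a match object for the first match, or None
first = date_pattern.search(log)
first_date = first.group(0)            # '2023-01-15'

# findall() returns every match; with groups it returns tuples of groups
all_dates = date_pattern.findall(log)  # [('2023', '01', '15'), ('2023', '02', '01')]

# sub() performs a substitution with the same compiled pattern;
# \3/\2/\1 reorders the captured groups into day/month/year
reformatted = date_pattern.sub(r"\3/\2/\1", log)
# 'started 15/01/2023, finished 01/02/2023'
```

Compiling once is convenient when the same pattern is applied many times, since the pattern object can simply be reused.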
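The backslash ‘\’ is a particularly important metacharacter because it signals various special sequences; for example, ‘\d’ matches any decimal digit and ‘\s’ matches any whitespace character. It is also used to escape the other metacharacters, so to match a backslash literally, without any special meaning, we use ‘\\’.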
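One practical wrinkle is that Python string literals also treat ‘\’ specially, so the regex ‘\\’ has to be written "\\\\" in an ordinary string, or r"\\" as a raw string. The sketch below, using a made-up Windows-style path, shows both spellings giving the same result.

```python
import re

# In a normal Python string, "\\" is one backslash character,
# so the actual text here is C:\Users\data
path = "C:\\Users\\data"

# The regex \\ (one literal backslash), written without a raw string
parts_plain = re.split("\\\\", path)
# The same regex written as a raw string
parts_raw = re.split(r"\\", path)
# both give ['C:', 'Users', 'data']

# The backslash also introduces special sequences such as \d (a digit)
digits = re.findall(r"\d", "a1b22")    # ['1', '2', '2']
```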
The set class ‘[\s,.]’ matches any whitespace character, a comma ‘,’, or a period ‘.’. Inside a set, metacharacters such as ‘.’ lose their special meaning.
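A small illustration of this set on a made-up string: findall() lists each separator it finds, and adding ‘+’ lets sub() collapse runs of separators into a single space.

```python
import re

text = "red, green.blue yellow"

# [\s,.] matches one whitespace character, comma, or literal period
separators = re.findall(r"[\s,.]", text)     # [',', ' ', '.', ' ']

# [\s,.]+ matches a run of them; replace each run with one space
collapsed = re.sub(r"[\s,.]+", " ", text)    # 'red green blue yellow'
```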
The split() function splits a string at each occurrence of a character or a pattern; the parts of the string remaining between the matches are returned as elements of the resulting list. Its signature is:
- split(pattern, string, maxsplit=0, flags=0)
Here the first parameter, ‘pattern’, denotes the regular expression. The ‘string’ parameter gives the string in which the pattern is searched for and in which the splitting occurs. If ‘maxsplit’ is not given, it defaults to zero, which means there is no limit on the number of splits; if a nonzero value is given, at most that many splits occur and the remainder of the string is returned as the final element of the list. The ‘flags’ parameter is optional; flags such as re.IGNORECASE change how the pattern is matched and can help to shorten the pattern.
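The following sketch shows the three behaviours described above on made-up strings: an unlimited split, a split limited with ‘maxsplit’, and a case-insensitive split via the ‘flags’ parameter. ‘maxsplit’ and ‘flags’ are passed as keyword arguments for readability.

```python
import re

text = "one, two,three  four"

# Split on any run of commas and/or whitespace
all_parts = re.split(r"[,\s]+", text)
# ['one', 'two', 'three', 'four']

# maxsplit=2: at most two splits; the remainder is the last element
limited = re.split(r"[,\s]+", text, maxsplit=2)
# ['one', 'two', 'three  four']

# flags=re.IGNORECASE lets one pattern match 'and' in any letter case
mixed = re.split(r"\s*and\s*", "Salt AND pepper and Oil", flags=re.IGNORECASE)
# ['Salt', 'pepper', 'Oil']
```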
The function ‘sub()’, with the signature sub(pattern, repl, string, count=0, flags=0), performs substitution: it searches ‘string’ for the regular expression ‘pattern’ and replaces each non-overlapping match with ‘repl’. The ‘count’ parameter sets the maximum number of replacements to make; when it is zero (the default), every occurrence is replaced.
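A minimal sketch of sub() on a made-up sentence: replacing every number, replacing only the first one with ‘count’, and, as an additional documented option, passing a function as ‘repl’ that receives each match object.

```python
import re

text = "2 apples, 3 pears and 10 plums"

# Replace every run of digits with '#'
all_replaced = re.sub(r"\d+", "#", text)
# '# apples, # pears and # plums'

# count=1: only the first match is replaced
first_only = re.sub(r"\d+", "#", text, count=1)
# '# apples, 3 pears and 10 plums'

# repl can also be a function called with each match object
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), text)
# '4 apples, 6 pears and 20 plums'
```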
The functions sub() and subn() are similar but differ in their output: subn() returns a tuple containing the new string and the total number of replacements made, rather than just the new string.
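The same made-up sentence run through subn() shows the tuple it returns, with the new string first and the replacement count second. With ‘count’ set, the reported number reflects the limited replacements.

```python
import re

text = "2 apples, 3 pears and 10 plums"

# subn() returns (new_string, number_of_replacements)
new_text, n = re.subn(r"\d+", "#", text)
# new_text == '# apples, # pears and # plums', n == 3

# With count=2, only two replacements are made and reported
limited_text, limited_n = re.subn(r"\d+", "#", text, count=2)
# limited_text == '# apples, # pears and 10 plums', limited_n == 2
```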
The escape() function returns a string with the special characters backslashed (before Python 3.7, all non-alphanumeric characters were escaped); this is very useful for matching an arbitrary literal string that may contain regular expression metacharacters.
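To illustrate, the sketch below searches a made-up sentence for a price label containing ‘$’, ‘.’, ‘(’ and ‘)’. After re.escape() the label is found literally; used unescaped, ‘$’ acts as an end-of-string anchor, so the search fails.

```python
import re

price = "$5.00 (approx.)"
text = "The cost is $5.00 (approx.) per item"

# Escape the metacharacters so the label is matched literally
pattern = re.escape(price)
found = re.search(pattern, text) is not None            # True

# Unescaped, '$' anchors at the end of the string, so nothing matches
unescaped_found = re.search(price, text) is not None    # False
```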