How to use Flags in Python Regex

Many of the Regex functions in Python, such as sub(), findall() and match(), accept optional “flags” that change how a pattern is matched. For example, one very popular flag in Python Regex is the multiline flag. It changes how the ^ and $ anchors behave when the string being searched contains multiple lines of text.

We will briefly discuss all of the flags available to us in Python Regex, along with a few code examples to demonstrate their use.
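As a quick, minimal sketch of how a flag is passed to these functions (the sample text and the replacement name "Sam" are made up for illustration): findall() and match() accept the flag as their third argument, while sub() expects a count in that position, so passing the flag by keyword is the safer habit.

```python
import re

# Made-up sample text for illustration.
text = "John is here. john left."

# findall() and match() take flags as the third argument.
found = re.findall("john", text, re.IGNORECASE)
print(found)  # ['John', 'john']
print(re.match("john", text, re.IGNORECASE) is not None)  # True

# sub() has a `count` parameter before `flags`, so pass flags by keyword.
replaced = re.sub("john", "Sam", text, flags=re.IGNORECASE)
print(replaced)  # Sam is here. Sam left.
```

Using the `flags=` keyword everywhere avoids accidentally passing a flag where sub() expects a count.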

List of all Regex Flags

Here is a list of all the available Regex flags, along with a short description about their usage.

Short Form	Full Form	Description
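`re.A`	`re.ASCII`	Makes `\w`, `\W`, `\b`, `\B`, `\d`, `\D`, `\s` and `\S` perform ASCII-only matching instead of full Unicode matching. Only meaningful for string (Unicode) patterns.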
`re.I`	`re.IGNORECASE`	Ignore differences in uppercase and lowercase while matching patterns.
`re.M`	`re.MULTILINE`	Used with the metacharacters `^` (caret) and `$` (dollar). When this flag is used, `^` matches at the beginning of the string and at the beginning of each line (immediately after each `\n`). Similarly, `$` matches at the end of the string and at the end of each line (immediately before each `\n`).
`re.S`	`re.DOTALL`	By default, the DOT special character (`.`) will match anything except a newline. This flag allows it to match every character, including newlines. This flag is often used when dealing with multiline text.
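`re.X`	`re.VERBOSE`	Ignores whitespace in the pattern (except inside character classes) and enables comments, so a long regular expression can be split across lines and annotated.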
`re.L`	`re.LOCALE`	Makes `\w`, `\W`, `\b`, `\B` and case-insensitive matching dependent on the current locale. Use only with bytes patterns.
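The ASCII flag does not get its own section below, so here is a small sketch of its effect on `\w`. The sample text is an assumption chosen for illustration. The last line also checks that a short form and its full form refer to the same flag.

```python
import re

# "café naïve", written with escapes so the accented letters are unambiguous.
text = "caf\u00e9 na\u00efve"

# By default, \w matches Unicode word characters, including accented letters.
unicode_words = re.findall(r"\w+", text)
# With re.ASCII, \w only matches [a-zA-Z0-9_], so the accented letters split the words.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

print(unicode_words)  # ['café', 'naïve']
print(ascii_words)    # ['caf', 'na', 've']

# Short and full forms are two names for the same flag.
print(re.I is re.IGNORECASE)  # True
```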

Regex Flags – Examples

Here are a few code examples involving some of the above-mentioned flags.

MULTILINE Flag

Here is a Regex example where we are trying to find lines that begin with the word “This”. Our example first shows the output that we get without the MULTILINE flag, and then the output with the MULTILINE flag included.

import re

mystring = """This is some random text.
Hello World.
This is Goodbye.
"""

print(re.findall("^This.*", mystring))
print(re.findall("^This.*", mystring, re.MULTILINE))

['This is some random text.']
['This is some random text.', 'This is Goodbye.']

The difference is that without MULTILINE, ^ matches only at the start of the whole string, so only the first line can match. With MULTILINE, ^ also matches at the start of each of the three lines. (A line here is delimited by the newline character, not by a full stop.)
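The same flag affects `$` as well. As a rough sketch reusing the string from above, the following pattern (chosen only for illustration) picks out the last word of each line:

```python
import re

mystring = """This is some random text.
Hello World.
This is Goodbye.
"""

# A word followed by a full stop that sits at the end of a line.
pattern = r"\w+\.$"

# Without the flag, $ only matches at the very end of the string
# (or just before a trailing newline).
without_flag = re.findall(pattern, mystring)
# With the flag, $ also matches just before every newline.
with_flag = re.findall(pattern, mystring, re.MULTILINE)

print(without_flag)  # ['Goodbye.']
print(with_flag)     # ['text.', 'World.', 'Goodbye.']
```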

You can find a detailed tutorial on Multiline Regex by following this link to one of our tutorials.

IGNORECASE Flag

As mentioned earlier in the table, this flag ignores the difference between uppercase and lowercase letters while matching patterns. This can actually help us shorten our regular expressions because we don’t have to account for both cases. (Of course, this only applies in situations where case does not matter.)

import re

text = "John is running. john tripped over a rock."
print(re.findall("john", text, flags = re.IGNORECASE))

['John', 'john']
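To illustrate the point about shorter patterns, here is a small sketch comparing the flag with a hand-written character class. It also shows that several flags can be combined with the bitwise OR operator; the `lines` string is a made-up sample.

```python
import re

text = "John is running. john tripped over a rock."

# Without the flag, both cases have to be spelled out in the pattern.
manual = re.findall("[Jj]ohn", text)
# With the flag, the plain pattern is enough.
flagged = re.findall("john", text, flags=re.IGNORECASE)
print(manual == flagged)  # True

# Flags can be combined with the | operator.
lines = "this is line one\nThis is line two\nnot this one"
both = re.findall("^this.*", lines, flags=re.IGNORECASE | re.MULTILINE)
print(both)  # ['this is line one', 'This is line two']
```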

DOTALL Flag

Here we have the DOTALL flag.

By default, the . metacharacter matches every character except the newline character (\n). The DOTALL flag changes its behavior to include the newline character as well. Let's take a look at an example.

import re

text = "Hello\n My name is John\n How are you?"
print(re.findall(".+", text))

The normal output is the following:

['Hello', ' My name is John', ' How are you?']

Each line is counted separately, because the . stopped at the newline character. With the DOTALL flag however, we get:

['Hello\n My name is John\n How are you?']
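Both outputs above can be reproduced side by side. This is a minimal sketch that repeats the same call with and without the flag:

```python
import re

text = "Hello\n My name is John\n How are you?"

# Default behavior: . stops at each newline, so every line is a separate match.
default_result = re.findall(".+", text)
# With DOTALL, . also matches newlines, so the whole text is one match.
dotall_result = re.findall(".+", text, flags=re.DOTALL)

print(default_result)  # ['Hello', ' My name is John', ' How are you?']
print(dotall_result)   # ['Hello\n My name is John\n How are you?']
```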

This kind of behavior can be useful if you have a sentence split across multiple lines. If you want to match that sentence completely, you need to use the DOTALL flag, otherwise you will get multiple matches (depending on the number of lines).

Here is another example showing this behavior. (The question mark makes the search non-greedy; otherwise it would go for the largest possible match.)

import re

text = "This sentence was\n split in two.\n This one wasn't."
# Raw string, so that \. reaches the regex engine as an escaped dot
print(re.findall(r".+?\.", text, flags=re.DOTALL))

['This sentence was\n split in two.', "\n This one wasn't."]

We also escaped the . character in our regex pattern, because we were looking for sentences ending with a full stop.
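To see why the question mark matters, here is a short sketch that runs the greedy and non-greedy versions of the pattern on the same text:

```python
import re

text = "This sentence was\n split in two.\n This one wasn't."

# Greedy: .+ grabs as much as possible, so it runs up to the last full stop.
greedy = re.findall(r".+\.", text, flags=re.DOTALL)
# Non-greedy: .+? stops at the first full stop it can reach.
lazy = re.findall(r".+?\.", text, flags=re.DOTALL)

print(len(greedy))  # 1
print(len(lazy))    # 2
```

The greedy version returns the whole text as a single match, which is usually not what you want when splitting text into sentences.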

VERBOSE Flag

This flag is useful when you have very long regular expressions that are difficult to understand. With this flag you can break up your regex into smaller portions and comment each part to make it more readable and understandable.

In one of our Regex tutorials, we created the following pattern for Email Validation.

[a-zA-Z0-9_\.-]+@[a-zA-Z0-9_\.-]+\.[a-zA-Z0-9_]+

Such a regex pattern is difficult to understand, so it is a good place to use the Verbose flag. Let us see how we can do so.

import re

regex_email = re.compile(r"""
            [a-zA-Z0-9_\.-]+         # Email Name
            @                        # @ symbol
            [a-zA-Z0-9_\.-]+         # Domain name
            \.                       # dot
            [a-zA-Z0-9_]+            # Top level Domain
            """, flags=re.VERBOSE)

As you can see, this will correctly compile the regex pattern, while ignoring the whitespace and comments.
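As a rough check, the verbose pattern can be compared against the single-line version. The sketch below mirrors the single-line pattern shown earlier; the sample text and email addresses are made up for illustration.

```python
import re

regex_email = re.compile(r"""
    [a-zA-Z0-9_\.-]+    # Email name
    @                   # @ symbol
    [a-zA-Z0-9_\.-]+    # Domain name
    \.                  # dot
    [a-zA-Z0-9_]+       # Top level domain
    """, flags=re.VERBOSE)

# The same pattern written on one line, without the VERBOSE flag.
compact = re.compile(r"[a-zA-Z0-9_\.-]+@[a-zA-Z0-9_\.-]+\.[a-zA-Z0-9_]+")

# Hypothetical sample text.
sample = "Contact jane.doe@example.com or bob@test.org today."

verbose_matches = regex_email.findall(sample)
compact_matches = compact.findall(sample)
print(verbose_matches)                     # ['jane.doe@example.com', 'bob@test.org']
print(verbose_matches == compact_matches)  # True
```

Keep in mind that in verbose mode a literal space or # in the pattern has to be escaped or placed inside a character class, otherwise it is ignored or treated as the start of a comment.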

This marks the end of the Flags in Python Regex Tutorial. Any suggestions or contributions for CodersLegacy are more than welcome. Questions regarding the article content can be asked in the comments section below.