Python Email validation with Regex

Regular expressions (regex) are used to detect patterns within strings. If you think about it, an email address is just a (slightly complex) pattern. Following this logic, it is natural to use regex as a means of email validation in Python.

In this tutorial we will come up with a basic regex implementation for detecting emails. If our regex pattern matches, we treat the email as valid; otherwise we treat it as invalid.


Email validation with Regex

We will start off with a long, verbose regex pattern, and then gradually simplify it using the shortcuts and special sequences that regex offers us. Let's begin.

So here is our test email, "abc@gmail.com". There are essentially 5 components to every email.

  1. abc – The name
  2. @ – The at sign, which separates the name from the provider
  3. gmail – Name of email provider
  4. . – The dot character
  5. com – The domain suffix (com, edu, net, etc.)

We need to create a regex pattern that accommodates and validates all 5 components. We will do this by creating a regex pattern for each component individually, then combining them at the end.
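Before turning to regex, it can help to see the five components pulled apart with ordinary string methods. This is only an illustrative sketch: split_email is a hypothetical helper name, and it performs no validation at all.

```python
def split_email(address):
    """Split an address into the five components listed above.

    Hypothetical helper for illustration only; it does not validate anything.
    """
    # Everything before the first "@" is the name
    name, at, domain = address.partition("@")
    # Everything after the last "." in the domain is the suffix
    provider, dot, suffix = domain.rpartition(".")
    return name, at, provider, dot, suffix


print(split_email("abc@gmail.com"))
```

For the test email this prints ('abc', '@', 'gmail', '.', 'com'), which lines up with the numbered list.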


Building our Regex Pattern

For the very first part, we can use the following regex pattern.

[a-zA-Z0-9_\.-]+

What this means is that there must be “one or more” characters, where an acceptable character is anything from a to z (lowercase), A to Z (uppercase), 0 to 9 (digits), an underscore ( _ ), a dot ( . ) or a dash ( - ). We have escaped the dot here, although inside square brackets a dot is already treated as a literal character, so the backslash is optional. Outside square brackets, an unescaped dot is the metacharacter that matches any character. The dash is placed last so that it is read as a literal dash rather than as part of a range such as a-z.

Basically, everything inside the [ ] square brackets denotes a possible character. Adding the + after the closing bracket means “one or more of the characters in these square brackets”.
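To see the character class and the + quantifier in action, here is a small sketch that checks a few candidate names against the first part of the pattern. The sample strings and the NAME variable are made up for illustration, and re.fullmatch is used so that the whole string has to match.

```python
import re

# The pattern for the first component (the name)
NAME = r"[a-zA-Z0-9_\.-]+"

# Illustrative candidates: the first three use only allowed characters
candidates = ["abc", "first.last", "user_name-99", "", "abc def", "abc!"]

for candidate in candidates:
    # fullmatch() returns a match object only if the entire string matches
    is_ok = re.fullmatch(NAME, candidate) is not None
    print(repr(candidate), is_ok)
```

The first three candidates print True. The empty string fails because + requires at least one character, and the last two fail because a space and an exclamation mark are not in the brackets.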

The second part is pretty easy. We just need to append an @ character to our regex pattern. So our resulting regex pattern is now this:

[a-zA-Z0-9_\.-]+@

The third part needs to accommodate the email provider's name. Just like the first part, this can be anything as long as it is a combination of letters, digits and the three allowed characters (dot, dash and underscore). Hence, we will just duplicate the first part's regex pattern.

Our regex pattern is now the following:

[a-zA-Z0-9_\.-]+@[a-zA-Z0-9_\.-]+
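As a quick check of the pattern so far, the sketch below builds it from its pieces and tries it on a few strings. The variable names PART and pattern are just for illustration.

```python
import re

# The same character class is used on both sides of the @
PART = r"[a-zA-Z0-9_\.-]+"
pattern = PART + "@" + PART

print(bool(re.fullmatch(pattern, "abc@gmail")))  # True: name, @, provider
print(bool(re.fullmatch(pattern, "abc@")))       # False: nothing after the @
print(bool(re.fullmatch(pattern, "@gmail")))     # False: nothing before the @
print(bool(re.fullmatch(pattern, "abcgmail")))   # False: no @ at all
```

At this stage the pattern only insists on "something, then @, then something", which is why the remaining two components are still needed.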

The fourth component is the dot. Once again, we will just append this to our regex pattern. Just remember that we need to escape it here; otherwise it will be treated as the metacharacter that matches any character.

[a-zA-Z0-9_\.-]+@[a-zA-Z0-9_\.-]+\.
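The difference between an escaped and an unescaped dot is easy to miss, so here is a small sketch of it. The sample strings are made up for illustration.

```python
import re

# Unescaped dot: matches any single character, so "X" is accepted
print(bool(re.fullmatch(r"gmail.com", "gmailXcom")))   # True

# Escaped dot: matches only a literal "."
print(bool(re.fullmatch(r"gmail\.com", "gmailXcom")))  # False
print(bool(re.fullmatch(r"gmail\.com", "gmail.com")))  # True

# Inside square brackets the dot is literal even without a backslash
print(bool(re.fullmatch(r"[.]", "x")))                 # False
print(bool(re.fullmatch(r"[.]", ".")))                 # True
```

Without the backslash, the fourth component would accept any character in place of the dot.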

The fifth part is much like the first and third, but here we will not allow the dash or the dot. Common suffixes such as com, edu and net contain only letters, so this simplification suits most addresses. All in all, after adding everything, we end up with the following pattern:

[a-zA-Z0-9_\.-]+@[a-zA-Z0-9_\.-]+\.[a-zA-Z0-9_]+
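One way to confirm that each piece of the combined pattern lines up with its component is to wrap the pieces in named groups. This is a sketch for inspection only; the group names (name, provider, suffix) and the EMAIL variable are chosen here for illustration.

```python
import re

# Same pattern as above, split over several lines with named groups added
EMAIL = re.compile(
    r"(?P<name>[a-zA-Z0-9_\.-]+)"      # component 1: the name
    r"@"                               # component 2: the @ symbol
    r"(?P<provider>[a-zA-Z0-9_\.-]+)"  # component 3: the provider
    r"\."                              # component 4: the literal dot
    r"(?P<suffix>[a-zA-Z0-9_]+)"       # component 5: the suffix
)

match = EMAIL.fullmatch("abc@gmail.com")
print(match.groupdict())
```

For the test email this prints {'name': 'abc', 'provider': 'gmail', 'suffix': 'com'}.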

This pattern likely has several cases on which it will fail, but it should pass for the vast majority of emails. We can shorten it further by using \w instead of [a-zA-Z0-9_]. \w is a special sequence in regex that matches a single “word character”, meaning any alphanumeric character or the underscore. Note that for ordinary Python 3 strings, \w also matches non-ASCII letters and digits unless the re.ASCII flag is used, so it is slightly more permissive than [a-zA-Z0-9_].

Using this sequence, we can shorten our email validation regex to the following:

[\w\.-]+@[\w\.-]+\.[\w]+

Much better, right?
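The short and long patterns agree on plain ASCII addresses, but they can differ once non-ASCII letters are involved. The sketch below compares them on a few made-up samples; LONG, SHORT and the sample addresses are illustrative names and values.

```python
import re

LONG = r"[a-zA-Z0-9_\.-]+@[a-zA-Z0-9_\.-]+\.[a-zA-Z0-9_]+"
SHORT = r"[\w\.-]+@[\w\.-]+\.[\w]+"

# The last sample contains a non-ASCII letter (u with an umlaut)
samples = ["abc@gmail.com", "abc@gmailcom", "j\u00fcrgen@example.com"]

for sample in samples:
    long_ok = re.fullmatch(LONG, sample) is not None
    short_ok = re.fullmatch(SHORT, sample) is not None
    # re.ASCII restricts \w to [a-zA-Z0-9_]
    short_ascii_ok = re.fullmatch(SHORT, sample, re.ASCII) is not None
    print(sample, long_ok, short_ok, short_ascii_ok)
```

The first sample passes all three checks and the second fails all three. The third passes only the short pattern without re.ASCII, because \w accepts the non-ASCII letter by default.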


Testing our Regex Validation for Emails

Here is a short script that tests 5 different emails. 2 of them are valid, and 3 of them are not.

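import re

# The first address is the test email from earlier; the last one is a
# stand-in for a second valid address.
emails = ["abc@gmail.com", "@gmail.com", "abcgmail.com",
          "abc@gmailcom", "abc.def@gmail.com"]

# Compile the pattern once and reuse it for every address
pattern = re.compile(r'[\w\.-]+@[\w\.-]+\.[\w]+')

for email in emails:
    # fullmatch() requires the whole string to match, not just a part of it
    if pattern.fullmatch(email):
        print("Valid Email")
    else:
        print("Invalid Email")

Output:

Valid Email
Invalid Email
Invalid Email
Invalid Email
Valid Email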
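When validating, it matters whether the pattern has to cover the whole string or may match anywhere inside it. re.findall reports a match found anywhere in the text, while fullmatch requires the entire string to match. The sketch below wraps the pattern in a reusable function and also shows two of the cases where the pattern gets it wrong. The names EMAIL_RE and is_valid_email, and the sample inputs, are assumptions made for illustration.

```python
import re

EMAIL_RE = re.compile(r"[\w\.-]+@[\w\.-]+\.[\w]+")


def is_valid_email(address):
    """Return True only if the entire string matches the pattern."""
    return EMAIL_RE.fullmatch(address) is not None


text = "contact: abc@gmail.com (work)"
print(bool(EMAIL_RE.findall(text)))          # True: an address appears inside
print(is_valid_email(text))                  # False: the whole string is not one
print(is_valid_email("abc@gmail.com"))       # True

# Two known limitations of this simple pattern
print(is_valid_email("user+tag@gmail.com"))  # False: "+" is not in the brackets
print(is_valid_email("a..b@gmail.com"))      # True: consecutive dots slip through
```

Addresses containing a plus sign are in common use, and consecutive dots are not normally allowed in the name, so both of the last two results are mistakes. For stricter checking, the character classes would need to be adjusted.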

This marks the end of the Python Email validation with Regex article. Any suggestions or contributions for CodersLegacy are more than welcome. Questions regarding the article content can be asked in the comments section below.
