I have a regex that I am using to validate email addresses. I like this regex because it is fairly relaxed and has proven to work quite well.
Here is the regex:
OK, great: basically all reasonably valid email addresses that you can throw at it will validate. I know that some invalid ones may slip through too, but that is fine for my specific use case.
Now it happens that [email protected] does not validate. And guess what: x.com is a domain name that actually exists (owned by PayPal).
Looking at the regex part that validates the domain name:
It looks like this should be able to parse the x.com domain name, but it doesn't. The culprit is the part that checks that a domain name cannot begin with a dot (such as [email protected]).
If I remove the [^.] part of my regex, the domain x.com validates, but now the regex allows domain names beginning with a dot, such as .test.com; this is a little bit too relaxed for me ;-)
So my question is: how can the negated character class affect my single-character check? The way I am reading the regex is "make sure this string does not start with a dot", but apparently it does more.
Any help would be appreciated.
As Luis suggested, you can use [^\.][\w\.\-]* to match the domain name. However, it will now also match addresses like [email protected] and [email protected]@.com. You might want to make sure that there is only one period at a time, and that the first character after the @ is more restricted than just not being a period.
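To make both points concrete, here is a small Python sketch. It rests on some assumptions: the first two patterns are only my guess at why a one-letter domain fails ([^.] is not a "does not start with a dot" check; it consumes one character, so whatever follows it needs further characters of its own), and STRICT_DOMAIN is a hypothetical tighter pattern written for illustration, not the pattern from this thread. The sample strings are made up as well.

```python
import re

# [^.] consumes a character. If the rest of the pattern needs at least one
# more character, a single-letter name can never match (assumed cause).
CONSUMING = re.compile(r"[^.]\w+")
# A negative lookahead checks the first character without consuming it.
LOOKAHEAD = re.compile(r"(?!\.)\w+")

# The domain pattern suggested above.
LOOSE_DOMAIN = re.compile(r"[^\.][\w\.\-]*")
# Hypothetical stricter pattern: every label starts with a word character
# and labels are separated by exactly one period.
STRICT_DOMAIN = re.compile(r"\w[\w\-]*(?:\.\w[\w\-]*)+")


def accepts(pattern, text):
    """Return True if the whole string matches the pattern."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    print("x:", accepts(CONSUMING, "x"), accepts(LOOKAHEAD, "x"))
    for domain in ["x.com", "test.com", ".test.com", "test..com", "@.com"]:
        print(domain, accepts(LOOSE_DOMAIN, domain), accepts(STRICT_DOMAIN, domain))
```

With these patterns, the loose version accepts test..com and @.com, while the stricter one rejects them and still accepts x.com. Note that \w also allows underscores, so this is still more permissive than real host name rules.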
Match the domain name and the period (and subdomains and their periods) using:
So your pattern would be: