Enforcing a minimum and maximum e-mail length with a regex fails with an error


Is there a mistake in the following regex?

^(?=.{1,32}$)\w+([-+.]\w+)*@\w+([-+.]\w+)*.\w+([-+.]\w+)*$

I use this regex pattern to check the format and length of an email address, but it fails with the error message below:

Invalid use of repetition operators such as using '*' as the first character.

I tried putting a \ in front of the ? to escape it. With that change regcomp() compiles the pattern successfully, but the match result is wrong.

Here are my test strings:

[email protected] --> failed, wrong result

test:[email protected] --> failed, right result

Environment

Operating System :

Linux debian8 3.16.0-4-686-pae #1 SMP Debian 3.16.7-ckt11-1+deb8u5 (2015-10-09) i686 GNU/Linux

GCC

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/i586-linux-gnu/4.9/lto-wrapper
Target: i586-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.9.2-10' --with-bugurl=file:///usr/share/doc/gcc-4.9/README.Bugs --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.9 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.9 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.9-i386/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.9-i386 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.9-i386 --with-arch-directory=i386 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-targets=all --enable-multiarch --with-arch-32=i586 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=i586-linux-gnu --host=i586-linux-gnu --target=i586-linux-gnu
Thread model: posix
gcc version 4.9.2 (Debian 4.9.2-10)


Based on your tags and the hint that you are using regcomp, I'm assuming that you are using the standard Posix library regcomp and regexec functions to do the regular expression matching.

The regular expression syntax used by regcomp is fully documented in man 7 regex (or in Posix itself, which I find slightly more readable). There are many regex libraries in other languages which implement a larger variety of regex syntaxes, but you're not using those other languages. So if the syntax you're using is not in those documents, it won't work. That includes:

  • Forward lookahead assertions like (?=.{1,32}$). In fact, there are no lookaround assertions, nor any other syntax starting (?.
  • The use of \w to mean alphanumeric characters. If you use Extended Regular Expressions (by providing REG_EXTENDED as the third argument to regcomp -- which you should always do), then the only thing that \ does is prevent the following regular expression operator character from having special meaning. However, the Gnu implementation does offer some extensions. It handles back-references, even though Posix only defines them in basic regular expressions. Some versions do handle \w and friends, but that might not work on other Posix regex implementations, such as the one in Mac OS X.
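
To see exactly what regcomp objects to in a pattern, you can ask the library for its error text with regerror. The following is only a minimal sketch: the helper name pattern_compiles is made up for illustration, and the wording of the message it prints depends on the regex implementation.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper (not part of any library): returns 1 if `pattern`
 * compiles as a POSIX extended regular expression; otherwise prints the
 * implementation's error text to stderr and returns 0. */
int pattern_compiles(const char *pattern)
{
    regex_t preg;
    int rc = regcomp(&preg, pattern, REG_EXTENDED | REG_NOSUB);
    if (rc != 0) {
        char msg[256];
        /* regerror turns the regcomp error code into readable text */
        regerror(rc, &preg, msg, sizeof msg);
        fprintf(stderr, "regcomp(\"%s\") failed: %s\n", pattern, msg);
        return 0;
    }
    regfree(&preg);
    return 1;
}
```

Calling pattern_compiles on the pattern from the question should report the same kind of repetition-operator error on glibc, since ( followed by ? is not valid in this syntax.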

You can use Posix character classes to get the effect of \w, \W, \s, etc. For example, word characters (\w) can be written as the character class [_[:alnum:]], while non-space characters (\S) can be written as [^[:space:]]. Using this syntax is fully portable.
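
As a rough illustration of those character classes, here is a small sketch that compiles a pattern with REG_EXTENDED and tests one string against it. The helper name matches_ere and the sample strings in the note below are assumptions chosen for the example.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `s` matches the extended regular
 * expression `pattern`, 0 if it does not, and -1 if the pattern fails
 * to compile. */
int matches_ere(const char *pattern, const char *s)
{
    regex_t preg;
    if (regcomp(&preg, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    /* regexec returns 0 on a match and REG_NOMATCH otherwise */
    int rc = regexec(&preg, s, 0, NULL, 0);
    regfree(&preg);
    return rc == 0;
}
```

For instance, matches_ere("^[_[:alnum:]]+$", "user_01") accepts the string, while the same pattern rejects "user-01" because - is not a word character. The pattern "^[^[:space:]]+$" accepts both.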

There is no workaround for lookahead assertions, except for creating a separate regular expression and matching it as well, starting at the correct point. But if you just want to check the length of the string, you don't need anything complex. Just check the length of the string:

size_t len = strnlen(str, maxlen + 1);
/* regexec returns 0 on a successful match, so compare against 0 */
if (len >= minlen && len <= maxlen &&
    regexec(&preg, str, 0, NULL, 0) == 0) {
  /* The string matched, and its length is between minlen and maxlen */
} else {
  /* Not a match, or too short or too long */
}

(I used strnlen, which is in Posix 2008; it's implemented in glibc. The advantage is that if you only need to know that the string is not too long, strnlen avoids looking at too many characters. That is, if I'm going to reject a string with more than 32 characters, and the string I'm looking at is a megabyte, it would be silly to compute strlen(str), which needs to look at every character in the string. strnlen(str, 33) will only look at the first 33 characters, and if the result is 33 I know the string was too long.)
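
The bounded length test on its own can be packaged as a tiny function. This is just a sketch of the idea described above; the name length_in_range is invented for the example.

```c
#define _POSIX_C_SOURCE 200809L  /* request the Posix 2008 declaration of strnlen */
#include <string.h>

/* Hypothetical helper: returns 1 if the length of `str` lies in
 * [minlen, maxlen], else 0. strnlen never reads more than maxlen + 1
 * bytes, so a very long input is rejected without scanning all of it. */
int length_in_range(const char *str, size_t minlen, size_t maxlen)
{
    size_t len = strnlen(str, maxlen + 1);
    /* len == maxlen + 1 means "too long", whatever the real length is */
    return len >= minlen && len <= maxlen;
}
```

With minlen 1 and maxlen 3, "abc" is accepted, while "abcd" and the empty string are rejected.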

If I understand correctly what you're trying to check, you could use the following slightly simpler extended regular expression, anchored with ^ and $ so that it has to match the whole string:

^[_[:alnum:]]([-+.]?[_[:alnum:]])*@[_[:alnum:]]([-+.]?[_[:alnum:]])*$

which insists that -, + and ., if present, must be preceded and followed by word characters (so they can't be at the beginning or end, and you can't have two of them in a row).
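
Here is a short sketch of how that expression could be tried out with regcomp and regexec. It assumes the pattern is wrapped in ^ and $ so that the whole string has to match. The function name looks_like_address and the addresses in the note below are made up for the example.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `s` matches the anchored pattern,
 * 0 if it does not, and -1 if the pattern fails to compile. */
int looks_like_address(const char *s)
{
    /* The suggested expression, with ^ and $ added as anchors */
    static const char pattern[] =
        "^[_[:alnum:]]([-+.]?[_[:alnum:]])*"
        "@[_[:alnum:]]([-+.]?[_[:alnum:]])*$";
    regex_t preg;
    if (regcomp(&preg, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&preg, s, 0, NULL, 0);  /* 0 means "matched" */
    regfree(&preg);
    return rc == 0;
}
```

With this sketch, "john.doe@example.com" is accepted, while ".john@example.com", "john..doe@example.com" and "test:john@example.com" are rejected. In real code you would compile the pattern once and reuse the regex_t rather than recompiling on every call.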