Why does my regex work on RegexPlanet and regex101 but not in my code?

Given the string #100=SAMPLE('Test','Test', I want to extract 100 and Test. I created the regular expression ^#(\d+)=SAMPLE\('([\w-]+)'.* for this purpose.

I tested the regex on RegexPlanet and regex101. Both tools give me the expected results, but when I try to use it in my code I don't get matches. I used the following snippet for testing the regex:

final String line = "#100=SAMPLE('Test','Test',";
final Pattern pattern = Pattern.compile("^#(\\d+)=SAMPLE\\('([\\w-]+)'.*");
final Matcher matcher = pattern.matcher(line);

System.out.println(matcher.matches());
System.out.println(matcher.find());
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));

The output is

true
false
Exception in thread "main" java.lang.IllegalStateException: No match found
    at java.util.regex.Matcher.group(Matcher.java:536)
    at java.util.regex.Matcher.group(Matcher.java:496)
    at Test.main(Test.java:15)

I used Java 8 for compiling and running the program. Why does the regex work with the online tools but not in my program?


A Matcher object allows you to query it several times, so that you can find the expression, get the groups, find the expression again, get the groups, and so on.

This means that it keeps state after each call - both for the groups that resulted from a successful match, and the position where to continue searching.

When you run two matching/finding methods consecutively, what you have is:

  1. matches() - tries to match the entire string and, if it succeeds, sets the groups.
  2. find() - tries to find the next occurrence of the pattern after the previously matched/found occurrence and, if it succeeds, sets the groups.

But of course, in your case, the text doesn't contain two occurrences of the pattern, only one. So although matches() was successful and set the proper groups, the subsequent find() fails to find another match, and that failure discards the groups (groups are not accessible after a failed match/find).

And that's why you get the error message.

Now, if you're just playing around with this to see the difference between matches() and find(), then there is nothing wrong with having both of them in the program. But you need to call reset() between them. Without it, find() tries to continue from where matches() stopped, which fails here because matches() has already consumed the entire string. After reset(), find() starts scanning from the start, as if you had a fresh Matcher, so it succeeds and gives you the groups.
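As a minimal sketch of the reset() approach, using the pattern and input from the question (the class name ResetDemo and the helper matchThenFind are made up for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResetDemo {
    static final Pattern PATTERN =
            Pattern.compile("^#(\\d+)=SAMPLE\\('([\\w-]+)'.*");

    // Runs matches(), resets the matcher, then runs find().
    // Returns the two captured groups from find(), or null if find() fails.
    static String[] matchThenFind(String line) {
        Matcher matcher = PATTERN.matcher(line);
        matcher.matches(); // on success, leaves the matcher positioned at the end of the input
        matcher.reset();   // discards the match state and rewinds to the start
        if (!matcher.find()) {
            return null;
        }
        return new String[] { matcher.group(1), matcher.group(2) };
    }

    public static void main(String[] args) {
        String[] groups = matchThenFind("#100=SAMPLE('Test','Test',");
        System.out.println(groups[0]); // 100
        System.out.println(groups[1]); // Test
    }
}
```

If you remove the reset() call, find() returns false for this input and the helper returns null.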

But as other answers here hinted, if you're not trying to compare the results of matches() and find(), and just want to match your pattern and get the results, then you should choose only one of them.

  • matches() will try to match the entire string. For this reason, if it succeeds, running find() right after it will normally fail, because the search resumes at the end of the string, where at most an empty match could still be found. If you use matches(), you don't need anchors like ^ and $ at the beginning and the end of your pattern.
  • find() will try to match anywhere in the string. It will start scanning from the left, but doesn't require that the actual match start there. It is also possible to use it more than once.
  • lookingAt() will try to match at the beginning of the string, but will not necessarily match the complete string. It's like having a ^ anchor at the beginning of your pattern.
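To see how the three methods differ, here is a small sketch that runs each one on a fresh Matcher, so no state carries over between calls. The class name MatchModes, the helper modes, and the sample inputs are made up for illustration:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class MatchModes {
    // Returns { matches(), lookingAt(), find() }, each evaluated on its own
    // fresh Matcher so that no call is affected by a previous one.
    static boolean[] modes(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);
        return new boolean[] {
            pattern.matcher(input).matches(),   // whole input must match
            pattern.matcher(input).lookingAt(), // match must start at index 0
            pattern.matcher(input).find()       // match may start anywhere
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(modes("\\d+", "100")));    // [true, true, true]
        System.out.println(Arrays.toString(modes("\\d+", "100abc"))); // [false, true, true]
        System.out.println(Arrays.toString(modes("\\d+", "abc100"))); // [false, false, true]
    }
}
```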

So choose whichever of these is appropriate for you, use it, and then read the groups. Always check that the match succeeded before attempting to use the groups!
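For the string in the question, one way to do this is to pick matches() and guard the group access with its result. This is only a sketch: the class name SampleLineParser and the method parse are made up for illustration, and the pattern is the one from the question without the ^ anchor, which matches() makes redundant.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SampleLineParser {
    // No ^ or $ needed: matches() already requires the whole input to match.
    private static final Pattern SAMPLE =
            Pattern.compile("#(\\d+)=SAMPLE\\('([\\w-]+)'.*");

    // Returns the two captured groups, or an empty Optional if the line
    // does not match. The groups are only read after a successful match.
    static Optional<String[]> parse(String line) {
        Matcher matcher = SAMPLE.matcher(line);
        if (!matcher.matches()) {
            return Optional.empty();
        }
        return Optional.of(new String[] { matcher.group(1), matcher.group(2) });
    }

    public static void main(String[] args) {
        Optional<String[]> result = parse("#100=SAMPLE('Test','Test',");
        if (result.isPresent()) {
            System.out.println(result.get()[0]); // 100
            System.out.println(result.get()[1]); // Test
        } else {
            System.out.println("no match");
        }
        System.out.println(parse("not a sample line").isPresent()); // false
    }
}
```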