I have a form field that must contain only an inline ordered list:
1. This item may contain characters, symbols or numbers. 2. And this item also...
The following code does not work for validating the user input (users may enter only an inline ordered list):
definiton_re = re.compile(r'^(?:\d\.\s(?:.+?))+$')
validate_definiton = RegexValidator(definiton_re, _("Enter a valid 'definition' in format: 1. meaning #1, 2. meaning #2...etc"), 'invalid')
P.S.: I'm using the RegexValidator class from the Django framework to validate the form field value.
Nice solution from OP. To push it further, let's do some regex optimization / golfing.
(?<!\S)\d{1,2}\.\s((?:(?!,\s\d{1,2}\.),?[^,]*)+)
Here's what's new:
- (?:^|\s) backtracks between the two alternatives. Here we use (?<!\S) instead, which asserts that the current position is not preceded by a non-whitespace character.
- \d{1,2}\.\s doesn't have to be within a non-capturing group.
- (.+?)(?=(?:, \d{1,2}\.)|$) is too bulky. We change this bit to:
  - ( opens the capturing group.
  - (?: opens a non-capturing group.
  - (?! Negative lookahead: assert that what follows is NOT:
  - ,\s\d{1,2}\. A comma, a whitespace character, then a list index.
  - ) closes the lookahead.
  - ,?[^,]* Here's the interesting optimization:
    - We match a comma if there is one. We know from the lookahead assertion that this position does not start a new list index, so the following run of non-comma characters (if any) cannot belong to the next element. We therefore roll over them with the * quantifier, and there's no backtracking.
    - This is a significant improvement over (.+?).
  - )+ Keep repeating the group until the negative lookahead assertion fails.
  - ) closes the capturing group.
You can use that in place of the regex in the other answer, and here's a regex demo!
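To see what the capturing group picks up, here is a minimal sketch that runs the pattern through re.findall(). The helper name extract_items and the sample strings are illustrative assumptions, not part of the original answer.

```python
import re

# Pattern copied verbatim from the answer above.
ITEM_RE = re.compile(r'(?<!\S)\d{1,2}\.\s((?:(?!,\s\d{1,2}\.),?[^,]*)+)')


def extract_items(text):
    """Return the text of each list item, without its '1. ' style index.

    Hypothetical helper, named here only for illustration.
    """
    # With a single capturing group, findall() returns that group's text
    # for every match.
    return ITEM_RE.findall(text)


sample = '1. List item #1, 2. List item 2, 3. List item #3.'
print(extract_items(sample))

# A comma that is not followed by a list index stays inside the item.
print(extract_items('1. a, b, 2. c'))
```

The second call shows the effect of the negative lookahead: a comma only ends an item when a new index such as "2." follows it.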
Though, at first glance, this problem is better solved with re.split() while parsing:
import re

text = '1. List item #1, 2. List item 2, 3. List item #3.'
lines = re.split(r'(?:^|, )\d{1,2}\. ', text)
# Gives ['', 'List item #1', 'List item 2', 'List item #3.']
if lines[0] == '':
    # Throw away the empty first element produced by the split.
    lines = lines[1:]
print(lines)
Here is an online code demo.
Unfortunately, for validation you still have to use the regex matching approach; just compile the regex shown above:
regex = re.compile(r'(?<!\S)\d{1,2}\.\s((?:(?!,\s\d{1,2}\.),?[^,]*)+)')
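Because that pattern is not anchored, a plain search succeeds as soon as one item is found anywhere in the value. One possible way to validate the whole field, sketched here under assumptions of my own, is to check that the matches tile the entire string. The function name is_inline_ordered_list and the rule that items are separated by exactly a comma and one space are assumptions, and the sketch does not check that the indices count up in order.

```python
import re

# Pattern copied verbatim from the answer above.
ITEM_RE = re.compile(r'(?<!\S)\d{1,2}\.\s((?:(?!,\s\d{1,2}\.),?[^,]*)+)')


def is_inline_ordered_list(value):
    """Return True if value consists only of '1. text, 2. text, ...' items.

    Hypothetical helper: the separator rule (', ') is an assumption.
    """
    pos = 0  # end of the previous match
    for match in ITEM_RE.finditer(value):
        # Text between two matches must be the separator; nothing may
        # precede the first item.
        expected_gap = '' if pos == 0 else ', '
        if value[pos:match.start()] != expected_gap:
            return False
        # Reject items that have an index but no text.
        if not match.group(1).strip():
            return False
        pos = match.end()
    # At least one item was found and the items cover the whole string.
    return pos > 0 and pos == len(value)


print(is_inline_ordered_list('1. List item #1, 2. List item 2, 3. List item #3.'))
print(is_inline_ordered_list('some text 1. List item'))
```

In Django, a validator can be any callable that raises django.core.exceptions.ValidationError, so a check like this could be wrapped in such a function instead of relying on RegexValidator alone.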