Extracting a multi-line block of text from a text file with VBScript

I need to extract orders from a PDF file. I have converted the PDF to text, but I am having trouble understanding regular expressions. Could someone give me a small example of how to build an expression that matches a block of text spread over several lines?

Sample block:

ORDER NUMBER :   SO773175            Ship Date: 23-Nov-15

      Style Desc : CURTAINS CR 46X54
      Linecode : J855566
      Qty              36

It doesn't matter whether I save just the values after the ":" or the whole block. The block is repeated for each individual order, so there could be 5 or 50 orders in one file, but each order's block appears only once in the file.


I suspect the problem is matching across multiple lines. \n is the regex newline character, so if you are using a regex engine that supports Perl-like regular expressions (most of them do), this should work:

ORDER NUMBER :\s+[^\s]+\s+Ship Date:\s+[^\s]+\n\n\s+Style Desc : .+\n\s+Linecode : .+\n\s+Qty\s+.+

I would recommend https://regex101.com/, or any of the other regex testing sites out there, as a good place to try out and debug regular expressions.
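
To show how the pattern above can pull out the individual values for every order in a file, here is a minimal sketch in Python (used only as a stand-in, since the question targets VBScript). The names ORDER_RE and extract_orders are hypothetical. It assumes the converted text may use Windows \r\n line endings, that the number of blank lines between rows may vary, and that Qty is a whole number, so it uses \s+ between rows instead of a fixed \n\n and adds named capture groups for the values.

```python
import re

# Tolerant variant of the pattern above (an assumption, not the original answer's regex):
# \s+ between rows absorbs blank lines and \r\n endings; named groups capture the values.
ORDER_RE = re.compile(
    r"ORDER NUMBER\s*:\s*(?P<order>\S+)\s+Ship Date:\s*(?P<ship_date>\S+)\s+"
    r"Style Desc\s*:\s*(?P<style>[^\r\n]+?)\s+"
    r"Linecode\s*:\s*(?P<linecode>\S+)\s+"
    r"Qty\s+(?P<qty>\d+)"
)


def extract_orders(text):
    """Return one dict of captured values per order block found in text."""
    return [m.groupdict() for m in ORDER_RE.finditer(text)]


# Sample block from the question, written here with Windows line endings.
sample = (
    "ORDER NUMBER :   SO773175            Ship Date: 23-Nov-15\r\n"
    "\r\n"
    "      Style Desc : CURTAINS CR 46X54\r\n"
    "      Linecode : J855566\r\n"
    "      Qty              36\r\n"
)

for order in extract_orders(sample):
    print(order)
```

Because finditer walks the whole text, the same call handles 5 or 50 orders. The pattern itself uses only \s, \S, \d, character classes, and a lazy quantifier, so it should carry over to other Perl-like engines, although named groups would likely need to become plain numbered groups there.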