sed or awk deleting the lines between the pattern matches, excluding the line of the second token

I have a sed command which successfully prints the lines between two pattern matches:

 sed -n '/PAGE 2/,/\x0c/p' filename.txt

What I haven't figured out is how to make it print all the lines from the first token up to, but not including, the second token. The \x0c token is a record separator in a big flat file, and I need to keep THAT line intact.

In between the two tokens, the data is completely variable, and I do not have a reliable anchor to work with.

[CLARIFICATION] Right now it prints all the lines between /PAGE 2/ and /\x0c/ inclusive. I want it to print from /PAGE 2/ up to, but not including, the next /\x0c/ line in the record.

[test data] The \x0c will be at the start of the first line and at the start of the last line of this record.

I need to delete the first line of the record, through the line just before the beginning of the next record.

^L20-SEP-2006 01:54:08 PM         Foobars College                          PAGE 2
TERM: 200610               Student Billing Statement                     SUMDATA
99999

Foo bar                                                              R0000000
999 Geese Rural Drive                                           DUE: 15-OCT-2012
Columbus, NE 90210

--------------------------------------------------------------------------------
       Balance equal to or greater than $5000.00    $200.00
       Billing inquiries may be directed to 444/555-1212 or by
       email to [email protected]  Financial Aid inquiries should
       be directed to 444/555-1212 or [email protected]
^L20-SEP-2006 01:54:08 PM         Foobars College                          PAGE 1

[expected result]

 ^L20-SEP-2006 01:54:08 PM         Foobars College                          PAGE 1

There will be multiple such records in the file. I can rely only on the /PAGE 2/ token and the /\x0c/ token.

[solution]:

Following Choruba's lead, I edited his command to:

sed '/PAGE [2-9]/,/\x0c/{/\x0c$/!d}'

The rule in the curly brackets was applying itself to any line containing a ^L and was selectively ignoring them.
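
For data laid out like the posted sample, where the form feed starts the header line rather than ending a line, a flag-based awk one-liner is one way to get the expected result. This is only a sketch under that assumption: the miniature input and the `skip` variable are made up for illustration, and `\f` is awk's escape for the same form feed character as `\x0c`.

```shell
# Hypothetical miniature of the billing file: \f (form feed) starts each header line.
sample=$(printf '\fHDR PAGE 1\nkeep 1\n\fHDR PAGE 2\ndrop a\ndrop b\n\fHDR PAGE 1\nkeep 2\n')

# Every form feed line clears the flag; if that same header says PAGE 2-9
# the flag is set again. Lines are printed only while the flag is clear.
result=$(printf '%s\n' "$sample" | awk '/\f/{skip=0} /PAGE [2-9]/{skip=1} !skip')

# The PAGE 2 record is gone; both PAGE 1 headers and their lines remain.
printf '%s\n' "$result"
```

Because the flag is re-evaluated on every header line, consecutive PAGE 2, PAGE 3, ... records are all dropped, and the next PAGE 1 header is kept intact.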


EDIT: New answer for the new question the OP asked (how to delete records):

Given a file with control-Ls delimiting records and a desire to print specific lines from specific records, just set your record separator to control-L and your field separator to "\n" and print whatever you like. For example, the command to get the output the OP says he wants from the input he posted would just be:

awk -v RS='^L' -F'\n' 'NR==3{print $1}' file

^L shown here represents a literal control-L, and it's the 3rd record because there's an empty record before the first control-L in the input file.
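
To make the record numbering concrete, here is a small sketch on made-up data. The three-line input and the variable names are assumptions, and `RS='\f'` is used in place of the literal control-L so the command can be copied as plain text.

```shell
# Hypothetical stand-in for the posted data; \f is the form feed (control-L, \x0c).
data=$(printf '\fHDR PAGE 2\nbody line\n\fHDR PAGE 1\n')

# Show the first line of every record: record 1 is the empty text before the first \f.
listing=$(printf '%s\n' "$data" | awk -v RS='\f' -F'\n' '{print NR ": " $1}')

# Pick out the header of record 3, as in the command above.
header=$(printf '%s\n' "$data" | awk -v RS='\f' -F'\n' 'NR==3{print $1}')

printf '%s\n' "$listing" "$header"   # last line printed: HDR PAGE 1
```

Note that the form feed itself is consumed as the record separator, so the printed header does not start with control-L.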

This is the answer to the original question the OP asked:

You want this:

awk '/PAGE 2/ {f=1} /\x0c/{f=0} f' file

but also try these to see the difference (for the future):

awk '/PAGE 2/ {f=1} f; /\x0c/{f=0}' file
awk 'f; /PAGE 2/ {f=1} /\x0c/{f=0}' file
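
The difference between the three comes down to where the bare `f` (the print) sits relative to the two flag assignments. A quick sketch on made-up input, assuming the start line does not itself contain a form feed; the `show` helper is hypothetical, and `\f` stands for the same character as `\x0c`:

```shell
# Hypothetical input: "PAGE 2" starts the range, the \f line ends it.
input=$(printf 'before\nPAGE 2\na\nb\n\fnext\nafter\n')

show() {  # run one awk program; tr makes the form feed visible as ^, paste joins lines
  printf '%s\n' "$input" | awk "$1" | tr '\f' '^' | paste -sd, -
}

v1=$(show '/PAGE 2/{f=1} /\f/{f=0} f')    # start line included, end line excluded
v2=$(show '/PAGE 2/{f=1} f; /\f/{f=0}')   # both start and end lines included
v3=$(show 'f; /PAGE 2/{f=1} /\f/{f=0}')   # start line excluded, end line included

printf '%s\n' "$v1" "$v2" "$v3"
# PAGE 2,a,b
# PAGE 2,a,b,^next
# a,b,^next
```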

And finally, FYI, the following idioms describe how to select a range of records given a specific pattern to match:

a) Print all records from some pattern:

awk '/pattern/{f=1}f' file

b) Print all records after some pattern:

awk 'f;/pattern/{f=1}' file

c) Print the Nth record after some pattern:

awk 'c&&!--c;/pattern/{c=N}' file

d) Print every record except the Nth record after some pattern:

awk 'c&&!--c{next}/pattern/{c=N}1' file

e) Print the N records after some pattern:

awk 'c&&c--;/pattern/{c=N}' file

f) Print every record except the N records after some pattern:

awk 'c&&c--{next}/pattern/{c=N}1' file

g) Print the N records from some pattern:

awk '/pattern/{c=N}c&&c--' file

I changed the variable name from "f" for "found" to "c" for "count" where appropriate as that's more expressive of what the variable actually IS.
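
To see what each idiom selects, here is a small sketch run against made-up input. The six-line input, the `idiom` helper and the choice of N=2 are assumptions for illustration; N is passed in with `-v` because the one-liners above use it as a placeholder, and X plays the role of "pattern".

```shell
# Hypothetical input: the pattern X is on line 2.
lines=$(printf 'a\nX\nb\nc\nd\ne\n')

idiom() {  # run one idiom with N=2 and join the output lines with commas
  printf '%s\n' "$lines" | awk -v N=2 "$1" | paste -sd, -
}

r_a=$(idiom '/X/{f=1}f')                # a) X,b,c,d,e
r_b=$(idiom 'f;/X/{f=1}')               # b) b,c,d,e
r_c=$(idiom 'c&&!--c;/X/{c=N}')         # c) c
r_d=$(idiom 'c&&!--c{next}/X/{c=N}1')   # d) a,X,b,d,e
r_e=$(idiom 'c&&c--;/X/{c=N}')          # e) b,c
r_f=$(idiom 'c&&c--{next}/X/{c=N}1')    # f) a,X,d,e
r_g=$(idiom '/X/{c=N}c&&c--')           # g) X,b

printf '%s\n' "$r_a" "$r_b" "$r_c" "$r_d" "$r_e" "$r_f" "$r_g"
```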