Removing the parts of a file that match a multi-line regular expression can be tricky, unless you’re using perl. Let’s take an example file:
```
header 1
========
some data

header 2
========

header 3
========
some data

header 4
========
some data
some more data
even more data

header 5
========

header 6
========
some data
```
We would like to remove every header that isn’t followed by any data, that is, a header line and its ======== underline followed directly by a blank line. It’s not much of an example, but it’ll demonstrate the technique nevertheless! The easiest way to do this with perl is to read the entire file into a single scalar variable and then run one substitution over it, replacing every match of our multi-line regular expression with nothing. Observe:
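```perl
#!/usr/bin/perl
use strict;
use warnings;

# Open the input file using three-argument open and a lexical filehandle.
open( my $fh, '<', '/path/to/the/file' )
    or die "Couldn't open file: $!\n";

# Slurp the whole file into a single scalar: with the input record
# separator $/ undefined, one read returns everything at once.
my $data = do { local $/; <$fh> };
close( $fh );

# Delete each header line, its ==== underline and the blank line that
# follows it, i.e. every header with no data beneath it.
$data =~ s/[^\n]*\n=+\n\n//g;

print $data;
```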
Since the script prints its result to standard output, you’ll need to redirect that output to a file, or write $data out to a new file from within the perl script itself.
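If you’d rather not rely on shell redirection, one way to write $data from within the script is a small helper like the one below. This is a minimal sketch: the name write_file is hypothetical (it isn’t part of the original script or of any module), and it simply overwrites whatever is already in the target file.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): write $content
# to $path, replacing any existing contents of the file.
sub write_file {
    my ( $path, $content ) = @_;
    open( my $out, '>', $path )
        or die "Couldn't open $path for writing: $!\n";
    print {$out} $content;
    # close() reports buffered write errors, so check it too.
    close( $out ) or die "Couldn't write $path: $!\n";
    return;
}
```

With the helper defined, the final print $data; in the script could become write_file( '/path/to/the/output', $data ); where the output path is a placeholder, just like the input path.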
Running the script on the example data gives the expected output:
```
$ ./parse_it.pl
header 1
========
some data

header 3
========
some data

header 4
========
some data
some more data
even more data

header 6
========
some data
```
Using this method, you can easily modify the regular expression in the perl script to suit your needs.
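As one example of such a tweak: the pattern above insists on a blank line after the underline, so an empty header at the very end of a file with no trailing blank line is left alone. The sketch below anchors the header to the start of a line with ^ and the /m modifier, and accepts either a blank line or the end of the string after the underline. The sub name strip_empty_sections and the sample text are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: remove every "header + ==== underline" pair that is
# followed by a blank line or by the end of the text.
#   /m  lets ^ match at the start of every line, not just the string
#   /g  removes every such section, not just the first one
sub strip_empty_sections {
    my ( $text ) = @_;
    $text =~ s/^[^\n]*\n=+\n(?:\n|\z)//mg;
    return $text;
}

# A cut-down sample: "header 3" has no data and sits at the very end
# of the text with no trailing blank line.
my $sample = "header 1\n========\nsome data\n\n"
           . "header 2\n========\n\n"
           . "header 3\n========\n";

# Prints only the "header 1" section.
print strip_empty_sections( $sample );
```

The original pattern would remove header 2 from this sample but keep header 3; the (?:\n|\z) alternation is what lets the trailing one go as well.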