
Text Manipulation: Multiple Line Regex Deletions Using Perl

Removing portions of a file that match a multi-line regular expression can be tricky, unless you’re using perl. Let’s take an example file:

We would like to remove all headers that are not followed by any data. It’s not much of an example, but it’ll demonstrate the technique nevertheless! The easiest way to do this with perl is to read the entire file into a single scalar variable and then replace every match of our multi-line regular expression with nothing. Observe:

Obviously you’ll need to redirect the output of this script to a file, or write $data out to a new file from within the perl script itself.

Running the script on the example data gives the expected output:

Using this method, you can easily modify the regular expression in the perl script to suit your needs.

Text Manipulation: How to Delete the First Line of Text in a Large File

Editing a very large file can be a resource- (and time-) consuming nightmare. Deleting the first line of such a file in place, without opening it in SomeEditor(TM), can be done in various ways, each with its own resource overhead.

Let me introduce you to the three methods we’ll be trying. The first uses GNU sed with its -i option to edit the file in place.
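A minimal sketch of the sed approach, run against a small temporary file standing in for the large one (the file and its contents are made up for the example):

```shell
# Hypothetical stand-in for the large file.
file=$(mktemp)
printf 'first line\nsecond line\nthird line\n' > "$file"

# GNU sed: -i edits the file in place, and the script 1d deletes line 1.
sed -i '1d' "$file"

cat "$file"
```

This assumes GNU sed. BSD and macOS sed expect a backup suffix argument after -i, so the equivalent there is sed -i '' '1d' followed by the file name.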

The second method uses perl to get the job done.

The final example uses printf (so use a shell that supports it) and ex (command-line vi).
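A sketch of the printf and ex combination follows. The file is a made-up stand-in, and the check for ex is only there because ex is not installed on every system.

```shell
# Hypothetical stand-in for the large file.
file=$(mktemp)
printf 'first line\nsecond line\nthird line\n' > "$file"

# printf emits one ex command per line: 1d deletes line 1, wq writes and quits.
# -s puts ex into batch mode so it quietly reads commands from standard input.
if command -v ex >/dev/null 2>&1; then
    printf '%s\n' '1d' 'wq' | ex -s "$file"
else
    echo 'ex is not installed on this system' >&2
fi

cat "$file"
```

Unlike the stream-oriented sed and perl approaches, ex loads the file into an editing buffer before applying the commands.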

Let’s use these methods and time them…

So the moral of this tip? Use perl for performing edits on extremely large files!

Text Manipulation: How to Globally Delete Lines Matching a Certain Pattern

In order to delete complete lines that match a certain pattern, you can use various tools. I find that the easiest tools to use in this situation are perl or our humble elderly friend ed.

For example, say we want to delete all lines that begin with the string “delete_me”. Either of the following two commands will delete the required lines.

Using perl:

Using ed:
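A sketch of the ed equivalent, again on a made-up file. The check for ed is only there because many modern systems no longer install it by default.

```shell
# Hypothetical sample file.
file=$(mktemp)
printf '%s\n' \
    'delete_me: first' \
    'keep this line' \
    'this keeps delete_me in the middle' \
    'delete_me again' > "$file"

# g/^delete_me/d deletes every line matching the pattern, w writes the
# file back, and q quits. -s suppresses the byte counts ed normally prints.
if command -v ed >/dev/null 2>&1; then
    printf '%s\n' 'g/^delete_me/d' 'w' 'q' | ed -s "$file"
else
    echo 'ed is not installed on this system' >&2
fi

cat "$file"
```

The g command is the same global command found in ex and vi, so the pattern and the action after it can be swapped out freely.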

Of course, there are many more ways of achieving our goal but these are my two personal favourites. It goes without saying that you can modify the search pattern to meet your exact requirements.