Cutting down on the use of pipes

2007-04-18 14:38:00

One of the obvious down sides to using a scripting language like ksh as opposed to a "real" programming language like Perl or PHP (or C for that matter) is that, for each command that you string together, you're forking off a new process.

This isn't much of a problem when your script isn't too convoluted or when your dataset isn't too large. However, when you start processing 40-50MB log files with multiple FOR loops containing a few IF statements for each line, then you start running into performance issues.

And as I'm running into just that I'm trying to find ways to cut down on the forking, which means getting rid of as many IFs and pipes as possible. Here's a few examples of what has worked for me so far...

Instead of running:

[ expr1 ] && command1

[ expr2 ] && command1

Run:

[ (expr1) && (expr2) ] && command1

Why? Because if test works the way I expect it to, it'll die if the first expression is untrue, meaning that it won't even try the second expression. If you have multiple commands that complement eachother then you ought to be able to fit them into a set of parentheses after test cutting down on more forks.

Instead of running:

if [ `echo $STRING | grep $QUERY | wc -l` -gt 0 ]; then

Run:

if [ ! -z `echo $STRING | grep $QUERY` ]; then

More ideas to follow soon. Maybe I ought to start learning a "real" programming language? :D

EDIT:

OMG! I can't believe that I've just learnt this now, after eight years in the field! When using the Korn shell use [[ expr ]] for your tests as opposed to [ expr ].

Why? Because the [ expr ] is a throw-back to Bourne shell compatibility that makes use of the external test binary, as opposed to the built-in test function. This should speed up things considerably!


kilala.nl tags: , , , ,

View or add comments (curr. 0)