[KLUG Programming] sorting, uniq'ing, and grepping

Erik Gillespie programming@kalamazoolinux.org
Mon, 14 Jul 2003 18:42:45 -0400 (EDT)


Hey Tony, here's my opinion for what it's worth:

I'd go with C because I know it better than Perl, but either would be a 
good cross-platform solution.  The downside of Perl is that you have to 
make sure Perl is installed.  With C you just need the compiled binary 
for each platform.

Much as you were doing with your scripts, I would read both files into a 
single array, call qsort() (#include <stdlib.h>; it's in ISO C as well as 
POSIX, so any conforming C implementation has it), then make a single 
pass through the sorted array and flag duplicates.
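
Here's a rough sketch of what I mean, not a finished program.  The names
read_lines(), sort_and_flag() and cmp_lines() are made up for illustration,
and I'm assuming that a "duplicate" is any line identical to the one before
it after sorting and that no line is longer than 4095 bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort() callback: each element is a char * pointing at one line. */
int cmp_lines(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Append every line of the file at `path` to the growing array *lines.
 * Call it once per input file so both files end up in the same array.
 * Returns 0 on success, -1 on failure (the caller still owns *lines).
 * Assumption: lines are shorter than the 4096-byte buffer. */
int read_lines(const char *path, char ***lines, size_t *count, size_t *cap)
{
    char buf[4096];
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    while (fgets(buf, sizeof buf, fp) != NULL) {
        char *copy;

        buf[strcspn(buf, "\n")] = '\0';        /* strip trailing newline */
        if (*count == *cap) {                  /* grow the pointer array */
            size_t ncap = *cap ? *cap * 2 : 64;
            char **tmp = realloc(*lines, ncap * sizeof *tmp);
            if (tmp == NULL) { fclose(fp); return -1; }
            *lines = tmp;
            *cap = ncap;
        }
        copy = malloc(strlen(buf) + 1);
        if (copy == NULL) { fclose(fp); return -1; }
        strcpy(copy, buf);
        (*lines)[(*count)++] = copy;
    }
    fclose(fp);
    return 0;
}

/* Sort the lines, then make one pass: dup[i] becomes 1 when lines[i]
 * equals lines[i - 1], otherwise 0.  Returns how many were flagged. */
size_t sort_and_flag(char **lines, size_t n, unsigned char *dup)
{
    size_t i, flagged = 0;

    assert(n == 0 || (lines != NULL && dup != NULL));
    if (n == 0)
        return 0;
    qsort(lines, n, sizeof *lines, cmp_lines);
    dup[0] = 0;                                /* first line is never a repeat */
    for (i = 1; i < n; i++) {
        dup[i] = (unsigned char)(strcmp(lines[i], lines[i - 1]) == 0);
        flagged += dup[i];
    }
    return flagged;
}
```

To use it you'd start with lines = NULL and count = cap = 0, call
read_lines() once for each file, allocate count bytes for the flags, and
hand the lot to sort_and_flag().  What you do with the flagged entries
(skip them, print only them, count them) depends on what you're after.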

You could hand-code a very elegant algorithm to do everything at once, 
but sorting the data is the clincher.  Unless your grandpa was Dijkstra 
you'll be stuck with a complexity of O(n log n) for any comparison-based 
sort, and the O(n) pass to flag duplicates adds nothing on top of that, 
regardless of whether you write your own l33t algorithm or do things one 
step at a time.

Erik
