In Unix, we usually use the 'sort' command to sort input. I cannot find direct information on the computational complexity of the Unix sort command, but it definitely gets quite slow when the dataset is large.
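To get a feel for how slow, you can time a plain sort on some generated data first. This is just a quick check I would run myself, assuming GNU coreutils (so 'seq' and 'shuf' are available); the file names are only for illustration:

# generate ~10 million random lines, then time a plain sort
seq 10000000 | shuf > data.tsv
time sort data.tsv -o data.tsv.sorted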
Here are several feasible solutions for faster sorting that I found by googling:
1. "split & merge" http://arnab.org/blog/quick-and-easy-multicore-sort
split -l5000000 data.tsv '_tmp';
ls -1 _tmp* | while read FILE; do sort $FILE -o $FILE & done;
Once all the background sorts have finished, follow up with:
sort -m _tmp* -o data.tsv.sorted
Arnab suggests splitting the large file into small pieces, sorting each piece individually, and then using 'sort -m' to merge them. I've tried it and it works very well; at the very least, the "sort -m" step is worth trying. (Note: '--merge' is actually described as sorting rather than merging, and each input file must already be sorted. Here is what 'info sort' says:
`--merge'
Merge the given files by sorting them as a group. Each input file
must always be individually sorted. It always works to sort
instead of merge; merging is provided because it is faster, in the
case where it works.)
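For reference, here is how I would put the two steps together in a single script. This is only my own sketch of Arnab's approach (chunk size and file names copied from the commands above), with an explicit 'wait' added so the merge only starts once every background sort has finished:

#!/bin/bash
# split data.tsv into chunks of 5,000,000 lines each, named _tmpaa, _tmpab, ...
split -l5000000 data.tsv '_tmp'
# sort each chunk in place, one background job per chunk
for FILE in _tmp*; do
  sort "$FILE" -o "$FILE" &
done
# wait for all background sorts to finish before merging
wait
# merge the individually sorted chunks into the final output
sort -m _tmp* -o data.tsv.sorted
# remove the temporary chunks
rm -f _tmp*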
2. "quicksort" http://www.bashguru.com/2011/01/quicksort-shell-script.html
This is an implementation of quicksort as a bash script. I've not tested it yet.
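I have not looked closely at the linked script, so the following is only my own toy sketch of the same idea (the function and variable names are mine, and it only handles whole numbers), just to show what a recursive quicksort looks like in bash. It is mostly of educational value, since a pure-bash quicksort will be far slower than sort(1) on real data:

#!/bin/bash
# quicksort: recursively sort the whole numbers given as arguments
# and print them on one line
quicksort() {
  # zero or one element: already sorted
  if (( $# < 2 )); then
    echo "$@"
    return
  fi
  # take the first argument as the pivot
  local pivot=$1
  shift
  local smaller=() larger=() x
  # partition the remaining values around the pivot
  for x in "$@"; do
    if (( x < pivot )); then
      smaller+=("$x")
    else
      larger+=("$x")
    fi
  done
  # sort each partition recursively and print: smaller, pivot, larger
  echo $(quicksort "${smaller[@]}") "$pivot" $(quicksort "${larger[@]}")
}

quicksort 7 3 9 1 4 1 8   # prints: 1 1 3 4 7 8 9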
3. BSD already seems to have library routines for other fast sorting algorithms: