External Sorting

Nicolae Vartolomei · 2023/08

Recently, I needed to implement an external sorting algorithm that would work well on “modern” hardware (lots of CPU cores, fast NVMe devices). I did come up with an interesting variation of External Merge Sort.

I’ve made a visualization for the 3 stages of the algorithm:

  1. Run sorting:
    1. Split the input into chunks (runs) that fit in memory.
    2. Sort the run in memory.
    3. Sample values as pivot candidates.
  2. Parallel multi-way selection:
    1. Select global pivots from the pivot candidates chosen in the first stage.
    2. Using binary search, find the lower bound for each global pivot in each sorted run.
  3. Parallel merging:
    1. Using the global pivot offsets in the sorted runs, compute the length and offset.
    2. Run parallel merge for each global pivot.

This algorithm efficiently utilizes all available CPUs and IO channels throughout each stage and is asymptotically optimal.



  • A. Aggarwal and J. S. Vitter, “The input/output complexity of sorting and related problems,” Commun. ACM 31, 9 (Sept. 1988)
  • J. S. Vitter, “Algorithms and Data Structures for External Memory, Foundation and Trends in Theoretical Computer Science,” vol 2, no 4, 2006
  • G. Graefe, “Implementing sorting in database systems,” ACM Comput. Surv. 38, 3 (2006)
  • L. Arge, M. T. Goodrich, M. Nelson, and N. Sitchinava, “Fundamental parallel algorithms for private-cache chip multiprocessors,” SPAA 2008
  • W. Chen,Y. Liu, Z. Chen, F. Liu and N. Xiao, “External Sorting Algorithm: State-of-the-Art and Future Directions,” IOP Conf. Mater. Sci. Eng. 2020
  • M. Rahn, P. Sanders and J. Singler, “Scalable distributed-memory external sorting,” 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010), Long Beach, CA, USA, 2010
  • https://en.wikipedia.org/wiki/External_sorting
  • https://en.wikipedia.org/wiki/Merge_sort#Parallel_multiway_merge_sort
  • https://en.wikipedia.org/wiki/Bucket_sort#Shuffle_sort