Algorithm catalog

Sorting algorithms / interactive lesson

Bitonic Sort

Learn bitonic sort through an animated sorting network and a simultaneous hypercube view. Reveal comparator layers one by one and watch how neighbor compare-exchange operations create and merge bitonic sequences.

University Guided lesson
Snapshot 1/64

Sorting network

Bitonic construction + final merge

Values are printed at the input and after every completed layer, like the textbook network diagram.

depth 6 = log₂(8) · (log₂(8) + 1) / 2
wiresinputmake bitonic BM[2]1 step1 stepmake bitonic BM[4]2 steps2 stepssort BM[8]3 steps3 steps000001010011100101110111102059381214L1j=1, b0L2j=2, b1L3j=1, b0L4j=4, b2L5j=2, b1L6j=1, b0
Decision timelineproblem

About the algorithm

Bitonic sort is a classic parallel sorting-network algorithm. It repeatedly builds bitonic sequences and then merges them through a fixed pattern of compare-exchange layers, which makes it especially suitable for SIMD hardware, GPUs, and interconnection networks such as hypercubes.

History

Kenneth E. Batcher introduced bitonic sort in 1968 as part of his work on sorting networks. The algorithm became a canonical example in parallel algorithms because its comparator structure is regular, analyzable, and easy to map onto hardware and theoretical processor networks.

How it works

A sequence is bitonic if it first rises and then falls, or is a cyclic shift of such a sequence. Bitonic sort first constructs bitonic subsequences of length 2, 4, 8, and so on. Visually this is the left part of the network: blocks that alternate increasing and decreasing merges. Only after this construction phase does the final bitonic merge sort the whole sequence. Then each bitonic block is merged by repeated compare-exchange layers. In the standard index-based implementation, a layer uses partner distance j = 2^d, so each processor i talks to processor i xor j. Because j flips exactly one bit, every compare-exchange happens between binary-neighbor processors on a hypercube dimension.

Complexity and mapping to a hypercube

For n = 2^k elements, the sorting network depth is k(k + 1)/2 = Θ(log² n). If we place one element on each of p = n processors and each compare-exchange costs Θ(1), then the parallel running time is Θ(log² n). Since each partner differs in exactly one bit, the communication follows edges of a hypercube, so every layer can be viewed as local neighbor exchange on one active dimension.

Code companion

Connect each visual decision to an implementation pattern.

for k from 2 to n doubling:
  for j from k/2 down to 1 halving:
    in parallel for i from 0 to n - 1:
      partner = i xor j
      if partner > i:
        if (i & k) == 0:
          compare-exchange(i, partner, ascending)
        else:
          compare-exchange(i, partner, descending)