Algorithm catalog

Parallel algorithms / interactive lesson

Parallel Radix Sort

Sort integer keys without pairwise comparisons. Watch workers classify decimal digits or binary bits, coordinate through an exclusive prefix scan, and perform a stable parallel scatter.

University Guided lesson
Snapshot 1/17

Stable order entering this phase

Decimal digits / pass 1 of 3 / reading 1s digit

#0

170

#1

045

#2

075

#3

090

#4

802

#5

024

#6

002

#7

066

#8

501

#9

031

#10

015

#11

128

Parallel workers

Workers are waiting for the first digit pass.

Global buckets and exclusive starts

bucket 0

waiting for scan

bucket 1

waiting for scan

bucket 2

waiting for scan

bucket 3

waiting for scan

bucket 4

waiting for scan

bucket 5

waiting for scan

bucket 6

waiting for scan

bucket 7

waiting for scan

bucket 8

waiting for scan

bucket 9

waiting for scan

Parallel decision timeline

Prepare 12 values for base-10 radix sort

About the algorithm

Parallel radix sort orders non-negative integers without comparing pairs of values. It processes one digit or bit position at a time. During every stable pass, workers classify their local values, combine bucket counts with a prefix scan, and scatter values into collision-free output slots.

From cards to processors

Radix sorting predates electronic computers: punched-card tabulators grouped records by one column at a time. Parallel machines keep the same idea but split the input among workers. Each worker classifies its own block simultaneously.

How it works

Start at the least significant position. Workers build local bucket histograms. A global exclusive prefix scan converts those counts into stable output ranges. Workers then scatter values to their reserved slots. Repeat for every more significant position.

Binary and decimal modes

Decimal mode uses ten buckets and usually needs fewer passes. Binary mode uses only two buckets, so each pass is especially simple and maps naturally to low-level parallel hardware. Both modes depend on the same stability guarantee.

Where it is used

Parallel radix sort is useful for GPU sorting, database processing, integer keys, graph analytics, and high-throughput data pipelines. It is strongest when keys have a bounded number of digits or bits.

Code companion

Connect each visual decision to an implementation pattern.

base = mode == binary ? 2 : 10

for each position from least significant to most significant:
  in parallel for each worker:
    count local values in each bucket

  bucketStarts = exclusivePrefixScan(all local bucket counts)

  in parallel for each worker:
    for each local value in stable order:
      destination = next reserved slot for value.digit
      output[destination] = value

  swap(input, output)

return input