Reservoir Sampling
Algorithm to randomly choose k samples from an INFINITE list or STREAM of data, or a list of size N, where N is UNKNOWN or large enough that the list DOESN'T FIT into main memory.
Reservoir sampling is a family of randomized algorithms for choosing a simple random sample, without replacement, of k items from a population of unknown size n in a single pass over the items. The size of the population n is not known to the algorithm and is typically too large for all n items to fit into main memory.
Algorithm to randomly select k samples from n elements:
- Take an array reservoir[] of length k and initialize it with the first k elements from the given array input[] of size n.
- For index i := k to (n - 1), repeat:
  - Generate a random integer m, uniformly distributed in the range 0 to i (inclusive).
  - If m >= k: do nothing.
  - Else: replace reservoir[m] with input[i].
- Return the reservoir array.
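The steps above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function name `reservoir_sample` and the optional `rng` parameter are assumptions introduced here, and the input is consumed as an iterable so that n never has to be known in advance.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return up to k items chosen uniformly at random from stream (hypothetical helper)."""
    if rng is None:
        rng = random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k elements.
            reservoir.append(item)
        else:
            # randint is inclusive on both ends, so m is uniform over 0..i.
            m = rng.randint(0, i)
            if m < k:
                # The new item replaces a reservoir slot with probability k / (i + 1).
                reservoir[m] = item
    return reservoir


if __name__ == "__main__":
    # Sample 5 values from a stream of 1000 integers.
    print(reservoir_sample(range(1000), 5))
```

If the stream holds fewer than k items, this sketch simply returns all of them. Only the k reservoir slots are kept in memory, regardless of how long the stream is.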
We could also implement the algorithm recursively as shown below.
Suppose we have an algorithm that can pull a random set of k elements from an array of size (n - 1). How can we use this algorithm to pull a random set of k elements from an array of size n?
We can first pull a random set of size k, say reservoir[0...(k - 1)], from the first (n - 1) elements. Then we just need to decide whether the last element, array[n - 1], should be inserted into our subset (which would require removing a random element from it, to keep the size at k). An easy way to do this is to pick a random integer m from 0 through (n - 1), inclusive. If m < k, replace reservoir[m] with array[n - 1]; otherwise leave the reservoir unchanged. The last element is therefore included with probability k/n. The recursion stops when n equals k, at which point the reservoir is simply the first k elements.
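The recursive formulation can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function name `reservoir_sample_recursive` and the `n` and `rng` parameters are hypothetical, and the input is assumed to be an in-memory sequence so that it can be indexed.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def reservoir_sample_recursive(
    array: Sequence[T],
    k: int,
    n: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> List[T]:
    """Pick k random elements from the first n elements of array (hypothetical helper)."""
    if n is None:
        n = len(array)
    if rng is None:
        rng = random.Random()
    if n <= k:
        # Base case: k or fewer elements, so take all of them.
        return list(array[:n])
    # Recursively sample k elements from the first (n - 1) elements.
    reservoir = reservoir_sample_recursive(array, k, n - 1, rng)
    # Decide whether the last element, array[n - 1], enters the reservoir.
    m = rng.randint(0, n - 1)  # inclusive on both ends: uniform over 0..(n - 1)
    if m < k:
        reservoir[m] = array[n - 1]
    return reservoir


if __name__ == "__main__":
    print(reservoir_sample_recursive(list(range(100)), 5))
```

Each call adds one stack frame per element beyond the first k, so in Python this version is only suitable for small inputs (the default recursion limit is around 1000 frames). The iterative form is the practical choice for streams.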