Reservoir Sampling
Algorithm to randomly choose k samples from an INFINITE list or STREAM of data, or a list of size N, where N is UNKNOWN or large enough that the list DOESN'T FIT into main memory.

Algorithms and Data Structures: TheAlgorist.com

System Design: www.System.Design

Low Level Design: LowLevelDesign.io

Frontend Engineering: FrontendEngineering.io
জয় শ্রী রাম
🕉
Reservoir sampling is a family of randomized algorithms for choosing a simple random sample, without replacement, of k items from a population of unknown size n in a single pass over the items. The size of the population n is not known to the algorithm and is typically too large for all n items to fit into main memory.
Algorithm to randomly select k samples from n elements:
 Take an array reservoir[] of length k and initialize it with the first k elements from the given array input[] of size n.

for index i := k to (n  1) repeat:

Generate a random number. Let's say the random number is m.
If m > k : do nothing
else: swap reservoir[m] with input[i].

Generate a random number. Let's say the random number is m.
 return reservoir array.
Login to Access Content
We could also implement the algorithm recursively as shown below.
Suppose we have an algorithm that can pull a random set of k elements from an array of size (n  1). How can you use this algorithm to pull a random set of k elements from an array of size n ?
We can first pull a random set of size k, say reservoir[0...(k  1)], from the first (n  1) elements. Then, we just need to decide if array[n] should be inserted into our subset (which would require pulling out a random element from it, to keep the size k). An easy way to do this is to pick a random number m from 0 through n. If m < k, then replace reservoir[m] with array[n].
Login to Access Content