What is the Big-O complexity of random.choice(list) in Python 3, where n is the number of elements in the list? Edit: Thank you all for giving me the answer; now I understand.
Comment: If it's not stated in the specification, it's presumably implementation-dependent.
Comment: I can't imagine any reason why it wouldn't be O(1). It just needs to pick a random number i from 0 to len(list), then return list[i]. They're all constant-time operations.
Comment: If Python lists were implemented as linked lists, it would be O(n), since getting the length of a list is linear, as is accessing a selected element. But since they're actually arrays, everything is constant.
Answer (ShadowRanger): O(1). Or to be more precise, it's equivalent to the big-O random-access time for looking up a single index in whatever sequence you pass it, and list has O(1) random-access indexing (as does tuple). Simplified, all it does is seq[random.randrange(len(seq))], which is obviously equivalent to a single index lookup operation.
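As a rough sketch of that equivalence (CPython's real random.choice uses an internal _randbelow helper, but the shape is the same):

import random

def choice_sketch(seq):
    """Roughly what random.choice does: one index draw plus one lookup.
    Both steps are O(1) for an array-backed sequence like list or tuple."""
    if not seq:
        raise IndexError("Cannot choose from an empty sequence")
    i = random.randrange(len(seq))  # O(1): pick a uniform index
    return seq[i]                   # O(1) for list/tuple indexing

print(choice_sketch([10, 20, 30]))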
An example where it would be O(n) is collections.deque, where indexing in the middle of the deque is O(n) (with a largish constant divisor though, so it's not that expensive unless the deque is reaching the thousands-of-elements range or higher). So basically, don't use a deque if it's going to be large and you plan to select random elements from it repeatedly; stick to list, tuple, str, bytes/bytearray, array.array and other sequence types with O(1) indexing.
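A quick, purely illustrative micro-benchmark of that difference (absolute timings vary by machine, and the gap widens as n grows):

import random
import timeit
from collections import deque

n = 100_000
lst = list(range(n))
dq = deque(lst)

# list: O(1) per random.choice; deque: O(n) worst case (indexing walks blocks)
print("list :", timeit.timeit(lambda: random.choice(lst), number=10_000))
print("deque:", timeit.timeit(lambda: random.choice(dq), number=10_000))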
Comment (Stefan Pochmann): What do you mean by "largish" constant divisor?
Comment (ShadowRanger): @StefanPochmann: Implementation details here, but CPython's deque is a linked list of blocks that each store up to 64 values (the left and right blocks may be incomplete). Until the length crests 66, no lookup can require any block traversal, and even up to a thousand elements, you're still looking at a maximum of 17 blocks (at most half of which must be traversed for indexing), and the Python interpreter overhead generally swamps small C-level costs like traversing 8 hops in a linked list.
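A quick back-of-the-envelope check of those numbers (64 items per block is a CPython implementation detail and could change):

import math

n = 1000
block_size = 64                      # CPython detail: values per deque block
blocks = math.ceil(n / block_size)   # 16 full blocks; 17 if the ends are partial
max_hops = blocks // 2               # indexing walks from the nearer end
print(blocks, max_hops)              # -> 16 8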
Comment (Stefan Pochmann): Oh cool, I had no idea it does that. But it makes sense. Thanks, nice to know.
Comment (ShadowRanger): @CGFoX: Weighted choice for choices is O(n log m), where n is the number of choices being made and m is the total number of weights (it bisects the cumulative weights for each choice, which is O(log m) work, and does that n times). So it's still O(1) per selection in terms of the sequence being selected from (excluding cases like deque); it's just a tiny bit more expensive to weight the choice.
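A minimal sketch of that mechanism (CPython's actual random.choices adds validation and a fast path for the unweighted case):

import bisect
import itertools
import random

def choices_sketch(population, weights, k=1):
    """Sketch of weighted random.choices: build the cumulative weights
    once (O(m)), then each of the k draws is one uniform float plus one
    binary search, i.e. O(log m) per selection."""
    cum_weights = list(itertools.accumulate(weights))  # O(m) preprocessing
    total = cum_weights[-1]
    return [population[bisect.bisect(cum_weights, random.random() * total)]
            for _ in range(k)]                         # O(k log m)

print(choices_sketch("abcd", [1, 1, 1, 7], k=5))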
Comment (ShadowRanger): @CGFoX: Providing cumulative weights instead of relative weights saves the preprocessing from relative to cumulative. Improving on that would require you to make a side-band list that repeats each index in proportion to its weight (e.g. if index 0 were 10% likely and index 1 were 90% likely, you'd make the list [0, 1, 1, 1, 1, 1, 1, 1, 1, 1], do an O(1) selection from that, and then do an O(1) lookup into the original source list). But that side-band list has a minimum size equal to the original list, and for weird percentage weights it could occupy all your memory. O(log m) is cheap enough.
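A sketch of that side-band idea (hypothetical names; it assumes the weights reduce to small integer counts):

import random

population = ["a", "b"]
counts = [1, 9]  # index 0: 10%, index 1: 90%, expressed as integer counts

# Side-band list repeating each index in proportion to its weight:
side_band = [i for i, c in enumerate(counts) for _ in range(c)]
# side_band == [0, 1, 1, 1, 1, 1, 1, 1, 1, 1]

chosen = population[random.choice(side_band)]  # O(1) select + O(1) lookup
print(chosen)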
Answer: Though the question is about random.choice, and the previous answers explain it in detail, when I searched for the complexity of np.random.choice I didn't find an answer, so I decided to explain np.random.choice here.
The signature is choice(a, size=None, replace=True, p=None). Assume a.shape = (n,) and size = m.
With replacement:
The complexity of np.random.choice is O(m) if p is not specified (the uniform case), and O(n + m log n) if p is specified.
The GitHub code can be found here: np.random.choice.
When p is not specified, choice generates an index array by randint and returns a[index] , so the complexity is O(m). (I assume the operation of generating a random integer by randint is O(1).)
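A sketch of that uniform path (the real implementation is in C/Cython; this just shows the shape of it):

import numpy as np

a = np.arange(10)   # n = 10
m = 5

# Uniform with-replacement path: m O(1) integer draws, then one
# O(m) fancy-indexing pass.
index = np.random.randint(0, len(a), size=m)
sample = a[index]
print(sample)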
When p is specified, the function first computes the prefix sum of p. It then draws m samples from [0, 1) and uses binary search to find the corresponding interval in the prefix sum for each drawn sample. Evidence that it uses binary search can be found here. So this process is O(n + m log n). If you need a faster method in this situation, you can use the alias method, which needs O(n) preprocessing time and O(m) time to sample m items.
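A sketch of that weighted path, mirroring the cumsum-plus-searchsorted steps described above:

import numpy as np

a = np.arange(4)                      # n = 4
p = np.array([0.1, 0.2, 0.3, 0.4])
m = 6

cdf = np.cumsum(p)                    # O(n) prefix sum
cdf /= cdf[-1]                        # normalize against rounding drift
u = np.random.random(m)               # m uniform samples in [0, 1)
index = np.searchsorted(cdf, u, side='right')  # O(m log n) binary searches
sample = a[index]
print(sample)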
Without replacement: (This part is more involved; maybe I'll finish it in the future.)
If p is not specified, the complexity is the same as np.random.permutation(n), even when m is only 1. See more here.
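A sketch of that path (this is effectively what the legacy implementation does):

import numpy as np

a = np.arange(10)   # n = 10
m = 3

# Uniform without-replacement path: permute all n indices and keep the
# first m -- O(n) work even when m == 1.
index = np.random.permutation(len(a))[:m]
sample = a[index]
print(sample)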
If p is specified, the expected complexity is on the order of $n \log n \log\frac{n}{n-m}$ (this bound is not tight).