# Random selection of an item in a fixed-cardinal set

### Typical application

Select can­di­date rules in a gen­er­a­tive gram­mar ran­dom­ly and apply them to the work­string until there is no more can­di­date rule.

### Problem

To stop the process it is nec­es­sary to scan the entire gram­mar and make sure that there is no more can­di­date rule. Each time the work­string has changed, the "can­di­date / non can­di­date" sta­tus of each rule in the gram­mar may vary. Since scan­ning "N" rules in the gram­mar is time con­sum­ing, it is prefer­able to do it only after "max­try" unsuc­cess­ful attempts to find a can­di­date. The prob­lem inves­ti­gat­ed here is to find a min­i­mum "max­try" mak­ing it almost sure that there is no more candidate.

### Related problem

Sometimes a select­ed rule must be applied repeat­ed­ly until it is no more can­di­date. If the posi­tion of deriva­tion in the work­string is ran­dom then the prob­lem amounts to select­ing one amongst "can­di­date" or "non can­di­date" posi­tions in a set whose car­di­nal "N" is the length of the workstring.

### To summarize…

The two exam­ples above may be abstract­ed to the repeat­ed ran­dom selec­tion of an item in a set con­taing N items. The algo­rithm of a gen­er­al pro­ce­dure is giv­en below. The prob­lem is to eval­u­ate "max­try" so that in the worst case the prob­a­bil­i­ty of heuris­tic [2] being true is greater than a giv­en prob­a­bil­i­ty "p".

```ScanSet()
Return number of candidate items;

FoundItem()
Select an item;
Return "yes" if item is candidate, "no" otherwise;

Procedure()
Evaluate maxtry;
Start:
nb_candidates = ScanSet();
if (nb_candidates = 0) exit;
if (nb_candidates = 1)
Use candidate item found;
Go to Start;
[Heuristic 1: if there was a unique candidate item
then next time there should be less than two.]
Again:
repeat with try = 1 to maxtry
if (FoundItem()) then
Use candidate item found;
Go to Again;
[Heuristic 2: if try < maxtry there may still be
candidate items.]
Go to Start;```

### Solution

Let "k" be the num­ber of can­di­date items. The prob­a­bil­i­ty to find a can­di­date item after (maxtry-1) unsuc­cess­ful attempts is:

``  1 - (1 - k/N)^maxtry``

If k = 1 then the Scan() pro­ce­dure will find the unique can­di­date item. Therefeore the worst case is k = 2 and the con­di­tion is:

``  1 - (1 - 2/N)^maxtry > p``

yield­ing:

``  maxtry > Log(1 - p) / Log(1 - 2/N)``

i.e., if 2/N is much small­er than 1,

``  maxtry > - N/2 Log(1 - p)``

In Bol Processor, we imposed p = 0.64 so that max­try > N/2.

Indeed, this method is only valid if items have equal weights.