Random selection of an item in a fixed-cardinal set

Typical application

Select can­di­date rules in a gen­er­a­tive gram­mar ran­dom­ly and apply them to the work­string until there is no more can­di­date rule.

Problem

To stop the process it is nec­es­sary to scan the entire gram­mar and make sure that there is no more can­di­date rule. Each time the work­string has changed, the "can­di­date / non can­di­date" sta­tus of each rule in the gram­mar may vary. Since scan­ning "N" rules in the gram­mar is time con­sum­ing, it is prefer­able to do it only after "max­try" unsuc­cess­ful attempts to find a can­di­date. The prob­lem inves­ti­gat­ed here is to find a min­i­mum "max­try" mak­ing it almost sure that there is no more candidate.

Related problem

Sometimes a select­ed rule must be applied repeat­ed­ly until it is no more can­di­date. If the posi­tion of deriva­tion in the work­string is ran­dom then the prob­lem amounts to select­ing one amongst "can­di­date" or "non can­di­date" posi­tions in a set whose car­di­nal "N" is the length of the workstring.

To summarize…

The two exam­ples above may be abstract­ed to the repeat­ed ran­dom selec­tion of an item in a set con­taing N items. The algo­rithm of a gen­er­al pro­ce­dure is giv­en below. The prob­lem is to eval­u­ate "max­try" so that in the worst case the prob­a­bil­i­ty of heuris­tic [2] being true is greater than a giv­en prob­a­bil­i­ty "p".

ScanSet()
    Return number of candidate items;

FoundItem()
    Select an item;
    Return "yes" if item is candidate, "no" otherwise;

Procedure()
    Evaluate maxtry;
    Start:
    nb_candidates = ScanSet();
    if (nb_candidates = 0) exit;
    if (nb_candidates = 1)
        Use candidate item found;
        Go to Start;
        [Heuristic 1: if there was a unique candidate item
then next time there should be less than two.]
    Again:
    repeat with try = 1 to maxtry
        if (FoundItem()) then
            Use candidate item found;
            Go to Again;
            [Heuristic 2: if try < maxtry there may still be
candidate items.]
Go to Start;

Solution

Let "k" be the num­ber of can­di­date items. The prob­a­bil­i­ty to find a can­di­date item after (maxtry-1) unsuc­cess­ful attempts is:

  1 - (1 - k/N)^maxtry

If k = 1 then the Scan() pro­ce­dure will find the unique can­di­date item. Therefeore the worst case is k = 2 and the con­di­tion is:

  1 - (1 - 2/N)^maxtry > p

yield­ing:

  maxtry > Log(1 - p) / Log(1 - 2/N)

i.e., if 2/N is much small­er than 1,

  maxtry > - N/2 Log(1 - p)

In Bol Processor, we imposed p = 0.64 so that max­try > N/2.

Indeed, this method is only valid if items have equal weights.

Leave a Reply

Your email address will not be published. Required fields are marked *