oasis.PassiveSampler
- class oasis.PassiveSampler(alpha, predictions, oracle, max_iter=None, identifiers=None, replace=True, debug=False)
- Passive sampling for estimation of the weighted F-measure.
- Estimates the quantity:
- TP / (alpha * (TP + FP) + (1 - alpha) * (TP + FN))
- on a finite pool by sampling items uniformly and querying their labels from an oracle (which must be provided).
- Parameters
- alpha : float
- Weight for the F-measure. Valid weights are on the interval [0, 1]. alpha == 1 corresponds to precision, alpha == 0 corresponds to recall, and alpha == 0.5 corresponds to the balanced F-measure.
- predictions : array-like, shape=(n_items, n_class)
- Predicted labels for the items in the pool. Rows represent items and columns represent different classifiers under evaluation (i.e. more than one classifier may be evaluated in parallel). Valid labels are 0 or 1. 
- oracle : function
- Function that returns ground truth labels for items in the pool. The function should take an item identifier as input (i.e. its corresponding row index) and return the ground truth label. Valid labels are 0 or 1. 
- max_iter : int, optional, default None
- Maximum number of iterations to expect for pre-allocating arrays. Once this limit is reached, sampling can no longer continue. If no value is given, defaults to n_items. 
- replace : bool, optional, default True
- Whether to sample with or without replacement. 
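As a minimal sketch of the quantity this class estimates, the weighted F-measure can be computed directly from confusion counts. The function name weighted_f_measure and the example counts below are illustrative assumptions and are not part of oasis.

```python
def weighted_f_measure(tp, fp, fn, alpha):
    """Return TP / (alpha * (TP + FP) + (1 - alpha) * (TP + FN))."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    denominator = alpha * (tp + fp) + (1 - alpha) * (tp + fn)
    # The measure is undefined when the denominator is zero;
    # returning NaN in that case is a choice made for this sketch.
    return tp / denominator if denominator > 0 else float("nan")


# Hypothetical confusion counts for a single classifier.
tp, fp, fn = 8, 2, 4

precision = weighted_f_measure(tp, fp, fn, alpha=1.0)  # reduces to TP / (TP + FP)
recall = weighted_f_measure(tp, fp, fn, alpha=0.0)     # reduces to TP / (TP + FN)
balanced = weighted_f_measure(tp, fp, fn, alpha=0.5)   # reduces to 2TP / (2TP + FP + FN)
```

Setting alpha to 1 or 0 removes the FN or FP term from the denominator, which is why those two endpoints recover precision and recall respectively.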
 
- Other Parameters
- identifiers : array-like, optional, default None
- Unique identifiers for the items in the pool. Must match the row order of the “predictions” parameter. If no value is given, defaults to the row indices [0, 1, …, n_items - 1].
- debug : bool, optional, default False
- Whether to print out verbose debugging information. 
 
- Attributes
- estimate_ : numpy.ndarray
- F-measure estimates for each iteration. 
- queried_oracle_ : numpy.ndarray
- Records whether the oracle was queried at each iteration (True) or whether a cached label was used (False). 
- cached_labels_ : numpy.ndarray, shape=(n_items,)
- Previously sampled ground truth labels for the items in the pool. Items which have not had their labels queried are recorded as NaNs. The order of the items matches the row order for the “predictions” parameter. 
- t_ : int
- Iteration index. 
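To illustrate how the attributes above relate to one another, here is a pure-Python stand-in for a passive sampling loop. It is a sketch under simplifying assumptions (a single classifier, sampling with replacement, a hypothetical function name passive_estimates, and made-up data); it is not the oasis implementation.

```python
import math
import random


def passive_estimates(alpha, predictions, oracle, n_to_sample, seed=0):
    """Sample items uniformly with replacement and track the F-measure estimate."""
    rng = random.Random(seed)
    n_items = len(predictions)
    cached_labels = [math.nan] * n_items  # NaN marks items not yet queried
    queried_oracle = []                   # True when the oracle was called
    estimate = []                         # one F-measure estimate per iteration
    tp = fp = fn = 0
    for _ in range(n_to_sample):
        idx = rng.randrange(n_items)      # uniform draw over the pool
        if math.isnan(cached_labels[idx]):
            cached_labels[idx] = oracle(idx)
            queried_oracle.append(True)
        else:
            queried_oracle.append(False)  # label served from the cache
        label, pred = cached_labels[idx], predictions[idx]
        tp += int(pred == 1 and label == 1)
        fp += int(pred == 1 and label == 0)
        fn += int(pred == 0 and label == 1)
        denom = alpha * (tp + fp) + (1 - alpha) * (tp + fn)
        estimate.append(tp / denom if denom > 0 else math.nan)
    return estimate, queried_oracle, cached_labels


true_labels = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical ground truth
predictions = [1, 0, 0, 1, 1, 0, 1, 0]  # hypothetical classifier output
oracle_calls = []


def oracle(idx):
    """Return the ground truth label for a row index, recording each call."""
    oracle_calls.append(idx)
    return true_labels[idx]


estimate, queried_oracle, cached_labels = passive_estimates(
    0.5, predictions, oracle, n_to_sample=50
)
```

In this sketch the oracle is called at most once per item, so the number of True entries in queried_oracle equals the number of oracle calls, and any item never drawn keeps a NaN in cached_labels.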
 
- Methods
- reset(): Resets the sampler to its initial state.
- sample(n_to_sample, **kwargs): Sample a sequence of items from the pool.
- sample_distinct(n_to_sample, **kwargs): Sample a sequence of items from the pool until a minimum number of distinct items are queried.
- __init__(alpha, predictions, oracle, max_iter=None, identifiers=None, replace=True, debug=False)
- Initialize self. See help(type(self)) for accurate signature. 
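One plausible reading of sample_distinct is that items are drawn with replacement until the requested number of distinct items has been seen. The sketch below shows that stopping rule in isolation. The helper draw_until_distinct is hypothetical and only an assumption about the behaviour described above, not code taken from oasis.

```python
import random


def draw_until_distinct(n_items, n_distinct, seed=0):
    """Draw indices uniformly with replacement until n_distinct unique ones appear."""
    if n_distinct > n_items:
        raise ValueError("cannot collect more distinct items than the pool holds")
    rng = random.Random(seed)
    draws, seen = [], set()
    while len(seen) < n_distinct:
        idx = rng.randrange(n_items)
        draws.append(idx)  # repeated draws are kept in the sequence
        seen.add(idx)
    return draws


draws = draw_until_distinct(n_items=10, n_distinct=5)
```

Because repeats are allowed, the returned sequence may be longer than the number of distinct items requested, which is the practical difference from calling sample with a fixed count.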
 
