Other samplers

The OASIS package also implements three alternative sampling-based estimation methods. All three are non-adaptive, i.e. they do not learn from the labels as they are received from the oracle. Experiments comparing them with OASIS in the entity resolution domain are presented in [Marchant17].

Passive sampler

This is the simplest approach: items to label are chosen by sampling uniformly from the pool. Sampling with or without replacement is supported through the replace parameter.

For further information, see the oasis.PassiveSampler class.
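The idea of passive sampling can be illustrated with a minimal, self-contained sketch. It does not use the oasis API: the function passive_f_measure, its arguments, and the toy data are hypothetical names chosen for illustration, and the F-measure is assumed to be weighted by alpha as TP / (alpha (TP + FP) + (1 - alpha) (TP + FN)).

```python
import random

def passive_f_measure(predictions, oracle, n_samples, alpha=0.5,
                      replace=True, seed=0):
    """Estimate the alpha-weighted F-measure from uniformly sampled labels.

    Illustrative only: this is not the oasis.PassiveSampler implementation.
    """
    rng = random.Random(seed)
    pool = range(len(predictions))
    if replace:
        chosen = [rng.choice(pool) for _ in range(n_samples)]
    else:
        # Without replacement, n_samples must not exceed the pool size.
        chosen = rng.sample(pool, n_samples)

    n_tp = n_pred_pos = n_actual_pos = 0
    for i in chosen:
        label = oracle(i)                  # one oracle query per draw
        n_tp += predictions[i] * label     # predicted 1 and labelled 1
        n_pred_pos += predictions[i]       # TP + FP
        n_actual_pos += label              # TP + FN

    denom = alpha * n_pred_pos + (1 - alpha) * n_actual_pos
    return n_tp / denom if denom > 0 else float("nan")

# Toy pool: 2 true positives, 1 false positive, 1 false negative.
predictions = [1, 1, 0, 0, 1, 0]
labels = [1, 0, 0, 1, 1, 0]

# Labelling every item once (no replacement) recovers the pool's F1 = 2/3.
estimate = passive_f_measure(predictions, labels.__getitem__,
                             n_samples=6, replace=False)
```

With replace=True the same item may be drawn more than once, so the estimate stays noisy even when n_samples equals the pool size.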

Non-adaptive importance sampler

This method is based on importance sampling, using an “optimal” instrumental distribution similar to the one used in OASIS (see [Sawade09]). The key difference is that it approximates the “optimal” instrumental distribution from the classifier scores alone, which may be inaccurate, and it cannot adapt to the incoming ground truth labels.

For further information, see the oasis.ImportanceSampler class.
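The following sketch shows the general shape of a non-adaptive importance sampler. It does not use the oasis API. The instrumental distribution here is a deliberately simple stand-in (proportional to the score, mixed with a uniform distribution), not the “optimal” distribution of [Sawade09]; the function names, the epsilon parameter, and the toy data are all assumptions made for illustration. Scores are assumed to be non-negative.

```python
import random

def score_based_instrumental(scores, epsilon=0.1):
    """Fixed sampling distribution over the pool, built from scores alone.

    Hypothetical stand-in: proportional to the score, mixed with a uniform
    distribution so that every item has a non-zero chance of being drawn.
    """
    n = len(scores)
    total = sum(scores)
    if total == 0:
        return [1.0 / n] * n
    return [(1 - epsilon) * s / total + epsilon / n for s in scores]

def importance_f_measure(predictions, scores, oracle, n_samples,
                         alpha=0.5, seed=0):
    """Importance-weighted estimate of the alpha-weighted F-measure."""
    rng = random.Random(seed)
    n = len(predictions)
    q = score_based_instrumental(scores)   # never updated: non-adaptive
    drawn = rng.choices(range(n), weights=q, k=n_samples)

    tp = pred_pos = actual_pos = 0.0
    for i in drawn:
        w = (1.0 / n) / q[i]               # uniform pool probability / q
        label = oracle(i)
        tp += w * predictions[i] * label
        pred_pos += w * predictions[i]
        actual_pos += w * label

    denom = alpha * pred_pos + (1 - alpha) * actual_pos
    return tp / denom if denom > 0 else float("nan")

# Toy pool whose true F1 is 2/3.
scores = [0.9, 0.8, 0.2, 0.4, 0.7, 0.1]
predictions = [1, 1, 0, 0, 1, 0]
labels = [1, 0, 0, 1, 1, 0]
estimate = importance_f_measure(predictions, scores, labels.__getitem__,
                                n_samples=2000)
```

Because q is computed once from the scores and never revised, poorly calibrated scores lead to a poor instrumental distribution for the whole run. The importance weights keep the estimator consistent, but its variance can be high.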

Stratified sampler

This method is based on unbiased stratified sampling (see [Druck11]). Like OASIS, it depends on a partitioning of the pool into a collection of strata. In practice, it tends to perform similarly to passive sampling.

For further information, see the oasis.DruckSampler class.
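A stratified estimate can be sketched as follows. This is not the oasis.DruckSampler implementation or the exact procedure of [Druck11]: the equal-width score bins, the fixed number of labels per stratum, and all function and parameter names are assumptions made for illustration.

```python
import random

def stratify_by_score(scores, n_strata):
    """Partition item indices into equal-width score bins (empty bins dropped)."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / n_strata
    if width == 0:
        width = 1.0                        # all scores equal: single bin
    strata = [[] for _ in range(n_strata)]
    for i, s in enumerate(scores):
        k = min(int((s - lo) / width), n_strata - 1)
        strata[k].append(i)
    return [stratum for stratum in strata if stratum]

def stratified_f_measure(predictions, scores, oracle, n_per_stratum,
                         n_strata=2, alpha=0.5, seed=0):
    """Combine per-stratum sample means, weighted by stratum size."""
    rng = random.Random(seed)
    n_pool = len(predictions)
    tp = pred_pos = actual_pos = 0.0
    for stratum in stratify_by_score(scores, n_strata):
        k = min(n_per_stratum, len(stratum))
        sample = rng.sample(stratum, k)    # uniform within the stratum
        weight = len(stratum) / n_pool     # stratum's share of the pool
        for i in sample:
            label = oracle(i)
            tp += weight * predictions[i] * label / k
            pred_pos += weight * predictions[i] / k
            actual_pos += weight * label / k

    denom = alpha * pred_pos + (1 - alpha) * actual_pos
    return tp / denom if denom > 0 else float("nan")

# Toy pool whose true F1 is 2/3; each of the two strata holds three items.
scores = [0.9, 0.8, 0.2, 0.4, 0.7, 0.1]
predictions = [1, 1, 0, 0, 1, 0]
labels = [1, 0, 0, 1, 1, 0]
estimate = stratified_f_measure(predictions, scores, labels.__getitem__,
                                n_per_stratum=3)
```

In this example every item in each stratum is labelled, so the estimate coincides with the pool's true F-measure. With fewer labels per stratum, the estimate varies with the random draw.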

References

Druck11

G. Druck and A. McCallum, “Toward Interactive Training and Evaluation,” in Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp. 947–956, 2011.

Sawade09

C. Sawade, N. Landwehr, and T. Scheffer, “Active Estimation of F-Measures,” in Advances in Neural Information Processing Systems, pp. 2083–2091, 2010.

Marchant17

N. G. Marchant and B. I. P. Rubinstein, “In Search of an Entity Resolution OASIS: Optimal Asymptotic Sequential Importance Sampling,” in Proceedings of the VLDB Endowment, vol. 10, no. 11, pp. 1322–1333, 2017.