In this paper, we study the problem of acquiring and selecting customer data when the data must be purchased. We first propose using a Generalized Second Price (GSP) auction to purchase the data, and we show that when the number of bidders in the GSP auction is very large, truth-telling is the best bidding strategy. We then formulate data selection as an optimization problem that uses both quality and cost criteria: the new dataset is selected so that it best represents the probability distribution of the target population while minimizing the total cost of the unused data.
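For readers unfamiliar with GSP, the sketch below shows the textbook allocation and pricing rule: items (slots) go to bidders in descending order of bid, and each winner pays the bid of the bidder ranked immediately below. It is a minimal illustration, not the paper's mechanism. How bidders and slots map onto data sellers or data units in the purchasing setting is an assumption left open here, and the function name `gsp_auction` is hypothetical.

```python
from typing import Dict, List, Tuple


def gsp_auction(bids: Dict[str, float], num_slots: int) -> List[Tuple[str, float]]:
    """Textbook GSP rule (hypothetical helper, not from the paper).

    Slots are awarded in descending order of bid; each winner pays the bid
    of the next-ranked bidder, or 0.0 if nobody is ranked below them.
    Returns (bidder, price) pairs in slot order.
    """
    # Rank bidders from highest to lowest bid; break ties by name for determinism.
    ranked = sorted(bids.items(), key=lambda item: (-item[1], item[0]))
    results: List[Tuple[str, float]] = []
    for i in range(min(num_slots, len(ranked))):
        bidder, _ = ranked[i]
        # GSP price: the bid of the bidder ranked immediately below this winner.
        price = ranked[i + 1][1] if i + 1 < len(ranked) else 0.0
        results.append((bidder, price))
    return results


if __name__ == "__main__":
    bids = {"A": 10.0, "B": 7.0, "C": 4.0, "D": 2.0}
    print(gsp_auction(bids, num_slots=2))  # [('A', 7.0), ('B', 4.0)]
```

With a single slot this rule reduces to a second-price (Vickrey) auction, where truthful bidding is a dominant strategy. With several slots, GSP is not truthful in general, which is why a truth-telling result that holds only as the number of bidders grows is noteworthy.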
Poster #: 23