Skip to content
Permalink

Comparing changes

Choose two branches to see what’s changed or to start a new pull request. If you need to, you can also or learn more about diff comparisons.

Open a pull request

Create a new pull request by comparing changes across two branches. If you need to, you can also . Learn more about diff comparisons here.
base repository: siplab-gt/pairsearch
Failed to load repositories. Confirm that selected base ref is valid, then try again.
Loading
base: master
Choose a base ref
...
head repository: andymass/pairsearch
Failed to load repositories. Confirm that selected head ref is valid, then try again.
Loading
compare: master
Choose a head ref
Can’t automatically merge. Don’t worry, you can still create the pull request.
  • 2 commits
  • 10 files changed
  • 1 contributor

Commits on Jun 7, 2019

  1. Verified

    This commit was signed with the committer’s verified signature.
    huonw Huon Wilson
    Copy the full SHA
    e915f4a View commit details

Commits on Jun 9, 2019

  1. Add enusvm baseline method

    andymass committed Jun 9, 2019

    Verified

    This commit was signed with the committer’s verified signature.
    huonw Huon Wilson
    Copy the full SHA
    6b8f7fb View commit details
Showing with 1,274 additions and 5 deletions.
  1. +24 −5 README.md
  2. +188 −0 enusvm/enusvm_solve.m
  3. +25 −0 enusvm/load_embedding.m
  4. +252 −0 enusvm/nusvm_sim.m
  5. +203 −0 make-embedding/LICENSE
  6. +3 −0 make-embedding/MISSING_DATA_README
  7. +6 −0 make-embedding/ORIGIN
  8. +61 −0 make-embedding/ckltest.py
  9. +512 −0 make-embedding/utilsCrowdKernel.py
  10. 0 output-data/.gitkeep
29 changes: 24 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,23 +1,42 @@
# Active Embedding Search via Noisy Paired Comparisons

Project code for "Active Embedding Search via Noisy Paired Comparisons" ([ICML 2019](https://arxiv.org/abs/1905.04363))
Project code for "Active Embedding Search via Noisy Paired Comparisons"
([ICML 2019](https://arxiv.org/abs/1905.04363)) by Gregory H. Canal,
Andrew K. Massimino, Mark A. Davenport, Christopher J. Rozell.

## Paper abstract
Suppose that we wish to estimate a user's preference vector w from paired comparisons of the form "does user w prefer item p or item q?," where both the user and items are embedded in a low-dimensional Euclidean space with distances that reflect user and item similarities. Such observations arise in numerous settings, including psychometrics and psychology experiments, search tasks, advertising, and recommender systems. In such tasks, queries can be extremely costly and subject to varying levels of response noise; thus, we aim to actively choose pairs that are most informative given the results of previous comparisons. We provide new theoretical insights into the benefits and challenges of greedy information maximization in this setting, and develop two novel strategies that maximize lower bounds on information gain and are simpler to analyze and compute respectively. We use simulated responses from a real-world dataset to validate our strategies through their similar performance to greedy information maximization, and their superior preference estimation over state-of-the-art selection methods as well as random queries.

## Requirements

### Packages (python)
- [Python 3](https://www.python.org/downloads/)
- [NumPy](https://www.numpy.org/)
- [SciPy](https://www.scipy.org/)
- [Matplotlib](https://matplotlib.org/)
- [PyStan](https://pystan.readthedocs.io/en/latest/)
- [CVXOPT](https://cvxopt.org/)

### Data

`run_experiments.py` uses our pre-processed embedding files contained in
the sub-directory `make-embedding/data/`. It is not necessary to download
the original dataset in this case.

To build an embedding using the scripts in `make-embedding/` or to
pre-process an embedding using `process_embedding.py` it is necessary to
have the Food-10k dataset of triplets. This dataset may be obtained from
the [SE(3) Computer Vision Group at Cornell Tech](https://vision.cornell.edu/se3/concept-embeddings/).

### Packages (MATLAB only for Enusvm/GaussCloud method baseline)
- [CVX](cvxr.com/cvx/)

## Code
- `active_search.py`: module for proposed InfoGain, EPMV, and MCMV search methods.
- `actrankq.py`: module for ActRankQ, our implementation of an [active ranking baseline](https://papers.nips.cc/paper/4427-active-ranking-using-pairwise-comparisons.pdf).
- `run_experiments.py`: script to run paper experiments.
- `process_embedding.py`: script to estimate noise constant and embedding scaling from embedding file


Send correspondence to Greg Canal (gregory.canal@gatech.edu) and Andrew Massimino (massimino@gatech.edu)
- `process_embedding.py`: script to estimate noise constant and embedding scaling from embedding file
- `make-embedding/*.py`: scripts to generate embedding from human intelligence task sourced triplets.
- `enusvm/*.m`: implementation and simulation for the "GaussCloud" [baseline method](https://arxiv.org/abs/1802.10489).

Please send correspondence to Greg Canal (gregory.canal@gatech.edu) and Andrew Massimino (massimino@gatech.edu)
188 changes: 188 additions & 0 deletions enusvm/enusvm_solve.m
Original file line number Diff line number Diff line change
@@ -0,0 +1,188 @@
function [x_hat, xi, cinfo] = enusvm_solve(A, b, y, nu, maxiter, mopts)

if nargin < 6 || isempty(mopts)
normconstraint = 0;
mopts = [];
else
normconstraint = mopts.enforce; % 0, 1, 2, 3, or 4
R = mopts.value;
end

[n, p] = size(A);

% get the "admissible" range of nu
examp = [A' -b];
bias = 0; % set to 1 to have bias term optimized

C = 1e6;
cvx_begin quiet
cvx_precision default
variable w_hat(n+1)
if bias > 0, variable bias, end
variable xi(p)
%
dual variable d_alpha
dual variable d_beta
%
minimize 0.5*sum_square(w_hat) + C*sum(xi)
subject to
d_alpha : y.*(examp*w_hat + bias) >= 1 - xi; %#ok
d_beta : xi >= 0; %#ok
cvx_end

nu_min = sum(d_alpha) / (C*p);

% 1: Enusvm, 0: single-pass
extended = 1;

if nu > nu_min
extended = 0;
nu0 = nu;
else
nu0 = 1.01*nu_min;
end

% starting point
cvx_begin quiet
cvx_precision default
variable w_tilde(n+1)
if bias > 0, variable bias, end
variable xi(p)
variable rho
%
dual variable d_alpha
dual variable d_beta
dual variable d_delta
%
minimize 0.5*sum_square(w_tilde) - nu0*rho + (1/p)*sum(xi)
subject to
d_alpha : y.*(examp*w_tilde + bias) >= rho - xi; %#ok
d_beta : xi >= 0; %#ok
d_delta : rho >= 0; %#ok
cvx_end

random_init = 0;
if isnan(d_delta) || any(isnan(w_tilde))
fprintf('bad starting point: status %s, nu %g, min %g, ext %d\n', ...
cvx_status, nu0, nu_min, extended);
random_init = 1;
end

if ~extended
fprintf('ordinary nu-svm success\n');
x_hat = w_tilde(1:n)/w_tilde(n+1);

cinfo.iters = 0;
cinfo.gap = 0;
cinfo.w_hat = [];
cinfo.rho = rho;
cinfo.nu_min = nu_min;

return
end

if random_init
w_tilde = randn(n+1,1) / sqrt(n+1);
end

% Enu svm
t = tic;
tol = 1e-3;

gam = 9/10;

if isfield(mopts, 'gamma')
gam = mopts.gamma;
end

if nargin < 5
maxiter = 100;
end
for ii = 1:maxiter

% no bias
% -y.*examp*w_hat + rho - xi <= 0
% -xi <= 0
% w_tilde'*w_hat == 2

lpf = [ -nu, ones(1,p)/p, zeros(1,n+1) ];
lpA = [ ones(p,1), -eye(p), diag(-y)*examp; ...
zeros(p,1), -eye(p), zeros(p,n+1) ];

lpb = zeros(2*p,1);

if normconstraint == 1
constraint2 = [ w_tilde(1:n); 0 ] ./ w_tilde(n+1)^2;
lpA = [lpA; [zeros(1,p+1), constraint2']];
lpb = [lpb; R];
end

lpAeq = [ zeros(1,p+1), w_tilde' ];
opts = optimoptions('linprog', 'Display', 'off', ...
'Algorithm', 'dual-simplex');
lp = linprog(lpf, lpA, lpb, lpAeq, 2, [], [], opts);
w_hat = lp(2+p:end);
xi = lp(2:p+1);

%{
cvx_begin quiet
cvx_precision default
variable w_hat(n+1)
if bias > 0, variable bias, end
variable xi(p)
variable rho
%
dual variable d_alpha
dual variable d_beta
%
minimize -nu*rho + (1/p)*sum(xi)
subject to
d_alpha : y.*(examp*w_hat + bias) >= rho - xi; %#ok
d_beta : xi >= 0; %#ok
w_tilde'*w_hat == 2; %#ok
cvx_end
%}

if norm(w_tilde - w_hat) <= tol*norm(w_hat)
break;
end

if normconstraint == 2 || normconstraint == 3
x_hat_ = w_hat(1:n)./w_hat(n+1);
if norm(x_hat_) > R
x_hat_ = x_hat_ ./ norm(x_hat_) * R;
w_hat = [x_hat_; 1];
w_hat = w_hat / norm(w_hat);
end
end
if normconstraint == 2
x_tilde_ = w_tilde(1:n)./w_tilde(n+1);
if norm(x_tilde_) > R
x_tilde_ = x_tilde_ ./ norm(x_tilde_) * R;
w_tilde = [x_tilde_; 1];
w_tilde = w_tilde / norm(w_tilde);
end
end

w_tilde = gam*w_tilde + (1-gam)*w_hat;

end

cinfo.iters = ii;
cinfo.gap = norm(w_tilde - w_hat)/norm(w_hat);
cinfo.w_hat = w_hat;
cinfo.rho = lp(1);
cinfo.nu_min = nu_min;

fprintf('iterations %d, gap %d, time %gs\n', ...
ii, norm(w_tilde-w_hat)/norm(w_hat), toc(t));

x_hat = w_hat(1:n)/w_hat(n+1);

if normconstraint == 4
if norm(x_hat) > R
x_hat = x_hat ./ norm(x_hat) * R;
end
end


25 changes: 25 additions & 0 deletions enusvm/load_embedding.m
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
function [X,kopt,error_frac,embed_scale] = load_embedding(dim)

dataset_home = '../make-embedding/data/';
embed_files = struct(...
'd2', 'output-d220180512-001631.mat',...
'd3', 'output-d320180509-165802.mat',...
'd4', 'output-d420180509-165958.mat',...
'd5', 'output-d520180512-000800.mat',...
'd6', 'output-d620180512-001232.mat',...
'd7', 'output-d720180512-001205.mat',...
'd9', 'output-d9-20190428-020247.mat',...
'd12', 'output-d12-20190428-020705.mat',...
'd15', 'output-d15-20190428-020815.mat',...
'd20', 'output-d20-20190428-020534.mat'...
);

embed_file = embed_files.(['d',num2str(dim)]);
load([dataset_home,embed_file],'X');
load([dataset_home,embed_file(1:end-4),'_processed.mat'],'embed_scale','kopt','error_frac');

assert(size(X,2) == dim);
X = X - mean(X,1);
X = X*embed_scale;
end

Loading