Quick Start

First, create a NumPy array holding the ground set \(V\):

[1]:
import numpy as np
import datetime

np.random.seed(42)
# Helper: run f(*args) and return the result together with the elapsed wall-clock time in seconds.
def timed(f, *args):
    tStart = datetime.datetime.now()
    res = f(*args)
    tEnd = datetime.datetime.now()
    return res, (tEnd - tStart).total_seconds()

N = 25000
d = 50
V = np.random.random((N, d))

We will now select a random subset \(S \subset V\) and evaluate its function value on the GPU using the ExemplarClustering class provided by the exemcl package:

[2]:
from exemcl import ExemplarClustering

S = np.take(V, np.random.choice(V.shape[0], size=5000, replace=False), axis=0)
exem = ExemplarClustering(ground_set=V, device="gpu")
fvalue = exem(S)

Change precision

It is also possible to switch between half, single, and double floating-point precision by passing the precision parameter at construction time and specifying fp16, fp32, or fp64, respectively:

[3]:
for fp in ["fp16", "fp32", "fp64"]:
    exem = ExemplarClustering(ground_set=V, device="gpu", precision=fp)
    fvalue, secs = timed(exem.__call__, S)
    print(f"Function value ({fp}): {fvalue} (took {secs}s).")
Function value (fp16): 13.42227840423584 (took 0.166322s).
Function value (fp32): 13.4232816696167 (took 0.179814s).
Function value (fp64): 13.423275558060581 (took 3.023624s).
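
To get a feel for the accuracy trade-off, you can compare the lower-precision results against the double-precision value. The following is a minimal sketch; the relative-error computation uses plain NumPy/Python and is not part of the exemcl API:

[ ]:
# Use the fp64 result as reference and report the relative deviation of the
# lower-precision results (illustration only, not part of exemcl).
reference = ExemplarClustering(ground_set=V, device="gpu", precision="fp64")(S)
for fp in ["fp16", "fp32"]:
    exem = ExemplarClustering(ground_set=V, device="gpu", precision=fp)
    rel_err = abs(exem(S) - reference) / abs(reference)
    print(f"Relative deviation from fp64 ({fp}): {rel_err:.2e}")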

Multi-set evaluation

If you use Exemplar-based clustering as the target function for optimization, e.g. with the Greedy routine, you might want to evaluate more than one set per optimization step. We will now create a list of subsets and evaluate their function values (a minimal greedy sketch follows the marginal-gain example below):

[4]:
# Sample 100 subsets of 5000 vectors each, which will be evaluated for their function values.
S_multi = [np.take(V, np.random.choice(V.shape[0], size=5000, replace=False), axis=0) for _ in range(100)]
exem = ExemplarClustering(ground_set=V)
fvalues, secs = timed(exem.__call__, S_multi)
print(f"{len(fvalues)} function values found (took {secs}s).")
100 function values found (took 10.662428s).
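
In an optimization step you would typically keep the candidate subset with the highest function value; a one-line NumPy sketch (our own addition, not part of exemcl):

[ ]:
# Pick the candidate subset with the highest function value (illustration only).
best = int(np.argmax(fvalues))
print(f"Best candidate set: index {best} with value {fvalues[best]}.")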

Alternatively, you might have some fixed set \(S\) and want the marginal gains obtained by adding each element of \(E = \left\lbrace e_1, ..., e_n \right\rbrace\).

[5]:
# Sample 100 single elements; each will be evaluated for its marginal function value with respect to S.
e_multi = [np.take(V, np.random.choice(V.shape[0], size=1, replace=False), axis=0).flatten() for _ in range(100)]
exem = ExemplarClustering(ground_set=V)
marginals, secs = timed(exem.__call__, S, e_multi)
print(f"{len(marginals)} marginal function values found (took {secs}s).")
100 marginal function values found (took 10.44639s).
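
To illustrate how this marginal interface can drive the Greedy routine mentioned above, here is a minimal greedy sketch. It is not part of exemcl: the candidate pool and budget are chosen arbitrarily, and we seed \(S\) with one random element rather than assuming that an empty set is supported. Whether the returned values are gains \(f(S \cup \lbrace e \rbrace) - f(S)\) or totals \(f(S \cup \lbrace e \rbrace)\), the argmax identifies the same element.

[ ]:
# Minimal greedy sketch (illustration only): repeatedly add the candidate
# element with the highest marginal value to the selected set.
budget = 10                                                   # elements to add
pool_idx = np.random.choice(V.shape[0], size=200, replace=False)
candidates = [V[i] for i in pool_idx]                         # candidate elements (1-D vectors)

exem = ExemplarClustering(ground_set=V, device="gpu")
selected = V[np.random.choice(V.shape[0], size=1)]            # seed S with one random element
for _ in range(budget):
    marginals = exem(selected, candidates)                    # marginal values for all candidates
    best = int(np.argmax(marginals))
    selected = np.vstack([selected, candidates[best]])        # add the best candidate to S
    del candidates[best]                                      # avoid re-selecting it
print(f"Selected {selected.shape[0]} elements, function value: {exem(selected)}.")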

CPU computation

You might also be interested in using a CPU-only version of this algorithm. In that case, simply replace device="gpu" with device="cpu". Please keep in mind that FP16 operation is not available on CPU devices.

[6]:
for fp in ["fp32", "fp64"]:
    exem = ExemplarClustering(ground_set=V, device="cpu", precision=fp)
    fvalue, secs = timed(exem.__call__, S)
    print(f"Function value ({fp}): {fvalue} (took {secs}s).")
Function value (fp32): 13.423282623291016 (took 2.068594s).
Function value (fp64): 13.423275558060581 (took 2.436802s).
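
If the same code should run on machines with and without a GPU, one simple pattern is to attempt GPU construction and fall back to the CPU on failure. This is a hedged sketch: it assumes that construction fails when no GPU is present, and it catches a generic Exception because exemcl's exact error type is not documented here:

[ ]:
# Try the GPU first and fall back to the CPU if construction fails.
# NOTE: catching Exception is a broad assumption; exemcl may raise a more
# specific error when no GPU is available.
try:
    exem = ExemplarClustering(ground_set=V, device="gpu", precision="fp32")
except Exception:
    exem = ExemplarClustering(ground_set=V, device="cpu", precision="fp32")
print(f"Function value: {exem(S)}")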