Quick Start¶
First, instantiate a NumPy array that holds the ground set \(V\):
[1]:
import numpy as np
import datetime

np.random.seed(42)

# Helper: run f(*args) and return its result together with the wall-clock runtime in seconds.
def timed(f, *args):
    tStart = datetime.datetime.now()
    res = f(*args)
    tEnd = datetime.datetime.now()
    return res, (tEnd - tStart).total_seconds()

# Ground set V: N random vectors of dimension d.
N = 25000
d = 50
V = np.random.random((N, d))
We will now select a random subset \(S \subset V\) and evaluate its function value on the GPU using the ExemplarClustering class provided by the exemcl package:
[2]:
from exemcl import ExemplarClustering
# Draw a random subset S of 5000 ground-set vectors (without replacement).
S = np.take(V, np.random.choice(V.shape[0], size=5000, replace=False), axis=0)
exem = ExemplarClustering(ground_set=V, device="gpu")
fvalue = exem(S)
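For intuition, the exemplar-based clustering objective is commonly defined as \(f(S) = L(\lbrace e_0 \rbrace) - L(S \cup \lbrace e_0 \rbrace)\) with \(L(A) = \frac{1}{|V|} \sum_{v \in V} \min_{a \in A} \lVert v - a \rVert^2\) and an auxiliary element \(e_0\) (often the all-zero vector). The following NumPy lines are only a sketch of this textbook definition on a small subsample; the auxiliary element and normalisation used internally by exemcl may differ, so the value will not necessarily match fvalue exactly.

# Naive sketch of the commonly used exemplar-based clustering objective
# (an assumption -- exemcl's auxiliary element/normalisation may differ).
def mean_min_sq_dist(X, A):
    # Mean over X of the squared Euclidean distance to the nearest exemplar in A.
    d2 = (X ** 2).sum(axis=1)[:, None] + (A ** 2).sum(axis=1)[None, :] - 2.0 * X @ A.T
    return np.maximum(d2, 0.0).min(axis=1).mean()

e0 = np.zeros((1, d))                    # hypothetical auxiliary exemplar
V_small, S_small = V[:2000], S[:200]     # subsample to keep the check cheap
f_naive = mean_min_sq_dist(V_small, e0) - mean_min_sq_dist(V_small, np.vstack([e0, S_small]))
print(f"Naive reference value on the subsample: {f_naive}")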
Change precision¶
It is also possible to change the required floating-point precision to half, single, or double precision by passing the precision parameter at construction time and specifying fp16, fp32, or fp64, respectively:
[3]:
for fp in ["fp16", "fp32", "fp64"]:
exem = ExemplarClustering(ground_set=V, device="gpu", precision=fp)
fvalue, secs = timed(exem.__call__, S)
print(f"Function value ({fp}): {fvalue} (took {secs}s).")
Function value (fp16): 13.42227840423584 (took 0.166322s).
Function value (fp32): 13.4232816696167 (took 0.179814s).
Function value (fp64): 13.423275558060581 (took 3.023624s).
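Lower precision mainly trades accuracy for speed and memory. If you want to quantify that trade-off for your own data, you can compare the low-precision result against the double-precision one; the sketch below simply re-evaluates \(S\) twice using the constructor arguments shown above.

# Quantify the accuracy cost of reduced precision (re-evaluates S at fp16 and fp64).
f16 = ExemplarClustering(ground_set=V, device="gpu", precision="fp16")(S)
f64 = ExemplarClustering(ground_set=V, device="gpu", precision="fp64")(S)
print(f"Relative deviation of fp16 vs. fp64: {abs(f16 - f64) / abs(f64):.2e}")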
Multi-set evaluation¶
If you are using exemplar-based clustering as the target function for optimization, e.g. with a greedy routine, you might want to evaluate more than one set per (optimization) step. We will now create a list of subsets and evaluate their function values:
[4]:
# Sample 100 subsets of 5000 vectors each, which should be evaluated for their function values.
S_multi = [np.take(V, np.random.choice(V.shape[0], size=5000, replace=False), axis=0) for _ in range(100)]
exem = ExemplarClustering(ground_set=V)
fvalues, secs = timed(exem.__call__, S_multi)
print(f"{len(fvalues)} function values found (took {secs}s).")
100 function values found (took 10.662428s).
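In an optimization loop, batching all candidate sets into a single call amortises the transfer to the device. A minimal sketch of picking the best candidate from the batch, assuming fvalues is an ordinary sequence of floats (as the printout above suggests):

# Hypothetical selection step: keep the candidate set with the highest function value.
best_idx = int(np.argmax(fvalues))
print(f"Best candidate set: #{best_idx} with f = {fvalues[best_idx]}")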
Alternatively, you might have some fixed set \(S\) and be looking for the marginal gains obtained from marginal elements \(E = \left\lbrace e_1, ..., e_n \right\rbrace\):
[5]:
# Sample 100 single vectors whose marginal function values with respect to S should be evaluated.
e_multi = [np.take(V, np.random.choice(V.shape[0], size=1, replace=False), axis=0).flatten() for _ in range(100)]
exem = ExemplarClustering(ground_set=V)
marginals, secs = timed(exem.__call__, S, e_multi)
print(f"{len(marginals)} marginal function values found (took {secs}s).")
100 marginal function values found (took 10.44639s).
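Combined with the marginal-gain interface, a plain greedy step reduces to evaluating all remaining candidates against the current solution and appending the best one. A minimal sketch, assuming that exem(S, candidates) returns one marginal value \(f(S \cup \lbrace e \rbrace) - f(S)\) per candidate, in order (as the example above suggests):

# One greedy step (sketch): pick the candidate with the largest marginal gain
# and append it to the current solution S.
gains = exem(S, e_multi)                 # one marginal gain per candidate (assumed ordering)
best = int(np.argmax(gains))
S_next = np.vstack([S, e_multi[best]])   # grow the solution by the best element
print(f"Selected candidate {best} with marginal gain {gains[best]}")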
CPU computation¶
You might also be interested in using a CPU-only version of this algorithm. In this case, simply replace device="gpu" with device="cpu". Please keep in mind that FP16 operation is not available on CPU devices.
[6]:
for fp in ["fp32", "fp64"]:
exem = ExemplarClustering(ground_set=V, device="cpu", precision=fp)
fvalue, secs = timed(exem.__call__, S)
print(f"Function value ({fp}): {fvalue} (took {secs}s).")
Function value (fp32): 13.423282623291016 (took 2.068594s).
Function value (fp64): 13.423275558060581 (took 2.436802s).
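If the target machine may or may not have a GPU, a simple pattern is to try the GPU first and fall back to the CPU. This sketch assumes that constructing with device="gpu" raises an exception when no GPU is present, which may not hold for your exemcl build:

# Hypothetical device fallback (assumes GPU construction raises when no GPU is found).
try:
    exem = ExemplarClustering(ground_set=V, device="gpu", precision="fp32")
except Exception:
    exem = ExemplarClustering(ground_set=V, device="cpu", precision="fp32")
print(exem(S))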