Quick Start
Instantiate an ordinary NumPy array, which holds the data we want to inspect for outlying observations:
[1]:
import numpy as np
from genif import GeneralizedIsolationForest
N = 1000
d = 50
X = np.random.random((N, d))
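Because the data above is drawn at random, the predictions will differ from run to run. A minimal sketch of a reproducible variant follows, assuming NumPy's `default_rng` generator is acceptable here; the seed value `0` is an arbitrary choice for illustration and is not taken from the original example.

```python
import numpy as np

# Assumption: the seed (0) is arbitrary; any fixed integer gives repeatable data.
rng = np.random.default_rng(0)

N = 1000  # number of observations
d = 50    # number of features per observation

# Uniform samples in [0, 1), same shape as the array in the cell above.
X = rng.random((N, d))
```

With a fixed seed, re-running the cell yields the same `X`, which makes it easier to compare the effect of different parameter settings.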
We may now construct an instance of the GeneralizedIsolationForest and call the fit_predict method to receive predictions:
[2]:
gif = GeneralizedIsolationForest(k=10, n_models=50, sample_size=256, kernel="rbf", kernel_scaling=[0.05], sigma=0.01)
gif.fit_predict(X)[:50]
[2]:
array([0.00322031, 0.0031 , 0.00304063, 0.00335938, 0.00295469,
0.00334531, 0.00304844, 0.00302813, 0.00267656, 0.00299531,
0.00277969, 0.00304844, 0.00305937, 0.0036375 , 0.00280938,
0.00264219, 0.0028875 , 0.00277031, 0.0031625 , 0.00306719,
0.00316094, 0.00313125, 0.00304844, 0.00296875, 0.00288281,
0.00292031, 0.00317031, 0.00312031, 0.0031375 , 0.00308125,
0.00309063, 0.00336563, 0.00338125, 0.00306719, 0.00306094,
0.0033125 , 0.00270156, 0.00283906, 0.00314688, 0.00292813,
0.00351875, 0.00294063, 0.00314844, 0.00341875, 0.00345156,
0.00313281, 0.00332344, 0.00308594, 0.00281406, 0.00292969])
The output gives, for each of the first fifty vectors, the probability associated with the region the vector has been assigned to. Values near one indicate inlying observations, while values near zero indicate outlying observations.
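Given this interpretation, one way to pick out the most outlying observations is to sort the returned values in ascending order. The sketch below uses only NumPy. The helper name `most_outlying` is hypothetical and not part of genif, and the `scores` array simply reuses the first five values printed above as stand-in data.

```python
import numpy as np

def most_outlying(scores, n=5):
    """Return the indices of the n smallest scores, most outlying first.

    Hypothetical helper (not part of genif); assumes lower values mean
    more outlying, as described in the text above.
    """
    scores = np.asarray(scores, dtype=float)
    n = min(n, scores.size)  # never request more indices than there are scores
    return np.argsort(scores, kind="stable")[:n]

# Stand-in data: the first five values from the output shown above.
scores = np.array([0.00322031, 0.0031, 0.00304063, 0.00335938, 0.00295469])

idx = most_outlying(scores, n=2)
print(idx, scores[idx])
```

For these five values the helper returns indices 4 and 2, the two lowest scores. In practice, `scores` would be the full array returned by `fit_predict`.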