Biometrics Northwest LLC

Performing Data Analysis and Modeling

bktCluster vs. k-means run times for different sample sizes and a test data set with 200 5-dimensional random clusters.

                         bktCluster  bktCluster   k-means  k-means all-match
N/Cluster  Total Points    Time (s)    Clusters  Time (s)  replication count  Speedup

bktCluster large sample algorithm (default) with default distance threshold:
      100         20000       0.066           1    10.145                 50  153.373
      250         50000       0.095           1    23.174                 50  243.139
      500        100000       0.733         199    56.905                 50   77.655
     1000        200000       1.038         200   121.312                 50  116.822
     2500        500000       1.881         200   333.767                 50  177.448
     5000       1000000       3.987         200   695.367                 50  174.422
    10000       2000000       5.979         200       N/A                N/A      N/A

bktCluster large sample algorithm (default) with a distance threshold of 25:
      100         20000       0.374         200    10.145                 50   27.096
      250         50000       0.398         200    23.174                 50   58.245
      500        100000       0.712         200    56.905                 50   79.962
N/A: k-means time exceeded 30 minutes due to excessive paging.
Note 1: Cluster counts are not provided for k-means because the number of clusters, 200, is an input to the algorithm, so it will always find 200 clusters.
Note 2: k-means found the actual cluster centroids for all sample sizes at a replication count of 50. Because the actual centroids would not be known in advance, at least 50 replications should be used for this data set regardless of sample size.
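
The role of the replication count in Note 2 can be illustrated with a minimal sketch of k-means run from several random starts, keeping the result with the lowest within-cluster sum of squared distances (SSE). This is not bktCluster and not the k-means implementation used for the timings above; the function names (make_clusters, kmeans_once, kmeans_replicated) are hypothetical, the code is plain Python, and the demo data set is tiny compared with the 200-cluster test set.

```python
import math
import random


def make_clusters(n_clusters, n_per_cluster, dim, seed=1):
    """Generate points scattered around random centers (illustrative data only)."""
    rng = random.Random(seed)
    centers = [tuple(rng.uniform(0, 100) for _ in range(dim)) for _ in range(n_clusters)]
    return [
        tuple(rng.gauss(c, 1.0) for c in center)
        for center in centers
        for _ in range(n_per_cluster)
    ]


def kmeans_once(points, k, rng, max_iter=100):
    """One k-means (Lloyd) run from a random start; returns (centroids, sse)."""
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Assignment step: group each point with its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group.
        updated = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
        if updated == centroids:
            break  # Converged: centroids no longer move.
        centroids = updated
    sse = sum(min(math.dist(p, c) for c in centroids) ** 2 for p in points)
    return centroids, sse


def kmeans_replicated(points, k, replications, seed=0):
    """Run k-means from several random starts and keep the lowest-SSE result."""
    rng = random.Random(seed)
    runs = (kmeans_once(points, k, rng) for _ in range(replications))
    return min(runs, key=lambda run: run[1])


if __name__ == "__main__":
    data = make_clusters(n_clusters=5, n_per_cluster=20, dim=5)
    for reps in (1, 10):
        _, sse = kmeans_replicated(data, k=5, replications=reps)
        print(f"replications={reps}: best SSE={sse:.2f}")
```

With a fixed seed, the best SSE over more replications can never be worse than the single-run SSE, since the first run is shared. Each extra replication costs roughly one more full k-means run, which is why the replication count matters when comparing run times.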

For information send email to: info@biometricsnw.com

Last Update: September 22, 2021 11:00 AM

Copyright 2005-2021 Biometrics Northwest LLC