Biometrics Northwest LLC

Performing Data Analysis and Modeling

bktCluster vs. k-means run times for different sample sizes and a test data set with 200 5-dimensional random clusters.

| N/Cluster | Total Points | bktCluster Time (s) | bktCluster Clusters | k-means Time (s) | k-means first match replication count | Speedup |
|---|---|---|---|---|---|---|
| 100 | 20000 | 0.066 | 1 | 2.056 | 10 | 31.078 |
| 250 | 50000 | 0.095 | 1 | 2.214 | 5 | 23.229 |
| 500 | 100000 | 0.733 | 199 | 5.625 | 5 | 7.703 |
| 1000 | 200000 | 1.038 | 200 | 60.805 | 25 | 58.554 |
| 2500 | 500000 | 1.881 | 200 | 155.730 | 25 | 82.794 |
| 5000 | 1000000 | 3.987 | 200 | 140.248 | 10 | 35.179 |
| 10000 | 2000000 | 5.979 | 200 | N/A | N/A | N/A |
| 100 | 20000 | 0.374 | 200 | 2.056 | 10 | 5.490 |
| 250 | 50000 | 0.398 | 200 | 2.214 | 5 | 5.565 |
| 500 | 100000 | 0.712 | 200 | 5.625 | 5 | 7.942 |

Rows 1-7: bktCluster large sample algorithm (default) with default distance threshold
Rows 8-10: bktCluster large sample algorithm (default) with a distance threshold of 25
N/A: k-means time exceeded 30 minutes due to excessive paging.
Note 1: Cluster counts are not provided for k-means because the number of clusters, 200, is an input to the algorithm, so it always finds 200 clusters.
Note 2: The actual cluster centroids were found using k-means for all sample sizes at a replication count of 50. Because the actual centroids would not be known in advance, at least 50 replications should be used for this data set, regardless of sample size. The timing results for 50 replications can be found here.
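The Speedup column appears to be the k-means time divided by the bktCluster time. The following Python sketch recomputes it from the first seven rows of the table; the results agree with the published column to within about 1%, presumably because the displayed times are rounded. The helper `projected_kmeans_time` is hypothetical and not part of the original results: it assumes, purely for illustration, that k-means run time grows linearly with the replication count, to give a rough sense of what Note 2 implies for a run with 50 replications. The measured 50-replication timings referenced in Note 2 should be preferred over any such estimate.

```python
# Values transcribed from the first seven rows of the table.
# Each tuple: (n_per_cluster, bktcluster_time_s, kmeans_time_s, kmeans_replications)
# None marks the N/A entries (k-means exceeded 30 minutes).
ROWS = [
    (100, 0.066, 2.056, 10),
    (250, 0.095, 2.214, 5),
    (500, 0.733, 5.625, 5),
    (1000, 1.038, 60.805, 25),
    (2500, 1.881, 155.730, 25),
    (5000, 3.987, 140.248, 10),
    (10000, 5.979, None, None),
]


def speedup(bkt_time_s, kmeans_time_s):
    """Ratio of k-means time to bktCluster time, or None when k-means is N/A."""
    if kmeans_time_s is None:
        return None
    return kmeans_time_s / bkt_time_s


def projected_kmeans_time(kmeans_time_s, replications, target_replications=50):
    """Hypothetical estimate of k-means time at target_replications.

    ASSUMPTION (not from the source): run time scales linearly with the
    number of replications.
    """
    if kmeans_time_s is None:
        return None
    return kmeans_time_s * target_replications / replications


for n, bkt_s, km_s, reps in ROWS:
    ratio = speedup(bkt_s, km_s)
    projected = projected_kmeans_time(km_s, reps)
    if ratio is None:
        print(f"N/cluster={n}: k-means N/A")
    else:
        print(f"N/cluster={n}: speedup={ratio:.2f}, "
              f"projected k-means time at 50 replications={projected:.1f} s")
```

Keeping the N/A row as `None` rather than dropping it makes the missing k-means measurement explicit instead of silently shrinking the comparison.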

For information send email to: info@biometricsnw.com

Last Update: September 22, 2021 11:00 AM

Copyright 2005-2021 Biometrics Northwest LLC