CPT / measured Vs30 data are available at higher densities around certain points of interest.
To prevent the resulting clusters from disproportionately influencing overall values, the points within a cluster together share the weight of a single point outside a cluster.
As a result, a cluster of closely spaced points has a similar overall impact to a single value that is not part of a cluster.

Example


Here there are many points around Christchurch (teal cluster) and Wellington (yellow cluster) and a few scattered across the rest of New Zealand (brown, not part of a cluster).

To get an average that is more representative nationally (rather than mostly representing Christchurch and Wellington), points in a cluster are given a weighting of 1 / cluster_size, while points outside any cluster keep a weighting of 1.

In the above example the average Vs30 value went from 195 (point-based weighting) to 400 (cluster weighting), because the values in the clusters were generally lower than those around the rest of the country.
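
The effect of the 1 / cluster_size weighting can be illustrated with a small worked example. This is only a sketch: the Vs30 values and cluster sizes below are made up for illustration and are not taken from the dataset described above.

```python
import numpy as np

# Hypothetical values: four clustered sites with low Vs30, two isolated sites.
vs30 = np.array([200.0, 220.0, 180.0, 240.0, 450.0, 550.0])
# Size of the cluster each point belongs to (1 for points outside any cluster).
cluster_size = np.array([4, 4, 4, 4, 1, 1])

# Each clustered point gets 1 / cluster_size, so the whole cluster sums to 1.
weights = 1.0 / cluster_size

point_mean = vs30.mean()                          # every point counts equally
cluster_mean = np.average(vs30, weights=weights)  # each cluster counts once

print(round(point_mean, 1), round(cluster_mean, 1))  # 306.7 403.3
```

With point-based weighting the four clustered sites pull the mean down; with cluster weighting they collectively count as one site, so the mean moves towards the isolated values.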

Selecting Clusters

Clusters can be selected using the DBSCAN algorithm from sklearn.

python sklearn
import numpy as np
from sklearn.cluster import DBSCAN

# dataset contains x (NZTM Easting), y (NZTM Northing) and vs30 (Vs30 value).
features = np.array([dataset.x, dataset.y]).T

# for usage: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html
# eps is the maximum distance between two points for them to be treated as neighbours.
# because input coordinates are in NZTM, eps=15000 is in metres (15 km).
# min_samples is the number of points (including the point itself) needed within eps
# for a point to count as a core point, so it is in effect the smallest possible cluster.
# n_jobs=-1 means use all available CPU cores.
dbscan = DBSCAN(eps=15000, min_samples=5, n_jobs=-1)
# run
dbscan.fit(features)

# labels_ gives the cluster number for each of the input coordinates
# -1 means the point is not a member of any cluster
dbscan.labels_
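
One way to turn these labels into the 1 / cluster_size weights described earlier is sketched below. The helper name cluster_weights is hypothetical (not part of sklearn or the original workflow), and the labels array is a hand-written stand-in with the same layout as dbscan.labels_.

```python
import numpy as np


def cluster_weights(labels):
    """Return 1 / cluster_size for clustered points and 1 for unclustered ones.

    Hypothetical helper; labels follows the DBSCAN convention where
    -1 marks points that are not a member of any cluster.
    """
    labels = np.asarray(labels)
    weights = np.ones(labels.shape, dtype=float)
    clustered = labels >= 0
    # bincount gives the number of members for each cluster label 0, 1, 2, ...
    sizes = np.bincount(labels[clustered])
    weights[clustered] = 1.0 / sizes[labels[clustered]]
    return weights


# Stand-in for dbscan.labels_: a cluster of 3, a cluster of 2, and 2 unclustered points.
labels = np.array([0, 0, 0, 1, 1, -1, -1])
weights = cluster_weights(labels)
```

The weights can then be passed to np.average(dataset.vs30, weights=weights) to obtain the cluster-weighted mean.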




