Title: | Package for BRIk, FABRIk and FDEBRIk Algorithms to Initialise K-Means |
---|---|
Description: | Implementation of the BRIk, FABRIk and FDEBRIk algorithms to initialise k-means. These methods are intended for the clustering of multivariate data (BRIk) and of functional data (FABRIk and FDEBRIk). They make use of the Modified Band Depth and the bootstrap to identify appropriate initial seeds for k-means, which are shown to outperform many techniques in the literature; see Torrente and Romo (2021) <doi:10.1007/s00357-020-09372-3>. The package makes use of the functions kma and kma.similarity, from the archived package fdakma, by Alice Parodi et al. |
Authors: | Javier Albert Smet <[email protected]> and Aurora Torrente <[email protected]>. Alice Parodi, Mirco Patriarca, Laura Sangalli, Piercesare Secchi, Simone Vantini and Valeria Vitelli, as contributors. |
Maintainer: | Aurora Torrente <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.0 |
Built: | 2024-11-14 05:44:21 UTC |
Source: | https://github.com/cran/briKmeans |
brik
computes appropriate seeds (based on the bootstrap and the MBD depth) to initialise k-means, which is then run.
brik(x, k, method="Ward", nstart=1, B=10, J = 2, ...)
x |
a data matrix containing |
k |
number of clusters |
method |
clustering algorithm used to cluster the cluster centres from the bootstrapped replicates; |
nstart |
number of random initialisations when using the |
B |
number of bootstrap replicates to be generated |
J |
number of observations used to build the bands for the MBD computation. Currently, only the value J=2 can be used |
... |
additional arguments to be passed to the |
The BRIk algorithm is a simple, computationally feasible method that provides k-means with a set of initial seeds for datasets of arbitrary dimension. It consists of two stages: first, a set of cluster centres is obtained by applying k-means to bootstrap replicates of the original data; these centres are then clustered, and the deepest point of each assembled cluster is returned as an initial seed for k-means on the original data.
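The deepest-point step can be illustrated with a base-R sketch of the J = 2 Modified Band Depth: the depth of an observation is the average fraction of coordinates at which it lies inside the band spanned by each pair of observations. This is an illustrative re-implementation, not the package's internal code.

```r
# Illustrative J = 2 Modified Band Depth (MBD); not the package's internal code.
# X: numeric matrix, one observation (curve) per row.
mbd2 <- function(X) {
  n <- nrow(X)
  depth <- numeric(n)
  pairs <- combn(n, 2)                    # all pairs of observations
  for (k in seq_len(n)) {
    inside <- apply(pairs, 2, function(p) {
      lo <- pmin(X[p[1], ], X[p[2], ])    # lower envelope of the band
      hi <- pmax(X[p[1], ], X[p[2], ])    # upper envelope of the band
      mean(X[k, ] >= lo & X[k, ] <= hi)   # fraction of the domain inside
    })
    depth[k] <- mean(inside)
  }
  depth
}

# The deepest point of a cluster is its row with maximal MBD:
X <- rbind(c(0, 0, 0), c(1, 1, 1), c(2, 2, 2))
d <- mbd2(X)
which.max(d)   # 2: the middle curve is the deepest
```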
seeds |
a matrix of size |
km |
an object of class |
Javier Albert Smet [email protected] and Aurora Torrente [email protected]
Torrente, A. and Romo, J. (2021). Initializing k-means Clustering by Bootstrap and Data Depth. J Classif 38(2):232-256. https://doi.org/10.1007/s00357-020-09372-3.
## brik algorithm
## simulated data
set.seed(0)
g1 <- matrix(rnorm(200, 0, 3), 25, 8); g1[, 1] <- g1[, 1] + 4
g2 <- matrix(rnorm(200, 0, 3), 25, 8); g2[, 1] <- g2[, 1] + 4; g2[, 3] <- g2[, 3] - 4
g3 <- matrix(rnorm(200, 0, 3), 25, 8); g3[, 1] <- g3[, 1] + 4; g3[, 3] <- g3[, 3] + 4
x <- rbind(g1, g2, g3)
labels <- c(rep(1, 25), rep(2, 25), rep(3, 25))
C1 <- kmeans(x, 3)
C2 <- brik(x, 3, B = 25)
table(C1$cluster, labels)
table(C2$km$cluster, labels)
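The seeds component returned by brik has one seed per row, in the same space as x, so it can be handed to kmeans directly through its centers argument. A minimal base-R sketch of that hand-off, using two arbitrary rows as stand-in seeds (running brik itself requires the package):

```r
set.seed(0)
x <- matrix(rnorm(100), 20, 5)     # 20 observations, 5 features
seeds <- x[c(1, 11), ]             # stand-in for brik(x, 2, ...)$seeds
km <- kmeans(x, centers = seeds)   # k-means initialised at the given seeds
dim(km$centers)                    # 2 x 5: one final centre per seed
```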
elbowRule
runs the FABRIk algorithm for different values of the degrees of freedom (DF) and suggests the best such value as the one attaining the minimum distortion. An optional plot of the computed distortions allows choosing alternative suitable DF values based on an elbow-like rule.
elbowRule(x, k, method="Ward", nstart=1, B = 10, J = 2, x.coord = NULL, OSF = 1, vect = NULL, intercept = TRUE, degPolyn = 3, degFr = 4:20, knots = NULL, plot = FALSE, ...)
x |
a data matrix containing |
k |
number of clusters |
method |
clustering algorithm used to cluster the cluster centres from the bootstrapped replicates; |
nstart |
number of random initialisations when using the |
B |
number of bootstrap replicates to be generated |
J |
number of observations used to build the bands for the MBD computation. Currently, only the value J=2 can be used |
x.coord |
initial x coordinates (time points) where the functional data is observed; if not provided, it is assumed to be |
OSF |
oversampling factor for the smoothed data; an OSF of m means that the number of (equally spaced) time points observed in the approximated function is m times the original number of features, |
vect |
optional collection of x coordinates (time points) at which to evaluate the smoothed data; if provided, the OSF is ignored |
intercept |
if |
degPolyn |
degree of the piecewise polynomial; 3 by default (cubic splines) |
degFr |
a vector containing tentative values of the degrees of freedom, to be tested |
knots |
the internal breakpoints that define the spline |
plot |
a Boolean parameter; it allows plotting the distortion against the degrees of freedom. Set to |
... |
additional arguments to be passed to the |
The function implements a simple elbow-like rule for selecting an appropriate value of the DF parameter among those tested. It computes the distortion obtained for each of these values and returns the one yielding the smallest distortion. By setting the parameter plot to TRUE, the distortion is plotted against the degrees of freedom, so that elbows or minima can be visually detected.
df |
the original vector of DF values to be tested |
tot.withinss |
a vector containing the distortion obtained for each tested DF value |
optimal |
DF value producing the smallest distortion among the tested ones |
Javier Albert Smet [email protected] and Aurora Torrente [email protected]
Torrente, A. and Romo, J. (2021). Initializing k-means Clustering by Bootstrap and Data Depth. J Classif 38(2):232-256. https://doi.org/10.1007/s00357-020-09372-3. Albert-Smet, J., Torrente, A. and Romo, J. (2021). Modified Band Depth Based Initialization of k-means for Functional Data Clustering. Submitted to Computational Statistics and Data Analysis.
## simulated data
set.seed(1)
x.coord <- seq(0, 1, 0.01)
x <- matrix(ncol = length(x.coord), nrow = 80)
labels <- numeric(80)
centers <- matrix(ncol = length(x.coord), nrow = 4)
centers[1, ] <- abs(x.coord) - 0.5
centers[2, ] <- (abs(x.coord - 0.5))^2 - 0.8
centers[3, ] <- -(abs(x.coord - 0.5))^2 + 0.7
centers[4, ] <- 0.75 * sin(8 * pi * abs(x.coord))
for (i in 1:4) {
  for (j in 1:20) {
    labels[20 * (i - 1) + j] <- i
    x[20 * (i - 1) + j, ] <- centers[i, ] + rnorm(length(x.coord), 0, 1.5)
  }
}
# ER <- elbowRule(x, 4, B = 25, degFr = 5:12, plot = FALSE)
ER <- elbowRule(x, 4, B = 25, degFr = 5:12, plot = TRUE)
fabrik
fits splines to the multivariate dataset and runs the BRIk algorithm on the smoothed data. For functional data, this is a straightforward application of BRIk to the smoothed curves; for multivariate data, the result corresponds to an alternative clustering method in which the objective function is not necessarily minimised, but better allocations are obtained in general.
fabrik(x, k, method="Ward", nstart=1, B = 10, J = 2, x.coord = NULL, OSF = 1, vect = NULL, intercept = TRUE, degPolyn = 3, degFr = 5, knots = NULL, ...)
fabrik(x, k, method="Ward", nstart=1, B = 10, J = 2, x.coord = NULL, OSF = 1, vect = NULL, intercept = TRUE, degPolyn = 3, degFr = 5, knots = NULL, ...)
x |
a data matrix containing |
k |
number of clusters |
method |
clustering algorithm used to cluster the cluster centres from the bootstrapped replicates; |
nstart |
number of random initialisations when using the |
B |
number of bootstrap replicates to be generated |
J |
number of observations used to build the bands for the MBD computation. Currently, only the value J=2 can be used |
x.coord |
initial x coordinates (time points) where the functional data is observed; if not provided, it is assumed to be |
OSF |
oversampling factor for the smoothed data; an OSF of m means that the number of (equally spaced) time points observed in the approximated function is m times the original number of features, |
vect |
optional collection of x coordinates (time points) at which to evaluate the smoothed data; if provided, the OSF is ignored |
intercept |
if |
degPolyn |
degree of the piecewise polynomial; 3 by default (cubic splines) |
degFr |
degrees of freedom, as in the |
knots |
the internal breakpoints that define the spline |
... |
additional arguments to be passed to the |
The FABRIk algorithm extends the BRIk algorithm to longitudinal functional data by adding a step that includes B-spline fitting and evaluation of the fitted curve at specific x coordinates. This allows it to handle issues such as noisy or missing data. It identifies smoothed initial seeds that are used as starting points for k-means on the smoothed data. The resulting clustering does not optimise the distortion (the sum of squared distances of each data point to its nearest centre) in the original data space, but in general it provides a better allocation of data points to the true groups.
seeds |
a matrix of size |
km |
an object of class kmeans |
Javier Albert Smet [email protected] and Aurora Torrente [email protected]
Torrente, A. and Romo, J. (2021). Initializing k-means Clustering by Bootstrap and Data Depth. J Classif 38(2):232-256. https://doi.org/10.1007/s00357-020-09372-3. Albert-Smet, J., Torrente, A. and Romo, J. (2021). Modified Band Depth Based Initialization of k-means for Functional Data Clustering. Submitted to Computational Statistics and Data Analysis.
## fabrik algorithm
## simulated data
set.seed(1)
x.coord <- seq(0, 1, 0.01)
x <- matrix(ncol = length(x.coord), nrow = 100)
labels <- numeric(100)
centers <- matrix(ncol = length(x.coord), nrow = 4)
centers[1, ] <- abs(x.coord) - 0.5
centers[2, ] <- (abs(x.coord - 0.5))^2 - 0.8
centers[3, ] <- -(abs(x.coord - 0.5))^2 + 0.7
centers[4, ] <- 0.75 * sin(8 * pi * abs(x.coord))
for (i in 1:4) {
  for (j in 1:25) {
    labels[25 * (i - 1) + j] <- i
    x[25 * (i - 1) + j, ] <- centers[i, ] + rnorm(length(x.coord), 0, 1.5)
  }
}
C1 <- kmeans(x, 4)
C2 <- fabrik(x, 4, B = 25)
table(C1$cluster, labels)
table(C2$km$cluster, labels)
fdebrik
first fits splines to the multivariate dataset; then it identifies functional centres that form tighter groups, by means of the kma algorithm; finally, it converts these centres into a multivariate data set of a selected dimension, clusters them, and uses the deepest point of each cluster as an initial seed. The multivariate objective function is not necessarily minimised, but better allocations are obtained in general.
fdebrik(x, k, method="Ward", nstart=1, B = 10, J = 2, x.coord = NULL, functionalDist="d0.pearson", OSF = 1, vect = NULL, intercept = TRUE, degPolyn = 3, degFr = 5, knots = NULL, ...)
fdebrik(x, k, method="Ward", nstart=1, B = 10, J = 2, x.coord = NULL, functionalDist="d0.pearson", OSF = 1, vect = NULL, intercept = TRUE, degPolyn = 3, degFr = 5, knots = NULL, ...)
x |
a data matrix containing |
k |
number of clusters |
method |
clustering algorithm used to cluster the cluster centres from the bootstrapped replicates; |
nstart |
number of random initialisations when using the |
B |
number of bootstrap replicates to be generated |
J |
number of observations used to build the bands for the MBD computation. Currently, only the value J=2 can be used |
x.coord |
initial x coordinates (time points) where the functional data is observed; if not provided, it is assumed to be |
functionalDist |
similarity measure between functions to be used. Currently, only the cosine of the angles between functions ('d0.pearson') can be used |
OSF |
oversampling factor for the smoothed data; an OSF of m means that the number of (equally spaced) time points observed in the approximated function is m times the original number of features, |
vect |
optional collection of x coordinates (time points) at which to evaluate the smoothed data; if provided, the OSF is ignored |
intercept |
if |
degPolyn |
degree of the piecewise polynomial; 3 by default (cubic splines) |
degFr |
degrees of freedom, as in the |
knots |
the internal breakpoints that define the spline |
... |
additional arguments to be passed to the |
The FDEBRIk algorithm extends the BRIk algorithm to longitudinal functional data by adding a B-spline fitting step, the construction of functional centres by means of the kma algorithm, and the evaluation of these centres at specific x coordinates. This allows it to handle issues such as noisy or missing data. It identifies smoothed initial seeds that are used as starting points for k-means on the smoothed data. The resulting clustering does not optimise the distortion (the sum of squared distances of each data point to its nearest centre) in the original data space, but in general it provides a better allocation of data points to the true groups.
seeds |
a matrix of size |
km |
an object of class kmeans |
Javier Albert Smet [email protected] and Aurora Torrente [email protected]
Torrente, A. and Romo, J. (2021). Initializing k-means Clustering by Bootstrap and Data Depth. J Classif 38(2):232-256. DOI: 10.1007/s00357-020-09372-3. Albert-Smet, J., Torrente, A. and Romo, J. (2022). Modified Band Depth Based Initialization of k-means for Functional Data Clustering. Submitted to Adv. Data Anal. Classif. Sangalli, L.M., Secchi, P., Vantini, S. and Vitelli, V. (2010). K-mean alignment for curve clustering. Comput. Stat. Data Anal. 54(5):1219-1233. DOI: 10.1016/j.csda.2009.12.008.
## fdebrik algorithm
## Not run:
## simulated data
set.seed(1)
x.coord <- seq(0, 1, 0.05)
x <- matrix(ncol = length(x.coord), nrow = 40)
labels <- numeric(40)
centers <- matrix(ncol = length(x.coord), nrow = 4)
centers[1, ] <- abs(x.coord) - 0.5
centers[2, ] <- (abs(x.coord - 0.5))^2 - 0.8
centers[3, ] <- -(abs(x.coord - 0.5))^2 + 0.7
centers[4, ] <- 0.75 * sin(8 * pi * abs(x.coord))
for (i in 1:4) {
  for (j in 1:10) {
    labels[10 * (i - 1) + j] <- i
    x[10 * (i - 1) + j, ] <- centers[i, ] + rnorm(length(x.coord), 0, 1.5)
  }
}
C1 <- kmeans(x, 4)
C2 <- fdebrik(x, 4, B = 5)
table(C1$cluster, labels)
table(C2$km$cluster, labels)
## End(Not run)
kma jointly performs clustering and alignment of a functional dataset (multidimensional or unidimensional functions).
kma(x, y0 = NULL, y1 = NULL, n.clust = 1, warping.method = "affine", similarity.method = "d1.pearson", center.method = "k-means", seeds = NULL, optim.method = "L-BFGS-B", span = 0.15, t.max = 0.1, m.max = 0.1, n.out = NULL, tol = 0.01, fence = TRUE, iter.max = 100, show.iter = 0, nstart=2, return.all=FALSE, check.total.similarity=FALSE)
x |
matrix n.func X grid.size or vector grid.size:
the abscissa values where each function is evaluated. n.func: number of functions in the dataset. grid.size: maximal number of abscissa values where each function is evaluated. The abscissa points may be unevenly spaced and they may differ from function to function. |
y0 |
matrix n.func X grid.size or array n.func X grid.size X d: evaluations of the set of original functions on the abscissa grid |
y1 |
matrix n.func X grid.size or array n.func X grid.size X d: evaluations of the set of original functions first derivatives on the abscissa grid |
n.clust |
scalar: required number of clusters. Default value is |
warping.method |
character: type of alignment required. If |
similarity.method |
character: required similarity measure. Possible choices are: |
center.method |
character: type of clustering method to be used. Possible choices are: |
seeds |
vector max(n.clust) or matrix nstart X n.clust: indexes of the functions to be used as initial centers. If it is a matrix, each row contains the indexes of the initial centers of one of the nstart initialisations. |
optim.method |
character: optimization method chosen to find the best warping functions at each iteration. Possible choices are: |
span |
scalar: the span to be used for the loess procedure in the center estimation step when |
t.max |
scalar: |
m.max |
scalar: |
n.out |
scalar: the desired length of the abscissa for computation of the similarity indexes and the centers. Default value is |
tol |
scalar: the algorithm stops when the increment of similarity of each function with respect to its corresponding center is lower than tol. |
fence |
boolean: if |
iter.max |
scalar: maximum number of iterations in the k-mean alignment cycle. Default value is |
show.iter |
boolean: if |
nstart |
scalar: number of initializations with different seeds. Default value is |
return.all |
boolean: if |
check.total.similarity |
boolean: if |
The function output is a list containing the following elements:
iterations |
scalar: total number of iterations performed by the kma function. |
x |
as input. |
y0 |
as input. |
y1 |
as input. |
n.clust |
as input. |
warping.method |
as input. |
similarity.method |
as input. |
center.method |
as input. |
x.center.orig |
vector n.out: abscissa of the original center. |
y0.center.orig |
matrix 1 X n.out: the unique row contains the evaluations of the original function center. If |
y1.center.orig |
matrix 1 X n.out: the unique row contains the evaluations of the original function first derivatives center. If |
similarity.orig |
vector: original similarities between the original functions and the original center. |
x.final |
matrix n.func X grid.size: aligned abscissas. |
n.clust.final |
scalar: final number of clusters. Note that, when |
x.centers.final |
vector n.out: abscissas of the final function centers and/or of the final function first derivatives centers. |
y0.centers.final |
matrix n.clust.final X n.out: rows contain the evaluations of the final functions centers. |
y1.centers.final |
matrix n.clust.final X n.out: rows contain the evaluations of the final derivatives centers. |
labels |
vector: cluster assignments. |
similarity.final |
vector: similarities between each function and the center of the cluster the function is assigned to. |
dilation.list |
list: dilations obtained at each iteration of the kma function. |
shift.list |
list: shifts obtained at each iteration of the kma function. |
dilation |
vector: dilation applied to the original abscissas |
shift |
vector: shift applied to the original abscissas |
Alice Parodi, Mirco Patriarca, Laura Sangalli, Piercesare Secchi, Simone Vantini, Valeria Vitelli.
Sangalli, L.M., Secchi, P., Vantini, S., Vitelli, V., 2010. "K-mean alignment for curve clustering". Computational Statistics and Data Analysis, 54, 1219-1233.
Sangalli, L.M., Secchi, P., Vantini, S., 2014. "Analysis of AneuRisk65 data: K-mean Alignment". Electronic Journal of Statistics, Special Section on "Statistics of Time Warpings and Phase Variations", Vol. 8, No. 2, 1891-1904.
## simulated data
set.seed(1)
x.coord <- seq(0, 1, 0.01)
x <- matrix(ncol = length(x.coord), nrow = 100)
labels <- numeric(100)
centers <- matrix(ncol = length(x.coord), nrow = 4)
centers[1, ] <- abs(x.coord) - 0.5
centers[2, ] <- (abs(x.coord - 0.5))^2 - 0.8
centers[3, ] <- -(abs(x.coord - 0.5))^2 + 0.7
centers[4, ] <- 0.75 * sin(8 * pi * abs(x.coord))
for (i in 1:4) {
  for (j in 1:25) {
    labels[25 * (i - 1) + j] <- i
    x[25 * (i - 1) + j, ] <- centers[i, ] + rnorm(length(x.coord), 0, 0.1)
  }
}
C <- kma(x.coord, x, n.clust = 4, warping.method = "NOalignment",
         similarity.method = "d0.pearson")
table(C$labels, labels)
kma.similarity computes a similarity/dissimilarity measure between two functions f and g. Users can choose among different types of measures.
kma.similarity(x.f = NULL, y0.f = NULL, y1.f = NULL, x.g = NULL, y0.g = NULL, y1.g = NULL, similarity.method, unif.grid = TRUE)
x.f |
vector length.f: abscissa grid where function f is evaluated |
y0.f |
vector length.f or matrix length.f X d: evaluations of function f on the abscissa grid x.f |
y1.f |
vector length.f or matrix length.f X d: evaluations of the first derivative of f on the abscissa grid x.f |
x.g |
vector length.g: abscissa grid where function g is evaluated |
y0.g |
vector length.g or matrix length.g X d: evaluations of function g on the abscissa grid x.g |
y1.g |
vector length.g or matrix length.g X d: evaluations of the first derivative of g on the abscissa grid x.g |
similarity.method |
character: similarity/dissimilarity measure between f and g. Possible choices are: 'd0.pearson', 'd1.pearson', 'd0.L2', 'd1.L2', 'd0.L2.centered', 'd1.L2.centered' |
unif.grid |
boolean: if equal to |
We report the list of the currently available similarities/dissimilarities. Note that all norms and inner products are computed over D, the intersection of the domains of f and g. mean(f) and mean(g) denote the mean value of f and of g, respectively.
1. 'd0.pearson': this similarity measure is the cosine of the angle between the two functions f and g.
2. 'd1.pearson': this similarity measure is the cosine of the angle between the two function first derivatives f' and g'.
3. 'd0.L2': this dissimilarity measure is the L2 distance between the two functions f and g, normalized by the length of the common domain D.
4. 'd1.L2': this dissimilarity measure is the L2 distance between the two function first derivatives f' and g', normalized by the length of the common domain D.
5. 'd0.L2.centered': this dissimilarity measure is the L2 distance between the centered functions f - mean(f) and g - mean(g), normalized by the length of the common domain D.
6. 'd1.L2.centered': this dissimilarity measure is the L2 distance between the centered first derivatives f' - mean(f') and g' - mean(g'), normalized by the length of the common domain D.
For multidimensional functions, if similarity.method is 'd0.pearson' or 'd1.pearson', the similarity/dissimilarity measure is computed as the average of the indexes in all directions.

The coherence properties specified in Sangalli et al. (2010) imply that if similarity.method is set to 'd0.L2', 'd1.L2', 'd0.L2.centered' or 'd1.L2.centered', the value of warping.method must be 'shift' or 'NOalignment'. If similarity.method is set to 'd0.pearson' or 'd1.pearson', all values of warping.method are allowed.
scalar: similarity/dissimilarity measure between the two functions f and g, computed via the specified method.
Alice Parodi, Mirco Patriarca, Laura Sangalli, Piercesare Secchi, Simone Vantini, Valeria Vitelli.
Sangalli, L.M., Secchi, P., Vantini, S., Vitelli, V., 2010. "K-mean alignment for curve clustering". Computational Statistics and Data Analysis, 54, 1219-1233.
Sangalli, L.M., Secchi, P., Vantini, S., 2014. "Analysis of AneuRisk65 data: K-mean Alignment". Electronic Journal of Statistics, Special Section on "Statistics of Time Warpings and Phase Variations", Vol. 8, No. 2, 1891-1904.
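For two functions sampled on the same grid, the 'd0.pearson' similarity reduces to the cosine of the angle between the evaluation vectors, approximating the L2 inner products by sums. A base-R sketch of this computation (an illustrative re-implementation, not a call to kma.similarity itself):

```r
# Cosine of the angle between two functions sampled on a common grid;
# mirrors the 'd0.pearson' similarity (illustrative, not the package code).
cos.sim <- function(y.f, y.g) {
  sum(y.f * y.g) / (sqrt(sum(y.f^2)) * sqrt(sum(y.g^2)))
}

grid <- seq(0, 1, 0.01)
f <- sin(2 * pi * grid)
g <- 2 * sin(2 * pi * grid)   # same shape, different amplitude
cos.sim(f, g)                 # 1: proportional functions are maximally similar
cos.sim(f, -f)                # -1: opposite functions
```

Because the measure is scale-invariant, curves with the same shape but different amplitudes are judged maximally similar, which is what makes it suitable for clustering by shape.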
plotKmeansClustering
represents, in different subpanels, each of the clusters obtained after running k-means. The corresponding centroid is highlighted.
plotKmeansClustering(x, kmeansObj, col=c(8,2), lty=c(2,1), x.coord = NULL, no.ticks = 5, ...)
x |
a data matrix containing |
kmeansObj |
an object of class kmeans |
col |
a vector containing colors for the elements in |
lty |
a vector containing the line type for the elements in |
x.coord |
initial x coordinates (time points) where the functional data is observed; if not provided, it is assumed to be |
no.ticks |
number of ticks to be displayed on the x-axis |
... |
additional arguments to be passed to the |
The function creates a suitable grid in which to plot the different clusters independently. In the i-th cell of the grid, the data points of the i-th cluster are represented in parallel coordinates and the final centroid is highlighted.

The function invisibly returns a list with the following components:
clusters |
a list containing one cluster per component; observations are given by rows |
centroids |
a list with the centroid of each cluster |
Javier Albert Smet [email protected] and Aurora Torrente [email protected]
## simulated data
set.seed(1)
x.coord <- seq(0, 1, 0.01)
x <- matrix(ncol = length(x.coord), nrow = 100)
labels <- numeric(100)
centers <- matrix(ncol = length(x.coord), nrow = 4)
centers[1, ] <- abs(x.coord) - 0.5
centers[2, ] <- (abs(x.coord - 0.5))^2 - 0.8
centers[3, ] <- -(abs(x.coord - 0.5))^2 + 0.7
centers[4, ] <- 0.75 * sin(8 * pi * abs(x.coord))
for (i in 1:4) {
  for (j in 1:25) {
    labels[25 * (i - 1) + j] <- i
    x[25 * (i - 1) + j, ] <- centers[i, ] + rnorm(length(x.coord), 0, 1.5)
  }
}
plotKmeansClustering(x, kmeans(x, 4))
plotKmeansClustering(x, brik(x, 4)$km)
plotKmeansClustering(x, fabrik(x, 4)$km)
plotKmeansClustering(x, fabrik(x, 4, degFr = 10)$km)