Title: | Implements Measures for the Comparison of Two Partitions |
---|---|
Description: | Provides several measures ((dis)similarity, distance/metric, correlation, entropy) for comparing two partitions of the same set of objects. The different measures can be assigned to three different classes: Pair comparison (containing the famous Jaccard and Rand indices), set based, and information theory based. Many of the implemented measures can be found in Albatineh AN, Niewiadomska-Bugaj M and Mihalko D (2006) <doi:10.1007/s00357-006-0017-z> and Meila M (2007) <doi:10.1016/j.jmva.2006.11.013>. Partitions are represented by vectors of class labels which allow a straightforward integration with existing clustering algorithms (e.g. kmeans()). The package is mostly based on the S4 object system. |
Authors: | Fabian Ball [aut, cre, cph, ctb], Andreas Geyer-Schulz [cph] |
Maintainer: | Fabian Ball <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.2.6 |
Built: | 2024-10-13 06:35:14 UTC |
Source: | https://github.com/kit-iism-em/partitioncomparison |
This package provides a large collection of measures to compare two partitions. Some survey articles for these measures are cited below; the seminal paper for each individual measure is provided with the function definition.
Most functionality is implemented as S4 classes and methods, so that it can easily be adapted to special needs and specifications. The main class is Partition, which merely wraps an atomic vector of length n for storing the class label of each object. The computation of all measures is designed to work on vectors of class labels.
All partition comparison methods can be called in the same way: <measure method>(p, q), with p and q being the two partitions (as Partition instances). One often does not want to explicitly transform a vector of class labels (as output by another package's function or algorithm) into a Partition instance before using measures from this package. For convenience, the function registerPartitionVectorSignatures exists, which dynamically creates versions of all measures that work directly with plain R vectors.
Maintainer: Fabian Ball [email protected] [copyright holder, contributor]
Other contributors:
Andreas Geyer-Schulz [email protected] [copyright holder]
Albatineh AN, Niewiadomska-Bugaj M, Mihalko D (2006). “On Similarity Indices and Correction for Chance Agreement.” Journal of Classification, 23(2), 301–313. ISSN 0176-4268, doi:10.1007/s00357-006-0017-z.
Meila M (2007). “Comparing Clusterings–an Information Based Distance.” Journal of Multivariate Analysis, 98(5), 873–895. doi:10.1016/j.jmva.2006.11.013.
Useful links:
Report bugs at https://github.com/KIT-IISM-EM/partitionComparison/issues
# Generate some data
set.seed(42)
data <- cbind(x=c(rnorm(50), rnorm(30, mean=5)), y=c(rnorm(50), rnorm(30, mean=5)))

# Run k-means with two/three centers
data.km2 <- kmeans(data, 2)
data.km3 <- kmeans(data, 3)

# Load this library
library(partitionComparison)

# Register the measures to take ANY input
registerPartitionVectorSignatures(environment())

# Compare the clusters
randIndex(data.km2$cluster, data.km3$cluster)
# [1] 0.8101266
This method overrides the standard subsetting to prevent alteration (makes partitions, i.e. class labels, immutable).
## S4 replacement method for signature 'Partition'
x[i, j] <- value
x |
A Partition instance |
i |
|
j |
|
value |
Fabian Ball [email protected]
Compute the Adjusted Rand Index (ARI)
adjustedRandIndex(p, q)

## S4 method for signature 'Partition,Partition'
adjustedRandIndex(p, q)

## S4 method for signature 'PairCoefficients,missing'
adjustedRandIndex(p, q = NULL)
p |
The partition |
q |
The partition |
adjustedRandIndex(p = Partition, q = Partition)
: Compute given two partitions
adjustedRandIndex(p = PairCoefficients, q = missing)
: Compute given the pair coefficients
Fabian Ball [email protected]
Hubert L, Arabie P (1985). “Comparing Partitions.” Journal of Classification, 2(1), 193–218.
isTRUE(all.equal(adjustedRandIndex(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1))), 1/6))
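As a cross-check, the value 1/6 in the example can be reproduced from the four pair coefficients of the two partitions (N11 = 2, N10 = 2, N01 = 2, N00 = 4, as listed in the computePairCoefficients example). The sketch below uses the pair-count form of the Hubert and Arabie index. The helper name ari_from_pairs is hypothetical and not part of the package, and the package's internal computation may differ.

```r
# Hypothetical helper (not part of partitionComparison): Adjusted Rand Index
# written in terms of the four pair coefficients.
ari_from_pairs <- function(n11, n10, n01, n00) {
  numerator <- 2 * (n11 * n00 - n10 * n01)
  denominator <- (n11 + n10) * (n10 + n00) + (n11 + n01) * (n01 + n00)
  numerator / denominator
}

# Pair coefficients of c(0, 0, 0, 1, 1) versus c(0, 0, 1, 1, 1)
ari <- ari_from_pairs(n11 = 2, n10 = 2, n01 = 2, n00 = 4)
print(ari)
```

For two identical partitions N10 and N01 are zero, and the expression evaluates to 1.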
Compute the index 1 of Baulieu
baulieu1(p, q)

## S4 method for signature 'Partition,Partition'
baulieu1(p, q)

## S4 method for signature 'PairCoefficients,missing'
baulieu1(p, q = NULL)
p |
The partition |
q |
The partition |
baulieu1(p = Partition, q = Partition)
: Compute given two partitions
baulieu1(p = PairCoefficients, q = missing)
: Compute given the pair coefficients
Fabian Ball [email protected]
Baulieu FB (1989). “A Classification of Presence/Absence Based Dissimilarity Coefficients.” Journal of Classification, 6(1), 233–246. ISSN 0176-4268, 1432-1343, doi:10.1007/BF01908601.
isTRUE(all.equal(baulieu1(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1))), 0.76))
Compute the index 2 of Baulieu
baulieu2(p, q)

## S4 method for signature 'Partition,Partition'
baulieu2(p, q)

## S4 method for signature 'PairCoefficients,missing'
baulieu2(p, q = NULL)
p |
The partition |
q |
The partition |
baulieu2(p = Partition, q = Partition)
: Compute given two partitions
baulieu2(p = PairCoefficients, q = missing)
: Compute given the pair coefficients
Fabian Ball [email protected]
Baulieu FB (1989). “A Classification of Presence/Absence Based Dissimilarity Coefficients.” Journal of Classification, 6(1), 233–246. ISSN 0176-4268, 1432-1343, doi:10.1007/BF01908601.
isTRUE(all.equal(baulieu2(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1))), 0.04))
Compute the classification error distance with a weighted matching between the clusters of both partitions. The nodes are the classes of each partition; the weights are the overlaps of objects.
classificationErrorDistance(p, q)

## S4 method for signature 'Partition,Partition'
classificationErrorDistance(p, q)
p |
The partition |
q |
The partition |
classificationErrorDistance(p = Partition, q = Partition)
: Compute given two partitions
This measure is implemented using lp.assign from the lpSolve package to compute the maximal matching of a weighted bipartite graph.
Fabian Ball [email protected]
Meila M, Heckerman D (2001). “An Experimental Comparison of Model-Based Clustering Methods.” Machine Learning, 42(1), 9–29.
Meila M (2005). “Comparing Clusterings: An Axiomatic View.” In Proceedings of the 22nd International Conference on Machine Learning, ICML '05, 577–584. ISBN 978-1-59593-180-1, doi:10.1145/1102351.1102424.
isTRUE(all.equal(classificationErrorDistance(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1))), 0.2))
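To illustrate the matching idea, the following base R sketch computes the distance as 1 minus the share of objects covered by the best one-to-one matching of clusters. It deliberately swaps lp.assign for a brute-force search over all assignments, so it is only suitable for a handful of clusters. The function names best_overlap and ced_sketch are hypothetical, and the definition used here is an assumption that happens to reproduce the example value 0.2.

```r
# Hypothetical helper: best total overlap over all one-to-one assignments of
# rows to columns of a square matrix (brute force, factorial running time).
best_overlap <- function(m) {
  if (nrow(m) == 1) {
    return(m[1, 1])
  }
  max(vapply(seq_len(ncol(m)), function(j) {
    m[1, j] + best_overlap(m[-1, -j, drop = FALSE])
  }, numeric(1)))
}

# Hypothetical sketch of the classification error distance on label vectors.
ced_sketch <- function(p, q) {
  overlap <- unclass(table(p, q))          # cluster-by-cluster overlap counts
  k <- max(dim(overlap))
  padded <- matrix(0, nrow = k, ncol = k)  # pad to square with empty clusters
  padded[seq_len(nrow(overlap)), seq_len(ncol(overlap))] <- overlap
  1 - best_overlap(padded) / length(p)
}

ced <- ced_sketch(c(0, 0, 0, 1, 1), c(0, 0, 1, 1, 1))
print(ced)
```

Padding with zero rows or columns lets the same search handle partitions with different numbers of clusters.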
Compute the comparison between two partitions for all available measures.
compareAll(p, q)

## S4 method for signature 'Partition,Partition'
compareAll(p, q)
p |
The partition |
q |
The partition |
Instance of data.frame with columns measure and value
compareAll(p = Partition, q = Partition)
: Compare given two Partition instances
This method will identify every generic S4 method that has a signature "Partition", "Partition" (including signatures with following "missing" parameters, e.g. "Partition", "Partition", "missing") as a partition comparison measure, except this method itself (otherwise: infinite recursion). This means one has to take care when defining other methods with the same signature in order not to produce unwanted side effects!
Fabian Ball [email protected]
compareAll(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1)))

## Not run:
                       measure       value
1            adjustedRandIndex 0.166666667
2                     baulieu1 0.760000000
3                     baulieu2 0.040000000
4  classificationErrorDistance 0.200000000
5                  czekanowski 0.500000000
6                dongensMetric 2.000000000
7                 fagerMcGowan 0.250000000
8          folwkesMallowsIndex 0.500000000
9              gammaStatistics 0.166666667
10              goodmanKruskal 0.333333333
11               gowerLegendre 0.750000000
12                      hamann 0.200000000
13          jaccardCoefficient 0.333333333
14                  kulczynski 0.500000000
15                  larsenAone 0.800000000
16                 lermanIndex 0.436435780
17                mcconnaughey 0.000000000
18            minkowskiMeasure 1.000000000
19                mirkinMetric 8.000000000
20           mutualInformation 0.291103166
21       normalizedLermanIndex 0.166666667
22 normalizedMutualInformation 0.432538068
23                     pearson 0.006944444
24                      peirce 0.166666667
25                   randIndex 0.600000000
26              rogersTanimoto 0.428571429
27                   russelRao 0.200000000
28               rvCoefficient 0.692307692
29                sokalSneath1 0.583333333
30                sokalSneath2 0.200000000
31                sokalSneath3 0.333333333
32      variationOfInformation 0.763817002
33                    wallaceI 0.500000000
34                   wallaceII 0.500000000
## End(Not run)
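Several of the listed values can be checked by hand from the four pair coefficients of the two example partitions (N11 = 2, N10 = 2, N01 = 2, N00 = 4, taken from the computePairCoefficients example). The sketch below uses common textbook definitions of a few pair-based indices. These definitions are an assumption and were not taken from the package source, although they do reproduce the corresponding rows of the output.

```r
# Pair coefficients of the example partitions (values from this manual).
n11 <- 2; n10 <- 2; n01 <- 2; n00 <- 4
n_pairs <- n11 + n10 + n01 + n00

# Textbook definitions, assumed here for illustration only.
indices <- c(
  randIndex          = (n11 + n00) / n_pairs,
  jaccardCoefficient = n11 / (n11 + n10 + n01),
  czekanowski        = 2 * n11 / (2 * n11 + n10 + n01),
  hamann             = ((n11 + n00) - (n10 + n01)) / n_pairs,
  mirkinMetric       = 2 * (n10 + n01),
  rogersTanimoto     = (n11 + n00) / (n11 + 2 * (n10 + n01) + n00),
  russelRao          = n11 / n_pairs
)
print(round(indices, 9))
```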
Given two object partitions P and Q of the same length n, each described as a vector of cluster ids, compute the four coefficients (N11, N10, N01, N00) that all pair comparison measures are based on.
computePairCoefficients(p, q)
p |
The partition |
q |
The partition |
Fabian Ball [email protected]
pc <- computePairCoefficients(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1)))
isTRUE(all.equal(N11(pc), 2))
isTRUE(all.equal(N10(pc), 2))
isTRUE(all.equal(N01(pc), 2))
isTRUE(all.equal(N00(pc), 4))
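The four coefficients can also be counted directly in base R by enumerating all object pairs. This is a minimal sketch, not the package's implementation. The name pair_counts is hypothetical, and the orientation of N10 (together in p, separated in q) versus N01 is an assumption; the example is symmetric, so it cannot distinguish the two.

```r
# Hypothetical helper: count object pairs by whether both objects share a
# cluster in p and/or in q. Enumerates all choose(n, 2) pairs.
pair_counts <- function(p, q) {
  stopifnot(length(p) == length(q))
  idx <- utils::combn(length(p), 2)        # 2 x choose(n, 2) index matrix
  same_p <- p[idx[1, ]] == p[idx[2, ]]
  same_q <- q[idx[1, ]] == q[idx[2, ]]
  c(
    N11 = sum(same_p & same_q),            # together in both partitions
    N10 = sum(same_p & !same_q),           # together in p only (assumed)
    N01 = sum(!same_p & same_q),           # together in q only (assumed)
    N00 = sum(!same_p & !same_q)           # separated in both partitions
  )
}

counts <- pair_counts(c(0, 0, 0, 1, 1), c(0, 0, 1, 1, 1))
print(counts)
```

The four counts always sum to the number of object pairs, here choose(5, 2) = 10.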
Compute the Czekanowski index
czekanowski(p, q)

## S4 method for signature 'Partition,Partition'
czekanowski(p, q)

## S4 method for signature 'PairCoefficients,missing'
czekanowski(p, q = NULL)
p |
The partition |
q |
The partition |
czekanowski(p = Partition, q = Partition)
: Compute given two partitions
czekanowski(p = PairCoefficients, q = missing)
: Compute given the pair coefficients
Fabian Ball [email protected]
Czekanowski J (1932). “'Coefficient of Racial Likeness' und 'Durchschnittliche Differenz'.” Anthropologischer Anzeiger, 9(3/4), 227–249.
isTRUE(all.equal(czekanowski(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1))), 0.5))
Compute Dongen's metric
dongensMetric(p, q)

## S4 method for signature 'Partition,Partition'
dongensMetric(p, q)
p |
The partition |
q |
The partition |
dongensMetric(p = Partition, q = Partition)
: Compute given two partitions
Fabian Ball [email protected]
van Dongen S (2000). “Performance Criteria For Graph Clustering And Markov Cluster Experiments.” Technical Report INS-R 0012, CWI.
isTRUE(all.equal(dongensMetric(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1))), 2))
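The example value 2 can be reproduced with a short base R sketch, assuming the usual definition of van Dongen's metric: 2n minus the sum of the largest overlap per cluster of p, minus the sum of the largest overlap per cluster of q. The function name dongen_sketch is hypothetical and the definition is an assumption, not taken from the package source.

```r
# Hypothetical sketch of van Dongen's metric on plain label vectors.
dongen_sketch <- function(p, q) {
  stopifnot(length(p) == length(q))
  overlap <- unclass(table(p, q))            # cluster-by-cluster overlap counts
  best_for_p <- sum(apply(overlap, 1, max))  # best match for each cluster of p
  best_for_q <- sum(apply(overlap, 2, max))  # best match for each cluster of q
  2 * length(p) - best_for_p - best_for_q
}

d <- dongen_sketch(c(0, 0, 0, 1, 1), c(0, 0, 1, 1, 1))
print(d)
```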
Compute the Shannon entropy
entropy(x, log_base)

## S4 method for signature 'numeric,numeric'
entropy(x, log_base)

## S4 method for signature 'Partition,numeric'
entropy(x, log_base)

## S4 method for signature 'ANY,missing'
entropy(x, log_base = exp(1))
x |
A probability distribution |
log_base |
Optional base of the logarithm (default: e) |
entropy(x = Partition, log_base = numeric)
: Entropy of a partition represented by x
This method is used internally for measures based on information theory
Fabian Ball [email protected]
isTRUE(all.equal(entropy(c(.5, .5)), log(2)))
isTRUE(all.equal(entropy(c(.5, .5), 2), 1))
isTRUE(all.equal(entropy(c(.5, .5), 4), .5))

# Entropy of a partition
isTRUE(all.equal(entropy(new("Partition", c(0, 0, 1, 1, 1))), entropy(c(2/5, 3/5))))
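The behavior shown in the examples can be mirrored in a few lines of base R. This is a sketch of the standard Shannon entropy and not the package's code. The names shannon_entropy and partition_entropy are hypothetical, and treating a partition's entropy as the entropy of its relative cluster sizes is an assumption that is consistent with the last example.

```r
# Hypothetical helper: Shannon entropy of a probability vector.
shannon_entropy <- function(prob, log_base = exp(1)) {
  prob <- prob[prob > 0]                   # 0 * log(0) is treated as 0
  -sum(prob * log(prob, base = log_base))
}

# Hypothetical helper: entropy of a partition given as a label vector,
# using the relative cluster sizes as the probability distribution.
partition_entropy <- function(labels, log_base = exp(1)) {
  sizes <- as.vector(table(labels))
  shannon_entropy(sizes / length(labels), log_base)
}

h_nat <- shannon_entropy(c(0.5, 0.5))      # natural logarithm
h_bit <- shannon_entropy(c(0.5, 0.5), 2)   # base 2
h_part <- partition_entropy(c(0, 0, 1, 1, 1))
```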
Compute the index of Fager and McGowan
fagerMcGowan(p, q)

## S4 method for signature 'Partition,Partition'
fagerMcGowan(p, q)

## S4 method for signature 'PairCoefficients,missing'
fagerMcGowan(p, q = NULL)
p |
The partition |
q |
The partition |
fagerMcGowan(p = Partition, q = Partition)
: Compute given two partitions
fagerMcGowan(p = PairCoefficients, q = missing)
: Compute given the pair coefficients
Fabian Ball [email protected]
Fager EW, McGowan JA (1963). “Zooplankton Species Groups in the North Pacific: Co-Occurrences of Species Can Be Used to Derive Groups Whose Members React Similarly to Water-Mass Types.” Science, 140(3566), 453–460.
isTRUE(all.equal(fagerMcGowan(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1))), 0.25))
Compute the index of Fowlkes and Mallows, which is a combination of the two Wallace indices.
folwkesMallowsIndex(p, q)

## S4 method for signature 'Partition,Partition'
folwkesMallowsIndex(p, q)

## S4 method for signature 'PairCoefficients,missing'
folwkesMallowsIndex(p, q = NULL)
p |
The partition |
q |
The partition |
folwkesMallowsIndex(p = Partition, q = Partition)
: Compute given two partitions
folwkesMallowsIndex(p = PairCoefficients, q = missing)
: Compute given the pair coefficients
Fabian Ball [email protected]
Fowlkes EB, Mallows CL (1983). “A Method for Comparing Two Hierarchical Clusterings.” Journal of the American Statistical Association, 78(383), 553–569.
isTRUE(all.equal(folwkesMallowsIndex(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1))), 0.5))
Compute the Gamma statistics
gammaStatistics(p, q)

## S4 method for signature 'Partition,Partition'
gammaStatistics(p, q)

## S4 method for signature 'PairCoefficients,missing'
gammaStatistics(p, q = NULL)
p |
The partition |
q |
The partition |
gammaStatistics(p = Partition, q = Partition)
: Compute given two partitions
gammaStatistics(p = PairCoefficients, q = missing)
: Compute given the pair coefficients
Fabian Ball [email protected]
Yule GU (1900). “On the Association of Attributes in Statistics: With Illustrations from the Material of the Childhood Society, &c.” Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 194, 257–319.
isTRUE(all.equal(gammaStatistics(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1))), 1/6))
Compute the index of Goodman and Kruskal
goodmanKruskal(p, q)

## S4 method for signature 'Partition,Partition'
goodmanKruskal(p, q)

## S4 method for signature 'PairCoefficients,missing'
goodmanKruskal(p, q)
p |
The partition |
q |
The partition |
goodmanKruskal(p = Partition, q = Partition)
: Compute given two partitions
goodmanKruskal(p = PairCoefficients, q = missing)
: Compute given the pair coefficients
Fabian Ball [email protected]
Goodman LA, Kruskal WH (1954). “Measures of Association for Cross Classifications.” Journal of the American Statistical Association, 49(268), 732–764. ISSN 0162-1459, doi:10.1080/01621459.1954.10501231.
isTRUE(all.equal(goodmanKruskal(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1))), 1/3))
Compute the index of Gower and Legendre
gowerLegendre(p, q)

## S4 method for signature 'Partition,Partition'
gowerLegendre(p, q)

## S4 method for signature 'PairCoefficients,missing'
gowerLegendre(p, q)
p |
The partition |
q |
The partition |
gowerLegendre(p = Partition, q = Partition)
: Compute given two partitions
gowerLegendre(p = PairCoefficients, q = missing)
: Compute given the pair coefficients
Fabian Ball [email protected]
Gower JC, Legendre P (1986). “Metric and Euclidean Properties of Dissimilarity Coefficients.” Journal of Classification, 3(1), 5–48. ISSN 0176-4268, 1432-1343, doi:10.1007/BF01896809.
isTRUE(all.equal(gowerLegendre(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1))), 0.75))
Compute the Hamann coefficient
hamann(p, q)

## S4 method for signature 'Partition,Partition'
hamann(p, q)

## S4 method for signature 'PairCoefficients,missing'
hamann(p, q = NULL)
p |
The partition |
q |
The partition |
hamann(p = Partition, q = Partition)
: Compute given two partitions
hamann(p = PairCoefficients, q = missing)
: Compute given the pair coefficients
Fabian Ball [email protected]
Hamann U (1961). “Merkmalsbestand Und Verwandtschaftsbeziehungen Der Farinosae: Ein Beitrag Zum System Der Monokotyledonen.” Willdenowia, 2(5), 639–768. ISSN 0511-9618.
isTRUE(all.equal(hamann(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1))), 0.2))
Compute the Jaccard coefficient
jaccardCoefficient(p, q)

## S4 method for signature 'Partition,Partition'
jaccardCoefficient(p, q)

## S4 method for signature 'PairCoefficients,missing'
jaccardCoefficient(p, q = NULL)
p |
The partition |
q |
The partition |
jaccardCoefficient(p = Partition, q = Partition)
: Compute given two partitions
jaccardCoefficient(p = PairCoefficients, q = missing)
: Compute given the pair coefficients
Fabian Ball [email protected]
Jaccard P (1908). “Nouvelles Recherches Sur La Distribution Florale.” Bulletin de la Société Vaudoise des Sciences Naturelles, 44(163), 223–270.
isTRUE(all.equal(jaccardCoefficient(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1))), 1/3))
Compute the Kulczynski index
kulczynski(p, q)

## S4 method for signature 'Partition,Partition'
kulczynski(p, q)

## S4 method for signature 'PairCoefficients,missing'
kulczynski(p, q = NULL)
p |
The partition |
q |
The partition |
kulczynski(p = Partition, q = Partition)
: Compute given two partitions
kulczynski(p = PairCoefficients, q = missing)
: Compute given the pair coefficients
Fabian Ball [email protected]
Kulczynski S (1927). “Zespoly Roslin w Pieninach.” Bull. Intern. Acad. Pol. Sci. Lett. Cl. Sci. Math. Nat., B (Sci. Nat.), 1927(Suppl 2), 57–203.
isTRUE(all.equal(kulczynski(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1))), 0.5))
Compute the measure of Larsen and Aone
larsenAone(p, q)

## S4 method for signature 'Partition,Partition'
larsenAone(p, q)
p |
The partition |
q |
The partition |
larsenAone(p = Partition, q = Partition)
: Compute given two partitions
Fabian Ball [email protected]
Larsen B, Aone C (1999). “Fast and Effective Text Mining Using Linear-Time Document Clustering.” In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '99, 16–22. ISBN 1-58113-143-7, doi:10.1145/312129.312186.
isTRUE(all.equal(larsenAone(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1))), 0.8))
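The example value 0.8 is consistent with the following reading of the measure: for every cluster of p, take the best F-measure style overlap 2|C ∩ D| / (|C| + |D|) over all clusters D of q, then average over the clusters of p. The base R sketch below implements that reading. The function name larsen_aone_sketch is hypothetical and the definition is an assumption that was not checked against the package source.

```r
# Hypothetical sketch of the Larsen and Aone measure on plain label vectors.
larsen_aone_sketch <- function(p, q) {
  stopifnot(length(p) == length(q))
  overlap <- unclass(table(p, q))          # |C ∩ D| for all cluster pairs
  size_p <- rowSums(overlap)               # cluster sizes in p
  size_q <- colSums(overlap)               # cluster sizes in q
  f <- 2 * overlap / outer(size_p, size_q, "+")
  mean(apply(f, 1, max))                   # best match per cluster of p
}

la <- larsen_aone_sketch(c(0, 0, 0, 1, 1), c(0, 0, 1, 1, 1))
print(la)
```

Under this reading the measure is not symmetric in general, because the average runs over the clusters of the first argument only.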
Compute the Lerman index
lermanIndex(p, q, c = NULL)

## S4 method for signature 'Partition,Partition,missing'
lermanIndex(p, q, c = NULL)

## S4 method for signature 'Partition,Partition,PairCoefficients'
lermanIndex(p, q, c = NULL)
p |
The partition |
q |
The partition |
c |
PairCoefficients or NULL |
lermanIndex(p = Partition, q = Partition, c = missing)
: Compute given two partitions
lermanIndex(p = Partition, q = Partition, c = PairCoefficients)
: Compute given the partitions and pair coefficients
Fabian Ball [email protected]
Lerman IC (1988). “Comparing Partitions (Mathematical and Statistical Aspects).” In Bock H (ed.), Classification and Related Methods of Data Analysis, 121–132.
Hubert L, Arabie P (1985). “Comparing Partitions.” Journal of Classification, 2(1), 193–218.
Denœud L, Guénoche A (2006). “Comparison of Distance Indices Between Partitions.” In Batagelj V, Bock H, Ferligoj A, Žiberna A (eds.), Data Science and Classification, Studies in Classification, Data Analysis, and Knowledge Organization, 21–28. Springer Berlin Heidelberg. ISBN 978-3-540-34415-5 978-3-540-34416-2.
isTRUE(all.equal(lermanIndex(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1))), 2/sqrt(21)))
Compute the McConnaughey index
mcconnaughey(p, q)

## S4 method for signature 'Partition,Partition'
mcconnaughey(p, q)

## S4 method for signature 'PairCoefficients,missing'
mcconnaughey(p, q = NULL)
p |
The partition |
q |
The partition |
mcconnaughey(p = Partition, q = Partition)
: Compute given two partitions
mcconnaughey(p = PairCoefficients, q = missing)
: Compute given the pair coefficients
Fabian Ball [email protected]
McConnaughey BH, Laut LP (1964). The Determination and Analysis of Plankton Communities. Lembaga Penelitian Laut.
isTRUE(all.equal(mcconnaughey(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1))), 0))
Compute the Minkowski measure
minkowskiMeasure(p, q)

## S4 method for signature 'Partition,Partition'
minkowskiMeasure(p, q)

## S4 method for signature 'PairCoefficients,missing'
minkowskiMeasure(p, q = NULL)
p |
The partition |
q |
The partition |
minkowskiMeasure(p = Partition, q = Partition)
: Compute given two partitions
minkowskiMeasure(p = PairCoefficients, q = missing)
: Compute given the pair coefficients
Fabian Ball [email protected]
Minkowski H (1911). Gesammelte Abhandlungen von Hermann Minkowski, Zweiter Band, number 2. B. G. Teubner, Leipzig, Berlin.
isTRUE(all.equal(minkowskiMeasure(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1))), 1))
Compute the Mirkin metric
mirkinMetric(p, q)

## S4 method for signature 'Partition,Partition'
mirkinMetric(p, q)

## S4 method for signature 'PairCoefficients,missing'
mirkinMetric(p, q = NULL)
p |
The partition |
q |
The partition |
mirkinMetric(p = Partition, q = Partition)
: Compute given two partitions
mirkinMetric(p = PairCoefficients, q = missing)
: Compute given the pair coefficients
Fabian Ball [email protected]
Mirkin BG, Chernyi LB (1970). “Measurement of the Distance Between Partitions of a Finite Set of Objects.” Automation and Remote Control, 31(5), 786–792.
isTRUE(all.equal(mirkinMetric(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1))), 8))
Compute the mutual information
mutualInformation(p, q)

## S4 method for signature 'Partition,Partition'
mutualInformation(p, q)
p |
The partition |
q |
The partition |
mutualInformation(p = Partition, q = Partition)
: Compute given two partitions
Fabian Ball [email protected]
Vinh NX, Epps J, Bailey J (2010). “Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance.” Journal of Machine Learning Research, 11, 2837–2854.
isTRUE(all.equal(mutualInformation(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1))), 4/5*log(5/3) + 1/5*log(5/9)))
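The expected value in the example, 4/5*log(5/3) + 1/5*log(5/9), follows from the joint distribution of the two label vectors. The base R sketch below computes the standard mutual information with the natural logarithm from the contingency table. The function name mutual_information_sketch is hypothetical, and this is an illustration, not the package's implementation.

```r
# Hypothetical sketch: mutual information of two label vectors (natural log).
mutual_information_sketch <- function(p, q) {
  stopifnot(length(p) == length(q))
  joint <- unclass(table(p, q)) / length(p)               # joint distribution
  expected <- outer(rowSums(joint), colSums(joint))       # product of marginals
  nonzero <- joint > 0                                    # skip 0 * log(0) terms
  sum(joint[nonzero] * log(joint[nonzero] / expected[nonzero]))
}

mi <- mutual_information_sketch(c(0, 0, 0, 1, 1), c(0, 0, 1, 1, 1))
print(mi)
```

For two identical partitions the result equals the partition's Shannon entropy.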
It is defined as N = N11 + N10 + N01 + N00, which equals n(n - 1)/2 with n the number of objects.
N(obj)

## S4 method for signature 'PairCoefficients'
N(obj)
obj |
Instance of PairCoefficients |
Fabian Ball [email protected]
Method to retrieve the coefficient N00
N00(obj)

## S4 method for signature 'PairCoefficients'
N00(obj)
obj |
Instance of PairCoefficients |
Fabian Ball [email protected]
Method to retrieve the coefficient N01
N01(obj)

## S4 method for signature 'PairCoefficients'
N01(obj)
obj |
Instance of PairCoefficients |
Fabian Ball [email protected]
It is defined as
N01p(obj)

## S4 method for signature 'PairCoefficients'
N01p(obj)
obj |
Instance of PairCoefficients |
Fabian Ball [email protected]
Method to retrieve the coefficient N10
N10(obj)

## S4 method for signature 'PairCoefficients'
N10(obj)
obj |
Instance of PairCoefficients |
Fabian Ball [email protected]
It is defined as
N10p(obj)

## S4 method for signature 'PairCoefficients'
N10p(obj)
obj |
Instance of PairCoefficients |
Fabian Ball [email protected]
Method to retrieve the coefficient N11
N11(obj)

## S4 method for signature 'PairCoefficients'
N11(obj)
obj |
Instance of PairCoefficients |
Fabian Ball [email protected]
It is defined as
N12(obj)

## S4 method for signature 'PairCoefficients'
N12(obj)
obj |
Instance of PairCoefficients |
Fabian Ball [email protected]
It is defined as
N21(obj)

## S4 method for signature 'PairCoefficients'
N21(obj)
obj |
Instance of PairCoefficients |
Fabian Ball [email protected]
Compute the normalized Lerman index
where is the Lerman index.
normalizedLermanIndex(p, q, c = NULL)

## S4 method for signature 'Partition,Partition,missing'
normalizedLermanIndex(p, q, c = NULL)

## S4 method for signature 'Partition,Partition,PairCoefficients'
normalizedLermanIndex(p, q, c = NULL)
p |
The partition |
q |
The partition |
c |
PairCoefficients or NULL |
normalizedLermanIndex(p = Partition, q = Partition, c = missing)
: Compute given two partitions
normalizedLermanIndex(p = Partition, q = Partition, c = PairCoefficients)
: Compute given the partitions and pair coefficients
Fabian Ball [email protected]
Lerman IC (1988). “Comparing Partitions (Mathematical and Statistical Aspects).” In Bock H (ed.), Classification and Related Methods of Data Analysis, 121–132.
Hubert L, Arabie P (1985). “Comparing Partitions.” Journal of Classification, 2(1), 193–218.
isTRUE(all.equal(normalizedLermanIndex(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1))), 1/6))
Compute the mutual information (MI), normalized either by the minimum or maximum of the two partition entropies H(P) and H(Q), or by their sum.
normalizedMutualInformation(p, q, type = c("min", "max", "sum"))
## S4 method for signature 'Partition,Partition,character'
normalizedMutualInformation(p, q, type = c("min", "max", "sum"))
## S4 method for signature 'Partition,Partition,missing'
normalizedMutualInformation(p, q, type = NULL)
p | The partition P |
q | The partition Q |
type | One of "min" (default), "max" or "sum" |
normalizedMutualInformation(p = Partition, q = Partition, type = character): Compute the normalized mutual information given two partitions
normalizedMutualInformation(p = Partition, q = Partition, type = missing): Compute the normalized mutual information given two partitions with type="min"
Fabian Ball [email protected]
Kvalseth TO (1987). “Entropy and Correlation: Some Comments.” IEEE Transactions on Systems, Man and Cybernetics, 17(3), 517–519. ISSN 0018-9472, doi:10.1109/TSMC.1987.4309069.
isTRUE(all.equal(
  normalizedMutualInformation(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1)), "min"),
  normalizedMutualInformation(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1)), "max")
))
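The three normalizations of the mutual information can be illustrated in base R without the package. This is only a sketch under stated assumptions: the function names `entropy_sketch`, `mi_sketch` and `nmi_sketch` are hypothetical, natural logarithms are used, and the "sum" variant is assumed to be 2 * MI / (H(P) + H(Q)); the package's own implementation may differ in these details.

```r
# Entropy of a partition given as a vector of class labels (natural log).
entropy_sketch <- function(labels) {
  pr <- as.numeric(table(labels)) / length(labels)
  -sum(pr * log(pr))
}

# Mutual information computed from the joint label distribution.
mi_sketch <- function(p, q) {
  stopifnot(length(p) == length(q))
  joint <- table(p, q) / length(p)
  indep <- outer(rowSums(joint), colSums(joint))  # product of the marginals
  nz <- joint > 0                                 # skip empty cells (0 * log 0 = 0)
  sum(joint[nz] * log(joint[nz] / indep[nz]))
}

# Hypothetical helper; the "sum" normalization is an assumption, see above.
nmi_sketch <- function(p, q, type = c("min", "max", "sum")) {
  type <- match.arg(type)
  hp <- entropy_sketch(p)
  hq <- entropy_sketch(q)
  mi <- mi_sketch(p, q)
  switch(type,
         min = mi / min(hp, hq),
         max = mi / max(hp, hq),
         sum = 2 * mi / (hp + hq))
}

p <- c(0, 0, 0, 1, 1)
q <- c(0, 0, 1, 1, 1)
nmi_min <- nmi_sketch(p, q, "min")
nmi_max <- nmi_sketch(p, q, "max")
```

Both example partitions have cluster sizes 3 and 2, so their entropies coincide and the "min" and "max" variants agree, which is what the documented example checks. If either partition consists of a single cluster, its entropy is zero and the "min" normalization divides by zero.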
S4 class to represent coefficients of object pairs for the comparison of two object partitions (say P and Q).
N11
The number of object pairs that are together in a cluster in both partitions
N00
The number of object pairs that are together in a cluster in neither partition
N10
The number of object pairs that are together in a cluster only in partition P
N01
The number of object pairs that are together in a cluster only in partition Q
Fabian Ball [email protected]
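The four coefficients can be counted directly by enumerating all object pairs. The following base R sketch does not use the package; `pair_counts_sketch` is a hypothetical helper, and it assumes that N10 refers to the first label vector and N01 to the second.

```r
# Count the four pair coefficients for two label vectors of equal length.
pair_counts_sketch <- function(p, q) {
  stopifnot(length(p) == length(q), length(p) >= 2)
  idx <- combn(length(p), 2)            # 2 x choose(n, 2) matrix of object pairs
  same_p <- p[idx[1, ]] == p[idx[2, ]]  # pair shares a cluster in p
  same_q <- q[idx[1, ]] == q[idx[2, ]]  # pair shares a cluster in q
  c(N11 = sum(same_p & same_q),
    N00 = sum(!same_p & !same_q),
    N10 = sum(same_p & !same_q),
    N01 = sum(!same_p & same_q))
}

counts <- pair_counts_sketch(c(0, 0, 0, 1, 1), c(0, 0, 1, 1, 1))
```

For the example partitions used throughout this manual, the ten object pairs split into N11 = 2, N00 = 4, N10 = 2 and N01 = 2. Enumerating pairs takes quadratic time and memory, so this form is only suitable for small inputs.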
This class is a wrapper around a vector but allows only the atomic vectors logical, numeric, integer, complex, character, raw. The reason for this is that only those types seem to make sense as class labels. Furthermore, class labels are immutable.
Fabian Ball [email protected]
p <- new("Partition", c(0, 0, 1, 1, 1))
q <- new("Partition", c("a", "a", "b", "b", "b"))
## Not run:
# This won't work:
new("Partition", c(list("a"), "a", "b", "b", "b"))
p[2] <- 2
## End(Not run)
Compute the Pearson index
pearson(p, q)
## S4 method for signature 'Partition,Partition'
pearson(p, q)
## S4 method for signature 'PairCoefficients,missing'
pearson(p, q)
p | The partition P |
q | The partition Q |
pearson(p = Partition, q = Partition): Compute the index given two partitions
pearson(p = PairCoefficients, q = missing): Compute the index given the pair coefficients
Fabian Ball [email protected]
Pearson K (1926). “On the Coefficient of Racial Likeness.” Biometrika, 18(1/2), 105–117. ISSN 0006-3444, doi:10.2307/2332498.
isTRUE(all.equal(pearson(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1))), 1/144))
Compute the Peirce index
peirce(p, q)
## S4 method for signature 'Partition,Partition'
peirce(p, q)
## S4 method for signature 'PairCoefficients,missing'
peirce(p, q = NULL)
p | The partition P |
q | The partition Q |
peirce(p = Partition, q = Partition): Compute the index given two partitions
peirce(p = PairCoefficients, q = missing): Compute the index given the pair coefficients
Fabian Ball [email protected]
Peirce CS (1884). “The Numerical Measure of the Success of Predictions.” Science, 4(93), 453–454.
isTRUE(all.equal(peirce(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1))), 1/6))
Given two partitions (p, q) represented as vectors of cluster ids, compute the projection number, which is the sum of the maximum cluster overlaps for all clusters of p to any cluster of q.
projectionNumber(p, q)
p | Partition |
q | Partition |
Fabian Ball [email protected]
isTRUE(all.equal(projectionNumber(c(0, 0, 0, 1, 1), c(0, 0, 1, 1, 1)), 4))
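The definition above can be traced with a contingency table in base R. This is a minimal sketch, not the package's implementation; `projection_number_sketch` is a hypothetical name.

```r
# Sum, over the clusters of p, of the largest overlap with any cluster of q.
projection_number_sketch <- function(p, q) {
  stopifnot(length(p) == length(q))
  overlaps <- table(p, q)       # rows: clusters of p, columns: clusters of q
  sum(apply(overlaps, 1, max))  # best-matching q cluster for each p cluster
}

pn <- projection_number_sketch(c(0, 0, 0, 1, 1), c(0, 0, 1, 1, 1))
```

In the example, the first cluster of p overlaps a cluster of q in at most 2 objects and so does the second, which gives 2 + 2 = 4, the value in the documented example. The measure is not symmetric in general, because the maximum is taken per cluster of the first argument.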
Compute the Rand index
randIndex(p, q)
## S4 method for signature 'Partition,Partition'
randIndex(p, q)
## S4 method for signature 'PairCoefficients,missing'
randIndex(p, q = NULL)
p | The partition P |
q | The partition Q |
randIndex(p = Partition, q = Partition): Compute the index given two partitions
randIndex(p = PairCoefficients, q = missing): Compute the index given the pair coefficients
Fabian Ball [email protected]
Rand WM (1971). “Objective Criteria for the Evaluation of Clustering Algorithms.” Journal of the American Statistical Association, 66(336), 846–850.
isTRUE(all.equal(randIndex(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1))), 0.6))
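The Rand index is the share of object pairs on which both partitions agree, (N11 + N00) divided by the total number of pairs. A base R sketch that obtains the counts from the contingency table instead of enumerating pairs is given below; `rand_index_sketch` is a hypothetical name and the code is independent of the package.

```r
# Rand index from the contingency table of two label vectors.
rand_index_sketch <- function(p, q) {
  stopifnot(length(p) == length(q), length(p) >= 2)
  tab <- table(p, q)
  n_pairs <- choose(length(p), 2)
  n11 <- sum(choose(tab, 2))                  # pairs together in both partitions
  together_p <- sum(choose(rowSums(tab), 2))  # pairs together in p
  together_q <- sum(choose(colSums(tab), 2))  # pairs together in q
  n00 <- n_pairs - together_p - together_q + n11  # inclusion-exclusion
  (n11 + n00) / n_pairs
}

ri <- rand_index_sketch(c(0, 0, 0, 1, 1), c(0, 0, 1, 1, 1))
```

For the example partitions this yields (2 + 4) / 10 = 0.6, in line with the documented example.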
The comparison measures are defined to use the class Partition as parameters. If you do not want to explicitly convert an arbitrary vector of class labels (probably as a result from another package's algorithm) into a Partition instance, calling this function will create methods for all measures that allow "ANY" input which is implicitly converted to Partition.
registerPartitionVectorSignatures(e)
e | The environment to register the methods in (mostly environment(), as in the example) |
Fabian Ball [email protected]
library(partitionComparison)
randIndex(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1)))
# [1] 0.6
## Not run:
randIndex(c(0, 0, 0, 1, 1), c(0, 0, 1, 1, 1))
# Error in (function (classes, fdef, mtable) :
# unable to find an inherited method for function 'randIndex' for signature '"numeric", "numeric"'
registerPartitionVectorSignatures(environment())
randIndex(c(0, 0, 0, 1, 1), c(0, 0, 1, 1, 1))
# [1] 0.6
Compute the index of Rogers and Tanimoto
rogersTanimoto(p, q)
## S4 method for signature 'Partition,Partition'
rogersTanimoto(p, q)
## S4 method for signature 'PairCoefficients,missing'
rogersTanimoto(p, q)
p | The partition P |
q | The partition Q |
rogersTanimoto(p = Partition, q = Partition): Compute the index given two partitions
rogersTanimoto(p = PairCoefficients, q = missing): Compute the index given the pair coefficients
Fabian Ball [email protected]
Rogers DJ, Tanimoto TT (1960). “A Computer Program for Classifying Plants.” Science, 132(3434), 1115–1118. ISSN 0036-8075, 1095-9203, doi:10.1126/science.132.3434.1115.
isTRUE(all.equal(rogersTanimoto(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1))), 3/7))
Compute the index of Russell and Rao
russelRao(p, q)
## S4 method for signature 'Partition,Partition'
russelRao(p, q)
## S4 method for signature 'PairCoefficients,missing'
russelRao(p, q = NULL)
p | The partition P |
q | The partition Q |
russelRao(p = Partition, q = Partition): Compute the index given two partitions
russelRao(p = PairCoefficients, q = missing): Compute the index given the pair coefficients
Fabian Ball [email protected]
Russell PF, Rao TR (1940). “On Habitat and Association of Species of Anopheline Larvae in South-Eastern Madras.” Journal of the Malaria Institute of India, 3(1), 153–178.
isTRUE(all.equal(russelRao(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1))), 0.2))
Compute the RV coefficient
rvCoefficient(p, q)
## S4 method for signature 'Partition,Partition'
rvCoefficient(p, q)
## S4 method for signature 'PairCoefficients,missing'
rvCoefficient(p, q = NULL)
p | The partition P |
q | The partition Q |
rvCoefficient(p = Partition, q = Partition): Compute the RV coefficient given two partitions
rvCoefficient(p = PairCoefficients, q = missing): Compute the RV coefficient given the pair coefficients
Fabian Ball [email protected]
Robert P, Escoufier Y (1976). “A Unifying Tool for Linear Multivariate Statistical Methods: The RV-Coefficient.” Journal of the Royal Statistical Society. Series C (Applied Statistics), 25(3), 257–265. ISSN 0035-9254.
Youness G, Saporta G (2004). “Some Measures of Agreement between Close Partitions.” Student, 51, 1–12.
isTRUE(all.equal(rvCoefficient(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1))), 9/13))
Compute the index 1 of Sokal and Sneath
sokalSneath1(p, q)
## S4 method for signature 'Partition,Partition'
sokalSneath1(p, q)
## S4 method for signature 'PairCoefficients,missing'
sokalSneath1(p, q = NULL)
p | The partition P |
q | The partition Q |
sokalSneath1(p = Partition, q = Partition): Compute the index given two partitions
sokalSneath1(p = PairCoefficients, q = missing): Compute the index given the pair coefficients
Fabian Ball [email protected]
Sokal RR, Sneath PHA (1963). Principles of Numerical Taxonomy. Freeman, San Francisco.
isTRUE(all.equal(sokalSneath1(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1))), 7/12))
Compute the index 2 of Sokal and Sneath
sokalSneath2(p, q)
## S4 method for signature 'Partition,Partition'
sokalSneath2(p, q)
## S4 method for signature 'PairCoefficients,missing'
sokalSneath2(p, q = NULL)
p | The partition P |
q | The partition Q |
sokalSneath2(p = Partition, q = Partition): Compute the index given two partitions
sokalSneath2(p = PairCoefficients, q = missing): Compute the index given the pair coefficients
Fabian Ball [email protected]
Sokal RR, Sneath PHA (1963). Principles of Numerical Taxonomy. Freeman, San Francisco.
isTRUE(all.equal(sokalSneath2(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1))), 0.2))
Compute the index 3 of Sokal and Sneath
sokalSneath3(p, q)
## S4 method for signature 'Partition,Partition'
sokalSneath3(p, q)
## S4 method for signature 'PairCoefficients,missing'
sokalSneath3(p, q = NULL)
p | The partition P |
q | The partition Q |
sokalSneath3(p = Partition, q = Partition): Compute the index given two partitions
sokalSneath3(p = PairCoefficients, q = missing): Compute the index given the pair coefficients
Fabian Ball [email protected]
Sokal RR, Sneath PHA (1963). Principles of Numerical Taxonomy. Freeman, San Francisco.
isTRUE(all.equal(sokalSneath3(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1))), 1/3))
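The three Sokal and Sneath indices can be reproduced from the four pair coefficients alone. The formulas in the following base R sketch are assumptions: they are the variants that reproduce the documented example values 7/12, 0.2 and 1/3, and they were not taken from the package source. `sokal_sneath_sketch` is a hypothetical name.

```r
# Pair coefficients for p = (0, 0, 0, 1, 1) and q = (0, 0, 1, 1, 1), counted by hand.
cf <- c(N11 = 2, N00 = 4, N10 = 2, N01 = 2)

# Assumed formulas, chosen to match the documented example values.
sokal_sneath_sketch <- function(cf) {
  n11 <- cf[["N11"]]; n00 <- cf[["N00"]]
  n10 <- cf[["N10"]]; n01 <- cf[["N01"]]
  c(
    # Index 1: mean of four conditional agreement ratios
    ss1 = (n11 / (n11 + n10) + n11 / (n11 + n01) +
           n00 / (n00 + n10) + n00 / (n00 + n01)) / 4,
    # Index 2: Jaccard-like ratio with disagreements weighted twice
    ss2 = n11 / (n11 + 2 * (n10 + n01)),
    # Index 3: product of agreements over the square root of the product
    # of the four marginal sums
    ss3 = n11 * n00 /
      sqrt((n11 + n10) * (n11 + n01) * (n00 + n10) * (n00 + n01))
  )
}

ss <- sokal_sneath_sketch(cf)
```

Any of the denominators can become zero for degenerate partitions (for example a single cluster), in which case the corresponding ratio is undefined in this sketch.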
Compute the variation of information VI(P, Q) = H(P) + H(Q) - 2 MI(P, Q), where MI is the mutual information and H the partition entropy.
variationOfInformation(p, q)
## S4 method for signature 'Partition,Partition'
variationOfInformation(p, q)
p | The partition P |
q | The partition Q |
variationOfInformation(p = Partition, q = Partition): Compute the variation of information given two partitions
Fabian Ball [email protected]
Meila M (2003). “Comparing Clusterings by the Variation of Information.” In Schölkopf B, Warmuth MK (eds.), Learning Theory and Kernel Machines, volume 2777 of Lecture Notes in Computer Science, 173–187. Springer Berlin / Heidelberg. ISBN 978-3-540-40720-1.
Meila M (2007). “Comparing Clusterings–an Information Based Distance.” Journal of Multivariate Analysis, 98(5), 873–895. doi:10.1016/j.jmva.2006.11.013.
isTRUE(all.equal(variationOfInformation(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1))), 0.763817))
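The documented value 0.763817 can be checked by hand from the contingency table. The base R sketch below assumes natural logarithms, which is the choice that reproduces that value; `vi_sketch` is a hypothetical name and the code does not call the package.

```r
# Variation of information: H(P) + H(Q) - 2 * MI(P, Q), natural log.
vi_sketch <- function(p, q) {
  stopifnot(length(p) == length(q))
  # Entropy of a vector of counts; empty cells contribute nothing.
  entropy <- function(counts) {
    pr <- counts[counts > 0] / sum(counts)
    -sum(pr * log(pr))
  }
  tab <- table(p, q)
  h_p <- entropy(rowSums(tab))
  h_q <- entropy(colSums(tab))
  h_joint <- entropy(as.vector(tab))
  mi <- h_p + h_q - h_joint  # mutual information via the joint entropy
  h_p + h_q - 2 * mi
}

vi <- vi_sketch(c(0, 0, 0, 1, 1), c(0, 0, 1, 1, 1))
```

Identical partitions give a variation of information of zero, since the mutual information then equals the common entropy.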
Compute Wallace's index I
wallaceI(p, q)
## S4 method for signature 'Partition,Partition'
wallaceI(p, q)
## S4 method for signature 'PairCoefficients,missing'
wallaceI(p, q = NULL)
p | The partition P |
q | The partition Q |
wallaceI(p = Partition, q = Partition): Compute the index given two partitions
wallaceI(p = PairCoefficients, q = missing): Compute the index given the pair coefficients
Fabian Ball [email protected]
Wallace DL (1983). “A Method for Comparing Two Hierarchical Clusterings: Comment.” Journal of the American Statistical Association, 78(383), 569–576.
isTRUE(all.equal(wallaceI(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1))), 0.5))
Compute Wallace's index II
wallaceII(p, q)
## S4 method for signature 'Partition,Partition'
wallaceII(p, q)
## S4 method for signature 'PairCoefficients,missing'
wallaceII(p, q = NULL)
p | The partition P |
q | The partition Q |
wallaceII(p = Partition, q = Partition): Compute the index given two partitions
wallaceII(p = PairCoefficients, q = missing): Compute the index given the pair coefficients
Fabian Ball [email protected]
Wallace DL (1983). “A Method for Comparing Two Hierarchical Clusterings: Comment.” Journal of the American Statistical Association, 78(383), 569–576.
isTRUE(all.equal(wallaceII(new("Partition", c(0, 0, 0, 1, 1)), new("Partition", c(0, 0, 1, 1, 1))), 0.5))