| Title: | Finite Mixture Modeling, Clustering & Classification |
|---|---|
| Description: | Random univariate and multivariate finite mixture model generation, estimation, clustering, latent class analysis and classification. Variables can be continuous, discrete, independent or dependent and may follow normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or circular von Mises parametric families. |
| Authors: | Marko Nagode [aut, cre] (ORCID: <https://orcid.org/0000-0003-0637-3812>), Branislav Panic [ctb] (ORCID: <https://orcid.org/0000-0001-8349-8550>), Jernej Klemenc [ctb] (ORCID: <https://orcid.org/0000-0002-6778-6728>), Simon Oman [ctb] (ORCID: <https://orcid.org/0000-0001-8213-0818>) |
| Maintainer: | Marko Nagode <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 2.17.1 |
| Built: | 2026-06-09 07:08:24 UTC |
| Source: | https://github.com/cran/rebmix |
The adult dataset containing 48842 instances with 16 continuous, binary and discrete variables was extracted from the census bureau database. Extraction was done by Barry Becker from the 1994 census bureau database.
    data(adult)
adult is a data frame with 48842 cases (rows) and 16 variables (columns) named:
Type binary train or test.
Age continuous.
Workclass one of the 8 discrete values
private,
self-emp-not-inc,
self-emp-inc,
federal-gov,
local-gov,
state-gov,
without-pay or
never-worked.
Fnlwgt stands for continuous final weight.
Education one of the 16 discrete values
bachelors,
some-college,
11th,
hs-grad,
prof-school,
assoc-acdm,
assoc-voc,
9th,
7th-8th,
12th,
masters,
1st-4th,
10th,
doctorate,
5th-6th or
preschool.
Education.Num continuous.
Marital.Status one of the 7 discrete values
married-civ-spouse,
divorced,
never-married,
separated,
widowed,
married-spouse-absent or
married-af-spouse.
Occupation one of the 14 discrete values
tech-support,
craft-repair,
other-service,
sales,
exec-managerial,
prof-specialty,
handlers-cleaners,
machine-op-inspct,
adm-clerical,
farming-fishing,
transport-moving,
priv-house-serv,
protective-serv or
armed-forces.
Relationship one of the 6 discrete values
wife,
own-child,
husband,
not-in-family,
other-relative or
unmarried.
Race one of the 5 discrete values
white,
asian-pac-islander,
amer-indian-eskimo,
other or
black.
Sex binary female or male.
Capital.Gain continuous.
Capital.Loss continuous.
Hours.Per.Week continuous.
Native.Country one of the 41 discrete values
united-states,
cambodia,
england,
puerto-rico,
canada,
germany,
outlying-us(guam-usvi-etc),
india,
japan,
greece,
south,
china,
cuba,
iran,
honduras,
philippines,
italy,
poland,
jamaica,
vietnam,
mexico,
portugal,
ireland,
france,
dominican-republic,
laos,
ecuador,
taiwan,
haiti,
columbia,
hungary,
guatemala,
nicaragua,
scotland,
thailand,
yugoslavia,
el-salvador,
trinadad&tobago,
peru,
hong or
holand-netherlands.
Income binary <=50k or >50k.
A. Asuncion and D. J. Newman. UCI Machine Learning Repository, 2007. http://archive.ics.uci.edu/ml/.
    data(adult)

    # Find complete cases.
    adult <- adult[complete.cases(adult),]

    # Show level attributes for binary and discrete variables.
    levels(adult[["Type"]])
    levels(adult[["Workclass"]])
    levels(adult[["Education"]])
    levels(adult[["Marital.Status"]])
    levels(adult[["Occupation"]])
    levels(adult[["Relationship"]])
    levels(adult[["Race"]])
    levels(adult[["Sex"]])
    levels(adult[["Native.Country"]])
    levels(adult[["Income"]])
Returns the Akaike information criterion at pos.
    ## S4 method for signature 'REBMIX'
    AIC(x = NULL, pos = 1, ...)
    ## S4 method for signature 'REBMIX'
    AIC3(x = NULL, pos = 1, ...)
    ## S4 method for signature 'REBMIX'
    AIC4(x = NULL, pos = 1, ...)
    ## S4 method for signature 'REBMIX'
    AICc(x = NULL, pos = 1, ...)
    ## S4 method for signature 'REBMIX'
    CAIC(x = NULL, pos = 1, ...)
    ## ... and for other signatures
| x | see Methods section below. |
|---|---|
| pos | a desired row number in |
| ... | currently not used. |

signature(x = "REBMIX"): an object of class REBMIX.
signature(x = "REBMVNORM"): an object of class REBMVNORM.
Marko Nagode
H. Akaike. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6):716-723, 1974.
A. F. M. Smith and D. J. Spiegelhalter. Bayes factors and choice criteria for linear models. Journal of the Royal Statistical Society. Series B, 42(2):213-220, 1980. https://www.jstor.org/stable/2984964.
H. Bozdogan. Model selection and Akaike's information criterion (AIC): The general theory and its analytical extensions. Psychometrika, 52(3):345-370, 1987. doi:10.1007/BF02294361.
C. M. Hurvich and C.-L. Tsai. Regression and time series model selection in small samples. Biometrika, 76(2):297-307, 1989. https://www.jstor.org/stable/2336663.
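As a rough illustration of how the five criteria in the usage above relate to one another, the following base R sketch evaluates their textbook forms from a maximised log-likelihood `logL`, a number of free parameters `M` and a sample size `n`. The helper `ic_family` and its argument names are hypothetical and not part of the package; the package may differ in detail, for instance in how `M` is counted for a mixture.

```r
# Textbook forms of the AIC-type criteria (an assumption of this sketch,
# not taken from the package source).
# logL: maximised log-likelihood, M: number of free parameters, n: sample size.
ic_family <- function(logL, M, n) {
  c(AIC  = -2 * logL + 2 * M,
    AIC3 = -2 * logL + 3 * M,
    AIC4 = -2 * logL + 4 * M,
    AICc = -2 * logL + 2 * M * (1 + (M + 1) / (n - M - 1)),
    CAIC = -2 * logL + M * (log(n) + 1))
}

# Cross-check the plain AIC against stats::AIC() on a simple linear model.
fit <- lm(dist ~ speed, data = cars)
ll <- logLik(fit)
ic <- ic_family(logL = as.numeric(ll), M = attr(ll, "df"), n = nobs(fit))
ic
```

All five criteria share the same goodness-of-fit term and differ only in the penalty on `M`, so they can rank candidate models differently when the sample is small.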
Returns the approximate weight of evidence criterion at pos.
    ## S4 method for signature 'REBMIX'
    AWE(x = NULL, pos = 1, ...)
    ## ... and for other signatures
| x | see Methods section below. |
|---|---|
| pos | a desired row number in |
| ... | currently not used. |

signature(x = "REBMIX"): an object of class REBMIX.
signature(x = "REBMVNORM"): an object of class REBMVNORM.
Marko Nagode
J. D. Banfield and A. E. Raftery. Model-based Gaussian and non-Gaussian clustering. Biometrics, 49(3):803-821, 1993. doi:10.2307/2532201.
These data are features extracted from the vibration signals of healthy and faulty bearings. Four bearing conditions are considered: faultless (1), defect on outer race (2), defect on inner race (3) and defect on ball (4). The extracted features are: root mean square (RMS), square root of the amplitude (SRA), kurtosis value (KV), skewness value (SV), peak to peak value (PPV), crest factor (CF), impulse factor (IF), margin factor (MF), shape factor (SF), kurtosis factor (KF), frequency centre (FC), root mean square frequency (RMSF) and root variance frequency (RVF).
    data(bearings)
bearings is a data frame with 1906 cases (rows) and 14 variables (columns) named:
RMS continuous.
SRA continuous.
KV continuous.
SV continuous.
PPV continuous.
CF continuous.
IF continuous.
MF continuous.
SF continuous.
KF continuous.
FC continuous.
RMSF continuous.
RVF continuous.
Class discrete 1, 2, 3 or 4.
Case Western Reserve University Bearing Data Center Website https://engineering.case.edu/bearingdatacenter/welcome.
B. Panic, J. Klemenc and M. Nagode. Gaussian mixture model based classification revisited: Application to the bearing fault classification. Journal of Mechanical Engineering, 66(4):215-226, 2020. doi:10.5545/sv-jme.2020.6563.
    ## Not run:
    data(bearings)

    # Split dataset into train (75%) and test (25%) subsets.
    set.seed(3)
    Bearings <- split(p = 0.75, Dataset = bearings, class = 14)

    # Estimate number of components, component weights and component
    # parameters for train subsets.
    bearingsest <- REBMIX(model = "REBMVNORM",
      Dataset = a.train(Bearings),
      Preprocessing = "histogram",
      cmax = 15,
      Criterion = "BIC")

    # Classification.
    bearingscla <- RCLSMIX(model = "RCLSMVNORM",
      x = list(bearingsest),
      Dataset = a.test(Bearings),
      Zt = a.Zt(Bearings))

    bearingscla
    summary(bearingscla)
    ## End(Not run)
Returns as default the optimized RCLSMIX algorithm output for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities. If model equals "RCLSMVNORM" optimized output for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices is returned.
    ## S4 method for signature 'RCLSMIX'
    BFSMIX(model = "RCLSMIX", x = list(), Dataset = data.frame(),
      Zt = factor(), ...)
    ## ... and for other signatures
| model | see Methods section below. |
|---|---|
| x | a list of objects of class |
| Dataset | a data frame containing test dataset |
| Zt | a factor of true class membership |
| ... | currently not used. |
Returns an optimized object of class RCLSMIX or RCLSMVNORM.
signature(model = "RCLSMIX"): a character giving the default class name "RCLSMIX" for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities.
signature(model = "RCLSMVNORM"): a character giving the class name "RCLSMVNORM" for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices.
Marko Nagode
R. Kohavi and G. H. John. Wrappers for feature subset selection. Artificial Intelligence, 97(1-2):273-324, 1997. doi:10.1016/S0004-3702(97)00043-X.
Returns the Bayesian information criterion at pos.
    ## S4 method for signature 'REBMIX'
    BIC(x = NULL, pos = 1, ...)
    ## ... and for other signatures
| x | see Methods section below. |
|---|---|
| pos | a desired row number in |
| ... | currently not used. |

signature(x = "REBMIX"): an object of class REBMIX.
signature(x = "REBMVNORM"): an object of class REBMVNORM.
Marko Nagode
G. Schwarz. Estimating the dimension of the model. The Annals of Statistics, 6(2):461-464, 1978.
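To make the role of the criterion in model selection concrete, the base R sketch below scores a few candidate models with the textbook BIC and picks the one with the smallest value. This is analogous in spirit to choosing a number of components by `Criterion = "BIC"`, but it uses ordinary polynomial regressions rather than mixtures; the helper `bic` and the candidate models are illustrative assumptions, not package code.

```r
# Textbook BIC (an assumption of this sketch): -2 logL + M log(n).
bic <- function(logL, M, n) -2 * logL + M * log(n)

# Candidate models of increasing complexity on a built-in dataset.
fits <- lapply(1:3, function(p) lm(dist ~ poly(speed, p), data = cars))

# Score each candidate from its log-likelihood and parameter count.
scores <- sapply(fits, function(f) {
  ll <- logLik(f)
  bic(as.numeric(ll), attr(ll, "df"), nobs(f))
})

# The selected model is the one with the minimum criterion value.
best <- which.min(scores)
scores
best
```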
Returns the list of data frames containing bin means and frequencies for the histogram preprocessing.
    ## S4 method for signature 'list'
    bins(Dataset = list(), K = matrix(), ymin = numeric(),
      ymax = numeric(), ...)
    ## ... and for other signatures
| Dataset | a list of length |
|---|---|
| K | a matrix of size |
| ymin | a vector of length |
| ymax | a vector of length |
| ... | currently not used. |

signature(x = "list"): a list of data frames.
Branislav Panic, Marko Nagode
M. Nagode. Finite mixture modeling via REBMIX. Journal of Algorithms and Optimization, 3(2):14-28, 2015. https://repozitorij.uni-lj.si/Dokument.php?id=127674&lang=eng.
    # Generate multivariate normal datasets.
    n <- c(7, 10)
    Theta <- new("RNGMVNORM.Theta", c = 2, d = 2)
    a.theta1(Theta, 1) <- c(8, 6)
    a.theta1(Theta, 2) <- c(6, 8)
    a.theta2(Theta, 1) <- c(8, 2, 2, 4)
    a.theta2(Theta, 2) <- c(2, 1, 1, 4)
    sim2d <- RNGMIX(model = "RNGMVNORM",
      Dataset.name = paste("sim2d_", 1:2, sep = ""),
      rseed = -1,
      n = n,
      Theta = a.Theta(Theta))

    # Calculate optimal numbers of bins.
    opt.k <- optbins(Dataset = sim2d@Dataset, Rule = "Knuth equal", kmin = 1, kmax = 20)
    opt.k
    Y <- bins(Dataset = sim2d@Dataset, K = opt.k)
    Y

    opt.k <- optbins(Dataset = sim2d@Dataset, Rule = "Knuth unequal", kmin = 1, kmax = 20)
    opt.k
    Y <- bins(Dataset = sim2d@Dataset, K = opt.k)
    Y
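For readers unfamiliar with histogram preprocessing, the following base R sketch shows the general idea of reducing a variable to bin locations and frequencies. It is a one-dimensional, equal-width illustration only: the helper `bin_1d` is hypothetical, it uses bin midpoints as the bin location, and it keeps empty bins, none of which is claimed to match what `bins` does internally.

```r
# Hypothetical helper: equal-width binning of a numeric vector into K bins.
bin_1d <- function(y, K, ymin = min(y), ymax = max(y)) {
  h <- (ymax - ymin) / K                      # bin width
  idx <- pmin(floor((y - ymin) / h) + 1, K)   # bin index; ymax goes to the last bin
  data.frame(mid = ymin + (seq_len(K) - 0.5) * h,   # bin midpoints
             freq = tabulate(idx, nbins = K))       # observations per bin
}

set.seed(1)
y <- rnorm(50, mean = 8, sd = 2)
Y <- bin_1d(y, K = 6)
Y
```

The resulting data frame has one row per bin, so downstream computations scale with the number of bins rather than the number of observations.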
Returns as default the boot output for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities. If x is of class REBMVNORM, the boot output for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices is returned.
    ## S4 method for signature 'REBMIX'
    boot(x = NULL, rseed = -1, pos = 1, Bootstrap = "parametric",
      B = 100, n = numeric(), replace = TRUE, prob = numeric(), ...)
    ## ... and for other signatures
    ## S4 method for signature 'REBMIX.boot'
    summary(object, ...)
    ## ... and for other signatures
| x | see Methods section below. |
|---|---|
| rseed | set the random seed to any negative integer value to initialize the sequence. The first bootstrap dataset corresponds to it. For each next bootstrap dataset the random seed is decremented |
| pos | a desired row number in |
| Bootstrap | a character giving the bootstrap type. One of default "parametric" or "nonparametric". |
| B | number of bootstrap datasets. The default value is 100. |
| n | number of observations. The default value is |
| replace | logical. The sampling is with replacement if TRUE. |
| prob | a vector of length |
| ... | maximum number of components |
| object | see Methods section below. |
Returns an object of class REBMIX.boot or REBMVNORM.boot.
signature(x = "REBMIX"): an object of class REBMIX for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities.
signature(x = "REBMVNORM"): an object of class REBMVNORM for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices.
signature(object = "REBMIX"): an object of class REBMIX.
signature(object = "REBMVNORM"): an object of class REBMVNORM.
Marko Nagode
G. McLachlan and D. Peel. Finite Mixture Models. John Wiley & Sons, New York, 2000.
    ## Not run:
    data(weibull)

    # Create object of class EM.Control.
    EM <- new("EM.Control",
      strategy = "single",
      variant = "EM",
      acceleration = "fixed",
      acceleration.multiplier = 1.0,
      tolerance = 1.0E-4,
      maximum.iterations = 1000)

    # Estimate number of components, component weights and component parameters.
    weibullest <- REBMIX(Dataset = list(weibull),
      Preprocessing = "kernel density estimation",
      cmin = 2,
      cmax = 4,
      Criterion = "BIC",
      pdf = "Weibull",
      EMcontrol = EM)

    # Plot finite mixture.
    plot(weibullest,
      what = c("pdf", "marginal cdf", "IC", "logL", "D"),
      nrow = 3, ncol = 2, npts = 1000)

    # Bootstrap finite mixture.
    weibullboot <- boot(x = weibullest,
      Bootstrap = "nonparametric",
      B = 10)

    weibullboot
    ## End(Not run)
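The nonparametric variant can be pictured with a few lines of base R: draw `B` datasets of size `n` from the observed data with replacement, recompute a statistic on each, and summarise its spread. The sketch below uses the sample mean as the statistic purely for brevity; what `boot` recomputes on each bootstrap dataset is not reproduced here, and the simulated Weibull sample and variable names are assumptions of the example.

```r
set.seed(1)

# Stand-in dataset (simulated here; not the package's weibull dataset).
y <- rweibull(200, shape = 2, scale = 3)

B <- 100          # number of bootstrap datasets
n <- length(y)    # observations per bootstrap dataset

# Resample with replacement and recompute the statistic on each dataset.
boot_means <- replicate(B, mean(sample(y, size = n, replace = TRUE)))

# Bootstrap standard error and coefficient of variation of the statistic.
se <- sd(boot_means)
cv <- se / mean(boot_means)
c(mean = mean(boot_means), se = se, cv = cv)
```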
Returns an object of class Histogram. The method can be called recursively.
This way more than one dataset can be binned into one histogram. The method is time consuming.
    ## S4 method for signature 'Histogram'
    chistogram(x = NULL, Dataset = data.frame(),
      K = numeric(), ymin = numeric(), ymax = numeric(), ...)
    ## ... and for other signatures
| x | an object of class |
|---|---|
| Dataset | a data frame of size |
| K | an integer or a vector of length |
| ymin | a vector of length |
| ymax | a vector of length |
| ... | currently not used. |

signature(x = "Histogram"): an object of class Histogram.
Marko Nagode
    # Create three datasets.
    set.seed(1)
    n <- 15
    Dataset1 <- as.data.frame(cbind(rnorm(n, 157, 8), rnorm(n, 71, 10)))
    Dataset2 <- as.data.frame(cbind(rnorm(n, 244, 14), rnorm(n, 61, 29)))
    Dataset3 <- as.data.frame(cbind(rnorm(n, 198, 8), rnorm(n, 252, 13)))

    apply(Dataset1, 2, range)
    apply(Dataset2, 2, range)
    apply(Dataset3, 2, range)

    # Bin the first dataset.
    hist <- chistogram(Dataset = Dataset1, K = c(4, 5),
      ymin = c(100.0, 0.0), ymax = c(300.0, 300.0))

    # Bin the second dataset.
    hist <- chistogram(x = hist, Dataset = Dataset2)

    # Bin the third dataset.
    hist <- chistogram(x = hist, Dataset = Dataset3)

    hist
Returns (invisibly) the object containing train and test observations as well as true class membership for the test dataset. The observation vectors are subvectors of the full d-dimensional observations, restricted to the selected variables.
    ## S4 method for signature 'RCLS.chunk'
    chunk(x = NULL, variables = expression(1:d))
    ## ... and for other signatures
| x | see Methods section below. |
|---|---|
| variables | a vector containing indices of variables in subvectors |
Returns an object of class RCLS.chunk.
signature(x = "RCLS.chunk"): an object of class RCLS.chunk.
Marko Nagode
    data(iris)

    # Split dataset into train (75%) and test (25%) subsets.
    set.seed(5)
    Iris <- split(p = 0.75, Dataset = iris, class = 5)

    # Extract chunk from train and test datasets.
    Iris14 <- chunk(x = Iris, variables = c(1,4))
    Iris14
Returns the classification likelihood criterion at pos.
    ## S4 method for signature 'REBMIX'
    CLC(x = NULL, pos = 1, ...)
    ## ... and for other signatures
| x | see Methods section below. |
|---|---|
| pos | a desired row number in |
| ... | currently not used. |

signature(x = "REBMIX"): an object of class REBMIX.
signature(x = "REBMVNORM"): an object of class REBMVNORM.
Marko Nagode
C. Biernacki and G. Govaert. Using the classification likelihood to choose the number of clusters. In E. J. Wegman and S. P. Azen, editors, Computing Science and Statistics, 1997.
Returns the data frame containing observations and empirical densities for the kernel density estimation or k-nearest neighbour preprocessing, or bin means and empirical densities for the histogram preprocessing. The observation vectors and bin means are subvectors of their full d-dimensional counterparts, restricted to the selected variables.
    ## S4 method for signature 'REBMIX'
    demix(x = NULL, pos = 1, variables = expression(1:d), ...)
    ## ... and for other signatures
| x | see Methods section below. |
|---|---|
| pos | a desired row number in |
| variables | a vector containing indices of variables in subvectors |
| ... | currently not used. |

signature(x = "REBMIX"): an object of class REBMIX.
signature(x = "REBMVNORM"): an object of class REBMVNORM.
Marko Nagode
M. Nagode and M. Fajdiga. The REBMIX algorithm for the univariate finite mixture estimation. Communications in Statistics - Theory and Methods, 40(5):876-892, 2011a. doi:10.1080/03610920903480890.
M. Nagode and M. Fajdiga. The REBMIX algorithm for the multivariate finite mixture estimation. Communications in Statistics - Theory and Methods, 40(11):2022-2034, 2011b. doi:10.1080/03610921003725788.
M. Nagode. Finite mixture modeling via REBMIX. Journal of Algorithms and Optimization, 3(2):14-28, 2015. https://repozitorij.uni-lj.si/Dokument.php?id=127674&lang=eng.
    # Generate simulated dataset.
    n <- c(15, 15)
    Theta <- new("RNGMIX.Theta", c = 2, pdf = rep("normal", 3))
    a.theta1(Theta, 1) <- c(10, 20, 30)
    a.theta1(Theta, 2) <- c(3, 4, 5)
    a.theta2(Theta, 1) <- c(3, 2, 1)
    a.theta2(Theta, 2) <- c(15, 10, 5)
    simulated <- RNGMIX(Dataset.name = paste("simulated_", 1:4, sep = ""),
      rseed = -1,
      n = n,
      Theta = a.Theta(Theta))

    # Create object of class EM.Control.
    EM <- new("EM.Control", strategy = "best")

    # Estimate number of components, component weights and component parameters.
    simulatedest <- REBMIX(model = "REBMVNORM",
      Dataset = a.Dataset(simulated),
      Preprocessing = "h",
      cmax = 8,
      Criterion = "BIC",
      EMcontrol = NULL)

    # Preprocess simulated dataset.
    f <- demix(simulatedest, pos = 3, variables = c(1, 3))
    f

    # Plot finite mixture.
    opar <- plot(simulatedest, pos = 3, nrow = 3, ncol = 1)
    par(usr = opar[[2]]$usr, mfg = c(2, 1))
    text(x = f[, 1], y = f[, 2], labels = format(f[, 3], digits = 3), cex = 0.8, pos = 1)
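Returns the data frame containing observations and predictive marginal densities. The observation vectors are subvectors of the full d-dimensional observations, restricted to the selected variables. If all d variables are selected, the method returns the data frame containing observations and the corresponding predictive mixture densities.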
    ## S4 method for signature 'REBMIX'
    dfmix(x = NULL, Dataset = data.frame(), pos = 1, variables = expression(1:d), ...)
    ## ... and for other signatures
| x | see Methods section below. |
|---|---|
| Dataset | a data frame containing observations |
| pos | a desired row number in |
| variables | a vector containing indices of variables in subvectors |
| ... | currently not used. |

signature(x = "REBMIX"): an object of class REBMIX.
signature(x = "REBMVNORM"): an object of class REBMVNORM.
Marko Nagode
M. Nagode and M. Fajdiga. The REBMIX algorithm for the univariate finite mixture estimation. Communications in Statistics - Theory and Methods, 40(5):876-892, 2011a. doi:10.1080/03610920903480890.
M. Nagode and M. Fajdiga. The REBMIX algorithm for the multivariate finite mixture estimation. Communications in Statistics - Theory and Methods, 40(11):2022-2034, 2011b. doi:10.1080/03610921003725788.
M. Nagode. Finite mixture modeling via REBMIX. Journal of Algorithms and Optimization, 3(2):14-28, 2015. https://repozitorij.uni-lj.si/Dokument.php?id=127674&lang=eng.
    # Generate simulated dataset.
    n <- c(15, 15)
    Theta <- new("RNGMIX.Theta", c = 2, pdf = rep("normal", 3))
    a.theta1(Theta, 1) <- c(10, 20, 30)
    a.theta1(Theta, 2) <- c(3, 4, 5)
    a.theta2(Theta, 1) <- c(3, 2, 1)
    a.theta2(Theta, 2) <- c(15, 10, 5)
    simulated <- RNGMIX(Dataset.name = paste("simulated_", 1:4, sep = ""),
      rseed = -1,
      n = n,
      Theta = a.Theta(Theta))

    # Number of classes or nearest neighbours to be processed.
    K <- c(as.integer(1 + log2(sum(n))), # Minimum v follows Sturges rule.
      as.integer(10 * log10(sum(n)))) # Maximum v follows log10 rule.

    # Estimate number of components, component weights and component parameters.
    simulatedest <- REBMIX(model = "REBMVNORM",
      Dataset = a.Dataset(simulated),
      Preprocessing = "h",
      cmax = 4,
      Criterion = "BIC")

    # Preprocess simulated dataset.
    Dataset <- data.frame(c(-7, 1), NA, c(3, 7))
    f <- dfmix(simulatedest, Dataset = Dataset, pos = 3, variables = c(1, 3))
    f

    # Plot finite mixture.
    opar <- plot(simulatedest, pos = 3, nrow = 3, ncol = 1,
      contour.drawlabels = TRUE, contour.labcex = 0.6)
    par(usr = opar[[2]]$usr, mfg = c(2, 1))
    points(x = f[, 1], y = f[, 2])
    text(x = f[, 1], y = f[, 2], labels = format(f[, 3], digits = 3), cex = 0.8, pos = 4)
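The idea of a marginal mixture density at chosen variables can be sketched in base R for the conditionally independent normal case: each component contributes its weight times the product of univariate normal densities over the selected variables, and dropping a variable simply drops its factor. The helper `dmix_normal` is hypothetical, the parameter values are borrowed from the simulation setup above, and treating the second parameter as a standard deviation is an assumption of this sketch; it evaluates the generating mixture, not an estimate returned by REBMIX.

```r
# Hypothetical helper: density of a mixture of conditionally independent normals.
# x: numeric matrix (rows = observations); w: weights; mu, sigma: c x d matrices.
dmix_normal <- function(x, w, mu, sigma) {
  x <- as.matrix(x)
  comp <- sapply(seq_along(w), function(l) {
    # Product over variables of univariate normal densities for component l.
    dens <- sapply(seq_len(ncol(x)), function(i) dnorm(x[, i], mu[l, i], sigma[l, i]))
    w[l] * apply(matrix(dens, nrow = nrow(x)), 1, prod)
  })
  # Sum the weighted component densities for each observation.
  rowSums(matrix(comp, nrow = nrow(x)))
}

w <- c(0.5, 0.5)                            # equal weights, as n = c(15, 15)
mu <- rbind(c(10, 20, 30), c(3, 4, 5))      # component means
sigma <- rbind(c(3, 2, 1), c(15, 10, 5))    # assumed standard deviations

variables <- c(1, 3)                        # marginal over variables 1 and 3
x <- cbind(c(-7, 1), c(3, 7))
f <- dmix_normal(x, w, mu[, variables], sigma[, variables])
data.frame(x, f)
```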
"EM.Control"
Object of class EM.Control.
Objects can be created by calls of the form new("EM.Control", ...). Accessor methods for the slots are a.strategy(x = NULL), a.variant(x = NULL), a.acceleration(x = NULL), a.tolerance(x = NULL), a.acceleration.multiplier(x = NULL), a.maximum.iterations(x = NULL), a.K(x = NULL), a.eliminate.zero.components(x = NULL), a.likelihood.tolerance.check(x = NULL) and a.likelihood.estimation.rule(x = NULL), where x stands for an object of class EM.Control. Setter methods a.strategy(x = NULL), a.variant(x = NULL), a.acceleration(x = NULL), a.tolerance(x = NULL), a.acceleration.multiplier(x = NULL), a.maximum.iterations(x = NULL), a.K(x = NULL), a.eliminate.zero.components(x = NULL), a.likelihood.tolerance.check(x = NULL) and a.likelihood.estimation.rule(x = NULL), used in replacement form such as a.strategy(x) <- "exhaustive", are provided to write to the strategy, variant, acceleration, tolerance, acceleration.multiplier, maximum.iterations, K, eliminate.zero.components, likelihood.tolerance.check and likelihood.estimation.rule slots, respectively.
strategy: a character containing the EM and REBMIX strategy. One of "none", "exhaustive", "best" and "single". The default value is "none".
variant: a character containing the type of the EM algorithm to be used. One of "EM", "ECM", "SEM", "SEM-EM", "ECM-EM". The default value is "EM".
acceleration: a character containing the type of acceleration of the EM iteration increment. One of "fixed", "line", "golden", "stem1", "stem2", "stem3", "square1", "square2" or "square3". The default value is "fixed".
tolerance: tolerance value for the EM convergence criteria. The default value is 1.0E-4.
acceleration.multiplier: a numeric multiplier for the EM step increment. The default value is 1.0. Only used when acceleration == "fixed".
maximum.iterations: a positive integer containing the maximum allowed number of iterations of the EM algorithm. The default value is 1000.
K: an integer containing the number of bins for the histogram based EM algorithm. This option can reduce computational time drastically if the datasets contain a large number of observations and K is set appropriately. The default value of 0 means that the EM algorithm runs over all observations.
eliminate.zero.components: a logical indicating whether the components with zero weight should be eliminated from the output. Only used with EMMIX-methods.
likelihood.tolerance.check: a character containing the type of log likelihood convergence check. One of "absolute", "normalised" and "percentage". The default value is "normalised".
likelihood.estimation.rule: a character containing the type of log likelihood estimate. One of "standard", "aitken-lindsay", "aitken-bohning" and "aitken-nicholas". The default value is "standard".
Branislav Panic
B. Panic, J. Klemenc and M. Nagode. Improved initialization of the EM algorithm for mixture model parameter estimation. Mathematics, 8(3):373, 2020. doi:10.3390/math8030373.
A. P. Dempster et al. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B, 39(1):1-38, 1977. https://www.jstor.org/stable/2984875.
G. Celeux and G. Govaert. A classification EM algorithm for clustering and two stochastic versions. Computational Statistics & Data Analysis, 14(3):315-332, 1992. doi:10.1016/0167-9473(92)90042-E.
R. Varadhan and C. Roland. Simple and globally convergent methods for accelerating the convergence of any EM algorithm. Scandinavian Journal of Statistics, 35(2):335-353, 2008. doi:10.1111/j.1467-9469.2007.00585.x.
P. D. McNicholas et al. Serial and parallel implementations of model-based clustering via parsimonious Gaussian mixture models. Computational Statistics & Data Analysis, 54(3):711-723, 2010. doi:10.1016/j.csda.2009.02.011.
    # Inline creation by new call.
    EM <- new("EM.Control", strategy = "exhaustive",
      variant = "EM", acceleration = "fixed",
      tolerance = 1e-4, acceleration.multiplier = 1.0,
      maximum.iterations = 1000, K = 0,
      likelihood.tolerance.check = "absolute",
      likelihood.estimation.rule = "standard")

    EM

    # Creation of EM object with setter method.
    EM <- new("EM.Control")

    a.strategy(EM) <- "exhaustive"
    a.variant(EM) <- "EM"
    a.acceleration(EM) <- "fixed"
    a.tolerance(EM) <- 1e-4
    a.acceleration.multiplier(EM) <- 1.0
    a.maximum.iterations(EM) <- 1000
    a.K(EM) <- 256
    a.likelihood.tolerance.check(EM) <- "normalised"
    a.likelihood.estimation.rule(EM) <- "standard"

    EM
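To show what settings such as `tolerance` and `maximum.iterations` typically govern, here is a minimal base R sketch of a plain EM loop for a two-component univariate normal mixture. It is not the package's implementation: the function `em_normal2` is hypothetical, it uses no acceleration, and it stops on an absolute change in log-likelihood, which is only loosely comparable to the "absolute" setting of likelihood.tolerance.check.

```r
# Plain EM for a two-component univariate normal mixture (illustrative only).
em_normal2 <- function(y, w, mu, sigma, tolerance = 1e-4, maximum.iterations = 1000) {
  logL_old <- -Inf
  for (iter in seq_len(maximum.iterations)) {
    # E-step: weighted component densities and posterior probabilities.
    dens <- cbind(w[1] * dnorm(y, mu[1], sigma[1]),
                  w[2] * dnorm(y, mu[2], sigma[2]))
    logL <- sum(log(rowSums(dens)))
    tau <- dens / rowSums(dens)
    # Stop when the absolute log-likelihood gain drops below the tolerance.
    if (abs(logL - logL_old) < tolerance) break
    logL_old <- logL
    # M-step: update weights, means and standard deviations.
    nl <- colSums(tau)
    w <- nl / length(y)
    mu <- colSums(tau * y) / nl
    sigma <- sqrt(colSums(tau * outer(y, mu, "-")^2) / nl)
  }
  list(w = w, mu = mu, sigma = sigma, logL = logL, iterations = iter)
}

# Fit the eruption durations of the built-in faithful dataset.
fit <- em_normal2(faithful$eruptions, w = c(0.5, 0.5), mu = c(2, 4.5), sigma = c(1, 1))
fit
```

A tighter `tolerance` or a poorer starting point generally increases the number of iterations, which is where a cap such as `maximum.iterations` becomes relevant.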
Returns as default the EM algorithm output for mixtures of conditionally independent normal, lognormal, Weibull, gamma,
Gumbel, binomial, Poisson, Dirac or von Mises component densities. If model equals "REBMVNORM" output
for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices is returned.
## S4 method for signature 'REBMIX'
EMMIX(model = "REBMIX", Dataset = list(), Theta = NULL, EMcontrol = NULL, ...)
## ... and for other signatures
| model | see Methods section below. |
|---|---|
| Dataset | a list of length |
| Theta | an object of class |
| EMcontrol | an object of class |
| ... | currently not used. |
Returns an object of class REBMIX or REBMVNORM.
signature(model = "REBMIX"): a character giving the default class name "REBMIX" for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac or von Mises component densities.
signature(model = "REBMVNORM"): a character giving the class name "REBMVNORM" for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices.
Branislav Panic
B. Panic, J. Klemenc, M. Nagode. Improved initialization of the EM algorithm for mixture model parameter estimation. Mathematics, 8(3):373, 2020. doi:10.3390/math8030373.
## Not run:
devAskNewPage(ask = TRUE)
# Load faithful dataset.
data(faithful)
# Plot faithful dataset.
plot(faithful)
# Number of dimensions.
d <- ncol(faithful)
# Obtain 2 component solution with Gaussian mixtures.
c <- 2
# Create EMMVNORM.Theta object with new call.
Theta <- new("EMMVNORM.Theta", d = d, c = c)
# Set parameters of Theta.
# Weights.
a.w(Theta) <- c(0.5, 0.5)
# Means.
a.theta1.all(Theta) <- c(2.0, 55.0, 4.5, 80.0)
# Covariances.
a.theta2.all(Theta) <- c(1, 0, 0, 1, 1, 0, 0, 1)
# Run EMMIX method.
model <- EMMIX(model = "REBMVNORM", Dataset = list(faithful), Theta = Theta)
# show.
model
# summary.
summary(model)
# plot.
plot(model, nrow = 3, ncol = 2, what = c("pdf", "marginal pdf", "marginal cdf"))
# Create EMMIX.Theta object with new call.
Theta <- new("EMMIX.Theta", c = c, pdf = c("normal", "normal"))
# Set parameters of Theta.
# Weights.
a.w(Theta) <- c(0.5, 0.5)
# Means.
a.theta1.all(Theta) <- c(2.0, 55.0, 4.5, 80.0)
# Standard deviations.
a.theta2.all(Theta) <- c(1, 1, 1, 1)
# Run EMMIX method.
model <- EMMIX(Dataset = list(faithful), Theta = Theta)
# show.
model
# summary.
summary(model)
# plot.
plot(model, nrow = 3, ncol = 2, what = c("pdf", "marginal pdf", "marginal cdf"))
## End(Not run)
"EMMIX.Theta"
Object of class EMMIX.Theta.
Objects can be created by calls of the form new("EMMIX.Theta", ...). Accessor methods for the slots are a.c(x = NULL), a.d(x = NULL),
a.pdf(x = NULL) and a.Theta(x = NULL), where x stands for an object of class EMMIX.Theta. Setter methods
a.theta1(x = NULL, l = numeric()), a.theta2(x = NULL, l = numeric()), a.theta3(x = NULL, l = numeric()),
a.theta1.all(x = NULL), a.theta2.all(x = NULL), a.theta3.all(x = NULL) and a.w(x = NULL)
are provided to write to Theta slot, where l denotes the component index.
c: number of components. The default value is 1.
d: number of dimensions.
pdf: a character vector of length d containing continuous or discrete parametric family types. One of "normal", "lognormal", "Weibull", "gamma", "Gumbel", "binomial", "Poisson", "Dirac" or "vonMises".
Theta: a list containing parametric family types pdfl. One of "normal", "lognormal", "Weibull", "gamma", "Gumbel", "binomial", "Poisson", "Dirac" or circular "vonMises" defined for .
Component parameters theta1.l follow the parametric family types. One of for normal, lognormal, Gumbel and von Mises distributions and for Weibull, gamma, binomial, Poisson and Dirac distributions.
Component parameters theta2.l follow theta1.l. One of for normal, lognormal and Gumbel distributions, for Weibull and gamma distributions, for binomial distribution, for von Mises distribution.
Component parameters theta3.l follow theta2.l. One of for Gumbel distribution.
w: a vector of length c containing component weights summing to 1.
Branislav Panic
Theta <- new("EMMIX.Theta", c = 2, pdf = c("normal", "Gumbel"))
a.w(Theta) <- c(0.4, 0.6)
a.theta1(Theta, l = 1) <- c(2, 10)
a.theta2(Theta, l = 1) <- c(0.5, 2.3)
a.theta3(Theta, l = 1) <- c(NA, 1.0)
a.theta1(Theta, l = 2) <- c(20, 50)
a.theta2(Theta, l = 2) <- c(3, 4.2)
a.theta3(Theta, l = 2) <- c(NA, -1.0)
Theta
Theta <- new("EMMIX.Theta", c = 2, pdf = c("normal", "Gumbel", "Poisson"))
a.w(Theta) <- c(0.4, 0.6)
a.theta1.all(Theta) <- c(2, 10, 30, 20, 50, 60)
a.theta2.all(Theta) <- c(0.5, 2.3, NA, 3, 4.2, NA)
a.theta3.all(Theta) <- c(NA, 1.0, NA, NA, -1.0, NA)
Theta
Theta <- new("EMMVNORM.Theta", c = 2, d = 3)
a.w(Theta) <- c(0.4, 0.6)
a.theta1(Theta, l = 1) <- c(2, 10, -20)
a.theta2(Theta, l = 1) <- c(9, 0, 0, 0, 4, 0, 0, 0, 1)
a.theta1(Theta, l = 2) <- c(-2.4, -15.1, 30)
a.theta2(Theta, l = 2) <- c(4, -3.2, -0.2, -3.2, 4, 0, -0.2, 0, 1)
Theta
Theta <- new("EMMVNORM.Theta", c = 2, d = 3)
a.w(Theta) <- c(0.4, 0.6)
a.theta1.all(Theta) <- c(2, 10, -20, -2.4, -15.1, 30)
a.theta2.all(Theta) <- c(9, 0, 0, 0, 4, 0, 0, 0, 1, 4, -3.2, -0.2, -3.2, 4, 0, -0.2, 0, 1)
Theta
Returns an object of class Histogram. The method can be called recursively.
This way more than one dataset can be binned into one histogram. Set shrink
to TRUE only when the method is called for the last time to optimize the size of the object.
The method is memory consuming.
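The recursive use described above can be pictured with base R alone. The sketch below is not the package implementation: it assumes a fixed grid of K equal-width bins between ymin and ymax for a single variable, and `bin_counts()` is a hypothetical helper that adds the counts of each new dataset to a running total.

```r
# Hypothetical helper, not part of rebmix: add the bin counts of `y` to `counts`.
bin_counts <- function(counts = NULL, y, K, ymin, ymax) {
  if (is.null(counts)) counts <- integer(K)
  h <- (ymax - ymin) / K                         # bin width
  j <- as.integer(floor((y - ymin) / h)) + 1L    # bin index of each observation
  j <- pmin(K, pmax(1L, j))                      # clamp out-of-range values to the edge bins
  counts + tabulate(j, nbins = K)
}

set.seed(1)
y1 <- rnorm(15, 157, 8)
y2 <- rnorm(15, 244, 14)

# First call creates the counts, the second call updates them.
counts <- bin_counts(y = y1, K = 4, ymin = 100, ymax = 300)
counts <- bin_counts(counts, y = y2, K = 4, ymin = 100, ymax = 300)
counts
```

Because the grid is fixed by the first call, later calls only need the running counts and the new data, which mirrors why K, ymin and ymax are supplied once in the package example.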
## S4 method for signature 'Histogram'
fhistogram(x = NULL, Dataset = data.frame(), K = numeric(), ymin = numeric(), ymax = numeric(), shrink = FALSE, ...)
## ... and for other signatures
| x | an object of class |
|---|---|
| Dataset | a data frame of size |
| K | an integer or a vector of length |
| ymin | a vector of length |
| ymax | a vector of length |
| shrink | logical. If |
| ... | currently not used. |
signature(x = "Histogram"): an object of class Histogram.
Marko Nagode
# Create three datasets.
set.seed(1)
n <- 15
Dataset1 <- as.data.frame(cbind(rnorm(n, 157, 8), rnorm(n, 71, 10)))
Dataset2 <- as.data.frame(cbind(rnorm(n, 244, 14), rnorm(n, 61, 29)))
Dataset3 <- as.data.frame(cbind(rnorm(n, 198, 8), rnorm(n, 252, 13)))
apply(Dataset1, 2, range)
apply(Dataset2, 2, range)
apply(Dataset3, 2, range)
# Bin the first dataset.
hist <- fhistogram(Dataset = Dataset1, K = c(4, 5), ymin = c(100.0, 0.0), ymax = c(300.0, 300.0))
# Bin the second dataset.
hist <- fhistogram(x = hist, Dataset = Dataset2)
# Bin the third dataset and shrink the hist object.
hist <- fhistogram(x = hist, Dataset = Dataset3, shrink = TRUE)
hist
The unfilled survey of the Corona Borealis region contains the velocities of 82 galaxies from 6 well separated conic sections of space.
data(galaxy)
galaxy is a data frame with 82 cases (rows) and 1 continuous variable (column) called Velocity.
K. Roeder. Density estimation with confidence sets exemplified by superclusters and voids in the galaxies. Journal of the American Statistical Association, 85(411):617-624, 1990. https://www.jstor.org/stable/2289993.
S. Richardson and P. J. Green. On Bayesian analysis of mixtures with an unknown number of components. Journal of the Royal Statistical Society B, 59(4):731-792, 1997. https://www.jstor.org/stable/2985194.
G. McLachlan and D. Peel. Contribution to the discussion of paper by S. Richardson and P. J. Green. Journal of the Royal Statistical Society B, 59(4):779-780, 1997. https://www.jstor.org/stable/2985194.
M. Stephens. Bayesian analysis of mixture models with an unknown number of components - an alternative to reversible jump methods. The Annals of Statistics, 28(1):40-74, 2000. https://www.jstor.org/stable/2673981.
"Histogram"
Object of class Histogram.
Objects can be created by calls of the form new("Histogram", ...). Accessor methods for the slots are a.Y(x = NULL),
a.K(x = NULL), a.ymin(x = NULL), a.ymax(x = NULL), a.y0(x = NULL), a.h(x = NULL), a.n(x = NULL) and a.ns(x = NULL).
Y: a data frame of size containing d-dimensional histogram. Each of the first d columns represents one random variable and contains bin means. Column d + 1 contains frequencies.
K: an integer or a vector of length d containing numbers of bins.
ymin: a vector of length d containing minimum observations.
ymax: a vector of length d containing maximum observations.
y0: a vector of length d containing origins.
h: a vector of length d containing bin widths.
n: an integer containing total number of observations.
ns: an integer containing number of samples.
Marko Nagode
Y <- as.data.frame(matrix(1.0, nrow = 8, ncol = 3))
hist <- new("Histogram", Y = Y, K = c(4, 2), ymin = c(2, 1), ymax = c(10, 8))
a.Y(hist)
a.K(hist)
a.ymin(hist)
a.ymax(hist)
a.y0(hist)
a.h(hist)
a.n(hist)
a.ns(hist)
# Multiply Y[ , d + 1] by 0.1.
a.Y(hist) <- 0.1
Returns the Hannan-Quinn information criterion at pos.
## S4 method for signature 'REBMIX'
HQC(x = NULL, pos = 1, ...)
## ... and for other signatures
| x | see Methods section below. |
|---|---|
| pos | a desired row number in |
| ... | currently not used. |
signature(x = "REBMIX"): an object of class REBMIX.
signature(x = "REBMVNORM"): an object of class REBMVNORM.
Marko Nagode
E. J. Hannan and B. G. Quinn. The determination of the order of an autoregression. Journal of the Royal Statistical Society. Series B, 41(2):190-195, 1979. https://www.jstor.org/stable/2985032.
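As a reading aid, the criterion of Hannan and Quinn is commonly written as HQC = -2 log L + 2 M log(log n), where M is the number of free parameters and n the number of observations. The helper `hqc()` below is hypothetical and only illustrates this textbook form with made-up inputs; the value returned by HQC() in the package may differ in detail.

```r
# Hypothetical helper, not part of rebmix: textbook Hannan-Quinn criterion.
hqc <- function(logL, M, n) {
  -2 * logL + 2 * M * log(log(n))
}

# Illustrative inputs: log likelihood, number of free parameters, sample size.
hqc(logL = -120.5, M = 5, n = 100)
```

Smaller values indicate a better trade-off between fit and complexity, so candidate models are compared by their HQC values on the same dataset.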
Returns the integrated classification likelihood criterion at pos.
## S4 method for signature 'REBMIX'
ICL(x = NULL, pos = 1, ...)
## ... and for other signatures
| x | see Methods section below. |
|---|---|
| pos | a desired row number in |
| ... | currently not used. |
signature(x = "REBMIX"): an object of class REBMIX.
signature(x = "REBMVNORM"): an object of class REBMVNORM.
Marko Nagode
C. Biernacki, G. Celeux and G. Govaert. Assessing a mixture model for clustering with the integrated classification likelihood. Technical Report 3521, INRIA, Rhone-Alpes, 1998.
Returns the approximate integrated classification likelihood criterion at pos.
## S4 method for signature 'REBMIX'
ICLBIC(x = NULL, pos = 1, ...)
## ... and for other signatures
| x | see Methods section below. |
|---|---|
| pos | a desired row number in |
| ... | currently not used. |
signature(x = "REBMIX"): an object of class REBMIX.
signature(x = "REBMVNORM"): an object of class REBMVNORM.
Marko Nagode
C. Biernacki, G. Celeux and G. Govaert. Assessing a mixture model for clustering with the integrated classification likelihood. Technical Report 3521, INRIA, Rhone-Alpes, 1998.
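For orientation, the approximate criterion is usually written as ICL-BIC = -2 log L + M log n + 2 EN, where EN = -sum(tau * log(tau)) is the entropy of the posterior membership probabilities tau. The sketch below assumes this textbook form; `icl_bic()` and the matrix `tau` are hypothetical and not taken from the package.

```r
# Hypothetical helper, not part of rebmix: textbook ICL-BIC from an n x c posterior matrix.
icl_bic <- function(logL, M, tau) {
  n <- nrow(tau)
  t <- tau[tau > 0]            # terms with tau = 0 contribute nothing to the entropy
  EN <- -sum(t * log(t))       # entropy of the soft classification
  -2 * logL + M * log(n) + 2 * EN
}

# Illustrative posterior probabilities for three observations and two components.
tau <- rbind(c(0.9, 0.1), c(0.2, 0.8), c(1.0, 0.0))
icl_bic(logL = -10, M = 5, tau = tau)
```

When every observation is assigned to a single component with probability one, the entropy term vanishes and the expression reduces to BIC.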
This is perhaps the best known database to be found in the pattern recognition literature. Fisher's paper is a classic in the field and is referenced frequently to this day. The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.
data(iris)
iris is a data frame with 150 cases (rows) and 5 variables (columns) named:
Sepal.Length continuous.
Sepal.Width continuous.
Petal.Length continuous.
Petal.Width continuous.
Class discrete iris-setosa, iris-versicolour or iris-virginica.
A. Asuncion and D. J. Newman. UCI Machine Learning Repository, 2007. http://archive.ics.uci.edu/ml/.
R. A. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2):179-188, 1936.
## Not run:
devAskNewPage(ask = TRUE)
data(iris)
# Show level attributes.
levels(iris[["Class"]])
# Split dataset into train and test subsets.
set.seed(5)
Iris <- split(p = 0.6, Dataset = iris, class = 5)
# Estimate number of components, component weights and component
# parameters for train subsets.
n <- range(a.ntrain(Iris))
irisest <- REBMIX(model = "REBMVNORM", Dataset = a.train(Iris), Preprocessing = "histogram",
  cmax = 10, Criterion = "ICL-BIC", EMcontrol = new("EM.Control", strategy = "single"))
plot(irisest, pos = 1, nrow = 3, ncol = 2, what = c("pdf"))
plot(irisest, pos = 2, nrow = 3, ncol = 2, what = c("pdf"))
plot(irisest, pos = 3, nrow = 3, ncol = 2, what = c("pdf"))
# Selected chunks.
iriscla <- RCLSMIX(model = "RCLSMVNORM", x = list(irisest), Dataset = a.test(Iris), Zt = a.Zt(Iris))
iriscla
summary(iriscla)
# Plot selected chunks.
plot(iriscla, nrow = 3, ncol = 2)
## End(Not run)
Returns (invisibly) a vector containing numbers of bins for the histogram and the kernel density estimation or numbers of nearest
neighbours for the k-nearest neighbour.
kseq(from = NULL, to = NULL, f = 0.05, ...)
| from | starting value of the sequence. The default value is |
|---|---|
| to | end value of the sequence. The default value is |
| f | number specifying the fraction by which the bins or nearest neighbours should be separated |
| ... | currently not used. |
Marko Nagode
# Generate numbers of bins.
n <- 10000
Sturges <- as.integer(1 + log2(n)) # Minimum v follows Sturges rule.
Log10 <- as.integer(10 * log10(n)) # Maximum v follows Log10 rule.
RootN <- as.integer(2 * n^0.5) # Maximum v follows RootN rule.
K <- kseq(from = Sturges, to = Log10, f = 0.05)
K
K <- kseq(from = Sturges, to = RootN, f = 0.03)
K
Returns the list with the data frame Mij containing the cluster levels, the numbers of pixels and the cluster moments
for 2D images or the data frame Mijk containing the cluster levels, the numbers of voxels and the cluster moments
for 3D images and the adjacency matrix A of size . It may have some NA rows and columns. To calculate the adjacency matrix, the raw cluster moments are first converted into z-scores.
## S4 method for signature 'array'
labelmoments(Zp = array(), cmax = integer(), Sigma = 1.0, ...)
## ... and for other signatures
| Zp | a 2D array of size |
|---|---|
| cmax | maximum number of clusters |
| Sigma | scale parameter |
| ... | currently not used. |
signature(Zp = "array"): an array.
Marko Nagode, Branislav Panic
A. Ng, M. Jordan and Y. Weiss. On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems 14 (NIPS 2001).
Zp <- matrix(rep(0, 100), nrow = 10, ncol = 10)
Zp[2, 2:4] <- 1; Zp[2:4, 5] <- 2; Zp[8, 7:10] <- 3; Zp[9, 6] <- 4; Zp[10, 5] <- 4
Zp[10, 1:4] <- 5
Zp[6:9, 1] <- 6
labelmoments <- labelmoments(Zp, cmax = 6, Sigma = 1.0)
set.seed(12)
mergelabels <- mergelabels(list(labelmoments$A), w = 1.0, k = 2, nstart = 3)
Zp
mergelabels
Returns the log likelihood at pos.
## S4 method for signature 'REBMIX'
logL(x = NULL, pos = 1, ...)
## ... and for other signatures
| x | see Methods section below. |
|---|---|
| pos | a desired row number in |
| ... | currently not used. |
signature(x = "REBMIX"): an object of class REBMIX.
signature(x = "REBMVNORM"): an object of class REBMVNORM.
Marko Nagode
G. McLachlan and D. Peel. Finite Mixture Models. John Wiley & Sons, New York, 2000.
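To make the quantity concrete, the log likelihood of a finite mixture is the sum over observations of the logarithm of the weighted sum of component densities. The sketch below evaluates it for a univariate normal mixture in base R; `mix_logL()` and the parameter values are hypothetical illustrations, not output of the package.

```r
# Hypothetical helper, not part of rebmix: log likelihood of a univariate normal mixture.
mix_logL <- function(y, w, mu, sigma) {
  # n x c matrix of weighted component densities w_l * f_l(y_j).
  f <- sapply(seq_along(w), function(l) w[l] * dnorm(y, mu[l], sigma[l]))
  f <- matrix(f, nrow = length(y))
  # Sum over observations of the log of the mixture density.
  sum(log(rowSums(f)))
}

# Illustrative parameter values, not estimates produced by the package.
y <- faithful$eruptions
mix_logL(y, w = c(0.35, 0.65), mu = c(2.0, 4.3), sigma = c(0.3, 0.4))
```

With a single component of weight one the function reduces to the ordinary normal log likelihood, which is a convenient sanity check.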
Returns a factor of predictive cluster membership for dataset.
## S4 method for signature 'RCLRMIX'
mapclusters(x = NULL, Dataset = data.frame(), s = expression(c), ...)
## ... and for other signatures
| x | see Methods section below. |
|---|---|
| Dataset | a data frame of size |
| s | a desired number of clusters to be created. The default value is |
| ... | currently not used. |
signature(x = "RCLRMIX"): an object of class RCLRMIX.
signature(x = "RCLRMVNORM"): an object of class RCLRMVNORM.
Marko Nagode, Branislav Panic
devAskNewPage(ask = TRUE)
# Generate normal dataset.
n <- c(50, 20, 40)
Theta <- new("RNGMVNORM.Theta", c = 3, d = 2)
a.theta1(Theta, 1) <- c(3, 10)
a.theta1(Theta, 2) <- c(8, 6)
a.theta1(Theta, 3) <- c(12, 11)
a.theta2(Theta, 1) <- c(3, 0.3, 0.3, 2)
a.theta2(Theta, 2) <- c(5.7, -2.3, -2.3, 3.5)
a.theta2(Theta, 3) <- c(2, 1, 1, 2)
normal <- RNGMIX(model = "RNGMVNORM", Dataset.name = paste("normal_", 1:10, sep = ""), n = n, Theta = a.Theta(Theta))
# Convert all datasets to single histogram.
hist <- NULL
n <- length(normal@Dataset)
hist <- fhistogram(Dataset = normal@Dataset[[1]], K = c(10, 10), ymin = a.ymin(normal), ymax = a.ymax(normal))
for (i in 2:n) {
  hist <- fhistogram(x = hist, Dataset = normal@Dataset[[i]], shrink = i == n)
}
# Estimate number of components, component weights and component parameters.
normalest <- REBMIX(model = "REBMVNORM", Dataset = list(hist), Preprocessing = "histogram", cmax = 6, Criterion = "BIC")
summary(normalest)
# Plot finite mixture.
plot(normalest)
# Cluster dataset.
normalclu <- RCLRMIX(model = "RCLRMVNORM", x = normalest)
# Plot clusters.
plot(normalclu)
summary(normalclu)
# Map clusters.
Zp <- mapclusters(x = normalclu, Dataset = a.Dataset(normal, 4))
Zt <- a.Zt(normal)
Zp
Zt
Returns the minimum description length at pos.
## S4 method for signature 'REBMIX'
MDL2(x = NULL, pos = 1, ...)
## S4 method for signature 'REBMIX'
MDL5(x = NULL, pos = 1, ...)
## ... and for other signatures
| x | see Methods section below. |
|---|---|
| pos | a desired row number in |
| ... | currently not used. |
signature(x = "REBMIX"): an object of class REBMIX.
signature(x = "REBMVNORM"): an object of class REBMVNORM.
Marko Nagode
M. H. Hansen and B. Yu. Model selection and the principle of minimum description length. Journal of the American Statistical Association, 96(454):746-774, 2001. https://www.jstor.org/stable/2670311.
Returns the list with the normalised adjacency matrix L of size . The normalised adjacency matrix
depends on the probability adjacency matrix , where
and the degree matrix . The matrices may contain some NA rows and columns, which are eliminated by the method.
The list also contains the vector of integers cluster of length , which indicates the cluster to which each label is assigned.
## S4 method for signature 'list'
mergelabels(A = list(), w = numeric(), k = 2, ...)
## ... and for other signatures
| A | a list of length |
|---|---|
| w | vector of length |
| k | number of clusters |
| ... | further arguments to |
signature(A = "list"): a list.
Marko Nagode, Branislav Panic
A. Ng, M. Jordan and Y. Weiss. On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems 14 (NIPS 2001).
Zp <- array(0, dim = c(10, 10, 2))
Zp[ , ,1][10, 1:4] <- 1
Zp[ , ,1][1:4, 10] <- 2
Zp[ , ,2][9, 1:5] <- 3
Zp[ , ,2][1:6, 9] <- 4
labelmoments <- labelmoments(Zp, cmax = 4, Sigma = 1.0)
labelmoments
set.seed(3)
mergelabels <- mergelabels(list(labelmoments$A), w = 1.0, k = 2, nstart = 5)
Zp
mergelabels
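The normalised adjacency matrix and the cluster vector described for this method follow the spectral clustering idea of Ng, Jordan and Weiss cited above. The base R sketch below illustrates that idea under stated assumptions: the matrix `P` is a made-up symmetric affinity matrix for five labels, the normalisation is taken as D^(-1/2) P D^(-1/2), and k-means from the stats package is used on the leading eigenvectors. It is not the package code, and mergelabels() may differ in detail.

```r
# Hypothetical symmetric affinity matrix: labels 1-3 are strongly linked, as are labels 4-5.
P <- matrix(c(0.00, 0.90, 0.80, 0.10, 0.00,
              0.90, 0.00, 0.70, 0.00, 0.05,
              0.80, 0.70, 0.00, 0.10, 0.00,
              0.10, 0.00, 0.10, 0.00, 0.90,
              0.00, 0.05, 0.00, 0.90, 0.00), nrow = 5, byrow = TRUE)
k <- 2

# Degrees and normalised adjacency matrix L = D^(-1/2) P D^(-1/2).
d <- rowSums(P)
L <- diag(1 / sqrt(d)) %*% P %*% diag(1 / sqrt(d))

# k leading eigenvectors, with rows rescaled to unit length.
ev <- eigen(L, symmetric = TRUE)
X <- ev$vectors[, 1:k, drop = FALSE]
X <- X / sqrt(rowSums(X^2))

# Cluster the rows; nstart restarts k-means from several random initialisations.
set.seed(3)
cluster <- kmeans(X, centers = k, nstart = 5)$cluster
cluster
```

The integer labels returned by k-means are arbitrary, so only the grouping of the labels is meaningful, not the cluster numbers themselves.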
Returns the matrix of size containing optimal numbers of bins for all processed datasets.
## S4 method for signature 'list'
optbins(Dataset = list(), Rule = "Knuth equal", ymin = numeric(), ymax = numeric(), kmin = numeric(), kmax = numeric(), ...)
## ... and for other signatures
| Dataset | a list of length |
|---|---|
| Rule | a character giving the histogram binning rule. One of |
| ymin | a vector of length |
| ymax | a vector of length |
| kmin | lower limit of the number of bins. The default value is |
| kmax | upper limit of the number of bins. The default value is |
| ... | currently not used. |
signature(x = "list"): a list of data frames.
Branislav Panic, Marko Nagode
K. H. Knuth. Optimal data-based binning for histograms and histogram-based probability density models.
Digital Signal Processing, 95:102581, 2019.
doi:10.1016/j.dsp.2019.102581.
B. Panic, J. Klemenc, M. Nagode. Improved initialization of the EM algorithm for mixture model parameter estimation.
Mathematics, 8(3):373, 2020.
doi:10.3390/math8030373.
# Generate multivariate normal datasets.
n <- c(750, 1000)
Theta <- new("RNGMVNORM.Theta", c = 2, d = 2)
a.theta1(Theta, 1) <- c(8, 6)
a.theta1(Theta, 2) <- c(6, 8)
a.theta2(Theta, 1) <- c(8, 2, 2, 4)
a.theta2(Theta, 2) <- c(2, 1, 1, 4)
sim2d <- RNGMIX(model = "RNGMVNORM", Dataset.name = paste("sim2d_", 1:5, sep = ""), rseed = -1, n = n, Theta = a.Theta(Theta))
# Calculate optimal numbers of bins.
opt.k <- optbins(Dataset = sim2d@Dataset, Rule = "Knuth equal", ymin = sim2d@ymin, ymax = sim2d@ymax, kmin = 2, kmax = 20)
opt.k
# Create object of class EM.Control.
EM <- new("EM.Control", strategy = "exhaustive", variant = "EM", acceleration = "fixed",
  acceleration.multiplier = 1.0, tolerance = 1.0E-4, maximum.iterations = 1000)
# Estimate number of components, component weights and component parameters.
sim2dest <- REBMIX(model = "REBMVNORM", Dataset = a.Dataset(sim2d), Preprocessing = "h", cmax = 10,
  ymin = a.ymin(sim2d), ymax = a.ymax(sim2d), K = opt.k, Criterion = "BIC", EMcontrol = EM)
# Plot finite mixture.
plot(sim2dest, pos = 3, nrow = 4, what = c("pdf", "marginal pdf", "IC"))
# Estimate number of components, component weights and component
# parameters for well known Iris dataset.
Dataset <- list(iris[, c(1:4)])
# Calculate optimal numbers of bins using non-equal number of bins in each dimension.
opt.k <- optbins(Dataset = Dataset, Rule = "Knuth unequal", kmin = 2, kmax = 20)
opt.k
# Estimate number of components, component weights and component parameters.
irisest <- REBMIX(model = "REBMVNORM", Dataset = Dataset, Preprocessing = "h", cmax = 10,
  K = opt.k, Criterion = "BIC", EMcontrol = EM)
irisest
Returns the partition coefficient of Bezdek at pos.
## S4 method for signature 'REBMIX'
PC(x = NULL, pos = 1, ...)
## ... and for other signatures
| x | see Methods section below. |
|---|---|
| pos | a desired row number in |
| ... | currently not used. |
signature(x = "REBMIX"): an object of class REBMIX.
signature(x = "REBMVNORM"): an object of class REBMVNORM.
Marko Nagode
G. McLachlan and D. Peel. Finite Mixture Models. John Wiley & Sons, New York, 2000.
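For intuition, the partition coefficient of Bezdek is usually defined as the mean over observations of the sum of squared membership probabilities. The sketch below assumes that textbook definition; `pc()` and the matrix `tau` are hypothetical and not taken from the package.

```r
# Hypothetical helper, not part of rebmix: partition coefficient from an n x c membership matrix.
pc <- function(tau) {
  sum(tau^2) / nrow(tau)
}

# Illustrative memberships for three observations and two components.
tau <- rbind(c(0.9, 0.1), c(0.2, 0.8), c(1.0, 0.0))
pc(tau)
```

Under this definition the value lies between 1/c, reached when all memberships are equal, and 1, reached when every observation belongs to exactly one component.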
Returns the data frame containing observations and empirical
distribution functions . Vectors are subvectors of
.
## S4 method for signature 'REBMIX'
pemix(x = NULL, pos = 1, variables = expression(1:d), lower.tail = TRUE, log.p = FALSE, ...)
## ... and for other signatures
| x | see Methods section below. |
|---|---|
| pos | a desired row number in |
| variables | a vector containing indices of variables in subvectors |
| lower.tail | logical. If |
| log.p | logical. If |
| ... | currently not used. |
signature(x = "REBMIX"): an object of class REBMIX.
signature(x = "REBMVNORM"): an object of class REBMVNORM.
Marko Nagode
M. Nagode and M. Fajdiga. The rebmix algorithm for the univariate finite mixture estimation.
Communications in Statistics - Theory and Methods, 40(5):876-892, 2011a. doi:10.1080/03610920903480890.
M. Nagode and M. Fajdiga. The rebmix algorithm for the multivariate finite mixture estimation.
Communications in Statistics - Theory and Methods, 40(11):2022-2034, 2011b. doi:10.1080/03610921003725788.
M. Nagode. Finite mixture modeling via REBMIX.
Journal of Algorithms and Optimization, 3(2):14-28, 2015. https://repozitorij.uni-lj.si/Dokument.php?id=127674&lang=eng.
# Generate simulated dataset.
n <- c(15, 15)
Theta <- new("RNGMIX.Theta", c = 2, pdf = rep("normal", 3))
a.theta1(Theta, 1) <- c(10, 20, 30)
a.theta1(Theta, 2) <- c(3, 4, 5)
a.theta2(Theta, 1) <- c(3, 2, 1)
a.theta2(Theta, 2) <- c(15, 10, 5)
simulated <- RNGMIX(Dataset.name = paste("simulated_", 1:4, sep = ""), rseed = -1, n = n, Theta = a.Theta(Theta))
# Create object of class EM.Control.
EM <- new("EM.Control", strategy = "exhaustive", variant = "ECM", acceleration = "fixed",
  acceleration.multiplier = 1.0, tolerance = 1.0E-4, maximum.iterations = 1000)
# Estimate number of components, component weights and component parameters.
simulatedest <- REBMIX(Dataset = a.Dataset(simulated), Preprocessing = "kernel density estimation",
  cmax = 4, pdf = c("n", "n", "n"), EMcontrol = EM)
# Preprocess simulated dataset.
f <- pemix(simulatedest, pos = 3, variables = c(1))
f
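The idea of an empirical distribution function, and of the lower.tail and log.p switches, can be illustrated for a single variable with base R. The sketch below uses ecdf() from the stats package on made-up observations; it is an assumption that this estimator resembles what pemix() returns, and the column names are hypothetical.

```r
# Illustrative observations; any numeric vector works.
y <- c(2.1, 3.4, 1.7, 5.0, 3.4)

# Empirical distribution function as a step function.
Fn <- ecdf(y)
p <- Fn(y)                 # proportion of observations less than or equal to each y

edf <- data.frame(y = y,
                  lower = p,           # analogue of lower.tail = TRUE
                  upper = 1 - p,       # analogue of lower.tail = FALSE
                  log.lower = log(p))  # analogue of log.p = TRUE
edf
```

Tied observations, such as the repeated 3.4 here, receive the same probability because the step function jumps once at each distinct value.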
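Returns the data frame containing observations $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_n$ and
predictive marginal distribution functions $F(\boldsymbol{x} \mid c, \boldsymbol{w}, \boldsymbol{\Theta})$. Vectors $\boldsymbol{x}$ are subvectors of
$\boldsymbol{y} = (y_1, \ldots, y_d)^{\top}$. If $\boldsymbol{x} = \boldsymbol{y}$, the method returns the data frame containing observations $\boldsymbol{y}_1, \ldots, \boldsymbol{y}_n$ and
the corresponding predictive mixture distribution function $F(\boldsymbol{y} \mid c, \boldsymbol{w}, \boldsymbol{\Theta})$.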
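To make the notion of a mixture distribution function concrete, here is a minimal base R sketch that evaluates the weighted sum of component distribution functions for a univariate two-component normal mixture. The weights, means and standard deviations, as well as the helper name `mixture_cdf`, are assumptions chosen for illustration; the sketch shows the textbook definition only and is not how `pfmix` is implemented.

```r
# Hypothetical two-component univariate normal mixture (illustrative values).
w     <- c(0.5, 0.5)   # component weights, summing to 1
mu    <- c(10, 3)      # component means
sigma <- c(3, 15)      # component standard deviations

# Mixture distribution function: weighted sum of component distribution functions.
mixture_cdf <- function(y, w, mu, sigma, lower.tail = TRUE, log.p = FALSE) {
  p <- vapply(y, function(yi) sum(w * pnorm(yi, mean = mu, sd = sigma)), numeric(1))
  if (!lower.tail) p <- 1 - p   # upper tail probability
  if (log.p) log(p) else p      # optionally return log probabilities
}

y  <- c(25, 5, -20)
Fy <- mixture_cdf(y, w, mu, sigma)
data.frame(y = y, F = Fy)
```

The `lower.tail` and `log.p` switches in the sketch mirror the meaning these arguments have in base R functions such as `pnorm`.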

    ## S4 method for signature 'REBMIX'
    pfmix(x = NULL, Dataset = data.frame(), pos = 1,
      variables = expression(1:d), lower.tail = TRUE, log.p = FALSE, ...)
    ## ... and for other signatures


| Argument | Description |
|---|---|
| x | see Methods section below. |
| Dataset | a data frame containing observations |
| pos | a desired row number in |
| variables | a vector containing indices of variables in subvectors |
| lower.tail | logical. If |
| log.p | logical. If |
| ... | currently not used. |

signature(x = "REBMIX")an object of class REBMIX.
signature(x = "REBMVNORM")an object of class REBMVNORM.
Marko Nagode
M. Nagode and M. Fajdiga. The rebmix algorithm for the univariate finite mixture estimation.
Communications in Statistics - Theory and Methods, 40(5):876-892, 2011a. doi:10.1080/03610920903480890.
M. Nagode and M. Fajdiga. The rebmix algorithm for the multivariate finite mixture estimation.
Communications in Statistics - Theory and Methods, 40(11):2022-2034, 2011b. doi:10.1080/03610921003725788.
M. Nagode. Finite mixture modeling via REBMIX.
Journal of Algorithms and Optimization, 3(2):14-28, 2015. https://repozitorij.uni-lj.si/Dokument.php?id=127674&lang=eng.

    # Generate simulated dataset.
    n <- c(15, 15)
    Theta <- new("RNGMIX.Theta", c = 2, pdf = rep("normal", 3))
    a.theta1(Theta, 1) <- c(10, 20, 30)
    a.theta1(Theta, 2) <- c(3, 4, 5)
    a.theta2(Theta, 1) <- c(3, 2, 1)
    a.theta2(Theta, 2) <- c(15, 10, 5)
    simulated <- RNGMIX(Dataset.name = paste("simulated_", 1:4, sep = ""),
      rseed = -1,
      n = n,
      Theta = a.Theta(Theta))
    # Number of classes or nearest neighbours to be processed.
    K <- c(as.integer(1 + log2(sum(n))), # Minimum v follows Sturges rule.
      as.integer(10 * log10(sum(n)))) # Maximum v follows log10 rule.
    # Estimate number of components, component weights and component parameters.
    simulatedest <- REBMIX(Dataset = a.Dataset(simulated),
      Preprocessing = "h",
      cmax = 4,
      Criterion = "BIC",
      pdf = c("n", "n", "n"))
    # Preprocess simulated dataset.
    Dataset <- data.frame(c(25, 5, -20), NA, c(31, 20, 20))
    f <- pfmix(simulatedest, Dataset = Dataset, pos = 3, variables = c(1, 3))
    f
    # Plot finite mixture.
    opar <- plot(simulatedest, pos = 3, nrow = 3, ncol = 1,
      what = "pdf", contour.drawlabels = TRUE, contour.labcex = 0.6)
    par(usr = opar[[2]]$usr, mfg = c(2, 1))
    points(x = f[, 1], y = f[, 2])
    text(x = f[, 1], y = f[, 2], labels = format(f[, 3], digits = 3), cex = 0.8, pos = 4)

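The comments in the example above name two rules for bracketing the number of bins: the Sturges rule for the minimum and the log10 rule for the maximum. The following base R sketch evaluates both for the sample size used in the example and expands them into a candidate set. The variable names `v_min`, `v_max` and `N` are introduced here for illustration only.

```r
# Sample size from the example: two components of 15 observations each.
n <- c(15, 15)
N <- sum(n)

# Sturges rule gives the lower bracket for the number of bins.
v_min <- as.integer(1 + log2(N))

# The log10 rule gives the upper bracket for the number of bins.
v_max <- as.integer(10 * log10(N))

# Candidate numbers of bins between the two brackets.
K <- v_min:v_max
K
```

A vector such as `K` could presumably be supplied through the `K` argument of `REBMIX`, which accepts a vector of numbers of bins according to the slot description of class `REBMIX`; the example above computes its own `K` but does not pass it on.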
Plots true clusters if x equals "RNGMIX". Plots the REBMIX output,
depending on the what argument, if x equals "REBMIX".
Plots predictive clusters if x equals "RCLRMIX".
Wrongly clustered observations are plotted only if x@Zt is available.
Plots predictive classes and wrongly classified observations if x equals "RCLSMIX".

    ## S4 method for signature 'RNGMIX,missing'
    plot(x, y, pos = 1, nrow = 1, ncol = 1, cex = 0.8,
      fg = "black", lty = "solid", lwd = 1, pty = "m", tcl = 0.5,
      plot.cex = 0.8, plot.pch = 19, ...)
    ## S4 method for signature 'REBMIX,missing'
    plot(x, y, pos = 1, what = c("pdf"),
      nrow = 1, ncol = 1, npts = 200, n = 200, cex = 0.8, fg = "black",
      lty = "solid", lwd = 1, pty = "m", tcl = 0.5,
      plot.cex = 0.8, plot.pch = 19, contour.drawlabels = FALSE,
      contour.labcex = 0.8, contour.method = "flattest",
      contour.nlevels = 12, log = "", ...)
    ## S4 method for signature 'RCLRMIX,missing'
    plot(x, y, s = expression(c), nrow = 1, ncol = 1, cex = 0.8,
      fg = "black", lty = "solid", lwd = 1, pty = "m", tcl = 0.5,
      plot.cex = 0.8, plot.pch = 19, ...)
    ## S4 method for signature 'RCLSMIX,missing'
    plot(x, y, nrow = 1, ncol = 1, cex = 0.8,
      fg = "black", lty = "solid", lwd = 1, pty = "m", tcl = 0.5,
      plot.cex = 0.8, plot.pch = 19, ...)
    ## ... and for other signatures


| Argument | Description |
|---|---|
| x | see Methods section below. |
| y | currently not used. |
| pos | a desired row number in |
| s | a desired number of clusters to be plotted. The default value is |
| what | a character vector giving the plot types. One of |
| nrow | a desired number of rows in which the empirical and predictive densities are to be plotted. The default value is |
| ncol | a desired number of columns in which the empirical and predictive densities are to be plotted. The default value is |
| npts | a number of points at which the predictive densities are to be plotted. The default value is |
| n | a number of observations to be plotted. The default value is |
| cex | a numerical value giving the amount by which the plotting text and symbols should be magnified relative to the default, see also |
| fg | a colour used for things like axes and boxes around plots, see also |
| lty | a line type, see also |
| lwd | a line width, see also |
| pty | a character specifying the type of the plot region to be used. One of |
| tcl | a length of tick marks as a fraction of the height of a line of the text, see also |
| plot.cex | a numerical vector giving the amount by which plotting characters and symbols should be scaled relative to the default. It works as a multiple of |
| plot.pch | a vector of plotting characters or symbols, see also |
| contour.drawlabels | logical. The contours are labelled if |
| contour.labcex | |
| contour.method | a character specifying where the labels will be located. The possible values are |
| contour.nlevels | a number of desired contour levels. The default value is |
| log | a character which contains |
| ... | further arguments to |

Returns (invisibly) a list containing graphical parameters par. Such a list can be passed as an argument to par to restore the parameter values.
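The save-and-restore pattern mentioned above can be sketched with base R graphics alone. The example below is an assumption-laden illustration: it uses `par` directly instead of a rebmix `plot` method and opens a null PDF device so that it runs without a display. The list returned by the rebmix methods may hold one set of parameters per panel, as the `opar[[2]]$usr` access in the `pfmix` example suggests, so this sketch shows only the general idea.

```r
# Open a null PDF device so the sketch runs without a display.
pdf(NULL)

# par() returns the previous values of the parameters it changes.
opar <- par(mfrow = c(2, 1))
plot(1:10, main = "first panel")
plot(10:1, main = "second panel")

# Passing the saved list back to par() restores the earlier layout.
par(opar)
restored <- par("mfrow")

# Close the device; the return value is not needed here.
invisible(dev.off())
```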
signature(x = "RNGMIX", y = "missing")an object of class RNGMIX.
signature(x = "RNGMVNORM", y = "missing")an object of class RNGMVNORM.
signature(x = "REBMIX", y = "missing")an object of class REBMIX.
signature(x = "REBMVNORM", y = "missing")an object of class REBMVNORM.
signature(x = "RCLRMIX", y = "missing")an object of class RCLRMIX.
signature(x = "RCLRMVNORM", y = "missing")an object of class RCLRMVNORM.
signature(x = "RCLSMIX", y = "missing")an object of class RCLSMIX.
signature(x = "RCLSMVNORM", y = "missing")an object of class RCLSMVNORM.
Marko Nagode
C. M. Bishop. Neural Networks for Pattern Recognition. Clarendon Press, Oxford, 1995.

    ## Not run:
    devAskNewPage(ask = TRUE)
    data(wine)
    colnames(wine)
    # Remove Cultivar column from wine dataset.
    winecolnames <- !(colnames(wine) %in% "Cultivar")
    wine <- wine[, winecolnames]
    # Determine number of dimensions d and wine dataset size n.
    d <- ncol(wine)
    n <- nrow(wine)
    wineest <- REBMIX(model = "REBMVNORM",
      Dataset = list(wine = wine),
      Preprocessing = "kernel density estimation",
      Criterion = "ICL-BIC",
      EMcontrol = new("EM.Control", strategy = "best"))
    # Plot finite mixture.
    plot(wineest, what = c("pdf", "IC", "logL", "D"),
      nrow = 2, ncol = 2, pty = "s")
    ## End(Not run)

Returns the total of positive relative deviations D at pos.

    ## S4 method for signature 'REBMIX'
    PRD(x = NULL, pos = 1, ...)
    ## ... and for other signatures


| Argument | Description |
|---|---|
| x | see Methods section below. |
| pos | a desired row number in |
| ... | currently not used. |

signature(x = "REBMIX")an object of class REBMIX.
signature(x = "REBMVNORM")an object of class REBMVNORM.
Marko Nagode
M. Nagode and M. Fajdiga. The rebmix algorithm for the univariate finite mixture estimation.
Communications in Statistics - Theory and Methods, 40(5):876-892, 2011a. doi:10.1080/03610920903480890.
M. Nagode and M. Fajdiga. The rebmix algorithm for the multivariate finite mixture estimation.
Communications in Statistics - Theory and Methods, 40(11):2022-2034, 2011b. doi:10.1080/03610921003725788.
M. Nagode. Finite mixture modeling via REBMIX.
Journal of Algorithms and Optimization, 3(2):14-28, 2015. https://repozitorij.uni-lj.si/Dokument.php?id=127674&lang=eng.
"RCLRMIX"
Object of class RCLRMIX.
Objects can be created by calls of the form new("RCLRMIX", ...).
Accessor methods for the slots are a.Dataset(x = NULL), a.pos(x = NULL), a.Zt(x = NULL),
a.Zp(x = NULL, s = expression(c)), a.c(x = NULL),
a.p(x = NULL, s = expression(c)), a.pi(x = NULL, s = expression(c)), a.P(x = NULL, s = expression(c)), a.tau(x = NULL, s = expression(c)),
a.prob(x = NULL), a.Rule(x = NULL), a.from(x = NULL), a.to(x = NULL),
a.EN(x = NULL) and a.ED(x = NULL), where x stands for an object of class RCLRMIX and s
a desired number of clusters for which the slot is calculated.
x:an object of class REBMIX.
Dataset:a data frame or an object of class Histogram to be clustered.
pos:a desired row number in x@summary for which the clustering is performed. The default value is 1.
Zt:a factor of true cluster membership.
Zp:a factor of predictive cluster membership.
c:number of nonempty clusters.
p:a vector of length containing prior probabilities of cluster memberships summing to 1. The value is returned only if all variables in slot x follow either binomial or Dirac parametric families. The default value is numeric().
pi:a list of length of matrices of size containing cluster conditional probabilities . Let
denote the cluster conditional probability that an observation in cluster produces the th outcome on the th variable.
Suppose we observe polytomous categorical variables (the manifest variables), each of which contains possible outcomes for observations .
A manifest variable is a variable that can be measured or observed directly. It must be coded as a whole number starting at zero for the first outcome and increasing to the possible number of outcomes minus one.
It is presumed here that all variables are statistically independent within clusters and that
stands for an observed dimensional dataset of size of vector observations .
The value is returned only if all variables in slot x follow either binomial or Dirac parametric families. The default value is list().
P:a data frame containing true and predictive frequencies calculated for unique , where and .
tau:a matrix of size containing conditional probabilities that observations arise from clusters .
prob:a vector of length containing probabilities of correct clustering for .
Rule:a character containing the merging rule. One of "Entropy" and "Demp". The default value is "Entropy".
from:a vector of length containing clusters merged to to clusters.
to:a vector of length containing clusters originating from from clusters.
EN:a vector of length containing entropies for combined clusters.
ED:a vector of length containing decrease of entropies for combined clusters.
A:an adjacency matrix of size , where .
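The slots tau, EN and ED relate to the entropy of a soft clustering. The sketch below follows the entropy definition of Baudry et al. (2010), cited in the references, for a small made-up matrix of conditional probabilities. The matrix values and the helper names `entropy` and `merge_clusters` are hypothetical, and the code illustrates the idea behind entropy-based merging rather than the exact computation performed by RCLRMIX.

```r
# Hypothetical conditional probabilities: rows are observations, columns are clusters.
tau <- matrix(c(0.90, 0.05, 0.05,
                0.10, 0.80, 0.10,
                0.30, 0.30, 0.40,
                0.00, 0.50, 0.50),
              ncol = 3, byrow = TRUE)

# Entropy of a soft clustering, with 0 * log(0) taken as 0.
entropy <- function(tau) {
  -sum(ifelse(tau > 0, tau * log(tau), 0))
}

# Merging cluster `from` into cluster `to` adds their conditional probabilities.
merge_clusters <- function(tau, from, to) {
  tau[, to] <- tau[, to] + tau[, from]
  tau[, -from, drop = FALSE]
}

merged    <- merge_clusters(tau, from = 3, to = 2)
EN_before <- entropy(tau)
EN_after  <- entropy(merged)

# Decrease of entropy caused by the merge.
ED <- EN_before - EN_after
c(EN_before = EN_before, EN_after = EN_after, ED = ED)
```

Under an entropy-based rule, the pair of clusters whose merge yields the largest decrease of entropy would be the natural candidate to combine first.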
Marko Nagode, Branislav Panic
J. P. Baudry, A. E. Raftery, G. Celeux, K. Lo and R. Gottardo. Combining mixture components for clustering.
Journal of Computational and Graphical Statistics, 19(2):332-353, 2010. doi:10.1198/jcgs.2010.08111
S. Kyoya and K. Yamanishi. Summarizing finite mixture model with overlapping quantification. Entropy, 23(11):1503, 2021. doi:10.3390/e23111503

    devAskNewPage(ask = TRUE)
    # Generate normal dataset.
    n <- c(500, 200, 400)
    Theta <- new("RNGMVNORM.Theta", c = 3, d = 2)
    a.theta1(Theta, 1) <- c(3, 10)
    a.theta1(Theta, 2) <- c(8, 6)
    a.theta1(Theta, 3) <- c(12, 11)
    a.theta2(Theta, 1) <- c(3, 0.3, 0.3, 2)
    a.theta2(Theta, 2) <- c(5.7, -2.3, -2.3, 3.5)
    a.theta2(Theta, 3) <- c(2, 1, 1, 2)
    normal <- RNGMIX(model = "RNGMVNORM", Dataset.name = "normal_1", n = n, Theta = a.Theta(Theta))
    # Estimate number of components, component weights and component parameters.
    normalest <- REBMIX(model = "REBMVNORM",
      Dataset = a.Dataset(normal),
      Preprocessing = "histogram",
      cmax = 6,
      Criterion = "BIC")
    summary(normalest)
    # Plot finite mixture.
    plot(normalest)
    # Cluster dataset.
    normalclu <- RCLRMIX(model = "RCLRMVNORM", x = normalest, Zt = a.Zt(normal))
    # Plot clusters.
    plot(normalclu)
    summary(normalclu)

Returns as default the RCLRMIX algorithm output for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities, following the methodology proposed in the article cited in the references. If model equals "RCLRMVNORM", the output for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices is returned.

    ## S4 method for signature 'RCLRMIX'
    RCLRMIX(model = "RCLRMIX", x = NULL, Dataset = NULL,
      pos = 1, Zt = factor(), Rule = character(), ...)
    ## ... and for other signatures
    ## S4 method for signature 'RCLRMIX'
    summary(object, ...)
    ## ... and for other signatures


| Argument | Description |
|---|---|
| model | see Methods section below. |
| x | an object of class |
| Dataset | a data frame or an object of class |
| pos | a desired row number in |
| Zt | a factor of true cluster membership. The default value is |
| Rule | a character containing the merging rule. One of |
| object | see Methods section below. |
| ... | currently not used. |

Returns an object of class RCLRMIX or RCLRMVNORM.
signature(model = "RCLRMIX")a character giving the default class name "RCLRMIX" for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities.
signature(model = "RCLRMVNORM")a character giving the class name "RCLRMVNORM" for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices.
signature(object = "RCLRMIX")an object of class RCLRMIX.
signature(object = "RCLRMVNORM")an object of class RCLRMVNORM.
Marko Nagode
J. P. Baudry, A. E. Raftery, G. Celeux, K. Lo and R. Gottardo. Combining mixture components for clustering. Journal of Computational and Graphical Statistics, 19(2):332-353, 2010. doi:10.1198/jcgs.2010.08111

    devAskNewPage(ask = TRUE)
    # Generate Poisson dataset.
    n <- c(500, 200, 400)
    Theta <- new("RNGMIX.Theta", c = 3, pdf = "Poisson")
    a.theta1(Theta) <- c(3, 12, 36)
    poisson <- RNGMIX(Dataset.name = "Poisson_1", n = n, Theta = a.Theta(Theta))
    # Estimate number of components, component weights and component parameters.
    EM <- new("EM.Control", strategy = "exhaustive")
    poissonest <- REBMIX(Dataset = a.Dataset(poisson),
      Preprocessing = "histogram",
      cmax = 6,
      Criterion = "BIC",
      pdf = rep("Poisson", 1),
      EMcontrol = EM)
    summary(poissonest)
    # Plot finite mixture.
    plot(poissonest)
    # Cluster dataset.
    poissonclu <- RCLRMIX(x = poissonest, Zt = a.Zt(poisson))
    summary(poissonclu)
    # Plot clusters.
    plot(poissonclu)
    # Create new dataset.
    Dataset <- sample.int(n = 50, size = 10, replace = TRUE)
    Dataset <- as.data.frame(Dataset)
    # Cluster the dataset.
    poissonclu <- RCLRMIX(x = poissonest, Dataset = Dataset, Rule = "Demp")
    a.Dataset(poissonclu)

"RCLS.chunk"
Object of class RCLS.chunk.
Objects can be created by calls of the form new("RCLS.chunk", ...). Accessor methods for the slots are a.s(x = NULL),
a.levels(x = NULL), a.ntrain(x = NULL), a.train(x = NULL), a.Zr(x = NULL), a.ntest(x = NULL), a.test(x = NULL) and a.Zt(x = NULL),
where x stands for an object of class RCLS.chunk.
s:finite set of size of classes .
levels:a character vector of length containing class names .
ntrain:a vector of length containing numbers of observations in train datasets .
train:a list of length of data frames containing train datasets of length .
Zr:a list of factors of true class membership for the train datasets.
ntest:number of observations in test dataset .
test:a data frame containing test dataset of length .
Zt:a factor of true class membership for the test dataset.
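The layout of these slots can be pictured with a small base R sketch that partitions a made-up data frame into one train dataset per class and a single test dataset. The data frame, its column names (`Type`, `y1`, `y2`, `Class`) and all values are hypothetical; the sketch uses `base::split` and only mimics the structure of the slots train, Zr, ntrain, test, Zt and ntest, not the package's own `split` method.

```r
# Hypothetical dataset: Type marks train/test rows, Class holds the true class.
set.seed(1)
dataset <- data.frame(
  Type  = rep(c("train", "test"), times = c(8, 4)),
  y1    = rnorm(12),
  y2    = rpois(12, lambda = 3),
  Class = factor(rep(c("a", "b"), 6))
)

is_train <- dataset$Type == "train"
vars     <- c("y1", "y2")

# One train data frame per class, plus the matching true class memberships.
train  <- base::split(dataset[is_train, vars], dataset$Class[is_train])
Zr     <- base::split(dataset$Class[is_train], dataset$Class[is_train])
ntrain <- vapply(train, nrow, integer(1))

# A single test data frame with its true class membership.
test  <- dataset[!is_train, vars]
Zt    <- dataset$Class[!is_train]
ntest <- nrow(test)

# Class names, in the order used by the train list.
class_levels <- levels(dataset$Class)
```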
Marko Nagode
D. M. Dziuda. Data Mining for Genomics and Proteomics: Analysis of Gene and Protein Expression Data. John Wiley & Sons, New York, 2010.
"RCLSMIX"
Object of class RCLSMIX.
Objects can be created by calls of the form new("RCLSMIX", ...). Accessor methods for the slots are a.o(x = NULL),
a.Dataset(x = NULL), a.s(x = NULL), a.ntrain(x = NULL), a.P(x = NULL), a.ntest(x = NULL), a.Zt(x = NULL),
a.Zp(x = NULL), a.CM(x = NULL), a.Accuracy(x = NULL), a.Error(x = NULL), a.Precision(x = NULL), a.Sensitivity(x = NULL),
a.Specificity(x = NULL) and a.Chunks(x = NULL), where x stands for an object of class RCLSMIX.
x:a list of objects of class REBMIX of length obtained by running REBMIX on train datasets all of length .
For the train datasets the corresponding class membership is known. This yields
, while for all .
Each object in the list corresponds to one chunk, e.g., .
o:number of chunks . is an observed -dimensional dataset of size of vector observations and
is partitioned into train and test datasets. Vector observations may further be split into chunks when running REBMIX, e.g.,
for and the set of chunks substituting may be as follows , and .
Dataset:a data frame containing test dataset of length . For the test dataset the corresponding class membership is not known.
s:finite set of size of classes .
ntrain:a vector of length containing numbers of observations in train datasets .
P:a vector of length containing prior probabilities .
ntest:number of observations in test dataset .
Zt:a factor of true class membership for the test dataset.
Zp:a factor of predictive class membership for the test dataset.
CM:a table containing confusion matrix for multiclass classifier. It contains
number of test observations with the true class that are classified into the class , where .
Accuracy:proportion of all test observations that are classified correctly. .
Error:proportion of all test observations that are classified wrongly. .
Precision:a vector containing proportions of predictive observations in class that are
classified correctly into class . .
Sensitivity:a vector containing proportions of test observations in class that are classified
correctly into class . .
Specificity:a vector containing proportions of test observations that are not in class and
are classified into the non class. .
Chunks:a vector containing selected chunks.
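The classification measures listed above can be reproduced from a confusion matrix with a few lines of base R. The sketch below uses made-up true and predictive class memberships and the usual one-versus-rest definitions of true/false positives and negatives; the variable names `TP`, `FP`, `FN` and `TN` are introduced here for illustration and are not slots of the class.

```r
# Hypothetical true (Zt) and predictive (Zp) class memberships for a test dataset.
lev <- c("a", "b", "c")
Zt <- factor(c("a", "a", "a", "b", "b", "c", "c", "c", "c", "a"), levels = lev)
Zp <- factor(c("a", "a", "b", "b", "b", "c", "c", "a", "c", "a"), levels = lev)

# Confusion matrix: rows are true classes, columns are predictive classes.
CM    <- table(Zt, Zp)
ntest <- sum(CM)

# One-versus-rest counts for every class.
TP <- diag(CM)            # correctly classified into the class
FP <- colSums(CM) - TP    # classified into the class but belonging elsewhere
FN <- rowSums(CM) - TP    # belonging to the class but classified elsewhere
TN <- ntest - TP - FP - FN

Accuracy    <- sum(TP) / ntest
Error       <- 1 - Accuracy
Precision   <- TP / (TP + FP)
Sensitivity <- TP / (TP + FN)
Specificity <- TN / (TN + FP)
```

Accuracy and Error are scalars, whereas Precision, Sensitivity and Specificity hold one value per class, matching the vector-valued slots described above.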
Marko Nagode
D. M. Dziuda. Data Mining for Genomics and Proteomics: Analysis of Gene and Protein Expression Data. John Wiley & Sons, New York, 2010.
Returns as default the RCLSMIX algorithm output for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities. If model equals "RCLSMVNORM", the output for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices is returned.

    ## S4 method for signature 'RCLSMIX'
    RCLSMIX(model = "RCLSMIX", x = list(), Dataset = data.frame(),
      Zt = factor(), ...)
    ## ... and for other signatures
    ## S4 method for signature 'RCLSMIX'
    summary(object, ...)
    ## ... and for other signatures


| Argument | Description |
|---|---|
| model | see Methods section below. |
| x | a list of objects of class |
| Dataset | a data frame containing test dataset |
| Zt | a factor of true class membership |
| object | see Methods section below. |
| ... | currently not used. |

Returns an object of class RCLSMIX or RCLSMVNORM.
signature(model = "RCLSMIX")a character giving the default class name "RCLSMIX" for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities.
signature(model = "RCLSMVNORM")a character giving the class name "RCLSMVNORM" for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices.
signature(object = "RCLSMIX")an object of class RCLSMIX.
signature(object = "RCLSMVNORM")an object of class RCLSMVNORM.
Marko Nagode
R. O. Duda and P. E. Hart. Pattern Classification and Scene Analysis. John Wiley & Sons, New York, 1973.

    ## Not run:
    devAskNewPage(ask = TRUE)
    data(adult)
    # Find complete cases.
    adult <- adult[complete.cases(adult),]
    # Replace levels with numbers.
    adult <- as.data.frame(data.matrix(adult))
    # Find numbers of levels.
    cmax <- unlist(lapply(apply(adult[, c(-1, -16)], 2, unique), length))
    cmax
    # Split adult dataset into train and test subsets for two Incomes
    # and remove Type and Income columns.
    Adult <- split(p = list(type = 1, train = 2, test = 1),
      Dataset = adult, class = 16)
    # Estimate number of components, component weights and component parameters
    # for the set of chunks 1:14.
    adultest <- list()
    for (i in 1:14) {
      adultest[[i]] <- REBMIX(Dataset = a.train(chunk(Adult, i)),
        Preprocessing = "histogram",
        cmax = min(120, cmax[i]),
        Criterion = "BIC",
        pdf = "Dirac",
        K = 1)
    }
    # Class membership prediction based upon the best first search algorithm.
    adultcla <- BFSMIX(x = adultest,
      Dataset = a.test(Adult),
      Zt = a.Zt(Adult))
    adultcla
    summary(adultcla)
    # Plot selected chunks.
    plot(adultcla, nrow = 5, ncol = 2)
    ## End(Not run)

"REBMIX"
Object of class REBMIX.
Objects can be created by calls of the form new("REBMIX", ...). Accessor methods for the slots are a.Dataset(x = NULL, pos = 0),
a.Preprocessing(x = NULL), a.cmax(x = NULL), a.cmin(x = NULL), a.Criterion(x = NULL), a.Variables(x = NULL),
a.pdf(x = NULL), a.theta1(x = NULL), a.theta2(x = NULL), a.theta3(x = NULL), a.K(x = NULL), a.ymin(x = NULL),
a.ymax(x = NULL), a.ar(x = NULL), a.Restraints(x = NULL), a.Mode(x = NULL), a.w(x = NULL, pos = 0), a.Theta(x = NULL, pos = 0), a.summary(x = NULL, col.name = character(), pos = 0),
a.summary.EM(x = NULL, col.name = character(), pos = 0), a.pos(x = NULL),
a.opt.c(x = NULL), a.opt.IC(x = NULL), a.opt.logL(x = NULL), a.opt.Dmin(x = NULL), a.opt.D(x = NULL), a.all.K(x = NULL), a.all.IC(x = NULL),
a.theta1.all(x = NULL, pos = 1), a.theta2.all(x = NULL, pos = 1) and a.theta3.all(x = NULL, pos = 1), where x, pos and col.name stand for an object of class REBMIX,
a desired slot item and a desired column name, respectively.
Dataset:a list of length of data frames or objects of class Histogram.
Data frames should have size containing d-dimensional datasets. Each of the
columns represents one random variable. Numbers of observations equal the number of rows in the datasets.
Preprocessing:a character vector giving the preprocessing types. One of "histogram", "kernel density estimation" or "k-nearest neighbour".
cmax:maximum number of components . The default value is 15.
cmin:minimum number of components . The default value is 1.
Criterion:a character giving the information criterion type. One of default Akaike "AIC", "AIC3", "AIC4" or "AICc",
Bayesian "BIC", consistent Akaike "CAIC", Hannan-Quinn "HQC", minimum description length "MDL2" or "MDL5",
approximate weight of evidence "AWE", classification likelihood "CLC",
integrated classification likelihood "ICL" or "ICL-BIC", partition coefficient "PC",
total of positive relative deviations "D" or sum of squares error "SSE".
Variables:a character vector of length containing types of variables. One of "continuous" or "discrete".
pdf:a character vector of length containing continuous or discrete parametric family types. One of "normal", "lognormal", "Weibull", "gamma", "Gumbel", "binomial", "Poisson", "Dirac", "uniform" or "vonMises".
theta1:a vector of length containing initial component parameters. One of for "binomial" distribution.
theta2:a vector of length containing initial component parameters. Currently not used.
theta3:a vector of length containing initial component parameters. One of for "Gumbel" distribution.
K:a character or a vector or a list of vectors containing numbers of bins for the histogram and the kernel density estimation or numbers of nearest
neighbours for the k-nearest neighbour. There is no genuine rule to identify or . Consequently,
the REBMIX algorithm identifies them from the set K of input values by
minimizing the information criterion. The Sturges rule , rule or RootN
rule can be applied to estimate the limiting numbers of bins
or the rule of thumb to guess the intermediate number of nearest neighbours. If, e.g., K = c(10, 20, 40, 60) and the minimum IC coincides with, e.g., 40, the brackets are set to 20 and 60 and the golden section search is applied to refine the minimum. See also kseq for generating sequences of bins or nearest neighbours. The default value is "auto".
ymin:a vector of length containing minimum observations. The default value is numeric().
ymax:a vector of length containing maximum observations. The default value is numeric().
ar:acceleration rate . The default value is 0.1 and in most cases does not have to be altered.
Restraints:a character giving the restraints type. One of "rigid" or default "loose".
The rigid restraints are obsolete and applicable for well separated components only.
Mode:a character giving the mode type. One of "all", "outliers" or default "outliersplus". The modes are determined in decreasing order of magnitude from all observations if Mode = "all".
If Mode = "outliers", the modes are determined in decreasing order of magnitude from outliers only. In the meantime, some outliers are reclassified as inliers. Finally, when all observations are inliers, the procedure is completed.
If Mode = "outliersplus", the modes are determined in decreasing magnitude from the outliers only. In the meantime, some outliers are reclassified as inliers. Finally, if all observations are inliers, they are converted to outliers and the mode determination procedure is continued.
w:a list of vectors of length containing component weights summing to 1.
Theta:a list of lists each containing parametric family types pdfl. One of "normal", "lognormal", "Weibull", "gamma", "Gumbel", "binomial", "Poisson", "Dirac", "uniform" or circular "vonMises" defined for .
Component parameters theta1.l follow the parametric family types. One of for normal, lognormal, Gumbel and von Mises distributions, for Weibull, gamma, binomial, Poisson and Dirac distributions and for uniform distribution.
Component parameters theta2.l follow theta1.l. One of for normal, lognormal and Gumbel distributions, for Weibull and gamma distributions, for binomial distribution, for von Mises distribution and for uniform distribution.
Component parameters theta3.l follow theta2.l. One of for Gumbel distribution.
summary:a data frame with additional information about dataset, preprocessing, , , information criterion type,
, restraints type, mode type, optimal , optimal or , , , , , optimal ,
information criterion , log likelihood and degrees of freedom .
summary.EM:a data frame with additional information about dataset, strategy for the EM algorithm (strategy), variant of the EM algorithm (variant), acceleration type (acceleration), tolerance (tolerance), acceleration multiplier (acceleration.multiplier), maximum allowed number of iterations (maximum.iterations), number of iterations used for obtaining the optimal solution (opt.iterations.nbr) and total number of iterations of the EM algorithm (total.iterations.nbr).
pos:position in the summary data frame at which log likelihood attains its maximum.
opt.c:a list of vectors containing numbers of components for the optimal number of bins for the histogram and the kernel density estimation or for the optimal number of nearest neighbours for the k-nearest neighbour.
opt.IC:a list of vectors containing information criteria for the optimal number of bins for the histogram and the kernel density estimation or for the optimal number of nearest neighbours for the k-nearest neighbour.
opt.logL:a list of vectors containing log likelihoods for the optimal number of bins for the histogram and the kernel density estimation or for the optimal number of nearest neighbours for the k-nearest neighbour.
opt.Dmin:a list of vectors containing Dmin values for the optimal number of bins for the histogram and the kernel density estimation or for the optimal number of nearest neighbours for the k-nearest neighbour.
opt.D:a list of vectors containing totals of positive relative deviations for the optimal number of bins for the histogram and the kernel density estimation or for the optimal number of nearest neighbours for the k-nearest neighbour.
all.K:a list of vectors containing all processed numbers of bins for the histogram and the kernel density estimation or all processed numbers of nearest neighbours for the k-nearest neighbour.
all.IC:a list of vectors containing information criteria for all processed numbers of bins for the histogram and the kernel density estimation or for all processed numbers of nearest neighbours for the k-nearest neighbour.
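The bracketing and golden section refinement described for the K slot can be sketched in base R. This is an illustrative sketch only and not the package's implementation: the helper `golden_section_int` and the toy criterion `ic_toy` are hypothetical names, and the criterion is assumed to be unimodal between the two brackets.

```r
# Golden section search for the integer K in [lower, upper] that minimizes ic(K).
# Assumption: ic is unimodal on the bracket, as in the K = c(10, 20, 40, 60) case
# where the coarse minimum at 40 is bracketed by 20 and 60.
golden_section_int <- function(ic, lower, upper) {
  phi <- (sqrt(5) - 1) / 2              # inverse golden ratio, about 0.618
  a <- lower
  b <- upper
  while (b - a > 2) {
    x1 <- round(b - phi * (b - a))      # lower interior point
    x2 <- round(a + phi * (b - a))      # upper interior point
    if (x1 == x2) x2 <- x1 + 1          # keep two distinct interior points
    if (ic(x1) <= ic(x2)) b <- x2 else a <- x1
  }
  # At most three candidates remain; evaluate them directly.
  candidates <- a:b
  candidates[which.min(vapply(candidates, ic, numeric(1)))]
}

# Toy information criterion with its minimum at K = 37 (hypothetical).
ic_toy <- function(K) (K - 37)^2 + 100

K_opt <- golden_section_int(ic_toy, lower = 20, upper = 60)
K_opt
```

Each pass discards the part of the bracket that cannot contain the minimum, so far fewer values of K are evaluated than in an exhaustive scan from 20 to 60.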
Marko Nagode
Returns as default the REBMIX algorithm output for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities. If model equals "REBMVNORM" output for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices is returned.
    ## S4 method for signature 'REBMIX'
    REBMIX(model = "REBMIX",
       Dataset = list(),
       Preprocessing = character(),
       cmax = 15,
       cmin = 1,
       Criterion = "AIC",
       pdf = character(),
       theta1 = numeric(),
       theta2 = numeric(),
       theta3 = numeric(),
       K = "auto",
       ymin = numeric(),
       ymax = numeric(),
       ar = 0.1,
       Restraints = "loose",
       Mode = "outliersplus",
       EMcontrol = NULL, ...)
    ## ... and for other signatures
    ## S4 method for signature 'REBMIX'
    summary(object, ...)
    ## ... and for other signatures
| model | see Methods section below. |
|---|---|
| Dataset | a list of length |
| Preprocessing | a character giving the preprocessing type. One of |
| cmax | maximum number of components |
| cmin | minimum number of components |
| Criterion | a character giving the information criterion type. One of default Akaike |
| pdf | a character vector of length |
| theta1 | a vector of length |
| theta2 | a vector of length |
| theta3 | a vector of length |
| K | a character or a vector or a matrix of size |
| ymin | a vector of length |
| ymax | a vector of length |
| ar | acceleration rate |
| Restraints | a character giving the restraints type. One of |
| Mode | a character giving the mode type. One of |
| EMcontrol | an object of class |
| object | see Methods section below. |
| ... | currently not used. |
Returns an object of class REBMIX or REBMVNORM.
signature(model = "REBMIX"):a character giving the default class name "REBMIX" for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities.
signature(model = "REBMVNORM"):a character giving the class name "REBMVNORM" for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices.
signature(object = "REBMIX"):an object of class REBMIX.
signature(object = "REBMVNORM"):an object of class REBMVNORM.
Marko Nagode
H. A. Sturges. The choice of a class interval. Journal of the American Statistical Association, 21(153):65-66, 1926. https://www.jstor.org/stable/2965501.
P. F. Velleman. Interactive computing for exploratory data analysis I: display algorithms. Proceedings of the Statistical Computing Section,
American Statistical Association, 1976.
W. J. Dixon and R. A. Kronmal. The Choice of origin and scale for graphs. Journal of the ACM, 12(2):
259-261, 1965. doi:10.1145/321264.321277.
M. Nagode and M. Fajdiga. A general multi-modal probability density function suitable for the
rainflow ranges of stationary random processes. International Journal of Fatigue, 20(3):211-223,
1998. doi:10.1016/S0142-1123(97)00106-0.
M. Nagode and M. Fajdiga. An improved algorithm for parameter estimation suitable for mixed
Weibull distributions. International Journal of Fatigue, 22(1):75-80, 2000. doi:10.1016/S0142-1123(99)00112-7.
M. Nagode, J. Klemenc and M. Fajdiga. Parametric modelling and scatter prediction of rainflow
matrices. International Journal of Fatigue, 23(6):525-532, 2001. doi:10.1016/S0142-1123(01)00007-X.
M. Nagode and M. Fajdiga. An alternative perspective on the mixture estimation problem. Reliability
Engineering & System Safety, 91(4):388-397, 2006. doi:10.1016/j.ress.2005.02.005.
M. Nagode and M. Fajdiga. The rebmix algorithm for the univariate finite mixture estimation.
Communications in Statistics - Theory and Methods, 40(5):876-892, 2011a. doi:10.1080/03610920903480890.
M. Nagode and M. Fajdiga. The rebmix algorithm for the multivariate finite mixture estimation.
Communications in Statistics - Theory and Methods, 40(11):2022-2034, 2011b. doi:10.1080/03610921003725788.
M. Nagode. Finite mixture modeling via REBMIX.
Journal of Algorithms and Optimization, 3(2):14-28, 2015. https://repozitorij.uni-lj.si/Dokument.php?id=127674&lang=eng.
B. Panic, J. Klemenc, M. Nagode. Improved initialization of the EM algorithm for mixture model parameter estimation.
Mathematics, 8(3):373, 2020.
doi:10.3390/math8030373.
    # Generate and plot univariate normal dataset.
    n <- c(998, 263, 1086, 487)
    Theta <- new("RNGMIX.Theta", c = 4, pdf = "normal")
    a.theta1(Theta) <- c(688, 265, 30, 934)
    a.theta2(Theta) <- c(72, 54, 34, 28)
    normal <- RNGMIX(Dataset.name = "complex1", rseed = -1, n = n, Theta = a.Theta(Theta))
    normal
    a.Dataset(normal, 1)[1:20,]

    # Estimate number of components, component weights and component parameters.
    normalest <- REBMIX(Dataset = a.Dataset(normal), Preprocessing = "h", cmax = 8, Criterion = "BIC", pdf = "n")
    normalest
    BIC(normalest)
    logL(normalest)

    # Plot finite mixture.
    plot(normalest, nrow = 2, what = c("pdf", "marginal cdf"), npts = 1000)

    # EM algorithm utilization
    # Load iris data.
    data(iris)
    Dataset <- list(data.frame(iris[, c(1:4)]))

    # Create EM.Control object.
    EM <- new("EM.Control", strategy = "exhaustive", variant = "EM", acceleration = "fixed", tolerance = 1e-4, acceleration.multiplier = 1.0, maximum.iterations = 1000)

    # Mixture parameter estimation using REBMIX and EM algorithm.
    irisest <- REBMIX(model = "REBMVNORM", Dataset = Dataset, Preprocessing = "histogram", cmax = 10, Criterion = "BIC", EMcontrol = EM)
    irisest

    # Print total number of EM iterations used in the exhaustive strategy from the summary.EM slot.
    a.summary.EM(irisest, col.name = "total.iterations.nbr", pos = 1)
"REBMIX.boot"
Object of class REBMIX.boot.
Objects can be created by calls of the form new("REBMIX.boot", ...). Accessor methods for the slots are a.rseed(x = NULL),
a.pos(x = NULL), a.Bootstrap(x = NULL), a.B(x = NULL), a.n(x = NULL), a.replace(x = NULL), a.prob(x = NULL),
a.c(x = NULL), a.c.se(x = NULL), a.c.cv(x = NULL), a.c.mode(x = NULL), a.c.prob(x = NULL), a.w(x = NULL),
a.w.se(x = NULL), a.w.cv(x = NULL), a.Theta(x = NULL), a.Theta.se(x = NULL) and a.Theta.cv(x = NULL), where x stands for an object of class REBMIX.boot.
x:an object of class REBMIX.
rseed:set the random seed to any negative integer value to initialize the sequence. The first bootstrap dataset corresponds to it. For each next bootstrap dataset the random seed is decremented. The default value is -1.
pos:a desired row number in x@summary to be bootstrapped. The default value is 1.
Bootstrap:a character giving the bootstrap type. One of default "parametric" or "nonparametric".
B:number of bootstrap datasets. The default value is 100.
n:number of observations. The default value is numeric().
replace:logical. The sampling is with replacement if TRUE, see also sample. The default value is TRUE.
prob:a vector of length containing probability weights, see also sample. The default value is numeric().
c:a vector containing numbers of components for bootstrap datasets.
c.se:standard error of numbers of components c.
c.cv:coefficient of variation of numbers of components c.
c.mode:mode of numbers of components c.
c.prob:probability of mode c.mode.
w:a matrix containing component weights for bootstrap datasets.
w.se:a vector containing standard errors of component weights w.
w.cv:a vector containing coefficients of variation of component weights w.
Theta:a list of matrices containing component parameters theta1.l, theta2.l and theta3.l for bootstrap datasets.
Theta.se:a list of vectors containing standard errors of component parameters theta1.l, theta2.l and theta3.l.
Theta.cv:a list of vectors containing coefficients of variation of component parameters theta1.l, theta2.l and theta3.l.
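The summary slots c.se, c.cv, c.mode and c.prob can be illustrated with base R on a made-up vector of bootstrap component counts. This is a sketch under stated assumptions: the values in `c_boot` are invented, and the standard error is assumed to be the sample standard deviation of the bootstrap replicates, which is the usual bootstrap convention and not confirmed here to be the package's exact formula.

```r
# Hypothetical numbers of components estimated on B = 10 bootstrap datasets.
c_boot <- c(4, 4, 5, 4, 3, 4, 4, 5, 4, 4)

# Standard error (assumed: standard deviation of the bootstrap replicates).
c_se <- sd(c_boot)

# Coefficient of variation: standard error relative to the mean.
c_cv <- c_se / mean(c_boot)

# Mode of the numbers of components and the relative frequency of that mode.
counts <- table(c_boot)
c_mode <- as.numeric(names(counts)[which.max(counts)])
c_prob <- as.numeric(max(counts)) / length(c_boot)

c(se = c_se, cv = c_cv, mode = c_mode, prob = c_prob)
```

A mode probability close to 1 suggests that the estimated number of components is stable across bootstrap datasets.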
Marko Nagode
"RNGMIX"
Object of class RNGMIX.
Objects can be created by calls of the form new("RNGMIX", ...). Accessor methods for the slots are a.Dataset.name(x = NULL),
a.rseed(x = NULL), a.n(x = NULL), a.Theta(x = NULL), a.Dataset(x = NULL, pos = 0),
a.Zt(x = NULL), a.w(x = NULL), a.Variables(x = NULL), a.ymin(x = NULL) and a.ymax(x = NULL),
where x and pos stand for an object of class RNGMIX and a desired slot item, respectively.
Dataset.name:a character vector containing list names of data frames of size that d-dimensional datasets are written in.
rseed:set the random seed to any negative integer value to initialize the sequence. The first file in Dataset.name corresponds to it. For each next file the random seed is decremented. The default value is -1.
n:a vector containing numbers of observations in classes , where number of observations .
Theta:a list containing parametric family types pdfl. One of "normal", "lognormal", "Weibull", "gamma", "Gumbel", "binomial", "Poisson", "Dirac", "uniform" or circular "vonMises" defined for .
Component parameters theta1.l follow the parametric family types. One of for normal, lognormal, Gumbel and von Mises distributions, for Weibull, gamma, binomial, Poisson and Dirac distributions and for uniform distribution.
Component parameters theta2.l follow theta1.l. One of for normal, lognormal and Gumbel distributions, for Weibull and gamma distributions, for binomial distribution, for von Mises distribution and for uniform distribution.
Component parameters theta3.l follow theta2.l. One of for Gumbel distribution.
Dataset:a list of length of data frames of size containing d-dimensional datasets. Each of the columns represents one random variable. Numbers of observations equal the number of rows
in the datasets.
Zt:a factor of true cluster membership.
w:a vector of length c containing component weights summing to 1.
Variables:a character vector containing types of variables. One of "continuous" or "discrete".
ymin:a vector of length d containing minimum observations.
ymax:a vector of length d containing maximum observations.
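The relationship between the slots n, Dataset, Zt and w can be sketched with base R for a univariate normal mixture. This is not the package's generator (it uses `rnorm` instead of ran1), and two points are assumptions: that theta1 and theta2 are the mean and standard deviation for the normal family, and that the weights equal the class sizes divided by the total number of observations.

```r
set.seed(1)

# Class sizes and component parameters (values borrowed from the REBMIX example).
n     <- c(998, 263, 1086, 487)
mu    <- c(688, 265, 30, 934)   # assumed meaning of theta1 for "normal"
sigma <- c(72, 54, 34, 28)      # assumed meaning of theta2 for "normal"

# Draw each class from its own normal component and stack the observations.
y <- unlist(Map(function(nl, m, s) rnorm(nl, mean = m, sd = s), n, mu, sigma))

# One data frame with a single column (d = 1); rows are observations.
Dataset <- data.frame(y = y)

# True cluster membership and component weights.
Zt <- factor(rep(seq_along(n), times = n))
w  <- n / sum(n)

# Observed range, analogous to ymin and ymax.
ymin <- min(Dataset$y)
ymax <- max(Dataset$y)
```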
Marko Nagode
Returns as default the RNGMIX univariate or multivariate random datasets for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities.
If model equals "RNGMVNORM" multivariate random datasets for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices are returned.
    ## S4 method for signature 'RNGMIX'
    RNGMIX(model = "RNGMIX",
       Dataset.name = character(),
       rseed = -1,
       n = numeric(),
       Theta = list(), ...)
    ## ... and for other signatures
| model | see Methods section below. |
|---|---|
| Dataset.name | a character vector containing list names of data frames of size |
| rseed | set the random seed to any negative integer value to initialize the sequence. The first file in |
| n | a vector containing numbers of observations in classes |
| Theta | a list containing |
| ... | currently not used. |
RNGMIX is based on the "Minimal" random number generator ran1 of Park and Miller with the Bays-Durham shuffle and added safeguards that returns a uniform random deviate between 0.0 and 1.0
(exclusive of the endpoint values).
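The generator described above can be sketched in R. This is a minimal illustration written from the published description of the Park and Miller minimal standard generator with a Bays-Durham shuffle, not the package's C code; `pm_step` and `make_ran1` are hypothetical names, and the table size of 32 and the cap just below 1 are taken from the usual Numerical Recipes formulation.

```r
# Park-Miller "minimal standard" step: x -> (16807 * x) mod (2^31 - 1).
# The product stays below 2^53, so double arithmetic is exact here.
pm_step <- function(x) (16807 * x) %% 2147483647

# Build a generator closure with a Bays-Durham shuffle table of size ntab.
make_ran1 <- function(seed = -1, ntab = 32) {
  im    <- 2147483647
  ndiv  <- 1 + floor((im - 1) / ntab)
  state <- max(abs(seed), 1)             # a negative seed initializes the sequence
  iv    <- numeric(ntab)
  # Eight warm-up steps, then fill the shuffle table from the top down.
  for (j in seq(ntab + 7, 0)) {
    state <- pm_step(state)
    if (j < ntab) iv[j + 1] <- state
  }
  iy <- iv[1]
  function() {
    state <<- pm_step(state)             # advance the underlying generator
    j <- floor(iy / ndiv) + 1            # table slot chosen by the previous output
    iy <<- iv[j]                         # output the stored value ...
    iv[j] <<- state                      # ... and refill the slot
    min(iy / im, 1 - 1.2e-7)             # safeguard: never return exactly 1
  }
}

ran1 <- make_ran1(seed = -1)
u <- replicate(5, ran1())
u
```

The shuffle breaks up low-order serial correlations of the plain multiplicative generator, and the cap keeps the deviates strictly inside the open interval from 0.0 to 1.0.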
Returns an object of class RNGMIX or RNGMVNORM.
signature(model = "RNGMIX")a character giving the default class name "RNGMIX" for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities.
signature(model = "RNGMVNORM")a character giving the class name "RNGMVNORM" for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices.
Marko Nagode
W. H. Press, S. A. Teukolsky, W. T. Vetterling and B. P. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, Cambridge, 1992.
    devAskNewPage(ask = TRUE)

    # Generate and print multivariate normal datasets with diagonal
    # variance-covariance matrices.
    n <- c(75, 100, 125, 150, 175)
    Theta <- new("RNGMIX.Theta", c = 5, pdf = rep("normal", 4))
    a.theta1(Theta, 1) <- c(10, 12, 10, 12)
    a.theta1(Theta, 2) <- c(8.5, 10.5, 8.5, 10.5)
    a.theta1(Theta, 3) <- c(12, 14, 12, 14)
    a.theta1(Theta, 4) <- c(13, 15, 7, 9)
    a.theta1(Theta, 5) <- c(7, 9, 13, 15)
    a.theta2(Theta, 1) <- c(1, 1, 1, 1)
    a.theta2(Theta, 2) <- c(1, 1, 1, 1)
    a.theta2(Theta, 3) <- c(1, 1, 1, 1)
    a.theta2(Theta, 4) <- c(2, 2, 2, 2)
    a.theta2(Theta, 5) <- c(3, 3, 3, 3)
    simulated <- RNGMIX(Dataset.name = paste("simulated_", 1:25, sep = ""), rseed = -1, n = n, Theta = a.Theta(Theta))
    simulated
    plot(simulated, pos = 22, nrow = 2, ncol = 3)

    # Generate and print multivariate normal datasets with unrestricted
    # variance-covariance matrices.
    n <- c(200, 50, 50)
    Theta <- new("RNGMVNORM.Theta", c = 3, d = 3)
    a.theta1(Theta, 1) <- c(0, 0, 0)
    a.theta1(Theta, 2) <- c(-6, 3, 6)
    a.theta1(Theta, 3) <- c(6, 6, 4)
    a.theta2(Theta, 1) <- c(9, 0, 0, 0, 4, 0, 0, 0, 1)
    a.theta2(Theta, 2) <- c(4, -3.2, -0.2, -3.2, 4, 0, -0.2, 0, 1)
    a.theta2(Theta, 3) <- c(4, 3.2, 2.8, 3.2, 4, 2.4, 2.8, 2.4, 2)
    simulated <- RNGMIX(model = "RNGMVNORM", Dataset.name = paste("simulated_", 1:2, sep = ""), rseed = -1, n = n, Theta = a.Theta(Theta))
    simulated
    plot(simulated, pos = 2, nrow = 3, ncol = 1)

    # Generate and print multivariate mixed continuous-discrete datasets.
    n <- c(400, 100, 500)
    Theta <- new("RNGMIX.Theta", c = 3, pdf = c("lognormal", "Poisson", "binomial", "Weibull"))
    a.theta1(Theta, 1) <- c(1, 2, 10, 2)
    a.theta1(Theta, 2) <- c(3.5, 10, 10, 10)
    a.theta1(Theta, 3) <- c(2.5, 15, 10, 25)
    a.theta2(Theta, 1) <- c(0.3, NA, 0.9, 3)
    a.theta2(Theta, 2) <- c(0.2, NA, 0.1, 7)
    a.theta2(Theta, 3) <- c(0.4, NA, 0.7, 20)
    simulated <- RNGMIX(Dataset.name = paste("simulated_", 1:5, sep = ""), rseed = -1, n = n, Theta = a.Theta(Theta))
    simulated
    plot(simulated, pos = 4, nrow = 2, ncol = 3)

    # Generate and print univariate mixed Weibull dataset.
    n <- c(75, 100, 125, 150, 175)
    Theta <- new("RNGMIX.Theta", c = 5, pdf = "Weibull")
    a.theta1(Theta) <- c(12, 10, 14, 15, 9)
    a.theta2(Theta) <- c(2, 4.1, 3.2, 7.1, 5.3)
    simulated <- RNGMIX(Dataset.name = "simulated", rseed = -1, n = n, Theta = a.Theta(Theta))
    simulated
    plot(simulated, pos = 1)

    # Generate and print multivariate normal datasets with unrestricted
    # variance-covariance matrices.
    # Set dimension, dataset size, number of components and seed.
    d <- 2; n <- 1000; c <- 10; set.seed(123)

    # Component weights are generated.
    w <- runif(c, 0.1, 0.9); w <- w / sum(w)

    # Set range of means and range of eigenvalues.
    mu <- c(-100, 100); lambda <- c(1, 100)

    # Component means and variance-covariance matrices are calculated.
    Mu <- list(); Sigma <- list()
    for (l in 1:c) {
      Mu[[l]] <- runif(d, mu[1], mu[2])
      # Random eigenvalues on the diagonal and a random orthogonal matrix.
      Lambda <- diag(runif(d, lambda[1], lambda[2]), nrow = d, ncol = d)
      P <- svd(matrix(runif(d * d, -1, 1), ncol = d))$u
      # Variance-covariance matrix with eigenvalues Lambda and eigenvectors P.
      Sigma[[l]] <- P %*% Lambda %*% t(P)
    }

    # Numbers of observations are calculated and component means and
    # variance-covariance matrices are stored.
    n <- round(w * n); Theta <- list()
    for (l in 1:c) {
      Theta[[paste0("pdf", l)]] <- rep("normal", d)
      Theta[[paste0("theta1.", l)]] <- Mu[[l]]
      Theta[[paste0("theta2.", l)]] <- as.vector(Sigma[[l]])
    }

    # Dataset is generated.
    simulated <- RNGMIX(model = "RNGMVNORM", Dataset.name = "mvnorm_1", rseed = -1, n = n, Theta = Theta)
    plot(simulated)

    # Generate and print bivariate mixed uniform-Gumbel dataset.
    n <- c(100, 150)
    Theta <- new("RNGMIX.Theta", c = 2, pdf = c("uniform", "Gumbel"))
    a.theta1(Theta, l = 1) <- c(2, 10)
    a.theta2(Theta, l = 1) <- c(10, 2.3)
    a.theta3(Theta, l = 1) <- c(NA, 1.0)
    a.theta1(Theta, l = 2) <- c(10, 50)
    a.theta2(Theta, l = 2) <- c(30, 4.2)
    a.theta3(Theta, l = 2) <- c(NA, -1.0)
    simulated <- RNGMIX(Dataset.name = paste("simulated_", 1, sep = ""), rseed = -1, n = n, Theta = a.Theta(Theta))
    plot(simulated)
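The random unrestricted variance-covariance matrices in the example above are built from a diagonal matrix of eigenvalues and a random orthogonal matrix. A base R sketch of that construction follows, assuming the standard spectral form in which the covariance is the orthogonal matrix times the eigenvalue matrix times the transpose of the orthogonal matrix; it only checks that the result is a valid covariance matrix and does not call the package.

```r
set.seed(123)
d <- 2

# Random eigenvalues between 1 and 100 on the diagonal.
Lambda <- diag(runif(d, 1, 100), nrow = d, ncol = d)

# Random orthogonal matrix: left singular vectors of a random square matrix.
P <- svd(matrix(runif(d * d, -1, 1), ncol = d))$u

# Spectral construction of a symmetric positive definite matrix.
Sigma <- P %*% Lambda %*% t(P)

# The result is symmetric and its eigenvalues are the chosen ones.
isSymmetric(Sigma, tol = 1e-8)
ev <- eigen(Sigma, symmetric = TRUE)$values
all.equal(ev, sort(diag(Lambda), decreasing = TRUE))
```

Because all eigenvalues are drawn from a strictly positive range, every matrix produced this way is positive definite and can be flattened with `as.vector` in the same way as in the example.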
"RNGMIX.Theta"
Object of class RNGMIX.Theta.
Objects can be created by calls of the form new("RNGMIX.Theta", ...). Accessor methods for the slots are a.c(x = NULL), a.d(x = NULL),
a.pdf(x = NULL) and a.Theta(x = NULL), where x stands for an object of class RNGMIX.Theta. Setter methods
a.theta1(x = NULL, l = numeric()), a.theta2(x = NULL, l = numeric()) and a.theta3(x = NULL, l = numeric()),
a.theta1.all(x = NULL), a.theta2.all(x = NULL) and a.theta3.all(x = NULL)
are provided to write to Theta slot, where .
c:number of components . The default value is 1.
d:number of dimensions.
pdf:a character vector of length containing continuous or discrete parametric family types. One of "normal", "lognormal", "Weibull", "gamma", "Gumbel", "binomial", "Poisson", "Dirac", "uniform" or "vonMises".
Theta:a list containing parametric family types pdfl. One of "normal", "lognormal", "Weibull", "gamma", "Gumbel", "binomial", "Poisson", "Dirac", "uniform" or circular "vonMises" defined for .
Component parameters theta1.l follow the parametric family types. One of for normal, lognormal, Gumbel and von Mises distributions, for Weibull, gamma, binomial, Poisson and Dirac distributions and for uniform distribution.
Component parameters theta2.l follow theta1.l. One of for normal, lognormal and Gumbel distributions, for Weibull and gamma distributions, for binomial distribution, for von Mises distribution and for uniform distribution.
Component parameters theta3.l follow theta2.l. One of for Gumbel distribution.
Marko Nagode
    Theta <- new("RNGMIX.Theta", c = 2, pdf = c("normal", "Gumbel"))
    a.theta1(Theta, l = 1) <- c(2, 10)
    a.theta2(Theta, l = 1) <- c(0.5, 2.3)
    a.theta3(Theta, l = 1) <- c(NA, 1.0)
    a.theta1(Theta, l = 2) <- c(20, 50)
    a.theta2(Theta, l = 2) <- c(3, 4.2)
    a.theta3(Theta, l = 2) <- c(NA, -1.0)
    Theta

    Theta <- new("RNGMIX.Theta", c = 2, pdf = c("normal", "Gumbel"))
    a.theta1.all(Theta) <- c(2, 10, 20, 50)
    a.theta2.all(Theta) <- c(0.5, 2.3, 3, 4.2)
    a.theta3.all(Theta) <- c(NA, 1.0, NA, -1.0)
    Theta

    Theta <- new("RNGMVNORM.Theta", c = 2, d = 3)
    a.theta1(Theta, l = 1) <- c(2, 10, -20)
    a.theta1(Theta, l = 2) <- c(-2.4, -15.1, 30)
    Theta
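Judging from the two equivalent setups in the example above, the `.all` setters appear to take the per-component vectors concatenated in component order. A base R sketch of that layout follows; the variable names are hypothetical and the ordering is inferred from the example values, not from the package source.

```r
# Per-component first parameters, one vector of length d = 2 per component.
theta1_by_component <- list(c(2, 10), c(20, 50))

# Concatenating them in component order gives the vector passed to a.theta1.all.
theta1_all <- unlist(theta1_by_component)
theta1_all

# Viewed as a matrix: one row per component, one column per variable.
theta1_matrix <- matrix(theta1_all, nrow = length(theta1_by_component), byrow = TRUE)
theta1_matrix

# NA marks parameters a family does not use, e.g. theta3 for the normal variable.
theta3_all <- unlist(list(c(NA, 1.0), c(NA, -1.0)))
theta3_all
```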
These data are the results of a sensorless drive diagnosis procedure. Features are extracted from the electric current drive signals. The drive has intact and defective components, which results in 11 different classes with different conditions. Each condition has been measured several times under 12 different operating conditions, i.e., at different speeds, load moments and load forces. The current signals are measured with a current probe and an oscilloscope on two phases. The original dataset contains 49 features; here only 3 are used, namely features 5, 7 and 11. The first class (1) contains the healthy drives and the remaining classes contain the drives with faulty components.
data(sensorlessdrive)
sensorlessdrive is a data frame with 58509 cases (rows) and 4 variables (columns) named:
V5 continuous.
V7 continuous.
V11 continuous.
Class discrete 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or 11.
A. Asuncion and D. J. Newman. UCI machine learning repository, 2007. http://archive.ics.uci.edu/ml/.
F. Paschke, C. Bayer, M. Bator, U. Moenks, A. Dicks, O. Enge-Rosenblatt and V. Lohweg. Sensorlose Zustandsueberwachung an Synchronmotoren.
23. Workshop Computational Intelligence VDI/VDE-Gesellschaft Mess- und Automatisierungstechnik (GMA), 2013.
M. Bator, A. Dicks, U. Moenks and V. Lohweg. Feature extraction and reduction applied to sensorless drive diagnosis.
22. Workshop Computational Intelligence VDI/VDE-Gesellschaft Mess- und Automatisierungstechnik (GMA), 2012. doi:10.13140/2.1.2421.5689.
    ## Not run: 
    data(sensorlessdrive)

    # Split dataset into train (75%) and test (25%) subsets.
    set.seed(3)
    Drive <- split(p = 0.75, Dataset = sensorlessdrive, class = 4)

    # Estimate number of components, component weights and component
    # parameters for train subsets.
    driveest <- REBMIX(model = "REBMVNORM", Dataset = a.train(Drive), Preprocessing = "histogram", cmax = 15, Criterion = "BIC")

    # Classification.
    drivecla <- RCLSMIX(model = "RCLSMVNORM", x = list(driveest), Dataset = a.test(Drive), Zt = a.Zt(Drive))
    drivecla
    summary(drivecla)
    ## End(Not run)
Returns (invisibly) the object containing train and test observations as well as true class membership for the test dataset.
    ## S4 method for signature 'numeric'
    split(p = 0.75, Dataset = data.frame(), class = numeric(), ...)
    ## S4 method for signature 'list'
    split(p = list(), Dataset = data.frame(), class = numeric(), ...)
    ## ... and for other signatures
| p | see Methods section below. |
|---|---|
| Dataset | a data frame containing dataset |
| class | a column number in |
| ... | further arguments to |
Returns an object of class RCLS.chunk.
signature(p = "numeric"):a number specifying the fraction of observations for training. The default value is 0.75.
signature(p = "list"):a list composed of column number p$type in Dataset containing the type membership information followed by the corresponding train p$train and test p$test values. The default value is list().
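The idea behind the numeric method can be sketched in base R on the iris data. This is an assumption-laden illustration and not the package's algorithm: it assumes the fraction p is applied within each class and that the training part is kept as one data frame per class, which would match how a.train() is passed as Dataset to REBMIX in the examples. It calls `base::split` explicitly so that it does not depend on the S4 generic documented here.

```r
set.seed(5)
p <- 0.75

# Group the observations by the class column (column 5, Species).
by_class <- base::split(iris, iris$Species)

# Within each class, draw the row indices that go to the training subset.
pick <- lapply(by_class, function(df) sample(nrow(df), size = floor(p * nrow(df))))

# Training data: one data frame per class, class column dropped.
train <- Map(function(df, i) df[i, 1:4], by_class, pick)

# Test data: remaining rows of all classes stacked together.
test_parts <- Map(function(df, i) df[-i, ], by_class, pick)
test_all <- do.call(rbind, test_parts)

# True class membership of the test observations and the test features.
Zt <- test_all$Species
test <- test_all[, 1:4]
```

Keeping the true membership Zt separate from the test features mirrors the Zt argument used for classification in the examples.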
Marko Nagode
    ## Not run: 
    data(iris)

    # Split dataset into train (75%) and test (25%) subsets.
    set.seed(5)
    Iris <- split(p = 0.75, Dataset = iris, class = 5)
    Iris

    # Generate simulated dataset.
    N <- 1000
    class <- c(rep("A", 0.4 * N), rep("B", 0.2 * N), rep("C", 0.1 * N), rep("D", 0.05 * N), rep("E", 0.25 * N))
    type <- c(rep("train", 0.75 * N), rep("test", 0.25 * N))
    n <- 300
    Dataset <- data.frame(1:n, sample(class, n))
    colnames(Dataset) <- c("y", "class")

    # Split dataset into train (60%) and test (40%) subsets.
    simulated <- split(p = 0.6, Dataset = Dataset, class = 2)
    simulated

    # Generate simulated dataset.
    Dataset <- data.frame(1:n, sample(class, n), sample(type, n))
    colnames(Dataset) <- c("y", "class", "type")

    # Split dataset into train and test subsets.
    simulated <- split(p = list(type = 3, train = "train", test = "test"), Dataset = Dataset, class = 2)
    simulated
    ## End(Not run)
Returns the sum of squares error at pos.
    ## S4 method for signature 'REBMIX'
    SSE(x = NULL, pos = 1, ...)
    ## ... and for other signatures
| x | see Methods section below. |
|---|---|
| pos | a desired row number in |
| ... | currently not used. |
signature(x = "REBMIX"):an object of class REBMIX.
signature(x = "REBMVNORM"):an object of class REBMVNORM.
Marko Nagode
C. M. Bishop. Neural Networks for Pattern Recognition. Clarendon Press, Oxford, 1995.
These data are the results of an extraction process from images of faults of steel plates. There are seven different faults: Pastry (1), Z_Scratch (2), K_Scratch (3), Stains (4), Dirtiness (5), Bumps (6), Other faults (7).
data(steelplates)
steelplates is a data frame with 1941 cases (rows) and 28 variables (columns) named:
X_Minimum integer.
X_Maximum integer.
Y_Minimum integer.
Y_Maximum integer.
Pixels_Areas integer.
X_Perimeter integer.
Y_Perimeter integer.
Sum_of_Luminosity integer.
Minimum_of_Luminosity integer.
Maximum_of_Luminosity integer.
Length_of_Conveyer integer.
TypeOfSteel_A300 binary.
TypeOfSteel_A400 binary.
Steel_Plate_Thickness integer.
Edges_Index continuous.
Empty_Index continuous.
Square_Index continuous.
Outside_X_Index continuous.
Edges_X_Index continuous.
Edges_Y_Index continuous.
Outside_Global_Index continuous.
LogOfAreas continuous.
Log_X_Index continuous.
Log_Y_Index continuous.
Orientation_Index continuous.
Luminosity_Index continuous.
SigmoidOfAreas continuous.
Class discrete 1, 2, 3, 4, 5, 6 or 7.
A. Asuncion and D. J. Newman. UCI Machine Learning Repository, 2007. http://archive.ics.uci.edu/ml/.
M. Buscema, S. Terzi, W. Tastle. A new meta-classifier. Annual Conference of the North American Fuzzy Information Processing Society - NAFIPS, 2010. doi:10.1109/NAFIPS.2010.5548298.
M. Buscema. MetaNet*: The theory of independent judges. Substance Use & Misuse. 33(2):439-461, 1998. doi:10.3109/10826089809115875.
## Not run:
data(steelplates)

# Split dataset into train (75%) and test (25%) subsets.

set.seed(3)

Steelplates <- split(p = 0.75, Dataset = steelplates, class = 28)

# Estimate number of components, component weights and component
# parameters for train subsets.

steelplatesest <- REBMIX(model = "REBMVNORM",
  Dataset = a.train(Steelplates),
  Preprocessing = "histogram",
  cmax = 15,
  Criterion = "BIC")

# Classification.

steelplatescla <- RCLSMIX(model = "RCLSMVNORM",
  x = list(steelplatesest),
  Dataset = a.test(Steelplates),
  Zt = a.Zt(Steelplates))

steelplatescla

summary(steelplatescla)
## End(Not run)
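The classification step compares predicted classes with the true test labels passed as Zt. A common way to summarise such a comparison is a confusion table and an error rate; the sketch below shows this in base R under stated assumptions. The label vectors are made up for illustration and are not steelplates results, `Zp` is a hypothetical name for the predictions, and whether summary() reports exactly these quantities is not stated in this section.

```r
# Hypothetical labels for illustration only (not steelplates results).
Zt <- factor(c(1, 1, 2, 2, 3, 3, 3, 1), levels = 1:3)  # true classes
Zp <- factor(c(1, 2, 2, 2, 3, 1, 3, 1), levels = 1:3)  # predicted classes

# Rows are true classes, columns are predicted classes.
confusion <- table(True = Zt, Predicted = Zp)

# Share of observations that fall off the diagonal.
error <- 1 - sum(diag(confusion)) / sum(confusion)

confusion
error   # 0.25
```

Fixing the factor levels in both vectors keeps the table square even when a class is never predicted, so that diag() always lines up true and predicted classes.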
The dataset contains amplitudes and means measured on truck wheels.
data(truck)
truck is a data frame with 31665 rows and 2 variables (columns) named:
Amplitude continuous.
Mean continuous.
Mitja Franko
data(truck)
The complete data are the failure times in weeks.
data(weibull)
weibull is a data frame with 50 cases (rows) and 1 variable (column) named:
Failure.Time continuous.
D. N. P. Murthy, M. Xie and R. Jiang. Weibull Models. John Wiley & Sons, New York, 2003.
data(weibull)
The dataset contains amplitudes and means simulated from a three-component Weibull-normal mixture.
data(weibullnormal)
weibullnormal is a data frame with 10000 rows and 2 variables (columns) named:
Amplitude continuous.
Mean continuous.
Mitja Franko
data(weibullnormal)
These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars (1-3). The analysis determined the quantities of 13 constituents found in each of the three types of wine: alcohol, malic acid, ash, alcalinity of ash, magnesium, total phenols, flavanoids, nonflavanoid phenols, proanthocyanins, colour intensity, hue, OD280/OD315 of diluted wines, and proline. The number of instances in classes 1 to 3 is 59, 71 and 48, respectively.
data(wine)
wine is a data frame with 178 cases (rows) and 14 variables (columns) named:
Alcohol continuous.
Malic.Acid continuous.
Ash continuous.
Alcalinity.of.Ash continuous.
Magnesium continuous.
Total.Phenols continuous.
Flavanoids continuous.
Nonflavanoid.Phenols continuous.
Proanthocyanins continuous.
Color.Intensity continuous.
Hue continuous.
OD280.OD315.of.Diluted.Wines continuous.
Proline continuous.
Cultivar discrete 1, 2 or 3.
A. Asuncion and D. J. Newman. UCI Machine Learning Repository, 2007. http://archive.ics.uci.edu/ml/.
S. J. Roberts, R. Everson and I. Rezek. Maximum certainty data partitioning. Pattern Recognition, 33(5):833-839, 2000. doi:10.1016/S0031-3203(99)00086-2.
## Not run:
devAskNewPage(ask = TRUE)

data(wine)

# Show level attributes.

levels(factor(wine[["Cultivar"]]))

# Split dataset into train (75%) and test (25%) subsets.

set.seed(3)

Wine <- split(p = 0.75, Dataset = wine, class = 14)

# Estimate number of components, component weights and component
# parameters for train subsets.

n <- range(a.ntrain(Wine))

K <- c(as.integer(1 + log2(n[1])), # Minimum v follows Sturges rule.
  as.integer(10 * log10(n[2]))) # Maximum v follows log10 rule.

K <- c(floor(K[1]^(1/13)), ceiling(K[2]^(1/13)))

wineest <- REBMIX(model = "REBMVNORM",
  Dataset = a.train(Wine),
  Preprocessing = "kernel density estimation",
  cmax = 10,
  Criterion = "ICL-BIC",
  pdf = rep("normal", 13),
  K = K[1]:K[2],
  Restraints = "loose",
  Mode = "outliersplus")

plot(wineest, pos = 1, nrow = 7, ncol = 6, what = c("pdf"))
plot(wineest, pos = 2, nrow = 7, ncol = 6, what = c("pdf"))
plot(wineest, pos = 3, nrow = 7, ncol = 6, what = c("pdf"))

# Selected chunks.

winecla <- RCLSMIX(model = "RCLSMVNORM",
  x = list(wineest),
  Dataset = a.test(Wine),
  Zt = a.Zt(Wine))

winecla

summary(winecla)

# Plot selected chunks.

plot(winecla, nrow = 7, ncol = 6)
## End(Not run)
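The bounds on K in the wine example can be traced by hand. The following is a minimal base R sketch, assuming hypothetical smallest and largest per-class training sizes of 36 and 53 (illustrative values, not the output of a.ntrain(Wine)) and d = 13 variables as in the example. Reading v as a total bin count and its 13th root as a per-variable count is an interpretation of the comments in the example, not something this section states.

```r
# Hypothetical smallest and largest per-class training sizes (illustrative only).
n <- c(36, 53)
d <- 13  # number of continuous variables in wine

# Lower bound from the Sturges rule, upper bound from the log10 rule.
v <- c(as.integer(1 + log2(n[1])), as.integer(10 * log10(n[2])))

# d-th root of each bound, rounded outwards so the range is not narrowed.
K <- c(floor(v[1]^(1 / d)), ceiling(v[2]^(1 / d)))

v   # 6 17
K   # 1 2
```

With 13 variables the 13th root shrinks both bounds to values just above 1, so the search range passed as K = K[1]:K[2] is only 1:2 for sample sizes of this order.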