Multivariate methods are now widely used in the quantitative sciences as well as in statistics because of the ready availability of computer packages for performing the calculations. While access to suitable computer software is essential to using multivariate methods, using the software still requires a working knowledge of these methods and how t
In a recent (2009) commentary in the International Journal of Epidemiology, Tu expressed concern about the scarcity of SEM models in epidemiological research and urged epidemiologists to use SEM models more frequently [6]. As a statistical tool for analyzing complex relationships among variables, and even for positing and testing causal relationships with non-experimental data, SEM allows researchers to explain the development of phenomena such as disease and health behaviors. The purpose of the present paper is to consider the potential advances that SEM can make in medical and health sciences research and to provide a five-step approach to implementing SEM research in epidemiology and medical research. First, a description of SEM is provided, followed by applications to research. A broad class of statistical methods, termed 'latent variable models', includes factor analysis, item response theory, latent class models, and structural equation models [7]. The focus of the present paper is on structural equation models and the latent variable models that are included in SEM.
Multivariate Statistical Methods: A Primer
SEM is a set of statistical methods that allows researchers to test hypotheses based on multiple constructs that may be indirectly or directly related for both linear and nonlinear models [36]. It is distinguished from other types of analyses in its ability to examine many relationships while simultaneously partialing out measurement error. It can also examine correlated measurement error to determine to what degree unknown factors influence shared error among variables - which may affect the estimated parameters of the model [37]. It also handles missing data well by fitting raw data instead of summary statistics. SEM, in addition, can be used to analyze dependent observations (e.g., twin and family data). It can, furthermore, manage longitudinal designs such as time series and growth models. For example, Dahly, Adair, and Bollen [22] developed a longitudinal latent variable medical model showing that maternal characteristics during pregnancy predicted children's blood pressure and weight approximately 20 years later while controlling for child's birth weight. Therefore, SEM can be used for a number of research designs.
Multivariate statistical methods are used to analyze the joint behavior of more than one random variable. A wide range of multivariate techniques is available, as the statistical method examples below show. These techniques can be performed with the multivariate statistical analysis procedures in Statgraphics Centurion 19.
The Multivariate Tolerance Limits procedure creates statistical tolerance limits for data consisting of more than one variable. It includes a tolerance region that bounds a selected p% of the population with 100(1-alpha)% confidence. It also includes joint simultaneous tolerance limits for each of the variables using a Bonferroni approach. The data are assumed to be a random sample from a multivariate normal distribution. Multivariate tolerance limits are often compared to specifications for multiple variables to determine whether or not most of the population is within spec.
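The Bonferroni idea behind joint simultaneous tolerance limits can be sketched with the standard library alone. This is a minimal illustration, not Statgraphics' implementation: it assumes multivariate normal data, applies the Bonferroni split to the confidence level (each of the k per-variable intervals uses confidence 1 - alpha/k, so all k hold jointly with confidence at least 1 - alpha), and uses Howe's approximate two-sided tolerance factor with a Wilson-Hilferty approximation to the chi-square quantile. The function names are hypothetical, and the approximations are only reasonable for moderate sample sizes.

```python
import math
from statistics import NormalDist, mean, stdev


def chi2_quantile_wh(q, df):
    """Approximate q-quantile of a chi-square(df) variable (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(q)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * math.sqrt(a)) ** 3


def howe_k(n, p, conf):
    """Howe's approximate two-sided normal tolerance factor.

    mean +/- k * sd is meant to contain at least a proportion p of a normal
    population with confidence `conf`.
    """
    z = NormalDist().inv_cdf((1.0 + p) / 2.0)   # upper (1-p)/2 normal point
    chi2 = chi2_quantile_wh(1.0 - conf, n - 1)  # lower (1-conf) chi-square point
    return math.sqrt((n - 1) * (1.0 + 1.0 / n) * z * z / chi2)


def bonferroni_tolerance_limits(rows, p=0.99, alpha=0.05):
    """Simultaneous per-variable tolerance limits via a Bonferroni split.

    rows: list of observations, each a list of k variable values.
    Returns one (lower, upper) pair per variable.
    """
    n, k = len(rows), len(rows[0])
    if n < 2:
        raise ValueError("need at least two observations")
    # Each variable gets confidence 1 - alpha/k (Bonferroni).
    factor = howe_k(n, p, 1.0 - alpha / k)
    limits = []
    for j in range(k):
        column = [row[j] for row in rows]
        m, s = mean(column), stdev(column)
        limits.append((m - factor * s, m + factor * s))
    return limits
```

Each returned pair could then be compared with that variable's specification limits; because the Bonferroni split only bounds the joint error from above, the resulting limits are somewhat conservative.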
Beta diversity analyses, or community-wide ecological analyses, are important tools for understanding how the entire microbiome differs between experimental conditions, environments, and treatments. For these analyses, specialized distance metrics are used to capture the multivariate relationships between each pair of samples in the dataset. Analysis of variance-like techniques, such as PERMANOVA [1], may then be used to determine whether an overall difference exists between conditions. The distances use all of the measured taxa information simultaneously, without the need to explicitly estimate individual covariances. The utility of these methods is hard to overestimate, as virtually every recent major microbiome report has used some form of community-wide association analysis. On many occasions, the comparison reveals major differences between the groups; however, a difference is not guaranteed. For example, Redel et al. [2] found significant differences in cutaneous microbiota between the feet of diabetic and non-diabetic subjects, but not between their hands (see fig. 5). This lack of difference is an important indicator of the potential pathobiological processes that lead to diabetic foot ulcers. Getting the correct result in such comparisons is therefore important. The Redel et al. analysis requires only pairwise comparisons (diabetic vs. non-diabetic); however, many study designs have more than two groups that need to be considered simultaneously. Dietary intervention studies, among others, often include several experimental groups. For example, the analysis by Cox et al. [3] of the impact of diet on the murine gut microbiome included animal groups receiving low-fat, high-fat, and fiber-supplemented high-fat diets. Although such a design can be treated using multi-way comparisons of dietary fat and dietary fiber, a simultaneous analysis of all three groups can be more intuitive.
Hence, there is a need for methods that can compare more than two experimental groups at the same time. PERMANOVA, among other methods, allows such analyses.
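A minimal sketch of a PERMANOVA-style test for any number of groups follows, assuming Bray-Curtis dissimilarity (a common choice for abundance data; any metric can be passed in) and Anderson's pseudo-F computed from squared distances, with a p-value from random label permutations. The function names are hypothetical and this is not the reference implementation of any package.

```python
import random
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den > 0 else 0.0


def distance_matrix(samples, metric=bray_curtis):
    """Symmetric pairwise distance matrix (list of lists)."""
    n = len(samples)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = metric(samples[i], samples[j])
    return d


def pseudo_f(d, labels):
    """Pseudo-F statistic computed from squared distances only."""
    n = len(labels)
    groups = sorted(set(labels))
    a = len(groups)
    ss_total = sum(d[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in groups:
        idx = [i for i in range(n) if labels[i] == g]
        ss_within += sum(d[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(d, labels, permutations=999, seed=0):
    """Return (pseudo-F, permutation p-value) for a one-way design."""
    rng = random.Random(seed)
    observed = pseudo_f(d, labels)
    perm = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(perm)  # shuffling keeps the group sizes fixed
        if pseudo_f(d, perm) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)
```

With Euclidean distances on a single variable, `pseudo_f` reduces to the classical one-way ANOVA F statistic, which is a convenient sanity check. For a three-diet design, `labels` would simply hold one diet name per sample.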
From the statistical standpoint, community-wide analyses test the hypothesis that the data from two or more conditions share the same location parameter (centroid or multivariate mean). Caution, however, is needed to ensure that violations of assumptions do not lead to adverse statistical behavior of PERMANOVA. Two commonly violated assumptions are homogeneity of multivariate dispersion (homoscedasticity) and balanced sample sizes. We have previously shown that simultaneous violation of both assumptions leads PERMANOVA either to indiscriminate rejection with inflated type I error, or to a substantial loss of power, up to the inability to make any rejections at all [4]. Unfortunately, heteroscedasticity across conditions is a very common feature of microbiome data. Thus, new robust methods are needed to ensure correct data analysis.
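Unequal dispersion between groups can be quantified from the distance matrix alone. The sketch below is an illustration only (function names are hypothetical): for each group it divides the sum of squared within-group distances by n_g(n_g - 1), which, for Euclidean distances, equals the trace of that group's sample covariance matrix. It is a descriptive check, not a formal test; formal tests of dispersion homogeneity such as PERMDISP exist.

```python
import math
from itertools import combinations


def euclidean_matrix(points):
    """Symmetric matrix of Euclidean distances between points."""
    n = len(points)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = math.dist(points[i], points[j])
    return d


def group_dispersions(d, labels):
    """Per-group dispersion computed from distances alone.

    For each group g: sum of squared within-group distances / (n_g * (n_g - 1)).
    With Euclidean distances this is the total within-group sample variance
    (trace of the group's sample covariance matrix).
    """
    out = {}
    for g in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == g]
        n_g = len(idx)
        ss = sum(d[i][j] ** 2 for i, j in combinations(idx, 2))
        out[g] = ss / (n_g * (n_g - 1))
    return out
```

A large ratio between the largest and smallest group dispersion, especially combined with very unequal group sizes, flags the combination of violations described in the paragraph above.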
Community-wide analyses, in which the entire microbiome is modelled as a response variable of one or more factors, have become a standard first-line analysis technique in the field. These techniques address the question of overall aggregate changes in the microbiome in response to explanatory variables without the need to model each individual microbiome constituent. PERMANOVA [1] has been one of the dominant tools for such analyses, although the potential for confounding of location and dispersion effects has long been recognized [32, 33]. The \(W_d^*\) method closes this gap by explicitly accounting for differences in multivariate dispersion in the data tested, which have been shown to be associated with adverse statistical properties in PERMANOVA [4]. Current heteroscedasticity-aware methodologies allow for modeling multi-level factors, stratification, and multiple post hoc testing scenarios. Although in many applications the statistical decisions made on the basis of PERMANOVA and \(W_d^*\) may coincide, the principled guarantee of correctness in a wider range of scenarios provided by the latter may be important for practitioners. Although originally developed for discrete-valued covariates, PERMANOVA remains a viable analysis option for continuous covariates as well when multivariate regression-like formulas are used [34]. However, the effect of heteroscedasticity has not been rigorously evaluated or addressed for such analyses. To be fair, heteroscedasticity with continuous covariates is an issue that has no generic statistical solution applicable in most cases. A more cautious analysis involving continuous covariates may require corroboration by \(W_d^*\) with discretized independent variables, but must also account for the potential loss of statistical power caused by discretization.
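The idea of accounting for unequal dispersion can be illustrated with a two-group, Welch-type statistic written purely in terms of distances. This is our reading of the principle behind \(W_d^*\), sketched under stated assumptions, not the authors' reference implementation: the numerator is the squared distance between group centroids (exact for Euclidean distances), and the denominator keeps the two group dispersions separate instead of pooling them, as Welch's t-test does. On one Euclidean variable the statistic equals Welch's t squared. The function names are hypothetical, only two groups are handled, and the published method also covers multi-level factors, stratification, and post hoc testing.

```python
import random
from itertools import combinations


def welch_distance_stat(d, labels):
    """Two-group Welch-type statistic from a distance matrix."""
    groups = sorted(set(labels))
    if len(groups) != 2:
        raise ValueError("this sketch handles exactly two groups")
    g1, g2 = groups
    idx = {g: [i for i, lab in enumerate(labels) if lab == g] for g in groups}
    n1, n2 = len(idx[g1]), len(idx[g2])
    n = n1 + n2
    # Total and within-group sums of squares from squared distances.
    ss_total = sum(d[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_w = {g: sum(d[i][j] ** 2 for i, j in combinations(idx[g], 2)) / len(idx[g])
            for g in groups}
    ss_among = ss_total - ss_w[g1] - ss_w[g2]
    # Squared distance between the two group centroids (Euclidean case).
    centroid_gap = (n / (n1 * n2)) * ss_among
    # Group dispersions are kept separate rather than pooled.
    s2_1 = ss_w[g1] / (n1 - 1)
    s2_2 = ss_w[g2] / (n2 - 1)
    return centroid_gap / (s2_1 / n1 + s2_2 / n2)


def welch_permutation_test(d, labels, permutations=999, seed=0):
    """Return (statistic, permutation p-value) using shuffled labels."""
    rng = random.Random(seed)
    observed = welch_distance_stat(d, labels)
    perm = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(perm)
        if welch_distance_stat(d, perm) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)
```

The only change relative to the pseudo-F of PERMANOVA is the denominator: a group with large spread no longer inflates or deflates the evidence attributed to the other group.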
Secondly, distinct omics datasets have their own limitations and require complex analysis pipelines prior to data integration. For instance, analysis of methylation data is complicated by the uneven distribution of methylation target sequences across the genome, which requires specific normalization and scaling strategies [57]. Each omics platform faces unique challenges, such as experimental and inherent biological noise, differences among experimental platforms, and detection bias [58]. As with the processing of genomic data, metabolomics data require a supplementary step that is critical for interpretability: metabolite identification. In an agnostic approach, where metabolites are only putatively annotated, integrative analysis can be performed regardless of the metabolite identification step. In a more specific approach, however, how integrative analysis is performed depends on whether metabolite identification has been carried out beforehand. For example, if metabolites have not been identified, data integration is limited to an almost purely statistical analysis, i.e., classification, prediction, or inference of significant variables, whereas when metabolite identities are known, enrichment analysis methods can be applied. Additional challenges arise because there is often no one-to-one relationship between genes and metabolites.
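When metabolite identities are known, one common form of enrichment analysis is over-representation analysis with a one-sided hypergeometric test. The sketch below is a generic illustration under assumed inputs, not tied to any tool cited here: a background of N identified metabolites, K of them annotated to a pathway, n flagged as significant, and k of those n in the pathway. The function name is hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided over-representation p-value, P(X >= k).

    X ~ Hypergeometric: N background metabolites, K annotated to the pathway,
    n significant metabolites drawn; k is the observed overlap.
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    total = comb(N, n)
    # Sum the probabilities of observing k or more pathway members.
    tail = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(max(k, 0), min(n, K) + 1))
    return tail / total
```

The choice of background matters: using only identified metabolites, rather than every measured feature, is what ties this step to the identification problem described above. When many pathways are tested, the resulting p-values would normally also be corrected for multiple testing.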
We present ADE-4, a multivariate analysis and graphical display software. Multivariate analysis methods available in ADE-4 include usual one-table methods like principal component analysis and correspondence analysis, spatial data analysis methods (using a total variance decomposition into local and global components, analogous to Moran and Geary indices), discriminant analysis and within/between groups analyses, many linear regression methods including lowess and polynomial regression, multiple and PLS (partial least squares) regression and orthogonal regression (principal component regression), projection methods like principal component analysis on instrumental variables, canonical correspondence analysis and many other variants, coinertia analysis and the RLQ method, and several three-way table (k-table) analysis methods. Graphical display techniques include an automatic collection of elementary graphics corresponding to groups of rows or to columns in the data table, thus providing a very efficient way for automatic k-table graphics and geographical mapping options. A dynamic graphic module allows interactive operations like searching, zooming, selection of points, and display of data values on factor maps. The user interface is simple and homogeneous among all the programs; this contributes to making the use of ADE-4 very easy for non-specialists in statistics, data analysis or computer science.