R Software Packages

If you are a student or researcher who analyzes genetic and genomic data, or a methodologist developing methods of analysis for such data, please download the software developed by our group. Most methods are implemented as R packages.

Associations in high-dimensional data.

  • - This package provides functions and an algorithm to identify influential observations in high-dimensional regression settings.
  • - R package implementing high-dimensional errors-in-variables regression. It implements the CoCoLasso algorithm for settings with additive error or missing data in the covariates. It also implements a variant called Block-Descent CoCoLasso (BD-CoCoLasso), designed for settings where only a small percentage of the features are corrupted (by additive error or missing data).
  • - Construction of a new instrumental variable that minimizes horizontal pleiotropy in the context of Mendelian randomization | Citation:
  • – A mixed model in which the fixed effects can be high-dimensional and penalized (L1), and the random-effects covariance may be constructed from some of the features that are also included among the fixed effects. For example, SNP fixed effects can be estimated while simultaneously adjusting for family relationships through a kinship matrix constructed from overlapping SNPs. See
  • - Sparse additive interaction learning: an efficient penalized model for interactions between one key covariate and a high-dimensional feature space. Sail enforces a strict hierarchy on the interaction terms.
  • – Principal components of heritability: a method for dimension reduction of a high-dimensional feature space that maximizes the variance explained by covariates | Citation:
  • – Finding p-values from a double Wishart problem | Citation:
  • - Provides tools to model and test the association between multiple genotypes and multiple traits, taking prior biological knowledge into account. The method is based on Generalized Structured Component Analysis (GSCA) | Citation:
  • - R package for kernel semiparametric models. Manuscript in preparation.
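
The following Python sketch illustrates the idea behind CoCoLasso. Python is used only for illustration; the package itself is in R. The lasso needs only the matrices Z'Z/n and Z'y/n. When the covariates are observed with additive error, Z = X + A, subtracting the error covariance gives an unbiased estimate of X'X/n, but that estimate may be indefinite. The sketch therefore projects it onto the positive semidefinite cone before running coordinate descent. This is a minimal sketch under stated assumptions, not the package's implementation:
- the error covariance sigma_a is assumed known;
- columns are assumed centred;
- the projection clips eigenvalues (a Frobenius-norm projection), whereas CoCoLasso uses the nearest positive semidefinite matrix in the elementwise maximum norm.

All function names are hypothetical.

```python
import numpy as np


def nearest_psd(mat, eps=1e-6):
    """Project a symmetric matrix onto the PSD cone by clipping eigenvalues.

    Simplification: this is a Frobenius-norm projection, not the
    max-norm projection used by CoCoLasso.
    """
    sym = (mat + mat.T) / 2.0
    vals, vecs = np.linalg.eigh(sym)
    vals = np.clip(vals, eps, None)
    return (vecs * vals) @ vecs.T  # V diag(vals) V'


def soft_threshold(z, lam):
    """Lasso shrinkage operator S(z, lam)."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def lasso_cov_form(sigma, rho, lam, n_iter=500, tol=1e-10):
    """Coordinate descent for min_b 0.5 * b'Sigma b - rho'b + lam * ||b||_1."""
    p = len(rho)
    beta = np.zeros(p)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Partial residual for coordinate j, excluding its own term.
            r_j = rho[j] - sigma[j] @ beta + sigma[j, j] * beta[j]
            new = soft_threshold(r_j, lam) / sigma[j, j]
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta


def cocolasso_sketch(z, y, sigma_a, lam):
    """Hypothetical helper: lasso on an additive-error-corrected Gram matrix.

    z:       n x p centred covariates observed with additive error (z = x + a)
    y:       length-n centred response
    sigma_a: assumed-known p x p covariance of the measurement error a
    """
    n = z.shape[0]
    sigma_hat = z.T @ z / n - sigma_a  # unbiased for X'X/n, may be indefinite
    rho_hat = z.T @ y / n
    return lasso_cov_form(nearest_psd(sigma_hat), rho_hat, lam)
```

For example, with a known error variance of 0.25 per covariate, the call would be cocolasso_sketch(z, y, 0.25 * np.eye(p), lam=0.1). In practice the penalty lam would be chosen by cross-validation rather than fixed.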

Methods of analysis for DNA methylation data.

  • - Estimating smooth covariate effects on targeted bisulfite sequencing measures of DNA methylation. Manuscript submitted for publication.
  • - Hidden Markov model for estimating methylation levels and for testing for differentially methylated CpG sites | Citation: Biometrics
  • - A smoothing method for whole genome bisulfite sequencing data that allows for sequencing errors | Citation:
  • – Normalization of Illumina beadchip-derived DNA methylation data when data are from multiple tissues or cell types | Citation:
  • - Functional normalization of 450k methylation array data improves replication in large cancer studies | Citation:
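
Several of the methylation methods above smooth methylation levels along the genome. The Python sketch below shows the simplest form of that idea: a coverage-weighted Gaussian kernel (Nadaraya-Watson) smoother of methylated-read proportions from bisulfite sequencing counts. It is a generic illustration, not the model implemented in the packages. It ignores sequencing errors and covariates, and both the bandwidth and the function name are hypothetical.

```python
import numpy as np


def smooth_methylation(pos, meth, total, bandwidth=500.0):
    """Coverage-weighted Gaussian kernel smoother of methylation proportions.

    pos:   genomic positions of CpG sites (bp)
    meth:  number of methylated reads at each site
    total: total number of reads (coverage) at each site
    Returns the smoothed methylated proportion at each input position.
    """
    pos = np.asarray(pos, dtype=float)
    meth = np.asarray(meth, dtype=float)
    total = np.asarray(total, dtype=float)
    # Gaussian kernel weights between every pair of sites.
    diff = (pos[:, None] - pos[None, :]) / bandwidth
    kernel = np.exp(-0.5 * diff ** 2)
    # Summing counts (not proportions) weights each site by its coverage.
    return (kernel @ meth) / (kernel @ total)
```

Pooling read counts rather than averaging raw proportions means that a CpG covered by 20 reads influences the estimate more than one covered by only 2 reads.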

Analysis methods for rare genetic variants.

  • – A method for estimating genome-wide significance thresholds for extremely dense genetic information, such as obtained from sequencing studies | Citation:
  • – Multivariate tests of association between rare genetic variants and two or more phenotypes | Citation:
  • – A suite of tools for rare-variant analysis that accommodates non-normal phenotypes and family structure | Citation:
  • – Now integrated into RVPedigree | Citation:
  • – Tests for association with rare genetic variants | Citation:
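
Individual rare variants are too infrequent to test one at a time, so a common strategy is to aggregate them within a gene or region. The Python sketch below is a simple burden test, a standard textbook approach shown only for illustration; it is not the statistic implemented in the packages above. It counts the rare minor alleles each individual carries and assesses their correlation with a quantitative phenotype using a permutation p-value. The 1% minor-allele-frequency cutoff, the number of permutations and the function name are assumptions.

```python
import numpy as np


def burden_test(genotypes, phenotype, maf_cutoff=0.01, n_perm=999, seed=0):
    """Burden test with a permutation p-value (illustrative sketch).

    genotypes: n x m matrix of minor-allele counts (0, 1, 2)
    phenotype: length-n quantitative trait
    """
    g = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    maf = g.mean(axis=0) / 2.0
    rare = (maf > 0) & (maf < maf_cutoff)
    if not rare.any():
        raise ValueError("no variants below the MAF cutoff")
    # Burden score: number of rare minor alleles carried by each person.
    burden = g[:, rare].sum(axis=1)
    if burden.std() == 0:
        raise ValueError("burden score is constant")
    observed = abs(np.corrcoef(burden, y)[0, 1])
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_perm):
        # Permuting the phenotype breaks any genotype-phenotype link.
        perm_stat = abs(np.corrcoef(burden, rng.permutation(y))[0, 1])
        exceed += int(perm_stat >= observed)
    # Adding one to numerator and denominator keeps the p-value above zero.
    return (exceed + 1) / (n_perm + 1)
```

Collapsing variants this way has good power when most rare variants in a region act in the same direction. Variance-component tests are usually preferred when effects may have mixed signs.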

Scripts:

  • – Scripts to help prepare PLINK genotype files for imputation with the Sanger Imputation Service.
  • - Functions to run an analysis pipeline for 450K methylation array data.
  • – Statistical analysis and visualization of functional profiles for genes and gene clusters.
  • – Scripts for performing cell type mixture adjustments in DNA methylation data | Citation:
  • - A pipeline to run an analysis with the pcev R package on CBRAIN.
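
When cell-type proportions have been estimated for each sample, one common way to adjust DNA methylation data for cell-type mixture is to regress each CpG's methylation on those proportions and carry the residuals forward. The Python sketch below assumes such estimated proportions are already available. It is a generic illustration, not the contents of the scripts listed above, and the function name is hypothetical.

```python
import numpy as np


def adjust_for_cell_mixture(meth, cell_props):
    """Return methylation values with cell-type proportions regressed out.

    meth:       n_samples x n_cpgs matrix (e.g. beta or M-values)
    cell_props: n_samples x k matrix of estimated cell-type proportions
    """
    meth = np.asarray(meth, dtype=float)
    props = np.asarray(cell_props, dtype=float)
    # Design matrix with an intercept. Proportions that sum to one make it
    # rank-deficient; lstsq still returns a least-squares fit, so the
    # residuals are the correct projection.
    design = np.column_stack([np.ones(len(props)), props])
    coef, *_ = np.linalg.lstsq(design, meth, rcond=None)
    residuals = meth - design @ coef
    # Add back each CpG's mean so values stay on the original scale.
    return residuals + meth.mean(axis=0)
```

Fitting all CpGs in one lstsq call works because each column of meth is an independent response that shares the same design matrix.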

Microbiome Data:

  • - Estimates microbiome OTU co-occurrence networks within two separate groups, where each network is defined through a precision matrix. The difference between the two precision matrices is also estimated, along with corresponding interval estimates. Manuscript submitted for publication.
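
The microbiome method above compares two groups through the difference of their precision (inverse covariance) matrices. The Python sketch below illustrates the quantities involved; it is not the package's estimator. It proceeds in three steps:
- apply a centred log-ratio (clr) transform to OTU counts, with a pseudocount;
- estimate each group's precision matrix by inverting a ridge-regularized covariance (regularization is needed because clr-transformed data have a singular covariance);
- form percentile bootstrap intervals for the difference.

The pseudocount, ridge penalty, number of bootstrap draws and all function names are assumptions.

```python
import numpy as np


def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform of an n x p matrix of OTU counts."""
    logx = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return logx - logx.mean(axis=1, keepdims=True)


def ridge_precision(x, lam=0.1):
    """Inverse of the sample covariance of x (n x p) plus lam * I."""
    cov = np.cov(x, rowvar=False)
    return np.linalg.inv(cov + lam * np.eye(cov.shape[0]))


def precision_difference(counts_a, counts_b, lam=0.1, n_boot=200,
                         level=0.95, seed=0):
    """Point estimate and bootstrap percentile intervals for Omega_a - Omega_b."""
    xa, xb = clr(counts_a), clr(counts_b)
    estimate = ridge_precision(xa, lam) - ridge_precision(xb, lam)
    rng = np.random.default_rng(seed)
    draws = []
    for _ in range(n_boot):
        # Resample samples (rows) with replacement within each group.
        ia = rng.integers(0, len(xa), len(xa))
        ib = rng.integers(0, len(xb), len(xb))
        draws.append(ridge_precision(xa[ia], lam) - ridge_precision(xb[ib], lam))
    draws = np.stack(draws)
    alpha = (1.0 - level) / 2.0
    lower = np.quantile(draws, alpha, axis=0)
    upper = np.quantile(draws, 1.0 - alpha, axis=0)
    return estimate, lower, upper
```

Because the clr transform acts on each sample separately, resampling rows of the transformed matrix is equivalent to transforming resampled counts.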

on various useful tools in analysis and research.

  1. Presentation by Greg Voisin
  2. Vignette by Greg Voisin
  3. Presentations by Sahir Bhatnagar
  4. by Sahir Bhatnagar

For more information, visit the R project website at:

and