Last updated: 2020-03-16
Knit directory: paed-cf-methylation/
This reproducible R Markdown analysis was created with workflowr (version 1.6.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20200224) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version b268666. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: code/DNAm-based-age-predictor/
Ignored: data/CFGeneModifiers.csv
Ignored: data/Horvath-27k-probes.csv
Ignored: data/Horvath-coefficients.csv
Ignored: data/Horvath-methylation-data.csv
Ignored: data/Horvath-mini-annotation.csv
Ignored: data/Horvath-sample-data.csv
Ignored: data/ageFile-final.txt
Ignored: data/idat/
Ignored: data/processedData.RData
Ignored: data/rawPatientBetas.rds
Ignored: output/Horvath-output.csv
Ignored: output/Horvath-output2.csv
Ignored: output/age.pred
Ignored: output/stderr.txt
Ignored: output/stdout.txt
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/dataExplore.Rmd) and HTML (docs/dataExplore.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
html | 02bf97c | Jovana Maksimovic | 2020-03-16 | Build site. |
Rmd | 58398ed | Jovana Maksimovic | 2020-03-16 | wflow_publish(“analysis/dataExplore.Rmd”) |
Load all the packages required for analysis.
library(here)
library(workflowr)
#Load Packages Required for Analysis
library(limma)
library(minfi)
library(RColorBrewer)
library(missMethyl)
library(matrixStats)
library(minfiData)
library(Gviz)
library(DMRcate)
Warning: replacing previous import 'minfi::getMeth' by 'bsseq::getMeth' when
loading 'DMRcate'
library(stringr)
library(IlluminaHumanMethylationEPICanno.ilm10b4.hg19)
library(IlluminaHumanMethylationEPICmanifest)
Load the EPIC array annotation data that describes the genomic context of each of the probes on the array.
# Get the EPIC array annotation data
annEPIC <- getAnnotation(IlluminaHumanMethylationEPICanno.ilm10b4.hg19)
head(annEPIC)
DataFrame with 6 rows and 46 columns
chr pos strand Name AddressA
<character> <integer> <character> <character> <character>
cg18478105 chr20 61847650 - cg18478105 46761277
cg09835024 chrX 24072640 - cg09835024 16745152
cg14361672 chr9 131463936 + cg14361672 51800947
cg01763666 chr17 80159506 + cg01763666 37768834
cg12950382 chr14 105176736 + cg12950382 8726444
cg02115394 chr13 115000168 + cg02115394 68602543
AddressB ProbeSeqA
<character> <character>
cg18478105 86644198 AAATAAATTTCACTCTCAAATCCCAATCTCATACAACAAAACAAAAACCA
cg09835024 81631976 AATAAACACCAACCCCAAACCAATCTCACTTTATTAAATTACAAAAATCA
cg14361672 7743487 ATCACTACCTAATCTATAACAAACCATTCAACCCATCCTAACATCCTACA
cg01763666 23754592 AAACAAAAATAAACAAACTCAAAATAAAAACAACTAAACTAAAACAAACA
cg12950382 76660327 ATACCAAAAAATAACAATATACTTATATATATACACATACCCAAATAACA
cg02115394 59659581 AAAATCACTACAACACCTCTAAACATTAACAAAAAAATCAAAAAAACTCA
ProbeSeqB Type
<character> <character>
cg18478105 AAATAAATTTCGCTCTCAAATCCCAATCTCGTACGACGAAACGAAAACCG I
cg09835024 AATAAACGCCGACCCCGAACCGATCTCGCTTTATTAAATTACAAAAATCG I
cg14361672 ATCACTACCTAATCTATAACGAACCATTCAACCCGTCCTAACATCCTACG I
cg01763666 GAACAAAAATAAACGAACTCAAAATAAAAACAACTAAACTAAAACAAACG I
cg12950382 GTACCGAAAAATAACAATATACTTATATATATACACGTACCCGAATAACG I
cg02115394 AAAATCGCTACGACGCCTCTAAACATTAACGAAAAAATCAAAAAAACTCG I
NextBase Color Probe_rs Probe_maf CpG_rs CpG_maf
<character> <character> <character> <numeric> <character> <numeric>
cg18478105 C Grn NA NA NA NA
cg09835024 A Red NA NA NA NA
cg14361672 T Red NA NA NA NA
cg01763666 C Grn NA NA NA NA
cg12950382 A Red rs12882277 0.378464 NA NA
cg02115394 A Red NA NA NA NA
SBE_rs SBE_maf Islands_Name Relation_to_Island
<character> <numeric> <character> <character>
cg18478105 NA NA chr20:61846843-61848103 Island
cg09835024 NA NA chrX:24072558-24073135 Island
cg14361672 NA NA chr9:131464843-131465830 N_Shore
cg01763666 NA NA OpenSea
cg12950382 NA NA OpenSea
cg02115394 NA NA chr13:115000148-115000874 Island
Forward_Sequence
<character>
cg18478105 TCCCGTCTTACGGGATGGATTTCGCTCTCAGGTCCCAGTCTCGTGCGGCGGGGCGGGGAC[CG]CAGCCGGCTGGGCGGGGAAGCCCTGAGCCGGGGAAGTCACGTGGGGCGTGTCCGGAGGCG
cg09835024 AGCCCCGTCATAGGTGGGCGCCGACCCCGAGCCGATCTCGCTTTATTAAATTACAGAAAT[CG]GTATTCAAAAAAAAAAAAAAAAAAGGGCGGGGAGGACACTCCCTCTTCTCTGTTCCCACA
cg14361672 TCACCTTCCCACCTCCTGGAGGACGCTCCTCCACGAAGTGCTGACACAACCTCCTGTAAA[CG]CAGGATGCCAGGACGGGCTGAATGGCCCGCCATAGACTAGGCAGTGACCAGCACACCTCC
cg01763666 CTGGAATGCCAGCTGCTGCTGCTGCTGCAGCTCCTCCACCTTCCTGGCCTCTCTGGCTAG[CG]CCTGCCTCAGCTTAGCTGCCTCTATCTTGAGCTCGCTCACCTCTGCCCGCCTGGCCTCTT
cg12950382 CCCTGCTGCCACCACCTCGGTGCACACACCTACTGGACGCACAGACACACGCATGCCCAC[CG]CCACTCGGGCACGTGCACACACACAAGCACACTGCCACTCTCCGGCACGCGCACACACAA
cg02115394 TTCTGGGGAAAGAAGGCTCAGCAGCCACCTGCTTTTTTGCCCGGGTGGGTGGTCCGGCCC[CG]AGCCCTCCTGACTCTCTCGCCAATGCCCAGAGGCGCCGCAGCGATTCCAGGGAGGCCGCG
SourceSeq UCSC_RefGene_Name
<character> <character>
cg18478105 CGGTCCCCGCCCCGCCGCACGAGACTGGGACCTGAGAGCGAAATCCATCC YTHDF1
cg09835024 GGTGGGCGCCGACCCCGAGCCGATCTCGCTTTATTAAATTACAGAAATCG EIF2S3
cg14361672 CGCAGGATGCCAGGACGGGCTGAATGGCCCGCCATAGACTAGGCAGTGAC PKN3
cg01763666 CGCCTGCCTCAGCTTAGCTGCCTCTATCTTGAGCTCGCTCACCTCTGCCC CCDC57
cg12950382 CGCCACTCGGGCACGTGCACACACACAAGCACACTGCCACTCTCCGGCAC INF2;INF2
cg02115394 GGAATCGCTGCGGCGCCTCTGGGCATTGGCGAGAGAGTCAGGAGGGCTCG CDC16;CDC16
UCSC_RefGene_Accession UCSC_RefGene_Group Phantom4_Enhancers
<character> <character> <character>
cg18478105 NM_017798 TSS200
cg09835024 NM_001415 TSS1500
cg14361672 NM_013355 TSS1500
cg01763666 NM_198082 Body
cg12950382 NM_022489;NM_001031714 Body;Body
cg02115394 NM_003903;NM_001078645 TSS200;TSS200
Phantom5_Enhancers DMR X450k_Enhancer HMM_Island
<character> <character> <character> <character>
cg18478105 20:61317142-61318498
cg09835024
cg14361672
cg01763666 17:77752688-77752973
cg12950382 14:104247518-104247873
cg02115394 13:114018251-114018976
Regulatory_Feature_Name Regulatory_Feature_Group
<character> <character>
cg18478105 20:61846284-61847956 Promoter_Associated
cg09835024 X:24071907-24073667 Promoter_Associated
cg14361672
cg01763666
cg12950382
cg02115394 13:115000009-115001429 Promoter_Associated
GencodeBasicV12_NAME
<character>
cg18478105 YTHDF1;YTHDF1
cg09835024 EIF2S3
cg14361672 PKN3
cg01763666
cg12950382
cg02115394 CDC16;CDC16;CDC16;CDC16;CDC16
GencodeBasicV12_Accession
<character>
cg18478105 ENST00000370334.4;ENST00000370339.3
cg09835024 ENST00000253039.4
cg14361672 ENST00000291906.4
cg01763666
cg12950382
cg02115394 ENST00000375312.3;ENST00000356221.3;ENST00000375310.1;ENST00000252457.5;ENST00000252458.6
GencodeBasicV12_Group
<character>
cg18478105 TSS200;TSS200
cg09835024 TSS200
cg14361672 TSS1500
cg01763666
cg12950382
cg02115394 TSS200;TSS1500;TSS1500;TSS1500;TSS1500
GencodeCompV12_NAME
<character>
cg18478105 YTHDF1;YTHDF1
cg09835024 EIF2S3;EIF2S3;EIF2S3
cg14361672 PKN3
cg01763666
cg12950382 INF2;INF2
cg02115394 CDC16;CDC16;CDC16;CDC16;CDC16;CDC16
GencodeCompV12_Accession
<character>
cg18478105 ENST00000370334.4;ENST00000370339.3
cg09835024 ENST00000487075.1;ENST00000423068.1;ENST00000253039.4
cg14361672 ENST00000291906.4
cg01763666
cg12950382 ENST00000474229.1;ENST00000480763.1
cg02115394 ENST00000360383.3;ENST00000356221.3;ENST00000375310.1;ENST00000494766.1;ENST00000375308.1;ENST00000252458.6
GencodeCompV12_Group
<character>
cg18478105 TSS200;TSS200
cg09835024 TSS1500;TSS1500;TSS200
cg14361672 TSS1500
cg01763666
cg12950382 5'UTR;TSS1500
cg02115394 TSS200;TSS1500;TSS1500;TSS1500;TSS1500;TSS1500
DNase_Hypersensitivity_NAME DNase_Hypersensitivity_Evidence_Count
<character> <character>
cg18478105 chr20:61847520-61847755 3
cg09835024 chrX:24072600-24073395 3
cg14361672
cg01763666 chr17:80159145-80159790 3
cg12950382
cg02115394
OpenChromatin_NAME OpenChromatin_Evidence_Count TFBS_NAME
<character> <character> <character>
cg18478105
cg09835024
cg14361672 chr9:131463740-131463970 3
cg01763666
cg12950382 chr14:105171651-105183138 6
cg02115394 chr13:114999804-115001809 6
TFBS_Evidence_Count Methyl27_Loci Methyl450_Loci Random_Loci
<character> <character> <character> <character>
cg18478105 TRUE
cg09835024 TRUE
cg14361672 TRUE
cg01763666 TRUE
cg12950382 TRUE
cg02115394 TRUE
Read the sample information and IDAT file paths into R.
# absolute path to the data directory, built from the RStudio project root with here()
dataDirectory <- here("data/idat")
list.files(dataDirectory, recursive = TRUE)
[1] "202900540047_R01C01_Grn.idat"
[2] "202900540047_R01C01_Red.idat"
[3] "202900540047_R02C01_Grn.idat"
[4] "202900540047_R02C01_Red.idat"
[5] "202900540047_R03C01_Grn.idat"
[6] "202900540047_R03C01_Red.idat"
[7] "202900540047_R04C01_Grn.idat"
[8] "202900540047_R04C01_Red.idat"
[9] "202900540047_R05C01_Grn.idat"
[10] "202900540047_R05C01_Red.idat"
[11] "202900540047_R06C01_Grn.idat"
[12] "202900540047_R06C01_Red.idat"
[13] "202900540047_R07C01_Grn.idat"
[14] "202900540047_R07C01_Red.idat"
[15] "202900540047_R08C01_Grn.idat"
[16] "202900540047_R08C01_Red.idat"
[17] "202900540100_R01C01_Grn.idat"
[18] "202900540100_R01C01_Red.idat"
[19] "202900540100_R02C01_Grn.idat"
[20] "202900540100_R02C01_Red.idat"
[21] "202900540100_R03C01_Grn.idat"
[22] "202900540100_R03C01_Red.idat"
[23] "202900540100_R04C01_Grn.idat"
[24] "202900540100_R04C01_Red.idat"
[25] "202900540100_R05C01_Grn.idat"
[26] "202900540100_R05C01_Red.idat"
[27] "202900540100_R06C01_Grn.idat"
[28] "202900540100_R06C01_Red.idat"
[29] "202900540100_R07C01_Grn.idat"
[30] "202900540100_R07C01_Red.idat"
[31] "202900540100_R08C01_Grn.idat"
[32] "202900540100_R08C01_Red.idat"
[33] "202900540115_R01C01_Grn.idat"
[34] "202900540115_R01C01_Red.idat"
[35] "202900540115_R02C01_Grn.idat"
[36] "202900540115_R02C01_Red.idat"
[37] "202900540115_R03C01_Grn.idat"
[38] "202900540115_R03C01_Red.idat"
[39] "202900540115_R04C01_Grn.idat"
[40] "202900540115_R04C01_Red.idat"
[41] "202900540115_R05C01_Grn.idat"
[42] "202900540115_R05C01_Red.idat"
[43] "202900540115_R06C01_Grn.idat"
[44] "202900540115_R06C01_Red.idat"
[45] "202900540115_R07C01_Grn.idat"
[46] "202900540115_R07C01_Red.idat"
[47] "202900540115_R08C01_Grn.idat"
[48] "202900540115_R08C01_Red.idat"
[49] "202905570075_R01C01_Grn.idat"
[50] "202905570075_R01C01_Red.idat"
[51] "202905570075_R02C01_Grn.idat"
[52] "202905570075_R02C01_Red.idat"
[53] "202905570075_R03C01_Grn.idat"
[54] "202905570075_R03C01_Red.idat"
[55] "202905570075_R04C01_Grn.idat"
[56] "202905570075_R04C01_Red.idat"
[57] "202905570075_R05C01_Grn.idat"
[58] "202905570075_R05C01_Red.idat"
[59] "202905570075_R06C01_Grn.idat"
[60] "202905570075_R06C01_Red.idat"
[61] "202905570075_R07C01_Grn.idat"
[62] "202905570075_R07C01_Red.idat"
[63] "202905570075_R08C01_Grn.idat"
[64] "202905570075_R08C01_Red.idat"
[65] "203013220097_R01C01_Grn.idat"
[66] "203013220097_R01C01_Red.idat"
[67] "203013220097_R02C01_Grn.idat"
[68] "203013220097_R02C01_Red.idat"
[69] "203013220097_R03C01_Grn.idat"
[70] "203013220097_R03C01_Red.idat"
[71] "203013220097_R04C01_Grn.idat"
[72] "203013220097_R04C01_Red.idat"
[73] "203013220097_R05C01_Grn.idat"
[74] "203013220097_R05C01_Red.idat"
[75] "203013220097_R06C01_Grn.idat"
[76] "203013220097_R06C01_Red.idat"
[77] "203013220097_R07C01_Grn.idat"
[78] "203013220097_R07C01_Red.idat"
[79] "203013220097_R08C01_Grn.idat"
[80] "203013220097_R08C01_Red.idat"
[81] "Assessment for poor quality samples.pdf"
[82] "bVals.rds"
[83] "Clustering of cell types.pdf"
[84] "Cross Reactive Probes EPIC array.txt"
[85] "Data Exploration MDS (No Legend).pdf"
[86] "Data Exploration MDS 1 (legend).pdf"
[87] "Data Exploration MDS 2 no legend.pdf"
[88] "DMPs.csv"
[89] "Jovana Workflow Analysis 4.12.18"
[90] "Normalisation (with legend).pdf"
[91] "Normalised (no legend).pdf"
[92] "processedData.RData"
[93] "qcReport.pdf"
[94] "Samplesheet.csv"
# read in the sample sheet for the experiment
targets <- read.csv(here("data/idat/Samplesheet.csv"))[-1,] # leave out the genome control sample
# set the path to the idat files relative to R project setup
targets$Basename <- gsub("/home/shivanthan.shanthiku/Shiv_DNAme/DNA Methylation/idat files Exp #1/",
here("data/idat/"), targets$Basename)
# clean up some labels
targets$Sample_Group <- targets$Sample_Label <- gsub("Granuloycte","Granulocyte",targets$Sample_Group)
targets$Sample_Group <- targets$Sample_Label <- gsub("Epithelialcell","EpithelialCell",targets$Sample_Group)
targets
Sample_Name Sample_Well Sample_Plate Sample_Group Sample_Label
2 103516-001-002 B01 1 EpithelialCell EpithelialCell
3 103516-001-003 C01 1 Case Case
4 103516-001-004 D01 1 Control Control
5 103516-001-005 E01 1 Case Case
6 103516-001-006 F01 1 Control Control
7 103516-001-007 G01 1 Case Case
8 103516-001-008 H01 1 Control Control
9 103516-001-009 A02 1 Case Case
10 103516-001-010 B02 1 Control Control
11 103516-001-011 C02 1 Case Case
12 103516-001-012 D02 1 Control Control
13 103516-001-013 E02 1 Case Case
14 103516-001-014 F02 1 Control Control
15 103516-001-015 G02 1 Case Case
16 103516-001-016 H02 1 Control Control
17 103516-001-017 A03 1 Case Case
18 103516-001-018 B03 1 Control Control
19 103516-001-019 C03 1 Case Case
20 103516-001-020 D03 1 Control Control
21 103516-001-021 E03 1 Case Case
22 103516-001-022 F03 1 Control Control
23 103516-001-023 G03 1 Case Case
24 103516-001-024 H03 1 Control Control
25 103516-001-025 A04 1 Case Case
26 103516-001-026 B04 1 Control Control
27 103516-001-027 C04 1 Case Case
28 103516-001-028 D04 1 Control Control
29 103516-001-029 E04 1 Case Case
30 103516-001-030 F04 1 Control Control
31 103516-001-031 G04 1 Control Control
32 103516-001-033 A05 1 Macrophage Macrophage
33 103516-001-034 B05 1 Macrophage Macrophage
34 103516-001-035 C05 1 Macrophage Macrophage
35 103516-001-036 D05 1 Granulocyte Granulocyte
36 103516-001-037 E05 1 Granulocyte Granulocyte
37 103516-001-038 F05 1 Granulocyte Granulocyte
38 103516-001-039 G05 1 Lymphocyte Lymphocyte
39 103516-001-040 H05 1 Lymphocyte Lymphocyte
Sentrix_ID Sentrix_Position
2 2.02906e+11 R02C01
3 2.02906e+11 R03C01
4 2.02906e+11 R04C01
5 2.02906e+11 R05C01
6 2.02906e+11 R06C01
7 2.02906e+11 R07C01
8 2.02906e+11 R08C01
9 2.02901e+11 R01C01
10 2.02901e+11 R02C01
11 2.02901e+11 R03C01
12 2.02901e+11 R04C01
13 2.02901e+11 R05C01
14 2.02901e+11 R06C01
15 2.02901e+11 R07C01
16 2.02901e+11 R08C01
17 2.03013e+11 R01C01
18 2.03013e+11 R02C01
19 2.03013e+11 R03C01
20 2.03013e+11 R04C01
21 2.03013e+11 R05C01
22 2.03013e+11 R06C01
23 2.03013e+11 R07C01
24 2.03013e+11 R08C01
25 2.02901e+11 R01C01
26 2.02901e+11 R02C01
27 2.02901e+11 R03C01
28 2.02901e+11 R04C01
29 2.02901e+11 R05C01
30 2.02901e+11 R06C01
31 2.02901e+11 R07C01
32 2.02901e+11 R01C01
33 2.02901e+11 R02C01
34 2.02901e+11 R03C01
35 2.02901e+11 R04C01
36 2.02901e+11 R05C01
37 2.02901e+11 R06C01
38 2.02901e+11 R07C01
39 2.02901e+11 R08C01
Basename
2 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/202905570075_R02C01
3 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/202905570075_R03C01
4 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/202905570075_R04C01
5 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/202905570075_R05C01
6 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/202905570075_R06C01
7 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/202905570075_R07C01
8 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/202905570075_R08C01
9 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/202900540115_R01C01
10 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/202900540115_R02C01
11 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/202900540115_R03C01
12 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/202900540115_R04C01
13 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/202900540115_R05C01
14 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/202900540115_R06C01
15 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/202900540115_R07C01
16 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/202900540115_R08C01
17 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/203013220097_R01C01
18 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/203013220097_R02C01
19 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/203013220097_R03C01
20 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/203013220097_R04C01
21 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/203013220097_R05C01
22 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/203013220097_R06C01
23 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/203013220097_R07C01
24 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/203013220097_R08C01
25 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/202900540100_R01C01
26 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/202900540100_R02C01
27 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/202900540100_R03C01
28 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/202900540100_R04C01
29 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/202900540100_R05C01
30 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/202900540100_R06C01
31 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/202900540100_R07C01
32 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/202900540047_R01C01
33 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/202900540047_R02C01
34 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/202900540047_R03C01
35 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/202900540047_R04C01
36 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/202900540047_R05C01
37 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/202900540047_R06C01
38 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/202900540047_R07C01
39 /oshlack_lab/jovana.maksimovic/projects/MCRI/shivanthan.shanthikumar/paed-cf-methylation/data/idat/202900540047_R08C01
Sample_source
2 A1
3 52H
4 61G
5 26G
6 14G
7 29G
8 M1C005F
9 54F12
10 89C
11 55F12
12 32G
13 12F11
14 50G
15 06F
16 62H
17 23EE10
18 37H
19 53F
20 41G
21 48i
22 25G12
23 45G
24 18H12
25 21E
26 04G11
27 76B10
28 78B10
29 57G
30 08F
31 008G
32 M1
33 M2
34 M3
35 G1
36 G2
37 G3
38 L1
39 L2
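The path rewrite and label clean-up above can be tried out on a toy sample sheet. The sketch below is illustrative only: the data frame, the old path prefix and the file stems are made up, and `file.path("data", "idat")` stands in for `here("data/idat")` so that the example runs outside the project. It also passes `fixed = TRUE`, which is an assumption about what is safest when the old prefix contains spaces and a `#`, not something the original chunk does.

```r
# Toy sample sheet; the old prefix and file stems are hypothetical
toy_targets <- data.frame(
  Sample_Group = c("Granuloycte", "Epithelialcell", "Case"),
  Basename = c("/old dir/idat files Exp #1/A_R01C01",
               "/old dir/idat files Exp #1/A_R02C01",
               "/old dir/idat files Exp #1/A_R03C01"),
  stringsAsFactors = FALSE
)

old_prefix <- "/old dir/idat files Exp #1/"
new_prefix <- file.path("data", "idat")  # stands in for here("data/idat")

# fixed = TRUE treats the pattern as a literal string, not a regular expression
toy_targets$Basename <- sub(old_prefix, paste0(new_prefix, "/"),
                            toy_targets$Basename, fixed = TRUE)

# Correct the misspelled group labels, then copy them into Sample_Label
toy_targets$Sample_Group <- gsub("Granuloycte", "Granulocyte",
                                 toy_targets$Sample_Group, fixed = TRUE)
toy_targets$Sample_Group <- gsub("Epithelialcell", "EpithelialCell",
                                 toy_targets$Sample_Group, fixed = TRUE)
toy_targets$Sample_Label <- toy_targets$Sample_Group

# Sanity check: every rewritten path should start with the new prefix
stopifnot(all(startsWith(toy_targets$Basename, new_prefix)))
toy_targets
```

A check like the final `stopifnot()` is cheap insurance before handing the sample sheet to a reader function, since a pattern that silently fails to match leaves the old paths in place.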
Read in the raw methylation data.
# read in the raw data from the IDAT files
rgSet <- read.metharray.exp(targets=targets)
rgSet
class: RGChannelSet
dim: 1051815 38
metadata(0):
assays(2): Green Red
rownames(1051815): 1600101 1600111 ... 99810990 99810992
rowData names(0):
colnames(38): 202905570075_R02C01 202905570075_R03C01 ...
202900540047_R07C01 202900540047_R08C01
colData names(10): Sample_Name Sample_Well ... Sample_source filenames
Annotation
array: IlluminaHumanMethylationEPIC
annotation: ilm10b4.hg19
# give the samples descriptive names
targets$ID <- paste(targets$Sample_Group,targets$Sample_Name,sep=".")
sampleNames(rgSet) <- targets$ID
rgSet
class: RGChannelSet
dim: 1051815 38
metadata(0):
assays(2): Green Red
rownames(1051815): 1600101 1600111 ... 99810990 99810992
rowData names(0):
colnames(38): EpithelialCell.103516-001-002 Case.103516-001-003 ...
Lymphocyte.103516-001-039 Lymphocyte.103516-001-040
colData names(10): Sample_Name Sample_Well ... Sample_source filenames
Annotation
array: IlluminaHumanMethylationEPIC
annotation: ilm10b4.hg19
Calculate the detection P-values for each probe so that we can check for any failed samples.
# QUALITY CONTROL
# calculate the detection p-values
detP <- detectionP(rgSet)
head(detP)
EpithelialCell.103516-001-002 Case.103516-001-003
cg18478105 0.000000e+00 0.000000e+00
cg09835024 0.000000e+00 0.000000e+00
cg14361672 0.000000e+00 0.000000e+00
cg01763666 0.000000e+00 0.000000e+00
cg12950382 8.335193e-55 1.249933e-203
cg02115394 0.000000e+00 0.000000e+00
Control.103516-001-004 Case.103516-001-005 Control.103516-001-006
cg18478105 0.000000e+00 0.000000e+00 0.000000e+00
cg09835024 0.000000e+00 0.000000e+00 0.000000e+00
cg14361672 0.000000e+00 0.000000e+00 0.000000e+00
cg01763666 0.000000e+00 0.000000e+00 0.000000e+00
cg12950382 1.076029e-203 3.638136e-83 9.125171e-225
cg02115394 0.000000e+00 0.000000e+00 0.000000e+00
Case.103516-001-007 Control.103516-001-008 Case.103516-001-009
cg18478105 0.000000e+00 0.000000e+00 0.000000e+00
cg09835024 0.000000e+00 0.000000e+00 0.000000e+00
cg14361672 0.000000e+00 0.000000e+00 0.000000e+00
cg01763666 0.000000e+00 0.000000e+00 0.000000e+00
cg12950382 6.462425e-67 2.744325e-110 2.623143e-158
cg02115394 0.000000e+00 0.000000e+00 0.000000e+00
Control.103516-001-010 Case.103516-001-011 Control.103516-001-012
cg18478105 0.000000e+00 0.000000e+00 0
cg09835024 0.000000e+00 0.000000e+00 0
cg14361672 0.000000e+00 0.000000e+00 0
cg01763666 0.000000e+00 0.000000e+00 0
cg12950382 2.381862e-107 1.570558e-220 0
cg02115394 0.000000e+00 0.000000e+00 0
Case.103516-001-013 Control.103516-001-014 Case.103516-001-015
cg18478105 0.000000e+00 0.000000e+00 0.00000e+00
cg09835024 0.000000e+00 0.000000e+00 0.00000e+00
cg14361672 0.000000e+00 0.000000e+00 0.00000e+00
cg01763666 0.000000e+00 0.000000e+00 0.00000e+00
cg12950382 1.741536e-87 6.564336e-269 2.01803e-135
cg02115394 0.000000e+00 0.000000e+00 0.00000e+00
Control.103516-001-016 Case.103516-001-017 Control.103516-001-018
cg18478105 0.000000e+00 0.00000e+00 0.000000e+00
cg09835024 0.000000e+00 0.00000e+00 0.000000e+00
cg14361672 0.000000e+00 0.00000e+00 0.000000e+00
cg01763666 0.000000e+00 0.00000e+00 0.000000e+00
cg12950382 2.982542e-62 4.78107e-112 2.030454e-126
cg02115394 0.000000e+00 0.00000e+00 0.000000e+00
Case.103516-001-019 Control.103516-001-020 Case.103516-001-021
cg18478105 0.000000e+00 0.000000e+00 0.000000e+00
cg09835024 0.000000e+00 0.000000e+00 0.000000e+00
cg14361672 0.000000e+00 0.000000e+00 0.000000e+00
cg01763666 0.000000e+00 0.000000e+00 0.000000e+00
cg12950382 2.338471e-51 8.236197e-155 1.190681e-98
cg02115394 0.000000e+00 0.000000e+00 0.000000e+00
Control.103516-001-022 Case.103516-001-023 Control.103516-001-024
cg18478105 0.000000e+00 0.000000e+00 0.000000e+00
cg09835024 0.000000e+00 0.000000e+00 0.000000e+00
cg14361672 0.000000e+00 0.000000e+00 0.000000e+00
cg01763666 0.000000e+00 0.000000e+00 0.000000e+00
cg12950382 8.692799e-115 1.096844e-116 8.419923e-50
cg02115394 0.000000e+00 0.000000e+00 0.000000e+00
Case.103516-001-025 Control.103516-001-026 Case.103516-001-027
cg18478105 0.000000e+00 0.000000e+00 0.000000e+00
cg09835024 0.000000e+00 0.000000e+00 0.000000e+00
cg14361672 0.000000e+00 0.000000e+00 0.000000e+00
cg01763666 0.000000e+00 0.000000e+00 0.000000e+00
cg12950382 2.457464e-49 2.618101e-61 1.238714e-109
cg02115394 0.000000e+00 0.000000e+00 0.000000e+00
Control.103516-001-028 Case.103516-001-029 Control.103516-001-030
cg18478105 0.000000e+00 0.000000e+00 0.000000e+00
cg09835024 0.000000e+00 0.000000e+00 0.000000e+00
cg14361672 0.000000e+00 0.000000e+00 0.000000e+00
cg01763666 0.000000e+00 0.000000e+00 0.000000e+00
cg12950382 1.418763e-118 2.516727e-167 5.036695e-91
cg02115394 0.000000e+00 0.000000e+00 0.000000e+00
Control.103516-001-031 Macrophage.103516-001-033
cg18478105 0.000000e+00 0.000000e+00
cg09835024 0.000000e+00 0.000000e+00
cg14361672 0.000000e+00 0.000000e+00
cg01763666 0.000000e+00 0.000000e+00
cg12950382 4.294179e-55 4.021151e-92
cg02115394 0.000000e+00 0.000000e+00
Macrophage.103516-001-034 Macrophage.103516-001-035
cg18478105 0.000000e+00 0.000000e+00
cg09835024 0.000000e+00 0.000000e+00
cg14361672 0.000000e+00 0.000000e+00
cg01763666 0.000000e+00 0.000000e+00
cg12950382 9.192242e-132 3.284289e-147
cg02115394 0.000000e+00 0.000000e+00
Granulocyte.103516-001-036 Granulocyte.103516-001-037
cg18478105 0.000000e+00 0.000000e+00
cg09835024 0.000000e+00 0.000000e+00
cg14361672 0.000000e+00 0.000000e+00
cg01763666 0.000000e+00 0.000000e+00
cg12950382 2.960354e-96 2.999361e-56
cg02115394 0.000000e+00 0.000000e+00
Granulocyte.103516-001-038 Lymphocyte.103516-001-039
cg18478105 0.000000e+00 0.000000e+00
cg09835024 0.000000e+00 0.000000e+00
cg14361672 0.000000e+00 0.000000e+00
cg01763666 0.000000e+00 0.000000e+00
cg12950382 4.455051e-121 7.075156e-93
cg02115394 0.000000e+00 0.000000e+00
Lymphocyte.103516-001-040
cg18478105 0.0000e+00
cg09835024 0.0000e+00
cg14361672 0.0000e+00
cg01763666 0.0000e+00
cg12950382 8.4912e-137
cg02115394 0.0000e+00
# examine mean detection p-values across all samples to identify any failed samples
pal <- brewer.pal(8, "Dark2")
par(mfrow=c(1,2))
barplot(colMeans(detP), col=pal[factor(targets$Sample_Group)], las=2,
cex.names=0.8,ylab="Mean detection p-values")
abline(h=0.01,col="red")
legend("top", legend=levels(factor(targets$Sample_Group)), fill=pal,
bg="white")
barplot(colMeans(detP), col=pal[factor(targets$Sample_Group)], las=2,
cex.names=0.8, ylim = c(0,0.001), ylab="Mean detection p-values")
legend("top", legend=levels(factor(targets$Sample_Group)), fill=pal, bg="white")
Version | Author | Date |
---|---|---|
02bf97c | Jovana Maksimovic | 2020-03-16 |
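The bar plots summarise each sample by its mean detection p-value. The same summary can be used to list potentially failed samples programmatically. The sketch below uses a made-up matrix in place of `detP`, and the 0.01 cut-off is an assumption borrowed from the red reference line in the plot, not a rule stated in the analysis.

```r
# Toy detection p-value matrix (probes in rows, samples in columns); values are made up
toy_detP <- matrix(
  c(1e-10, 2e-8,  0.20, 0,
    0,     5e-12, 0.05, 1e-9,
    3e-7,  0,     0.30, 0,
    0,     1e-6,  0.01, 2e-10,
    1e-9,  0,     0.15, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("probe", 1:5), paste0("sample", 1:4))
)

cutoff <- 0.01  # assumed threshold, matching the reference line drawn above

# Mean detection p-value per sample, as plotted in the bar charts
mean_detP <- colMeans(toy_detP)

# Names of samples whose mean detection p-value exceeds the cut-off
failed <- names(mean_detP)[mean_detP > cutoff]
failed
```

In this toy matrix only `sample3` was given large p-values, so it is the only sample flagged.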
Normalise the data.
## NORMALISATION (decision made to use preprocessQuantile as tissue types are NOT vastly different)
# normalize the data; this results in a GenomicRatioSet object
mSetSq <- preprocessQuantile(rgSet)
[preprocessQuantile] Mapping to genome.
[preprocessQuantile] Fixing outliers.
[preprocessQuantile] Quantile normalizing.
# create a MethylSet object from the raw data for plotting
mSetRaw <- preprocessRaw(rgSet)
# visualise what the data looks like before and after normalisation
par(mfrow=c(1,2))
densityPlot(rgSet, sampGroups=targets$Sample_Group,main="Raw", legend=FALSE)
legend("top", legend = levels(factor(targets$Sample_Group)),
text.col=brewer.pal(8,"Dark2"))
densityPlot(getBeta(mSetSq), sampGroups=targets$Sample_Group,
main="Normalized", legend=FALSE)
legend("top", legend = levels(factor(targets$Sample_Group)),
text.col=brewer.pal(8,"Dark2"))
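To give an intuition for what quantile normalisation does to the sample distributions, here is a minimal base R sketch of the plain, unstratified method on a toy matrix. It is not what `preprocessQuantile` runs internally: per the minfi documentation that function applies a stratified variant to the methylated and unmethylated intensities. The function name `quantile_normalise` and the toy values are hypothetical.

```r
# Plain quantile normalisation: force every column to share the same distribution
quantile_normalise <- function(x) {
  # Rank of each value within its column (ties broken by order of appearance)
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Reference distribution: mean of the sorted columns, position by position
  target <- rowMeans(apply(x, 2, sort))
  # Replace each value by the reference value that has the same rank
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}

# Toy intensities for three probes in two samples (made-up values)
toy <- cbind(a = c(5, 2, 3), b = c(4, 1, 6))
toy_norm <- quantile_normalise(toy)
toy_norm
```

After normalisation both columns contain the same set of values, only in a different order, which is why the density curves of normalised samples overlap so closely.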
Explore the data to look for any structure. When we colour ALL the samples by sample type and then by predicted sex, we can see that the largest source of variation is sex, which corresponds to the first principal component.
# DATA EXPLORATION
mDat <- getM(mSetSq)
# MDS plots to look at largest sources of variation
par(mfrow=c(1,2))
plotMDS(mDat, top=1000, gene.selection="common",
col=pal[factor(targets$Sample_Group)],pch=19)
legend("topleft", legend=levels(factor(targets$Sample_Group)), text.col=pal,
bg="white", cex=0.7)
sex <- getSex(mSetSq)
Warning in .getSex(CN = CN, xIndex = xIndex, yIndex = yIndex, cutoff = cutoff):
An inconsistency was encountered while determining sex. One possibility is
that only one sex is present. We recommend further checks, for example with the
plotSex function.
plotMDS(mDat, top=1000, gene.selection="common",
col=pal[factor(sex$predictedSex)],pch=19)
legend("topleft", legend=levels(factor(sex$predictedSex)), text.col=pal,
bg="white", cex=0.7)
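The MDS plots are built from the most variable probes only (`top=1000`, `gene.selection="common"`). The base R sketch below shows the general idea on simulated M-values: rank probes by variance, keep the top set for all samples, then apply classical multidimensional scaling. It is an approximation for illustration. The simulated data, the group shift and the use of `dist()` with `cmdscale()` are assumptions, and limma's `plotMDS` may differ in its distance measure and scaling.

```r
set.seed(42)

# Simulated M-values: 200 probes by 6 samples, with a shift in samples 4-6
toy_m <- matrix(rnorm(200 * 6), nrow = 200,
                dimnames = list(paste0("probe", 1:200), paste0("s", 1:6)))
toy_m[1:50, 4:6] <- toy_m[1:50, 4:6] + 3

top <- 50  # number of most variable probes to keep (the analysis uses 1000)

# Rank probes by variance across all samples and keep the top set
probe_var <- apply(toy_m, 1, var)
top_probes <- order(probe_var, decreasing = TRUE)[seq_len(top)]

# Euclidean distances between samples, then classical MDS into two dimensions
sample_dist <- dist(t(toy_m[top_probes, ]))
coords <- cmdscale(sample_dist, k = 2)

plot(coords, pch = 19, col = rep(c("steelblue", "tomato"), each = 3),
     xlab = "Dimension 1", ylab = "Dimension 2")
```

Because the probes are selected once for all samples, a strong effect such as sex or cell type tends to dominate the first dimension when it drives most of the high-variance probes.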
When we colour ONLY the patient samples by sex and disease status, a clear separation by sex can be seen in principal component one.
par(mfrow=c(1,2))
patients <- targets$Sample_Group %in% c("Case","Control")
mDat <- getM(mSetSq[,patients])
plotMDS(mDat, top=1000, gene.selection="common",
col=pal[factor(sex$predictedSex[patients])],pch=19)
legend("topright", legend=levels(factor(sex$predictedSex[patients])), text.col=pal,
bg="white", cex=0.7)
plotMDS(mDat, top=1000, gene.selection="common",
col=pal[factor(targets$Sample_Group[patients])],pch=19)
legend("topright", legend=levels(factor(targets$Sample_Group[patients])), text.col=pal,
bg="white", cex=0.7)
Examine the top 4 principal components for obvious sources of variation. ALL samples are coloured by type. No clear pattern emerges at this stage.
# Examine higher dimensions to look at other sources of variation
par(mfrow=c(1,3))
mDat <- getM(mSetSq)
plotMDS(mDat, top=1000, gene.selection="common",
col=pal[factor(targets$Sample_Group)], dim=c(1,3),pch=19)
legend("topright", legend=levels(factor(targets$Sample_Group)), text.col=pal,
cex=0.7, bg="white")
plotMDS(mDat, top=1000, gene.selection="common",
col=pal[factor(targets$Sample_Group)], dim=c(2,3),pch=19)
legend("topright", legend=levels(factor(targets$Sample_Group)), text.col=pal,
cex=0.7, bg="white")
plotMDS(mDat, top=1000, gene.selection="common",
col=pal[factor(targets$Sample_Group)], dim=c(3,4),pch=19)
legend("topright", legend=levels(factor(targets$Sample_Group)), text.col=pal,
cex=0.7, bg="white")
Look at only the sorted cell samples, coloured by predicted sex and cell type. As expected, the samples cluster nicely by cell type.
cells <- !targets$Sample_Group %in% c("Case","Control")
mDat <- getM(mSetSq[,cells])
par(mfrow=c(1,2))
plotMDS(mDat, top=1000, gene.selection="common",
col=pal[factor(sex$predictedSex[cells])],pch=19)
legend("topright", legend=levels(factor(sex$predictedSex[cells])), text.col=pal,
bg="white", cex=0.7)
plotMDS(mDat, top=1000, gene.selection="common",
col=pal[factor(targets$Sample_Group[cells])],pch=19)
legend("topright", legend=levels(factor(targets$Sample_Group[cells])), text.col=pal,
bg="white", cex=0.7)
Filter out poorly performing probes, sex chromosome probes, SNP probes and cross-reactive probes.
# Filtering
# ensure probes are in the same order in the mSetSq and detP objects
detP <- detP[match(featureNames(mSetSq),rownames(detP)),]
# remove any probes that have failed in one or more samples
keep <- rowSums(detP < 0.01) == ncol(mSetSq)
table(keep)
keep
FALSE TRUE
3988 861871
# subset data
mSetSqFlt <- mSetSq[keep,]
mSetSqFlt
class: GenomicRatioSet
dim: 861871 38
metadata(0):
assays(2): M CN
rownames(861871): cg14817997 cg26928153 ... cg07587934 cg16855331
rowData names(0):
colnames(38): EpithelialCell.103516-001-002 Case.103516-001-003 ...
Lymphocyte.103516-001-039 Lymphocyte.103516-001-040
colData names(13): Sample_Name Sample_Well ... yMed predictedSex
Annotation
array: IlluminaHumanMethylationEPIC
annotation: ilm10b4.hg19
Preprocessing
Method: Raw (no normalization or bg correction)
minfi version: 1.32.0
Manifest version: 0.3.0
# if your data includes males and females, remove probes on the sex chromosomes
keep <- !(featureNames(mSetSqFlt) %in% annEPIC$Name[annEPIC$chr %in% c("chrX","chrY")])
table(keep)
keep
FALSE TRUE
19149 842722
mSetSqFlt <- mSetSqFlt[keep,]
# remove probes with SNPs at CpG site
mSetSqFlt <- dropLociWithSnps(mSetSqFlt)
mSetSqFlt
class: GenomicRatioSet
dim: 814007 38
metadata(0):
assays(2): M CN
rownames(814007): cg14817997 cg26928153 ... cg07660283 cg09226288
rowData names(0):
colnames(38): EpithelialCell.103516-001-002 Case.103516-001-003 ...
Lymphocyte.103516-001-039 Lymphocyte.103516-001-040
colData names(13): Sample_Name Sample_Well ... yMed predictedSex
Annotation
array: IlluminaHumanMethylationEPIC
annotation: ilm10b4.hg19
Preprocessing
Method: Raw (no normalization or bg correction)
minfi version: 1.32.0
Manifest version: 0.3.0
# exclude cross-reactive probes
xReactiveProbes <- read.csv(file=file.path(dataDirectory,
"Cross Reactive Probes EPIC array.txt"),
stringsAsFactors=FALSE, header=FALSE)
keep <- !(featureNames(mSetSqFlt) %in% xReactiveProbes$V1)
table(keep)
keep
FALSE TRUE
38839 775168
mSetSqFlt <- mSetSqFlt[keep,]
mSetSqFlt
class: GenomicRatioSet
dim: 775168 38
metadata(0):
assays(2): M CN
rownames(775168): cg26928153 cg16269199 ... cg19565306 cg09226288
rowData names(0):
colnames(38): EpithelialCell.103516-001-002 Case.103516-001-003 ...
Lymphocyte.103516-001-039 Lymphocyte.103516-001-040
colData names(13): Sample_Name Sample_Well ... yMed predictedSex
Annotation
array: IlluminaHumanMethylationEPIC
annotation: ilm10b4.hg19
Preprocessing
Method: Raw (no normalization or bg correction)
minfi version: 1.32.0
Manifest version: 0.3.0
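The filtering logic above can be illustrated on a toy example. This is only a minimal sketch: the detection p-values, the `ann` annotation table and the `xReactive` vector are made-up stand-ins for `detP`, `annEPIC` and `xReactiveProbes`, and the three filters are combined in one step rather than applied sequentially as in the analysis.

```r
# Toy detection p-value matrix: 5 probes x 3 samples (values are made up)
detPToy <- matrix(c(0.001, 0.002, 0.000,
                    0.500, 0.001, 0.003,
                    0.004, 0.009, 0.002,
                    0.000, 0.000, 0.020,
                    0.001, 0.001, 0.001),
                  nrow = 5, byrow = TRUE,
                  dimnames = list(paste0("cg", 1:5), paste0("S", 1:3)))

# Keep a probe only if it passes the 0.01 threshold in every sample
keepDetP <- rowSums(detPToy < 0.01) == ncol(detPToy)

# Hypothetical annotation table standing in for annEPIC
ann <- data.frame(Name = paste0("cg", 1:5),
                  chr = c("chr1", "chr2", "chrX", "chr3", "chr7"),
                  stringsAsFactors = FALSE)

# Drop probes annotated to the sex chromosomes
keepAuto <- !(rownames(detPToy) %in% ann$Name[ann$chr %in% c("chrX", "chrY")])

# Hypothetical list of cross-reactive probe IDs
xReactive <- c("cg5")
keepXR <- !(rownames(detPToy) %in% xReactive)

# A probe survives only if it passes all three filters
kept <- rownames(detPToy)[keepDetP & keepAuto & keepXR]
print(kept)
```

Because each filter is a logical vector over the same probes, combining them with `&` gives the same surviving set as subsetting step by step; the stepwise version used in the analysis has the advantage of reporting how many probes each filter removes.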
Calculate M and beta values for downstream use in analysis and visualisation.
# calculate M-values and beta values for downstream analysis and visualisation
mVals <- getM(mSetSqFlt)
head(mVals[,1:5])
EpithelialCell.103516-001-002 Case.103516-001-003
cg26928153 2.69274029 2.6252647
cg16269199 1.80696522 1.6163797
cg13869341 2.82147909 2.7826519
cg24669183 2.04801653 2.1212532
cg26679879 -0.53550078 -0.8716944
cg22519184 -0.06306404 -0.4739589
Control.103516-001-004 Case.103516-001-005 Control.103516-001-006
cg26928153 2.5794908 2.4622957 2.4280392
cg16269199 1.8007141 1.7352034 1.5535003
cg13869341 2.9804261 2.1558786 2.0769147
cg24669183 2.0591847 1.8864816 2.0036839
cg26679879 -0.4400872 -0.9114735 -0.9197778
cg22519184 -0.2618356 -0.2399486 -0.3174489
bVals <- getBeta(mSetSqFlt)
head(bVals[,1:5])
EpithelialCell.103516-001-002 Case.103516-001-003
cg26928153 0.8660488 0.8605295
cg16269199 0.7777311 0.7540609
cg13869341 0.8760669 0.8731151
cg24669183 0.8052721 0.8131092
cg26679879 0.4082557 0.3533805
cg22519184 0.4890736 0.4186000
Control.103516-001-004 Case.103516-001-005 Control.103516-001-006
cg26928153 0.8566778 0.8464121 0.8432999
cg16269199 0.7769812 0.7690139 0.7458888
cg13869341 0.8875418 0.8167290 0.8083939
cg24669183 0.8064831 0.7871129 0.8004082
cg26679879 0.4243246 0.3471058 0.3458025
cg22519184 0.4547515 0.4585157 0.4452112
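The two scales are related by a logit transform in base 2, which can be checked against the values printed above. The sketch below is illustrative only; `beta2m` and `m2beta` are hypothetical helper names, not minfi functions, and the beta values are copied from the first sample column of the `head(bVals)` output.

```r
# Hypothetical helpers: convert between beta values and M-values
beta2m <- function(beta) log2(beta / (1 - beta))
m2beta <- function(m) 2^m / (1 + 2^m)

# Beta values for cg26928153 and cg26679879 in the first sample (from the output above)
beta <- c(cg26928153 = 0.8660488, cg26679879 = 0.4082557)

# Should agree with the printed M-values (about 2.6927 and -0.5355)
m <- beta2m(beta)
print(round(m, 4))

# Converting back recovers the original beta values
print(m2beta(m))
```

Beta values are bounded between 0 and 1 and are easier to interpret as a proportion methylated, whereas M-values are unbounded and are generally better behaved for statistical testing.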
Plot the distributions of the M and beta values.
par(mfrow=c(1,2))
densityPlot(bVals, sampGroups=targets$Sample_Group, main="Beta values",
legend=FALSE, xlab="Beta values")
legend("top", legend = levels(factor(targets$Sample_Group)),
text.col=brewer.pal(8,"Dark2"))
densityPlot(mVals, sampGroups=targets$Sample_Group, main="M-values",
legend=FALSE, xlab="M values")
legend("topleft", legend = levels(factor(targets$Sample_Group)),
text.col=brewer.pal(8,"Dark2"))
Look at the largest sources of variation in the data after filtering. When ALL samples are coloured by type and by sex, we now see much clearer clustering of the sorted cell samples and patient samples. The control samples appear to cluster towards the granulocyte samples, suggesting that they may be dominated by that cell type. We no longer see any evidence of clustering by sex. The first principal component appears to capture the differences between macrophages, granulocytes and lymphocytes, whilst the second appears to be largely the difference between granulocytes and the other cell types.
# MDS plots to look at largest sources of variation
par(mfrow=c(1,2))
plotMDS(mVals, top=1000, gene.selection="common",
col=pal[factor(targets$Sample_Group)],pch=19)
legend("bottomleft", legend=levels(factor(targets$Sample_Group)), text.col=pal,
bg="white", cex=0.7)
plotMDS(mVals, top=1000, gene.selection="common",
col=pal[factor(sex$predictedSex)],pch=19)
legend("bottomleft", legend=levels(factor(sex$predictedSex)), text.col=pal,
bg="white", cex=0.7)
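To make the MDS plots easier to interpret, the sketch below reproduces the general idea in base R on simulated data: choose the most variable probes across all samples, compute Euclidean distances between samples on those probes, and project the samples with classical multidimensional scaling. It is an approximation of what `plotMDS` does with `gene.selection="common"`, not its actual implementation; the simulated matrix `mv`, the group labels and the choice of 50 probes are all assumptions made for illustration.

```r
set.seed(1)

# Simulated M-values: 200 probes x 6 samples, two groups of three samples
mv <- matrix(rnorm(200 * 6), nrow = 200,
             dimnames = list(NULL, paste0("S", 1:6)))
group <- rep(c("A", "B"), each = 3)

# Introduce a group difference in the first 50 probes
mv[1:50, group == "B"] <- mv[1:50, group == "B"] + 4

# Select the probes with the largest standard deviation across all samples
top <- 50
sds <- apply(mv, 1, sd)
sel <- order(sds, decreasing = TRUE)[seq_len(top)]

# Euclidean distances between samples, then classical MDS in two dimensions
d <- dist(t(mv[sel, ]))
coords <- cmdscale(d, k = 2)

# Samples separate by group along the first dimension
plot(coords, col = c("red", "blue")[factor(group)], pch = 19,
     xlab = "Dimension 1", ylab = "Dimension 2")
legend("topright", legend = levels(factor(group)),
       text.col = c("red", "blue"), bg = "white", cex = 0.7)
```

Requesting more dimensions from `cmdscale` (for example `k = 4`) gives coordinates analogous to those shown when `dim=c(1,3)` or `dim=c(3,4)` is passed to `plotMDS`.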
Looking at only the patient samples, we no longer see any clustering by sex. There is some evidence of separation between the case and control samples in the first principal component, although it is not clear-cut.
par(mfrow=c(1,2))
plotMDS(mVals[,patients], top=1000, gene.selection="common",
col=pal[factor(sex$predictedSex[patients])],pch=19)
legend("topright", legend=levels(factor(sex$predictedSex[patients])), text.col=pal,
bg="white", cex=0.7)
plotMDS(mVals[,patients], top=1000, gene.selection="common",
col=pal[factor(targets$Sample_Group[patients])],pch=19)
legend("topright", legend=levels(factor(targets$Sample_Group[patients])), text.col=pal,
bg="white", cex=0.7)
Looking at the top 4 principal components with ALL the samples indicates that they are dominated by the differences between the sorted cell types.
# Examine higher dimensions to look at other sources of variation
par(mfrow=c(1,3))
plotMDS(mVals, top=1000, gene.selection="common",
col=pal[factor(targets$Sample_Group)], dim=c(1,3),pch=19)
legend("topright", legend=levels(factor(targets$Sample_Group)), text.col=pal,
cex=0.7, bg="white")
plotMDS(mVals, top=1000, gene.selection="common",
col=pal[factor(targets$Sample_Group)], dim=c(2,3),pch=19)
legend("topleft", legend=levels(factor(targets$Sample_Group)), text.col=pal,
cex=0.7, bg="white")
plotMDS(mVals, top=1000, gene.selection="common",
col=pal[factor(targets$Sample_Group)], dim=c(3,4),pch=19)
legend("bottomright", legend=levels(factor(targets$Sample_Group)), text.col=pal,
cex=0.7, bg="white")
When we look at ONLY the sorted cell samples, there is still very clear clustering by cell type and no evidence of clustering by sex.
par(mfrow=c(1,2))
plotMDS(mVals[,cells], top=1000, gene.selection="common",
col=pal[factor(sex$predictedSex[cells])],pch=19)
legend("topright", legend=levels(factor(sex$predictedSex[cells])), text.col=pal,
bg="white", cex=0.7)
plotMDS(mVals[,cells], top=1000, gene.selection="common",
col=pal[factor(targets$Sample_Group[cells])],pch=19)
legend("topright", legend=levels(factor(targets$Sample_Group[cells])), text.col=pal,
bg="white", cex=0.7)
The data appear to be of good quality and show no evidence of unusual sources of variation. Save the various data objects for faster downstream analysis.
save(annEPIC, mSetSqFlt, rgSet, mVals, bVals, targets,
pal, sex, patients, cells, file = here("data/idat/processedData.RData"))
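Downstream analyses can restore these objects with `load()`. The round trip below is a small self-contained sketch using a temporary file and toy objects (`mValsToy` and `palToy` are made up for illustration); in the project itself the path would be the `processedData.RData` file written above.

```r
# Toy objects standing in for the saved analysis objects
mValsToy <- matrix(1:6, nrow = 2)
palToy <- c("red", "blue")

# Save several objects into a single .RData file
tmp <- tempfile(fileext = ".RData")
save(mValsToy, palToy, file = tmp)

# Load into a separate environment so nothing in the workspace is overwritten
env <- new.env()
loaded <- load(tmp, envir = env)

# load() returns the names of the restored objects
print(loaded)
print(identical(env$mValsToy, mValsToy))
```

Loading into a dedicated environment is optional, but it makes it explicit which objects came from the file and avoids silently replacing objects that already exist.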
sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /config/RStudio/R/3.6.1/lib64/R/lib/libRblas.so
LAPACK: /config/RStudio/R/3.6.1/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
[5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
[7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] grid stats4 parallel stats graphics grDevices utils
[8] datasets methods base
other attached packages:
[1] IlluminaHumanMethylationEPICmanifest_0.3.0
[2] IlluminaHumanMethylationEPICanno.ilm10b4.hg19_0.6.0
[3] stringr_1.4.0
[4] DMRcate_2.0.7
[5] Gviz_1.28.3
[6] minfiData_0.32.0
[7] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.0
[8] IlluminaHumanMethylation450kmanifest_0.4.0
[9] missMethyl_1.20.4
[10] RColorBrewer_1.1-2
[11] minfi_1.32.0
[12] bumphunter_1.26.0
[13] locfit_1.5-9.1
[14] iterators_1.0.12
[15] foreach_1.4.8
[16] Biostrings_2.54.0
[17] XVector_0.24.0
[18] SummarizedExperiment_1.16.1
[19] DelayedArray_0.12.2
[20] BiocParallel_1.20.1
[21] matrixStats_0.56.0
[22] Biobase_2.46.0
[23] GenomicRanges_1.36.1
[24] GenomeInfoDb_1.22.0
[25] IRanges_2.20.2
[26] S4Vectors_0.24.3
[27] BiocGenerics_0.32.0
[28] limma_3.42.2
[29] here_0.1
[30] workflowr_1.6.1
loaded via a namespace (and not attached):
[1] R.utils_2.9.0 tidyselect_0.2.5
[3] RSQLite_2.1.2 AnnotationDbi_1.46.1
[5] htmlwidgets_1.3 munsell_0.5.0
[7] codetools_0.2-16 preprocessCore_1.48.0
[9] statmod_1.4.32 withr_2.1.2
[11] colorspace_1.4-1 knitr_1.28
[13] rstudioapi_0.11 git2r_0.26.1
[15] GenomeInfoDbData_1.2.1 bit64_0.9-7
[17] rhdf5_2.28.0 rprojroot_1.3-2
[19] vctrs_0.2.4 xfun_0.12
[21] biovizBase_1.32.0 BiocFileCache_1.10.2
[23] R6_2.4.1 illuminaio_0.28.0
[25] AnnotationFilter_1.8.0 bitops_1.0-6
[27] reshape_0.8.8 assertthat_0.2.1
[29] promises_1.1.0 scales_1.1.0
[31] bsseq_1.22.0 nnet_7.3-12
[33] gtable_0.3.0 methylumi_2.30.0
[35] ensembldb_2.8.0 rlang_0.4.5
[37] genefilter_1.68.0 splines_3.6.1
[39] rtracklayer_1.44.4 lazyeval_0.2.2
[41] DSS_2.34.0 acepack_1.4.1
[43] GEOquery_2.54.1 dichromat_2.0-0
[45] checkmate_1.9.4 BiocManager_1.30.10
[47] yaml_2.2.1 GenomicFeatures_1.36.4
[49] backports_1.1.5 httpuv_1.5.2
[51] Hmisc_4.2-0 tools_3.6.1
[53] nor1mix_1.3-0 ggplot2_3.3.0
[55] siggenes_1.60.0 Rcpp_1.0.3
[57] plyr_1.8.6 base64enc_0.1-3
[59] progress_1.2.2 zlibbioc_1.30.0
[61] purrr_0.3.3 RCurl_1.95-4.12
[63] BiasedUrn_1.07 prettyunits_1.0.2
[65] rpart_4.1-15 openssl_1.4.1
[67] cluster_2.1.0 fs_1.3.2
[69] magrittr_1.5 data.table_1.12.8
[71] whisker_0.4 ProtGenerics_1.16.0
[73] mime_0.9 hms_0.5.3
[75] evaluate_0.14 xtable_1.8-4
[77] XML_3.98-1.20 mclust_5.4.5
[79] gridExtra_2.3 compiler_3.6.1
[81] biomaRt_2.42.0 tibble_2.1.3
[83] crayon_1.3.4 R.oo_1.22.0
[85] htmltools_0.4.0 later_1.0.0
[87] Formula_1.2-3 tidyr_1.0.2
[89] DBI_1.0.0 ExperimentHub_1.12.0
[91] dbplyr_1.4.2 MASS_7.3-51.5
[93] rappdirs_0.3.1 Matrix_1.2-18
[95] readr_1.3.1 permute_0.9-5
[97] R.methodsS3_1.7.1 quadprog_1.5-8
[99] pkgconfig_2.0.3 GenomicAlignments_1.20.1
[101] registry_0.5-1 foreign_0.8-72
[103] xml2_1.2.5 annotate_1.62.0
[105] rngtools_1.4 pkgmaker_0.27
[107] multtest_2.40.0 beanplot_1.2
[109] ruv_0.9.7.1 bibtex_0.4.2
[111] doRNG_1.7.1 scrime_1.3.5
[113] VariantAnnotation_1.30.1 digest_0.6.25
[115] rmarkdown_2.1 base64_2.0
[117] htmlTable_1.13.2 edgeR_3.26.8
[119] DelayedMatrixStats_1.8.0 curl_4.3
[121] shiny_1.3.2 gtools_3.8.1
[123] Rsamtools_2.0.1 lifecycle_0.2.0
[125] nlme_3.1-145 Rhdf5lib_1.6.1
[127] askpass_1.1 BSgenome_1.52.0
[129] pillar_1.4.3 lattice_0.20-40
[131] httr_1.4.1 survival_2.44-1.1
[133] GO.db_3.8.2 interactiveDisplayBase_1.22.0
[135] glue_1.3.2 BiocVersion_3.10.1
[137] bit_1.1-14 stringi_1.4.6
[139] HDF5Array_1.14.3 blob_1.2.0
[141] AnnotationHub_2.18.0 org.Hs.eg.db_3.8.2
[143] latticeExtra_0.6-28 memoise_1.1.0
[145] dplyr_0.8.3