Last updated: 2024-11-11

Checks: 6 passed, 1 warning

Knit directory: paed-airway-allTissues/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: uncommitted changes

The R Markdown is untracked by Git. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

Seed: set.seed(20230811)

The command set.seed(20230811) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: cd2a05c

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version cd2a05c. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .RData
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/.DS_Store
    Ignored:    data/.DS_Store
    Ignored:    data/RDS/
    Ignored:    output/.DS_Store
    Ignored:    output/CSV/.DS_Store
    Ignored:    output/G000231_Neeland_batch1/
    Ignored:    output/G000231_Neeland_batch2_1/
    Ignored:    output/G000231_Neeland_batch2_2/
    Ignored:    output/G000231_Neeland_batch3/
    Ignored:    output/G000231_Neeland_batch4/
    Ignored:    output/G000231_Neeland_batch5/
    Ignored:    output/G000231_Neeland_batch9_1/
    Ignored:    output/RDS/
    Ignored:    output/plots/

Untracked files:
    Untracked:  Annotation_Bronchial_brushings.Rmd
    Untracked:  BAL_Tcell_propeller.xlsx
    Untracked:  BAL_propeller.xlsx
    Untracked:  BB_Tcell_propeller.xlsx
    Untracked:  BB_propeller.xlsx
    Untracked:  NB_Tcell_propeller.xlsx
    Untracked:  NB_propeller.csv
    Untracked:  NB_propeller.xlsx
    Untracked:  analysis/03_Batch_Integration.Rmd
    Untracked:  analysis/Age_proportions.Rmd
    Untracked:  analysis/Age_proportions_AllBatches.Rmd
    Untracked:  analysis/Annotation_BAL.Rmd
    Untracked:  analysis/Annotation_Nasal_brushings.Rmd
    Untracked:  analysis/Batch_Integration_&_Downstream_analysis.Rmd
    Untracked:  analysis/Batch_correction_&_Downstream.Rmd
    Untracked:  analysis/Cell_cycle_regression.Rmd
    Untracked:  analysis/Master_metadata.Rmd
    Untracked:  analysis/Preprocessing_Batch1_Nasal_brushings.Rmd
    Untracked:  analysis/Preprocessing_Batch2_Tonsils.Rmd
    Untracked:  analysis/Preprocessing_Batch3_Adenoids.Rmd
    Untracked:  analysis/Preprocessing_Batch4_Bronchial_brushings.Rmd
    Untracked:  analysis/Preprocessing_Batch5_Nasal_brushings.Rmd
    Untracked:  analysis/Preprocessing_Batch6_BAL.Rmd
    Untracked:  analysis/Preprocessing_Batch7_Bronchial_brushings.Rmd
    Untracked:  analysis/Preprocessing_Batch8_Adenoids.Rmd
    Untracked:  analysis/Preprocessing_Batch9_Tonsils.Rmd
    Untracked:  analysis/TonsilsVsAdenoids.Rmd
    Untracked:  analysis/boxplot_proportions_Adenoids.pdf
    Untracked:  analysis/boxplot_proportions_BAL.pdf
    Untracked:  analysis/boxplot_proportions_Bronchial_brushings.pdf
    Untracked:  analysis/boxplot_proportions_Nasal_brushings.pdf
    Untracked:  analysis/boxplot_proportions_Tonsils.pdf
    Untracked:  analysis/cell_cycle_regression.R
    Untracked:  analysis/test.Rmd
    Untracked:  analysis/testing_age_all.Rmd
    Untracked:  color_palette.rds
    Untracked:  color_palette_Oct_2024.rds
    Untracked:  color_palette_v2_level2.rds
    Untracked:  combined_metadata.rds
    Untracked:  data/Cell_labels_Mel/
    Untracked:  data/Cell_labels_Mel_v2/
    Untracked:  data/Cell_labels_Mel_v3/
    Untracked:  data/Cell_labels_modified_Gunjan/
    Untracked:  data/Hs.c2.cp.reactome.v7.1.entrez.rds
    Untracked:  data/Raw_feature_bc_matrix/
    Untracked:  data/celltypes_Mel_GD_v3.xlsx
    Untracked:  data/celltypes_Mel_GD_v4_no_dups.xlsx
    Untracked:  data/celltypes_Mel_modified.xlsx
    Untracked:  data/celltypes_Mel_v2.csv
    Untracked:  data/celltypes_Mel_v2.xlsx
    Untracked:  data/celltypes_Mel_v2_MN.xlsx
    Untracked:  data/celltypes_for_mel_MN.xlsx
    Untracked:  data/earlyAIR_sample_sheets_combined.xlsx
    Untracked:  output/CSV/All_tissues.propeller.xlsx
    Untracked:  output/CSV/Bronchial_brushings/
    Untracked:  output/CSV/Bronchial_brushings_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/
    Untracked:  output/CSV/G000231_Neeland_Adenoids.propeller.xlsx
    Untracked:  output/CSV/G000231_Neeland_Bronchial_brushings.propeller.xlsx
    Untracked:  output/CSV/G000231_Neeland_Nasal_brushings.propeller.xlsx
    Untracked:  output/CSV/G000231_Neeland_Tonsils.propeller.xlsx
    Untracked:  output/CSV/Nasal_brushings/

Unstaged changes:
    Deleted:    02_QC_exploratoryPlots.Rmd
    Deleted:    02_QC_exploratoryPlots.html
    Modified:   analysis/00_AllBatches_overview.Rmd
    Modified:   analysis/01_QC_emptyDrops.Rmd
    Modified:   analysis/02_QC_exploratoryPlots.Rmd
    Modified:   analysis/Adenoids.Rmd
    Modified:   analysis/Age_modeling.Rmd
    Modified:   analysis/Age_modelling_Adenoids.Rmd
    Modified:   analysis/AllBatches_QCExploratory.Rmd
    Modified:   analysis/BAL.Rmd
    Modified:   analysis/Bronchial_brushings.Rmd
    Modified:   analysis/Nasal_brushings.Rmd
    Modified:   analysis/Subclustering_Adenoids.Rmd
    Modified:   analysis/Subclustering_BAL.Rmd
    Modified:   analysis/Subclustering_Bronchial_brushings.Rmd
    Modified:   analysis/Subclustering_Nasal_brushings.Rmd
    Modified:   analysis/Subclustering_Tonsils.Rmd
    Modified:   analysis/Tonsils.Rmd
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c0.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c1.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c10.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c11.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c12.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c13.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c14.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c15.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c16.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c17.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c2.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c3.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c4.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c5.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c6.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c7.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c8.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c9.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c0.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c1.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c10.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c11.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c12.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c13.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c14.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c15.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c16.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c17.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c2.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c3.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c4.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c5.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c6.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c7.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c8.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c9.csv

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/Preprocessing_Batch1_Nasal_brushings.Rmd) and HTML (docs/Preprocessing_Batch1_Nasal_brushings.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
html	c20f60f	Gunjan Dixit	2024-07-08	Updated marker gene dot plots
html	bd5ec04	Gunjan Dixit	2024-05-03	Modified index

Introduction

This R Markdown document performs quality control for earlyAIR batch 1 (nasal brushings).

The steps are:

* Load CellRanger counts
* Run decontX to estimate and correct for ambient RNA contamination
* Filter cells with low library size and high mitochondrial content
* Identify doublets
* Normalize, scale, and run PCA, UMAP and Azimuth annotation before and after doublet removal
* Save Seurat object

suppressPackageStartupMessages({
  library(BiocStyle)
  library(BiocParallel)
  library(tidyverse)
  library(here)
  library(glue)
  library(scran)
  library(scater)
  library(scuttle)
  library(janitor)
  library(cowplot)
  library(patchwork)
  library(scales)
  library(Homo.sapiens)
  library(msigdbr)
  library(EnsDb.Hsapiens.v86)
  library(ensembldb)
  library(readr)
  library(Seurat)
  library(celda)
  library(decontX)
  library(Azimuth)
  library(Matrix)
  library(scDblFinder)
  library(scMerge)
  library(googlesheets4)
  library(lubridate)
  library(ggstats)
})
set.seed(42)

Get Batch_info

batch_path <- here("output/RDS/AllBatches_filtered_SCEs/G000231_batch1_Nasal_brushings.CellRanger_filtered.SCE.rds")

batch_info <- str_match(basename(batch_path), "^(G\\d+_batch\\d+)_([A-Za-z_]+)\\.CellRanger_filtered\\.SCE\\.rds$")
batch_name <- batch_info[, 2]
tissue <- batch_info[, 3]

sce <- readRDS(batch_path)
sce$tissue <- tissue
sce$batch_name <- batch_name

sce

class: SingleCellExperiment 
dim: 18082 43290 
metadata(0):
assays(2): counts logcounts
rownames(18082): SAMD11 NOC2L ... MT-ND6 MT-CYB
rowData names(0):
colnames(43290): AAACCAATCATGAGGTACTTTAGG-1 AAACCAGGTGTCCAATACTTTAGG-1
  ... TTTGCTGAGATTGAGCATTCGGTT-1 TTTGGCGGTAAGGTTGATTCGGTT-1
colData names(7): orig.ident nCount_RNA ... tissue batch_name
reducedDimNames(0):
mainExpName: RNA
altExpNames(0):
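The batch name and tissue are parsed from the file name with a regular expression. The base-R sketch below reproduces the same parse with regexec(perl = TRUE) in place of stringr::str_match(); parse_batch_file() is a hypothetical helper, and the second file name is a hypothetical example. It shows that, as written, the pattern does not match a sub-batch suffix such as batch2_1, because the tissue group [A-Za-z_]+ cannot start with a digit.

```r
# Same pattern as used above with str_match()
pattern <- "^(G\\d+_batch\\d+)_([A-Za-z_]+)\\.CellRanger_filtered\\.SCE\\.rds$"

# Hypothetical helper: returns batch name and tissue, or NA for both if no match
parse_batch_file <- function(path) {
  file <- basename(path)
  m <- regmatches(file, regexec(pattern, file, perl = TRUE))[[1]]
  if (length(m) == 0) {
    return(c(batch_name = NA_character_, tissue = NA_character_))
  }
  c(batch_name = m[2], tissue = m[3])
}

parse_batch_file("output/RDS/AllBatches_filtered_SCEs/G000231_batch1_Nasal_brushings.CellRanger_filtered.SCE.rds")

# Hypothetical sub-batch file name: the pattern does not match it
parse_batch_file("G000231_batch2_1_Tonsils.CellRanger_filtered.SCE.rds")
```

If sub-batch files need to be processed with the same code, the batch group of the pattern would have to allow an optional _<digits> suffix.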

CellRanger calls

Remove genes with zero counts across all cells

sce <- sce[rowSums(counts(sce)) > 0, ]
sce

class: SingleCellExperiment 
dim: 17474 43290 
metadata(0):
assays(2): counts logcounts
rownames(17474): SAMD11 NOC2L ... MT-ND6 MT-CYB
rowData names(0):
colnames(43290): AAACCAATCATGAGGTACTTTAGG-1 AAACCAGGTGTCCAATACTTTAGG-1
  ... TTTGCTGAGATTGAGCATTCGGTT-1 TTTGGCGGTAAGGTTGATTCGGTT-1
colData names(7): orig.ident nCount_RNA ... tissue batch_name
reducedDimNames(0):
mainExpName: RNA
altExpNames(0):

cell_counts <- c()
cell_counts["Post CellRanger Filtering"] <- ncol(sce)

Add Barcode metadata

Each barcode is 26 characters long, including the -1 suffix. The first 17 characters are used as the GEM barcode and the last 9 characters (positions 18 to 26) as the sample barcode; each is stored as a cell-level metadata column.

sce$Barcode <- unname(substring(colnames(sce), first = 1, last = 26))
sce$GEM_barcode <- substring(sce$Barcode, first = 1, last = 17)
sce$sample_barcode <- substring(sce$Barcode, first = 18, last = 26)
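To make the character positions concrete, the sketch below applies the same substring() calls to the first two barcodes shown in the sce summary above. It only illustrates the string slicing used in the code; note that the 9-character sample barcode includes the -1 suffix.

```r
# First two barcodes from the sce summary above
barcodes <- c("AAACCAATCATGAGGTACTTTAGG-1", "AAACCAGGTGTCCAATACTTTAGG-1")

barcode        <- substring(barcodes, first = 1, last = 26)
gem_barcode    <- substring(barcode, first = 1, last = 17)   # positions 1-17
sample_barcode <- substring(barcode, first = 18, last = 26)  # positions 18-26

data.frame(barcode, gem_barcode, sample_barcode)
```

Both cells share the sample barcode CTTTAGG-1 but have different GEM barcodes, so they are treated as coming from the same sample but from different GEMs.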

Pre-processing

DecontX

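Correct for ambient RNA contamination with decontX, replacing the raw counts with the decontX counts (the original counts are kept in the raw_counts assay). The decontaminated counts are doubles rather than integers; they can be rounded to integers later if a downstream step requires it, but so far this has not been an issue.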

sce <- decontX(sce)

--------------------------------------------------

Starting DecontX

--------------------------------------------------

Mon Nov 11 15:26:58 2024 .. Analyzing all cells

Mon Nov 11 15:26:58 2024 .... Generating UMAP and estimating cell types

Found more than one class "dist" in cache; using the first, from namespace 'BiocGenerics'

Also defined by 'spam'

Mon Nov 11 15:28:07 2024 .... Estimating contamination

Mon Nov 11 15:28:14 2024 ...... Completed iteration: 10 | converge: 0.03919

Mon Nov 11 15:28:19 2024 ...... Completed iteration: 20 | converge: 0.01257

Mon Nov 11 15:28:25 2024 ...... Completed iteration: 30 | converge: 0.009682

Mon Nov 11 15:28:31 2024 ...... Completed iteration: 40 | converge: 0.004789

Mon Nov 11 15:28:37 2024 ...... Completed iteration: 50 | converge: 0.003704

Mon Nov 11 15:28:43 2024 ...... Completed iteration: 60 | converge: 0.002934

Mon Nov 11 15:28:49 2024 ...... Completed iteration: 70 | converge: 0.002333

Mon Nov 11 15:28:55 2024 ...... Completed iteration: 80 | converge: 0.001852

Mon Nov 11 15:29:00 2024 ...... Completed iteration: 90 | converge: 0.005455

Mon Nov 11 15:29:06 2024 ...... Completed iteration: 100 | converge: 0.0021

Mon Nov 11 15:29:12 2024 ...... Completed iteration: 110 | converge: 0.001555

Mon Nov 11 15:29:18 2024 ...... Completed iteration: 120 | converge: 0.0017

Mon Nov 11 15:29:24 2024 ...... Completed iteration: 130 | converge: 0.001892

Mon Nov 11 15:29:29 2024 ...... Completed iteration: 140 | converge: 0.002202

Mon Nov 11 15:29:35 2024 ...... Completed iteration: 150 | converge: 0.002567

Mon Nov 11 15:29:41 2024 ...... Completed iteration: 160 | converge: 0.002954

Mon Nov 11 15:29:47 2024 ...... Completed iteration: 170 | converge: 0.003292

Mon Nov 11 15:29:53 2024 ...... Completed iteration: 180 | converge: 0.003486

Mon Nov 11 15:29:59 2024 ...... Completed iteration: 190 | converge: 0.003451

Mon Nov 11 15:30:04 2024 ...... Completed iteration: 200 | converge: 0.003361

Mon Nov 11 15:30:10 2024 ...... Completed iteration: 210 | converge: 0.003045

Mon Nov 11 15:30:16 2024 ...... Completed iteration: 220 | converge: 0.0025

Mon Nov 11 15:30:22 2024 ...... Completed iteration: 230 | converge: 0.001937

Mon Nov 11 15:30:28 2024 ...... Completed iteration: 240 | converge: 0.001478

Mon Nov 11 15:30:34 2024 ...... Completed iteration: 250 | converge: 0.00108

Mon Nov 11 15:30:39 2024 ...... Completed iteration: 260 | converge: 0.0009566

Mon Nov 11 15:30:39 2024 .. Calculating final decontaminated matrix

--------------------------------------------------

Completed DecontX. Total time: 3.809982 mins

--------------------------------------------------

assay(sce, "raw_counts") <- counts(sce)
counts(sce) <- decontXcounts(sce)

Filter on library size after running decontX

sce <- addPerCellQCMetrics(sce)
sum(sce$sum < 250)

[1] 1986

sce <- sce[, sce$sum >= 250]
cell_counts["Post low-lib Filtering"] <- ncol(sce)

Mitochondrial filtering

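Filter out cells with outlying mitochondrial content, i.e. cells whose mitochondrial percentage is more than 3 median absolute deviations (MADs) above the median, the default for isOutlier().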

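# "^MT-" matches only mitochondrially encoded genes (e.g. MT-ND6, MT-CYB);
# "^MT" would also match nuclear genes such as MTOR
is.mito <- grepl(pattern = "^MT-", rownames(sce))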
sce <- addPerCellQCMetrics(sce, subsets = list(mito = is.mito))
mito_outliers <- isOutlier(sce$subsets_mito_percent, type = "higher")
sum(mito_outliers)

[1] 6626

sce <- sce[, !mito_outliers]
cell_counts["Post Mito Filtering"] <- ncol(sce)
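The two QC quantities used in this step can be sketched in base R. The toy count matrix below is hypothetical; it shows why the mitochondrial gene pattern needs the hyphen (MTOR is a nuclear gene that "^MT" would also match), and is_high_outlier() is a hypothetical helper that approximates the isOutlier(type = "higher") rule with its default of 3 MADs, using stats::mad() with its usual 1.4826 consistency constant.

```r
# Hypothetical toy counts: 4 genes x 3 cells (each cell has 100 counts in total)
toy_counts <- matrix(
  c(10,  0,  5,   # MT-ND1 (mitochondrial)
    20,  2,  5,   # MT-CO1 (mitochondrial)
     5,  5,  5,   # MTOR   (nuclear, should not be counted)
    65, 93, 85),  # GAPDH
  nrow = 4, byrow = TRUE,
  dimnames = list(c("MT-ND1", "MT-CO1", "MTOR", "GAPDH"), paste0("cell", 1:3))
)

# Percentage of each cell's counts coming from genes matching `pattern`
mito_percent <- function(counts, pattern) {
  is_mito <- grepl(pattern, rownames(counts))
  100 * colSums(counts[is_mito, , drop = FALSE]) / colSums(counts)
}

mito_percent(toy_counts, "^MT-")  # 30  2 10
mito_percent(toy_counts, "^MT")   # 35  7 15 (MTOR wrongly included)

# Flag values more than `nmads` MADs above the median
is_high_outlier <- function(x, nmads = 3) {
  x > median(x) + nmads * mad(x)
}

is_high_outlier(c(1, 2, 3, 4, 5, 100))  # FALSE FALSE FALSE FALSE FALSE TRUE
```

Because the threshold is data-driven, the number of cells removed (6626 here) depends on the distribution of mitochondrial percentages in this batch rather than on a fixed cut-off.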

Multiplet filtering

We expect some multiplets to remain unidentified in our data: a high-occupancy GEM can capture several cells from the same sample, and these cannot be separated by sample barcode. We are still working on a way to estimate how many such multiplets there are, but existing doublet-detection tools perform reasonably well. We use scDblFinder because it appeared to have the best effect on the GEM-level counts.

sce <- logNormCounts(sce) %>%
  runPCA() %>%
  runUMAP()

Found more than one class "dist" in cache; using the first, from namespace 'BiocGenerics'

Also defined by 'spam'

Run scDblFinder

bp <- MulticoreParam(8, RNGseed=56213)
#sce <- scDblFinder(sce, clusters = T,BPPARAM=bp)

params <- list(
  dbr = list(clusters = TRUE, BPPARAM = bp, dbr.sd = 1),
  dbr_s = list(clusters = TRUE, BPPARAM = bp, dbr.sd = 1, samples = sce$sample_barcode),
  s = list(clusters = TRUE, BPPARAM = bp, samples = sce$sample_barcode),
  cl = list(clusters = TRUE, BPPARAM = bp)
)

# Run scDblFinder for each parameter set, rename columns, and merge results
for (suffix in names(params)) {
  sce_temp <- do.call(scDblFinder, c(list(sce), params[[suffix]]))
  
  # Loop through the relevant columns and rename them with the suffix
  for (colname in c("cluster", "class", "originAmbiguous", "mostLikelyOrigin", 
                    "cxds_score", "difficulty", "weighted", "score")) {
    sce[[paste0("scDblFinder.", colname, "_", suffix)]] <- sce_temp[[paste0("scDblFinder.", colname)]]
  }
}

Warning in (function (sce, clusters = NULL, samples = NULL, clustCor = NULL, :
You are trying to run scDblFinder on a very large number of cells. If these are
from different captures, please specify this using the `samples` argument.TRUE

Clustering cells...

16 clusters

Creating ~25000 artificial doublets...

Dimensional reduction

Evaluating kNN...

Training model...

iter=0, 2771 cells excluded from training.

iter=1, 2834 cells excluded from training.

iter=2, 2805 cells excluded from training.

Threshold found:0.403

3079 (8.9%) doublets called

Warning in (function (sce, clusters = NULL, samples = NULL, clustCor = NULL, :
You are trying to run scDblFinder on a very large number of cells. If these are
from different captures, please specify this using the `samples` argument.TRUE

Clustering cells...

16 clusters

Creating ~25000 artificial doublets...

Dimensional reduction

Evaluating kNN...

Training model...

iter=0, 6029 cells excluded from training.

iter=1, 6055 cells excluded from training.

iter=2, 5840 cells excluded from training.

Threshold found:0.244

6013 (17.3%) doublets called

table(sce$scDblFinder.class_dbr)


singlet doublet 
  31599    3079

table(sce$scDblFinder.class_dbr_s)


singlet doublet 
  32417    2261

table(sce$scDblFinder.class_s)


singlet doublet 
  32957    1721

table(sce$scDblFinder.class_cl)


singlet doublet 
  28665    6013
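The four parameter sets differ in whether the sample barcode is passed as samples and whether dbr.sd = 1 is set; per our reading of the scDblFinder documentation, dbr.sd = 1 stops the expected doublet rate from constraining the threshold, so it is driven by the misclassification rate alone. The short calculation below turns the singlet/doublet tables above into doublet percentages; all four runs cover the same 34,678 cells.

```r
# Singlet/doublet counts copied from the four tables above
doublet_calls <- data.frame(
  run     = c("dbr", "dbr_s", "s", "cl"),
  singlet = c(31599, 32417, 32957, 28665),
  doublet = c(3079, 2261, 1721, 6013)
)

doublet_calls$total <- doublet_calls$singlet + doublet_calls$doublet
doublet_calls$pct_doublet <- round(100 * doublet_calls$doublet / doublet_calls$total, 1)

doublet_calls
```

On these numbers the sample-aware runs (dbr_s and s) call the fewest doublets and the default cluster-based run (cl) calls the most; the 8.9% and 17.3% rates match the scDblFinder log messages above.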

Make Seurat object

seu <- CreateSeuratObject(counts(sce), meta.data = as.data.frame(colData(sce)))

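Add GEM-level metadata (the number of cells sharing each GEM barcode) to each cell

# Number of cells (after filtering) that share each cell's GEM barcode
seu$cells_per_GEM <- as.integer(table(seu$GEM_barcode)[seu$GEM_barcode])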

table(seu$cells_per_GEM)


    1     2     3     4 
16275 12106  4953  1344
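Indexing a table by the barcodes themselves, table(x)[x], looks up for every cell how many cells share its GEM barcode. The toy example below (hypothetical barcodes) shows that lookup; the second part converts the distribution above from cells to GEMs: every GEM holding k cells contributes k cells to column k, so dividing by k gives the number of GEMs. These counts describe cells that passed the earlier filters, not the original GEM occupancy.

```r
# Hypothetical GEM barcodes for six cells
gem <- c("G1", "G1", "G2", "G3", "G3", "G3")
cells_per_gem <- as.integer(table(gem)[gem])
cells_per_gem  # 2 2 1 3 3 3

# Cell counts by GEM occupancy, from table(seu$cells_per_GEM) above
cells_by_occupancy <- c(`1` = 16275, `2` = 12106, `3` = 4953, `4` = 1344)
occupancy <- as.integer(names(cells_by_occupancy))
gems_by_occupancy <- cells_by_occupancy / occupancy

sum(cells_by_occupancy)  # 34678 cells
gems_by_occupancy        # 16275 6053 1651 336 GEMs
sum(gems_by_occupancy)   # 24315 GEMs
```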

Normalization and Azimuth annotation

seu <- NormalizeData(seu, verbose = FALSE) %>%
  FindVariableFeatures(nfeatures = 2000, verbose = FALSE) %>%
  ScaleData(verbose = FALSE) %>%
  RunPCA(npcs = 30, verbose = FALSE) %>%  # RunPCA() takes npcs, not dims
  RunUMAP(dims = 1:30, verbose = FALSE)

Found more than one class "dist" in cache; using the first, from namespace 'BiocGenerics'

Also defined by 'spam'

Found more than one class "dist" in cache; using the first, from namespace 'BiocGenerics'

Also defined by 'spam'
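NormalizeData() uses the LogNormalize method by default: each count is divided by the cell's total count, multiplied by a scale factor of 10,000 and transformed with log1p(). A base-R sketch of that formula on a hypothetical 2 x 2 matrix (log_normalize() is a hypothetical helper, not a Seurat function):

```r
# Hypothetical counts: 2 genes x 2 cells, 10 counts per cell
toy <- matrix(c(1, 9, 3, 7), nrow = 2,
              dimnames = list(c("geneA", "geneB"), c("cell1", "cell2")))

log_normalize <- function(counts, scale_factor = 1e4) {
  # Divide each column (cell) by its library size, rescale, then log1p
  log1p(sweep(counts, 2, colSums(counts), "/") * scale_factor)
}

log_normalize(toy)
```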

options(timeout = max(1000000, getOption("timeout")))
tmp <- RunAzimuth(seu, reference = "lungref")

detected inputs from HUMAN with id type Gene.name

reference rownames detected HUMAN with id type Gene.name

Normalizing query using reference SCT model

Projecting cell embeddings

Finding query neighbors

Finding neighborhoods

Finding anchors

    Found 26809 anchors

Finding integration vectors

Finding integration vector weights

Predicting cell labels
Predicting cell labels
Predicting cell labels
Predicting cell labels
Predicting cell labels
Predicting cell labels


Integrating dataset 2 with reference dataset

Finding integration vectors

Integrating data

Computing nearest neighbors

Running UMAP projection

16:27:45 Read 34678 rows

16:27:45 Processing block 1 of 1

16:27:45 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 20
16:27:46 Initializing by weighted average of neighbor coordinates using 1 thread
16:27:46 Commencing optimization for 67 epochs, with 693560 positive edges
16:27:48 Finished
Projecting reference PCA onto query
Finding integration vector weights
Projecting back the query cells into original PCA space
Finding integration vector weights
Computing scores:
    Finding neighbors of original query cells
    Finding neighbors of transformed query cells
    Computing query SNN
    Determining bandwidth and computing transition probabilities
Total elapsed time: 15.759418964386

seu@meta.data <- tmp@meta.data

out <- here("output",
            "RDS", "AllBatches_scDblFinder_test_SEUs",
             paste0(batch_name, "_", tissue, ".CellRanger.decontX.mito.filter.Azimuth.SEU.rds"))

saveRDS(seu, file = out)

Add batch-specific metadata

f <- c("https://docs.google.com/spreadsheets/d/1FKo-7MweuFDoKBm8DMFcMOuq0LyK_K6GVNAAo_n-ItE/edit#gid=1882418352")
dat <- bind_rows(lapply(1:10, function(sheet) read_sheet(ss = f, sheet = sheet)))
dat

batch_meta <- dat %>%
  dplyr::filter(run == "batch1_1")

#batch_meta$sample_id <- gsub("_", "-", batch_meta$sample_id) #For Batch7

# Look up each cell's donor in the batch metadata (NA if the donor is not found)
idx <- match(seu$Sample, batch_meta$donor_id)
seu$sample_id <- batch_meta$sample_id[idx]
seu$donor_id <- batch_meta$donor_id[idx]
seu$sex <- batch_meta$sex[idx]
seu$age_years <- batch_meta$age_years[idx]

Clean up metadata that is no longer needed

seu@meta.data <- seu@meta.data %>%
  dplyr::select(c(donor_id, sample_id, age_years, sex, nCount_RNA, nFeature_RNA, 
                  Barcode, GEM_barcode, sample_barcode, 
                  tissue, batch_name, sum, detected,
                  cells_per_GEM, 
                  scDblFinder.class, scDblFinder.score,
                  predicted.ann_level_1, predicted.ann_level_1.score,  predicted.ann_level_2, predicted.ann_level_2.score, predicted.ann_level_3, predicted.ann_level_3.score, predicted.ann_level_4, predicted.ann_level_4.score, predicted.ann_level_5, predicted.ann_level_5.score, predicted.ann_finest_level, predicted.ann_finest_level.score))

Save pre-processed objects

out <- here("output",
            "RDS", "AllBatches_Azimuth_SEUs",
             paste0(batch_name, "_", tissue, ".CellRanger.decontX.mito.filter.Azimuth.SEU.rds"))

saveRDS(seu, file = out)

Filter doublets and repeat

seu <- seu[, seu$scDblFinder.class == "singlet"]
cell_counts["Post Doublet Filtering"] <- ncol(seu)

Normalization and Azimuth annotation

seu <- NormalizeData(seu, verbose = FALSE) %>%
  FindVariableFeatures(nfeatures = 2000, verbose = FALSE) %>%
  ScaleData(verbose = FALSE) %>%
  RunPCA(npcs = 30, verbose = FALSE) %>%  # RunPCA() takes npcs, not dims
  RunUMAP(dims = 1:30, verbose = FALSE)

options(timeout = max(1000000, getOption("timeout")))
tmp <- RunAzimuth(seu, reference = "lungref") 
seu@meta.data <- tmp@meta.data

This figure shows the number of cells remaining after each filtering stage.

counts_df <- data.frame(
    Stage = factor(names(cell_counts), levels = c("Post CellRanger Filtering", "Post low-lib Filtering","Post Mito Filtering", "Post Doublet Filtering")),
    Cell_Count = as.numeric(cell_counts)
)

a <- ggplot(counts_df, aes(x = Stage, y = Cell_Count, group = 1)) +
    geom_line() + 
    geom_point() +
    theme_minimal() +
    labs(title = paste0(tissue, " ", batch_name, ": cell counts after each preprocessing step"))
#ggsave(a, file=paste0(tissue, " ", batch_name, " :Cells_after_filtering.pdf"), width = 10)
a
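The plotted counts can also be expressed as the number and percentage of cells removed at each step. The sketch below uses counts reported earlier in this document (43,290 cells after CellRanger filtering, 1,986 removed by the library-size filter and 6,626 by the mitochondrial filter); the doublet stage is left out because its count depends on which scDblFinder run is used for filtering.

```r
# Cell counts reconstructed from the outputs reported above
stage_counts <- c(
  "Post CellRanger Filtering" = 43290,
  "Post low-lib Filtering"    = 43290 - 1986,
  "Post Mito Filtering"       = 43290 - 1986 - 6626
)

# Cells removed at each step, and as a percentage of the starting count
removed <- -diff(stage_counts)
names(removed) <- names(stage_counts)[-1]
pct_of_start <- round(100 * removed / unname(stage_counts[1]), 1)

data.frame(removed, pct_of_start)
```

The 34,678 cells left after mitochondrial filtering match the number of cells passed to scDblFinder and Azimuth above.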

Add harmonized cell labels

# Function to map cell types to broad cell label
map_to_broad_cell_label <- function(cell_type, broad_cell_labels_df, label_column) {
  label <- broad_cell_labels_df[[label_column]][broad_cell_labels_df$`Cell Types` == cell_type]
  if (length(label) == 0) {
    return("Unknown")  # Assign to "Unknown" if not found in mapping
  } else {
    return(label)
  }
}

broad_cell_labels <- readxl::read_excel(here("data/celltypes_Mel_v2_MN.xlsx")) #modified cell types based on Tonsils ref v2 

seu$Broad_cell_label_1 <- sapply(seu$predicted.ann_level_4, map_to_broad_cell_label, broad_cell_labels_df = broad_cell_labels, label_column = "Broad cell label level 1")

# Apply mapping to Seurat object for Broad Cell Label 2
seu$Broad_cell_label_2 <- sapply(seu$predicted.ann_level_4, map_to_broad_cell_label, broad_cell_labels_df = broad_cell_labels, label_column = "Broad cell label level 2")

# Apply mapping to Seurat object for Broad Cell Label 3
seu$Broad_cell_label_3 <- sapply(seu$predicted.ann_level_4, map_to_broad_cell_label, broad_cell_labels_df = broad_cell_labels, label_column = "Broad cell label level 3")

table(seu$Broad_cell_label_2 == "Unknown")
table(seu$Broad_cell_label_2 == "NA")
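Calling sapply() over a per-cell lookup function works, but it returns a list whenever a cell type appears more than once in the mapping table, and missing values in either the cell types or the Cell Types column can yield NA entries instead of "Unknown". A vectorized alternative based on match(), which takes the first matching row, is sketched below; map_to_broad_label() and the toy mapping table are hypothetical, and only the column names mirror the spreadsheet used above.

```r
map_to_broad_label <- function(cell_types, lookup, label_column) {
  # Index of the first row whose `Cell Types` equals each cell type (NA if absent)
  idx <- match(cell_types, lookup[["Cell Types"]])
  labels <- lookup[[label_column]][idx]
  labels[is.na(idx)] <- "Unknown"  # unmapped or NA cell types
  labels
}

# Hypothetical mapping table
lookup <- data.frame(
  `Cell Types` = c("Basal resting", "CD4 T cells"),
  `Broad cell label level 1` = c("Epithelial", "Immune"),
  check.names = FALSE, stringsAsFactors = FALSE
)

map_to_broad_label(c("CD4 T cells", "Basal resting", "Not in table", NA),
                   lookup, "Broad cell label level 1")
```

Applied to this document it would read seu$Broad_cell_label_1 <- map_to_broad_label(seu$predicted.ann_level_4, broad_cell_labels, "Broad cell label level 1"); the tibble returned by readxl supports the same [[ column access.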

Save pre-processed objects

out <- here("output",
            "RDS", "AllBatches_Azimuth_noDoublets_SEUs",
             paste0(batch_name, "_", tissue, ".CellRanger.decontX.mito.doublet.filter.Azimuth.SEU.rds"))

saveRDS(seu, file = out)

Session Info

sessioninfo::session_info()

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.2 (2023-10-31)
 os       macOS 15.0.1
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Australia/Melbourne
 date     2024-11-11
 pandoc   3.1.1 @ /Users/dixitgunjan/Desktop/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package                           * version     date (UTC) lib source
 abind                               1.4-5       2016-07-21 [1] CRAN (R 4.3.0)
 annotate                            1.80.0      2023-10-26 [1] Bioconductor
 AnnotationDbi                     * 1.64.1      2023-11-02 [1] Bioconductor
 AnnotationFilter                  * 1.26.0      2023-10-26 [1] Bioconductor
 Azimuth                           * 0.5.0       2024-02-27 [1] Github (satijalab/azimuth@c3ad1bc)
 babelgene                           22.9        2022-09-29 [1] CRAN (R 4.3.0)
 backports                           1.4.1       2021-12-13 [1] CRAN (R 4.3.0)
 base64enc                           0.1-3       2015-07-28 [1] CRAN (R 4.3.0)
 batchelor                           1.18.1      2023-12-30 [1] Bioconductor 3.18 (R 4.3.2)
 bbmle                               1.0.25.1    2023-12-09 [1] CRAN (R 4.3.1)
 bdsmatrix                           1.3-6       2022-06-03 [1] CRAN (R 4.3.0)
 beachmat                            2.18.1      2024-02-17 [1] Bioconductor 3.18 (R 4.3.2)
 beeswarm                            0.4.0       2021-06-01 [1] CRAN (R 4.3.0)
 Biobase                           * 2.62.0      2023-10-26 [1] Bioconductor
 BiocFileCache                       2.10.1      2023-10-26 [1] Bioconductor
 BiocGenerics                      * 0.48.1      2023-11-02 [1] Bioconductor
 BiocIO                              1.12.0      2023-10-26 [1] Bioconductor
 BiocManager                         1.30.22     2023-08-08 [1] CRAN (R 4.3.0)
 BiocNeighbors                       1.20.2      2024-01-13 [1] Bioconductor 3.18 (R 4.3.2)
 BiocParallel                      * 1.36.0      2023-10-26 [1] Bioconductor
 BiocSingular                        1.18.0      2023-11-06 [1] Bioconductor
 BiocStyle                         * 2.30.0      2023-10-26 [1] Bioconductor
 biomaRt                             2.58.2      2024-02-03 [1] Bioconductor 3.18 (R 4.3.2)
 Biostrings                          2.70.2      2024-01-30 [1] Bioconductor 3.18 (R 4.3.2)
 bit                                 4.0.5       2022-11-15 [1] CRAN (R 4.3.0)
 bit64                               4.0.5       2020-08-30 [1] CRAN (R 4.3.0)
 bitops                              1.0-7       2021-04-24 [1] CRAN (R 4.3.0)
 blob                                1.2.4       2023-03-17 [1] CRAN (R 4.3.0)
 bluster                             1.12.0      2023-12-19 [1] Bioconductor 3.18 (R 4.3.2)
 BSgenome                            1.70.2      2024-02-10 [1] Bioconductor 3.18 (R 4.3.2)
 BSgenome.Hsapiens.UCSC.hg38         1.4.5       2024-02-27 [1] Bioconductor
 bslib                               0.6.1       2023-11-28 [1] CRAN (R 4.3.1)
 cachem                              1.0.8       2023-05-01 [1] CRAN (R 4.3.0)
 caTools                             1.18.2      2021-03-28 [1] CRAN (R 4.3.0)
 celda                             * 1.18.1      2023-12-23 [1] Bioconductor 3.18 (R 4.3.2)
 cellranger                          1.1.0       2016-07-27 [1] CRAN (R 4.3.0)
 checkmate                           2.3.1       2023-12-04 [1] CRAN (R 4.3.1)
 cli                                 3.6.2       2023-12-11 [1] CRAN (R 4.3.1)
 cluster                             2.1.6       2023-12-01 [1] CRAN (R 4.3.1)
 CNEr                                1.38.0      2023-10-24 [1] Bioconductor
 codetools                           0.2-19      2023-02-01 [1] CRAN (R 4.3.2)
 colorspace                          2.1-0       2023-01-23 [1] CRAN (R 4.3.0)
 combinat                            0.0-8       2012-10-29 [1] CRAN (R 4.3.0)
 cowplot                           * 1.1.3       2024-01-22 [1] CRAN (R 4.3.1)
 crayon                              1.5.2       2022-09-29 [1] CRAN (R 4.3.0)
 curl                                5.2.0       2023-12-08 [1] CRAN (R 4.3.1)
 cvTools                             0.3.2       2012-05-14 [1] CRAN (R 4.3.0)
 data.table                          1.15.0      2024-01-30 [1] CRAN (R 4.3.1)
 DBI                                 1.2.2       2024-02-16 [1] CRAN (R 4.3.1)
 dbplyr                              2.4.0       2023-10-26 [1] CRAN (R 4.3.1)
 dbscan                              1.1-12      2023-11-28 [1] CRAN (R 4.3.1)
 decontX                           * 1.0.0       2023-12-23 [1] Bioconductor 3.18 (R 4.3.2)
 DelayedArray                        0.28.0      2023-11-06 [1] Bioconductor
 DelayedMatrixStats                  1.24.0      2023-11-06 [1] Bioconductor
 deldir                              2.0-2       2023-11-23 [1] CRAN (R 4.3.1)
 densEstBayes                        1.0-2.2     2023-03-31 [1] CRAN (R 4.3.0)
 DEoptimR                            1.1-3       2023-10-07 [1] CRAN (R 4.3.1)
 digest                              0.6.34      2024-01-11 [1] CRAN (R 4.3.1)
 DirichletMultinomial                1.44.0      2023-10-26 [1] Bioconductor
 distr                               2.9.3       2024-01-29 [1] CRAN (R 4.3.1)
 doParallel                          1.0.17      2022-02-07 [1] CRAN (R 4.3.0)
 dotCall64                           1.1-1       2023-11-28 [1] CRAN (R 4.3.1)
 dplyr                             * 1.1.4       2023-11-17 [1] CRAN (R 4.3.1)
 dqrng                               0.3.2       2023-11-29 [1] CRAN (R 4.3.1)
 DT                                  0.32        2024-02-19 [1] CRAN (R 4.3.1)
 edgeR                               4.0.16      2024-02-20 [1] Bioconductor 3.18 (R 4.3.2)
 ellipsis                            0.3.2       2021-04-29 [1] CRAN (R 4.3.0)
 enrichR                             3.2         2023-04-14 [1] CRAN (R 4.3.0)
 EnsDb.Hsapiens.v86                * 2.99.0      2024-02-27 [1] Bioconductor
 ensembldb                         * 2.26.0      2023-10-26 [1] Bioconductor
 evaluate                            0.23        2023-11-01 [1] CRAN (R 4.3.1)
 fansi                               1.0.6       2023-12-08 [1] CRAN (R 4.3.1)
 fastDummies                         1.7.3       2023-07-06 [1] CRAN (R 4.3.0)
 fastmap                             1.1.1       2023-02-24 [1] CRAN (R 4.3.0)
 fastmatch                           1.1-4       2023-08-18 [1] CRAN (R 4.3.0)
 filelock                            1.0.3       2023-12-11 [1] CRAN (R 4.3.1)
 fitdistrplus                        1.1-11      2023-04-25 [1] CRAN (R 4.3.0)
 forcats                           * 1.0.0       2023-01-29 [1] CRAN (R 4.3.0)
 foreach                             1.5.2       2022-02-02 [1] CRAN (R 4.3.0)
 foreign                             0.8-86      2023-11-28 [1] CRAN (R 4.3.1)
 Formula                             1.2-5       2023-02-24 [1] CRAN (R 4.3.0)
 fs                                  1.6.3       2023-07-20 [1] CRAN (R 4.3.0)
 future                              1.33.1      2023-12-22 [1] CRAN (R 4.3.1)
 future.apply                        1.11.1      2023-12-21 [1] CRAN (R 4.3.1)
 gargle                              1.5.2       2023-07-20 [1] CRAN (R 4.3.0)
 generics                            0.1.3       2022-07-05 [1] CRAN (R 4.3.0)
 GenomeInfoDb                      * 1.38.6      2024-02-10 [1] Bioconductor 3.18 (R 4.3.2)
 GenomeInfoDbData                    1.2.11      2024-02-27 [1] Bioconductor
 GenomicAlignments                   1.38.2      2024-01-20 [1] Bioconductor 3.18 (R 4.3.2)
 GenomicFeatures                   * 1.54.3      2024-02-03 [1] Bioconductor 3.18 (R 4.3.2)
 GenomicRanges                     * 1.54.1      2023-10-30 [1] Bioconductor
 ggbeeswarm                          0.7.2       2023-04-29 [1] CRAN (R 4.3.0)
 ggplot2                           * 3.5.0       2024-02-23 [1] CRAN (R 4.3.1)
 ggrepel                             0.9.5       2024-01-10 [1] CRAN (R 4.3.1)
 ggridges                            0.5.6       2024-01-23 [1] CRAN (R 4.3.1)
 ggstats                           * 0.5.1       2023-11-21 [1] CRAN (R 4.3.1)
 git2r                               0.33.0      2023-11-26 [1] CRAN (R 4.3.1)
 globals                             0.16.2      2022-11-21 [1] CRAN (R 4.3.0)
 glue                              * 1.7.0       2024-01-09 [1] CRAN (R 4.3.1)
 GO.db                             * 3.18.0      2024-02-27 [1] Bioconductor
 goftest                             1.2-3       2021-10-07 [1] CRAN (R 4.3.0)
 googledrive                         2.1.1       2023-06-11 [1] CRAN (R 4.3.0)
 googlesheets4                     * 1.1.1       2023-06-11 [1] CRAN (R 4.3.0)
 gplots                              3.1.3.1     2024-02-02 [1] CRAN (R 4.3.1)
 graph                               1.80.0      2023-10-26 [1] Bioconductor
 gridExtra                           2.3         2017-09-09 [1] CRAN (R 4.3.0)
 gtable                              0.3.4       2023-08-21 [1] CRAN (R 4.3.0)
 gtools                              3.9.5       2023-11-20 [1] CRAN (R 4.3.1)
 hdf5r                               1.3.9       2024-01-14 [1] CRAN (R 4.3.1)
 here                              * 1.0.1       2020-12-13 [1] CRAN (R 4.3.0)
 Hmisc                               5.1-1       2023-09-12 [1] CRAN (R 4.3.0)
 hms                                 1.1.3       2023-03-21 [1] CRAN (R 4.3.0)
 Homo.sapiens                      * 1.3.1       2024-02-27 [1] Bioconductor
 htmlTable                           2.4.2       2023-10-29 [1] CRAN (R 4.3.1)
 htmltools                           0.5.7       2023-11-03 [1] CRAN (R 4.3.1)
 htmlwidgets                         1.6.4       2023-12-06 [1] CRAN (R 4.3.1)
 httpuv                              1.6.14      2024-01-26 [1] CRAN (R 4.3.1)
 httr                                1.4.7       2023-08-15 [1] CRAN (R 4.3.0)
 ica                                 1.0-3       2022-07-08 [1] CRAN (R 4.3.0)
 igraph                              2.0.2       2024-02-17 [1] CRAN (R 4.3.1)
 inline                              0.3.19      2021-05-31 [1] CRAN (R 4.3.0)
 IRanges                           * 2.36.0      2023-10-26 [1] Bioconductor
 irlba                               2.3.5.1     2022-10-03 [1] CRAN (R 4.3.2)
 iterators                           1.0.14      2022-02-05 [1] CRAN (R 4.3.0)
 janitor                           * 2.2.0       2023-02-02 [1] CRAN (R 4.3.0)
 JASPAR2020                          0.99.10     2024-02-27 [1] Bioconductor
 jquerylib                           0.1.4       2021-04-26 [1] CRAN (R 4.3.0)
 jsonlite                            1.8.8       2023-12-04 [1] CRAN (R 4.3.1)
 KEGGREST                            1.42.0      2023-10-26 [1] Bioconductor
 KernSmooth                          2.23-22     2023-07-10 [1] CRAN (R 4.3.2)
 knitr                               1.45        2023-10-30 [1] CRAN (R 4.3.1)
 later                               1.3.2       2023-12-06 [1] CRAN (R 4.3.1)
 lattice                             0.22-5      2023-10-24 [1] CRAN (R 4.3.1)
 lazyeval                            0.2.2       2019-03-15 [1] CRAN (R 4.3.0)
 leiden                              0.4.3.1     2023-11-17 [1] CRAN (R 4.3.1)
 lifecycle                           1.0.4       2023-11-07 [1] CRAN (R 4.3.1)
 limma                               3.58.1      2023-11-02 [1] Bioconductor
 listenv                             0.9.1       2024-01-29 [1] CRAN (R 4.3.1)
 lmtest                              0.9-40      2022-03-21 [1] CRAN (R 4.3.0)
 locfit                              1.5-9.8     2023-06-11 [1] CRAN (R 4.3.0)
 loo                                 2.7.0       2024-02-24 [1] CRAN (R 4.3.1)
 lubridate                         * 1.9.3       2023-09-27 [1] CRAN (R 4.3.1)
 lungref.SeuratData                  2.0.0       2024-02-29 [1] local
 M3Drop                              1.28.0      2023-10-26 [1] Bioconductor
 magrittr                            2.0.3       2022-03-30 [1] CRAN (R 4.3.0)
 MASS                                7.3-60.0.1  2024-01-13 [1] CRAN (R 4.3.1)
 Matrix                            * 1.6-5       2024-01-11 [1] CRAN (R 4.3.1)
 MatrixGenerics                    * 1.14.0      2023-10-26 [1] Bioconductor
 matrixStats                       * 1.2.0       2023-12-11 [1] CRAN (R 4.3.1)
 MCMCprecision                       0.4.0       2019-12-05 [1] CRAN (R 4.3.0)
 memoise                             2.0.1       2021-11-26 [1] CRAN (R 4.3.0)
 metapod                             1.10.1      2023-12-23 [1] Bioconductor 3.18 (R 4.3.2)
 mgcv                                1.9-1       2023-12-21 [1] CRAN (R 4.3.1)
 mime                                0.12        2021-09-28 [1] CRAN (R 4.3.0)
 miniUI                              0.1.1.1     2018-05-18 [1] CRAN (R 4.3.0)
 msigdbr                           * 7.5.1       2022-03-30 [1] CRAN (R 4.3.0)
 munsell                             0.5.0       2018-06-12 [1] CRAN (R 4.3.0)
 mvtnorm                             1.2-4       2023-11-27 [1] CRAN (R 4.3.1)
 nlme                                3.1-164     2023-11-27 [1] CRAN (R 4.3.1)
 nnet                                7.3-19      2023-05-03 [1] CRAN (R 4.3.2)
 numDeriv                            2016.8-1.1  2019-06-06 [1] CRAN (R 4.3.0)
 org.Hs.eg.db                      * 3.18.0      2024-02-27 [1] Bioconductor
 OrganismDbi                       * 1.44.0      2023-10-26 [1] Bioconductor
 parallelly                          1.37.0      2024-02-14 [1] CRAN (R 4.3.1)
 patchwork                         * 1.2.0       2024-01-08 [1] CRAN (R 4.3.1)
 pbapply                             1.7-2       2023-06-27 [1] CRAN (R 4.3.0)
 pbmcref.SeuratData                  1.0.0       2024-10-04 [1] local
 pillar                              1.9.0       2023-03-22 [1] CRAN (R 4.3.0)
 pkgbuild                            1.4.3       2023-12-10 [1] CRAN (R 4.3.1)
 pkgconfig                           2.0.3       2019-09-22 [1] CRAN (R 4.3.0)
 plotly                              4.10.4      2024-01-13 [1] CRAN (R 4.3.1)
 plyr                                1.8.9       2023-10-02 [1] CRAN (R 4.3.1)
 png                                 0.1-8       2022-11-29 [1] CRAN (R 4.3.0)
 polyclip                            1.10-6      2023-09-27 [1] CRAN (R 4.3.1)
 poweRlaw                            0.80.0      2024-01-25 [1] CRAN (R 4.3.1)
 pracma                              2.4.4       2023-11-10 [1] CRAN (R 4.3.1)
 presto                              1.0.0       2024-02-27 [1] Github (immunogenomics/presto@31dc97f)
 prettyunits                         1.2.0       2023-09-24 [1] CRAN (R 4.3.1)
 progress                            1.2.3       2023-12-06 [1] CRAN (R 4.3.1)
 progressr                           0.14.0      2023-08-10 [1] CRAN (R 4.3.0)
 promises                            1.2.1       2023-08-10 [1] CRAN (R 4.3.0)
 ProtGenerics                        1.34.0      2023-10-26 [1] Bioconductor
 proxyC                              0.3.4       2023-10-25 [1] CRAN (R 4.3.1)
 purrr                             * 1.0.2       2023-08-10 [1] CRAN (R 4.3.0)
 QuickJSR                            1.1.3       2024-01-31 [1] CRAN (R 4.3.1)
 R.methodsS3                         1.8.2       2022-06-13 [1] CRAN (R 4.3.0)
 R.oo                                1.26.0      2024-01-24 [1] CRAN (R 4.3.1)
 R.utils                             2.12.3      2023-11-18 [1] CRAN (R 4.3.1)
 R6                                  2.5.1       2021-08-19 [1] CRAN (R 4.3.0)
 RANN                                2.6.1       2019-01-08 [1] CRAN (R 4.3.0)
 rappdirs                            0.3.3       2021-01-31 [1] CRAN (R 4.3.0)
 RBGL                                1.78.0      2023-10-26 [1] Bioconductor
 RColorBrewer                        1.1-3       2022-04-03 [1] CRAN (R 4.3.0)
 Rcpp                                1.0.12      2024-01-09 [1] CRAN (R 4.3.1)
 RcppAnnoy                           0.0.22      2024-01-23 [1] CRAN (R 4.3.1)
 RcppEigen                           0.3.3.9.4   2023-11-02 [1] CRAN (R 4.3.1)
 RcppHNSW                            0.6.0       2024-02-04 [1] CRAN (R 4.3.1)
 RcppParallel                        5.1.7       2023-02-27 [1] CRAN (R 4.3.0)
 RcppRoll                            0.3.0       2018-06-05 [1] CRAN (R 4.3.0)
 RCurl                               1.98-1.14   2024-01-09 [1] CRAN (R 4.3.1)
 readr                             * 2.1.5       2024-01-10 [1] CRAN (R 4.3.1)
 reldist                             1.7-2       2023-02-17 [1] CRAN (R 4.3.0)
 reshape2                            1.4.4       2020-04-09 [1] CRAN (R 4.3.0)
 ResidualMatrix                      1.12.0      2023-11-06 [1] Bioconductor
 restfulr                            0.0.15      2022-06-16 [1] CRAN (R 4.3.0)
 reticulate                          1.35.0      2024-01-31 [1] CRAN (R 4.3.1)
 rhdf5                               2.46.1      2023-12-02 [1] Bioconductor 3.18 (R 4.3.2)
 rhdf5filters                        1.14.1      2023-12-16 [1] Bioconductor 3.18 (R 4.3.2)
 Rhdf5lib                            1.24.2      2024-02-10 [1] Bioconductor 3.18 (R 4.3.2)
 rjson                               0.2.21      2022-01-09 [1] CRAN (R 4.3.0)
 rlang                               1.1.3       2024-01-10 [1] CRAN (R 4.3.1)
 rmarkdown                           2.25        2023-09-18 [1] CRAN (R 4.3.1)
 robustbase                          0.99-2      2024-01-27 [1] CRAN (R 4.3.1)
 ROCR                                1.0-11      2020-05-02 [1] CRAN (R 4.3.0)
 rpart                               4.1.23      2023-12-05 [1] CRAN (R 4.3.1)
 rprojroot                           2.0.4       2023-11-05 [1] CRAN (R 4.3.1)
 Rsamtools                           2.18.0      2023-10-26 [1] Bioconductor
 RSpectra                            0.16-1      2022-04-24 [1] CRAN (R 4.3.0)
 RSQLite                             2.3.5       2024-01-21 [1] CRAN (R 4.3.1)
 rstan                               2.32.5      2024-01-10 [1] CRAN (R 4.3.1)
 rstantools                          2.4.0       2024-01-31 [1] CRAN (R 4.3.1)
 rstudioapi                          0.15.0      2023-07-07 [1] CRAN (R 4.3.0)
 rsvd                                1.0.5       2021-04-16 [1] CRAN (R 4.3.0)
 rtracklayer                         1.62.0      2023-10-26 [1] Bioconductor
 Rtsne                               0.17        2023-12-07 [1] CRAN (R 4.3.1)
 ruv                                 0.9.7.1     2019-08-30 [1] CRAN (R 4.3.0)
 S4Arrays                            1.2.0       2023-10-26 [1] Bioconductor
 S4Vectors                         * 0.40.2      2023-11-25 [1] Bioconductor 3.18 (R 4.3.2)
 sass                                0.4.8       2023-12-06 [1] CRAN (R 4.3.1)
 ScaledMatrix                        1.10.0      2023-11-06 [1] Bioconductor
 scales                            * 1.3.0       2023-11-28 [1] CRAN (R 4.3.1)
 scater                            * 1.30.1      2023-11-16 [1] Bioconductor
 scattermore                         1.2         2023-06-12 [1] CRAN (R 4.3.0)
 scDblFinder                       * 1.16.0      2023-12-23 [1] Bioconductor 3.18 (R 4.3.2)
 scMerge                           * 1.18.0      2023-12-30 [1] Bioconductor 3.18 (R 4.3.2)
 scran                             * 1.30.2      2024-01-23 [1] Bioconductor 3.18 (R 4.3.2)
 sctransform                         0.4.1       2023-10-19 [1] CRAN (R 4.3.1)
 scuttle                           * 1.12.0      2023-11-06 [1] Bioconductor
 seqLogo                             1.68.0      2023-10-26 [1] Bioconductor
 sessioninfo                         1.2.2       2021-12-06 [1] CRAN (R 4.3.0)
 Seurat                            * 5.0.1.9009  2024-02-28 [1] Github (satijalab/seurat@6a3ef5e)
 SeuratData                          0.2.2.9001  2024-02-28 [1] Github (satijalab/seurat-data@0cce240)
 SeuratDisk                          0.0.0.9021  2024-02-27 [1] Github (mojaveazure/seurat-disk@877d4e1)
 SeuratObject                      * 5.0.1       2023-11-17 [1] CRAN (R 4.3.1)
 sfsmisc                             1.1-17      2024-02-01 [1] CRAN (R 4.3.1)
 shiny                               1.8.0       2023-11-17 [1] CRAN (R 4.3.1)
 shinyBS                           * 0.61.1      2022-04-17 [1] CRAN (R 4.3.0)
 shinydashboard                      0.7.2       2021-09-30 [1] CRAN (R 4.3.0)
 shinyjs                             2.1.0       2021-12-23 [1] CRAN (R 4.3.0)
 Signac                              1.12.0      2023-11-08 [1] CRAN (R 4.3.1)
 SingleCellExperiment              * 1.24.0      2023-11-06 [1] Bioconductor
 snakecase                           0.11.1      2023-08-27 [1] CRAN (R 4.3.0)
 sp                                * 2.1-3       2024-01-30 [1] CRAN (R 4.3.1)
 spam                                2.10-0      2023-10-23 [1] CRAN (R 4.3.1)
 SparseArray                         1.2.4       2024-02-10 [1] Bioconductor 3.18 (R 4.3.2)
 sparseMatrixStats                   1.14.0      2023-10-26 [1] Bioconductor
 spatstat.data                       3.0-4       2024-01-15 [1] CRAN (R 4.3.1)
 spatstat.explore                    3.2-6       2024-02-01 [1] CRAN (R 4.3.1)
 spatstat.geom                       3.2-8       2024-01-26 [1] CRAN (R 4.3.1)
 spatstat.random                     3.2-2       2023-11-29 [1] CRAN (R 4.3.1)
 spatstat.sparse                     3.0-3       2023-10-24 [1] CRAN (R 4.3.1)
 spatstat.utils                      3.0-4       2023-10-24 [1] CRAN (R 4.3.1)
 StanHeaders                         2.32.5      2024-01-10 [1] CRAN (R 4.3.1)
 startupmsg                          0.9.6.1     2024-02-12 [1] CRAN (R 4.3.1)
 statmod                             1.5.0       2023-01-06 [1] CRAN (R 4.3.0)
 stringi                             1.8.3       2023-12-11 [1] CRAN (R 4.3.1)
 stringr                           * 1.5.1       2023-11-14 [1] CRAN (R 4.3.1)
 SummarizedExperiment              * 1.32.0      2023-11-06 [1] Bioconductor
 survival                            3.5-8       2024-02-14 [1] CRAN (R 4.3.1)
 tensor                              1.5         2012-05-05 [1] CRAN (R 4.3.0)
 TFBSTools                           1.40.0      2023-10-24 [1] Bioconductor
 TFMPvalue                           0.0.9       2022-10-21 [1] CRAN (R 4.3.0)
 tibble                            * 3.2.1       2023-03-20 [1] CRAN (R 4.3.0)
 tidyr                             * 1.3.1       2024-01-24 [1] CRAN (R 4.3.1)
 tidyselect                          1.2.0       2022-10-10 [1] CRAN (R 4.3.0)
 tidyverse                         * 2.0.0       2023-02-22 [1] CRAN (R 4.3.0)
 timechange                          0.3.0       2024-01-18 [1] CRAN (R 4.3.1)
 tonsilref.SeuratData                2.0.0       2024-02-29 [1] local
 TxDb.Hsapiens.UCSC.hg19.knownGene * 3.2.2       2024-02-27 [1] Bioconductor
 tzdb                                0.4.0       2023-05-12 [1] CRAN (R 4.3.0)
 utf8                                1.2.4       2023-10-22 [1] CRAN (R 4.3.1)
 uwot                                0.1.16      2023-06-29 [1] CRAN (R 4.3.0)
 vctrs                               0.6.5       2023-12-01 [1] CRAN (R 4.3.1)
 vipor                               0.4.7       2023-12-18 [1] CRAN (R 4.3.1)
 viridis                             0.6.5       2024-01-29 [1] CRAN (R 4.3.1)
 viridisLite                         0.4.2       2023-05-02 [1] CRAN (R 4.3.0)
 whisker                             0.4.1       2022-12-05 [1] CRAN (R 4.3.0)
 withr                               3.0.0       2024-01-16 [1] CRAN (R 4.3.1)
 workflowr                           1.7.1       2023-08-23 [1] CRAN (R 4.3.0)
 WriteXLS                            6.5.0       2024-01-09 [1] CRAN (R 4.3.1)
 xfun                                0.42        2024-02-08 [1] CRAN (R 4.3.1)
 xgboost                             1.7.7.1     2024-01-25 [1] CRAN (R 4.3.1)
 XML                                 3.99-0.16.1 2024-01-22 [1] CRAN (R 4.3.1)
 xml2                                1.3.6       2023-12-04 [1] CRAN (R 4.3.1)
 xtable                              1.8-4       2019-04-21 [1] CRAN (R 4.3.0)
 XVector                             0.42.0      2023-10-26 [1] Bioconductor
 yaml                                2.3.8       2023-12-11 [1] CRAN (R 4.3.1)
 zlibbioc                            1.48.0      2023-10-26 [1] Bioconductor
 zoo                                 1.8-12      2023-04-13 [1] CRAN (R 4.3.0)

 [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library

──────────────────────────────────────────────────────────────────────────────

sessionInfo()

R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 15.0.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Australia/Melbourne
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] ggstats_0.5.1                          
 [2] googlesheets4_1.1.1                    
 [3] scMerge_1.18.0                         
 [4] scDblFinder_1.16.0                     
 [5] Azimuth_0.5.0                          
 [6] shinyBS_0.61.1                         
 [7] decontX_1.0.0                          
 [8] celda_1.18.1                           
 [9] Matrix_1.6-5                           
[10] Seurat_5.0.1.9009                      
[11] SeuratObject_5.0.1                     
[12] sp_2.1-3                               
[13] EnsDb.Hsapiens.v86_2.99.0              
[14] ensembldb_2.26.0                       
[15] AnnotationFilter_1.26.0                
[16] msigdbr_7.5.1                          
[17] Homo.sapiens_1.3.1                     
[18] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[19] org.Hs.eg.db_3.18.0                    
[20] GO.db_3.18.0                           
[21] OrganismDbi_1.44.0                     
[22] GenomicFeatures_1.54.3                 
[23] AnnotationDbi_1.64.1                   
[24] scales_1.3.0                           
[25] patchwork_1.2.0                        
[26] cowplot_1.1.3                          
[27] janitor_2.2.0                          
[28] scater_1.30.1                          
[29] scran_1.30.2                           
[30] scuttle_1.12.0                         
[31] SingleCellExperiment_1.24.0            
[32] SummarizedExperiment_1.32.0            
[33] Biobase_2.62.0                         
[34] GenomicRanges_1.54.1                   
[35] GenomeInfoDb_1.38.6                    
[36] IRanges_2.36.0                         
[37] S4Vectors_0.40.2                       
[38] BiocGenerics_0.48.1                    
[39] MatrixGenerics_1.14.0                  
[40] matrixStats_1.2.0                      
[41] glue_1.7.0                             
[42] here_1.0.1                             
[43] lubridate_1.9.3                        
[44] forcats_1.0.0                          
[45] stringr_1.5.1                          
[46] dplyr_1.1.4                            
[47] purrr_1.0.2                            
[48] readr_2.1.5                            
[49] tidyr_1.3.1                            
[50] tibble_3.2.1                           
[51] ggplot2_3.5.0                          
[52] tidyverse_2.0.0                        
[53] BiocParallel_1.36.0                    
[54] BiocStyle_2.30.0                       

loaded via a namespace (and not attached):
  [1] igraph_2.0.2                      graph_1.80.0                     
  [3] Formula_1.2-5                     ica_1.0-3                        
  [5] plotly_4.10.4                     zlibbioc_1.48.0                  
  [7] tidyselect_1.2.0                  bit_4.0.5                        
  [9] doParallel_1.0.17                 lattice_0.22-5                   
 [11] rjson_0.2.21                      M3Drop_1.28.0                    
 [13] blob_1.2.4                        S4Arrays_1.2.0                   
 [15] parallel_4.3.2                    seqLogo_1.68.0                   
 [17] png_0.1-8                         ResidualMatrix_1.12.0            
 [19] cli_3.6.2                         ProtGenerics_1.34.0              
 [21] goftest_1.2-3                     gargle_1.5.2                     
 [23] BiocIO_1.12.0                     bluster_1.12.0                   
 [25] densEstBayes_1.0-2.2              BiocNeighbors_1.20.2             
 [27] Signac_1.12.0                     uwot_0.1.16                      
 [29] curl_5.2.0                        mime_0.12                        
 [31] evaluate_0.23                     leiden_0.4.3.1                   
 [33] stringi_1.8.3                     backports_1.4.1                  
 [35] XML_3.99-0.16.1                   httpuv_1.6.14                    
 [37] magrittr_2.0.3                    rappdirs_0.3.3                   
 [39] splines_4.3.2                     RcppRoll_0.3.0                   
 [41] DT_0.32                           sctransform_0.4.1                
 [43] ggbeeswarm_0.7.2                  sessioninfo_1.2.2                
 [45] DBI_1.2.2                         jquerylib_0.1.4                  
 [47] withr_3.0.0                       git2r_0.33.0                     
 [49] rprojroot_2.0.4                   xgboost_1.7.7.1                  
 [51] lmtest_0.9-40                     RBGL_1.78.0                      
 [53] bdsmatrix_1.3-6                   rtracklayer_1.62.0               
 [55] BiocManager_1.30.22               htmlwidgets_1.6.4                
 [57] fs_1.6.3                          biomaRt_2.58.2                   
 [59] ggrepel_0.9.5                     SparseArray_1.2.4                
 [61] DEoptimR_1.1-3                    cellranger_1.1.0                 
 [63] annotate_1.80.0                   reticulate_1.35.0                
 [65] zoo_1.8-12                        JASPAR2020_0.99.10               
 [67] XVector_0.42.0                    knitr_1.45                       
 [69] TFBSTools_1.40.0                  TFMPvalue_0.0.9                  
 [71] timechange_0.3.0                  foreach_1.5.2                    
 [73] fansi_1.0.6                       caTools_1.18.2                   
 [75] grid_4.3.2                        data.table_1.15.0                
 [77] rhdf5_2.46.1                      ruv_0.9.7.1                      
 [79] R.oo_1.26.0                       poweRlaw_0.80.0                  
 [81] RSpectra_0.16-1                   irlba_2.3.5.1                    
 [83] fastDummies_1.7.3                 ellipsis_0.3.2                   
 [85] lazyeval_0.2.2                    yaml_2.3.8                       
 [87] survival_3.5-8                    scattermore_1.2                  
 [89] crayon_1.5.2                      RcppAnnoy_0.0.22                 
 [91] RColorBrewer_1.1-3                progressr_0.14.0                 
 [93] later_1.3.2                       base64enc_0.1-3                  
 [95] ggridges_0.5.6                    codetools_0.2-19                 
 [97] KEGGREST_1.42.0                   bbmle_1.0.25.1                   
 [99] Rtsne_0.17                        startupmsg_0.9.6.1               
[101] limma_3.58.1                      Rsamtools_2.18.0                 
[103] filelock_1.0.3                    foreign_0.8-86                   
[105] pkgconfig_2.0.3                   xml2_1.3.6                       
[107] sfsmisc_1.1-17                    GenomicAlignments_1.38.2         
[109] spatstat.sparse_3.0-3             BSgenome_1.70.2                  
[111] viridisLite_0.4.2                 xtable_1.8-4                     
[113] plyr_1.8.9                        httr_1.4.7                       
[115] tools_4.3.2                       globals_0.16.2                   
[117] pkgbuild_1.4.3                    checkmate_2.3.1                  
[119] htmlTable_2.4.2                   beeswarm_0.4.0                   
[121] nlme_3.1-164                      loo_2.7.0                        
[123] dbplyr_2.4.0                      hdf5r_1.3.9                      
[125] shinyjs_2.1.0                     digest_0.6.34                    
[127] numDeriv_2016.8-1.1               tzdb_0.4.0                       
[129] reshape2_1.4.4                    cvTools_0.3.2                    
[131] WriteXLS_6.5.0                    viridis_0.6.5                    
[133] rpart_4.1.23                      DirichletMultinomial_1.44.0      
[135] cachem_1.0.8                      BiocFileCache_2.10.1             
[137] polyclip_1.10-6                   proxyC_0.3.4                     
[139] Hmisc_5.1-1                       generics_0.1.3                   
[141] Biostrings_2.70.2                 mvtnorm_1.2-4                    
[143] googledrive_2.1.1                 presto_1.0.0                     
[145] parallelly_1.37.0                 statmod_1.5.0                    
[147] RcppHNSW_0.6.0                    ScaledMatrix_1.10.0              
[149] pbapply_1.7-2                     spam_2.10-0                      
[151] dqrng_0.3.2                       utf8_1.2.4                       
[153] pbmcref.SeuratData_1.0.0          StanHeaders_2.32.5               
[155] gtools_3.9.5                      RcppEigen_0.3.3.9.4              
[157] gridExtra_2.3                     shiny_1.8.0                      
[159] GenomeInfoDbData_1.2.11           R.utils_2.12.3                   
[161] rhdf5filters_1.14.1               RCurl_1.98-1.14                  
[163] memoise_2.0.1                     rmarkdown_2.25                   
[165] R.methodsS3_1.8.2                 future_1.33.1                    
[167] RANN_2.6.1                        spatstat.data_3.0-4              
[169] rstudioapi_0.15.0                 cluster_2.1.6                    
[171] QuickJSR_1.1.3                    whisker_0.4.1                    
[173] rstantools_2.4.0                  spatstat.utils_3.0-4             
[175] hms_1.1.3                         fitdistrplus_1.1-11              
[177] munsell_0.5.0                     colorspace_2.1-0                 
[179] rlang_1.1.3                       DelayedMatrixStats_1.24.0        
[181] sparseMatrixStats_1.14.0          dotCall64_1.1-1                  
[183] shinydashboard_0.7.2              dbscan_1.1-12                    
[185] mgcv_1.9-1                        xfun_0.42                        
[187] CNEr_1.38.0                       iterators_1.0.14                 
[189] reldist_1.7-2                     abind_1.4-5                      
[191] MCMCprecision_0.4.0               rstan_2.32.5                     
[193] Rhdf5lib_1.24.2                   bitops_1.0-7                     
[195] promises_1.2.1                    inline_0.3.19                    
[197] RSQLite_2.3.5                     DelayedArray_0.28.0              
[199] compiler_4.3.2                    prettyunits_1.2.0                
[201] beachmat_2.18.1                   listenv_0.9.1                    
[203] BSgenome.Hsapiens.UCSC.hg38_1.4.5 Rcpp_1.0.12                      
[205] tonsilref.SeuratData_2.0.0        enrichR_3.2                      
[207] edgeR_4.0.16                      workflowr_1.7.1                  
[209] BiocSingular_1.18.0               tensor_1.5                       
[211] MASS_7.3-60.0.1                   progress_1.2.3                   
[213] babelgene_22.9                    spatstat.random_3.2-2            
[215] R6_2.5.1                          fastmap_1.1.1                    
[217] fastmatch_1.1-4                   distr_2.9.3                      
[219] vipor_0.4.7                       ROCR_1.0-11                      
[221] SeuratDisk_0.0.0.9021             nnet_7.3-19                      
[223] rsvd_1.0.5                        gtable_0.3.4                     
[225] KernSmooth_2.23-22                lungref.SeuratData_2.0.0         
[227] miniUI_0.1.1.1                    deldir_2.0-2                     
[229] htmltools_0.5.7                   RcppParallel_5.1.7               
[231] bit64_4.0.5                       spatstat.explore_3.2-6           
[233] lifecycle_1.0.4                   restfulr_0.0.15                  
[235] sass_0.4.8                        vctrs_0.6.5                      
[237] robustbase_0.99-2                 spatstat.geom_3.2-8              
[239] snakecase_0.11.1                  SeuratData_0.2.2.9001            
[241] future.apply_1.11.1               pracma_2.4.4                     
[243] batchelor_1.18.1                  bslib_0.6.1                      
[245] pillar_1.9.0                      gplots_3.1.3.1                   
[247] metapod_1.10.1                    locfit_1.5-9.8                   
[249] combinat_0.0-8                    jsonlite_1.8.8
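The package versions recorded above can be compared against a local library before re-running the analysis on another machine. The snippet below is a minimal sketch and is not part of the original workflow. The helper name `check_versions()` is hypothetical, and it relies only on the base `utils::packageVersion()` and `utils::compareVersion()` functions. The versions in `recorded` are copied from the session info above, so extend that vector with whichever other packages matter for your rerun.

```r
# Hypothetical helper (not part of the original analysis): compare the
# package versions recorded in the session info against the current library.
check_versions <- function(expected) {
  # `expected` is a named character vector: names = packages, values = versions
  installed <- vapply(names(expected), function(pkg) {
    # packageVersion() throws an error if the package is not installed,
    # so return NA in that case instead of stopping.
    tryCatch(as.character(utils::packageVersion(pkg)),
             error = function(e) NA_character_)
  }, character(1))

  # compareVersion() treats "." and "-" as equivalent separators, so
  # "1.0-2.2" and "1.0.2.2" compare as equal; skip it when not installed.
  same <- mapply(function(inst, exp) {
    !is.na(inst) && utils::compareVersion(inst, exp) == 0
  }, installed, expected)

  data.frame(
    package   = names(expected),
    expected  = unname(expected),
    installed = unname(installed),
    match     = unname(same),
    stringsAsFactors = FALSE
  )
}

# A few versions copied from the session info above
recorded <- c(
  Seurat       = "5.0.1.9009",
  SeuratObject = "5.0.1",
  scDblFinder  = "1.16.0",
  celda        = "1.18.1",
  Azimuth      = "0.5.0"
)

print(check_versions(recorded))
```

Rows with `match == FALSE` flag packages that are either missing or installed at a different version. Note that the GitHub builds listed above (for example `Seurat` from `satijalab/seurat@6a3ef5e`) also need the matching commit, which a version string alone does not guarantee.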

Preprocessing of Batches: Nasal_brushings_Batch1

Quality Control: Removing ambient RNA, cells with low library size or high mitochondrial content, and doublets

George Howitt, Gunjan Dixit and Jovana Maksimovic

November 11, 2024

Introduction

Get Batch_info

CellRanger calls

Add Barcode metadata

Pre-processing

DecontX

Mitochondrial filtering

Multiplet filtering

Add GEM metadata to the cell-level objects

Normalization and Azimuth annotation

Add Batch specific meta data

Clean up metadata that is no longer needed

Save pre-processed objects

Filter doublets and repeat

Normalization and Azimuth annotation

Add harmonized cell-labels

Save pre-processed objects

Session Info