Last updated: 2026-01-27
Checks: 6 1
Knit directory: paed-airway-allTissues/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
The R Markdown file has unstaged changes. To know which version of
the R Markdown file created these results, you’ll want to first commit
it to the Git repo. If you’re still working on the analysis, you can
ignore this warning. When you’re finished, you can run
wflow_publish to commit the R Markdown file and build the
HTML.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20230811) was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 378760b. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish or
wflow_git_commit). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .RData
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/.DS_Store
Ignored: analysis/naiveB_top10_markers_whitebg.png
Ignored: data/.DS_Store
Ignored: data/RDS/
Ignored: figure/.DS_Store
Ignored: output/.DS_Store
Ignored: output/CAM_results/
Ignored: output/CSV/.DS_Store
Ignored: output/CSV_v2/.DS_Store
Ignored: output/CSV_v2/Adenoids/.DS_Store
Ignored: output/CSV_v2/BAL/.DS_Store
Ignored: output/CSV_v2/Bronchial_brushings/.DS_Store
Ignored: output/CSV_v2/Nasal_brushings/.DS_Store
Ignored: output/CSV_v2/Tonsils/.DS_Store
Ignored: output/DGE/.DS_Store
Ignored: output/DGE/RUVseq_TonsilAtlas/.DS_Store
Ignored: output/DGE/RUVseq_TonsilAtlas_k1/
Ignored: output/DGE/RUVseq_earlyAIR_Tonsils/.DS_Store
Ignored: output/DGE/Tonsils/RUV_Tonsils_Double negative T/
Ignored: output/RDS/
Ignored: output/plots/
Untracked files:
Untracked: All_metadata_combinedplots.Rhistory
Untracked: analysis/03_Batch_Integration.Rmd
Untracked: analysis/Age_proportions.Rmd
Untracked: analysis/Age_proportions_AllBatches.Rmd
Untracked: analysis/All_Batches_QCExploratory_v2.Rmd
Untracked: analysis/All_Umaps_and_marker_plots.Rmd
Untracked: analysis/All_metadata.Rmd
Untracked: analysis/Annotation_BAL.Rmd
Untracked: analysis/Annotation_Bronchial_brushings.Rmd
Untracked: analysis/Annotation_Nasal_brushings.Rmd
Untracked: analysis/BatchCorrection_Adenoids.Rmd
Untracked: analysis/BatchCorrection_Nasal_brushings.Rmd
Untracked: analysis/BatchCorrection_Tonsils.Rmd
Untracked: analysis/Batch_Integration_&_Downstream_analysis.Rmd
Untracked: analysis/Batch_correction_&_Downstream.Rmd
Untracked: analysis/Cell_cycle_regression.Rmd
Untracked: analysis/Clustering_Tonsils_v2.Rmd
Untracked: analysis/Cross-tissue_analysis.Rmd
Untracked: analysis/DGE_RUVSeq_Adenoids.Rmd
Untracked: analysis/DGE_RUVSeq_Bronchial_brushings.Rmd
Untracked: analysis/DGE_RUVSeq_Nasal_brushings.Rmd
Untracked: analysis/DGE_earlyAIR_tAtlas_Naïve_Bcells_Adenoids.Rmd
Untracked: analysis/DGE_earlyAIR_tAtlas_Naïve_Bcells_activated.Rmd
Untracked: analysis/DGE_earlyAIR_tAtlas_Plasma_Bcells.Rmd
Untracked: analysis/DGE_earlyAIR_vs_TonsilAtlas.Rmd
Untracked: analysis/Explore_census.Rmd
Untracked: analysis/Lineage_subclustering_Bcells.Rmd
Untracked: analysis/Lineage_subclustering_Myeloid_cells.Rmd
Untracked: analysis/Lineage_subclustering_Tcells.Rmd
Untracked: analysis/Lineages_subclustering.Rmd
Untracked: analysis/Master_metadata.Rmd
Untracked: analysis/Pediatric_Vs_Adult_Atlases.Rmd
Untracked: analysis/Preprocessing_Batch1_Nasal_brushings.Rmd
Untracked: analysis/Preprocessing_Batch2_Tonsils.Rmd
Untracked: analysis/Preprocessing_Batch3_Adenoids.Rmd
Untracked: analysis/Preprocessing_Batch4_Bronchial_brushings.Rmd
Untracked: analysis/Preprocessing_Batch5_Nasal_brushings.Rmd
Untracked: analysis/Preprocessing_Batch6_BAL.Rmd
Untracked: analysis/Preprocessing_Batch7_Bronchial_brushings.Rmd
Untracked: analysis/Preprocessing_Batch8_Adenoids.Rmd
Untracked: analysis/Preprocessing_Batch9_Tonsils.Rmd
Untracked: analysis/TonsilsVsAdenoids.Rmd
Untracked: analysis/cell_cycle_regression.R
Untracked: analysis/testing_age_all.Rmd
Untracked: code/DGE_utils.R
Untracked: code/pseudobulk_analysis.R
Untracked: data/Cell_labels_Gunjan_v2/
Untracked: data/Cell_labels_Mel/
Untracked: data/Cell_labels_Mel_v2/
Untracked: data/Cell_labels_Mel_v3/
Untracked: data/Cell_labels_modified_Gunjan/
Untracked: data/Gene_sets/
Untracked: data/Hs.c2.cp.reactome.v7.1.entrez.rds
Untracked: data/Raw_feature_bc_matrix/
Untracked: data/cell_labels_Mel_v4_Dec2024/
Untracked: data/celltypes_Mel_GD_v3.xlsx
Untracked: data/celltypes_Mel_GD_v4_no_dups.xlsx
Untracked: data/celltypes_Mel_modified.xlsx
Untracked: data/celltypes_Mel_v2.csv
Untracked: data/celltypes_Mel_v2.xlsx
Untracked: data/celltypes_Mel_v2_MN.xlsx
Untracked: data/celltypes_for_mel_MN.xlsx
Untracked: data/col_palette.xlsx
Untracked: data/earlyAIR_sample_sheets_combined.xlsx
Untracked: figure/DGE_Tonsils_CD8.Rmd/
Untracked: metadata_export.rds
Untracked: metadata_export1.rds
Untracked: naiveB_top10_markers.png
Untracked: naiveB_top10_markers_whitebg.png
Untracked: output/CSV/All_tissues.propeller.xlsx
Untracked: output/CSV/Bronchial_brushings/
Untracked: output/CSV/Bronchial_brushings_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/
Untracked: output/CSV/G000231_Neeland_Adenoids.propeller.xlsx
Untracked: output/CSV/G000231_Neeland_Bronchial_brushings.propeller.xlsx
Untracked: output/CSV/G000231_Neeland_Nasal_brushings.propeller.xlsx
Untracked: output/CSV/G000231_Neeland_Tonsils.propeller.xlsx
Untracked: output/CSV/Nasal_brushings/
Untracked: output/CSV_v3/B_cell_lineage/
Untracked: output/CSV_v3/Epithelial_lineage/
Untracked: output/DGE/Adenoids/
Untracked: output/DGE/Adenoids_B_memory/
Untracked: output/DGE/Adenoids_CD4/
Untracked: output/DGE/Adenoids_CD8/
Untracked: output/DGE/Adenoids_Cycling_GCB/
Untracked: output/DGE/Adenoids_IFN/
Untracked: output/DGE/Bronchial_brushings/
Untracked: output/DGE/DGE_Age_summary_BAL.xlsx
Untracked: output/DGE/Nasal_brushings/
Untracked: output/DGE/RUV/
Untracked: output/DGE/RUVseq_earlyAIR_Adenoids/
Untracked: output/DGE/RUVseq_earlyAIR_Bronchial_brushings/
Untracked: output/DGE/RUVseq_earlyAIR_Nasal_brushings/
Untracked: output/DGE/TonsilAtlas/
Untracked: output/DGE/Tonsils/DGE_Age_summary_Tonsils.xlsx
Untracked: output/DGE/Tonsils/RUVTonsils_Memory B cells/
Untracked: output/DGE/Tonsils/RUV_Tonsils_Memory B cells/
Untracked: output/DGE/Tonsils/Tonsils_Naïve B cells/
Untracked: output/DGE/Tonsils_CD4/
Untracked: output/DGE/Tonsils_CD8/
Untracked: output/DGE/Tonsils_IFN/
Untracked: paed_sub_L1_cells.rds
Unstaged changes:
Deleted: 02_QC_exploratoryPlots.Rmd
Deleted: 02_QC_exploratoryPlots.html
Modified: analysis/00_AllBatches_overview.Rmd
Modified: analysis/01_QC_emptyDrops.Rmd
Modified: analysis/02_QC_exploratoryPlots.Rmd
Modified: analysis/Adenoids.Rmd
Modified: analysis/Adenoids_v2.Rmd
Modified: analysis/Age_modeling.Rmd
Modified: analysis/Age_modelling_Adenoids.Rmd
Modified: analysis/Age_modelling_Tonsils.Rmd
Modified: analysis/AllBatches_QCExploratory.Rmd
Modified: analysis/BAL.Rmd
Modified: analysis/BAL_v2.Rmd
Modified: analysis/Bronchial_brushings.Rmd
Modified: analysis/Bronchial_brushings_v2.Rmd
Modified: analysis/DGE_TonsilAtlas.Rmd
Modified: analysis/DGE_Tonsils_CD8.Rmd
Modified: analysis/DGE_analysis_George.Rmd
Modified: analysis/DGE_earlyAIR_tAtlas_CD4_TCM.Rmd
Modified: analysis/DGE_earlyAIR_tAtlas_CD4_TFH.Rmd
Modified: analysis/DGE_earlyAIR_tAtlas_CD4_TN.Rmd
Modified: analysis/DGE_earlyAIR_tAtlas_CD8_TF.Rmd
Modified: analysis/DGE_earlyAIR_tAtlas_CD8_TN.Rmd
Modified: analysis/DGE_earlyAIR_tAtlas_DZtoLZ_GCB.Rmd
Modified: analysis/DGE_earlyAIR_tAtlas_Memory_Bcells.Rmd
Modified: analysis/DGE_earlyAIR_tAtlas_Naïve_Bcells.Rmd
Modified: analysis/Lineage_subclustering_Epithelial_cells.Rmd
Modified: analysis/Nasal_brushings.Rmd
Modified: analysis/Nasal_brushings_v2.Rmd
Modified: analysis/Subclustering_Adenoids.Rmd
Modified: analysis/Subclustering_BAL.Rmd
Modified: analysis/Subclustering_Bronchial_brushings.Rmd
Modified: analysis/Subclustering_Nasal_brushings.Rmd
Modified: analysis/Subclustering_Tonsils.Rmd
Modified: analysis/Tonsils.Rmd
Modified: analysis/Tonsils_v2.Rmd
Modified: analysis/index.Rmd
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c0.csv
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c1.csv
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c10.csv
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c11.csv
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c12.csv
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c13.csv
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c14.csv
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c15.csv
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c16.csv
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c17.csv
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c2.csv
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c3.csv
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c4.csv
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c5.csv
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c6.csv
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c7.csv
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c8.csv
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c9.csv
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c0.csv
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c1.csv
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c10.csv
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c11.csv
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c12.csv
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c13.csv
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c14.csv
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c15.csv
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c16.csv
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c17.csv
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c2.csv
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c3.csv
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c4.csv
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c5.csv
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c6.csv
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c7.csv
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c8.csv
Modified: output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c9.csv
Modified: output/CSV_v2/G000231_Neeland_Nasal_brushings.propeller.xlsx
Modified: output/DGE/RUVseq_TonsilAtlas/DGE_RUVSeq_Age_TonsilAtlas.xlsx
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/CD4 TCM/all_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/CD4 TCM/down_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/CD4 TCM/up_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/CD4 TFH/all_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/CD4 TFH/down_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/CD4 TFH/up_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/CD4 TN/all_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/CD4 TN/down_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/CD4 TN/up_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/CD4 Treg-eff/all_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/CD4 Treg-eff/down_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/CD4 Treg-eff/up_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/CD4 Treg/all_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/CD4 Treg/down_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/CD4 Treg/up_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/CD4 effector/all_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/CD4 effector/down_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/CD4 effector/up_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/CD8 TF/all_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/CD8 TF/down_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/CD8 TF/up_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/CD8 TN/all_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/CD8 TN/down_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/CD8 TN/up_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Cycling GCB/all_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Cycling GCB/down_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Cycling GCB/up_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/DGE_RUVSeq_Age_Tonsils.xlsx
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/DGE_RUVSeq_Tonsils_all_results.rds
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/DZ G2Mphase/all_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/DZ G2Mphase/down_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/DZ G2Mphase/up_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/DZ GCB/all_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/DZ GCB/down_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/DZ GCB/up_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/DZ early Sphase/all_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/DZ early Sphase/down_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/DZ early Sphase/up_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/DZ late Sphase/all_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/DZ late Sphase/down_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/DZ late Sphase/up_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/DZtoLZ GCB transition/all_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/DZtoLZ GCB transition/down_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/DZtoLZ GCB transition/up_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Double negative T/all_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Double negative T/down_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Double negative T/up_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Early GC-committed NBC/all_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Early GC-committed NBC/down_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Early GC-committed NBC/up_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Early MBC/all_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Early MBC/down_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Early MBC/up_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Early PC precursor/all_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Early PC precursor/down_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Early PC precursor/up_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Follicular dendritic cells/all_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Follicular dendritic cells/down_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Follicular dendritic cells/up_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/GC-commited metabolic activation/all_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/GC-commited metabolic activation/down_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/GC-commited metabolic activation/up_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Gamma delta T/all_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Gamma delta T/down_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Gamma delta T/up_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Memory B cells/all_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Memory B cells/down_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Memory B cells/up_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Monocytes_macrophages/all_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Monocytes_macrophages/down_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Monocytes_macrophages/up_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Naïve B cell-IFN/all_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Naïve B cell-IFN/down_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Naïve B cell-IFN/up_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Naïve B cells activated/all_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Naïve B cells activated/down_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Naïve B cells activated/up_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Naïve B cells/all_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Naïve B cells/down_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Naïve B cells/up_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Plasma B cells/all_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Plasma B cells/down_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Plasma B cells/up_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Pre-BCRi II/all_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Pre-BCRi II/down_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/Pre-BCRi II/up_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/T-IFN/all_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/T-IFN/down_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/T-IFN/up_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/TFH-LZ-GC/all_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/TFH-LZ-GC/down_res.csv
Modified: output/DGE/RUVseq_earlyAIR_Tonsils/TFH-LZ-GC/up_res.csv
Deleted: test
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown
(analysis/Lineage_subclustering_Epithelial_cells.Rmd) and
HTML (docs/Lineage_subclustering_Epithelial_cells.html)
files. If you’ve configured a remote Git repository (see
?wflow_git_remote), click on the hyperlinks in the table
below to view the files as they were in that past version.
| File | Version | Author | Date | Message |
|---|---|---|---|---|
| Rmd | 378760b | Gunjan Dixit | 2026-01-22 | Updated lineage subclustering |
| html | 378760b | Gunjan Dixit | 2026-01-22 | Updated lineage subclustering |
suppressPackageStartupMessages({
library(BiocStyle)
library(tidyverse)
library(here)
library(dplyr)
library(Seurat)
library(clustree)
library(paletteer)
library(viridis)
library(ggforce)
library(ggridges)
library(kableExtra)
library(RColorBrewer)
library(data.table)
library(dplyr)
library(cowplot)
library(ggplot2)
library(paletteer)
library(patchwork)
library(harmony)
library(BiocParallel)
library(circlize)
library(presto)
library(gtools)
})
paed_sub <- readRDS(here("output/RDS/Lineages_RDS_combined/SEU_Epithelial_Cells.rds"))
if (!"umap.sub" %in% names(paed_sub@reductions)) {
paed_sub <- paed_sub |>
FindVariableFeatures() |>
ScaleData() |>
RunPCA()
gc()
paed_sub[["RNA"]]
paed_sub[["RNA"]] <- JoinLayers(paed_sub[["RNA"]])
gc()
paed_sub <- RunUMAP(paed_sub, dims = 1:30, reduction = "pca",
reduction.name = "umap.sub")
paed_sub <- paed_sub |>
FindNeighbors(dims = 1:30, reduction = "pca") |>
FindClusters(resolution = 0.05)
gc()
saveRDS(paed_sub, here("output/RDS/Lineages_RDS_combined/SEU_Epithelial_Cells.rds"))
} else {
message("skipping processing")
}
skipping processing
paed_sub$tissue <- factor(
paed_sub$tissue,
levels = c("Tonsils", "Adenoids", "Nasal_brushings",
"Bronchial_brushings", "BAL")
)
Excluding counts less than 100 to avoid too many labels in the plot
#sort(table(paed_sub$cell_labels_v2), decreasing = T)
k <- names(which(table(paed_sub$cell_labels_v2) > 50))
paed_sub$cell_labels_v22 <- ifelse(
as.character(paed_sub$cell_labels_v2) %in% k,
as.character(paed_sub$cell_labels_v2),
NA
)
DimPlot(
paed_sub,
group.by = "cell_labels_v22",
reduction = "umap.sub",
label = TRUE,
repel = TRUE,
label.size = 4.5,
shuffle = TRUE, raster = F)

#table( paed_sub$cell_labels_v2,paed_sub$RNA_snn_res.0.5)
opt_res <- "RNA_snn_res.0.05"
n <- nlevels(paed_sub$RNA_snn_res.0.05)
paed_sub$RNA_snn_res.0.05 <- factor(paed_sub$RNA_snn_res.0.05, levels = seq(0,n-1))
paed_sub$seurat_clusters <- NULL
paed_sub$cluster <- paed_sub$RNA_snn_res.0.05
Idents(paed_sub) <- paed_sub$cluster
DimPlot(paed_sub, reduction = "umap.sub", raster =F, repel=T, label = T)

DimPlot(paed_sub, reduction = "umap.sub", raster =F, repel=T, label = T, split.by = "tissue")

new_levels <- c(
"0" = "Non-ciliated Epithelial",
"1" = "Ciliated Epithelial",
"2" = "Ciliated Epithelial",
"3" = "Secretory Epithelial",
"4" = "Non-ciliated Epithelial",
"5" = "Non-ciliated Epithelial",
"6" ="Erythroid cells",
"7" = "Ciliated Epithelial",
"8" = "Non-ciliated Epithelial"
)
paed_sub <- RenameIdents(paed_sub, new_levels)
paed_sub$cell_labels <- NULL
paed_sub$cell_labels_l1 <- Idents(paed_sub)
paed_sub$cell_labels_l1 <- factor(
Idents(paed_sub),
levels = c(
"Non-ciliated Epithelial",
"Secretory Epithelial",
"Ciliated Epithelial",
"Erythroid cells"
)
)
Idents(paed_sub) <- paed_sub$cell_labels_l1
DimPlot(paed_sub, reduction = "umap.sub", raster =F, repel=T, label = T)

files <- c(
paeds = here("../EarlyAir_paper/output/CellTypist_results/cellTtypist_Epithelial_Paed_Covid19.csv"),
immune = here("../EarlyAir_paper/output/CellTypist_results/cellTtypist_Epithelial_Immune_low.csv"),
lung_airway = here("../EarlyAir_paper/output/CellTypist_results/cellTtypist_Epithelial_Lung_Airway.csv"),
lung_atlas = here("../EarlyAir_paper/output/CellTypist_results/cellTtypist_Epithelial_Lung_atlas.csv"),
tonsil = here("../EarlyAir_paper/output/CellTypist_results/cellTtypist_Epithelial_Human_Tonsil.csv")
)
for (nm in names(files)) {
df <- read.csv(files[nm])
paed_sub[[paste0("celltypist_", nm)]] <- df$predicted_labels
}
plot_celltype_heatmap <- function(
seu,
cluster_col = "cluster",
celltype_col,
min_n = 100,
title,
scale = "row",
fontsize = 10
) {
seu@meta.data %>%
dplyr::count(.data[[cluster_col]], .data[[celltype_col]]) %>%
dplyr::filter(n >= min_n) %>%
tidyr::pivot_wider(
names_from = .data[[celltype_col]],
values_from = n,
values_fill = 0
) %>%
as.data.frame() %>%
tibble::column_to_rownames(cluster_col) %>%
t() %>%
pheatmap::pheatmap(
scale = scale,
main = title,
fontsize = fontsize
)
}
plot_celltype_heatmap(paed_sub,celltype_col = "celltypist_paeds",title = "CellTypist Predictions (Paed Covid Study) by Cluster")

plot_celltype_heatmap(paed_sub,celltype_col = "celltypist_lung_airway",title = "CellTypist Predictions (Lung Atlas) by Cluster")

plot_celltype_heatmap( paed_sub,celltype_col = "celltypist_lung_atlas",title = "CellTypist Predictions (Lung Atlas) by Cluster")

plot_celltype_heatmap(paed_sub,celltype_col = "celltypist_tonsil", title = "Azimuth Predictions (Tonsil Atlas) by Cluster")

plot_ct <- function(obj, col) {
k <- names(which(table(obj[[col]][,1]) > 100))
DimPlot(
obj,
group.by = col,
reduction = "umap.sub",
cells = colnames(obj)[obj[[col]][,1] %in% k],
label = TRUE, repel = TRUE, label.size = 2.5,
shuffle = TRUE, raster = FALSE
)
}
plot_ct(paed_sub, "celltypist_paeds") + ggtitle("Paed Covid Study")

plot_ct(paed_sub, "celltypist_immune") + ggtitle("Immune low dataset")

plot_ct(paed_sub, "celltypist_lung_airway") + ggtitle("Lung Airway Study")

plot_ct(paed_sub, "celltypist_lung_atlas") + ggtitle("Lung Atlas")

#plot_ct(paed_sub, "celltypist_tonsil") + ggtitle("Tonsil Atlas")
plot_ct_bar <- function(obj, col) {
k <- names(which(table(obj[[col]][,1]) > 100))
df <- obj@meta.data %>%
filter(get(col) %in% k) %>%
count(cluster, !!sym(col)) %>%
group_by(cluster) %>%
mutate(frac = n / sum(n)) %>%
ungroup()
ggplot(df, aes(x = cluster, y = frac, fill = !!sym(col))) +
geom_col(position = "stack") +
scale_y_continuous(labels = scales::percent) +
labs(x = "Cluster", y = "Fraction of cells", fill = col) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
}
plot_ct_bar(paed_sub, "celltypist_paeds") + ggtitle("Paed Covid Study")

plot_ct_bar(paed_sub, "celltypist_immune") + ggtitle("Immune low dataset")

plot_ct_bar(paed_sub, "celltypist_lung_airway") + ggtitle("Lung Airway study")

plot_ct_bar(paed_sub, "celltypist_lung_atlas") + ggtitle("Lung Atlas")

plot_ct_bar(paed_sub, "celltypist_tonsil") + ggtitle("Tonsil Atlas")

features <- list(
Epithelial_core = c("EPCAM", "KRT8", "KRT18", "KRT19"),
Basal_cells = c("KRT5", "KRT14", "TP63"),
Ciliated_cells = c("FOXJ1", "PIFO", "DNAH5", "DNAI1"),
Secretory_goblet = c("MUC5AC", "SPDEF", "AGR2"),
Secretory_club = c("SCGB1A1", "SCGB3A1", "SCGB3A2"),
Ionocytes = c("FOXI1", "CFTR", "ASCL3"),
Tuft_cells = c("POU2F3", "TRPM5", "AVIL"),
Squamous_epithelial = c("KRT4", "KRT13", "KRT16"),
Cycling_epithelial = c("MKI67", "TOP2A", "HMGB2"),
EMT_like = c("VIM", "COL1A1", "FN1")
)
for (g in features) {
print(
FeaturePlot(
paed_sub,
features = g,
reduction = "umap.sub",
raster = FALSE,
label = TRUE
#split.by = "tissue"
) #
)
}
Warning: The `slot` argument of `FetchData()` is deprecated as of SeuratObject 5.0.0.
ℹ Please use the `layer` argument instead.
ℹ The deprecated feature was likely used in the Seurat package.
Please report the issue at <https://github.com/satijalab/seurat/issues>.
This warning is displayed once every 8 hours.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
generated.
Warning: The following requested variables were not found: KRT18










for (g in features) {
print(
FeaturePlot(
paed_sub,
features = g,
reduction = "umap.sub",
raster = FALSE,
label = TRUE,
split.by = "tissue",
min.cutoff = "q10",
max.cutoff = "q90"
)
)
}
Warning: The following requested variables were not found: KRT18




Warning: All cells have the same value (0.282118808412447) of "SCGB3A1"
Warning: All cells have the same value (0.0869732528804846) of "SCGB3A2"


Warning: All cells have the same value (0.0707303427496486) of "TRPM5"




ciliated <- subset(paed_sub, idents = "Ciliated Epithelial")
keep_cells <- paed_sub$cell_labels_l1 %in% "Ciliated Epithelial"
ciliated@assays$RNA@cells@.Data <- paed_sub@assays$RNA@cells@.Data[keep_cells, ]
ciliated
An object of class Seurat
18076 features across 24034 samples within 1 assay
Active assay: RNA (18076 features, 2000 variable features)
3 layers present: data, counts, scale.data
4 dimensional reductions calculated: pca, umap.unintegrated, umap.l1, umap.sub
ciliated <- ciliated %>%
FindVariableFeatures() %>%
ScaleData() %>%
RunPCA()
Finding variable features for layer counts
Centering and scaling data matrix
Warning: Different features in new layer data than already exists for
scale.data
PC_ 1
Positive: CTSC, KRT7, EPAS1, CREB3L1, AQP5, ADAM28, VEGFA, SERPINB3, RHOV, RDH10
CXCL1, SLC6A14, CYP2F1, KLK11, SPDEF, CEACAM6, PRSS8, SCNN1B, GPRC5A, KIAA1324
MUC5B, NR4A1, UPK1B, S100A16, TSPAN8, ECE1, FCGBP, SLC4A4, TCIM, XBP1
Negative: SYT5, SCGB2A1, SLC28A2, CD200R1, EGLN3, LAPTM5, SYT8, IFITM10, IFT57, WDR97
ADH7, ADGRB2, HSD17B13, APOD, FAM229B, DEUP1, MORN2, ANLN, FOXN4, CDC20
E2F7, NMU, PLK4, NEK2, ALDH3A1, SHISA8, CDKN2A, TMEM200C, AZU1, VIM
PC_ 2
Positive: ISG15, IFI6, OAS3, TAP1, BST2, IFITM3, IDO1, DUOXA2, ISG20, IFIT2
OAS1, CFB, DUOX2, CMPK2, XAF1, WARS, IFI44L, HERC6, OAS2, IFIT1
GBP1, TAP2, UBE2L6, USP18, MX1, OASL, HELZ2, MX2, STAT2, IFI44
Negative: MT-ND3, CHST9, SELENOP, UGT2A1, MT-CYB, C12orf75, FAM216B, DSTN, GSTA2, PNRC1
CASC4, CPD, C1orf141, PROS1, PTGFR, TMEM123, SEC14L3, PPIL6, EIF1, TSC22D1
IFT57, MXRA7, CYP51A1, CP, MT-ND2, ADH1C, MLF1, EIF4G2, RHOQ, UFM1
PC_ 3
Positive: BTG3, CDC20B, TMEM106C, ANLN, USP13, POGLUT3, NEK2, MCIDAS, PLK4, FOXN4
HES6, E2F7, CEP78, DEUP1, CDK1, CDC20, BUB1, LYZ, HYLS1, CCNO
POC1A, STIL, SHISA8, RANBP9, ISYNA1, TUBG1, H2AFZ, EPS8L3, CCDC14, COCH
Negative: SLC34A2, G0S2, VSTM2L, ZFP36, FYB2, TDRP, CEBPD, C8orf34, KSR1, PTMS
BTG2, RHOB, SCGB1A1, OXTR, LBH, DDIT4, SLC9A3R2, FGF14, FCGBP, LAMB3
CXCL8, HES4, NUPR1, PLAUR, SCGB3A1, PAMR1, IRF7, METRN, VIM, NFKBIA
PC_ 4
Positive: LRRC26, CCNO, MCIDAS, SHISA8, FOXN4, HES6, CDC20B, ANLN, CDC20, E2F7
CDT1, BUB1, POC1A, PLK4, ASF1B, USP13, PCSK6, WDR62, RNASE1, NEK2
POGLUT3, TACC3, TMEM200B, HYLS1, EZH2, CIT, COCH, TOP2A, HELLS, ISYNA1
Negative: FAM216B, MORN2, STOML3, AGR3, SPATS2L, LAP3, PARP14, TSPAN19, TMEM123, CASC4
PPIL6, B3GNT5, IFT57, HACD4, CSTB, ACTB, GBP3, CABCOCO1, C12orf75, MLF1
DNAJA1, APOD, NT5C3A, LAMP3, TEX26, IFIT1, MYCBP, OAS2, SLFN5, OPTN
PC_ 5
Positive: SCGB3A1, FLNB, SCGB1A1, FGF14, FYB2, C8orf34, PROS1, TDRP, SLC34A2, PPP1R9A
METRN, CYP2F1, KCNA1, C16orf89, CD4, FMO5, VAV3, MUC5B, MSMB, TSPAN11
IRX2, CEACAM6, PTGER4, SPATS1, PTGFR, IRX5, NKX2-1, LRMP, LAMB3, KLK11
Negative: CD36, SNCA, TCIM, NOS2, SLC5A8, SLC26A2, SLC39A8, SERPINB4, FMO3, IER3
SGK1, ADH7, HBEGF, AREG, GDF15, HSD17B13, FOS, SLC25A25, A4GALT, PTGS2
ARG2, MUC2, CA12, DUSP1, NTRK2, SELP, HMGCS2, ATP13A5, EGR1, FHL2
gc()
used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
Ncells 4501262 240.4 10623544 567.4 NA 10623544 567.4
Vcells 1495582400 11410.4 2219594182 16934.2 163840 2219521738 16933.7
ciliated <- RunUMAP(ciliated, dims = 1:30, reduction = "pca", reduction.name = "umap.c")
Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session
12:01:56 UMAP embedding parameters a = 0.9922 b = 1.112
12:01:56 Read 24034 rows and found 30 numeric columns
12:01:56 Using Annoy for neighbor search, n_neighbors = 30
12:01:56 Building Annoy index with metric = cosine, n_trees = 50
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
12:01:57 Writing NN index file to temp file /var/folders/q8/kw1r78g12qn793xm7g0zvk94x2bh70/T//RtmpOWBeH2/filef98948c5e872
12:01:57 Searching Annoy index using 1 thread, search_k = 3000
12:02:02 Annoy recall = 100%
12:02:02 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
12:02:02 Initializing from normalized Laplacian + noise (using RSpectra)
12:02:03 Commencing optimization for 200 epochs, with 1043194 positive edges
12:02:03 Using rng type: pcg
12:02:09 Optimization finished
meta_data_columns <- colnames(ciliated@meta.data)
columns_to_remove <- grep("^RNA_snn_res", meta_data_columns, value = TRUE)
ciliated@meta.data <- ciliated@meta.data[, !(colnames(ciliated@meta.data) %in% columns_to_remove)]
resolutions <- seq(0.1, 0.5, by = 0.1)
ciliated <- FindNeighbors(ciliated, dims = 1:30, reduction = "pca")
Computing nearest neighbor graph
Computing SNN
ciliated <- FindClusters(ciliated, resolution = resolutions )
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 24034
Number of edges: 932370
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9685
Number of communities: 12
Elapsed time: 1 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 24034
Number of edges: 932370
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9590
Number of communities: 14
Elapsed time: 2 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 24034
Number of edges: 932370
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9514
Number of communities: 19
Elapsed time: 2 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 24034
Number of edges: 932370
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9450
Number of communities: 22
Elapsed time: 2 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 24034
Number of edges: 932370
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9394
Number of communities: 23
Elapsed time: 2 seconds
clustree(ciliated, prefix = "RNA_snn_res.")

DimPlot(ciliated, group.by = "RNA_snn_res.0.1", reduction = "umap.c", label = TRUE, label.size = 2.5, repel = TRUE, raster = FALSE )

DimPlot(ciliated, reduction = "umap.c", label = TRUE, label.size = 2.5, repel = TRUE, raster = FALSE, split.by = "tissue" )

DimPlot(ciliated, group.by = "donor", reduction = "umap.c", label = TRUE, label.size = 2.5, repel = TRUE, raster = FALSE, split.by = "tissue" )
Warning: ggrepel: 22 unlabeled data points (too many overlaps). Consider
increasing max.overlaps
Warning: ggrepel: 18 unlabeled data points (too many overlaps). Consider
increasing max.overlaps
Warning: ggrepel: 6 unlabeled data points (too many overlaps). Consider
increasing max.overlaps

opt_res <- "RNA_snn_res.0.1"
n <- nlevels(ciliated$RNA_snn_res.0.1)
ciliated$RNA_snn_res.0.1 <- factor(ciliated$RNA_snn_res.0.1, levels = seq(0,n-1))
ciliated$seurat_clusters <- NULL
ciliated$cluster <- ciliated$RNA_snn_res.0.1
Idents(ciliated) <- ciliated$cluster
k <- names(which(table(ciliated$celltypist_lung_airway) > 100))
ciliated$celltypist_lung2 <- ifelse(ciliated$celltypist_lung_airway %in% k,
ciliated$celltypist_lung_airway, NA)
DimPlot(
ciliated,
group.by = "celltypist_lung2",
reduction = "umap.c",
label = TRUE,
repel = TRUE,
label.size = 2.5,
shuffle = TRUE
)
plot_ct_bar(ciliated, "celltypist_paeds")

plot_ct_bar(ciliated, "celltypist_lung_airway")

plot_ct_bar(ciliated, "celltypist_lung_atlas")

#plot_ct_bar(ciliated, "celltypist_immune")
#plot_ct_bar(ciliated, "celltypist_tonsil")
#plot_ct_bar(ciliated, "predicted.celltype.l2") # Tonsil Atlas level 2
plot_ct_bar(ciliated, "cell_labels_v2") # previous labels

new_levels <- c(
"0" = "Ciliated Epithelial",
"1" = "Ciliated Epithelial",
"2" = "Ciliated Epithelial",
"3" = "Ciliated Epithelial",
"4" = "Deuterosomal",
"5" = "Ciliated Epithelial",
"6" ="Ciliated Epithelial",
"7" = "Ciliated Epithelial",
"8" = "Ciliated Epithelial",
"9" = "Ciliated Epithelial",
"10" = "Ciliated Epithelial",
"11" = "Ciliated Epithelial"
)
ciliated <- RenameIdents(ciliated, new_levels)
ciliated$cell_labels_l2 <- Idents(ciliated)
#ciliated$cell_labels_l3 <- Idents(ciliated)
table(ciliated$cell_labels_l2, ciliated$tissue)
Tonsils Adenoids Nasal_brushings Bronchial_brushings
Ciliated Epithelial 53 236 12633 7369
Deuterosomal 0 8 2063 68
BAL
Ciliated Epithelial 1586
Deuterosomal 18
df_counts <- ciliated@meta.data |>
count(cell_labels_l2, tissue, name = "n")
df_totals <- df_counts |>
group_by(cell_labels_l2) |>
summarise(total = sum(n), .groups = "drop")
ggplot(df_counts, aes(x = cell_labels_l2, y = n, fill = tissue)) +
geom_bar(stat = "identity", position = "stack") +
geom_text(data = df_totals,
aes(x = cell_labels_l2, y = total, label = total),
vjust = -0.3, size = 3, inherit.aes = FALSE) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
labs(x = "cluster at l2",
y = "Number of cells",
fill = "Tissue")

ciliated <- FindSubCluster(ciliated, cluster = "Ciliated Epithelial", graph.name = "RNA_snn", resolution = 0.1, subcluster.name = "E_sub")
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 21877
Number of edges: 858245
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9702
Number of communities: 11
Elapsed time: 2 seconds
Idents(ciliated) <- ciliated$E_sub
#levels(Idents(paed_sub)) <- mixedsort(levels(Idents(paed_sub)))
#levels(paed_sub$T_sub) <- mixedsort(levels(paed_sub$T_sub))
DimPlot(ciliated, reduction = "umap.c", raster =F, repel=T, label = T, group.by = "E_sub")

ciliated$cell_labels_l3 <- ciliated$E_sub
Idents(ciliated) <- factor(
Idents(ciliated),
levels = c(
"Deuterosomal",
"Ciliated Epithelial_0",
"Ciliated Epithelial_1",
"Ciliated Epithelial_2",
"Ciliated Epithelial_3",
"Ciliated Epithelial_4",
"Ciliated Epithelial_5",
"Ciliated Epithelial_6",
"Ciliated Epithelial_7",
"Ciliated Epithelial_8",
"Ciliated Epithelial_9",
"Ciliated Epithelial_10"
)
)
ciliated$cell_labels_l3 <- Idents(ciliated)
The marker genes for this subclustering can be found here-
Ciliated_population_subclusters
features.no.mt <- rownames(paed_sub)[!grepl("^MT-", rownames(paed_sub))]
gc()
used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
Ncells 4544925 242.8 10623544 567.4 NA 10623544 567.4
Vcells 1493661024 11395.8 2219594182 16934.2 163840 2219521738 16933.7
paed_sub.markers <- FindAllMarkers(
ciliated,
only.pos = TRUE,
min.pct = 0.25,
logfc.threshold = 0.25,
features = features.no.mt
)
gc()
used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
Ncells 4621613 246.9 10623544 567.4 NA 10623544 567.4
Vcells 1493997208 11398.3 3196391621 24386.6 163840 3196082098 24384.2
top10 <- paed_sub.markers %>%
group_by(cluster) %>%
top_n(n = 10, wt = avg_log2FC) %>%
ungroup() %>%
distinct(gene, .keep_all = TRUE) %>%
arrange(cluster, desc(avg_log2FC))
best.wilcox.gene.per.cluster <- paed_sub.markers |>
group_by(cluster) |>
slice_max(avg_log2FC, n = 1) |>
select(cluster, gene)
best.wilcox.gene.per.cluster
# A tibble: 12 × 2
# Groups: cluster [12]
cluster gene
<fct> <chr>
1 Deuterosomal LINGO1
2 Ciliated Epithelial_0 G0S2
3 Ciliated Epithelial_1 LGI1
4 Ciliated Epithelial_2 HSD17B13
5 Ciliated Epithelial_3 LY6D
6 Ciliated Epithelial_4 OASL
7 Ciliated Epithelial_5 KCNA1
8 Ciliated Epithelial_6 PPARGC1A
9 Ciliated Epithelial_7 STATH
10 Ciliated Epithelial_8 NDRG1
11 Ciliated Epithelial_9 SLAMF7
12 Ciliated Epithelial_10 SELP
table(ciliated$cell_labels_l3)
Deuterosomal Ciliated Epithelial_0 Ciliated Epithelial_1
2157 4535 3611
Ciliated Epithelial_2 Ciliated Epithelial_3 Ciliated Epithelial_4
3124 2261 1668
Ciliated Epithelial_5 Ciliated Epithelial_6 Ciliated Epithelial_7
1639 1183 1086
Ciliated Epithelial_8 Ciliated Epithelial_9 Ciliated Epithelial_10
1044 870 856
cluster_colors <- paletteer::paletteer_d("pals::glasbey")[factor(top10$cluster)]
DotPlot(ciliated,
features = unique(top10$gene),
group.by = "E_sub",
cols = c("azure1", "blueviolet"),
dot.scale = 3, assay = "RNA") +
RotatedAxis() +
FontSize(y.text = 8, x.text = 12) +
labs(y = element_blank(), x = element_blank()) +
coord_flip() +
theme(axis.text.y = element_text(color = cluster_colors)) +
ggtitle("Top 10 marker genes per cluster (Seurat)")
Warning: Vectorized input to `element_text()` is not officially supported.
ℹ Results may be unexpected or may change in future versions of ggplot2.

out_markers <- here("output",
"CSV_v3","Epithelial_lineage",
paste("Marker_genes_Reclustered_ciliated.",opt_res, sep = ""))
dir.create(out_markers, recursive = TRUE, showWarnings = FALSE)
for (cl in unique(paed_sub.markers$cluster)) {
cluster_data <- paed_sub.markers %>% dplyr::filter(cluster == cl)
file_name <- here(out_markers, paste0("G000231_Neeland_", cl, ".csv"))
if (!file.exists(file_name)) {
write.csv(cluster_data, file = file_name)
}
}
Pairwise DE testing for Ciliated Epithelial cells
ids <- unique(ciliated$cell_labels_l3)
res_list <- list()
for (i in 1:(length(ids) - 1)) {
for (j in (i + 1):length(ids)) {
id1 <- ids[i]
id2 <- ids[j]
res <- FindMarkers(
ciliated,
ident.1 = id1,
ident.2 = id2,
only.pos = TRUE,
min.pct = 0.25,
logfc.threshold = 0.5,
features = features.no.mt
)
top_genes <- res %>%
filter(pct.1 > 0.1 & pct.2 > 0.1) %>%
arrange(p_val_adj, desc(avg_log2FC)) %>%
head(10) %>%
rownames()
res_list[[paste(id1, "vs", id2, sep = "_")]] <- res
cat("\n\n####", paste(id1, "vs", id2), "\n\n")
print(
VlnPlot(
ciliated,
features = top_genes,
idents = c(id1, id2),
pt.size = 0
) + patchwork::plot_annotation(title = paste(id1, "vs", id2))
)
kable(
head(res, 50),
caption = paste("Top 50 markers for", id1, "vs", id2)
)
}
}


































































non_ciliated <- subset(paed_sub, idents = "Non-ciliated Epithelial")
keep_cells <- paed_sub$cell_labels_l1 %in% "Non-ciliated Epithelial"
non_ciliated@assays$RNA@cells@.Data <- paed_sub@assays$RNA@cells@.Data[keep_cells, ]
non_ciliated
An object of class Seurat
18076 features across 36140 samples within 1 assay
Active assay: RNA (18076 features, 2000 variable features)
3 layers present: data, counts, scale.data
4 dimensional reductions calculated: pca, umap.unintegrated, umap.l1, umap.sub
non_ciliated <- non_ciliated %>%
FindVariableFeatures() %>%
ScaleData() %>%
RunPCA()
Finding variable features for layer counts
Centering and scaling data matrix
Warning: Different features in new layer data than already exists for
scale.data
PC_ 1
Positive: ALOX15, FAT2, MT1X, MKI67, MT-ATP6, UGT2A1, KRT5, SNCA, TOP2A, CD44
TNC, KIFC1, NUSAP1, TYMS, BIRC5, FOXM1, CIT, TPX2, PHF19, SPC24
STMN1, MYBL2, RBP1, AURKB, HLF, HJURP, KIF2C, NOTCH1, TACC3, CDK1
Negative: DUOXA2, DUOX2, C15orf48, ISG15, SPNS2, SECTM1, OAS1, GPRC5A, LCN2, DUSP5
CMPK2, PDZK1IP1, MX2, CFB, IDO1, RSAD2, WARS, LMO7, OAS3, OAS2
OASL, ISG20, PRSS22, GBP1, CEACAM5, IFIT1, CD38, IFI6, CXCL10, IFIT2
PC_ 2
Positive: BPIFB1, MUC5AC, TSPAN8, LYZ, PSCA, MSMB, PROM1, BPIFA1, TFF3, MSLN
FCGBP, RAB37, CEACAM5, RNASE4, PTPRN2, ANG, SERPINB11, PDE8B, CHAD, SYT7
LCN2, HMGCS2, AZGP1, TFF1, FBP1, SLC9A3, CAPN9, ANPEP, GSTA1, BDKRB2
Negative: MKI67, NUSAP1, TOP2A, KIFC1, TK1, TPX2, AURKB, BIRC5, IQGAP3, KIF2C
FOXM1, TYMS, SPC24, CIT, RRM2, ANLN, HIST1H1B, MYBL2, BUB1, HJURP
CDK1, ASF1B, KIF23, TACC3, TROAP, CCNB2, STMN1, LMNB1, ZWINT, PHF19
PC_ 3
Positive: KRT78, MAL, TMPRSS11B, ECM1, PRSS27, TMPRSS11E, RHCG, SPRR3, KRT13, IL36A
RNASE7, SCEL, HSPB8, SPRR1B, HOPX, CRNN, NCCRP1, CNFN, A2ML1, CAMK2N1
EMP1, SPRR2A, KRT80, CRCT1, SPINK5, SPRR2D, S100A8, VSIG10L, S100A12, MUC21
Negative: IDO1, WARS, IFI6, IFITM3, TAP1, TRIM22, IFI44L, MX1, LAP3, IFI44
XAF1, ISG20, OAS2, IFIT2, PARP14, MX2, IFIT1, CMPK2, MT2A, IFIT3
APOL1, HERC6, GBP1, IFI30, TAP2, OAS3, GBP4, SERPING1, CXCL10, CX3CL1
PC_ 4
Positive: BPIFB1, ALPL, LCN2, CEACAM6, MUC5AC, CEACAM5, MACC1, STEAP4, MSLN, MSMB
RAB37, LYZ, PSCA, PI3, ALDH3B1, FBP1, C15orf48, SLURP2, NQO1, FCGBP
CFB, CD24, BPIFA1, PROM1, RNASE4, MAL2, RAMP1, ME1, AZGP1, TSPAN8
Negative: TNC, NUPR1, KRT5, TNS4, KRT17, S100A2, CX3CL1, FOSB, EGR1, CXCL11
BTG2, HBEGF, FGFR3, FAT2, NOTCH1, SERPINB5, CXCL10, IGFBP5, ATF3, KRT6A
JUN, SOCS1, ZFP36, CLCA4, EGR3, TRIB1, ARL4D, HAPLN3, CD44, BCAM
PC_ 5
Positive: IFI6, TMPRSS11B, MAL, ECM1, MUC5B, RNASE7, RHCG, KRT78, ISG15, CRNN
IL36A, PRSS27, CSF3, SLC9A3, XAF1, EPS8L1, MT2A, CRCT1, TFF3, OAS3
TMPRSS11E, CIITA, TNFAIP2, AURKB, IQGAP3, CAMK2N1, NUSAP1, VNN3, MKI67, MX2
Negative: PRDX1, ADH7, MAL2, ANXA1, SGK1, UGT2A1, AKR1C3, UPK1B, GCLC, DSTN
CSTA, TPD52L1, TM4SF1, DSG2, TSC22D1, AKR1C2, CASC4, GPX2, CCDC80, MT-ATP6
RCN1, FABP5, HSPA1B, ATP1B3, ALOX15, ASS1, S100A4, MT-ND3, PRKAR2B, GCLM
gc()
used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
Ncells 4730528 252.7 15705738 838.8 NA 15705738 838.8
Vcells 2016416182 15384.1 3196391621 24386.6 163840 3196082098 24384.2
non_ciliated <- RunUMAP(non_ciliated, dims = 1:30, reduction = "pca", reduction.name = "umap.nc")
12:06:58 UMAP embedding parameters a = 0.9922 b = 1.112
12:06:58 Read 36140 rows and found 30 numeric columns
12:06:58 Using Annoy for neighbor search, n_neighbors = 30
12:06:58 Building Annoy index with metric = cosine, n_trees = 50
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
12:07:00 Writing NN index file to temp file /var/folders/q8/kw1r78g12qn793xm7g0zvk94x2bh70/T//RtmpOWBeH2/filef989591187b1
12:07:00 Searching Annoy index using 1 thread, search_k = 3000
12:07:07 Annoy recall = 100%
12:07:07 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
12:07:08 Initializing from normalized Laplacian + noise (using RSpectra)
12:07:08 Commencing optimization for 200 epochs, with 1562686 positive edges
12:07:08 Using rng type: pcg
12:07:18 Optimization finished
meta_data_columns <- colnames(non_ciliated@meta.data)
columns_to_remove <- grep("^RNA_snn_res", meta_data_columns, value = TRUE)
non_ciliated@meta.data <- non_ciliated@meta.data[, !(colnames(non_ciliated@meta.data) %in% columns_to_remove)]
resolutions <- seq(0.1, 0.5, by = 0.1)
non_ciliated <- FindNeighbors(non_ciliated, dims = 1:30, reduction = "pca")
Computing nearest neighbor graph
Computing SNN
non_ciliated <- FindClusters(non_ciliated, resolution = resolutions )
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 36140
Number of edges: 1240015
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9513
Number of communities: 11
Elapsed time: 3 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 36140
Number of edges: 1240015
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9347
Number of communities: 13
Elapsed time: 4 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 36140
Number of edges: 1240015
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9235
Number of communities: 15
Elapsed time: 4 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 36140
Number of edges: 1240015
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9142
Number of communities: 18
Elapsed time: 4 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 36140
Number of edges: 1240015
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9064
Number of communities: 21
Elapsed time: 4 seconds
clustree(non_ciliated, prefix = "RNA_snn_res.")

DimPlot(non_ciliated, group.by = "RNA_snn_res.0.1", reduction = "umap.nc", label = TRUE, label.size = 2.5, repel = TRUE, raster = FALSE )

DimPlot(non_ciliated, reduction = "umap.nc", label = TRUE, label.size = 2.5, repel = TRUE, raster = FALSE, split.by = "tissue" )

#DimPlot(non_ciliated, group.by = "donor", reduction = "umap.nc", label = TRUE, label.size = 2.5, repel = TRUE, raster = FALSE, split.by = "tissue" )
opt_res <- "RNA_snn_res.0.1"
n <- nlevels(non_ciliated$RNA_snn_res.0.1)
non_ciliated$RNA_snn_res.0.1 <- factor(non_ciliated$RNA_snn_res.0.1, levels = seq(0,n-1))
non_ciliated$seurat_clusters <- NULL
non_ciliated$cluster <- non_ciliated$RNA_snn_res.0.1
Idents(non_ciliated) <- non_ciliated$cluster
k <- names(which(table(non_ciliated$celltypist_lung_airway) > 150))
non_ciliated$celltypist_lung2 <- ifelse(non_ciliated$celltypist_lung_airway %in% k,
non_ciliated$celltypist_lung_airway, NA)
DimPlot(
non_ciliated,
group.by = "celltypist_lung2",
reduction = "umap.nc",
label = TRUE,
repel = TRUE,
label.size = 2.5,
shuffle = TRUE
)
plot_ct_bar(non_ciliated, "celltypist_paeds")

plot_ct_bar(non_ciliated, "celltypist_lung_airway")

plot_ct_bar(non_ciliated, "celltypist_lung_atlas")

plot_ct_bar(non_ciliated, "celltypist_immune")

plot_ct_bar(non_ciliated, "celltypist_tonsil")

#plot_ct_bar(non_ciliated, "predicted.celltype.l2") # Tonsil Atlas level 2
plot_ct_bar(non_ciliated, "cell_labels_v2") # previous labels

Checking for known markers-
features <- list(
## Basal lineage
Basal_core = c("KRT5", "KRT14", "TP63", "KRT15", "ITGA6"),
Basal_KHDRBS2 = c("KHDRBS2", "COL17A1", "ITGA6"),
Basal_cycling = c("MKI67", "TOP2A", "CENPF"),
## Squamous / Hillock
Hillock = c("KRT13", "KRT4", "CLCA2"),
Squamous = c("KRT13", "KRT4", "IVL", "SPRR1A", "SPRR2A"),
## Secretory lineage
Secretory = c("SCGB1A1", "KRT8", "KRT18"),
Secretory_GALNT4 = c("GALNT4", "SCGB1A1"),
Duct = c("KRT19", "KRT8"),
Goblet = c("MUC5AC", "SPDEF", "AGR2"),
Goblet_BPIFA2 = c("BPIFA2", "MUC5AC"),
Goblet_PLAU = c("PLAU", "MUC5AC"),
Goblet_inflammatory = c("CXCL8", "IL1B", "MUC5AC"),
Transit_epithelial = c("KRT8", "KRT18", "LGALS3"),
## Rare epithelial types
Ionocyte = c("FOXI1", "CFTR", "ASCL3"),
Brush = c("POU2F3", "TRPM5", "GFI1B")
)
for (g in features) {
print(
FeaturePlot(
non_ciliated,
features = g,
reduction = "umap.nc",
raster = FALSE, repel = T, label = T, ncol = 3
)
)
}















new_levels <- c(
"0" = "Basal_Club_Goblet",
"1" = "Basal_Club_Goblet",
"2" = "Basal_Club_Goblet",
"3" = "Basal_Club_Goblet",
"4" = "Proliferating Epithelial",
"5" = "Basal_Club_Goblet",
"6" ="Basal_Club_Goblet",
"7" = "Duct",
"8" = "Squamous",
"9" = "Ionocytes",
"10" = "Squamous"
)
non_ciliated <- RenameIdents(non_ciliated, new_levels)
non_ciliated$cell_labels_l2 <- Idents(non_ciliated)
non_ciliated$cell_labels_l2 <- factor(
Idents(non_ciliated),
levels = c(
"Basal_Club_Goblet",
"Proliferating Epithelial",
"Duct",
"Squamous",
"Ionocytes"
)
)
non_ciliated$cell_labels_l3 <- non_ciliated$cell_labels_l2
Idents(non_ciliated) <- non_ciliated$cell_labels_l2
table(non_ciliated$cell_labels_l2, non_ciliated$tissue)
Tonsils Adenoids Nasal_brushings Bronchial_brushings
Basal_Club_Goblet 84 59 31921 119
Proliferating Epithelial 1 3 1961 88
Duct 0 0 976 4
Squamous 212 37 51 199
Ionocytes 0 0 173 13
BAL
Basal_Club_Goblet 27
Proliferating Epithelial 3
Duct 2
Squamous 206
Ionocytes 1
df_counts <- non_ciliated@meta.data |>
count(cell_labels_l2, tissue, name = "n")
df_totals <- df_counts |>
group_by(cell_labels_l2) |>
summarise(total = sum(n), .groups = "drop")
ggplot(df_counts, aes(x = cell_labels_l2, y = n, fill = tissue)) +
geom_bar(stat = "identity", position = "stack") +
geom_text(data = df_totals,
aes(x = cell_labels_l2, y = total, label = total),
vjust = -0.3, size = 3, inherit.aes = FALSE) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
labs(x = "cluster at l2",
y = "Number of cells",
fill = "Tissue")

non_ciliated@meta.data %>%
count(cell_labels_l2, donor) %>%
group_by(cell_labels_l2) %>%
mutate(prop = n / sum(n)) %>% # proportion per L2
ungroup() %>%
ggplot(aes(x = cell_labels_l2, y = prop, fill = donor)) +
geom_col(width = 0.9) +
scale_y_continuous(labels = scales::percent) +
scale_fill_viridis_d(option = "turbo", guide = guide_legend(ncol = 2)) +
theme_classic() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(
x = NULL,
y = "Proportion",
fill = "Donor",
title = "Donor composition per cluster"
)

Subclustering basal/club/goblet
non_ciliated <- FindSubCluster(non_ciliated, cluster = "Basal_Club_Goblet", graph.name = "RNA_snn", resolution = 0.1, subcluster.name = "E_sub")
Idents(non_ciliated) <- non_ciliated$E_sub
non_ciliated$cell_labels_l3 <- non_ciliated$E_sub
non_ciliated$cell_labels_l3 <- factor(
non_ciliated$cell_labels_l3,
levels = c(
"Basal_Club_Goblet_0",
"Basal_Club_Goblet_1",
"Basal_Club_Goblet_2",
"Basal_Club_Goblet_3",
"Basal_Club_Goblet_4",
"Basal_Club_Goblet_5",
"Proliferating Epithelial",
"Duct",
"Squamous",
"Ionocytes"
)
)
Idents(non_ciliated) <- non_ciliated$cell_labels_l3
DimPlot(non_ciliated, reduction = "umap.nc", raster =F, repel=T, label = T)

The marker genes for this subclustering can be found here-
Non-ciliated_population_subclusters
features.no.mt <- rownames(paed_sub)[!grepl("^MT-", rownames(paed_sub))]
gc()
used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
Ncells 4719955 252.1 15705738 838.8 NA 15705738 838.8
Vcells 2016983967 15388.4 3196391621 24386.6 163840 3196082098 24384.2
paed_sub.markers <- FindAllMarkers(
non_ciliated,
only.pos = TRUE,
min.pct = 0.25,
logfc.threshold = 0.25,
features = features.no.mt
)
gc()
used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
Ncells 4704943 251.3 15705738 838.8 NA 15705738 838.8
Vcells 2016826519 15387.2 3835749945 29264.5 163840 3825797871 29188.6
top10 <- paed_sub.markers %>%
group_by(cluster) %>%
top_n(n = 10, wt = avg_log2FC) %>%
ungroup() %>%
distinct(gene, .keep_all = TRUE) %>%
arrange(cluster, desc(avg_log2FC))
best.wilcox.gene.per.cluster <- paed_sub.markers |>
group_by(cluster) |>
slice_max(avg_log2FC, n = 1) |>
select(cluster, gene)
best.wilcox.gene.per.cluster
# A tibble: 5 × 2
# Groups: cluster [5]
cluster gene
<fct> <chr>
1 Basal_Club_Goblet CCDC190
2 Proliferating Epithelial SPC25
3 Duct DMBT1
4 Squamous IL36A
5 Ionocytes PDE1C
FeaturePlot(
non_ciliated,
features = best.wilcox.gene.per.cluster$gene,
reduction = "umap.nc",
raster = FALSE,
repel = TRUE,
ncol = 1,
label = TRUE,
split.by = "tissue"
)
## Seurat top markers
cluster_colors <- paletteer::paletteer_d("pals::glasbey")[factor(top10$cluster)]
DotPlot(non_ciliated,
features = unique(top10$gene),
group.by = "cell_labels_l3",
cols = c("azure1", "blueviolet"),
dot.scale = 3, assay = "RNA") +
RotatedAxis() +
FontSize(y.text = 8, x.text = 12) +
labs(y = element_blank(), x = element_blank()) +
coord_flip() +
theme(axis.text.y = element_text(color = cluster_colors)) +
ggtitle("Top 10 marker genes per cluster (Seurat)")

marker_counts <- paed_sub.markers %>%
filter(avg_log2FC > 1 & p_val_adj < 0.05) %>%
count(cluster, name = "n_markers") %>%
arrange(desc(n_markers))
print(marker_counts)
cluster n_markers
1 Proliferating Epithelial 595
2 Squamous 532
3 Ionocytes 258
4 Duct 96
5 Basal_Club_Goblet 39
ggplot(marker_counts, aes(x = reorder(cluster, n_markers), y = n_markers)) +
geom_col(fill = "steelblue", alpha = 0.8) +
coord_flip() +
labs(title = "High-confidence markers per subcluster (logFC > 1)",
x = "Subcluster", y = "Number of markers") +
theme_minimal()
## Pairwise DE markers (Non-ciliated) {.tabset}
ids <- unique(non_ciliated$cell_labels_l3)
res_list <- list()
for (i in 1:(length(ids) - 1)) {
for (j in (i + 1):length(ids)) {
id1 <- ids[i]
id2 <- ids[j]
res <- FindMarkers(
ciliated,
ident.1 = id1,
ident.2 = id2,
only.pos = TRUE,
min.pct = 0.25,
logfc.threshold = 0.5,
features = features.no.mt
)
top_genes <- res %>%
filter(pct.1 > 0.1 & pct.2 > 0.1) %>%
arrange(p_val_adj, desc(avg_log2FC)) %>%
head(10) %>%
rownames()
res_list[[paste(id1, "vs", id2, sep = "_")]] <- res
cat("\n\n##", paste(id1, "vs", id2), "\n\n")
print(
VlnPlot(
ciliated,
features = top_genes,
idents = c(id1, id2),
pt.size = 0
) + patchwork::plot_annotation(title = paste(id1, "vs", id2))
)
kable(
head(res, 50),
caption = paste("Top 50 markers for", id1, "vs", id2)
)
}
}
out_markers <- here("output",
"CSV_v3","Epithelial_lineage",
paste("Marker_genes_Reclustered_Non-ciliated.",opt_res, sep = ""))
dir.create(out_markers, recursive = TRUE, showWarnings = FALSE)
for (cl in unique(paed_sub.markers$cluster)) {
cluster_data <- paed_sub.markers %>% dplyr::filter(cluster == cl)
file_name <- here(out_markers, paste0("G000231_Neeland_", cl, ".csv"))
if (!file.exists(file_name)) {
write.csv(cluster_data, file = file_name)
}
}
paed_sub$cell_labels_l2 <- as.character(paed_sub$cell_labels_l1)
paed_sub$cell_labels_l3 <- as.character(paed_sub$cell_labels_l2)
for (obj in list(ciliated, non_ciliated)) {
cells <- intersect(colnames(paed_sub), colnames(obj))
paed_sub$cell_labels_l2[cells] <- as.character(obj$cell_labels_l2[cells])
paed_sub$cell_labels_l3[cells] <- as.character(obj$cell_labels_l3[cells])
}
paed_sub$cell_labels_l2 <- factor(paed_sub$cell_labels_l2)
paed_sub$cell_labels_l3 <- factor(paed_sub$cell_labels_l3)
## Seurat top markers
cluster_colors <- paletteer::paletteer_d("pals::glasbey")[factor(top10$cluster)]
DotPlot(paed_sub,
features = unique(top10$gene),
group.by = "cell_labels_l3",
cols = c("azure1", "blueviolet"),
dot.scale = 3, assay = "RNA") +
RotatedAxis() +
FontSize(y.text = 8, x.text = 12) +
labs(y = element_blank(), x = element_blank()) +
coord_flip() +
theme(axis.text.y = element_text(color = cluster_colors)) +
ggtitle("Top 10 marker genes per cluster (Seurat)")
Warning: Vectorized input to `element_text()` is not officially supported.
ℹ Results may be unexpected or may change in future versions of ggplot2.

DimPlot(paed_sub, reduction = "umap.sub", raster = F, repel = T, label = T, group.by = "lineage")

DimPlot(paed_sub, reduction = "umap.sub", raster = F, repel = T, label = T, group.by = "cell_labels_l1")

paed_sub$cell_labels_l2 <- factor(
paed_sub$cell_labels_l2,
levels = c(
"Basal_Club_Goblet",
"Proliferating Epithelial",
"Ciliated Epithelial",
"Deuterosomal",
"Secretory Epithelial",
"Duct",
"Squamous",
"Ionocytes",
"Erythroid cells"
)
)
DimPlot(paed_sub, reduction = "umap.sub", raster = F, repel = T, label = T, group.by = "cell_labels_l2")

DimPlot(paed_sub, reduction = "umap.sub", raster = F, repel = T, label = T, group.by = "cell_labels_l2", split.by = "tissue")

paed_sub$cell_labels_l3 <- factor(
paed_sub$cell_labels_l3,
levels = c(
"Basal_Club_Goblet_0","Basal_Club_Goblet_1","Basal_Club_Goblet_2",
"Basal_Club_Goblet_3","Basal_Club_Goblet_4","Basal_Club_Goblet_5",
"Proliferating Epithelial",
"Ciliated Epithelial_0","Ciliated Epithelial_1","Ciliated Epithelial_2",
"Ciliated Epithelial_3","Ciliated Epithelial_4","Ciliated Epithelial_5",
"Ciliated Epithelial_6","Ciliated Epithelial_7","Ciliated Epithelial_8",
"Ciliated Epithelial_9","Ciliated Epithelial_10",
"Deuterosomal",
"Secretory Epithelial",
"Duct",
"Squamous",
"Ionocytes",
"Erythroid cells"
)
)
DimPlot(paed_sub, reduction = "umap.sub", raster = F, repel = T, label = T, group.by = "cell_labels_l3")

paed_sub@meta.data %>%
count(cell_labels_l3, donor) %>%
group_by(cell_labels_l3) %>%
mutate(prop = n / sum(n)) %>% # proportion per L3
ungroup() %>%
ggplot(aes(x = cell_labels_l3, y = prop, fill = donor)) +
geom_col(width = 0.9) +
scale_y_continuous(labels = scales::percent) +
scale_fill_viridis_d(option = "turbo", guide = guide_legend(ncol = 2)) +
theme_classic() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(
x = NULL,
y = "Proportion",
fill = "Donor",
title = "Donor composition per L3 cell type"
)

#saveRDS(paed_sub, "../EarlyAir_paper/output/RDS/Annotated_lineages/SEU_Epithelial_annotated.rds")
Some notes/observations from Epithelial cell lineage- Total Epithelial cells- 67,493 1. Majority of the samples are from Nasal_brushings 2. Ciliated cell subclustering gives distinct clusters 3. No clear distinction between Basal/Club/Goblet cells
sessionInfo()
R version 4.3.3 (2024-02-29)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 15.7.3
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Australia/Melbourne
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] future_1.40.0 gtools_3.9.5 presto_1.0.0
[4] circlize_0.4.16 BiocParallel_1.36.0 harmony_1.2.3
[7] Rcpp_1.0.14 patchwork_1.3.0 cowplot_1.1.3
[10] data.table_1.17.2 RColorBrewer_1.1-3 kableExtra_1.4.0
[13] ggridges_0.5.6 ggforce_0.4.2 viridis_0.6.5
[16] viridisLite_0.4.2 paletteer_1.6.0 clustree_0.5.1
[19] ggraph_2.2.1 Seurat_5.0.3 SeuratObject_5.1.0
[22] sp_2.2-0 here_1.0.1 lubridate_1.9.4
[25] forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4
[28] purrr_1.0.4 readr_2.1.5 tidyr_1.3.1
[31] tibble_3.2.1 ggplot2_3.5.2 tidyverse_2.0.0
[34] BiocStyle_2.30.0 workflowr_1.7.1
loaded via a namespace (and not attached):
[1] RcppAnnoy_0.0.22 splines_4.3.3 later_1.4.2
[4] prismatic_1.1.2 polyclip_1.10-7 fastDummies_1.7.5
[7] lifecycle_1.0.4 rprojroot_2.0.4 globals_0.17.0
[10] processx_3.8.6 lattice_0.22-5 MASS_7.3-60.0.1
[13] backports_1.5.0 magrittr_2.0.3 limma_3.58.1
[16] plotly_4.10.4 sass_0.4.10 rmarkdown_2.29
[19] jquerylib_0.1.4 yaml_2.3.10 httpuv_1.6.16
[22] sctransform_0.4.2 spam_2.11-1 spatstat.sparse_3.1-0
[25] reticulate_1.42.0 pbapply_1.7-2 abind_1.4-8
[28] Rtsne_0.17 tweenr_2.0.3 git2r_0.36.2
[31] ggrepel_0.9.6 irlba_2.3.5.1 listenv_0.9.1
[34] spatstat.utils_3.1-4 pheatmap_1.0.12 goftest_1.2-3
[37] RSpectra_0.16-2 spatstat.random_3.4-1 fitdistrplus_1.2-2
[40] parallelly_1.44.0 svglite_2.2.1 leiden_0.4.3.1
[43] codetools_0.2-19 xml2_1.3.8 tidyselect_1.2.1
[46] shape_1.4.6.1 farver_2.1.2 matrixStats_1.5.0
[49] spatstat.explore_3.4-3 jsonlite_2.0.0 tidygraph_1.3.1
[52] progressr_0.15.1 survival_3.5-8 systemfonts_1.2.3
[55] tools_4.3.3 ica_1.0-3 glue_1.8.0
[58] gridExtra_2.3 xfun_0.52 withr_3.0.2
[61] BiocManager_1.30.25 fastmap_1.2.0 callr_3.7.6
[64] digest_0.6.37 timechange_0.3.0 R6_2.6.1
[67] mime_0.13 textshaping_1.0.1 colorspace_2.1-1
[70] scattermore_1.2 tensor_1.5 spatstat.data_3.1-6
[73] utf8_1.2.5 generics_0.1.4 graphlayouts_1.2.2
[76] httr_1.4.7 htmlwidgets_1.6.4 whisker_0.4.1
[79] uwot_0.2.3 pkgconfig_2.0.3 gtable_0.3.6
[82] lmtest_0.9-40 htmltools_0.5.8.1 dotCall64_1.2
[85] scales_1.4.0 png_0.1-8 spatstat.univar_3.1-3
[88] knitr_1.50 rstudioapi_0.17.1 tzdb_0.5.0
[91] reshape2_1.4.4 checkmate_2.3.2 nlme_3.1-164
[94] cachem_1.1.0 zoo_1.8-14 GlobalOptions_0.1.2
[97] KernSmooth_2.23-22 vipor_0.4.7 parallel_4.3.3
[100] miniUI_0.1.2 ggrastr_1.0.2 pillar_1.10.2
[103] grid_4.3.3 vctrs_0.6.5 RANN_2.6.2
[106] promises_1.3.2 xtable_1.8-4 cluster_2.1.6
[109] beeswarm_0.4.0 evaluate_1.0.3 cli_3.6.5
[112] compiler_4.3.3 rlang_1.1.6 crayon_1.5.3
[115] future.apply_1.11.3 labeling_0.4.3 rematch2_2.1.2
[118] ps_1.9.1 ggbeeswarm_0.7.2 getPass_0.2-4
[121] plyr_1.8.9 fs_1.6.6 stringi_1.8.7
[124] deldir_2.0-4 lazyeval_0.2.2 spatstat.geom_3.4-1
[127] Matrix_1.6-5 RcppHNSW_0.6.0 hms_1.1.3
[130] statmod_1.5.0 shiny_1.10.0 ROCR_1.0-11
[133] igraph_2.1.4 memoise_2.0.1 bslib_0.9.0
sessionInfo()
R version 4.3.3 (2024-02-29)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 15.7.3
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Australia/Melbourne
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] future_1.40.0 gtools_3.9.5 presto_1.0.0
[4] circlize_0.4.16 BiocParallel_1.36.0 harmony_1.2.3
[7] Rcpp_1.0.14 patchwork_1.3.0 cowplot_1.1.3
[10] data.table_1.17.2 RColorBrewer_1.1-3 kableExtra_1.4.0
[13] ggridges_0.5.6 ggforce_0.4.2 viridis_0.6.5
[16] viridisLite_0.4.2 paletteer_1.6.0 clustree_0.5.1
[19] ggraph_2.2.1 Seurat_5.0.3 SeuratObject_5.1.0
[22] sp_2.2-0 here_1.0.1 lubridate_1.9.4
[25] forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4
[28] purrr_1.0.4 readr_2.1.5 tidyr_1.3.1
[31] tibble_3.2.1 ggplot2_3.5.2 tidyverse_2.0.0
[34] BiocStyle_2.30.0 workflowr_1.7.1
loaded via a namespace (and not attached):
[1] RcppAnnoy_0.0.22 splines_4.3.3 later_1.4.2
[4] prismatic_1.1.2 polyclip_1.10-7 fastDummies_1.7.5
[7] lifecycle_1.0.4 rprojroot_2.0.4 globals_0.17.0
[10] processx_3.8.6 lattice_0.22-5 MASS_7.3-60.0.1
[13] backports_1.5.0 magrittr_2.0.3 limma_3.58.1
[16] plotly_4.10.4 sass_0.4.10 rmarkdown_2.29
[19] jquerylib_0.1.4 yaml_2.3.10 httpuv_1.6.16
[22] sctransform_0.4.2 spam_2.11-1 spatstat.sparse_3.1-0
[25] reticulate_1.42.0 pbapply_1.7-2 abind_1.4-8
[28] Rtsne_0.17 tweenr_2.0.3 git2r_0.36.2
[31] ggrepel_0.9.6 irlba_2.3.5.1 listenv_0.9.1
[34] spatstat.utils_3.1-4 pheatmap_1.0.12 goftest_1.2-3
[37] RSpectra_0.16-2 spatstat.random_3.4-1 fitdistrplus_1.2-2
[40] parallelly_1.44.0 svglite_2.2.1 leiden_0.4.3.1
[43] codetools_0.2-19 xml2_1.3.8 tidyselect_1.2.1
[46] shape_1.4.6.1 farver_2.1.2 matrixStats_1.5.0
[49] spatstat.explore_3.4-3 jsonlite_2.0.0 tidygraph_1.3.1
[52] progressr_0.15.1 survival_3.5-8 systemfonts_1.2.3
[55] tools_4.3.3 ica_1.0-3 glue_1.8.0
[58] gridExtra_2.3 xfun_0.52 withr_3.0.2
[61] BiocManager_1.30.25 fastmap_1.2.0 callr_3.7.6
[64] digest_0.6.37 timechange_0.3.0 R6_2.6.1
[67] mime_0.13 textshaping_1.0.1 colorspace_2.1-1
[70] scattermore_1.2 tensor_1.5 spatstat.data_3.1-6
[73] utf8_1.2.5 generics_0.1.4 graphlayouts_1.2.2
[76] httr_1.4.7 htmlwidgets_1.6.4 whisker_0.4.1
[79] uwot_0.2.3 pkgconfig_2.0.3 gtable_0.3.6
[82] lmtest_0.9-40 htmltools_0.5.8.1 dotCall64_1.2
[85] scales_1.4.0 png_0.1-8 spatstat.univar_3.1-3
[88] knitr_1.50 rstudioapi_0.17.1 tzdb_0.5.0
[91] reshape2_1.4.4 checkmate_2.3.2 nlme_3.1-164
[94] cachem_1.1.0 zoo_1.8-14 GlobalOptions_0.1.2
[97] KernSmooth_2.23-22 vipor_0.4.7 parallel_4.3.3
[100] miniUI_0.1.2 ggrastr_1.0.2 pillar_1.10.2
[103] grid_4.3.3 vctrs_0.6.5 RANN_2.6.2
[106] promises_1.3.2 xtable_1.8-4 cluster_2.1.6
[109] beeswarm_0.4.0 evaluate_1.0.3 cli_3.6.5
[112] compiler_4.3.3 rlang_1.1.6 crayon_1.5.3
[115] future.apply_1.11.3 labeling_0.4.3 rematch2_2.1.2
[118] ps_1.9.1 ggbeeswarm_0.7.2 getPass_0.2-4
[121] plyr_1.8.9 fs_1.6.6 stringi_1.8.7
[124] deldir_2.0-4 lazyeval_0.2.2 spatstat.geom_3.4-1
[127] Matrix_1.6-5 RcppHNSW_0.6.0 hms_1.1.3
[130] statmod_1.5.0 shiny_1.10.0 ROCR_1.0-11
[133] igraph_2.1.4 memoise_2.0.1 bslib_0.9.0