Last updated: 2025-02-04

Checks: 5 passed, 2 warnings

Knit directory: paed-airway-allTissues/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


The R Markdown is untracked by Git. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20230811) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
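
As a minimal illustration of why the seed matters, resetting the same seed before a random draw reproduces that draw exactly. The sampled vector below is arbitrary and is not part of this analysis; only the seed value is taken from the report.

```r
# Illustrative only: these draws are not part of the analysis.
set.seed(20230811)
first_draw <- sample(1:100, 5)

set.seed(20230811)  # reset to the same seed
second_draw <- sample(1:100, 5)

# The two draws are the same because the generator started from the same state.
identical(first_draw, second_draw)
```

Without the second set.seed() call the two draws would generally differ.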

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Using absolute paths to the files within your workflowr project makes it difficult for you and others to run your code on a different machine. Change the absolute path(s) below to the suggested relative path(s) to make your code more reproducible.

absolute: ~/projects/paed-airway-atlas/airway-atlas-allTissues/paed-airway-allTissues/output/RDS/AllBatches_Annotation_SEUs_v2
relative: output/RDS/AllBatches_Annotation_SEUs_v2

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 54e4ec2. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .RData
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/.DS_Store
    Ignored:    data/.DS_Store
    Ignored:    data/RDS/
    Ignored:    output/.DS_Store
    Ignored:    output/CSV/.DS_Store
    Ignored:    output/G000231_Neeland_batch1/
    Ignored:    output/G000231_Neeland_batch2_1/
    Ignored:    output/G000231_Neeland_batch2_2/
    Ignored:    output/G000231_Neeland_batch3/
    Ignored:    output/G000231_Neeland_batch4/
    Ignored:    output/G000231_Neeland_batch5/
    Ignored:    output/G000231_Neeland_batch9_1/
    Ignored:    output/RDS/
    Ignored:    output/plots/

Untracked files:
    Untracked:  analysis/03_Batch_Integration.Rmd
    Untracked:  analysis/Age_proportions.Rmd
    Untracked:  analysis/Age_proportions_AllBatches.Rmd
    Untracked:  analysis/All_Batches_QCExploratory_v2.Rmd
    Untracked:  analysis/All_metadata.Rmd
    Untracked:  analysis/Annotation_BAL.Rmd
    Untracked:  analysis/Annotation_Bronchial_brushings.Rmd
    Untracked:  analysis/Annotation_Nasal_brushings.Rmd
    Untracked:  analysis/BatchCorrection_Adenoids.Rmd
    Untracked:  analysis/BatchCorrection_Nasal_brushings.Rmd
    Untracked:  analysis/BatchCorrection_Tonsils.Rmd
    Untracked:  analysis/Batch_Integration_&_Downstream_analysis.Rmd
    Untracked:  analysis/Batch_correction_&_Downstream.Rmd
    Untracked:  analysis/Cell_cycle_regression.Rmd
    Untracked:  analysis/Clustering_Tonsils_v2.Rmd
    Untracked:  analysis/DGE_analysis_George.Rmd
    Untracked:  analysis/Master_metadata.Rmd
    Untracked:  analysis/Pediatric_Vs_Adult_Atlases.Rmd
    Untracked:  analysis/Preprocessing_Batch1_Nasal_brushings.Rmd
    Untracked:  analysis/Preprocessing_Batch2_Tonsils.Rmd
    Untracked:  analysis/Preprocessing_Batch3_Adenoids.Rmd
    Untracked:  analysis/Preprocessing_Batch4_Bronchial_brushings.Rmd
    Untracked:  analysis/Preprocessing_Batch5_Nasal_brushings.Rmd
    Untracked:  analysis/Preprocessing_Batch6_BAL.Rmd
    Untracked:  analysis/Preprocessing_Batch7_Bronchial_brushings.Rmd
    Untracked:  analysis/Preprocessing_Batch8_Adenoids.Rmd
    Untracked:  analysis/Preprocessing_Batch9_Tonsils.Rmd
    Untracked:  analysis/TonsilsVsAdenoids.Rmd
    Untracked:  analysis/cell_cycle_regression.R
    Untracked:  analysis/testing_age_all.Rmd
    Untracked:  data/Cell_labels_Gunjan_v2/
    Untracked:  data/Cell_labels_Mel/
    Untracked:  data/Cell_labels_Mel_v2/
    Untracked:  data/Cell_labels_Mel_v3/
    Untracked:  data/Cell_labels_modified_Gunjan/
    Untracked:  data/Gene_sets/
    Untracked:  data/Hs.c2.cp.reactome.v7.1.entrez.rds
    Untracked:  data/Raw_feature_bc_matrix/
    Untracked:  data/cell_labels_Mel_v4_Dec2024/
    Untracked:  data/celltypes_Mel_GD_v3.xlsx
    Untracked:  data/celltypes_Mel_GD_v4_no_dups.xlsx
    Untracked:  data/celltypes_Mel_modified.xlsx
    Untracked:  data/celltypes_Mel_v2.csv
    Untracked:  data/celltypes_Mel_v2.xlsx
    Untracked:  data/celltypes_Mel_v2_MN.xlsx
    Untracked:  data/celltypes_for_mel_MN.xlsx
    Untracked:  data/col_palette.xlsx
    Untracked:  data/earlyAIR_sample_sheets_combined.xlsx
    Untracked:  data/~$col_palette.xlsx
    Untracked:  output/CSV/All_tissues.propeller.xlsx
    Untracked:  output/CSV/Bronchial_brushings/
    Untracked:  output/CSV/Bronchial_brushings_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/
    Untracked:  output/CSV/G000231_Neeland_Adenoids.propeller.xlsx
    Untracked:  output/CSV/G000231_Neeland_Bronchial_brushings.propeller.xlsx
    Untracked:  output/CSV/G000231_Neeland_Nasal_brushings.propeller.xlsx
    Untracked:  output/CSV/G000231_Neeland_Tonsils.propeller.xlsx
    Untracked:  output/CSV/Nasal_brushings/
    Untracked:  output/CSV_v2/G000231_Neeland_Adenoids.propeller.xlsx
    Untracked:  output/CSV_v2/G000231_Neeland_Nasal_brushings.propeller.xlsx
    Untracked:  output/CSV_v2/G000231_Neeland_Tonsils.propeller.xlsx
    Untracked:  output/DGE/
    Untracked:  test_col.csv
    Untracked:  test_col.txt
    Untracked:  test_col.xlsx

Unstaged changes:
    Deleted:    02_QC_exploratoryPlots.Rmd
    Deleted:    02_QC_exploratoryPlots.html
    Modified:   analysis/00_AllBatches_overview.Rmd
    Modified:   analysis/01_QC_emptyDrops.Rmd
    Modified:   analysis/02_QC_exploratoryPlots.Rmd
    Modified:   analysis/Adenoids.Rmd
    Modified:   analysis/Adenoids_v2.Rmd
    Modified:   analysis/Age_modeling.Rmd
    Modified:   analysis/Age_modelling_Adenoids.Rmd
    Modified:   analysis/Age_modelling_Nasal_Brushings.Rmd
    Modified:   analysis/Age_modelling_Tonsils.Rmd
    Modified:   analysis/AllBatches_QCExploratory.Rmd
    Modified:   analysis/BAL.Rmd
    Modified:   analysis/BAL_v2.Rmd
    Modified:   analysis/Bronchial_brushings.Rmd
    Modified:   analysis/Bronchial_brushings_v2.Rmd
    Modified:   analysis/Nasal_brushings.Rmd
    Modified:   analysis/Nasal_brushings_v2.Rmd
    Modified:   analysis/Subclustering_Adenoids.Rmd
    Modified:   analysis/Subclustering_BAL.Rmd
    Modified:   analysis/Subclustering_Bronchial_brushings.Rmd
    Modified:   analysis/Subclustering_Nasal_brushings.Rmd
    Modified:   analysis/Subclustering_Tonsils.Rmd
    Modified:   analysis/Tonsils.Rmd
    Modified:   analysis/Tonsils_v2.Rmd
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c0.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c1.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c10.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c11.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c12.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c13.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c14.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c15.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c16.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c17.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c2.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c3.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c4.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c5.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c6.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c7.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c8.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/REACTOME-cluster-limma-c9.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c0.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c1.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c10.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c11.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c12.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c13.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c14.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c15.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c16.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c17.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c2.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c3.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c4.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c5.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c6.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c7.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c8.csv
    Modified:   output/CSV/BAL_Marker_gene_clusters.limmaTrendRNA_snn_res.0.4/up-cluster-limma-c9.csv

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


There are no past versions. Publish this analysis with wflow_publish() to start tracking its development.


Load libraries

suppressPackageStartupMessages({
  library(BiocStyle)
  library(tidyverse)
  library(here)
  library(dplyr)
  library(Seurat)
  library(clustree)
  library(paletteer)
  library(viridis)
  library(ggforce)
  library(ggridges)
  library(kableExtra)
  library(RColorBrewer)
  library(data.table)
  library(cowplot)
  library(ggplot2)
  library(patchwork)
  library(harmony)
  library(BiocParallel)
  library(circlize)
})
# Path relative to the project root, as suggested by the workflowr file-path check
data_path <- here("output", "RDS", "AllBatches_Annotation_SEUs_v2")
tissue_list <- list.files(data_path, pattern = "\\.rds$", full.names = TRUE)

tissue_list
[1] "/Users/dixitgunjan/projects/paed-airway-atlas/airway-atlas-allTissues/paed-airway-allTissues/output/RDS/AllBatches_Annotation_SEUs_v2/G000231_Neeland_Adenoids.annotated_clusters.SEU.rds"           
[2] "/Users/dixitgunjan/projects/paed-airway-atlas/airway-atlas-allTissues/paed-airway-allTissues/output/RDS/AllBatches_Annotation_SEUs_v2/G000231_Neeland_BAL.annotated_clusters.SEU.rds"                
[3] "/Users/dixitgunjan/projects/paed-airway-atlas/airway-atlas-allTissues/paed-airway-allTissues/output/RDS/AllBatches_Annotation_SEUs_v2/G000231_Neeland_Bronchial_brushings.annotated_clusters.SEU.rds"
[4] "/Users/dixitgunjan/projects/paed-airway-atlas/airway-atlas-allTissues/paed-airway-allTissues/output/RDS/AllBatches_Annotation_SEUs_v2/G000231_Neeland_Nasal_brushings.annotated_clusters.SEU.rds"    
[5] "/Users/dixitgunjan/projects/paed-airway-atlas/airway-atlas-allTissues/paed-airway-allTissues/output/RDS/AllBatches_Annotation_SEUs_v2/G000231_Neeland_Tonsils.annotated_clusters.SEU.rds"            
metadata_list <- list()

for (tissue in tissue_list) {
  seu <- readRDS(tissue)
  metadata <- seu@meta.data
  metadata$source <- basename(tissue)  
  metadata_list[[tissue]] <- metadata
}

combined_metadata <- bind_rows(metadata_list)
sort(table(combined_metadata$cell_labels_v2), decreasing = TRUE)

                   Naïve B cells                   Memory B cells 
                           87050                            51442 
                     Macrophages    Non-ciliated epithelial cells 
                           37383                            34992 
                         CD4 TFH        Ciliated epithelial cells 
                           25963                            23752 
                          CD4 TN            DZtoLZ GCB transition 
                           17887                            17762 
                       TFH-LZ-GC                 Naïve B cell-IFN 
                           17170                            14607 
                  Plasma B cells                      DZ G2Mphase 
                           14373                            13881 
                       Early MBC                           DZ GCB 
                           13055                            12961 
                         CD8 TRM                 DZ early  Sphase 
                           12317                            10212 
                          CD8 TF     Monocyte and neutrophil-like 
                            9481                             9318 
                  DZ late Sphase                           CD8 TN 
                            8618                             8398 
                    CD4 Treg-eff           Early GC-committed NBC 
                            8293                             8086 
                           T-IFN                      Cycling GCB 
                            7501                             7288 
         Naïve B cells activated                          CD4 TCM 
                            6632                             6562 
           Monocytes/macrophages                      Neutrophils 
                            5481                             5095 
            Intermediate B cells                          CD4 TEM 
                            4749                             4479 
                         CD8 TEM              IFN-activated cells 
                            3909                             3687 
GC-commited metabolic activation                        Monocytes 
                            3613                             3548 
                     Pre-BCRi II                         NK cells 
                            3452                             3352 
          Unconventional T cells                 Plasmacytoid DCs 
                            3331                             3316 
                    CD4 effector       Follicular dendritic cells 
                            2944                             2811 
                 Dendritic cells                        Macro-CCL 
                            2494                             2434 
                        CD4 Treg                      B activated 
                            2296                             2045 
               Double negative T               Early PC precursor 
                            2026                             2020 
                      Mast cells              Macro-proliferating 
                            1877                             1813 
  Proliferating epithelial cells                    Gamma delta T 
                            1799                             1657 
                      Pre-MBC/BC                 NK/gamma-delta T 
                            1651                             1546 
      Secretory epithelial cells              CD4 T proliferating 
                            1142                             1086 
           Proliferating B cells               Proliferating T/NK 
                            1086                             1075 
                  csMBC FCRL4/5+                      Pre-T cells 
                             960                              937 
                      NK-T cells                  Erythroid cells 
                             832                              802 
                Epithelial cells                          B cells 
                             779                              726 
      DZ GCB Noproli-memory like                       MAIT cells 
                             694                              603 
                       Ionocytes                        Cycling T 
                             558                              536 
                           CD8 T                          GCB-IFN 
                             387                              324 
          Basal epithelial cells                       Melanocyte 
                             206                              205 
               Mesothelial cells 
                             199 
unique(combined_metadata$cell_labels_v2)
 [1] "Naïve B cells"                    "Memory B cells"                  
 [3] "Plasma B cells"                   "Naïve B cell-IFN"                
 [5] "Monocytes/macrophages"            "Follicular dendritic cells"      
 [7] "Pre-BCRi II"                      "Plasmacytoid DCs"                
 [9] "Epithelial cells"                 "Mast cells"                      
[11] "Neutrophils"                      "DZtoLZ GCB transition"           
[13] "DZ early  Sphase"                 "Early MBC"                       
[15] "Early GC-committed NBC"           "DZ G2Mphase"                     
[17] "DZ GCB Noproli-memory like"       "DZ GCB"                          
[19] "DZ late Sphase"                   "GC-commited metabolic activation"
[21] "csMBC FCRL4/5+"                   "Cycling GCB"                     
[23] "Early PC precursor"               "Pre-T cells"                     
[25] "CD4 T proliferating"              "CD4 TN"                          
[27] "CD4 TFH"                          "T-IFN"                           
[29] "CD4 TCM"                          "NK-T cells"                      
[31] "CD4 Treg-eff"                     "TFH-LZ-GC"                       
[33] "CD8 TF"                           "CD8 TN"                          
[35] "NK/gamma-delta T"                 "Double negative T"               
[37] "B cells"                          "Basal epithelial cells"          
[39] "Macrophages"                      "Macro-proliferating"             
[41] "Secretory epithelial cells"       "Dendritic cells"                 
[43] "Ciliated epithelial cells"        "Macro-CCL"                       
[45] "Monocytes"                        "Pre-MBC/BC"                      
[47] "Proliferating B cells"            "B activated"                     
[49] "CD4 TEM"                          "CD4 Treg"                        
[51] "CD8 TRM"                          "CD8 TEM"                         
[53] "NK cells"                         "CD8 T"                           
[55] "Proliferating T/NK"               "MAIT cells"                      
[57] "Mesothelial cells"                "Monocyte and neutrophil-like"    
[59] "Non-ciliated epithelial cells"    "Ionocytes"                       
[61] "Intermediate B cells"             "Unconventional T cells"          
[63] "Melanocyte"                       "Proliferating epithelial cells"  
[65] "IFN-activated cells"              "Erythroid cells"                 
[67] "Naïve B cells activated"          "GCB-IFN"                         
[69] "CD4 effector"                     "Gamma delta T"                   
[71] "Cycling T"                       
unique(combined_metadata$Broad_cell_label_3)
 [1] "B cells"                 "Dendritic cells"        
 [3] "Macrophages"             "Doublet query/Other"    
 [5] "Epithelial lineage"      "Granulocytes"           
 [7] "CD4 T cells"             "Gamma delta T cells"    
 [9] "Double negative T cells" "CD8 T cells"            
[11] "Pre B/T cells"           "Cycling T cells"        
[13] "Innate lymphoid cells"   "Natural Killer cells"   
[15] "Other"                   "Monocytes"              
[17] "Neuroendocrine"          "SMG duct"               
[19] "Fibroblast lineage"      "Endothelial lineage"    

Exploratory figures

Cell counts across all tissues

cell_type_counts <- sort(table(combined_metadata$cell_labels_v2), decreasing = TRUE) %>% 
  as.data.frame() %>%
  rename(CellType = Var1, Count = Freq)

a <- ggplot(cell_type_counts, aes(x = reorder(CellType, Count), y = Count)) +
  geom_bar(stat = "identity", fill = "purple3") +
  geom_text(aes(label = Count), hjust = -0.1, size = 3) +  # Position the text just outside the bar
  coord_flip() +
  labs(title = "Cell Type counts in earlyAIR Atlas", x = "Cell Types", y = "Cell Count") +
  theme_minimal()

a

cell_type_tissue_counts <- combined_metadata %>%
  group_by(cell_labels_v2, tissue) %>%
  summarise(Count = n(), .groups = 'drop') %>%
  rename(CellType = cell_labels_v2)

total_counts <- cell_type_tissue_counts %>%
  group_by(CellType) %>%
  summarise(TotalCount = sum(Count)) %>%
  arrange(desc(TotalCount))

cell_type_tissue_counts$CellType <- factor(cell_type_tissue_counts$CellType, levels = rev(total_counts$CellType))

ggplot(cell_type_tissue_counts, aes(x = CellType, y = Count, fill = tissue)) +
  geom_bar(stat = "identity") +
  geom_text(data = total_counts, aes(x = CellType, y = TotalCount, label = TotalCount), 
            hjust = -0.1, size = 3, inherit.aes = FALSE) +  
  coord_flip() +  
  scale_fill_brewer(palette = "Set3") +
  labs(title = "Cell Type Distribution by Tissue in earlyAIR Atlas", x = "Cell Types", y = "Cell Count") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
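
The factor() call above controls the bar order. As a small sketch with made-up totals (the cell types A, B and C are hypothetical), reversing the descending order of totals gives factor levels that run from smallest to largest. After coord_flip() the first level is drawn at the bottom, so the most abundant cell type ends up at the top of the plot.

```r
# Hypothetical totals, already sorted in descending order as in total_counts.
toy_totals <- data.frame(
  CellType = c("A", "B", "C"),
  TotalCount = c(30, 20, 10),
  stringsAsFactors = FALSE
)

# Reverse the order so the smallest total becomes the first factor level.
toy_cell_types <- factor(c("B", "A", "C", "A"), levels = rev(toy_totals$CellType))

levels(toy_cell_types)      # "C" "B" "A"
as.integer(toy_cell_types)  # 2 3 1 3
```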

ggplot(cell_type_tissue_counts, aes(x = CellType, y = Count, fill = tissue)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  geom_text(aes(label = Count), position = position_stack(vjust = 0.5), size = 2.5) +  
  scale_fill_brewer(palette = "Set3") +  # Customize palette
  labs(title = "Cell Type Distribution by Tissue", x = "Cell Types", y = "Cell Count", fill = "Tissue") +
  theme_minimal() +
  theme(axis.text.y = element_text(size = 8))
ggplot(combined_metadata, aes(x = cell_labels_v2, fill = tissue)) +
  geom_bar(position = "dodge") +
  facet_wrap(~tissue, scales = "free_y") +
  coord_flip() +
  labs(title = "Cell Type Counts by Tissue", x = "Cell Types", y = "Count") +
  theme_minimal()


ggplot(combined_metadata, aes(x = tissue, fill = cell_labels_v2)) +
  geom_bar(position = "stack") +
  geom_text(stat = "count", aes(label = after_stat(count)), position = position_stack(vjust = 0.5), size = 2.5) +
  labs(title = "Cell Type Counts per Tissue", x = "Tissue", y = "Count") +
  theme_minimal()
#set.seed(012025)
#n <- 71
n <- length(unique(combined_metadata$cell_labels_v2))
qual_col_pals <- brewer.pal.info[brewer.pal.info$category == 'qual',]
col_vector <- unlist(mapply(brewer.pal, qual_col_pals$maxcolors, rownames(qual_col_pals)))
sampled_colors <- sample(col_vector, n, replace = TRUE)
cell_types <- unique(combined_metadata$cell_labels_v2)
#color_palette <- setNames(sampled_colors[1:length(cell_types)], cell_types)
color_palette <- readRDS(here("output/RDS/color_palette_unique.rds"))

proportion_df <- combined_metadata %>%
  group_by(tissue, cell_labels_v2) %>%
  summarise(Count = n(), .groups = "drop_last") %>%  # keep grouping by tissue
  mutate(Proportion = Count / sum(Count))            # proportion within each tissue
sampled_colors_1 <- c("#5c248b", "#1f57a6", "#ffec34", "#00960f", "#BC80BD",
                      "#f06ab9", "#85d519", "#758dc4", "#89c5df", "#5da3cd",
                      "#ffffba", "#009260", "#ffa037", "#A65628", "#E31A1C",
                      "#377EB8", "#FDC086", "#FC8D62", "#FDDAEC", "#E78AC3")

proportion_df <- combined_metadata %>%
  group_by(tissue, Broad_cell_label_3) %>%
  summarise(Count = n(), .groups = "drop_last") %>%  # keep grouping by tissue
  mutate(Proportion = Count / sum(Count))            # proportion within each tissue
tissue_order <- c("Tonsils", "Adenoids", "Nasal_brushings", "Bronchial_brushings", "BAL")
proportion_df$tissue <- factor(proportion_df$tissue, levels = tissue_order)

p_stacked <- ggplot(proportion_df, aes(x = tissue, y = Proportion, fill = Broad_cell_label_3)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = sampled_colors_1) +  
  ylab("Proportion of Cell Labels") +
  theme_cowplot(font_size = 10) +
  labs(title = "Proportion of Cell Labels per Tissue") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5))  # Center the title

print(p_stacked)
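
Each stacked bar sums to 1 because, after summarise(), the data are still grouped by tissue, so sum(Count) in mutate() is computed within each tissue. The sketch below shows this on a hypothetical toy data frame standing in for combined_metadata; the object names toy_metadata, toy_proportions and tissue_totals are illustrative and do not appear in the analysis.

```r
library(dplyr)

# Hypothetical toy metadata standing in for combined_metadata.
toy_metadata <- data.frame(
  tissue = c("Tonsils", "Tonsils", "Tonsils", "BAL", "BAL"),
  Broad_cell_label_3 = c("B cells", "B cells", "CD4 T cells", "Macrophages", "B cells"),
  stringsAsFactors = FALSE
)

toy_proportions <- toy_metadata %>%
  group_by(tissue, Broad_cell_label_3) %>%
  summarise(Count = n(), .groups = "drop_last") %>%  # still grouped by tissue
  mutate(Proportion = Count / sum(Count)) %>%        # sum(Count) is per tissue
  ungroup()

# Sanity check: the proportions within each tissue should add up to 1.
tissue_totals <- toy_proportions %>%
  group_by(tissue) %>%
  summarise(Total = sum(Proportion), .groups = "drop")

tissue_totals
```

The same check could be run on proportion_df before plotting.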

combined_metadata <- combined_metadata %>%
  mutate(Broad_category = case_when(
    # B Cells
    cell_labels_v2 %in% c("Naïve B cells", "Memory B cells", "Naïve B cell-IFN", 
                          "Plasma B cells", "DZtoLZ GCB transition", "DZ early  Sphase", 
                          "Early MBC", "Early GC-committed NBC", "DZ G2Mphase", 
                          "DZ GCB Noproli-memory like", "DZ GCB", "DZ late Sphase", 
                          "GC-commited metabolic activation", "csMBC FCRL4/5+", 
                          "Cycling GCB", "Early PC precursor", "Pre-MBC/BC", 
                          "Proliferating B cells", "B cells", "Intermediate B cells", 
                          "Naïve B cells activated", "Pre-BCRi II", "IFN-activated cells", 
                          "GCB-IFN", "B activated") ~ "B Cells",
  
    # T Cells
    cell_labels_v2 %in% c("CD4 TN", "CD4 TFH", "CD4 TCM", "CD4 Treg", 
                          "CD4 Treg-eff", "TFH-LZ-GC", "CD4 TEM", "CD4 effector", 
                          "CD4 T proliferating", "T-IFN", "CD8 TN", "CD8 TF", 
                          "CD8 T", "CD8 TRM", "CD8 TEM", "Double negative T", 
                          "Unconventional T cells", "Gamma delta T", "Cycling T", "Pre-T cells") ~ "T Cells",
    
    # NK Cells
    cell_labels_v2 %in% c("NK cells", "NK-T cells", "NK/gamma-delta T", 
                          "Proliferating T/NK") ~ "NK Cells",
    
    # Monocytes and Macrophages
    cell_labels_v2 %in% c("Monocytes", "Monocytes/macrophages", "Macrophages", 
                          "Macro-proliferating", "Macro-CCL", "Monocyte and neutrophil-like") ~ "Monocytes and Macrophages",
    
    # Dendritic Cells
    cell_labels_v2 %in% c("Dendritic cells", "Plasmacytoid DCs", "Follicular dendritic cells") ~ "Dendritic Cells",
    
    # Neutrophils
    cell_labels_v2 == "Neutrophils" ~ "Neutrophils",
    
    # Innate Lymphoid Cells
    cell_labels_v2 %in% c("MAIT cells", "Innate lymphocytes") ~ "Innate Lymphoid Cells",
    
    # Epithelial Cells
    cell_labels_v2 %in% c("Epithelial cells", "Basal epithelial cells", "Ciliated epithelial cells", 
                          "Non-ciliated epithelial cells", "Secretory epithelial cells", 
                          "Proliferating epithelial cells") ~ "Epithelial Cells",
    
    # Other
    cell_labels_v2 %in% c("Mast cells", "Erythroid cells", "Ionocytes", "Mesothelial cells", 
                          "Melanocyte", "Naïve / PC/ doublet") ~ "Other",
    
    TRUE ~ "Unclassified"  # Default category for unmatched labels
  ))
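
Because case_when() sends any unmatched label to "Unclassified", it can be worth checking which labels, if any, fell through. Judging from the unique() output above, every current label appears in one of the branches, while "Innate lymphocytes" and "Naïve / PC/ doublet" do not occur among the current labels. The sketch below uses a hypothetical toy data frame and a deliberately unmapped label ("Some new label") to show one way such a check might look.

```r
library(dplyr)

# Hypothetical toy data; "Some new label" deliberately has no mapping.
toy_metadata <- data.frame(
  cell_labels_v2 = c("NK cells", "Neutrophils", "Some new label", "NK cells"),
  stringsAsFactors = FALSE
) %>%
  mutate(Broad_category = case_when(
    cell_labels_v2 %in% c("NK cells", "NK-T cells") ~ "NK Cells",
    cell_labels_v2 == "Neutrophils" ~ "Neutrophils",
    TRUE ~ "Unclassified"
  ))

# Labels that fell through to the default branch.
unmatched_labels <- toy_metadata %>%
  filter(Broad_category == "Unclassified") %>%
  distinct(cell_labels_v2) %>%
  pull(cell_labels_v2)

# Number of cells per broad category.
category_counts <- count(toy_metadata, Broad_category)

unmatched_labels
category_counts
```

Applied to combined_metadata, an empty unmatched_labels vector would indicate that every label is covered by the mapping.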
head(combined_metadata)
                                  donor_id sample_id age_years sex nCount_RNA
Batch1_AAACAAGCAACTTCGTACTTTAGG-1  eAIR001      s042      3.62   M   2097.863
Batch1_AAACAAGCATCGTTCGACTTTAGG-1  eAIR001      s042      3.62   M   4072.417
Batch1_AAACCAATCCTTTAGGACTTTAGG-1  eAIR001      s042      3.62   M   2848.063
Batch1_AAACCGGTCCGTGACTACTTTAGG-1  eAIR001      s042      3.62   M    434.372
Batch1_AAACGTTCAGCCCTTAACTTTAGG-1  eAIR001      s042      3.62   M  21590.274
Batch1_AAACGTTCATGGCTAAACTTTAGG-1  eAIR001      s042      3.62   M   1144.299
                                  nFeature_RNA                    Barcode
Batch1_AAACAAGCAACTTCGTACTTTAGG-1         1486 AAACAAGCAACTTCGTACTTTAGG-1
Batch1_AAACAAGCATCGTTCGACTTTAGG-1         2506 AAACAAGCATCGTTCGACTTTAGG-1
Batch1_AAACCAATCCTTTAGGACTTTAGG-1         1891 AAACCAATCCTTTAGGACTTTAGG-1
Batch1_AAACCGGTCCGTGACTACTTTAGG-1          443 AAACCGGTCCGTGACTACTTTAGG-1
Batch1_AAACGTTCAGCCCTTAACTTTAGG-1         4311 AAACGTTCAGCCCTTAACTTTAGG-1
Batch1_AAACGTTCATGGCTAAACTTTAGG-1          877 AAACGTTCATGGCTAAACTTTAGG-1
                                        GEM_barcode sample_barcode   tissue
Batch1_AAACAAGCAACTTCGTACTTTAGG-1 AAACAAGCAACTTCGTA      CTTTAGG-1 Adenoids
Batch1_AAACAAGCATCGTTCGACTTTAGG-1 AAACAAGCATCGTTCGA      CTTTAGG-1 Adenoids
Batch1_AAACCAATCCTTTAGGACTTTAGG-1 AAACCAATCCTTTAGGA      CTTTAGG-1 Adenoids
Batch1_AAACCGGTCCGTGACTACTTTAGG-1 AAACCGGTCCGTGACTA      CTTTAGG-1 Adenoids
Batch1_AAACGTTCAGCCCTTAACTTTAGG-1 AAACGTTCAGCCCTTAA      CTTTAGG-1 Adenoids
Batch1_AAACGTTCATGGCTAAACTTTAGG-1 AAACGTTCATGGCTAAA      CTTTAGG-1 Adenoids
                                      batch_name cells_per_GEM.Var1
Batch1_AAACAAGCAACTTCGTACTTTAGG-1 G000231_batch3               <NA>
Batch1_AAACAAGCATCGTTCGACTTTAGG-1 G000231_batch3               <NA>
Batch1_AAACCAATCCTTTAGGACTTTAGG-1 G000231_batch3               <NA>
Batch1_AAACCGGTCCGTGACTACTTTAGG-1 G000231_batch3               <NA>
Batch1_AAACGTTCAGCCCTTAACTTTAGG-1 G000231_batch3               <NA>
Batch1_AAACGTTCATGGCTAAACTTTAGG-1 G000231_batch3               <NA>
                                  cells_per_GEM.Freq scDblFinder.class_dbr
Batch1_AAACAAGCAACTTCGTACTTTAGG-1                  3               singlet
Batch1_AAACAAGCATCGTTCGACTTTAGG-1                  3               singlet
Batch1_AAACCAATCCTTTAGGACTTTAGG-1                  2               singlet
Batch1_AAACCGGTCCGTGACTACTTTAGG-1                  1               singlet
Batch1_AAACGTTCAGCCCTTAACTTTAGG-1                  3               singlet
Batch1_AAACGTTCATGGCTAAACTTTAGG-1                  1               singlet
                                  scDblFinder.score_dbr predicted.celltype.l1
Batch1_AAACAAGCAACTTCGTACTTTAGG-1          8.364258e-04               B naive
Batch1_AAACAAGCATCGTTCGACTTTAGG-1          3.647992e-04     FCRL4/5+ B memory
Batch1_AAACCAATCCTTTAGGACTTTAGG-1          6.277001e-03               B naive
Batch1_AAACCGGTCCGTGACTACTTTAGG-1          6.357333e-07           B activated
Batch1_AAACGTTCAGCCCTTAACTTTAGG-1          3.589588e-02                    PC
Batch1_AAACGTTCATGGCTAAACTTTAGG-1          2.285886e-04               B naive
                                  predicted.celltype.l2
Batch1_AAACAAGCAACTTCGTACTTTAGG-1                   NBC
Batch1_AAACAAGCATCGTTCGACTTTAGG-1                ncsMBC
Batch1_AAACCAATCCTTTAGGACTTTAGG-1  NBC early activation
Batch1_AAACCGGTCCGTGACTACTTTAGG-1       GC-commited NBC
Batch1_AAACGTTCAGCCCTTAACTTTAGG-1     IgG+ PC precursor
Batch1_AAACGTTCATGGCTAAACTTTAGG-1  NBC early activation
                                  predicted.celltype.l1.score
Batch1_AAACAAGCAACTTCGTACTTTAGG-1                   0.9986553
Batch1_AAACAAGCATCGTTCGACTTTAGG-1                   0.5014055
Batch1_AAACCAATCCTTTAGGACTTTAGG-1                   0.9326043
Batch1_AAACCGGTCCGTGACTACTTTAGG-1                   0.5793129
Batch1_AAACGTTCAGCCCTTAACTTTAGG-1                   0.8757269
Batch1_AAACGTTCATGGCTAAACTTTAGG-1                   0.9211898
                                  predicted.celltype.l2.score percent.mt
Batch1_AAACAAGCAACTTCGTACTTTAGG-1                   0.5650053 0.89363848
Batch1_AAACAAGCATCGTTCGACTTTAGG-1                   0.3133930 0.45582175
Batch1_AAACCAATCCTTTAGGACTTTAGG-1                   0.4845545 0.45551262
Batch1_AAACCGGTCCGTGACTACTTTAGG-1                   0.5580569 0.37925250
Batch1_AAACGTTCAGCCCTTAACTTTAGG-1                   0.6296915 0.06705518
Batch1_AAACGTTCATGGCTAAACTTTAGG-1                   0.8442881 0.60955238
                                  mapping.score Broad_cell_label_1
Batch1_AAACAAGCAACTTCGTACTTTAGG-1     0.8641857             Immune
Batch1_AAACAAGCATCGTTCGACTTTAGG-1     0.7719604             Immune
Batch1_AAACCAATCCTTTAGGACTTTAGG-1     0.7590004             Immune
Batch1_AAACCGGTCCGTGACTACTTTAGG-1     0.8996206             Immune
Batch1_AAACGTTCAGCCCTTAACTTTAGG-1     0.8858244             Immune
Batch1_AAACGTTCATGGCTAAACTTTAGG-1     0.8131308             Immune
                                  Broad_cell_label_2 Broad_cell_label_3
Batch1_AAACAAGCAACTTCGTACTTTAGG-1            B cells            B cells
Batch1_AAACAAGCATCGTTCGACTTTAGG-1            B cells            B cells
Batch1_AAACCAATCCTTTAGGACTTTAGG-1            B cells            B cells
Batch1_AAACCGGTCCGTGACTACTTTAGG-1            B cells            B cells
Batch1_AAACGTTCAGCCCTTAACTTTAGG-1            B cells            B cells
Batch1_AAACGTTCATGGCTAAACTTTAGG-1            B cells            B cells
                                  unintegrated_clusters harmony_clusters
Batch1_AAACAAGCAACTTCGTACTTTAGG-1                     0                0
Batch1_AAACAAGCATCGTTCGACTTTAGG-1                     2                2
Batch1_AAACCAATCCTTTAGGACTTTAGG-1                     0                0
Batch1_AAACCGGTCCGTGACTACTTTAGG-1                     0                0
Batch1_AAACGTTCAGCCCTTAACTTTAGG-1                     9                9
Batch1_AAACGTTCATGGCTAAACTTTAGG-1                     0                0
                                  RNA_snn_res.0.1 RNA_snn_res.0.2
Batch1_AAACAAGCAACTTCGTACTTTAGG-1               0               0
Batch1_AAACAAGCATCGTTCGACTTTAGG-1               0               0
Batch1_AAACCAATCCTTTAGGACTTTAGG-1               0               0
Batch1_AAACCGGTCCGTGACTACTTTAGG-1               0               0
Batch1_AAACGTTCAGCCCTTAACTTTAGG-1               5               5
Batch1_AAACGTTCATGGCTAAACTTTAGG-1               0               0
                                  RNA_snn_res.0.3 RNA_snn_res.0.4
Batch1_AAACAAGCAACTTCGTACTTTAGG-1               0               0
Batch1_AAACAAGCATCGTTCGACTTTAGG-1               2               1
Batch1_AAACCAATCCTTTAGGACTTTAGG-1               0               0
Batch1_AAACCGGTCCGTGACTACTTTAGG-1               0               0
Batch1_AAACGTTCAGCCCTTAACTTTAGG-1               9               9
Batch1_AAACGTTCATGGCTAAACTTTAGG-1               0               0
                                  RNA_snn_res.0.5 RNA_snn_res.0.6
Batch1_AAACAAGCAACTTCGTACTTTAGG-1               0               0
Batch1_AAACAAGCATCGTTCGACTTTAGG-1               1               1
Batch1_AAACCAATCCTTTAGGACTTTAGG-1               0               0
Batch1_AAACCGGTCCGTGACTACTTTAGG-1               8               7
Batch1_AAACGTTCAGCCCTTAACTTTAGG-1              12              12
Batch1_AAACGTTCATGGCTAAACTTTAGG-1               0               0
                                  RNA_snn_res.0.7 RNA_snn_res.0.8
Batch1_AAACAAGCAACTTCGTACTTTAGG-1               0               0
Batch1_AAACAAGCATCGTTCGACTTTAGG-1               1               1
Batch1_AAACCAATCCTTTAGGACTTTAGG-1               0               1
Batch1_AAACCGGTCCGTGACTACTTTAGG-1               7               6
Batch1_AAACGTTCAGCCCTTAACTTTAGG-1              13              13
Batch1_AAACGTTCATGGCTAAACTTTAGG-1               0               0
                                  RNA_snn_res.0.9 RNA_snn_res.1 cluster
Batch1_AAACAAGCAACTTCGTACTTTAGG-1               0             0       0
Batch1_AAACAAGCATCGTTCGACTTTAGG-1               1             1       1
Batch1_AAACCAATCCTTTAGGACTTTAGG-1               0             0       0
Batch1_AAACCGGTCCGTGACTACTTTAGG-1               6             6       0
Batch1_AAACGTTCAGCCCTTAACTTTAGG-1              11            12       9
Batch1_AAACGTTCATGGCTAAACTTTAGG-1               0             0       0
                                     cell_labels cell_labels_v2
Batch1_AAACAAGCAACTTCGTACTTTAGG-1  Naïve B cells  Naïve B cells
Batch1_AAACAAGCATCGTTCGACTTTAGG-1 Memory B cells Memory B cells
Batch1_AAACCAATCCTTTAGGACTTTAGG-1  Naïve B cells  Naïve B cells
Batch1_AAACCGGTCCGTGACTACTTTAGG-1  Naïve B cells  Naïve B cells
Batch1_AAACGTTCAGCCCTTAACTTTAGG-1 Plasma B cells Plasma B cells
Batch1_AAACGTTCATGGCTAAACTTTAGG-1  Naïve B cells  Naïve B cells
                                                                               source
Batch1_AAACAAGCAACTTCGTACTTTAGG-1 G000231_Neeland_Adenoids.annotated_clusters.SEU.rds
Batch1_AAACAAGCATCGTTCGACTTTAGG-1 G000231_Neeland_Adenoids.annotated_clusters.SEU.rds
Batch1_AAACCAATCCTTTAGGACTTTAGG-1 G000231_Neeland_Adenoids.annotated_clusters.SEU.rds
Batch1_AAACCGGTCCGTGACTACTTTAGG-1 G000231_Neeland_Adenoids.annotated_clusters.SEU.rds
Batch1_AAACGTTCAGCCCTTAACTTTAGG-1 G000231_Neeland_Adenoids.annotated_clusters.SEU.rds
Batch1_AAACGTTCATGGCTAAACTTTAGG-1 G000231_Neeland_Adenoids.annotated_clusters.SEU.rds
                                  predicted.ann_level_1
Batch1_AAACAAGCAACTTCGTACTTTAGG-1                  <NA>
Batch1_AAACAAGCATCGTTCGACTTTAGG-1                  <NA>
Batch1_AAACCAATCCTTTAGGACTTTAGG-1                  <NA>
Batch1_AAACCGGTCCGTGACTACTTTAGG-1                  <NA>
Batch1_AAACGTTCAGCCCTTAACTTTAGG-1                  <NA>
Batch1_AAACGTTCATGGCTAAACTTTAGG-1                  <NA>
                                  predicted.ann_level_1.score
Batch1_AAACAAGCAACTTCGTACTTTAGG-1                          NA
Batch1_AAACAAGCATCGTTCGACTTTAGG-1                          NA
Batch1_AAACCAATCCTTTAGGACTTTAGG-1                          NA
Batch1_AAACCGGTCCGTGACTACTTTAGG-1                          NA
Batch1_AAACGTTCAGCCCTTAACTTTAGG-1                          NA
Batch1_AAACGTTCATGGCTAAACTTTAGG-1                          NA
                                  predicted.ann_level_2
Batch1_AAACAAGCAACTTCGTACTTTAGG-1                  <NA>
Batch1_AAACAAGCATCGTTCGACTTTAGG-1                  <NA>
Batch1_AAACCAATCCTTTAGGACTTTAGG-1                  <NA>
Batch1_AAACCGGTCCGTGACTACTTTAGG-1                  <NA>
Batch1_AAACGTTCAGCCCTTAACTTTAGG-1                  <NA>
Batch1_AAACGTTCATGGCTAAACTTTAGG-1                  <NA>
                                  predicted.ann_level_2.score
Batch1_AAACAAGCAACTTCGTACTTTAGG-1                          NA
Batch1_AAACAAGCATCGTTCGACTTTAGG-1                          NA
Batch1_AAACCAATCCTTTAGGACTTTAGG-1                          NA
Batch1_AAACCGGTCCGTGACTACTTTAGG-1                          NA
Batch1_AAACGTTCAGCCCTTAACTTTAGG-1                          NA
Batch1_AAACGTTCATGGCTAAACTTTAGG-1                          NA
                                  predicted.ann_level_3
Batch1_AAACAAGCAACTTCGTACTTTAGG-1                  <NA>
Batch1_AAACAAGCATCGTTCGACTTTAGG-1                  <NA>
Batch1_AAACCAATCCTTTAGGACTTTAGG-1                  <NA>
Batch1_AAACCGGTCCGTGACTACTTTAGG-1                  <NA>
Batch1_AAACGTTCAGCCCTTAACTTTAGG-1                  <NA>
Batch1_AAACGTTCATGGCTAAACTTTAGG-1                  <NA>
                                  predicted.ann_level_3.score
Batch1_AAACAAGCAACTTCGTACTTTAGG-1                          NA
Batch1_AAACAAGCATCGTTCGACTTTAGG-1                          NA
Batch1_AAACCAATCCTTTAGGACTTTAGG-1                          NA
Batch1_AAACCGGTCCGTGACTACTTTAGG-1                          NA
Batch1_AAACGTTCAGCCCTTAACTTTAGG-1                          NA
Batch1_AAACGTTCATGGCTAAACTTTAGG-1                          NA
                                  predicted.ann_level_4
Batch1_AAACAAGCAACTTCGTACTTTAGG-1                  <NA>
Batch1_AAACAAGCATCGTTCGACTTTAGG-1                  <NA>
Batch1_AAACCAATCCTTTAGGACTTTAGG-1                  <NA>
Batch1_AAACCGGTCCGTGACTACTTTAGG-1                  <NA>
Batch1_AAACGTTCAGCCCTTAACTTTAGG-1                  <NA>
Batch1_AAACGTTCATGGCTAAACTTTAGG-1                  <NA>
                                  predicted.ann_level_4.score
Batch1_AAACAAGCAACTTCGTACTTTAGG-1                          NA
Batch1_AAACAAGCATCGTTCGACTTTAGG-1                          NA
Batch1_AAACCAATCCTTTAGGACTTTAGG-1                          NA
Batch1_AAACCGGTCCGTGACTACTTTAGG-1                          NA
Batch1_AAACGTTCAGCCCTTAACTTTAGG-1                          NA
Batch1_AAACGTTCATGGCTAAACTTTAGG-1                          NA
                                  predicted.ann_level_5
Batch1_AAACAAGCAACTTCGTACTTTAGG-1                  <NA>
Batch1_AAACAAGCATCGTTCGACTTTAGG-1                  <NA>
Batch1_AAACCAATCCTTTAGGACTTTAGG-1                  <NA>
Batch1_AAACCGGTCCGTGACTACTTTAGG-1                  <NA>
Batch1_AAACGTTCAGCCCTTAACTTTAGG-1                  <NA>
Batch1_AAACGTTCATGGCTAAACTTTAGG-1                  <NA>
                                  predicted.ann_level_5.score
Batch1_AAACAAGCAACTTCGTACTTTAGG-1                          NA
Batch1_AAACAAGCATCGTTCGACTTTAGG-1                          NA
Batch1_AAACCAATCCTTTAGGACTTTAGG-1                          NA
Batch1_AAACCGGTCCGTGACTACTTTAGG-1                          NA
Batch1_AAACGTTCAGCCCTTAACTTTAGG-1                          NA
Batch1_AAACGTTCATGGCTAAACTTTAGG-1                          NA
                                  predicted.ann_finest_level
Batch1_AAACAAGCAACTTCGTACTTTAGG-1                       <NA>
Batch1_AAACAAGCATCGTTCGACTTTAGG-1                       <NA>
Batch1_AAACCAATCCTTTAGGACTTTAGG-1                       <NA>
Batch1_AAACCGGTCCGTGACTACTTTAGG-1                       <NA>
Batch1_AAACGTTCAGCCCTTAACTTTAGG-1                       <NA>
Batch1_AAACGTTCATGGCTAAACTTTAGG-1                       <NA>
                                  predicted.ann_finest_level.score donor sum
Batch1_AAACAAGCAACTTCGTACTTTAGG-1                               NA  <NA>  NA
Batch1_AAACAAGCATCGTTCGACTTTAGG-1                               NA  <NA>  NA
Batch1_AAACCAATCCTTTAGGACTTTAGG-1                               NA  <NA>  NA
Batch1_AAACCGGTCCGTGACTACTTTAGG-1                               NA  <NA>  NA
Batch1_AAACGTTCAGCCCTTAACTTTAGG-1                               NA  <NA>  NA
Batch1_AAACGTTCATGGCTAAACTTTAGG-1                               NA  <NA>  NA
                                  detected scDblFinder.class_dbr_s
Batch1_AAACAAGCAACTTCGTACTTTAGG-1       NA                    <NA>
Batch1_AAACAAGCATCGTTCGACTTTAGG-1       NA                    <NA>
Batch1_AAACCAATCCTTTAGGACTTTAGG-1       NA                    <NA>
Batch1_AAACCGGTCCGTGACTACTTTAGG-1       NA                    <NA>
Batch1_AAACGTTCAGCCCTTAACTTTAGG-1       NA                    <NA>
Batch1_AAACGTTCATGGCTAAACTTTAGG-1       NA                    <NA>
                                  scDblFinder.score_dbr_s Broad_category
Batch1_AAACAAGCAACTTCGTACTTTAGG-1                      NA        B Cells
Batch1_AAACAAGCATCGTTCGACTTTAGG-1                      NA        B Cells
Batch1_AAACCAATCCTTTAGGACTTTAGG-1                      NA        B Cells
Batch1_AAACCGGTCCGTGACTACTTTAGG-1                      NA        B Cells
Batch1_AAACGTTCAGCCCTTAACTTTAGG-1                      NA        B Cells
Batch1_AAACGTTCATGGCTAAACTTTAGG-1                      NA        B Cells
combined_metadata$Broad_category <- factor(combined_metadata$Broad_category, 
                                             levels = c("B Cells", "T Cells", "NK Cells", "Monocytes and Macrophages", 
                                                        "Dendritic Cells", "Neutrophils", "Innate Lymphoid Cells", 
                                                        "Epithelial Cells", "Other", "Unclassified"))

combined_metadata$tissue <- factor(combined_metadata$tissue, levels = tissue_order)

ggplot(combined_metadata, aes(x = tissue, fill = Broad_category)) +
  geom_bar(position = "fill") +
  scale_fill_manual(values = sampled_colors_1) +
  labs(y = "Proportion", x = "Tissue") +
  theme_minimal() +  # apply the complete theme first so the tweaks below are not overwritten
  theme(axis.text.x = element_text(angle = 90, hjust = 1),
        legend.title = element_blank())

combined_metadata$cell_labels_v2 <- factor(combined_metadata$cell_labels_v2,
                                            levels = unique(combined_metadata$cell_labels_v2))  

ggplot(combined_metadata, aes(x = tissue, fill = cell_labels_v2)) +
  geom_bar(position = "fill") +
  scale_fill_manual(values = color_palette) +
  labs(y = "Proportion", x = "Tissue") +
  theme_minimal() +  # apply the complete theme first so the tweaks below are not overwritten
  theme(axis.text.x = element_text(angle = 90, hjust = 1),
        legend.title = element_blank())
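
The stacked bars above rely on `geom_bar(position = "fill")`, which rescales the counts within each tissue so that they sum to 1. As a rough cross-check, the same proportions can be tabulated directly. The sketch below uses a small made-up data frame (`toy_metadata` and its values are hypothetical, not taken from `combined_metadata`) and base R only.

```r
# Hypothetical toy metadata: one row per cell (values are illustrative only)
toy_metadata <- data.frame(
  tissue = c("Adenoids", "Adenoids", "Adenoids", "Adenoids", "Tonsils", "Tonsils"),
  Broad_category = c("B Cells", "B Cells", "B Cells", "T Cells", "B Cells", "T Cells")
)

# Count cells per tissue (rows) and broad category (columns)
toy_counts <- table(toy_metadata$tissue, toy_metadata$Broad_category)

# Row-wise proportions: each tissue sums to 1, as with position = "fill"
toy_props <- prop.table(toy_counts, margin = 1)
print(round(toy_props, 2))
```

Running the same two calls on `combined_metadata$tissue` and `combined_metadata$Broad_category` should give the bar heights shown in the first plot, assuming no rows are dropped by the plotting code.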

set.seed(2024)
proportions <- combined_metadata %>%
  group_by(Broad_category, cell_labels_v2) %>%
  summarise(count = n(), .groups = "drop") %>%
  arrange(Broad_category, desc(count))
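combined_metadata <- combined_metadata %>%
  mutate(cell_labels_v2 = factor(cell_labels_v2, 
                                 # unique() guards against a label that occurs under more than one Broad_category
                                 levels = unique(as.character(proportions$cell_labels_v2[order(proportions$Broad_category, -proportions$count)]))))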
broad_colors <- c("#5c248b", "#ffec34", "#00960f", "#FBB4AE", "#BC80BD", "#f06ab9", "#85d519", "#758dc4" )
cell_contingency <- table(combined_metadata$Broad_category, combined_metadata$cell_labels_v2)

chordDiagram(cell_contingency, 
             transparency = 0.5, 
             annotationTrack = "grid", 
             preAllocateTracks = 1, 
             grid.col = c(broad_colors, color_palette))

circos.track(track.index = 1, panel.fun = function(x, y) {
  circos.text(CELL_META$xcenter, CELL_META$ylim[1], CELL_META$sector.index, 
              facing = "clockwise", niceFacing = TRUE, adj = c(0, 0.5))
}, bg.border = NA)
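
`chordDiagram()` accepts `grid.col` as a named vector, in which case colours are matched to sectors by name rather than by position. The sketch below shows one way to build such a vector for a contingency table. The toy table, the colour values for the columns and the helper `make_grid_col` are assumptions made for illustration and are not part of this analysis.

```r
# Hypothetical contingency table: broad categories (rows) x fine labels (columns)
toy_table <- matrix(
  c(5, 3, 0,
    0, 0, 4),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("B Cells", "T Cells"),
                  c("Naive B cells", "Memory B cells", "CD4 T cells"))
)

# Hypothetical helper: pair each sector name with a colour and fail early
# if a palette is too short, rather than misaligning colours silently
make_grid_col <- function(mat, row_cols, col_cols) {
  stopifnot(length(row_cols) >= nrow(mat), length(col_cols) >= ncol(mat))
  c(setNames(row_cols[seq_len(nrow(mat))], rownames(mat)),
    setNames(col_cols[seq_len(ncol(mat))], colnames(mat)))
}

toy_grid_col <- make_grid_col(
  toy_table,
  row_cols = c("#5c248b", "#ffec34"),
  col_cols = c("#1b9e77", "#d95f02", "#7570b3")
)

# Draw only if circlize is installed; grid.col is matched to sectors by name
if (requireNamespace("circlize", quietly = TRUE)) {
  circlize::chordDiagram(toy_table, grid.col = toy_grid_col, transparency = 0.5)
  circlize::circos.clear()
}
```

Naming the colours makes the mapping independent of sector order, which may matter when some factor levels have no cells in the table.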

Updated UMAPs

for (tissue in tissue_list) {
  seu <- readRDS(tissue)
  # seu$tissue holds one value per cell; reduce it to a single label for the title
  tissue_name <- as.character(unique(seu$tissue))[1]
  p4 <- DimPlot(seu, reduction = "umap.merged", raster = FALSE, repel = TRUE, label = TRUE, label.size = 3.5, pt.size = 0.2) +
    ggtitle(paste0(tissue_name, ": UMAP (Final clusters)")) + 
    scale_color_manual(values = color_palette) + 
    NoLegend()
  
  print(p4)
}

Boxplot of cell populations

cell_type_proportions <- combined_metadata %>%
  group_by(tissue, sample_id, cell_labels_v2) %>%
  summarise(cell_count = n(), .groups = 'drop') %>%
  group_by(tissue, sample_id) %>%
  mutate(total_cells = sum(cell_count)) %>%
  mutate(proportion = cell_count / total_cells) %>%
  ungroup()

tissues <- unique(cell_type_proportions$tissue)

for (tissue in tissues) {
  tissue_data <- cell_type_proportions %>% filter(tissue == !!tissue)
  
  plot <- ggplot(tissue_data, aes(x = cell_labels_v2, y = proportion, fill = cell_labels_v2)) +
    geom_boxplot() +
    scale_fill_manual(values = color_palette) +
    labs(x = "Cell Type", y = "Proportion", title = paste("Per-sample proportion of each cell type in", tissue)) +
    theme_minimal() +
    theme(
      axis.text.x = element_text(angle = 90, hjust = 1),
      legend.position = "none",
      plot.margin = margin(20, 20, 20, 20)
    )
  
  #ggsave(filename = paste0("boxplot_proportions_", tissue, ".pdf"), plot = plot, width = 12, height = 8, units = "in")
  
  print(plot)
}
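
The proportions plotted above are computed per sample, so within each tissue and sample the values across cell types should sum to 1. A small sketch of that check follows, assuming a toy table with the same column names as `cell_type_proportions` (the values and the name `toy_sample_counts` are made up) and using base R only.

```r
# Hypothetical per-sample counts with the same columns as cell_type_proportions
toy_sample_counts <- data.frame(
  tissue = c("Adenoids", "Adenoids", "Adenoids", "Adenoids"),
  sample_id = c("s1", "s1", "s2", "s2"),
  cell_labels_v2 = c("Naive B cells", "Memory B cells", "Naive B cells", "Memory B cells"),
  cell_count = c(30, 10, 5, 15)
)

# Total cells per tissue/sample, aligned to the rows of toy_sample_counts
toy_sample_counts$total_cells <- ave(toy_sample_counts$cell_count,
                                     toy_sample_counts$tissue,
                                     toy_sample_counts$sample_id,
                                     FUN = sum)
toy_sample_counts$proportion <- toy_sample_counts$cell_count / toy_sample_counts$total_cells

# Sanity check: proportions within each sample add up to 1
sample_sums <- tapply(toy_sample_counts$proportion,
                      paste(toy_sample_counts$tissue, toy_sample_counts$sample_id),
                      sum)
stopifnot(all(abs(sample_sums - 1) < 1e-12))
print(toy_sample_counts)
```

Note that a cell type absent from a sample contributes no row at all, so the boxplots summarise only the samples in which that cell type was observed.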

for (tissue in tissue_list) {
  seu <- readRDS(tissue)
  
  metadata_df <- data.frame(
    sample = seu$sample_id, 
    donor = seu$donor_id,
    age_years = as.character(seu$age_years),  
    cell_type = seu$cell_labels_v2
  )
  
  metadata_df$age_years <- as.numeric(metadata_df$age_years)
  
  barplot_data <- metadata_df %>%
    group_by(donor, age_years, cell_type) %>%
    summarise(n_cells = n(), .groups = 'drop') %>%
    arrange(donor, age_years)
  
  p <- ggplot(barplot_data, aes(x = reorder(paste(donor, age_years, sep = ":"), age_years), 
                                y = n_cells, fill = cell_type)) +
    geom_bar(stat = "identity") +
    ggtitle(paste0("Cell Type Counts: ", basename(tissue))) +
    labs(x = "Donor:Age (Years)", y = "Count", fill = "Cell Type") +
    scale_fill_manual(values = color_palette) +
    theme_minimal() +
    theme(
      plot.title = element_text(size = 13, hjust = 0.5, face = "bold"),
      legend.position = "top",
      axis.text.x = element_text(angle = 45, hjust = 1)
    )
  
  print(p)
}

Session Info

sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 15.3

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Australia/Melbourne
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] circlize_0.4.16     BiocParallel_1.36.0 harmony_1.2.0      
 [4] Rcpp_1.0.12         patchwork_1.2.0     cowplot_1.1.3      
 [7] data.table_1.15.0   RColorBrewer_1.1-3  kableExtra_1.4.0   
[10] ggridges_0.5.6      ggforce_0.4.2       viridis_0.6.5      
[13] viridisLite_0.4.2   paletteer_1.6.0     clustree_0.5.1     
[16] ggraph_2.1.0        Seurat_5.0.1.9009   SeuratObject_5.0.1 
[19] sp_2.1-3            here_1.0.1          lubridate_1.9.3    
[22] forcats_1.0.0       stringr_1.5.1       dplyr_1.1.4        
[25] purrr_1.0.2         readr_2.1.5         tidyr_1.3.1        
[28] tibble_3.2.1        ggplot2_3.5.0       tidyverse_2.0.0    
[31] BiocStyle_2.30.0    workflowr_1.7.1    

loaded via a namespace (and not attached):
  [1] shape_1.4.6.1          rstudioapi_0.15.0      jsonlite_1.8.8        
  [4] magrittr_2.0.3         spatstat.utils_3.0-4   farver_2.1.1          
  [7] rmarkdown_2.25         GlobalOptions_0.1.2    fs_1.6.3              
 [10] vctrs_0.6.5            ROCR_1.0-11            spatstat.explore_3.2-6
 [13] htmltools_0.5.7        sass_0.4.8             sctransform_0.4.1     
 [16] parallelly_1.37.0      KernSmooth_2.23-22     bslib_0.6.1           
 [19] htmlwidgets_1.6.4      ica_1.0-3              plyr_1.8.9            
 [22] plotly_4.10.4          zoo_1.8-12             cachem_1.0.8          
 [25] whisker_0.4.1          igraph_2.0.2           mime_0.12             
 [28] lifecycle_1.0.4        pkgconfig_2.0.3        Matrix_1.6-5          
 [31] R6_2.5.1               fastmap_1.1.1          fitdistrplus_1.1-11   
 [34] future_1.33.1          shiny_1.8.0            digest_0.6.34         
 [37] colorspace_2.1-0       rematch2_2.1.2         ps_1.7.6              
 [40] rprojroot_2.0.4        tensor_1.5             RSpectra_0.16-1       
 [43] irlba_2.3.5.1          labeling_0.4.3         progressr_0.14.0      
 [46] fansi_1.0.6            spatstat.sparse_3.0-3  timechange_0.3.0      
 [49] polyclip_1.10-6        httr_1.4.7             abind_1.4-5           
 [52] compiler_4.3.2         withr_3.0.0            fastDummies_1.7.3     
 [55] highr_0.10             MASS_7.3-60.0.1        tools_4.3.2           
 [58] lmtest_0.9-40          httpuv_1.6.14          future.apply_1.11.1   
 [61] goftest_1.2-3          glue_1.7.0             callr_3.7.5           
 [64] nlme_3.1-164           promises_1.2.1         grid_4.3.2            
 [67] Rtsne_0.17             getPass_0.2-4          cluster_2.1.6         
 [70] reshape2_1.4.4         generics_0.1.3         gtable_0.3.4          
 [73] spatstat.data_3.0-4    tzdb_0.4.0             hms_1.1.3             
 [76] xml2_1.3.6             tidygraph_1.3.1        utf8_1.2.4            
 [79] spatstat.geom_3.2-8    RcppAnnoy_0.0.22       ggrepel_0.9.5         
 [82] RANN_2.6.1             pillar_1.9.0           spam_2.10-0           
 [85] RcppHNSW_0.6.0         later_1.3.2            splines_4.3.2         
 [88] tweenr_2.0.3           lattice_0.22-5         deldir_2.0-2          
 [91] survival_3.5-8         tidyselect_1.2.0       miniUI_0.1.1.1        
 [94] pbapply_1.7-2          knitr_1.45             git2r_0.33.0          
 [97] gridExtra_2.3          svglite_2.1.3          scattermore_1.2       
[100] xfun_0.42              graphlayouts_1.1.0     matrixStats_1.2.0     
[103] stringi_1.8.3          lazyeval_0.2.2         yaml_2.3.8            
[106] evaluate_0.23          codetools_0.2-19       BiocManager_1.30.22   
[109] cli_3.6.2              uwot_0.1.16            systemfonts_1.0.5     
[112] xtable_1.8-4           reticulate_1.35.0      munsell_0.5.0         
[115] processx_3.8.3         jquerylib_0.1.4        spatstat.random_3.2-2 
[118] globals_0.16.2         png_0.1-8              parallel_4.3.2        
[121] ellipsis_0.3.2         dotCall64_1.1-1        listenv_0.9.1         
[124] scales_1.3.0           leiden_0.4.3.1         rlang_1.1.3           
