What I Learned From Re-Annotating the Skin Atlas
I spent most of a week in early April re-annotating all 16 myeloid clusters in a cross-disease skin single-cell atlas I have been working on in the Bryson Lab, after realizing that the cluster labels I had been using were scrambled relative to what the marker genes actually supported. Five of the 21 T/NK clusters had similar issues. The embarrassing part is that the atlas had already been shared with our external reviewer, Dr. Robert Modlin, and a set of findings framed around those labels had gone out in an earlier email.
What happened is straightforward in hindsight. At some point during the pipeline reruns, I had regenerated the cluster numbering but reused an older annotation dictionary, which silently assigned the old labels to the new cluster indices. The marker gene plots I was generating in parallel still looked internally consistent (the genes matched the clusters they were plotted on), and the UMAP looked reasonable, so I did not catch it until I went back to verify a specific finding about FOLR2+ resident macrophages in psoriasis. That cluster turned out to be mature dendritic cells. The “psoriasis enrichment” was real for some cluster, but not that one.
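The failure mode can be reproduced in a few lines. The sketch below is a toy illustration, not the actual pipeline: the cluster IDs, the marker lists, and the `apply_annotations` helper are all hypothetical. It shows why a dictionary keyed by integer cluster index keeps working without complaint after the numbering changes underneath it.

```python
# Hypothetical toy example: IDs, labels, and marker lists are invented for illustration.

# Annotation dictionary written against the OLD clustering run.
old_annotations = {
    0: "FOLR2+ resident macrophage",
    1: "mature DC",
    2: "intermediate monocyte",
}

# Top markers per cluster from the NEW run: same populations, different numbering.
new_top_markers = {
    0: ["LAMP3", "CCR7", "FSCN1"],     # looks like mature DCs
    1: ["CD14", "FCGR3A", "S100A8"],   # looks like monocytes
    2: ["FOLR2", "LYVE1", "F13A1"],    # looks like resident macrophages
}


def apply_annotations(cluster_ids, annotations):
    """Label clusters by integer ID. Nothing here knows which run the IDs came from."""
    return {cid: annotations[cid] for cid in cluster_ids}


# Every key exists, so the lookup succeeds and no error is raised.
labels = apply_annotations(new_top_markers.keys(), old_annotations)

for cid, label in labels.items():
    print(cid, label, new_top_markers[cid])
```

Cluster 0 comes out labeled as a FOLR2+ macrophage even though FOLR2 is nowhere in its marker list. Because the keys are plain integers that exist in both runs, nothing in the lookup can tell that the labels belong to a different clustering.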
I bring this up because I think it is the most important kind of mistake to write down publicly. It is the kind of mistake that does not produce an error, does not crash a script, and does not show up in any automated check. It produces a perfectly coherent-looking result that is wrong at the top layer, the labels themselves. I had been using the fact that “the markers line up with the annotations” as evidence the annotations were right, without recognizing that this was circular, because both were being derived from the same re-clustering in parallel.
The fix took about a week. I went back to the raw cluster IDs, regenerated the marker tables from scratch, annotated each cluster manually against a reference panel I curated from the literature (Villani et al. for DCs, Dutertre et al. for macrophages, and a few skin-specific papers for keratinocyte-interacting populations), and re-ran all the downstream enrichment tests. The headline findings changed. What I had been calling a psoriasis-specific FOLR2+ signal disappeared. A cross-disease CXCL13+ Tph expansion became clearer once the correct clusters were aligned, and an intermediate-monocyte signal specific to cancers (melanoma and SCC, not inflammatory diseases) became the strongest finding.
The part I am still thinking about is how to make this kind of mistake harder to make in the first place. I believe the right answer is a small validation script that ingests the cluster IDs and the annotation dictionary independently, re-derives the top markers, and checks each label against a fixed set of canonical markers for that cell type. If the canonical markers do not appear in the top 30 for the cluster the label is assigned to, the script fails loudly. I have started writing this as part of the pipeline and will make it a hard dependency before any result leaves my laptop. I wish I had done this three months ago.
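A minimal sketch of such a check might look like the following. It is not the script described above, and it rests on several assumptions: the top markers have already been re-derived from the current run and arrive as a plain dictionary of ranked gene lists, the canonical panels are a dictionary keyed by label, and a label passes if at least `min_hits` of its canonical markers appear in the top 30. The function name, the exception class, the `min_hits` threshold, and the toy data are all hypothetical.

```python
class AnnotationMismatch(Exception):
    """Raised when a cluster label is not supported by that cluster's top markers."""


def validate_annotations(top_markers, annotations, canonical, top_n=30, min_hits=1):
    """Check every cluster label against a canonical marker panel.

    top_markers: cluster ID -> ranked gene list, re-derived from the current run
    annotations: cluster ID -> cell-type label, loaded independently
    canonical:   cell-type label -> canonical marker genes
    min_hits:    canonical markers required in the top_n (an assumed threshold)
    """
    problems = []

    # The two inputs must describe exactly the same set of clusters.
    unlabeled = set(top_markers) - set(annotations)
    orphaned = set(annotations) - set(top_markers)
    if unlabeled:
        problems.append(f"clusters with no label: {sorted(unlabeled)}")
    if orphaned:
        problems.append(f"labels for clusters that do not exist: {sorted(orphaned)}")

    for cid in sorted(set(top_markers) & set(annotations)):
        label = annotations[cid]
        panel = canonical.get(label)
        if not panel:
            problems.append(f"cluster {cid}: no canonical panel for label {label!r}")
            continue
        hits = set(panel) & set(top_markers[cid][:top_n])
        if len(hits) < min_hits:
            problems.append(
                f"cluster {cid} labeled {label!r}: found {len(hits)} of "
                f"{len(panel)} canonical markers in top {top_n}"
            )

    # Fail loudly: report every problem at once and stop the pipeline.
    if problems:
        raise AnnotationMismatch("\n".join(problems))


# Toy data, invented for illustration.
canonical = {
    "mature DC": ["LAMP3", "CCR7"],
    "FOLR2+ resident macrophage": ["FOLR2", "LYVE1"],
}
top_markers = {
    0: ["LAMP3", "CCR7", "FSCN1"],
    1: ["FOLR2", "LYVE1", "F13A1"],
}
good = {0: "mature DC", 1: "FOLR2+ resident macrophage"}
stale = {0: "FOLR2+ resident macrophage", 1: "mature DC"}

validate_annotations(top_markers, good, canonical)  # passes silently

try:
    validate_annotations(top_markers, stale, canonical)
except AnnotationMismatch as err:
    print(f"validation failed:\n{err}")
```

The design choice that matters is that the canonical panels are fixed ahead of time and never derived from the clustering being checked, which is what breaks the circularity. Collecting all problems before raising is a convenience, so that one run reports every mislabeled cluster instead of only the first.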