Ingest a second file

import lamindb as ln
import lnschema_bionty as lb
import readfcs

lb.settings.species = "human"
💡 loaded instance: testuser1/test-flow (lamindb 0.54.1)
ln.track()
💡 notebook imports: anndata==0.9.2 lamindb==0.54.1 lnschema_bionty==0.31.2 pytometry==0.1.4 readfcs==1.1.6 scanpy==1.9.5
💡 Transform(id='SmQmhrhigFPLz8', name='Ingest a second file', short_name='facs1', version='0', type=notebook, updated_at=2023-09-23 14:27:21, created_by_id='DzTjkKse')
💡 Run(id='zD0n79CMqgtJa45Yx1NH', run_at=2023-09-23 14:27:21, transform_id='SmQmhrhigFPLz8', created_by_id='DzTjkKse')

Let us validate and register another .fcs file:

Access

filepath = ln.dev.datasets.file_fcs()

adata = readfcs.read(filepath)
adata
AnnData object with n_obs × n_vars = 65016 × 16
    var: 'n', 'channel', 'marker', '$PnB', '$PnR', '$PnG'
    uns: 'meta'

Transform: normalize

import anndata as ad
import pytometry as pm
pm.pp.split_signal(adata, var_key="channel")
pm.tl.normalize_arcsinh(adata, cofactor=150)
adata = adata[  # subset to rows that do not have nan values
    adata.to_df().isna().sum(axis=1) == 0
]
adata.to_df().describe()
               KI67           CD3          CD28        CD45RO           CD8           CD4          CD57          CD14          CCR5          CD19          CD27          CCR7         CD127
count  64593.000000  64593.000000  64593.000000  64593.000000  64593.000000  64593.000000  64593.000000  64593.000000  64593.000000  64593.000000  64593.000000  64593.000000  64593.000000
mean      -7.784467     -7.958064     -7.880424     -7.849991     -7.682381     -7.695841     -7.772347     -7.827088     -7.427381     -7.693235     -8.009255     -7.514956     -7.471545
std       30.911205     30.796326     30.847746     30.776819     30.846949     30.873545     30.907915     30.640249     30.767073     30.675623     30.902098     30.668348     30.830299
min      -62.628761    -62.628761    -62.628761    -62.628761    -62.628761    -62.628761    -62.628761    -62.628761    -62.628761    -62.628761    -62.628761    -62.628761    -62.628761
25%       -0.009892     -0.009892     -0.009892     -0.009892     -0.009892     -0.009892     -0.009892     -0.009892     -0.009892     -0.009892     -0.009892     -0.009892     -0.009892
50%       -0.000321     -0.000322     -0.000322     -0.000322     -0.000321     -0.000322     -0.000321     -0.000322     -0.000321     -0.000322     -0.000322     -0.000321     -0.000321
75%        1.086298      1.045244      0.819897      1.050630      1.104099      0.987080      0.995414      1.041992      1.145463      0.932001      1.096484      1.150226      1.248759
max       84.386696     84.386627     84.385368     84.398567     84.405098     84.398537     84.402496     84.398567     84.337654     84.382713     84.402489     84.362930     84.374611
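
To make the two steps above concrete, here is a minimal standard-library sketch of an arcsinh normalization followed by dropping events that contain NaN values. It assumes the transform is the usual asinh(x / cofactor); what `pm.tl.normalize_arcsinh` does internally may differ in detail. The helper names `arcsinh_normalize` and `drop_nan_rows` and the toy values are made up for illustration.

```python
import math


def arcsinh_normalize(rows, cofactor=150.0):
    """Apply asinh(x / cofactor) to every value; NaN inputs stay NaN."""
    return [[math.asinh(v / cofactor) for v in row] for row in rows]


def drop_nan_rows(rows):
    """Keep only rows (events) in which no value is NaN."""
    return [row for row in rows if not any(math.isnan(v) for v in row)]


# Toy intensity matrix: 3 events x 3 channels (hypothetical values).
raw = [
    [0.0, 150.0, -150.0],
    [300.0, float("nan"), 75.0],  # this event is dropped after normalization
    [1500.0, 15.0, 0.0],
]

normalized = arcsinh_normalize(raw, cofactor=150.0)
clean = drop_nan_rows(normalized)

print(len(raw), "events before filtering,", len(clean), "after")
for row in clean:
    print([round(v, 4) for v in row])
```

The cofactor sets where the transform switches from roughly linear (values much smaller than the cofactor) to roughly logarithmic (values much larger), which is why it is chosen per technology.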

Validate cell markers

Let’s see how many markers validate:

validated = lb.CellMarker.validate(adata.var.index)
7 terms (53.80%) are not validated for name: KI67, CD45RO, CD4, CD14, CCR5, CD19, CCR7

Let’s standardize and re-validate:

adata.var.index = lb.CellMarker.standardize(adata.var.index)
validated = lb.CellMarker.validate(adata.var.index)
❗ found 1 synonym in Bionty: ['KI67']
   please add corresponding CellMarker records via `.from_values(['Ki67'])`
3 terms (23.10%) are not validated for name: Ki67, CD45RO, CCR5

Next, register non-validated markers from Bionty:

records = lb.CellMarker.from_values(adata.var.index[~validated])
ln.save(records)

Now they pass validation:

validated = lb.CellMarker.validate(adata.var.index)
assert all(validated)
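
The validate, standardize, register loop above can be pictured with a toy model in plain Python. This is only a sketch of the idea, not how `lb.CellMarker` is implemented: the registry contents, the synonym table, and the helper functions are hypothetical and chosen so that the counts mirror the output shown above.

```python
def validate(names, registry):
    """Return a boolean mask: True where the name is already in the registry."""
    return [name in registry for name in names]


def standardize(names, synonyms):
    """Map known synonyms to their canonical name; leave other names untouched."""
    return [synonyms.get(name, name) for name in names]


def not_validated(names, mask):
    """List the names that failed validation."""
    return [name for name, ok in zip(names, mask) if not ok]


markers = ["KI67", "CD3", "CD28", "CD45RO", "CD8", "CD4", "CD57",
           "CD14", "CCR5", "CD19", "CD27", "CCR7", "CD127"]

# Hypothetical registry state and synonym table (illustration only).
registry = {"CD3", "CD28", "CD8", "CD57", "CD27", "CD127",
            "Cd4", "Cd14", "Cd19", "Ccr7"}
synonyms = {"KI67": "Ki67", "CD4": "Cd4", "CD14": "Cd14",
            "CD19": "Cd19", "CCR7": "Ccr7"}

# Step 1: validate the raw names.
missing_before = not_validated(markers, validate(markers, registry))

# Step 2: standardize, then validate again.
markers = standardize(markers, synonyms)
mask = validate(markers, registry)
missing_after = not_validated(markers, mask)

# Step 3: register what is still missing, after which everything validates.
registry.update(missing_after)
validated = validate(markers, registry)

print(len(missing_before), "not validated before standardizing:", missing_before)
print(len(missing_after), "not validated after standardizing:", missing_after)
print("all validated after registering:", all(validated))
```

Standardizing first keeps the registry free of duplicate records that differ only in spelling; only names that still fail afterwards need new records.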

Register

modalities = ln.Modality.lookup()
features = ln.Feature.lookup()
efs = lb.ExperimentalFactor.lookup()
species = lb.Species.lookup()
markers = lb.CellMarker.lookup()
file = ln.File.from_anndata(
    adata,
    description="Flow cytometry file 2",
    field=lb.CellMarker.name,
    modality=modalities.protein,
)
/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/anndata/_core/anndata.py:1230: ImplicitModificationWarning: Trying to modify attribute `.var` of view, initializing view as actual.
  df[key] = c
... storing '$PnR' as categorical
/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/anndata/_core/anndata.py:1230: ImplicitModificationWarning: Trying to modify attribute `.var` of view, initializing view as actual.
  df[key] = c
... storing '$PnG' as categorical
3 terms (100.00%) are not validated for name: FSC-A, FSC-H, SSC-A
❗    no validated features, skip creating feature set
file.save()
file.labels.add(efs.fluorescence_activated_cell_sorting, features.assay)
file.labels.add(species.human, features.species)
file.features
Features:
  var: FeatureSet(id='qXUXpOspjIFUrCkvrwMZ', n=13, type='number', registry='bionty.CellMarker', hash='cInZdHy3fspNNLGysq01', updated_at=2023-09-23 14:27:26, modality_id='Vx6LQKum', created_by_id='DzTjkKse')
    'CD45RO', 'CD3', 'Cd14', 'CCR5', 'CD127', 'CD28', 'CD57', 'CD27', 'Ccr7', 'Cd19', ...
  external: FeatureSet(id='fz4nKTblqMuXbNYOSyuc', n=2, registry='core.Feature', hash='56DmDcmbv0Qwt6E6RoXs', updated_at=2023-09-23 14:27:26, modality_id='e15pjKn9', created_by_id='DzTjkKse')
    🔗 species (1, bionty.Species): 'human'
    🔗 assay (1, bionty.ExperimentalFactor): 'fluorescence-activated cell sorting'

View data flow:

file.view_flow()
https://d33wubrfki0l68.cloudfront.net/0ba22e9da350b926907b93a3e60eb8fe23a3e2e2/54f0b/_images/4e470921e94d4624e5247087494971eb62e4b502001e993c919e869a2787e2a6.svg

Inspect a PCA for QC - this dataset looks much like noise:

import scanpy as sc

sc.pp.pca(adata)
sc.pl.pca(adata, color=markers.cd14.name)
https://d33wubrfki0l68.cloudfront.net/188bd2f3d5a02ddc4f1df232e615d74d7edc698b/75092/_images/5b28756b8f68327b19a0852549c06576dab4e38c816942363f6fae90b7564d2c.png