We have released a new public version 1412, as part of our quarterly release schedule. See details at Release Manifests: 1412.
Tutorials remain pinned to v1300 as the latest major version.
The Connectome Annotation Versioning Engine (CAVE) is a suite of tools developed at the Allen Institute and Seung Lab to manage large connectomics data.
To initialize a CAVEclient, we give it a datastack, a name that defines a particular combination of imagery, segmentation, and annotation database. For the MICrONS public data, we use the datastack name minnie65_public.
import numpy as np
import pandas as pd
from caveclient import CAVEclient

datastack_name = 'minnie65_public'
client = CAVEclient(datastack_name)

# Show the description of the datastack
client.info.get_datastack_info()['description']
'This is the publicly released version of the minnie65 volume and segmentation. '
Materialization versions
Data in CAVE is timestamped and periodically versioned: each materialization version corresponds to a specific timestamp. Individual versions are made publicly available. The materialization client lets you interact with the materialized annotation tables that were posted to the annotation service. These interactions are called queries to the dataset, and they are available from client.materialize. For more, see the CAVEclient Documentation.
Periodic updates are made to the public datastack, which will include updates to the available tables. Some cells will have different pt_root_id because they have undergone proofreading.
Tip
For analysis consistency, it is worth checking which version of the data you are using, and consider pinning it with client.version = your_version.
# see the available materialization versions
client.materialize.get_versions()
[1300, 1078, 117, 661, 343, 1181, 795, 943]
And these are their associated timestamps (all timestamps are in UTC):
for version in client.materialize.get_versions():
    print(f"Version {version}: {client.materialize.get_timestamp(version)}")
Version 1300: 2025-01-13 10:10:01.286229+00:00
Version 1078: 2024-06-05 10:10:01.203215+00:00
Version 117: 2021-06-11 08:10:00.215114+00:00
Version 661: 2023-04-06 20:17:09.199182+00:00
Version 343: 2022-02-24 08:10:00.184668+00:00
Version 1181: 2024-09-16 10:10:01.121167+00:00
Version 795: 2023-08-23 08:10:01.404268+00:00
Version 943: 2024-01-22 08:10:01.497934+00:00
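The versions listed above are not in chronological order. As a minimal sketch, the version/timestamp pairs can be sorted by time and searched for the most recent version before a given date. The values are hard-coded copies of the output above rather than a live query, and latest_version_before is a hypothetical helper name.

```python
from datetime import datetime, timezone

# Version -> timestamp pairs copied from the output above (all UTC).
version_timestamps = {
    1300: "2025-01-13 10:10:01.286229+00:00",
    1078: "2024-06-05 10:10:01.203215+00:00",
    117: "2021-06-11 08:10:00.215114+00:00",
    661: "2023-04-06 20:17:09.199182+00:00",
    343: "2022-02-24 08:10:00.184668+00:00",
    1181: "2024-09-16 10:10:01.121167+00:00",
    795: "2023-08-23 08:10:01.404268+00:00",
    943: "2024-01-22 08:10:01.497934+00:00",
}

# Parse the strings into timezone-aware datetimes.
parsed = {v: datetime.fromisoformat(ts) for v, ts in version_timestamps.items()}

# Sort the version numbers chronologically, oldest first.
chronological = sorted(parsed, key=parsed.get)


def latest_version_before(cutoff):
    """Return the most recent version materialized at or before `cutoff`, or None."""
    candidates = [v for v in chronological if parsed[v] <= cutoff]
    return candidates[-1] if candidates else None


for version in chronological:
    print(f"Version {version}: {parsed[version].date()}")

print(latest_version_before(datetime(2024, 1, 1, tzinfo=timezone.utc)))
```

With a live client, the same dictionary could presumably be built from client.materialize.get_versions() and client.materialize.get_timestamp(version), as in the loop above.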
# set materialization version, for consistency
client.version = 1300  # current public as of 1/13/2025
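To guard against typos when pinning a version, one option is to check the requested version against the available ones first. This is a sketch rather than part of CAVEclient: pin_version is a hypothetical helper, and it relies only on client.materialize.get_versions() and client.version as used above.

```python
def pin_version(client, version):
    """Set client.version after checking that the version is available.

    `client` is assumed to be an initialized CAVEclient (hypothetical helper).
    """
    available = client.materialize.get_versions()
    if version not in available:
        raise ValueError(
            f"Version {version} is not available; choose one of {sorted(available)}"
        )
    client.version = version
    return client.version


# Usage, assuming `client` was created as shown earlier:
# pin_version(client, 1300)
```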
Querying Proofread neurons
Proofread neurons
Proofreading is necessary to obtain accurate reconstructions of a cell. In the MICrONS dataset, the general rule is that the dendrites of cells with a single cell body are sufficiently proofread to trust the synaptic connections onto the cell. Axons, on the other hand, require so much proofreading that only ~1800 cells have axons whose outputs should be used for analysis.
Table name: proofreading_status_and_strategy
The table proofreading_status_and_strategy describes the status of cells that have undergone manual proofreading.
Because of the inherent difference in the difficulty and time required for different kinds of proofreading, we describe the status of axons and dendrites separately.
Each compartment status may be either:
FALSE: indicates no comprehensive proofreading has been performed, or is not applicable.
TRUE: indicates that false merges have been comprehensively removed, and the compartment is at least ‘clean’. Consult the strategy column if completeness of the compartment is relevant to your research.
The table includes the bound spatial point columns associated with the centroid of the cell nucleus, along with the following columns:

| Column | Description |
| --- | --- |
| valid_id | The root id of the neuron when the proofreading assessment was made. NOTE: if this does not match the pt_root_id, then the cell has undergone further changes. This is usually an improvement in proofreading, but proceed with caution. |
| status_dendrite | The status of the dendrite proofreading. May be TRUE or FALSE. |
| status_axon | The status of the axon proofreading. May be TRUE or FALSE. |
| strategy_dendrite | The strategy employed to proofread the dendrite. See the strategy table below for details. |
| strategy_axon | The strategy employed to proofread the axon. See the strategy table below for details. |
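The valid_id caveat can be checked directly once the table is in a DataFrame. A minimal sketch on synthetic data follows: the ids are made up, and changed_since_proofreading is a hypothetical column name.

```python
import pandas as pd

# Synthetic stand-in for a proofreading_status_and_strategy query result.
# The ids below are made up for illustration only.
proof_df = pd.DataFrame(
    {
        "pt_root_id": [101, 202, 303],
        "valid_id": [101, 200, 303],
    }
)

# A cell is unchanged since its proofreading assessment when valid_id
# still equals the current pt_root_id.
proof_df["changed_since_proofreading"] = proof_df["pt_root_id"] != proof_df["valid_id"]

# Rows that deserve a second look before analysis.
changed = proof_df[proof_df["changed_since_proofreading"]]
print(changed[["pt_root_id", "valid_id"]])
```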
The specific strategies are as follows (and will update over time):
Proofreading Strategies
Proofreading Strategy Table
| Strategy | Description |
| --- | --- |
| none | No cleaning, and no extension. Indicates an entry in proofreading_status that is FALSE for that compartment. |
| dendrite_clean | The dendrite had incorrectly-merged axon and dendritic segments comprehensively removed, meaning the input synapses are accurate. The dendrite may be incorrectly truncated by segmentation error. Not all dendrite tips have been checked for extension. No comprehensive attempt was made to re-attach spine heads. |
| dendrite_extended | The dendrite had incorrectly-merged axon and dendritic segments comprehensively removed, meaning the input synapses are accurate. Every tip was identified, manually inspected, and extended if possible. No comprehensive attempt was made to re-attach spine heads. |
| axon_column_truncated | The axon was extended within the V1 cortical column, with a preference for local connections. In some cases the axon was cut at the column boundary and/or the layer boundary, especially the boundary between layers 2/3 and layer 4. Output synapses represent a sampling of potential partners. |
| axon_interareal | The axon was extended with a preference for branches that projected to other brain areas. Some axon branches were fully extended, but local connections may be incomplete. Output synapses represent a sampling of potential partners. |
| axon_partially_extended | The axon was extended outward from the soma, following each branch to its termination. Output synapses represent a sampling of potential partners. |
| axon_fully_extended | The axon was extended outward from the soma, following each branch to its termination. After initial extension, every endpoint was identified, manually inspected, and extended again if possible. Output synapses represent a largely complete sampling of partners. |
This table, proofreading_status_and_strategy, supersedes proofreading_status_public_release.
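Because the strategy columns hold the names listed in the strategy table, a query result can be narrowed with ordinary pandas filtering. The sketch below uses synthetic rows. Which strategies are acceptable depends on the analysis, so the two selections are illustrative choices, not recommendations.

```python
import pandas as pd

# Synthetic rows; real values come from proofreading_status_and_strategy.
proof_df = pd.DataFrame(
    {
        "pt_root_id": [101, 202, 303, 404],
        "strategy_dendrite": [
            "dendrite_extended", "dendrite_clean", "dendrite_clean", "none",
        ],
        "strategy_axon": [
            "axon_fully_extended", "axon_column_truncated", "none", "axon_interareal",
        ],
    }
)

# Cells whose output synapses are a largely complete sampling of partners.
complete_axons = proof_df[proof_df["strategy_axon"] == "axon_fully_extended"]

# Cells with any axon proofreading strategy at all.
any_axon_proofreading = proof_df[proof_df["strategy_axon"] != "none"]

# How many cells fall under each axon strategy.
strategy_counts = proof_df["strategy_axon"].value_counts()
print(strategy_counts)
```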
# Standard query
client.materialize.query_table('proofreading_status_and_strategy')

# Content-aware query
client.materialize.tables.proofreading_status_and_strategy(status_axon='t').query()
Here we query and return the table as of version 1300.
A more unified filter interface is available through a “table manager” interface.
Rather than passing a table name to the query_table function, client.materialize.tables has a subproperty for each table in the database that can be used to filter that table. The general pattern is client.materialize.tables.{table_name}({filter options}).query({format and timestamp options}),
where {table_name} is the name of the table you want to filter, {filter options} is a collection of arguments for filtering the query, and {format and timestamp options} are those parameters controlling the format and timestamp of the query.
Caution
Use of this functionality will show a brief warning that the interface is experimental. This is because the interface is still being developed and may change in the near future in response to user feedback.
With this, we can easily query all proofread cells with proofread axons:
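A minimal sketch of such a query, wrapped in a function so that it can be reused. query_proofread_axons is a hypothetical helper name. The call itself mirrors the content-aware query shown earlier and assumes an initialized client with client.version already set.

```python
def query_proofread_axons(client):
    """Return rows of proofreading_status_and_strategy whose axon status is true.

    `client` is assumed to be an initialized CAVEclient (hypothetical helper).
    """
    table = client.materialize.tables.proofreading_status_and_strategy(
        status_axon="t"
    )
    return table.query()


# Usage, assuming `client` was created as shown earlier:
# proof_df = query_proofread_axons(client)
# proof_df.head()
```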
For analysis, often you are interested in neurons that are at the intersection of two or more groups. For example: proofread cells that are also layer 2/3 pyramidal cells. The general workflow for this type of analysis is to:
Query from one table, for example the proofreading_status_and_strategy table
Query from another table, for example the aibs_metamodel_celltypes_v661
Merge the two tables on the shared index, in this case pt_root_id
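The first two steps might be sketched as follows. The select_columns dictionary for the cell type table matches the one used in the final query of this section. The plain list of column names for the proofreading table is an assumption about the call signature, and query_proofread_and_cell_types is a hypothetical helper name.

```python
def query_proofread_and_cell_types(client):
    """Run the two queries of the workflow and return both DataFrames.

    `client` is assumed to be an initialized CAVEclient (hypothetical helper).
    """
    # Step 1: cells with proofread axons.
    proof_df = client.materialize.tables.proofreading_status_and_strategy(
        status_axon="t"
    ).query(
        # Assumption: a plain list of column names for a non-reference table.
        select_columns=["pt_root_id", "status_dendrite", "status_axon"],
    )

    # Step 2: cell types. This table references nucleus_detection_v0,
    # so the columns are requested per table with a dictionary.
    cell_type_df = client.materialize.tables.aibs_metamodel_celltypes_v661().query(
        select_columns={
            "nucleus_detection_v0": ["pt_root_id", "id"],
            "aibs_metamodel_celltypes_v661": ["classification_system", "cell_type"],
        },
    )
    return proof_df, cell_type_df


# Usage, assuming `client` was created as shown earlier:
# proof_df, cell_type_df = query_proofread_and_cell_types(client)
```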
Note that the select_columns argument differs between the two tables. That is because the second table, aibs_metamodel_celltypes_v661, is itself a reference table on nucleus_detection_v0. That means the id column returned here is the same as the nucleus id of the cell. This is handy for referencing the same cell across materialization versions, as the nucleus id does not change, whereas the pt_root_id will change with proofreading.
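One way to use the stable nucleus id is to join snapshots from two versions on it and compare the root ids. This is a minimal sketch on synthetic data: the nucleus_id column name and all ids are made up for illustration.

```python
import pandas as pd

# Synthetic snapshots of the same three nuclei in two materialization versions.
older = pd.DataFrame({"nucleus_id": [11, 22, 33], "pt_root_id": [101, 202, 303]})
newer = pd.DataFrame({"nucleus_id": [11, 22, 33], "pt_root_id": [101, 250, 303]})

# Join on the stable nucleus id, keeping both root ids side by side.
tracked = older.merge(newer, on="nucleus_id", suffixes=("_old", "_new"))

# A changed root id means the cell was edited between the two versions.
tracked["root_changed"] = tracked["pt_root_id_old"] != tracked["pt_root_id_new"]
print(tracked)
```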
Now we can merge the two tables together on the shared index!
But it is worth checking whether there are duplicates in either of the tables. How you handle duplicates will depend on your question and the table you are using. Here we might see duplicates from multi-soma merges in the cell type table.
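A duplicate check can be sketched with pandas on a small synthetic table. The ids and cell types below are made up. Passing keep=False flags every row in a duplicated group, which matches the keep=False used when dropping duplicates.

```python
import pandas as pd

# Synthetic cell type table: root id 202 appears twice, as it would if two
# nuclei were merged into one segmentation object (a multi-soma merge).
cell_type_df = pd.DataFrame(
    {
        "pt_root_id": [101, 202, 202, 303],
        "id": [11, 22, 23, 33],
        "cell_type": ["23P", "5P-IT", "4P", "astrocyte"],
    }
)

# keep=False marks every member of a duplicated group, not just the repeats.
is_duplicated = cell_type_df.duplicated("pt_root_id", keep=False)

print(f"{is_duplicated.sum()} rows share a pt_root_id with another row")
print(cell_type_df[is_duplicated])
```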
For analytical simplicity, we will drop any multi-soma objects. We will also rename the id column for clarity.
# Drop duplicate pt_root_id and rename the nucleus_id
cell_type_df = (
    cell_type_df
    .drop_duplicates('pt_root_id', keep=False)
    .rename(columns={'id': 'nucleus_id'})
)
cell_type_df.head()
|   | pt_root_id | nucleus_id | classification_system | cell_type |
| --- | --- | --- | --- | --- |
| 0 | 864691136274724621 | 336365 | excitatory_neuron | 5P-IT |
| 1 | 864691135489403194 | 110648 | excitatory_neuron | 23P |
| 2 | 864691136147292311 | 112071 | excitatory_neuron | 23P |
| 3 | 864691136050858227 | 197927 | nonneuron | oligo |
| 4 | 864691135809440972 | 198087 | nonneuron | astrocyte |
Now we can merge the two tables with pandas.merge on the shared column pt_root_id. We will keep the inner join of the two tables: cells that 1) are proofread, and 2) have a cell type.
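A minimal sketch of the inner join, using synthetic stand-ins for the two query results. The ids are made up, and proof_cell_df is a hypothetical name for the merged table.

```python
import pandas as pd

# Synthetic stand-ins for the two query results.
proof_df = pd.DataFrame(
    {"pt_root_id": [101, 202, 404], "status_axon": [True, True, True]}
)
cell_type_df = pd.DataFrame(
    {
        "pt_root_id": [101, 202, 303],
        "nucleus_id": [11, 22, 33],
        "cell_type": ["23P", "5P-IT", "astrocyte"],
    }
)

# Inner join: keep only root ids present in both tables.
# validate="one_to_one" raises if either side still has duplicate root ids.
proof_cell_df = pd.merge(
    proof_df, cell_type_df, on="pt_root_id", how="inner", validate="one_to_one"
)

print(proof_cell_df)
print(proof_cell_df["cell_type"].value_counts())
```

The validate argument is optional. It is included here because duplicates were dropped in the previous step, so a one-to-one match is the expected outcome.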
And we have the list of all proofread cells, by their cell type!
We can do the same kind of query more simply by filtering the second table on both the root ids of interest and the cell type of interest. For example, to get only the proofread 23P cells:
# Query the proofread 23P cells, and merge the proofreading status
proof_23p_df = (
    client.materialize.tables.aibs_metamodel_celltypes_v661(
        pt_root_id=proof_df.pt_root_id, cell_type='23P'
    ).query(
        select_columns={
            'nucleus_detection_v0': ['pt_root_id', 'id'],
            'aibs_metamodel_celltypes_v661': ['classification_system', 'cell_type'],
        },
    )
    .rename(columns={'id': 'nucleus_id'})
    .merge(proof_df, on='pt_root_id', how='inner')
)
proof_23p_df.head()