CAVE Query: Proofread Cells

The Connectome Annotation Versioning Engine (CAVE) is a suite of tools developed at the Allen Institute and Seung Lab to manage large connectomics data.

Initial Setup

Before using any programmatic access to the data, you first need to set up your CAVEclient token.

CAVEclient

Most programmatic access to the CAVE services occurs through CAVEclient, a Python client to access various types of data from the online services.

Full documentation for CAVEclient is available here.

To initialize a caveclient, we give it a datastack, which is a name that defines a particular combination of imagery, segmentation, and annotation database. For the MICrONs public data, we use the datastack name minnie65_public.

from caveclient import CAVEclient
datastack_name = 'minnie65_public'
client = CAVEclient(datastack_name)

# Show the description of the datastack
client.info.get_datastack_info()['description']
'This is the publicly released version of the minnie65 volume and segmentation. '

Materialization versions

Data in CAVE is timestamped and periodically versioned - each (materialization) version corresponds to a specific timestamp. Individual versions are made publicly available. The materialization service provides annotation queries to the dataset. It is available under client.materialize.

Periodic updates are made to the public datastack, which will include updates to the available tables. Some cells will have different pt_root_id because they have undergone proofreading.

It is worth checking the version of the data you are using, and specifying the version for analysis consistency.

# see the available materialization versions
client.materialize.get_versions()
[1300, 1078, 117, 661, 343, 1181, 795, 943]

And these are their associated timestamps (all timestamps are in UTC):

for version in client.materialize.get_versions():
    print(f"Version {version}: {client.materialize.get_timestamp(version)}")
Version 1300: 2025-01-13 10:10:01.286229+00:00
Version 1078: 2024-06-05 10:10:01.203215+00:00
Version 117: 2021-06-11 08:10:00.215114+00:00
Version 661: 2023-04-06 20:17:09.199182+00:00
Version 343: 2022-02-24 08:10:00.184668+00:00
Version 1181: 2024-09-16 10:10:01.121167+00:00
Version 795: 2023-08-23 08:10:01.404268+00:00
Version 943: 2024-01-22 08:10:01.497934+00:00
# set materialization version, for consistency
materialization = 1300 # current public as of 1/13/2025
client.version = materialization

Querying Proofread neurons

Proofread neurons

Proofreading is necessary to obtain accurate reconstructions of a cell. In the MICrONS dataset, the general rule is that dendrites onto cells with a single cell body are sufficiently proofread to trust synaptic connections onto a cell. Axons on the other hand require so much proofread that only ~1,000 cells have axons that were proofread to various degrees such that their outputs can be used for analysis.

The table proofreading_status_and_strategy contains proofreading information about ~1,300 neurons. This website provides the most detailed overview. In brief, axons annotated with any strategy_axon were cleaned of false mergers but not all were fully extended. The most important distinction is axons annotated with axon_column_truncated were only proofread within a certain volume wheras others were proofread without such bias.

proof_all_df = client.materialize.query_table("proofreading_status_and_strategy", desired_resolution=[1, 1, 1], split_positions=True)
proof_all_df["strategy_axon"].value_counts()
strategy_axon
axon_partially_extended    1459
none                        149
axon_interareal             130
axon_fully_extended         127
Name: count, dtype: int64

We can filter our query to only return rows that match a condition by adding a filter to our query:

proof_df = client.materialize.query_table("proofreading_status_and_strategy", filter_in_dict={"strategy_axon": ["axon_partially_extended", "axon_fully_extended", "axon_interareal", "axon_column_truncated"]}, desired_resolution=[1, 1, 1], split_positions=True)
proof_df["strategy_axon"].value_counts()
strategy_axon
axon_partially_extended    1459
axon_interareal             130
axon_fully_extended         127
Name: count, dtype: int64
Back to top