CAVE Quickstart

The Connectome Annotation Versioning Engine (CAVE) is a suite of tools developed at the Allen Institute and the Seung Lab to manage large connectomics datasets.

Initial Setup

Before using any programmatic access to the data, you first need to set up your CAVEclient token.

CAVEclient

Most programmatic access to the CAVE services occurs through CAVEclient, a Python client to access various types of data from the online services.

Full documentation for CAVEclient is available here.

To initialize a CAVEclient, we give it a datastack: a name that defines a particular combination of imagery, segmentation, and annotation database. For the MICrONS public data, we use the datastack name minnie65_public.

from caveclient import CAVEclient
datastack_name = 'minnie65_public'
client = CAVEclient(datastack_name)

# Show the description of the datastack
client.info.get_datastack_info()['description']
'This is the publicly released version of the minnie65 volume and segmentation. '

Materialization versions

Data in CAVE is timestamped and periodically versioned: each (materialization) version corresponds to a specific timestamp, and individual versions are made publicly available. The materialization service, available under client.materialize, provides annotation queries to the dataset.

Periodic updates are made to the public datastack, and these can include updates to the available tables. Some cells will have a different pt_root_id from one version to the next because they have undergone proofreading.

It is worth checking the version of the data you are using, and specifying the version for analysis consistency.

# see the available materialization versions
client.materialize.get_versions()
[1300, 117, 943, 1507, 1621, 1718]

And these are their associated timestamps (all timestamps are in UTC):

for version in client.materialize.get_versions():
    print(f"Version {version}: {client.materialize.get_timestamp(version)}")
Version 1300: 2025-01-13 10:10:01.286229+00:00
Version 117: 2021-06-11 08:10:00.215114+00:00
Version 943: 2024-01-22 08:10:01.497934+00:00
Version 1507: 2025-07-31 08:10:01.117494+00:00
Version 1621: 2025-11-25 08:10:01.094430+00:00
Version 1718: 2026-03-07 08:10:01.190228+00:00
# set materialization version, for consistency
materialization = 1718 # current public as of 3/7/2026
client.version = materialization
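The version list above is not returned in sorted order, so the last element is not guaranteed to be the newest release. The following is a minimal sketch that uses the version numbers and timestamps printed above as hard-coded sample data; in a live session you would pass the result of client.materialize.get_versions() instead of the literal list.

```python
from datetime import datetime

# Sample data copied from the outputs above; in practice these would come from
# client.materialize.get_versions() and client.materialize.get_timestamp(version).
versions = [1300, 117, 943, 1507, 1621, 1718]
timestamps = {
    1300: "2025-01-13 10:10:01.286229+00:00",
    117: "2021-06-11 08:10:00.215114+00:00",
    943: "2024-01-22 08:10:01.497934+00:00",
    1507: "2025-07-31 08:10:01.117494+00:00",
    1621: "2025-11-25 08:10:01.094430+00:00",
    1718: "2026-03-07 08:10:01.190228+00:00",
}

# The list is unordered, so take the maximum instead of relying on versions[-1].
latest = max(versions)

# Order the versions chronologically by parsing their UTC timestamps.
by_time = sorted(versions, key=lambda v: datetime.fromisoformat(timestamps[v]))

print(f"Latest version: {latest} ({timestamps[latest]})")
print("Chronological order:", by_time)
```

In this sample, numeric order and chronological order agree, so max() and the timestamp sort point to the same release.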

CAVEclient Basics

The most frequent use of the CAVEclient is to query the database for annotations like synapses. All database functions are under the client.materialize property. To see what tables are available, use the get_tables function:

client.materialize.get_tables()
['synapses_pni_2',
 'nucleus_detection_v0',
 'vortex_manual_nodes_of_ranvier',
 'bodor_pt_target_proofread',
 'baylor_gnn_cell_type_fine_model_v2',
 'nucleus_alternative_points',
 'nucleus_functional_area_assignment',
 'coregistration_auto_phase3_fwd_apl_vess_combined_v2',
 'aibs_metamodel_mtypes_v661_v2_corrections',
 'vortex_thalamic_proofreading_status',
 'allen_column_mtypes_v2',
 'proofreading_status_and_strategy',
 'bodor_pt_cells',
 'aibs_metamodel_mtypes_v661_v2',
 'aibs_metamodel_celltypes_v661_corrections',
 'vortex_microglia_proofreading_status',
 'allen_v1_column_types_slanted_ref',
 'multi_input_spine_predictions_ssa',
 'aibs_column_nonneuronal_ref',
 'nucleus_ref_neuron_svm',
 'synapse_target_structure',
 'myelin_auto_tags_2points',
 'apl_functional_coreg_vess_fwd',
 'vortex_axon_backtrace_column',
 'cell_type_multifeature_combo',
 'vortex_compartment_targets',
 'baylor_log_reg_cell_type_coarse_v1',
 'vortex_synapse_reattachment',
 'coregistration_auto_phase3_fwd_v2',
 'synapse_target_predictions_ssa_v2',
 'gamlin_2023_mcs',
 'l5et_column',
 'pt_synapse_targets',
 'vortex_peptidergic_proofreading_status',
 'coregistration_manual_v4',
 'cg_cell_type_calls',
 'digital_twin_properties_bcm_coreg_v4',
 'synapse_spine_mapping_v2',
 'vortex_astrocyte_proofreading_status',
 'digital_twin_properties_bcm_coreg_auto_phase3_fwd_v2',
 'digital_twin_properties_bcm_coreg_apl_vess_fwd',
 'gamlin_2023_mcs_met_types',
 'vortex_manual_myelination_v0',
 'synapse_target_predictions_ssa',
 'aibs_metamodel_celltypes_v661']

For each table, you can see the metadata describing that table. For example, let’s look at the nucleus_detection_v0 table:

client.materialize.get_table_metadata('nucleus_detection_v0')
{'valid': True,
 'created': '2020-11-02T18:56:35.530100',
 'schema': 'nucleus_detection',
 'aligned_volume': 'minnie65_phase3',
 'table_name': 'nucleus_detection_v0',
 'id': 89398,
 'schema_type': 'nucleus_detection',
 'user_id': '121',
 'description': 'A table of nuclei detections from a nucleus detection model developed by Shang Mu, Leila Elabbady, Gayathri Mahalingam and Forrest Collman. Pt is the centroid of the nucleus detection. id corresponds to the flat_segmentation_source segmentID. Only included nucleus detections of volume>25 um^3, below which detections are false positives, though some false positives above that threshold remain. ',
 'notice_text': None,
 'reference_table': None,
 'flat_segmentation_source': 'precomputed://https://bossdb-open-data.s3.amazonaws.com/iarpa_microns/minnie/minnie65/nuclei',
 'write_permission': 'PRIVATE',
 'read_permission': 'PUBLIC',
 'last_modified': '2022-10-25T19:24:28.559914',
 'segmentation_source': '',
 'pcg_table_name': 'minnie3_v1',
 'last_updated': '2026-05-12T00:00:00.157782',
 'voxel_resolution': [4.0, 4.0, 40.0]}

You get a dictionary of values. Two fields are particularly important: description, which offers a text description of the contents of the table, and voxel_resolution, which gives the units of the coordinates in the table in nm/voxel.
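Because coordinates are stored in voxel units, converting them to physical units means multiplying each axis by the matching voxel_resolution entry (and dividing by the target resolution). A small worked sketch, using the [4, 4, 40] resolution above and a nucleus centroid that appears in the query output further down this page; the rescale helper is hypothetical and only illustrates the conversion that the desired_resolution query option performs for you.

```python
def rescale(point, source_resolution, target_resolution=(1, 1, 1)):
    """Convert an (x, y, z) point between voxel resolutions given in nm/voxel.

    Hypothetical helper for illustration; the default target (1, 1, 1) means nanometers.
    """
    return [p * s / t for p, s, t in zip(point, source_resolution, target_resolution)]

voxel_resolution = [4.0, 4.0, 40.0]      # from the table metadata above
pt_position = [381312, 273984, 19993]    # a sample nucleus centroid, in voxels

in_nm = rescale(pt_position, voxel_resolution)
print(in_nm)  # x and y scale by 4, z by 40

# Going back from nanometers to 4x4x40 nm voxels recovers the original point.
print(rescale(in_nm, [1, 1, 1], voxel_resolution))
```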

Annotation tables

You can also find a semantic description of the most commonly used tables at the Annotation Tables page.

Querying Tables

To get the contents of a table, use the query_table function. This returns the whole contents of a table without any filtering, up to a maximum of 200,000 rows. The table is returned as a pandas DataFrame, so you can immediately use standard pandas functions on it.

nucleus_df = client.materialize.query_table('nucleus_detection_v0')
nucleus_df.head()
id created superceded_id valid volume pt_supervoxel_id pt_root_id pt_position bb_start_position bb_end_position
0 730537 2020-09-28 22:40:41.780734+00:00 <NA> True 32.307938 0 0 [381312, 273984, 19993] [<NA>, <NA>, <NA>] [<NA>, <NA>, <NA>]
1 373879 2020-09-28 22:40:41.781788+00:00 <NA> True 229.045044 96218056992431305 864691136090135607 [228816, 239776, 19593] [<NA>, <NA>, <NA>] [<NA>, <NA>, <NA>]
2 601340 2020-09-28 22:40:41.782714+00:00 <NA> True 426.138 0 0 [340000, 279152, 20946] [<NA>, <NA>, <NA>] [<NA>, <NA>, <NA>]
3 201858 2020-09-28 22:40:41.783784+00:00 <NA> True 93.753838 84955554103121097 864691135373893678 [146848, 213600, 26267] [<NA>, <NA>, <NA>] [<NA>, <NA>, <NA>]
4 600774 2020-09-28 22:40:41.785273+00:00 <NA> True 135.189789 0 0 [339120, 276112, 19442] [<NA>, <NA>, <NA>] [<NA>, <NA>, <NA>]
Caution

While most tables are small enough to be returned in full, the synapse table has hundreds of millions of rows and is too large to download this way.

Tables have a collection of columns: some specify a point in space (columns ending in _position), some a root id (ending in _root_id), and others contain other information about the object at that point. Before describing some of the most important tables in the database, it’s useful to know about a few advanced options that apply when querying any table.

  • desired_resolution : This parameter allows you to convert the columns specifying spatial points to different resolutions. Many tables are stored at a resolution of 4x4x40 nm/voxel, for example, but you can convert to nanometers by setting desired_resolution=[1,1,1].
  • split_positions : This parameter allows you to split the columns specifying spatial points into separate columns for each dimension. The new column names will be the original column name with _x, _y, and _z appended.
  • select_columns : This parameter allows you to get only a subset of columns from the table. Once you know exactly what you want, this can save you some cleanup.
  • limit : This parameter allows you to limit the number of rows returned. If you are just testing out a query or trying to inspect the kind of data within a table, you can set this to a small number to make sure it works before downloading the whole table. Note that this will show a warning so that you don’t accidentally limit your query when you don’t mean to.
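To make split_positions and select_columns concrete, here is a plain-Python sketch that applies the same two transformations to a single row represented as a dict. The values are copied from row 1 of the nucleus_detection_v0 output above. The helper functions are hypothetical and illustrate the idea only; they are not how CAVEclient implements these options.

```python
def select_columns(row, columns):
    """Keep only the requested columns, in the requested order."""
    return {c: row[c] for c in columns}

def split_position_columns(row):
    """Replace each *_position column holding [x, y, z] with *_x, *_y, *_z columns."""
    out = {}
    for name, value in row.items():
        if name.endswith("_position"):
            for axis, coord in zip("xyz", value):
                out[f"{name}_{axis}"] = coord
        else:
            out[name] = value
    return out

# One row from the nucleus_detection_v0 query output above (subset of columns).
row = {
    "id": 373879,
    "volume": 229.045044,
    "pt_root_id": 864691136090135607,
    "pt_position": [228816, 239776, 19593],
}

subset = select_columns(row, ["pt_position", "pt_root_id"])
print(split_position_columns(subset))
```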

For example, using all of these together:

nucleus_df = client.materialize.query_table(
    'nucleus_detection_v0',
    split_positions=True,
    desired_resolution=[1, 1, 1],
    select_columns=['pt_position', 'pt_root_id'],
    limit=10,
)
nucleus_df
pt_position_x pt_position_y pt_position_z pt_supervoxel_id pt_root_id
0 241856 374464 838720 0 0
1 227200 389120 797160 0 0
2 230144 422336 795320 0 0
3 239488 386432 794120 0 0
4 239744 423488 803120 72978435697419638 864691136050815731
5 245888 384512 800120 0 0
6 249792 391680 807080 0 0
7 243328 403008 794280 0 0
8 247872 386816 805320 0 0
9 260352 416640 802360 73752285724957558 864691135013273238

Filtering Queries

Filtering tables so that you only get data about certain rows back is a very common operation. While there are filtering options in the query_table function (see the documentation for more details), a more unified way to filter is available through the “table manager” interface.

Rather than passing a table name to the query_table function, client.materialize.tables has a subproperty for each table in the database that can be used to filter that table.

The general pattern for usage is

client.materialize.tables.{table_name}({filter options}).query({format and timestamp options})

where {table_name} is the name of the table you want to filter, {filter options} is a collection of arguments for filtering the query, and {format and timestamp options} are those parameters controlling the format and timestamp of the query.

For example, let’s look at the table aibs_metamodel_celltypes_v661, which has cell type predictions across the dataset. We can get the whole table as a DataFrame:

cell_type_df = client.materialize.tables.aibs_metamodel_celltypes_v661().query()
cell_type_df.head()
id created valid volume pt_supervoxel_id pt_root_id id_ref created_ref valid_ref target_id classification_system cell_type pt_position bb_start_position bb_end_position
0 336365 2020-09-28 22:42:48.966292+00:00 True 272.48819 93606511657924288 864691136274724621 36916 2023-12-19 22:47:18.659864+00:00 True 336365 excitatory_neuron 5P-IT [209760, 180832, 27076] [<NA>, <NA>, <NA>] [<NA>, <NA>, <NA>]
1 110648 2020-09-28 22:45:09.650639+00:00 True 328.533447 79385153184885329 864691135489403194 1070 2023-12-19 22:38:00.472115+00:00 True 110648 excitatory_neuron 23P [106448, 129632, 25410] [<NA>, <NA>, <NA>] [<NA>, <NA>, <NA>]
2 112071 2020-09-28 22:43:34.088785+00:00 True 272.929413 79035988248401958 864691136147292311 1099 2023-12-19 22:38:00.898837+00:00 True 112071 excitatory_neuron 23P [103696, 149472, 15583] [<NA>, <NA>, <NA>] [<NA>, <NA>, <NA>]
3 197927 2020-09-28 22:43:10.652649+00:00 True 91.308853 84529699506051734 864691135655940290 13259 2023-12-19 22:41:14.417986+00:00 True 197927 nonneuron oligo [143600, 186192, 26471] [<NA>, <NA>, <NA>] [<NA>, <NA>, <NA>]
4 198087 2020-09-28 22:41:36.677186+00:00 True 161.74498 83756261929388963 864691135809440972 13271 2023-12-19 22:41:14.685474+00:00 True 198087 nonneuron astrocyte [137952, 190944, 27361] [<NA>, <NA>, <NA>] [<NA>, <NA>, <NA>]

and we can pass the same formatting options as in the previous section to the query function:

cell_type_df = client.materialize.tables.aibs_metamodel_celltypes_v661().query(split_positions=True, desired_resolution=[1,1,1], select_columns=['pt_position', 'pt_root_id', 'cell_type'], limit=10)
cell_type_df
cell_type pt_position_x pt_position_y pt_position_z pt_supervoxel_id pt_root_id
0 23P 257600 487936 802760 73613884698831796 864691135724233643
1 23P 260992 493568 801560 73754828345558830 864691136436395166
2 NGC 256256 466432 831040 73613197571380105 864691135462260637
3 23P 255744 480640 833200 73543309863605007 864691136723556861
4 23P 262144 505856 824880 73755240729611855 864691135776658528
5 23P 257536 521728 804440 73615052929975022 864691135941166708
6 23P 251840 552896 832320 73404977556816286 864691135545065768
7 23P 251136 546048 821320 73404771398156190 864691135479369926
8 23P 256000 626368 814000 73548188879300688 864691135697633557
9 astrocyte 324096 417920 658880 75933716324660175 864691135937358133

However, now we can also filter the table to get only cells that are predicted to have cell type "BC" (for “basket cell”).

my_cell_type = "BC"
client.materialize.tables.aibs_metamodel_celltypes_v661(cell_type=my_cell_type).query()
id created valid volume pt_supervoxel_id pt_root_id id_ref created_ref valid_ref target_id classification_system cell_type pt_position bb_start_position bb_end_position
0 369908 2020-09-28 22:40:41.814964+00:00 True 332.862762 96002690286851358 864691136276011533 43009 2023-12-19 22:48:53.577191+00:00 True 369908 inhibitory_neuron BC [227104, 207840, 20841] [<NA>, <NA>, <NA>] [<NA>, <NA>, <NA>]
1 193846 2020-09-28 22:40:41.897904+00:00 True 306.148956 82838443188669165 864691135578780933 12051 2023-12-19 22:40:57.133228+00:00 True 193846 inhibitory_neuron BC [131568, 168496, 16452] [<NA>, <NA>, <NA>] [<NA>, <NA>, <NA>]
2 615735 2020-09-28 22:40:41.957345+00:00 True 314.539551 112181247505371364 864691135183493378 83044 2023-12-19 22:58:50.269173+00:00 True 615735 inhibitory_neuron BC [344880, 161104, 17084] [<NA>, <NA>, <NA>] [<NA>, <NA>, <NA>]
3 586907 2020-09-28 22:40:42.170393+00:00 True 377.92514 111408977891427617 864691136116402340 78306 2023-12-19 22:57:37.668262+00:00 True 586907 inhibitory_neuron BC [339488, 174320, 15957] [<NA>, <NA>, <NA>] [<NA>, <NA>, <NA>]
4 587191 2020-09-28 22:40:42.173423+00:00 True 221.326218 109931303318731804 864691135884430704 78369 2023-12-19 22:57:38.571850+00:00 True 587191 inhibitory_neuron BC [328384, 174816, 18530] [<NA>, <NA>, <NA>] [<NA>, <NA>, <NA>]
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3360 434713 2020-09-28 22:45:25.293700+00:00 True 496.550049 100367614546691106 864691135811413196 54887 2023-12-19 22:51:51.332495+00:00 True 434713 inhibitory_neuron BC [258672, 223072, 24681] [<NA>, <NA>, <NA>] [<NA>, <NA>, <NA>]
3361 170777 2020-09-28 22:45:25.310708+00:00 True 499.103668 81230957054577082 864691135065994564 8968 2023-12-19 22:40:09.246333+00:00 True 170777 inhibitory_neuron BC [119600, 250560, 15373] [<NA>, <NA>, <NA>] [<NA>, <NA>, <NA>]
3362 208056 2020-09-28 22:45:25.401800+00:00 True 521.621643 84540007091735344 864691135801456226 15548 2023-12-19 22:41:48.382554+00:00 True 208056 inhibitory_neuron BC [143472, 262944, 23693] [<NA>, <NA>, <NA>] [<NA>, <NA>, <NA>]
3363 438586 2020-09-28 22:45:25.430745+00:00 True 529.501404 99807894274485381 864691135212348352 55791 2023-12-19 22:52:02.582669+00:00 True 438586 inhibitory_neuron BC [254912, 247440, 23680] [<NA>, <NA>, <NA>] [<NA>, <NA>, <NA>]
3364 419363 2020-09-28 22:45:25.436862+00:00 True 530.6427 99716496901116512 864691135294355638 50504 2023-12-19 22:50:48.576826+00:00 True 419363 inhibitory_neuron BC [254416, 90336, 20469] [<NA>, <NA>, <NA>] [<NA>, <NA>, <NA>]

3365 rows × 15 columns
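The cell_type filter is passed to the database with the query, so only the matching rows are downloaded rather than the whole table. As a rough local analogy, the sketch below applies the same kind of equality filter in plain Python to the ten (cell_type, pt_root_id) pairs returned by the limit=10 query earlier. It filters on 23P because no basket cells appear in that small sample.

```python
from collections import Counter

# (cell_type, pt_root_id) pairs copied from the limit=10 query output above.
rows = [
    ("23P", 864691135724233643),
    ("23P", 864691136436395166),
    ("NGC", 864691135462260637),
    ("23P", 864691136723556861),
    ("23P", 864691135776658528),
    ("23P", 864691135941166708),
    ("23P", 864691135545065768),
    ("23P", 864691135479369926),
    ("23P", 864691135697633557),
    ("astrocyte", 864691135937358133),
]

my_cell_type = "23P"
# Equality filter: keep the root ids of rows whose cell_type matches exactly.
matches = [root_id for cell_type, root_id in rows if cell_type == my_cell_type]

# Count how many rows fall into each cell type in this sample.
counts = Counter(cell_type for cell_type, _ in rows)

print(len(matches))
print(counts)
```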

or maybe we just want the cell types for a particular collection of root ids:

my_root_ids = [864691135771677771, 864691135560505569, 864691136723556861]
client.materialize.tables.aibs_metamodel_celltypes_v661(pt_root_id=my_root_ids).query()
id created valid volume pt_supervoxel_id pt_root_id id_ref created_ref valid_ref target_id classification_system cell_type pt_position bb_start_position bb_end_position
0 19116 2020-09-28 22:41:51.767906+00:00 t 301.426115 74737997899501359 864691135771677771 11282 2023-12-19 22:40:43.249642+00:00 t 19116 excitatory_neuron 23P [72576, 108656, 20291] [nan, nan, nan] [nan, nan, nan]
1 21783 2020-09-28 22:41:59.966574+00:00 t 263.637074 75795590176519004 864691135560505569 15681 2023-12-19 22:41:50.365399+00:00 t 21783 excitatory_neuron 23P [80128, 124000, 16563] [nan, nan, nan] [nan, nan, nan]
2 4074 2020-09-28 22:42:41.341179+00:00 t 313.678234 73543309863605007 864691136723556861 50080 2023-12-19 22:50:42.474168+00:00 t 4074 excitatory_neuron 23P [63936, 120160, 20830] [nan, nan, nan] [nan, nan, nan]
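Passing a list for pt_root_id keeps rows whose root id is any of the listed values, which is a membership test rather than an equality test. It is also worth checking which requested ids came back empty. A minimal local sketch, assuming a small sample of rows copied from the limit=10 output earlier: only one of the three requested ids appears in that sample, so the other two are reported as missing. With a live query, an id can also go missing because the object has no row in that table, or because its root id has changed through proofreading since it was recorded.

```python
# {pt_root_id: cell_type}: a few rows copied from the limit=10 output earlier.
sample = {
    864691135724233643: "23P",
    864691135462260637: "NGC",
    864691136723556861: "23P",
    864691135937358133: "astrocyte",
}

my_root_ids = [864691135771677771, 864691135560505569, 864691136723556861]
wanted = set(my_root_ids)  # a set makes each membership check O(1)

# Membership filter: keep rows whose root id is any of the requested ids.
found = {rid: ct for rid, ct in sample.items() if rid in wanted}
# Requested ids that matched no row in the sample.
missing = [rid for rid in my_root_ids if rid not in found]

print(found)
print(missing)
```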

You can get a list of all parameters that can be used for querying with the standard IPython/Jupyter docstring functionality, e.g. by evaluating client.materialize.tables.aibs_metamodel_celltypes_v661? in a notebook cell.

Caution

Use of this functionality will show a brief warning that the interface is experimental. This is because the interface is still being developed and may change in the near future in response to user feedback.

Querying Proofread neurons

Proofread neurons

Proofreading is necessary to obtain accurate reconstructions of a cell. In the MICrONS dataset, the general rule is that the dendrites of cells with a single cell body are sufficiently proofread to trust the synaptic connections onto them. Axons, on the other hand, require so much proofreading that only ~1,000 cells have axons proofread to varying degrees such that their outputs can be used for analysis.

The table proofreading_status_and_strategy contains proofreading information about ~1,300 neurons. This website provides the most detailed overview. In brief, axons annotated with any strategy_axon value other than none were cleaned of false merges, but not all were fully extended. The most important distinction is that axons annotated with axon_column_truncated were only proofread within a certain volume, whereas the others were proofread without such a spatial bias.

proof_all_df = client.materialize.query_table("proofreading_status_and_strategy", desired_resolution=[1, 1, 1], split_positions=True)
proof_all_df["strategy_axon"].value_counts()
strategy_axon
axon_partially_extended    979
axon_column_truncated      233
none                       185
axon_interareal            144
axon_fully_extended         80
Name: count, dtype: int64

We can restrict the query to rows that match a condition by passing a filter_in_dict argument, which keeps only rows whose strategy_axon value is in the given list:

proof_df = client.materialize.query_table("proofreading_status_and_strategy", filter_in_dict={"strategy_axon": ["axon_partially_extended", "axon_fully_extended", "axon_interareal", "axon_column_truncated"]}, desired_resolution=[1, 1, 1], split_positions=True)
proof_df["strategy_axon"].value_counts()
strategy_axon
axon_column_truncated      598
axon_partially_extended    341
axon_interareal            146
axon_fully_extended         77
Name: count, dtype: int64
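Because axon_column_truncated axons were only proofread within a restricted volume, some analyses may want to leave them out. The sketch below builds the filter_in_dict argument from the strategy names shown above, with a switch to drop that group. The axon_filter helper is hypothetical; the live query it feeds is shown only as a comment because it needs a configured client.

```python
# Axon proofreading strategies from the value counts above (excluding "none").
AXON_STRATEGIES = [
    "axon_fully_extended",
    "axon_partially_extended",
    "axon_interareal",
    "axon_column_truncated",
]

def axon_filter(include_column_truncated=True):
    """Hypothetical helper: build a filter_in_dict for proofreading_status_and_strategy.

    With include_column_truncated=False, axons proofread only within a restricted
    volume (axon_column_truncated) are excluded.
    """
    strategies = [
        s for s in AXON_STRATEGIES
        if include_column_truncated or s != "axon_column_truncated"
    ]
    return {"strategy_axon": strategies}

unbiased = axon_filter(include_column_truncated=False)
print(unbiased)

# With a configured client, the dict is passed exactly like the one above:
# client.materialize.query_table("proofreading_status_and_strategy",
#     filter_in_dict=unbiased, desired_resolution=[1, 1, 1], split_positions=True)
```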