Obtaining Data for Single Cells#

This section will serve as an introduction to the Allen Cell Types database. We’ll work with the AllenSDK to see what information we can gain about our cells.

First, let’s make sure you have an updated version of the Allen Institute Database installed. If you do not have the version below, this cell will install it for you. There is more information on installing the AllenSDK in the install guide.

# This will ensure that the AllenSDK is installed.
# If not, it will install it for you.
try:
    import allensdk
    if allensdk.__version__ == '2.11.2':
        print('allensdk already installed.')
    else: 
        print('incompatible version of allensdk installed')
        !pip install allensdk==2.11.2
except ImportError as e:
    !pip install allensdk==2.11.2
Collecting allensdk==2.11.2
  Using cached allensdk-2.11.2-py3-none-any.whl (1.7 MB)
Collecting psycopg2-binary<3.0.0,>=2.7
  Using cached psycopg2_binary-2.9.6-cp311-cp311-macosx_10_9_x86_64.whl (2.2 MB)
Collecting hdmf<2.5.0
  Using cached hdmf-2.4.0-py2.py3-none-any.whl (149 kB)
Collecting h5py<3.0.0,>=2.8
  Using cached h5py-2.10.0.tar.gz (301 kB)
  Preparing metadata (setup.py) ... ?25l-
 done
?25hRequirement already satisfied: matplotlib<4.0.0,>=1.4.3 in /Users/ashley/anaconda3/envs/jb/lib/python3.11/site-packages (from allensdk==2.11.2) (3.7.1)
Collecting numpy<1.19.0,>=1.15.4
  Using cached numpy-1.18.5.zip (5.4 MB)
  Installing build dependencies ... ?25l-
 \
 |
 /
 done
?25h  Getting requirements to build wheel ... ?25l-
 \ done
?25h  Preparing metadata (pyproject.toml) ... ?25l-
 \
 |
 /
 -
 \
 |
 /
 -
 \
 done
?25hCollecting pandas<=0.25.3,>=0.25.1
  Downloading pandas-0.25.3.tar.gz (12.6 MB)
?25l     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/12.6 MB ? eta -:--:--
^C

     ━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.2/12.6 MB 29.8 MB/s eta 0:00:01
?25hERROR: Operation cancelled by user

Next, we’ll import the CellTypesCache module. This module provides tools to allow us to access information from the Allen Cell Types database. We’re passing it a manifest filename as well. CellTypesCache will store the data in the path specified by the manifest filename. You can look under cell_types in your directory, and take a look at the file.

If you’re curious, you can see the full source documentation for the core package on the Allen Brain Atlas website

# Import the "Cell Types Cache" from the AllenSDK core package
from allensdk.core.cell_types_cache import CellTypesCache

# Import CellTypesApi, which will allow us to query the database.
from allensdk.api.queries.cell_types_api import CellTypesApi

# Import packages 
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt 
%matplotlib inline 
import warnings
warnings.filterwarnings('ignore')

# We'll then initialize the cache as 'ctc' (cell types cache)
ctc = CellTypesCache(manifest_file='cell_types/manifest.json')

print('Packages were successfully imported.')
Packages were successfully imported.

Get Available Cells#

The get_cells() method downloads metadata for cells in the database. The database contains human cells and mouse cells. By default, get_cells() downloads metadata for all cells in the database. Alternatively, you can filter the database to only include cells collected from a certain species. Look through the documentation for the CellTypesCache for more information on the get_cells method.

# Download metadata for all cells in the database
all_cells = ctc.get_cells()

# Convert it to a dataframe with the cell id as the index
all_cells_df = pd.DataFrame(all_cells).set_index('id')

print('Number of cells:',len(all_cells_df))
all_cells_df.head()
Number of cells: 2333
reporter_status cell_soma_location species name structure_layer_name structure_area_id structure_area_abbrev transgenic_line dendrite_type apical reconstruction_type disease_state donor_id structure_hemisphere normalized_depth
id
525011903 None [273.0, 354.0, 216.0] Homo Sapiens H16.03.003.01.14.02 3 12113 FroL spiny intact None epilepsy 524848408 right NaN
565871768 positive [8966.56330957526, 1429.52689052075, 8567.3896... Mus musculus Oxtr-2A-Cre;Ai14-293837.03.01.02 5 385 VISp Oxtr-T2A-Cre aspiny NA None 563340157 left 0.582057
469801138 positive [7872.53138541818, 993.212032389272, 3127.1530... Mus musculus Pvalb-IRES-Cre;Ai14-170927.05.02.01 4 385 VISp Pvalb-IRES-Cre aspiny NA None 319194354 right 0.469578
528642047 None [69.0, 254.0, 96.0] Homo Sapiens H16.06.009.01.02.06.05 5 12141 MTG aspiny NA None epilepsy 528574320 left NaN
605889373 positive [9400.0, 1520.74232706376, 2188.13845194139] Mus musculus Vipr2-IRES2-Cre;Slc32a1-T2A-FlpO;Ai65-337419.0... 2/3 385 VISp Slc32a1-T2A-FlpO|Vipr2-IRES2-Cre aspiny NA None 605622695 right NaN

The Allen Institute for Brain Science uses transgenic mouse lines that have Cre-expressing cells to mark specific types of cells in the brain. Lets find out what cre-lines are availabe in our mouse data under the column transgenic_line.

We can take a look at how many different values are stored within a column using the .unique() method.

all_cells_df['transgenic_line'].unique()
array(['', 'Oxtr-T2A-Cre', 'Pvalb-IRES-Cre',
       'Slc32a1-T2A-FlpO|Vipr2-IRES2-Cre', 'Cux2-CreERT2',
       'Scnn1a-Tg3-Cre', 'Htr3a-Cre_NO152', 'Rorb-IRES2-Cre',
       'Ctgf-T2A-dgCre', 'Nkx2-1-CreERT2', 'Vip-IRES-Cre',
       'Chrna2-Cre_OE25', 'Nos1-CreERT2|Sst-IRES-FlpO', 'Scnn1a-Tg2-Cre',
       'Nos1-CreERT2', 'Ntsr1-Cre_GN220', 'Esr2-IRES2-Cre',
       'Tlx3-Cre_PL56', 'Vipr2-IRES2-Cre',
       'Chrna2-Cre_OE25|Pvalb-T2A-Dre', 'Nr5a1-Cre', 'Ndnf-IRES2-dgCre',
       'Sst-IRES-Cre', 'Gad2-IRES-Cre', 'Chat-IRES-Cre-neo',
       'Slc17a6-IRES-Cre', 'Sim1-Cre_KJ18', 'Glt25d2-Cre_NF107',
       'Rbp4-Cre_KL100', 'Penk-IRES2-Cre-neo',
       'Htr3a-Cre_NO152|Pvalb-T2A-Dre', 'Esr2-IRES2-Cre|PhiC31-neo',
       'Gng7-Cre_KH71', 'Pvalb-T2A-FlpO|Vipr2-IRES2-Cre',
       'Pvalb-T2A-CreERT2', 'Esr2-IRES2-Cre-neo|PhiC31-neo',
       'Esr2-IRES2-Cre-neo', 'Vipr2-IRES2-Cre-neo'], dtype=object)

For more information on the Allen Cell Types Cre lines, please visit the Allen Cell Types Cre Lines Glossary for a short description of the cortical expression patterns of the transgenic Cre lines available in the Allen Cell Types dataset.

Get Morphology data#

The dataframe above only contains metadeta about our cells and no information on the morphology or electrophysiology of our cells. In order to get information about the morphology of these cells, we need to use the get_morphology_features() method on our instance of the cell types cache. We will set the indices to be the specimen_id because these ids will align with those in all_cells_df.

# Downloads the morphology features and sets up the dataframe all in one line
morphology_df = pd.DataFrame(ctc.get_morphology_features()).set_index('specimen_id')
print('Length of dataframe:',len(morphology_df))
morphology_df.head()
Length of dataframe: 701
average_bifurcation_angle_local average_bifurcation_angle_remote average_contraction average_diameter average_fragmentation average_parent_daughter_ratio hausdorff_dimension id max_branch_order max_euclidean_distance ... overall_width scale_factor_x scale_factor_y scale_factor_z soma_surface superseded tags total_length total_surface total_volume
specimen_id
478107198 82.727781 NaN 0.864267 0.345092 20.723077 0.964510 NaN 491119743 6.0 99.779724 ... 136.267522 0.1144 0.1144 0.28 435.740270 False 3D Neuron Reconstruction morphology 1666.082926 1803.875644 167.343086
502367941 82.506680 NaN 0.903890 0.634047 105.277778 0.862183 NaN 546781359 3.0 432.383110 ... 370.170045 0.1144 0.1144 0.28 1446.587725 False 3D Neuron Reconstruction morphology 2277.259374 4543.139073 921.571895
515771244 77.536678 NaN 0.863104 0.417929 73.666667 0.926633 NaN 537042261 6.0 373.630444 ... 381.015114 0.1144 0.1144 0.28 287.118123 False 3D Neuron Reconstruction morphology 3589.339062 4704.910407 582.285423
561435279 76.583222 NaN 0.900537 0.400396 95.979167 0.942049 NaN 689123605 11.0 943.382549 ... 524.550156 0.1144 0.1144 0.28 180.994813 False 3D Neuron Reconstruction morphology 5416.228778 6814.933290 740.722806
591268268 72.019250 NaN 0.873518 0.227626 47.535714 1.000000 NaN 657879305 5.0 186.218009 ... 261.459057 0.1144 0.1144 0.28 55.055236 False 3D Neuron Reconstruction morphology 1659.465869 1185.773462 69.144146

5 rows × 29 columns

Now we have two dataframes, one with the metadata for our cells, indexed by id, and another with the morphology data for all cells, also indexed by id. Usefully, these ids are unique to each cell, meaning we can match them across dataframes.

We’ll use join to unite our dataframes. Since for the moment we’re only interested in cells that have morphology data, we’ll join the dataframes “on the right,” effectively dropping all of the cells that are not in the morphology dataframe.

Note: Above, in get the get_cells() method, you can also restrict to experiments with morphology data by usnig the require_morphology = True argument.

# Combine our metadata with our morphology data
meta_morph_df = all_cells_df.join(morphology_df,how='right')
print('Number of cells:',len(meta_morph_df))
meta_morph_df.head()
Number of cells: 701
reporter_status cell_soma_location species name structure_layer_name structure_area_id structure_area_abbrev transgenic_line dendrite_type apical ... overall_width scale_factor_x scale_factor_y scale_factor_z soma_surface superseded tags total_length total_surface total_volume
313861608 positive [9349.44051500564, 1314.78991021449, 3245.6530... Mus musculus Pvalb-IRES-Cre;Ai14-165874.04.02.01 5 385 VISp Pvalb-IRES-Cre aspiny NA ... 358.994232 0.1144 0.1144 0.28 514.500853 False 3D Neuron Reconstruction morphology 3538.876780 6041.095769 945.093080
313862022 positive [9489.24257509068, 1423.65239028148, 2785.4430... Mus musculus Scnn1a-Tg2-Cre;Ai14-167081.03.02.01 4 385 VISp Scnn1a-Tg2-Cre spiny intact ... 287.122628 0.1144 0.1144 0.28 268.777679 False 3D Neuron Reconstruction morphology 3498.090031 3523.606841 306.290931
313862167 positive [9279.53948496313, 1831.03502009105, 2151.2443... Mus musculus Sst-IRES-Cre;Ai14-167638.05.02.01 4 409 VISl Sst-IRES-Cre aspiny NA ... 327.150383 0.1144 0.1144 0.28 361.118559 False 3D Neuron Reconstruction morphology 2644.878298 3883.698604 512.836253
313862167 positive [9279.53948496313, 1831.03502009105, 2151.2443... Mus musculus Sst-IRES-Cre;Ai14-167638.05.02.01 4 409 VISl Sst-IRES-Cre aspiny NA ... 327.150383 0.1144 0.1144 0.28 361.118559 True 3D Neuron Reconstruction morphology 2644.878347 3883.699514 512.835043
313862306 positive [8976.63502144556, 1878.14058922573, 8881.4968... Mus musculus Sst-IRES-Cre;Ai14-167636.04.01.01 6a 385 VISp Sst-IRES-Cre aspiny NA ... 242.781213 0.1144 0.1144 0.28 94.306278 False 3D Neuron Reconstruction morphology 2502.160541 1994.902243 141.878464

5 rows × 44 columns

The Cell Types Database also contains 3D reconstructions of neuronal morphologies. You can create these single cell reconstructions by executing the get_reconstruction() method on your instance of the cells type cache. To do so, you must specify what cell you want to reconstruct by inputing a specimen_id. This method returns a class instance with methods for accessing morphology compartments.

# Get the cell_id for the first cell in our dataframe
cell_id = meta_morph_df.index[0]

# Call for recontruction of desired cell
single_cell_morphology = ctc.get_reconstruction(specimen_id = cell_id)

# Get summary of cell reconstruction 
single_cell_morphology.soma
{'id': 0,
 'type': 1,
 'x': 396.0425,
 'y': 368.9697,
 'z': 26.6608,
 'radius': 6.4254,
 'parent': -1,
 'tree_id': 0,
 'children': [1, 487, 982, 1101, 1664, 1872, 2379, 3077]}

We now have a dictionary of our cell reconstruction. Note that the type field refers to the type of neuronal compartment. The values can be 1 for the soma, 2 for the axon, 3 for dendrites, and 4 for apical dendrites (if present). The x, y, and z, represent the spatial location of the cell soma.

Morphologies also come with marker files, which contain points of interest in the reconstruction. The marker file contains locations where dendrites have been truncated due to slicing and when axons were not reconstructed. The name field indicates the type of marker (10 for dendrite truncation, 20 for no reconstruction).

# Download and store markers for 3D reconstruction 
markers = ctc.get_reconstruction_markers(cell_id) 
markers
[{'x': 417.1024, 'y': 252.13760000000002, 'z': 18.48, 'name': 10},
 {'x': 411.4968, 'y': 325.0104, 'z': 18.48, 'name': 10},
 {'x': 429.4576, 'y': 364.1352, 'z': 19.32, 'name': 10},
 {'x': 378.3208, 'y': 363.3344, 'z': 17.36, 'name': 10},
 {'x': 393.8792, 'y': 404.5184, 'z': 17.08, 'name': 10},
 {'x': 386.9008, 'y': 309.2232, 'z': 19.88, 'name': 10},
 {'x': 366.652, 'y': 503.5888, 'z': 24.08, 'name': 10},
 {'x': 546.2244999999999, 'y': 189.7084, 'z': 129.2262, 'name': 10},
 {'x': 529.3048, 'y': 413.3364, 'z': 99.47279999999999, 'name': 10},
 {'x': 442.8424, 'y': 473.616, 'z': 101.36, 'name': 10},
 {'x': 374.7744, 'y': 475.904, 'z': 86.24, 'name': 10},
 {'x': 541.3716999999999, 'y': 216.2698, 'z': 85.21379999999999, 'name': 10},
 {'x': 431.7868, 'y': 205.0448, 'z': 89.7428, 'name': 10},
 {'x': 614.0992, 'y': 300.9864, 'z': 83.44, 'name': 10},
 {'x': 383.09360000000004, 'y': 371.9785, 'z': 16.929599999999997, 'name': 20},
 {'x': 396.1077, 'y': 368.7706, 'z': 17.968999999999998, 'name': 30}]

We can use this data to draw lines between each node and all its children to get a drawing of the cell. We’ll reconstruct an image from the front and side view of our cell.

Note: Since we’re reconstructing what is likely to be a very complicated neuron, this might take a minute to plot.

# Import necessary toolboxes
from allensdk.core.swc import Marker

# Set up our plot
fig, ax = plt.subplots(1, 2, sharey=True, sharex=True, figsize=(10,10))
ax[0].set_aspect('equal')
ax[1].set_aspect('equal')

# Make a line drawing of x-y and y-z views
for n in single_cell_morphology.compartment_list:
    for c in single_cell_morphology.children_of(n):
        ax[0].plot([n['x'], c['x']], [n['y'], c['y']], color='black')
        ax[1].plot([n['z'], c['z']], [n['y'], c['y']], color='black')

# Plot cut dendrite markers in red
dm = [ m for m in markers if m['name'] == Marker.CUT_DENDRITE ]
ax[0].scatter([m['x'] for m in dm], [m['y'] for m in dm], color='red')
ax[1].scatter([m['z'] for m in dm], [m['y'] for m in dm], color='red')

# Plot no reconstruction markers in blue
nm = [ m for m in markers if m['name'] == Marker.NO_RECONSTRUCTION ]
ax[0].scatter([m['x'] for m in nm], [m['y'] for m in nm], color='blue')
ax[1].scatter([m['z'] for m in nm], [m['y'] for m in nm], color='blue')
ax[0].set_ylabel('y')
ax[0].set_xlabel('x')
ax[1].set_xlabel('z')
ax[0].set_title('Front View')
ax[1].set_title('Side View')

# Show the plot
plt.show()
../_images/93ffe6053bf7d20fcaace78481c73f4ec5ab72e03da3f59e776362eeb989dee8.png

Get Electrophysiology Data#

Now that we know what our cell looks like, let’s take a look at its action potentials. The get_ephys_data() method can download electrophysiology traces for a single cell in the database. This method returns a class instance with helper methods for retrieving stimulus and response traces out of an NWB file. If we take a look at specimen_ephys_data we can see that it is an NWBDataSet object. In order to use get_ephys_data(), you must specify the id of the cell specimen whose electrophysiology you would like to download.

Below we go over methods that can be used to access the electrophysiology data for single cells, the source documentation for all the methods we cover can be found on the Allen Brain Atlas website. The get_experiment_sweep_numbers() method returns all of the sweep numbers for experiments in the file. Each sweep contains metadata and electrophysiology data.

# Get electrophysiological traces of our cell
specimen_ephys_data = ctc.get_ephys_data(specimen_id = cell_id)

# Retrieve sweep numbers for cell
sweep_numbers = specimen_ephys_data.get_experiment_sweep_numbers()

print(type(specimen_ephys_data))
print(sweep_numbers)
<class 'allensdk.core.nwb_data_set.NwbDataSet'>
[10, 100, 101, 102, 103, 104, 105, 108, 11, 12, 13, 14, 15, 17, 18, 19, 20, 21, 23, 24, 25, 26, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 39, 4, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 5, 50, 51, 52, 53, 54, 55, 57, 58, 59, 6, 60, 61, 62, 63, 64, 65, 66, 68, 69, 7, 71, 72, 73, 74, 75, 76, 77, 78, 79, 8, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 9, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]

Now that we have sweep numbers to choose from, we can take a look at a sweep’s metadata by calling get_sweep_metadata(). This returns a dictionary containing information such as stimulus paramaters and recording quality.

# Select a sweep number 
sweep_number = sweep_numbers[-1]

# Retrieve metadata for selected sweep
specimen_metadata = specimen_ephys_data.get_sweep_metadata(sweep_number)
print(specimen_metadata)
{'aibs_stimulus_amplitude_pa': 420.0, 'aibs_stimulus_name': 'Short Square - Triple', 'gain': 0.01, 'initial_access_resistance': 15.208168, 'seal': 1.4909363}

The get_sweep() method returns a dictionary containing the stimulus, response, index_range, and sampling rate for a particular sweep.

sweep_data = specimen_ephys_data.get_sweep(sweep_number)
print(sweep_data)
{'stimulus': array([0., 0., 0., ..., 0., 0., 0.], dtype=float32), 'response': array([-0.07440625, -0.0744375 , -0.074375  , ..., -0.07428125,
       -0.074375  , -0.07428125], dtype=float32), 'stimulus_unit': 'Amps', 'index_range': (150000, 618600), 'sampling_rate': 200000.0}

Now that you’ve pulled down some data, chosen a cell, and chosen a sweep number, let’s plot that data. We can look closer at the action potential by plotting the raw recording. Our sweep_data variable has all the data we need to plot our sweep; the stimulus current injected into our cell, the cell’s response, and sampling rate of the sweep.

# Get the stimulus trace (in amps) and convert to pA
stim_current = sweep_data['stimulus'] * 1e12

# Get the voltage trace (in volts) and convert to mV
response_voltage = sweep_data['response'] * 1e3

# Get the sampling rate and can create a time axis for our data
sampling_rate = sweep_data['sampling_rate'] # in Hz
timestamps = (np.arange(0, len(response_voltage)) * (1.0 / sampling_rate))

fig, ax = plt.subplots(2, 1, sharex=True, figsize=(9,9))

ax[0].set_title('Cell: '+str(cell_id)+', Sweep: '+str(sweep_number))
ax[0].plot(timestamps, response_voltage)
ax[0].set_ylabel('Cell Response (mV)')
ax[1].plot(timestamps, stim_current)
ax[1].set_xlabel('Time (sec)')
ax[1].set_ylabel('Input Current (nA)')
ax[1].set_xlim([2,2.2])

plt.show()
../_images/45cb735e472974ae66bfb3f8b9119c09a2153b007a3cf2331675a5b41cf89de7.png

That’s how you can dig into the morphology and raw electrophysiology traces for single cells. In the next notebook, we’ll take a look at some of the features of these action potentials.