Working with NWB in Python#

On the previous page, we demonstrated how to obtain a dataset with DANDI. Now that you have a dataset downloaded, let’s take a closer look at what it contains.

Working with our NWB file in Python requires yet another package, PyNWB. This is a package specifically designed to work with NWB files. As you might expect, this won’t be the last time we see this package. Below, we’ll use the NWBHDF5IO class from this package, which will allow us to easily read NWB files. For more information about PyNWB, please read their documentation.

Note: Please ensure that you have run the environment [setup] before running this notebook.

Step 1. Setup#

# Import modules from the PyNWB package
from pynwb import NWBHDF5IO

Step 2. Read the NWB file#

We can access the data in our NWB file in two steps:

  1. Assign our file as an NWBHDF5IO object

  2. Read our file

The first step is done using the NWBHDF5IO class to create our NWBHDF5IO object and map our file to HDF5 format. Once we have done this, we can use the read() method to return our nwb file.

For more information on how to read NWB files, please visit the Reading an NWB file section from the NWB Basics Tutorial.

Note: Each dataset may contain multiple NWB files for different subjects and sessions for a given experiment. Make sure you specify the exact file path to the single NWB file you wish to read. Below, we’ll give the filename for one .nwb file within the folder that you downloaded in the last chapter.

# set the filename
filename = '000006/sub-anm369962/sub-anm369962_ses-20170310.nwb'

# assign file as an NWBHDF5IO object
io = NWBHDF5IO(filename, 'r')

# read the file
nwb_file = io.read()

print('NWB file found and read.')
print(type(nwb_file))
NWB file found and read.
<class 'pynwb.file.NWBFile'>

Step 3. Access Information within the NWB File Hierarchy#

One of the first steps when working with a new dataset is to figure out what is in the dataset, and where. Each NWB file is composed of various groups, which either contain attributes of our file (metadata) or the data itself.

Metadata is a common term to describe all of the information about an experiment. This could include everything from when the experiment was conducted, the ID of the subject (animal, human, goblin, etc.), the equipment details, etc. In essence, the metadata provides the context of the experiment. This is one of the first things you should review when you’re encountering a new dataset.

Here is the structure of a typical NWB file:

NWB_file_structure.png

In order to see which groups are in our file, we can use the fields attribute to return a dictionary containing the the Groups of our NWB file. The dictionary keys are the various groups within the file which we will use to access the data we’re ultimately interested in.

Need a refresher on dictionaries? Check out the Python Review page.

# Get the Groups for the nwb file 
nwb_fields = nwb_file.fields
print(nwb_fields.keys())
dict_keys(['session_description', 'identifier', 'session_start_time', 'timestamps_reference_time', 'file_create_date', 'experimenter', 'related_publications', 'acquisition', 'keywords', 'epoch_tags', 'electrodes', 'electrode_groups', 'devices', 'intervals', 'subject', 'trials', 'units', 'experiment_description', 'institution'])

Experiment Metadata#

Let’s first pull out some metadata for the experiment we downloaded.

If you wish to access the related publications of the experimental data that you just downloaded, you can do so by accessing the related_publications attribute of your NWB file object. Plug in the “doi:” address that prints below into a browser window to check out the original publication describing this data.

# Print the related publication
nwb_file.related_publications
('doi:10.1038/s41586-018-0642-9',)

Each NWB file will also have information on where the experiment was conducted, which lab conducted the experiment, as well as a description of the experiment. This information can be accessed using institution, lab, and experiment_description, attributes on our nwb_file, respectively.

# Get metadata from NWB file 
print('The experiment within this NWB file was conducted at',nwb_file.institution,'.'\
      ,nwb_file.experiment_description)
The experiment within this NWB file was conducted at Janelia Research Campus . Extracellular electrophysiology recordings performed on mouse anterior lateral motor cortex (ALM) in delay response task. Neural activity from two neuron populations, pyramidal track upper and lower, were characterized, in relation to movement execution.

As you might have noticed at this point, we can access datasets from each group in our nwb_file with the following syntax: nwb_file.GROUPNAME, just as we would typically access an attribute of object in Python. Below we will demonstrate some of the most useful groups within an NWB object.

Acquisition#

The acquisition group contains datasets of acquisition data, mainly TimeSeries objects belonging to this NWBFile.

nwb_file.acquisition
{'lick_times': lick_times pynwb.behavior.BehavioralEvents at 0x4850678480
 Fields:
   time_series: {
     lick_left_times <class 'pynwb.base.TimeSeries'>,
     lick_right_times <class 'pynwb.base.TimeSeries'>
   }}

In this file, the acquisition group contains one dataset, lick_times. This dataset has one field, time_series, which contains two time series objects, lick_left_times and lick_right_times. To access the actual data arrays of these objects we must first subset our dataset of interest from the group. We can then use timestamps[:] to return a list of timestamps for when the animal licked.

# select our dataset of interest 
dataset = 'lick_times'
field = 'lick_right_times'

lick_r_dataset = nwb_file.acquisition[dataset][field]

# return the first 10 values in data array 
lick_r_data_array = lick_r_dataset.timestamps[:10][:10]

print(lick_r_data_array)
[336.727461 336.898631 337.144963 337.223463 338.218792 338.579457
 338.927119 338.959289 339.451954 339.587953]

Intervals#

The intervals group contains all time interval tables from the experiment – things like, did the animal respond on the behavioral trial? Usefully, we can take intervals and convert it to a tidy dataframe using to_dataframe().

# Select the group of interest from the nwb file 
intervals = nwb_file.intervals

# Pull out trials and assign it as a dataframe
interval_trials_df = intervals['trials'].to_dataframe()
interval_trials_df.head()
start_time stop_time type response stim_present is_good cue_start_time pole_in_time pole_out_time
id
1 323.171 NaN lick right early lick 0 1 7.49698 3.93949 5.12748
2 329.570 NaN lick right early lick 0 1 11.81280 4.94181 5.94198
3 335.971 NaN lick right incorrect 0 1 6.62964 4.12915 5.12931
4 342.371 NaN lick right incorrect 0 1 6.41964 3.91915 4.91931
5 348.289 NaN lick right incorrect 0 1 12.66990 10.16930 11.16950

In case you’re wondering what these columns are, the description attribute provides a short description on each column of the dataframe.

# return the description of each col in our dataframe
for col in interval_trials_df:
    print(col,':',intervals['trials'][col].description)
start_time : Start time of epoch, in seconds
stop_time : Stop time of epoch, in seconds
type : 
response : 
stim_present : is this a stim or no-stim trial
is_good : good/bad status of trial (bad trials are not analyzed)
cue_start_time : onset of response period
pole_in_time : onset of sample period
pole_out_time : onset of the delay period

Units#

But wait, where’s all of the neural data? The units group in our NWB file contains the processed signals from our individual neurons (units), including information about the spike sorting quality as well as the spike times – when each of these cells fired an action potential. Much like the intervals group, units can also be assigned to a dataframe.

Why “units”? In extracellular electrophysiology, we aren’t recording directly from neurons. Instead, we’re recording from the space around many neurons. As a result, researchers need to take the recorded voltage streams and determine which spikes in voltage originated in different neurons. This process is called spike sorting. Although we can do spike sorting fairly automatically and be fairly confident that we’ve correctly identified different neurons, we can’t know with complete confidence. So, researchers tend to call “neurons” in extracellular recordings “units,” reflecting that we think it’s a separate neuron, but don’t know for sure. You’ll also see “multi-unit activity” (MUA) in some papers, in which case the researchers were unable to separate single neurons.

units = nwb_file.units
units_df = units.to_dataframe()
units_df.head()
depth quality cell_type spike_times electrodes
id
1 665.0 Poor unidentified [933.873288, 948.2774710000008, 950.5357110000... x y z imp ...
2 665.0 Fair unidentified [329.95417899999956, 330.01945899999953, 330.0... x y z imp ...
3 715.0 unidentified [329.94165899999956, 329.9998989999996, 330.00... x y z imp ...
4 715.0 [] unidentified [329.93145899999956, 330.7492189999995, 330.77... x y z imp ...
5 715.0 Fair unidentified [331.09961899999956, 332.14505899999955, 333.3... x y z imp ...

If we’d like to know where these spikes are coming from, we can look at the electrodes attribute. The electrodes group contains metadata about the electrodes used in the experiment, including the location of the electrodes, the type of filtering done on that channel, and which electrode group the electrode belongs to.

# electrode positions 
electrodes = nwb_file.electrodes
electrodes_df = electrodes.to_dataframe()
electrodes_df.head()
x y z imp location filtering group group_name
id
1 0.0 0.0 0.0 -1.0 brain_region: ALM; brain_subregion: N/A; corti... Bandpass filtered 300-6K Hz H-129: 64 pynwb.ecephys.ElectrodeGroup at 0x48... H-129: 64
2 0.0 0.0 0.0 -1.0 brain_region: ALM; brain_subregion: N/A; corti... Bandpass filtered 300-6K Hz H-129: 64 pynwb.ecephys.ElectrodeGroup at 0x48... H-129: 64
3 0.0 0.0 0.0 -1.0 brain_region: ALM; brain_subregion: N/A; corti... Bandpass filtered 300-6K Hz H-129: 64 pynwb.ecephys.ElectrodeGroup at 0x48... H-129: 64
4 0.0 0.0 0.0 -1.0 brain_region: ALM; brain_subregion: N/A; corti... Bandpass filtered 300-6K Hz H-129: 64 pynwb.ecephys.ElectrodeGroup at 0x48... H-129: 64
5 0.0 0.0 0.0 -1.0 brain_region: ALM; brain_subregion: N/A; corti... Bandpass filtered 300-6K Hz H-129: 64 pynwb.ecephys.ElectrodeGroup at 0x48... H-129: 64

Wondering what something in this table is? We can once again dig out the descriptions:

# return the description of each col in our dataframe
for col in electrodes_df:
    print(col,':',nwb_file.electrodes[col].description)
x : the x coordinate of the channel location
y : the y coordinate of the channel location
z : the z coordinate of the channel location
imp : the impedance of the channel
location : the location of channel within the subject e.g. brain region
filtering : description of hardware filtering
group : a reference to the ElectrodeGroup this electrode is a part of
group_name : the name of the ElectrodeGroup this electrode is a part of

Now that we have an idea of what this file contains, we can finally take a look at some of the data! We’ll do that in the next section. 💃


Additional Resources#

For a detailed explanation of all groups contained within an NWB File object please visit the pynwb.file.NWBFile section of the PyNWB documentation.