Working with NWB in Python#

On the previous page, we demonstrated how to obtain a dataset with DANDI. Now that you have a dataset downloaded, let’s take a closer look at what it contains.

Working with our NWB file in Python requires PyNWB, a package specifically designed to work with NWB files.

Below, we’ll use the NWBHDF5IO class from this package, which will allow us to easily read NWB files.

Note: Before running this notebook, please ensure that you have 1) set up your coding environment (How to Use this Book) and 2) completed the previous section to obtain the dataset we’ll be interacting with below.

Step 1. Setup#

# Import modules from the PyNWB package
from pynwb import NWBHDF5IO

Step 2. Read the NWB file#

We can access the data in our NWB file in two steps:

  1. Assign our file as an NWBHDF5IO object: We will use the NWBHDF5IO class to create an I/O object that opens our NWB file, which is stored on disk in HDF5 format.

  2. Read our file using the read() method.

For more information on how to read NWB files, please visit the Reading an NWB file section from the NWB Basics Tutorial.

Note: Each dataset may contain multiple NWB files for different subjects and sessions for a given experiment. Make sure you specify the exact file path to the single NWB file you wish to read. Below, we’ll give the filename for one .nwb file within the folder that you downloaded in the last chapter.
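Since a mistyped path, or running the notebook from the wrong directory, is the most common cause of read errors, it can help to confirm the file exists before opening it. A minimal sketch using the standard library:

```python
from pathlib import Path

# path to the NWB file downloaded in the previous section
filename = '000006/sub-anm369962/sub-anm369962_ses-20170310.nwb'

# confirm the file exists before trying to open it
if Path(filename).is_file():
    print('Found', filename)
else:
    print('File not found: check that the download completed and that '
          'this notebook is running from the folder containing 000006/.')
```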

# set the filename
filename = '000006/sub-anm369962/sub-anm369962_ses-20170310.nwb'

# assign file as an NWBHDF5IO object
io = NWBHDF5IO(filename, 'r')

# read the file
nwb_file = io.read()

nwb_file
Task: Look through the file above by clicking on the sideways black triangles to drop down different levels of the file structure. What's there that might be interesting to analyze?

Step 3. Access Information within the NWB File Hierarchy#

One of the first steps when working with a new dataset is to figure out what is in the dataset, and where. Each NWB file is composed of various groups, which either contain attributes of our file (metadata) or the data itself.

Metadata is a general term for all of the information about an experiment: when it was conducted, the ID of the subject (animal, human, goblin, etc.), details about the equipment, and so on. In essence, the metadata provides the context of the experiment. It is one of the first things you should review when encountering a new dataset.

Here is the structure of a typical NWB file:

NWB_file_structure.png

In order to see which groups are in our file, we can use the fields attribute to return a dictionary containing the groups of our NWB file. The dictionary keys are the various groups within the file, which we will use to access the data we’re ultimately interested in.

Need a refresher on dictionaries? Consider working through the free Codecademy Python 3 lesson, or check the other resources on the Data Science in Python page.

# Get the Groups for the nwb file 
nwb_fields = nwb_file.fields
print(nwb_fields.keys())

Experiment Metadata#

Let’s first pull out some metadata for the experiment we downloaded.

If you wish to access the related publications of the experimental data that you just downloaded, you can do so via the related_publications attribute of your NWB file object. Paste the “doi:” address that prints below into a browser window to check out the original publication describing this data.

# Print the related publication
nwb_file.related_publications

Each NWB file will also have information on where the experiment was conducted, which lab conducted the experiment, and a description of the experiment. This information can be accessed via the institution, lab, and experiment_description attributes of our nwb_file, respectively.

# Get metadata from NWB file 
print(f'The experiment within this NWB file was conducted at {nwb_file.institution}. '
      f'{nwb_file.experiment_description}')

As you might have noticed by this point, we can access data from each group in our nwb_file with the syntax nwb_file.GROUPNAME, just as we would typically access an attribute of an object in Python. Below, we will demonstrate some of the most useful groups within an NWB object.

Acquisition#

The acquisition group contains datasets of acquisition data, mainly TimeSeries objects belonging to this NWBFile.

nwb_file.acquisition

In this file, the acquisition group contains one dataset, lick_times. This dataset has one field, time_series, which contains two time series objects, lick_left_times and lick_right_times. To access the actual data arrays of these objects, we must first subset our dataset of interest from the group. We can then use timestamps[:] to return an array of timestamps marking when the animal licked.

# select our dataset of interest 
dataset = 'lick_times'
field = 'lick_right_times'

lick_r_dataset = nwb_file.acquisition[dataset][field]

# return the first 10 values in the data array 
lick_r_data_array = lick_r_dataset.timestamps[:10]

print(lick_r_data_array)
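Once you have a timestamps array, ordinary NumPy operations apply. As a sketch with made-up lick times (stand-ins for the real `timestamps` values), the inter-lick intervals are just the successive differences:

```python
import numpy as np

# hypothetical lick timestamps, in seconds
lick_times = np.array([1.20, 1.45, 1.71, 2.10, 2.33])

# inter-lick intervals: elapsed time between consecutive licks
inter_lick_intervals = np.diff(lick_times)
print(inter_lick_intervals)
```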

Intervals#

The intervals group contains all of the time interval tables from the experiment, such as whether the animal responded on each behavioral trial. Usefully, we can convert any of these tables to a tidy dataframe using to_dataframe().

# Select the group of interest from the nwb file 
intervals = nwb_file.intervals

# Pull out trials and assign it as a dataframe
interval_trials_df = intervals['trials'].to_dataframe()
interval_trials_df.head()
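Once the trials are in a dataframe, the usual pandas tools apply. For example, you could tally the values in one of its columns; the `response` column below is a hypothetical stand-in for whichever column appears in your own trials table:

```python
import pandas as pd

# toy trials table standing in for interval_trials_df
trials = pd.DataFrame(
    {'response': ['correct', 'incorrect', 'correct', 'no response', 'correct']})

# count how many trials fall into each response category
print(trials['response'].value_counts())
```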

In case you’re wondering what these columns are, the description attribute provides a short description of each column of the dataframe.

# return the description of each col in our dataframe
for col in interval_trials_df:
    print(col,':',intervals['trials'][col].description)

Units#

But wait, where’s all of the neural data? The units group in our NWB file contains the processed signals from our individual neurons (units), including information about the spike sorting quality as well as the spike times – when each of these cells fired an action potential. Much like the intervals group, units can also be assigned to a dataframe.

Why "units"? In extracellular electrophysiology, we aren't recording *directly* from neurons. Instead, we're recording from the space around many neurons. As a result, researchers need to take the recorded voltage streams and determine which spikes in voltage originated in different neurons. This process is called spike sorting (discussed in detail in a future lesson!). Although we can do spike sorting fairly automatically and be fairly confident that we've correctly identified different neurons, we can't know *with complete confidence*. So, researchers tend to call "neurons" in extracellular recordings "units," reflecting that we *think* it's a separate neuron, but don't know for sure. You'll also see "multi-unit activity" (MUA) in some papers, in which case the researchers were unable to separate single neurons.

units = nwb_file.units
units_df = units.to_dataframe()
units_df.head()
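Each row of the units dataframe typically includes a `spike_times` array, the times (in seconds) at which that unit fired. As a sketch with made-up spike times, a unit's mean firing rate is just its spike count divided by the recording duration (the duration below is assumed):

```python
import numpy as np

# hypothetical spike times for a single unit, in seconds
spike_times = np.array([0.1, 0.5, 0.9, 1.4, 2.2, 2.9])
recording_duration = 3.0  # seconds (assumed)

# mean firing rate = number of spikes / duration
firing_rate = len(spike_times) / recording_duration
print(f'{firing_rate:.1f} spikes/second')  # prints "2.0 spikes/second"
```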

If we’d like to know where these spikes are coming from, we can look at the electrodes attribute. The electrodes group contains metadata about the electrodes used in the experiment, including the location of the electrodes, the type of filtering done on that channel, and which electrode group the electrode belongs to.

# electrode positions 
electrodes = nwb_file.electrodes
electrodes_df = electrodes.to_dataframe()
electrodes_df.head()

Wondering what something in this table is? We can once again dig out the descriptions:

Not sure what’s happening below? Consider working through the Codecademy Python 3 course for a refresher on for loops.

# return the description of each col in our dataframe
for col in electrodes_df:
    print(col,':',nwb_file.electrodes[col].description)

Now that we have an idea of what this file contains, we can finally take a look at some of the data! We’ll do that in the next section. 💃


Additional Resources#

  • For a detailed explanation of all groups contained within an NWB File object, please visit the pynwb.file.NWBFile section of the PyNWB documentation.

  • The OpenScope DataBook also contains explanations of what is contained within NWB files.

  • Accessing metadata for different kinds of NWB files can be tricky. Here are some useful helper scripts from the OpenScope DataBook.