Working with NWB in Python#
On the previous page, we demonstrated how to obtain a dataset with DANDI. Now that you have a dataset downloaded, let’s take a closer look at what it contains.
Working with our NWB file in Python requires PyNWB, a package specifically designed to work with NWB files. Below, we’ll use the NWBHDF5IO class from this package, which allows us to easily read NWB files.
Note: Before running this notebook, please ensure that you have 1) set up your coding environment (How to Use this Book) and 2) completed the previous section to obtain the dataset we’ll be interacting with below.
Step 1. Setup#
# Import modules from the PyNWB package
from pynwb import NWBHDF5IO
Step 2. Read the NWB file#
We can access the data in our NWB file in two steps:
1. Assign our file as an NWBHDF5IO object: we will use the NWBHDF5IO class to create our NWBHDF5IO object and map our file to the HDF5 format.
2. Read our file using the read() method.
For more information on how to read NWB files, please visit the Reading an NWB file section from the NWB Basics Tutorial.
Note: Each dataset may contain multiple NWB files for different subjects and sessions for a given experiment. Make sure you specify the exact file path to the single NWB file you wish to read. Below, we’ll give the filename for one .nwb file within the folder that you downloaded in the last chapter.
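If you’re not sure exactly which NWB files you have on disk, you can list them first. The short sketch below is optional and assumes the dandiset was downloaded into a local folder named 000006, as in the previous section; adjust the path if you saved it somewhere else.

# Optional: list the NWB files inside the downloaded dandiset folder
# (assumes the folder '000006' sits in the current working directory)
from pathlib import Path

for nwb_path in sorted(Path('000006').rglob('*.nwb')):
    print(nwb_path)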
# set the filename
filename = '000006/sub-anm369962/sub-anm369962_ses-20170310.nwb'
# assign file as an NWBHDF5IO object
io = NWBHDF5IO(filename, 'r')
# read the file
nwb_file = io.read()
nwb_file
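The io object keeps the underlying HDF5 file open, which is what lets PyNWB load the data lazily; call io.close() only once you’re completely done working with nwb_file. As an aside, NWBHDF5IO can also be used as a context manager, which closes the file automatically. The sketch below uses throwaway names (io_tmp, nwb_file_tmp) so that it doesn’t interfere with the file we just opened; because lazily loaded data is only readable while the file is open, any data access should happen inside the with block.

# Optional: NWBHDF5IO also works as a context manager that closes the file
# automatically when the block ends. Lazily loaded data is only readable while
# the file is open, so do any data access inside the block.
with NWBHDF5IO(filename, 'r') as io_tmp:
    nwb_file_tmp = io_tmp.read()
    print(nwb_file_tmp.session_description)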
Step 3. Access Information within the NWB File Hierarchy#
One of the first steps when working with a new dataset is to figure out what is in the dataset, and where. Each NWB file is composed of various groups, which either contain attributes of our file (metadata) or the data itself.
Here is the structure of a typical NWB file:
In order to see which groups are in our file, we can use the fields attribute to return a dictionary containing the groups of our NWB file. The dictionary keys are the various groups within the file, which we will use to access the data we’re ultimately interested in.
Need a refresher on dictionaries? Consider working through the free Codecademy Python 3 lesson, or check the other resources on the Data Science in Python page.
# Get the Groups for the nwb file
nwb_fields = nwb_file.fields
print(nwb_fields.keys())
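To get a feel for what each key points to, an optional loop like the sketch below prints the type of the object stored under each key of the fields dictionary:

# inspect what kind of object is stored under each key
for key, value in nwb_fields.items():
    print(key, ':', type(value).__name__)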
Experiment Metadata#
Let’s first pull out some metadata for the experiment we downloaded.
If you wish to access the related publications for the experimental data that you just downloaded, you can do so by accessing the related_publications attribute of your NWB file object. Plug the “doi:” address that prints below into a browser window to check out the original publication describing this data.
# Print the related publication
nwb_file.related_publications
Each NWB file will also have information on where the experiment was conducted, which lab conducted the experiment, as well as a description of the experiment. This information can be accessed using the institution, lab, and experiment_description attributes of our nwb_file, respectively.
# Get metadata from NWB file
print(f'The experiment within this NWB file was conducted at {nwb_file.institution}. '
      f'{nwb_file.experiment_description}')
As you might have noticed at this point, we can access datasets from each group in our nwb_file with the following syntax: nwb_file.GROUPNAME, just as we would typically access an attribute of an object in Python. Below, we will demonstrate some of the most useful groups within an NWB object.
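For example, a few other top-level metadata fields defined by the core NWB schema, including the lab attribute mentioned above, can be accessed with this same dot syntax (the exact values depend on the file):

# a few more top-level metadata attributes, accessed with the same dot syntax
print(nwb_file.lab)                 # lab that collected the data
print(nwb_file.identifier)          # unique identifier for this file
print(nwb_file.session_start_time)  # when the recording session began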
Acquisition#
The acquisition group contains datasets of acquisition data, mainly TimeSeries objects belonging to this NWBFile.
nwb_file.acquisition
In this file, the acquisition group contains one dataset, lick_times. This dataset has one field, time_series, which contains two time series objects, lick_left_times and lick_right_times. To access the actual data arrays of these objects, we must first subset our dataset of interest from the group. We can then use timestamps[:] to return a list of timestamps for when the animal licked.
# select our dataset of interest
dataset = 'lick_times'
field = 'lick_right_times'
lick_r_dataset = nwb_file.acquisition[dataset][field]
# return the first 10 values in data array
lick_r_data_array = lick_r_dataset.timestamps[:10]
print(lick_r_data_array)
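As a quick sanity check, we can pull out the left-lick time series the same way and compare how many lick events were recorded on each side. This sketch assumes both lick_left_times and lick_right_times are present, as described above.

# compare the number of recorded licks on each side
lick_l_dataset = nwb_file.acquisition['lick_times']['lick_left_times']
print('left licks: ', len(lick_l_dataset.timestamps))
print('right licks:', len(lick_r_dataset.timestamps))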
Intervals#
The intervals group contains all time interval tables from the experiment: for example, whether the animal responded on each behavioral trial. Usefully, we can take intervals and convert it to a tidy dataframe using to_dataframe().
# Select the group of interest from the nwb file
intervals = nwb_file.intervals
# Pull out trials and assign it as a dataframe
interval_trials_df = intervals['trials'].to_dataframe()
interval_trials_df.head()
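Before digging into specific columns, it can help to check how many trials the session contained and which columns the table provides; the sketch below uses only the dataframe we just created.

# how many trials were in this session, and which columns describe them?
print('number of trials:', len(interval_trials_df))
print(interval_trials_df.columns.tolist())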
In case you’re wondering what these columns are, the description attribute provides a short description for each column of the dataframe.
# return the description of each col in our dataframe
for col in interval_trials_df:
    print(col, ':', intervals['trials'][col].description)
Units#
But wait, where’s all of the neural data? The units group in our NWB file contains the processed signals from our individual neurons (units), including information about the spike sorting quality as well as the spike times, that is, when each of these cells fired an action potential. Much like the intervals group, units can also be converted to a dataframe.
units = nwb_file.units
units_df = units.to_dataframe()
units_df.head()
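For a quick look at a single neuron, the sketch below pulls the spike times for the first unit in the table; it assumes the table has a spike_times column, which is part of the standard NWB units table.

# spike times (in seconds) for the first unit in the table
first_unit_spikes = units_df['spike_times'].iloc[0]
print('number of spikes:', len(first_unit_spikes))
print(first_unit_spikes[:10])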
If we’d like to know where these spikes are coming from, we can look at the electrodes attribute. The electrodes group contains metadata about the electrodes used in the experiment, including the location of each electrode, the type of filtering done on that channel, and which electrode group the electrode belongs to.
# electrode positions
electrodes = nwb_file.electrodes
electrodes_df = electrodes.to_dataframe()
electrodes_df.head()
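For example, we can count how many electrodes were recorded in each brain area. This sketch assumes the table has a location column, which is part of the standard NWB electrodes table.

# number of electrodes recorded in each location
print(electrodes_df['location'].value_counts())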
Wondering what something in this table is? We can once again dig out the descriptions:
Not sure what’s happening below? Consider working through the Codecademy Python 3 course for a refresher on for loops.
# return the description of each col in our dataframe
for col in electrodes_df:
    print(col, ':', nwb_file.electrodes[col].description)
Now that we have an idea of what this file contains, we can finally take a look at some of the data! We’ll do that in the next section. 💃
Additional Resources#
For a detailed explanation of all groups contained within an NWB File object please visit the pynwb.file.NWBFile section of the PyNWB documentation.
The OpenScope DataBook also contains explanations of what is contained within NWB files.
Accessing metadata for different kinds of NWB files can be tricky. Here are some useful helper scripts from the OpenScope DataBook.