# How does the brain encode space?

In 2006, a group of researchers published a landmark paper ([Sargolini et al. 2006](https://doi.org/10.1126/science.1125572)) demonstrating that cells in the medial entorhinal cortex, a region that provides input to the hippocampus, fire at regular spatial intervals. Two of these researchers, May-Britt and Edvard Moser, were [awarded the Nobel Prize](https://www.nobelprize.org/prizes/medicine/2014/press-release/) in 2014 for their efforts.

This tutorial demonstrates how to use `dandi` to access the dataset from this paper (yes, the very one they won the Nobel Prize for!), which is published on the DANDI Archive.

The [dataset](https://dandiarchive.org/dandiset/000582/draft) contains spike times for recorded grid cells from the medial entorhinal cortex (MEC) in rats that explored two-dimensional environments. The behavioral data includes position from the tracking LED(s).

#### Contents:

* [Streaming NWB files](#stream-nwb)
* [Accessing metadata](#access-nwb)
* [Accessing behavior data](#position)
* [Accessing spike times](#spike-times)
* [Visualizing rate maps](#rate-maps)

#### Authors
This notebook was authored by Dorota Jareka, Szonja Weigl, and Ben Dichter. It was edited by Ashley Juavinett.

<mark>**Note #1**: The best way to complete this notebook is through the Dandihub. If you're not already running this book on the Dandihub, read [Using this Book](https://nwb4edu.github.io/Chapter_01/Using_This_Book.html) for instructions on how to run this on the Dandihub.</mark>

<hr>


## Streaming NWB files <a class="anchor" id="stream-nwb"></a>

This section demonstrates how to access the files on DANDI without downloading them. If you need a refresher, we discussed this in [Lesson 1](https://nwb4edu.github.io/Lesson_1/01-Obtaining_Datasets_with_DANDI.html). You can also reference the [Streaming NWB files](https://pynwb.readthedocs.io/en/stable/tutorials/advanced_io/streaming.html) tutorial from `PyNWB`.

The `DandiAPIClient` can be used to get the S3 URL of an NWB file from [this dataset](https://dandiarchive.org/dandiset/000582/draft) stored in the DANDI Archive.

In [None]:
from dandi.dandiapi import DandiAPIClient

dandiset_id, nwbfile_path = "000582", "sub-10073/sub-10073_ses-17010302_behavior+ecephys.nwb" # file size ~15.6MB

# Get the location of the file on DANDI
with DandiAPIClient() as client:
    asset = client.get_dandiset(dandiset_id, 'draft').get_asset_by_path(nwbfile_path)
    s3_url = asset.get_content_url(follow_redirects=1, strip_query=True)
    
print(s3_url)

Next, create a virtual filesystem using `fsspec`, which will take care of requesting data from the S3 bucket whenever data is read from the virtual file.

In [None]:
from fsspec.implementations.cached import CachingFileSystem
from fsspec import filesystem
from h5py import File
from pynwb import NWBHDF5IO

# first, create a virtual filesystem based on the http protocol
fs = filesystem("http")

# create a cache to save downloaded data to disk (optional)
fs = CachingFileSystem(
    fs=fs,
    cache_storage="nwb-cache",  # Local folder for the cache
)

file_system = fs.open(s3_url, "rb")
file = File(file_system, mode="r")
# Open the file with NWBHDF5IO
io = NWBHDF5IO(file=file, load_namespaces=True)

nwbfile = io.read()
nwbfile

## Accessing metadata <a class="anchor" id="access-nwb"></a>

First, let's take a look at the metadata in this file.

`subject` is an attribute of the `nwbfile`. It holds information about the experimental subject, such as age (in [ISO 8601 Duration format](https://en.wikipedia.org/wiki/ISO_8601#Durations)), sex, and species in Latin binomial nomenclature.
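
An ISO 8601 duration such as `P90D` means 90 days. If you need the age as a number (for example, to compare animals), a minimal sketch is below. It only handles the year, month, week, and day designators and rejects anything else (such as time components after `T`). It also approximates a year as 365 days and a month as 30 days; both are assumptions of this sketch, not part of the standard. `iso_age_to_days` is a hypothetical helper, not part of `pynwb`.

```python
import re

# Date-only ISO 8601 durations, e.g. "P90D", "P3M", "P1Y2M".
# Time components (after "T") are deliberately not handled in this sketch.
_DURATION_RE = re.compile(
    r"^P(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<weeks>\d+)W)?(?:(?P<days>\d+)D)?$"
)

# Approximate days per unit (an assumption of this sketch)
_DAYS_PER_UNIT = {"years": 365, "months": 30, "weeks": 7, "days": 1}


def iso_age_to_days(age: str) -> int:
    """Convert a date-only ISO 8601 duration (e.g. 'P90D') to an approximate number of days."""
    text = age.strip()
    match = _DURATION_RE.match(text)
    if match is None or text == "P":
        raise ValueError(f"Unsupported ISO 8601 duration: {age!r}")
    return sum(
        int(value) * _DAYS_PER_UNIT[unit]
        for unit, value in match.groupdict().items()
        if value is not None
    )


print(iso_age_to_days("P90D"))   # 90
print(iso_age_to_days("P1Y2M"))  # 425 (365 + 2 * 30)
```

You could try it on the string stored in `nwbfile.subject.age`, keeping in mind that the result is only approximate for months and years.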

<div class="alert alert-success"><b>Tasks</b>:
    
1. Inspect the subject field. If you need a reminder for how to do this, see [Step 3 in Lesson 1](https://nwb4edu.github.io/Lesson_1/02-Working_with_NWB_format_in_Python.html).
    
2. Take a look at other attributes of `nwbfile` as well. Hint: you can hit 'tab' after `nwbfile.` to see all of the attributes and methods of the `nwbfile` object.
</div>

In [None]:
# Look at subject here


## Accessing behavior data <a class="anchor" id="position"></a>

The "behavior" processing module holds the behavioral data in the NWB file. It can be accessed as
`nwbfile.processing["behavior"]`.

### Position
"Position" gives us the location of the rat in space. The position data are stored in `SpatialSeries` objects inside the "Position" container, which can be accessed as `nwbfile.processing["behavior"]["Position"]`.

<div class="alert alert-success"><b>Task</b>: Look at the original paper. How did the researchers figure out the position of the animal?</div>

Note that not all sessions have position data from two tracking LEDs.

In [None]:
spatial_series = nwbfile.processing["behavior"]["Position"]["SpatialSeriesLED1"]

# Inspect conversion and data
print(spatial_series.conversion)
print(spatial_series.data)

Now that we have the behavioral data, we can plot it. The `conversion` field tells us how to translate the values in `data` into meters. The `data` object here has 3,000 rows, with x positions in column 0 and y positions in column 1. So, the first thing we'll do is convert the data into meters; then we can plot it.
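
To make the conversion concrete, here is a tiny sketch on made-up numbers (not values from this dataset, and the `conversion` factor is hypothetical). Multiplying the whole two-column array by `conversion` converts x and y at once through NumPy broadcasting.

```python
import numpy as np

# Made-up raw tracking values standing in for spatial_series.data[:]
# (rows = time points, column 0 = x, column 1 = y)
raw_positions = np.array([
    [10.0, 20.0],
    [15.0, 25.0],
    [50.0, 80.0],
])
conversion = 0.01  # hypothetical factor from raw units to meters

# One multiplication converts both columns at once (broadcasting)
positions_m = raw_positions * conversion
x_m, y_m = positions_m[:, 0], positions_m[:, 1]

# Size of the explored area along x and y, in meters
extent_m = positions_m.max(axis=0) - positions_m.min(axis=0)
print(x_m, y_m, extent_m)
```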

In [None]:
import matplotlib.pyplot as plt

# Extract x (column 0) and y (column 1) positions and convert them to meters
x = spatial_series.data[:, 0] * spatial_series.conversion
y = spatial_series.data[:, 1] * spatial_series.conversion

# Plot and label!
fig, ax = plt.subplots(1, 1, figsize=(4, 4))
ax.plot(x, y)
ax.set_xlabel('X (meters)')
ax.set_ylabel('Y (meters)')
plt.show()

## Accessing spike times <a class="anchor" id="spike-times"></a>

As a reminder, the `Units` table holds the spike times. It can be accessed as `nwbfile.units` and converted to a pandas `DataFrame`.
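
Each unit fires a different number of spikes, so NWB stores `spike_times` as a ragged column: one flat array holding every unit's spikes back to back, plus an index array giving the (exclusive) end position of each unit's spikes. Iterating over `nwbfile.units["spike_times"]` hides this and returns one array per unit. The sketch below reproduces the idea with plain NumPy and made-up numbers; `flat_spike_times` and `spike_times_index` are illustrative names, not the exact attributes `pynwb` exposes.

```python
import numpy as np

# Made-up spike times (seconds) for three units, stored back to back
flat_spike_times = np.array([0.10, 0.52, 1.30, 0.05, 0.90, 2.00, 2.10, 3.70])
# Exclusive end position of each unit's spikes within flat_spike_times
spike_times_index = np.array([3, 5, 8])

# Unit i owns flat_spike_times[start_i:end_i]; splitting at every end but the last recovers them
per_unit_spikes = np.split(flat_spike_times, spike_times_index[:-1])

for unit_id, spikes in enumerate(per_unit_spikes):
    print(unit_id, spikes)
```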

<div class="alert alert-success"><b>Task</b>: Access <code>units</code> below and convert it to a dataframe. Assign this to <code>units_df</code>. If you need a reminder for how to do this, refer back to Lesson 1. Inspect the entire dataframe.</div>

In [None]:
nwbfile.units.to_dataframe()

(Side note: for an interactive visualization of spike times and position, try out [Neurosift](https://flatironinstitute.github.io/neurosift/?p=/nwb&url=https://dandiarchive.s3.amazonaws.com/blobs/ec1/842/ec1842a0-2229-4096-8dcd-d42b49f9dd49).)

## Visualizing rate maps <a class="anchor" id="rate-maps"></a>

As you can see in the dataframe above, there are 8 recorded neurons (indices 0 through 7) in this dataset. This section demonstrates how to show the rate maps of those recorded cells. We will use the [PYthon Neural Analysis Package](https://pynapple-org.github.io/pynapple/) (`pynapple`) to calculate the rate maps. The first cell below installs `pynapple` if it isn't already available.

In [3]:
try:
    import pynapple
    print('pynapple imported.')
except ImportError:
    !pip install pynapple

pynapple imported.


Using the `compute_2d_tuning_curves()` function from `pynapple` (which we import as `nap` below), we can compute each cell's firing rate as a function of position: a map of neural activity as the animal explored the environment.
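
Conceptually, a rate map splits the arena into bins and, for each bin, divides the number of spikes fired there by the time the animal spent there (its occupancy). The sketch below does this with NumPy's `histogram2d` on made-up data. It illustrates the idea only and is not pynapple's exact implementation (bin edges and the handling of unvisited bins may differ); `rate_map_sketch` is a hypothetical helper.

```python
import numpy as np


def rate_map_sketch(pos_t, pos_x, pos_y, spike_t, num_bins=15):
    """Occupancy-normalized firing-rate map in spikes per second (x along axis 0)."""
    # Bin edges spanning the explored area along each axis
    x_edges = np.linspace(pos_x.min(), pos_x.max(), num_bins + 1)
    y_edges = np.linspace(pos_y.min(), pos_y.max(), num_bins + 1)

    # Occupancy: every position sample stands for one sampling interval dt
    dt = np.median(np.diff(pos_t))
    occupancy, _, _ = np.histogram2d(pos_x, pos_y, bins=[x_edges, y_edges])

    # Give each spike the position of the latest sample at or before it
    idx = np.searchsorted(pos_t, spike_t, side="right") - 1
    idx = np.clip(idx, 0, len(pos_t) - 1)
    spike_counts, _, _ = np.histogram2d(pos_x[idx], pos_y[idx], bins=[x_edges, y_edges])

    # Rate = spikes / seconds spent in the bin; unvisited bins stay NaN
    rate = np.full(occupancy.shape, np.nan)
    visited = occupancy > 0
    rate[visited] = spike_counts[visited] / (occupancy[visited] * dt)
    return rate


# Made-up session: 10 minutes of random positions sampled at 50 Hz
rng = np.random.default_rng(0)
t = np.arange(0.0, 600.0, 0.02)
x = rng.uniform(0.0, 1.0, t.size)
y = rng.uniform(0.0, 1.0, t.size)
# Hypothetical cell that spikes once at every sample in the right half of the arena
spikes = t[x > 0.5]

rate = rate_map_sketch(t, x, y, spikes, num_bins=4)
print(np.round(rate, 1))
```

Because `histogram2d` puts x along the first axis, plotting such a map with `plt.imshow(rate.T, origin="lower")` shows x horizontally and y vertically.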

In [None]:
import pynapple as nap

# Wrap the raw (unconverted) x, y positions and their timestamps in a pynapple TsdFrame
position_over_time = nap.TsdFrame(
    d=spatial_series.data[:],
    t=spatial_series.timestamps[:],
    columns=["x","y"],
)

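# One pynapple Ts (spike train) per unit, keyed by its row index in the Units table
spike_times_group = nap.TsGroup(
    {cell_id: nap.Ts(spikes) for cell_id, spikes in enumerate(nwbfile.units["spike_times"])}
)

# Number of spatial bins along each axis (a 15 x 15 map)
num_bins = 15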

rate_maps, position_bins = nap.compute_2d_tuning_curves(
    spike_times_group,
    position_over_time,
    num_bins,
)

print(type(rate_maps))
print(len(rate_maps))

The `rate_maps` object generated above is a dictionary: each key is a unit ID, and each value is that unit's rate map, a 2D array of firing rates across the position bins.

<div class="alert alert-success"><b>Task</b>: Using <code>plt.imshow()</code>, look at a few rate maps! As a reminder, you can extract the value of a dictionary using the syntax <code>dictionary_name[key]</code>.</div>

In [None]:
fig, ax = plt.subplots(1,1,figsize=(4,4))

# Plot a rate map or two here!


## Visualizing grid cell activity

To determine whether the firing fields of individual cells form a grid structure, we will calculate the spatial autocorrelation of each cell's rate map.

The autocorrelograms are based on Pearson’s product-moment correlation coefficient, with corrections for edge effects and unvisited locations. With λ(x, y) denoting the average rate of a cell at location (x, y), the autocorrelation between the fields with spatial lags of τx and τy was estimated as:

<img src="autocorrelation_equation.png" alt="autocorrelation_equation" />

where the summation is over all n pixels in λ(x, y) for which rate was estimated for both λ(x, y) and λ(x - τx, y - τy). Autocorrelations were not estimated for lags of τx, τy where n < 20.
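
For reference, here is the same Pearson product-moment estimator written in LaTeX. It is term for term consistent with the five running sums (`sum1` to `sum5`) that `pearson_autocor` accumulates below, and every sum runs over the n valid pixel pairs described above:

$$
r(\tau_x, \tau_y) =
\frac{n \sum \lambda(x, y)\,\lambda(x - \tau_x, y - \tau_y) - \sum \lambda(x, y) \sum \lambda(x - \tau_x, y - \tau_y)}
{\sqrt{n \sum \lambda(x, y)^2 - \left(\sum \lambda(x, y)\right)^2}\;\sqrt{n \sum \lambda(x - \tau_x, y - \tau_y)^2 - \left(\sum \lambda(x - \tau_x, y - \tau_y)\right)^2}}
$$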

The degree of spatial periodicity (gridness) can be determined for each cell by rotating its autocorrelation map in steps of 6 degrees (from 0 to 180 degrees) and computing the correlation between the rotated map and the original. The correlation is confined to the area defined by a circle around the peaks that are closest to the centre of the map, and the central peak is not included in the analysis.
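
The rotation step can be sketched on its own. This is an illustration under simplifying assumptions rather than the paper's exact procedure: it uses a fixed ring (annulus) between hypothetical radii `r_min` and `r_max` around the map centre instead of locating the peaks closest to the centre, and it correlates only the pixels inside that ring. (The analysis code further down takes a different shortcut: lags outside the ring are left as NaN, replaced with zero, and the full arrays are correlated.) The test pattern is an idealized, made-up grid built from three cosine gratings 60 degrees apart, so it should correlate strongly with itself at 60 and 120 degrees and weakly at 30, 90, and 150 degrees.

```python
import numpy as np
from scipy.ndimage import rotate


def annulus_mask(shape, r_min, r_max):
    """True for pixels whose distance from the array centre lies in [r_min, r_max]."""
    rows, cols = np.indices(shape)
    centre_r, centre_c = (shape[0] - 1) / 2, (shape[1] - 1) / 2
    dist = np.hypot(rows - centre_r, cols - centre_c)
    return (dist >= r_min) & (dist <= r_max)


def rotational_correlations(autocorr, r_min, r_max, step_deg=6):
    """Pearson r between a map and rotated copies of itself, using only pixels in an annulus."""
    mask = annulus_mask(autocorr.shape, r_min, r_max)
    angles = np.arange(0, 180 + step_deg, step_deg)
    correlations = []
    for angle in angles:
        # reshape=False keeps the array size, so the rotation is about the array centre
        rotated = rotate(autocorr, angle=angle, reshape=False)
        correlations.append(np.corrcoef(autocorr[mask], rotated[mask])[0, 1])
    return angles, np.array(correlations)


# Idealized, made-up "grid" pattern: three cosine gratings 60 degrees apart
size, spacing = 61, 12  # hypothetical map size and grid spacing, in pixels
yy, xx = np.indices((size, size)) - (size - 1) / 2
k = 2 * np.pi / spacing
pattern = sum(
    np.cos(k * (xx * np.cos(theta) + yy * np.sin(theta)))
    for theta in np.deg2rad([0, 60, 120])
)

angles, corrs = rotational_correlations(pattern, r_min=6, r_max=25)
for a in (30, 60, 90, 120, 150):
    print(a, round(float(corrs[a // 6]), 2))
```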

The ‘gridness’ of a cell can be expressed as the difference between the lowest correlation at 60 and 120 degrees (where a peak correlation would be expected due to the triangular nature of the grid) and the highest correlation at 30, 90, and 150 degrees (where the minimum correlation would be expected). When the correlations at 60 and 120 degrees of rotation exceeded each of the correlations at 30, 90 and 150 degrees (gridness > 0), the cell was classified as a grid cell.
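
As a worked example of this definition, take the made-up correlations below for one hypothetical cell. The lowest of the 60° and 120° correlations is min(0.65, 0.55) = 0.55, the highest of the 30°, 90°, and 150° correlations is max(-0.20, -0.10, -0.25) = -0.10, so gridness = 0.55 - (-0.10) = 0.65 > 0. `gridness_score` is a hypothetical helper written for this illustration.

```python
def gridness_score(corr_by_angle):
    """Gridness = min(r60, r120) - max(r30, r90, r150); corr_by_angle maps degrees to Pearson r."""
    expected_peaks = min(corr_by_angle[60], corr_by_angle[120])
    expected_troughs = max(corr_by_angle[30], corr_by_angle[90], corr_by_angle[150])
    return expected_peaks - expected_troughs


# Made-up rotational correlations for one hypothetical cell
example = {30: -0.20, 60: 0.65, 90: -0.10, 120: 0.55, 150: -0.25}
score = gridness_score(example)
print(round(score, 2), "grid cell" if score > 0 else "not a grid cell")  # 0.65 grid cell

# With 6-degree steps starting at 0, angle a sits at list index a // 6
angles = list(range(0, 186, 6))
print([angles.index(a) for a in (30, 60, 90, 120, 150)])  # [5, 10, 15, 20, 25]
```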

First, let's define our functions to help us make these calculations.

In [None]:
import numpy as np

def create_coer_arr(arr, rad_min=None, rad_max=None):
    """ Creating an array for correlation(tau_x, tau_y)
    Takes tau_x from the range (-arr.shape[0]+1, arr.shape[0]-1) and the same for tau_y
    """
    sh_x, sh_y = arr.shape
    # creating an array full of nan's
    coer_arr = np.full((2*sh_x-1, 2*sh_y-1), np.nan)
    # ii, jj cover all 2*sh-1 lags, so every cell of coer_arr (tau from -(sh-1) to sh-1) is visited
    for ii in range(2*sh_x - 1):
        for jj in range(2*sh_y - 1):
            # shifting tau_x/y
            tau_x = ii-sh_x+1
            tau_y = jj-sh_y+1
            # if rad_min and/or rad_max are provided, only calculate the correlation for lags whose radius lies between them
            if rad_max is not None and ((tau_x**2 + tau_y**2)**0.5 > rad_max):
                continue
            if rad_min is not None and ((tau_x**2 + tau_y**2)**0.5 < rad_min):
                continue
            coer_arr[ii, jj] = pearson_autocor(arr, lag_x=tau_x, lag_y=tau_y)
    return coer_arr

def pearson_autocor(arr, lag_x, lag_y):
    """ Calculates Pearson autocorrelation for an array that can contain NaN values."""
    sh_x, sh_y = arr.shape
    if abs(lag_x) >= sh_x or abs(lag_y) >= sh_y:
        raise Exception(f"abs(lag_x), abs(lag_y) have to be smaller than {sh_x}, {sh_y}, but {lag_x}, {lag_y} provided")

    # calculating sum for elements that meet the requirements
    n = 0
    sum1, sum2, sum3, sum4, sum5 = 0, 0, 0, 0, 0
    for ii in range(0, sh_x):
        for jj in range(0, sh_y):
            # checking if the shifted indices are within the array
            if 0 <= ii-lag_x < sh_x and 0 <= jj-lag_y < sh_y:
                # checking if both values (in ii,jj and shifted) are not nan
                if not np.isnan(arr[ii, jj]) and not np.isnan(arr[ii-lag_x, jj-lag_y]):
                    # running sums for Pearson's r: cross-products, sums, and sums of squares
                    n += 1
                    sum1 += arr[ii, jj] * arr[ii-lag_x, jj-lag_y]
                    sum2 += arr[ii, jj]
                    sum3 += arr[ii-lag_x, jj-lag_y]
                    sum4 += (arr[ii, jj])**2
                    sum5 += (arr[ii-lag_x, jj-lag_y])**2

    # according to the paper they had this limit for number of points
    if n < 20:
        return np.nan

    numerator = n * sum1 - sum2 * sum3
    denominator = (n * sum4 - sum2**2)**0.5 * (n * sum5 - sum3**2)**0.5
    cor = numerator / denominator

    return cor

def pearson_cor_2arr(arr1, arr2):
    """ Calculates Pearson correlation for two arrays with the same shape and no NaN."""
    if not arr1.shape == arr2.shape:
        raise Exception("Both arrays should have the same shape")
    n = arr1.shape[0] * arr1.shape[1]
 
    numerator = n * np.sum(arr1 * arr2) - np.sum(arr1) * np.sum(arr2)
    denominator = (n * np.sum(arr1**2) - np.sum(arr1)**2)**0.5 * (n * np.sum(arr2**2) - np.sum(arr2)**2)**0.5
    return numerator / denominator

%whos
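
The nested Python loops in `pearson_autocor` visit every pixel for every lag, which makes `create_coer_arr` slow on a 50 × 50 map. A vectorized sketch of the same estimator slices out the two overlapping parts of the map once per lag and lets NumPy compute the sums. `pearson_autocor_fast` is a hypothetical alternative written for this illustration, not part of any library; it keeps the same n < 20 cutoff.

```python
import numpy as np


def pearson_autocor_fast(arr, lag_x, lag_y, min_pixels=20):
    """Vectorized Pearson autocorrelation of a 2D map at lag (lag_x, lag_y), ignoring NaNs."""
    sh_x, sh_y = arr.shape
    if abs(lag_x) >= sh_x or abs(lag_y) >= sh_y:
        raise ValueError(f"abs(lag_x), abs(lag_y) have to be smaller than {sh_x}, {sh_y}")

    # a holds arr[ii, jj] and b holds the shifted arr[ii - lag_x, jj - lag_y]
    # for every (ii, jj) where both indices fall inside the array
    a = arr[max(0, lag_x):sh_x + min(0, lag_x), max(0, lag_y):sh_y + min(0, lag_y)]
    b = arr[max(0, -lag_x):sh_x - max(0, lag_x), max(0, -lag_y):sh_y - max(0, lag_y)]

    # Keep only pixel pairs where both rates were estimated
    valid = ~np.isnan(a) & ~np.isnan(b)
    n = int(valid.sum())
    if n < min_pixels:
        return np.nan
    a, b = a[valid], b[valid]

    numerator = n * np.sum(a * b) - np.sum(a) * np.sum(b)
    denominator = np.sqrt(n * np.sum(a**2) - np.sum(a)**2) * np.sqrt(n * np.sum(b**2) - np.sum(b)**2)
    return numerator / denominator
```

Because its first three parameters match `pearson_autocor(arr, lag_x, lag_y)`, it could be swapped into `create_coer_arr` if the loop version is too slow.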

Now that we have these functions, let's use them to analyze the data. There's *a lot* of code below, and it uses a new package, `plotly`. Don't worry about the code; focus on the figures it creates.

In [None]:
from plotly.subplots import make_subplots
import plotly.graph_objects as go
from scipy.ndimage import rotate

# Finer 50 x 50 rate maps, used for the autocorrelation and gridness analysis
rate_maps_50_bin, _ = nap.compute_2d_tuning_curves(
    spike_times_group,
    position_over_time,
    50,
)

for cell_ind in range(len(rate_maps)):
    unit_name = nwbfile.units["unit_name"][cell_ind]
    fig = make_subplots(
        rows=1,
        cols=3,
        subplot_titles=(f'{unit_name} rate map', f'{unit_name} auto-correlation', "periodicity"),
    )

    rate_map_plot = go.Heatmap(z=rate_maps[cell_ind], colorscale='Viridis', showscale=False)
    fig.add_trace(rate_map_plot, row=1, col=1)

    # Autocorrelogram of the 50-bin map, keeping only lags 6 to 34 bins from the centre
    autocorr = create_coer_arr(rate_maps_50_bin[cell_ind], rad_max=34, rad_min=6)
    autocorr_nonan = np.nan_to_num(autocorr, copy=True, nan=0.0)

    correlations = []
    angles = np.arange(0, 186, 6)
    for angle in angles:
        autocorr_nonan_rotated = rotate(autocorr_nonan, angle=angle, reshape=False)
        cor = pearson_cor_2arr(autocorr_nonan_rotated, autocorr_nonan)
        correlations.append(cor)
    
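    # gridness = lowest of r(60), r(120) minus highest of r(30), r(90), r(150)
    # (angles are in 6-degree steps, so list index i corresponds to i * 6 degrees)
    gridness = min(correlations[10], correlations[20]) - max(correlations[5], correlations[15], correlations[25])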
    gridness = np.round(gridness, 2)
    
    autocorr_rate_map = go.Heatmap(z=autocorr, colorscale='Viridis', showscale=False)
    fig.add_trace(autocorr_rate_map, row=1, col=2)
    
    line_trace = go.Scatter(
        x=angles,
        y=correlations,
        mode='lines',
        line=dict(color="black"),
    )
    
    fig.add_trace(line_trace, row=1, col=3)

    fig.update_xaxes(showticklabels=False)
    fig.update_yaxes(showticklabels=False)
    
    fig.update_xaxes(showticklabels=True, row=1, col=3, title_text="Rotation (deg)")
    fig.update_yaxes(showticklabels=True, row=1, col=3, title_text="Correlation (r)")
    
    fig.update_layout(
        title=f"Gridness score {gridness}",
        xaxis3 = dict(
            tickmode="array",
            tickvals=[30, 60, 90, 120, 150, 180],
        )
    )

    fig.show()

As you can see, most cells could be classified as grid cells: their autocorrelation maps look clean, and they have high periodicity, i.e., the periodicity function shows clear sinusoidal behavior, with maxima around 0, 60, 120, and 180 degrees. However, one example (t3c4) doesn't have a clear autocorrelation pattern, and its periodicity function shows no sinusoidal behavior.