{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Pandas "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this chapter we will be discussing the advatages to using the *Pandas* package to analyze large datasets. While NumPy is a useful package, it can only be used with data of the same datatype. [Pandas](https://pandas.pydata.org/docs/user_guide/index.html#user-guide), however, is a package that helps us manipulate and analyze heterogenous data.\n",
    "\n",
    "We strongly recommend looking at [\"10 minutes to pandas\"](https://pandas.pydata.org/docs/user_guide/10min.html) for a broader overview, but here we'll introduce the main concepts needed for the activities in this textbook."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before we can use pandas, we need to import it. We can also nickname the modules when we import them. The convention is to import `pandas` as `pd`. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Variable   Type      Data/Info\n",
      "------------------------------\n",
      "pd         module    <module 'pandas' from '/U<...>ages/pandas/__init__.py'>\n"
     ]
    }
   ],
   "source": [
    "# Import packages\n",
    "import pandas as pd\n",
    "\n",
    "# Use whos 'magic command' to see available modules\n",
    "%whos"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create and Manipulate Dataframes \n",
    "The two data structures of Pandas are the `Series` and the `DataFrame`. A `Series` is a one-dimensional onject similar to a list. A `DataFrame` can be thought of as a two-dimensional numpy array or a collection of `Series` objects. Series and dataframes can contain multiple different data types such as integers, strings, and floats, similar to an Excel spreadsheet. Pandas also supports `string` lables unlike numpy arrays which only have numeric labels for their rows and columns. For a more in depth explanation, please visit the [Introduction to Data Structures](https://pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html) section in the Pandas User Guide. \n",
    "\n",
    "You can create a Pandas dataframe by inputting dictionaries into the Pandas function `pd.DataFrame()`, by reading files, or through functions built into the Pandas package. The function [`pd.read_csv()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) reads a comma- or tab-separated file and returns it as a `dataframe`.\n",
    "\n",
    "\n",
    "### DataFrame example\n",
    "Below we will create a dataframe by reading the file `brainarea_vs_genes_exp_w_reannotations.tsv` which contains information on gene expression accross multiple brain areas. \n",
    "\n",
    ">**About this dataset:**\n",
    "This dataset was created by Derek Howard and Abigail Mayes for the purpose of accelerating advances in data mining of open brain transcriptome data for polygenetic brain disorders. The data comes from normalized microarray datasets of gene expression from 6 adult human brains that was released by the Allen Brain Institute and then processed into the dataframe we will see below. For more information on this dataset please visit the <a href = \"https://github.com/derekhoward/HBAsets\"> HBAsets repository</a>. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>gene_symbol</th>\n",
       "      <th>CA1 field</th>\n",
       "      <th>CA2 field</th>\n",
       "      <th>CA3 field</th>\n",
       "      <th>CA4 field</th>\n",
       "      <th>Crus I, lateral hemisphere</th>\n",
       "      <th>Crus I, paravermis</th>\n",
       "      <th>Crus II, lateral hemisphere</th>\n",
       "      <th>Crus II, paravermis</th>\n",
       "      <th>Edinger-Westphal nucleus</th>\n",
       "      <th>...</th>\n",
       "      <th>temporal pole, inferior aspect</th>\n",
       "      <th>temporal pole, medial aspect</th>\n",
       "      <th>temporal pole, superior aspect</th>\n",
       "      <th>transverse gyri</th>\n",
       "      <th>trochlear nucleus</th>\n",
       "      <th>tuberomammillary nucleus</th>\n",
       "      <th>ventral tegmental area</th>\n",
       "      <th>ventromedial hypothalamic nucleus</th>\n",
       "      <th>vestibular nuclei</th>\n",
       "      <th>zona incerta</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>A1BG</td>\n",
       "      <td>0.856487</td>\n",
       "      <td>-1.773695</td>\n",
       "      <td>-0.678679</td>\n",
       "      <td>-0.986914</td>\n",
       "      <td>0.826986</td>\n",
       "      <td>0.948039</td>\n",
       "      <td>0.935427</td>\n",
       "      <td>1.120774</td>\n",
       "      <td>-1.018554</td>\n",
       "      <td>...</td>\n",
       "      <td>0.277830</td>\n",
       "      <td>0.514923</td>\n",
       "      <td>0.733368</td>\n",
       "      <td>-0.104286</td>\n",
       "      <td>-0.910245</td>\n",
       "      <td>1.039610</td>\n",
       "      <td>-0.155167</td>\n",
       "      <td>-0.444398</td>\n",
       "      <td>-0.901361</td>\n",
       "      <td>-0.236790</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>A1BG-AS1</td>\n",
       "      <td>0.257664</td>\n",
       "      <td>-1.373085</td>\n",
       "      <td>-0.619923</td>\n",
       "      <td>-0.636275</td>\n",
       "      <td>0.362799</td>\n",
       "      <td>0.353296</td>\n",
       "      <td>0.422766</td>\n",
       "      <td>0.346853</td>\n",
       "      <td>-0.812015</td>\n",
       "      <td>...</td>\n",
       "      <td>1.074116</td>\n",
       "      <td>0.821031</td>\n",
       "      <td>1.219272</td>\n",
       "      <td>0.901213</td>\n",
       "      <td>-1.522431</td>\n",
       "      <td>0.598719</td>\n",
       "      <td>-1.709745</td>\n",
       "      <td>-0.054156</td>\n",
       "      <td>-1.695843</td>\n",
       "      <td>-1.155961</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>A1CF</td>\n",
       "      <td>-0.089614</td>\n",
       "      <td>-0.546903</td>\n",
       "      <td>0.282914</td>\n",
       "      <td>-0.528926</td>\n",
       "      <td>0.507916</td>\n",
       "      <td>0.577696</td>\n",
       "      <td>0.647671</td>\n",
       "      <td>0.306824</td>\n",
       "      <td>0.089958</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.030265</td>\n",
       "      <td>-0.187367</td>\n",
       "      <td>-0.428358</td>\n",
       "      <td>-0.465863</td>\n",
       "      <td>-0.136936</td>\n",
       "      <td>1.229487</td>\n",
       "      <td>-0.110680</td>\n",
       "      <td>-0.118175</td>\n",
       "      <td>-0.139776</td>\n",
       "      <td>0.123829</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>A2M</td>\n",
       "      <td>0.552415</td>\n",
       "      <td>-0.635485</td>\n",
       "      <td>-0.954995</td>\n",
       "      <td>-0.259745</td>\n",
       "      <td>-1.687391</td>\n",
       "      <td>-1.756847</td>\n",
       "      <td>-1.640242</td>\n",
       "      <td>-1.733110</td>\n",
       "      <td>-0.091695</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.058505</td>\n",
       "      <td>0.207109</td>\n",
       "      <td>-0.161808</td>\n",
       "      <td>0.183630</td>\n",
       "      <td>0.948098</td>\n",
       "      <td>-0.977692</td>\n",
       "      <td>0.911896</td>\n",
       "      <td>-0.499357</td>\n",
       "      <td>1.469386</td>\n",
       "      <td>0.557998</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>A2ML1</td>\n",
       "      <td>0.758031</td>\n",
       "      <td>1.549857</td>\n",
       "      <td>1.262225</td>\n",
       "      <td>1.338780</td>\n",
       "      <td>-0.289888</td>\n",
       "      <td>-0.407026</td>\n",
       "      <td>-0.358798</td>\n",
       "      <td>-0.589988</td>\n",
       "      <td>0.944684</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.472908</td>\n",
       "      <td>-0.598317</td>\n",
       "      <td>-0.247797</td>\n",
       "      <td>-0.282673</td>\n",
       "      <td>1.396365</td>\n",
       "      <td>0.945043</td>\n",
       "      <td>0.158202</td>\n",
       "      <td>0.572771</td>\n",
       "      <td>0.073088</td>\n",
       "      <td>-0.886780</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 233 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "  gene_symbol  CA1 field  CA2 field  CA3 field  CA4 field  \\\n",
       "0        A1BG   0.856487  -1.773695  -0.678679  -0.986914   \n",
       "1    A1BG-AS1   0.257664  -1.373085  -0.619923  -0.636275   \n",
       "2        A1CF  -0.089614  -0.546903   0.282914  -0.528926   \n",
       "3         A2M   0.552415  -0.635485  -0.954995  -0.259745   \n",
       "4       A2ML1   0.758031   1.549857   1.262225   1.338780   \n",
       "\n",
       "   Crus I, lateral hemisphere  Crus I, paravermis  \\\n",
       "0                    0.826986            0.948039   \n",
       "1                    0.362799            0.353296   \n",
       "2                    0.507916            0.577696   \n",
       "3                   -1.687391           -1.756847   \n",
       "4                   -0.289888           -0.407026   \n",
       "\n",
       "   Crus II, lateral hemisphere  Crus II, paravermis  Edinger-Westphal nucleus  \\\n",
       "0                     0.935427             1.120774                 -1.018554   \n",
       "1                     0.422766             0.346853                 -0.812015   \n",
       "2                     0.647671             0.306824                  0.089958   \n",
       "3                    -1.640242            -1.733110                 -0.091695   \n",
       "4                    -0.358798            -0.589988                  0.944684   \n",
       "\n",
       "   ...  temporal pole, inferior aspect  temporal pole, medial aspect  \\\n",
       "0  ...                        0.277830                      0.514923   \n",
       "1  ...                        1.074116                      0.821031   \n",
       "2  ...                       -0.030265                     -0.187367   \n",
       "3  ...                       -0.058505                      0.207109   \n",
       "4  ...                       -0.472908                     -0.598317   \n",
       "\n",
       "   temporal pole, superior aspect  transverse gyri  trochlear nucleus  \\\n",
       "0                        0.733368        -0.104286          -0.910245   \n",
       "1                        1.219272         0.901213          -1.522431   \n",
       "2                       -0.428358        -0.465863          -0.136936   \n",
       "3                       -0.161808         0.183630           0.948098   \n",
       "4                       -0.247797        -0.282673           1.396365   \n",
       "\n",
       "   tuberomammillary nucleus  ventral tegmental area  \\\n",
       "0                  1.039610               -0.155167   \n",
       "1                  0.598719               -1.709745   \n",
       "2                  1.229487               -0.110680   \n",
       "3                 -0.977692                0.911896   \n",
       "4                  0.945043                0.158202   \n",
       "\n",
       "   ventromedial hypothalamic nucleus  vestibular nuclei  zona incerta  \n",
       "0                          -0.444398          -0.901361     -0.236790  \n",
       "1                          -0.054156          -1.695843     -1.155961  \n",
       "2                          -0.118175          -0.139776      0.123829  \n",
       "3                          -0.499357           1.469386      0.557998  \n",
       "4                           0.572771           0.073088     -0.886780  \n",
       "\n",
       "[5 rows x 233 columns]"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Read in the list of lists as a data frame\n",
    "file_name = 'brainarea_vs_genes_exp_w_reannotations.tsv'\n",
    "gene_df = pd.read_csv(file_name, sep='\\t')\n",
    "\n",
    "# '.head()' returns the first 5 rows in the dataframe\n",
    "gene_df.head()"
   ]
  },
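  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The dictionary route mentioned earlier can be sketched with a small, made-up table. The gene names and values below are hypothetical placeholders and are not taken from the dataset; the point is only to show how a `Series` and a `DataFrame` can be built by hand, and that each column keeps its own data type.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical values, for illustration only\n",
    "toy_series = pd.Series([0.5, -1.2, 0.3], index=['gene_a', 'gene_b', 'gene_c'])\n",
    "\n",
    "# Each dictionary key becomes a column; each list holds that column's values\n",
    "toy_df = pd.DataFrame({\n",
    "    'gene_symbol': ['gene_a', 'gene_b', 'gene_c'],\n",
    "    'region_1': [0.5, -1.2, 0.3],\n",
    "    'n_probes': [2, 1, 4],\n",
    "})\n",
    "\n",
    "# Columns can hold different data types\n",
    "print(toy_df.dtypes)\n",
    "```"
   ]
  },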
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "At the moment, the first column  of information above, the **index** just contains a list of numbers. We can reassign the row labels by using the method `set_index()`. We can choose any column in our present dataframe to be the row values. Let's assign the row lables to be the `gene_symbol` and reassign the dataframe. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>CA1 field</th>\n",
       "      <th>CA2 field</th>\n",
       "      <th>CA3 field</th>\n",
       "      <th>CA4 field</th>\n",
       "      <th>Crus I, lateral hemisphere</th>\n",
       "      <th>Crus I, paravermis</th>\n",
       "      <th>Crus II, lateral hemisphere</th>\n",
       "      <th>Crus II, paravermis</th>\n",
       "      <th>Edinger-Westphal nucleus</th>\n",
       "      <th>Heschl's gyrus</th>\n",
       "      <th>...</th>\n",
       "      <th>temporal pole, inferior aspect</th>\n",
       "      <th>temporal pole, medial aspect</th>\n",
       "      <th>temporal pole, superior aspect</th>\n",
       "      <th>transverse gyri</th>\n",
       "      <th>trochlear nucleus</th>\n",
       "      <th>tuberomammillary nucleus</th>\n",
       "      <th>ventral tegmental area</th>\n",
       "      <th>ventromedial hypothalamic nucleus</th>\n",
       "      <th>vestibular nuclei</th>\n",
       "      <th>zona incerta</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>gene_symbol</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>A1BG</th>\n",
       "      <td>0.856487</td>\n",
       "      <td>-1.773695</td>\n",
       "      <td>-0.678679</td>\n",
       "      <td>-0.986914</td>\n",
       "      <td>0.826986</td>\n",
       "      <td>0.948039</td>\n",
       "      <td>0.935427</td>\n",
       "      <td>1.120774</td>\n",
       "      <td>-1.018554</td>\n",
       "      <td>0.170282</td>\n",
       "      <td>...</td>\n",
       "      <td>0.277830</td>\n",
       "      <td>0.514923</td>\n",
       "      <td>0.733368</td>\n",
       "      <td>-0.104286</td>\n",
       "      <td>-0.910245</td>\n",
       "      <td>1.039610</td>\n",
       "      <td>-0.155167</td>\n",
       "      <td>-0.444398</td>\n",
       "      <td>-0.901361</td>\n",
       "      <td>-0.236790</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>A1BG-AS1</th>\n",
       "      <td>0.257664</td>\n",
       "      <td>-1.373085</td>\n",
       "      <td>-0.619923</td>\n",
       "      <td>-0.636275</td>\n",
       "      <td>0.362799</td>\n",
       "      <td>0.353296</td>\n",
       "      <td>0.422766</td>\n",
       "      <td>0.346853</td>\n",
       "      <td>-0.812015</td>\n",
       "      <td>0.903358</td>\n",
       "      <td>...</td>\n",
       "      <td>1.074116</td>\n",
       "      <td>0.821031</td>\n",
       "      <td>1.219272</td>\n",
       "      <td>0.901213</td>\n",
       "      <td>-1.522431</td>\n",
       "      <td>0.598719</td>\n",
       "      <td>-1.709745</td>\n",
       "      <td>-0.054156</td>\n",
       "      <td>-1.695843</td>\n",
       "      <td>-1.155961</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>A1CF</th>\n",
       "      <td>-0.089614</td>\n",
       "      <td>-0.546903</td>\n",
       "      <td>0.282914</td>\n",
       "      <td>-0.528926</td>\n",
       "      <td>0.507916</td>\n",
       "      <td>0.577696</td>\n",
       "      <td>0.647671</td>\n",
       "      <td>0.306824</td>\n",
       "      <td>0.089958</td>\n",
       "      <td>0.149820</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.030265</td>\n",
       "      <td>-0.187367</td>\n",
       "      <td>-0.428358</td>\n",
       "      <td>-0.465863</td>\n",
       "      <td>-0.136936</td>\n",
       "      <td>1.229487</td>\n",
       "      <td>-0.110680</td>\n",
       "      <td>-0.118175</td>\n",
       "      <td>-0.139776</td>\n",
       "      <td>0.123829</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>A2M</th>\n",
       "      <td>0.552415</td>\n",
       "      <td>-0.635485</td>\n",
       "      <td>-0.954995</td>\n",
       "      <td>-0.259745</td>\n",
       "      <td>-1.687391</td>\n",
       "      <td>-1.756847</td>\n",
       "      <td>-1.640242</td>\n",
       "      <td>-1.733110</td>\n",
       "      <td>-0.091695</td>\n",
       "      <td>0.003428</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.058505</td>\n",
       "      <td>0.207109</td>\n",
       "      <td>-0.161808</td>\n",
       "      <td>0.183630</td>\n",
       "      <td>0.948098</td>\n",
       "      <td>-0.977692</td>\n",
       "      <td>0.911896</td>\n",
       "      <td>-0.499357</td>\n",
       "      <td>1.469386</td>\n",
       "      <td>0.557998</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>A2ML1</th>\n",
       "      <td>0.758031</td>\n",
       "      <td>1.549857</td>\n",
       "      <td>1.262225</td>\n",
       "      <td>1.338780</td>\n",
       "      <td>-0.289888</td>\n",
       "      <td>-0.407026</td>\n",
       "      <td>-0.358798</td>\n",
       "      <td>-0.589988</td>\n",
       "      <td>0.944684</td>\n",
       "      <td>-0.466327</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.472908</td>\n",
       "      <td>-0.598317</td>\n",
       "      <td>-0.247797</td>\n",
       "      <td>-0.282673</td>\n",
       "      <td>1.396365</td>\n",
       "      <td>0.945043</td>\n",
       "      <td>0.158202</td>\n",
       "      <td>0.572771</td>\n",
       "      <td>0.073088</td>\n",
       "      <td>-0.886780</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 232 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "             CA1 field  CA2 field  CA3 field  CA4 field  \\\n",
       "gene_symbol                                               \n",
       "A1BG          0.856487  -1.773695  -0.678679  -0.986914   \n",
       "A1BG-AS1      0.257664  -1.373085  -0.619923  -0.636275   \n",
       "A1CF         -0.089614  -0.546903   0.282914  -0.528926   \n",
       "A2M           0.552415  -0.635485  -0.954995  -0.259745   \n",
       "A2ML1         0.758031   1.549857   1.262225   1.338780   \n",
       "\n",
       "             Crus I, lateral hemisphere  Crus I, paravermis  \\\n",
       "gene_symbol                                                   \n",
       "A1BG                           0.826986            0.948039   \n",
       "A1BG-AS1                       0.362799            0.353296   \n",
       "A1CF                           0.507916            0.577696   \n",
       "A2M                           -1.687391           -1.756847   \n",
       "A2ML1                         -0.289888           -0.407026   \n",
       "\n",
       "             Crus II, lateral hemisphere  Crus II, paravermis  \\\n",
       "gene_symbol                                                     \n",
       "A1BG                            0.935427             1.120774   \n",
       "A1BG-AS1                        0.422766             0.346853   \n",
       "A1CF                            0.647671             0.306824   \n",
       "A2M                            -1.640242            -1.733110   \n",
       "A2ML1                          -0.358798            -0.589988   \n",
       "\n",
       "             Edinger-Westphal nucleus  Heschl's gyrus  ...  \\\n",
       "gene_symbol                                            ...   \n",
       "A1BG                        -1.018554        0.170282  ...   \n",
       "A1BG-AS1                    -0.812015        0.903358  ...   \n",
       "A1CF                         0.089958        0.149820  ...   \n",
       "A2M                         -0.091695        0.003428  ...   \n",
       "A2ML1                        0.944684       -0.466327  ...   \n",
       "\n",
       "             temporal pole, inferior aspect  temporal pole, medial aspect  \\\n",
       "gene_symbol                                                                 \n",
       "A1BG                               0.277830                      0.514923   \n",
       "A1BG-AS1                           1.074116                      0.821031   \n",
       "A1CF                              -0.030265                     -0.187367   \n",
       "A2M                               -0.058505                      0.207109   \n",
       "A2ML1                             -0.472908                     -0.598317   \n",
       "\n",
       "             temporal pole, superior aspect  transverse gyri  \\\n",
       "gene_symbol                                                    \n",
       "A1BG                               0.733368        -0.104286   \n",
       "A1BG-AS1                           1.219272         0.901213   \n",
       "A1CF                              -0.428358        -0.465863   \n",
       "A2M                               -0.161808         0.183630   \n",
       "A2ML1                             -0.247797        -0.282673   \n",
       "\n",
       "             trochlear nucleus  tuberomammillary nucleus  \\\n",
       "gene_symbol                                                \n",
       "A1BG                 -0.910245                  1.039610   \n",
       "A1BG-AS1             -1.522431                  0.598719   \n",
       "A1CF                 -0.136936                  1.229487   \n",
       "A2M                   0.948098                 -0.977692   \n",
       "A2ML1                 1.396365                  0.945043   \n",
       "\n",
       "             ventral tegmental area  ventromedial hypothalamic nucleus  \\\n",
       "gene_symbol                                                              \n",
       "A1BG                      -0.155167                          -0.444398   \n",
       "A1BG-AS1                  -1.709745                          -0.054156   \n",
       "A1CF                      -0.110680                          -0.118175   \n",
       "A2M                        0.911896                          -0.499357   \n",
       "A2ML1                      0.158202                           0.572771   \n",
       "\n",
       "             vestibular nuclei  zona incerta  \n",
       "gene_symbol                                   \n",
       "A1BG                 -0.901361     -0.236790  \n",
       "A1BG-AS1             -1.695843     -1.155961  \n",
       "A1CF                 -0.139776      0.123829  \n",
       "A2M                   1.469386      0.557998  \n",
       "A2ML1                 0.073088     -0.886780  \n",
       "\n",
       "[5 rows x 232 columns]"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "row_index = 'gene_symbol'\n",
    "gene_df = gene_df.set_index(row_index)\n",
    "gene_df.head()"
   ]
  },
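  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The reassignment in the cell above matters because `set_index()` returns a new dataframe by default and leaves the original untouched. It is also why the column count in the output drops from 233 to 232: `gene_symbol` is no longer a column. A minimal sketch on a hypothetical toy table (the names and values are made up):\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy table, for illustration only\n",
    "toy_df = pd.DataFrame({\n",
    "    'gene_symbol': ['gene_a', 'gene_b'],\n",
    "    'region_1': [0.5, -1.2],\n",
    "})\n",
    "\n",
    "# set_index() returns a new dataframe; toy_df itself is unchanged\n",
    "indexed_df = toy_df.set_index('gene_symbol')\n",
    "print(list(toy_df.columns))      # 'gene_symbol' is still a column here\n",
    "print(list(indexed_df.columns))  # 'gene_symbol' is now the row index\n",
    "\n",
    "# reset_index() moves the index back into an ordinary column\n",
    "restored_df = indexed_df.reset_index()\n",
    "```"
   ]
  },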
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It would help to know what information is in our dataset. In other words, what is across the columns at the top? We can get a list by accessing the `columns` attribute. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['CA1 field', 'CA2 field', 'CA3 field', 'CA4 field',\n",
       "       'Crus I, lateral hemisphere', 'Crus I, paravermis',\n",
       "       'Crus II, lateral hemisphere', 'Crus II, paravermis',\n",
       "       'Edinger-Westphal nucleus', 'Heschl's gyrus',\n",
       "       ...\n",
       "       'temporal pole, inferior aspect', 'temporal pole, medial aspect',\n",
       "       'temporal pole, superior aspect', 'transverse gyri',\n",
       "       'trochlear nucleus', 'tuberomammillary nucleus',\n",
       "       'ventral tegmental area', 'ventromedial hypothalamic nucleus',\n",
       "       'vestibular nuclei', 'zona incerta'],\n",
       "      dtype='object', length=232)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Access the columns of our dataframe \n",
    "gene_df_columns = gene_df.columns \n",
    "gene_df_columns"
   ]
  },
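  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alongside `columns`, a dataframe exposes its row labels through `index` and its dimensions through `shape`. A small sketch on a hypothetical toy table (names and values are made up), including a membership test that can be handy before selecting a column:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy table, for illustration only\n",
    "toy_df = pd.DataFrame(\n",
    "    {'region_1': [0.5, -1.2], 'region_2': [1.1, 0.4]},\n",
    "    index=['gene_a', 'gene_b'],\n",
    ")\n",
    "\n",
    "column_labels = toy_df.columns.tolist()  # column labels as a plain list\n",
    "row_labels = toy_df.index.tolist()       # row labels as a plain list\n",
    "n_rows, n_columns = toy_df.shape         # (number of rows, number of columns)\n",
    "\n",
    "# Check that a label exists before using it\n",
    "has_region_1 = 'region_1' in toy_df.columns\n",
    "print(column_labels, row_labels, n_rows, n_columns, has_region_1)\n",
    "```"
   ]
  },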
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Indexing Dataframes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Indexing in Pandas works slightly different than in NumPy. Similar to a dictionary, we can index dataframes by their names. \n",
    "\n",
    "The syntax for indexing single locations in a dataframe is `dataframe.loc[row_label,column_label]`. To index an individual column, we use the shorthand syntax `dataframe.[column_label]`. To index an individual row, we use the syntax `dataframe.loc[row_label]`. To index by index #, we use the syntax `dataframe.iloc[index_number]`. Below are some examples on how to access rows, columns, and single values in our dataframe. For more information on indexing dataframes, visit the <a href = \"https://pandas.pydata.org/docs/user_guide/indexing.html#indexing\"> \"Indexing and selecting data\"</a> section in the Pandas User Guide."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Gene expression values in CA1 field:\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "gene_symbol\n",
       "A1BG        0.856487\n",
       "A1BG-AS1    0.257664\n",
       "A1CF       -0.089614\n",
       "A2M         0.552415\n",
       "A2ML1       0.758031\n",
       "              ...   \n",
       "ZYG11A     -0.496398\n",
       "ZYG11B     -0.856866\n",
       "ZYX        -1.941816\n",
       "ZZEF1      -0.015748\n",
       "ZZZ3       -0.924901\n",
       "Name: CA1 field, Length: 20869, dtype: float64"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Select a single column\n",
    "column = 'CA1 field'\n",
    "print('Gene expression values in CA1 field:')\n",
    "gene_df[column]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Gene expression of  A1BG  across brain regions:\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "CA1 field                            0.856487\n",
       "CA2 field                           -1.773695\n",
       "CA3 field                           -0.678679\n",
       "CA4 field                           -0.986914\n",
       "Crus I, lateral hemisphere           0.826986\n",
       "                                       ...   \n",
       "tuberomammillary nucleus             1.039610\n",
       "ventral tegmental area              -0.155167\n",
       "ventromedial hypothalamic nucleus   -0.444398\n",
       "vestibular nuclei                   -0.901361\n",
       "zona incerta                        -0.236790\n",
       "Name: A1BG, Length: 232, dtype: float64"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Select a single row\n",
    "row = 'A1BG'\n",
    "print('Gene expression of ', row, ' across brain regions:')\n",
    "gene_df.loc[row]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Gene expression of A1BG in CA1 field:\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "0.8564873784944677"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Select an individual value \n",
    "print('Gene expression of A1BG in CA1 field:')\n",
    "gene_df.loc[row, column]"
   ]
  },
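  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cells above select by label. Position-based selection with `iloc` can be sketched the same way on a hypothetical toy table (names and values are made up); positions start at 0, as elsewhere in Python.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy table, for illustration only\n",
    "toy_df = pd.DataFrame(\n",
    "    {'region_1': [0.5, -1.2, 0.3], 'region_2': [1.1, 0.4, -0.7]},\n",
    "    index=['gene_a', 'gene_b', 'gene_c'],\n",
    ")\n",
    "\n",
    "first_row = toy_df.iloc[0]     # first row, returned as a Series\n",
    "one_value = toy_df.iloc[1, 0]  # second row, first column\n",
    "same_value = toy_df.loc['gene_b', 'region_1']  # same cell, selected by label\n",
    "\n",
    "print(one_value == same_value)\n",
    "```"
   ]
  },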
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To select multiple different columns, you can use a `list` of all your columns of interest as so:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Gene expression values in multiple regions :\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>CA1 field</th>\n",
       "      <th>CA3 field</th>\n",
       "      <th>Crus I, lateral hemisphere</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>gene_symbol</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>A1BG</th>\n",
       "      <td>0.856487</td>\n",
       "      <td>-0.678679</td>\n",
       "      <td>0.826986</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>A1BG-AS1</th>\n",
       "      <td>0.257664</td>\n",
       "      <td>-0.619923</td>\n",
       "      <td>0.362799</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>A1CF</th>\n",
       "      <td>-0.089614</td>\n",
       "      <td>0.282914</td>\n",
       "      <td>0.507916</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>A2M</th>\n",
       "      <td>0.552415</td>\n",
       "      <td>-0.954995</td>\n",
       "      <td>-1.687391</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>A2ML1</th>\n",
       "      <td>0.758031</td>\n",
       "      <td>1.262225</td>\n",
       "      <td>-0.289888</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ZYG11A</th>\n",
       "      <td>-0.496398</td>\n",
       "      <td>0.325555</td>\n",
       "      <td>0.158885</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ZYG11B</th>\n",
       "      <td>-0.856866</td>\n",
       "      <td>0.701878</td>\n",
       "      <td>0.337138</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ZYX</th>\n",
       "      <td>-1.941816</td>\n",
       "      <td>-0.681255</td>\n",
       "      <td>0.872683</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ZZEF1</th>\n",
       "      <td>-0.015748</td>\n",
       "      <td>0.743609</td>\n",
       "      <td>1.108376</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ZZZ3</th>\n",
       "      <td>-0.924901</td>\n",
       "      <td>0.108320</td>\n",
       "      <td>-1.591413</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>20869 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "             CA1 field  CA3 field  Crus I, lateral hemisphere\n",
       "gene_symbol                                                  \n",
       "A1BG          0.856487  -0.678679                    0.826986\n",
       "A1BG-AS1      0.257664  -0.619923                    0.362799\n",
       "A1CF         -0.089614   0.282914                    0.507916\n",
       "A2M           0.552415  -0.954995                   -1.687391\n",
       "A2ML1         0.758031   1.262225                   -0.289888\n",
       "...                ...        ...                         ...\n",
       "ZYG11A       -0.496398   0.325555                    0.158885\n",
       "ZYG11B       -0.856866   0.701878                    0.337138\n",
       "ZYX          -1.941816  -0.681255                    0.872683\n",
       "ZZEF1        -0.015748   0.743609                    1.108376\n",
       "ZZZ3         -0.924901   0.108320                   -1.591413\n",
       "\n",
       "[20869 rows x 3 columns]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Select multiple columns\n",
    "print('Gene expression values in multiple regions:')\n",
    "columns = ['CA1 field', 'CA3 field', 'Crus I, lateral hemisphere']\n",
    "gene_df[columns]"
   ]
  },
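  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same idea extends to rows: `.loc` accepts a list of row labels and a list of column labels at the same time. The sketch below is a minimal illustration on a small made-up dataframe (`toy_df`, with hypothetical gene names and values), not on `gene_df` itself.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Toy stand-in for gene_df; gene names and values are made up for illustration\n",
    "toy_df = pd.DataFrame(\n",
    "    {'CA1 field': [0.9, 0.3, -0.1],\n",
    "     'CA3 field': [-0.7, -0.6, 0.3]},\n",
    "    index=pd.Index(['GENE_A', 'GENE_B', 'GENE_C'], name='gene_symbol'),\n",
    ")\n",
    "\n",
    "# Pass one list of row labels and one list of column labels to .loc\n",
    "rows = ['GENE_A', 'GENE_C']\n",
    "columns = ['CA1 field', 'CA3 field']\n",
    "subset = toy_df.loc[rows, columns]\n",
    "print(subset)\n",
    "```\n",
    "\n",
    "Because both selections are lists, the result is itself a dataframe, even if each list holds only one label."
   ]
  },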
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Subsetting "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Like NumPy arrays, dataframes can be subset to include only the data that meet our criteria. Our dataframe contains expression values for thousands of genes across many brain areas. You can filter it using the following syntax:\n",
    "```\n",
    "new_df = original_df[original_df['Column of Interest'] == 'Desired Value']\n",
    "```\n",
    "In plain English: keep only the rows of the original dataframe whose value in the Column of Interest equals the Desired Value, and save the result as a new dataframe. Other comparison operators, such as `>`, `<`, and `!=`, can be used in place of `==`. For more information on subsetting, visit the [\"How do I select a subset of a DataFrame?\"](https://pandas.pydata.org/docs/getting_started/intro_tutorials/03_subset_data.html) section in the Pandas documentation.\n",
    "\n",
    "Below we demonstrate this using the `CA1 field` column in `gene_df`. We will create a dataframe from `gene_df` that contains only the genes whose expression in that region is above a chosen threshold."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>CA1 field</th>\n",
       "      <th>CA2 field</th>\n",
       "      <th>CA3 field</th>\n",
       "      <th>CA4 field</th>\n",
       "      <th>Crus I, lateral hemisphere</th>\n",
       "      <th>Crus I, paravermis</th>\n",
       "      <th>Crus II, lateral hemisphere</th>\n",
       "      <th>Crus II, paravermis</th>\n",
       "      <th>Edinger-Westphal nucleus</th>\n",
       "      <th>Heschl's gyrus</th>\n",
       "      <th>...</th>\n",
       "      <th>temporal pole, inferior aspect</th>\n",
       "      <th>temporal pole, medial aspect</th>\n",
       "      <th>temporal pole, superior aspect</th>\n",
       "      <th>transverse gyri</th>\n",
       "      <th>trochlear nucleus</th>\n",
       "      <th>tuberomammillary nucleus</th>\n",
       "      <th>ventral tegmental area</th>\n",
       "      <th>ventromedial hypothalamic nucleus</th>\n",
       "      <th>vestibular nuclei</th>\n",
       "      <th>zona incerta</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>gene_symbol</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>ABCC12</th>\n",
       "      <td>2.089999</td>\n",
       "      <td>0.684837</td>\n",
       "      <td>0.097313</td>\n",
       "      <td>-0.051411</td>\n",
       "      <td>-1.078900</td>\n",
       "      <td>-0.912071</td>\n",
       "      <td>-1.131497</td>\n",
       "      <td>-0.799075</td>\n",
       "      <td>-0.009423</td>\n",
       "      <td>0.889132</td>\n",
       "      <td>...</td>\n",
       "      <td>1.289340</td>\n",
       "      <td>0.885886</td>\n",
       "      <td>1.271053</td>\n",
       "      <td>0.650801</td>\n",
       "      <td>-0.083413</td>\n",
       "      <td>-0.793237</td>\n",
       "      <td>-0.499512</td>\n",
       "      <td>-0.762330</td>\n",
       "      <td>-0.902496</td>\n",
       "      <td>-0.904421</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ABHD17C</th>\n",
       "      <td>1.716973</td>\n",
       "      <td>0.601041</td>\n",
       "      <td>1.132500</td>\n",
       "      <td>1.354679</td>\n",
       "      <td>-0.923195</td>\n",
       "      <td>-0.887576</td>\n",
       "      <td>-1.122027</td>\n",
       "      <td>-0.926876</td>\n",
       "      <td>0.339678</td>\n",
       "      <td>0.208174</td>\n",
       "      <td>...</td>\n",
       "      <td>0.530983</td>\n",
       "      <td>0.809101</td>\n",
       "      <td>0.763170</td>\n",
       "      <td>-0.093598</td>\n",
       "      <td>-0.313468</td>\n",
       "      <td>-0.164013</td>\n",
       "      <td>-0.537971</td>\n",
       "      <td>1.169105</td>\n",
       "      <td>-0.663245</td>\n",
       "      <td>-0.377181</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ABI1</th>\n",
       "      <td>2.051762</td>\n",
       "      <td>2.571777</td>\n",
       "      <td>2.472188</td>\n",
       "      <td>2.261170</td>\n",
       "      <td>0.366138</td>\n",
       "      <td>0.507783</td>\n",
       "      <td>0.449606</td>\n",
       "      <td>0.498424</td>\n",
       "      <td>-1.365242</td>\n",
       "      <td>0.472104</td>\n",
       "      <td>...</td>\n",
       "      <td>0.868471</td>\n",
       "      <td>1.197022</td>\n",
       "      <td>0.914783</td>\n",
       "      <td>0.473846</td>\n",
       "      <td>-2.230209</td>\n",
       "      <td>-0.330684</td>\n",
       "      <td>-1.153189</td>\n",
       "      <td>-0.073650</td>\n",
       "      <td>-1.681602</td>\n",
       "      <td>-1.049258</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ACTB</th>\n",
       "      <td>2.489711</td>\n",
       "      <td>2.806688</td>\n",
       "      <td>2.461655</td>\n",
       "      <td>2.340131</td>\n",
       "      <td>-1.296731</td>\n",
       "      <td>-1.334696</td>\n",
       "      <td>-1.158460</td>\n",
       "      <td>-1.461027</td>\n",
       "      <td>0.088209</td>\n",
       "      <td>0.027236</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.370215</td>\n",
       "      <td>-0.946920</td>\n",
       "      <td>-0.197363</td>\n",
       "      <td>-0.094468</td>\n",
       "      <td>0.926183</td>\n",
       "      <td>-0.034337</td>\n",
       "      <td>0.803164</td>\n",
       "      <td>-0.389238</td>\n",
       "      <td>0.596916</td>\n",
       "      <td>0.110921</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ACTR2</th>\n",
       "      <td>2.655049</td>\n",
       "      <td>2.384445</td>\n",
       "      <td>1.728348</td>\n",
       "      <td>1.413585</td>\n",
       "      <td>-1.377474</td>\n",
       "      <td>-1.207922</td>\n",
       "      <td>-1.496018</td>\n",
       "      <td>-1.359462</td>\n",
       "      <td>0.632764</td>\n",
       "      <td>0.259402</td>\n",
       "      <td>...</td>\n",
       "      <td>0.215683</td>\n",
       "      <td>0.142880</td>\n",
       "      <td>0.080649</td>\n",
       "      <td>0.312218</td>\n",
       "      <td>1.793563</td>\n",
       "      <td>-0.028263</td>\n",
       "      <td>-0.189930</td>\n",
       "      <td>-0.389064</td>\n",
       "      <td>-0.484580</td>\n",
       "      <td>-1.478232</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 232 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "             CA1 field  CA2 field  CA3 field  CA4 field  \\\n",
       "gene_symbol                                               \n",
       "ABCC12        2.089999   0.684837   0.097313  -0.051411   \n",
       "ABHD17C       1.716973   0.601041   1.132500   1.354679   \n",
       "ABI1          2.051762   2.571777   2.472188   2.261170   \n",
       "ACTB          2.489711   2.806688   2.461655   2.340131   \n",
       "ACTR2         2.655049   2.384445   1.728348   1.413585   \n",
       "\n",
       "             Crus I, lateral hemisphere  Crus I, paravermis  \\\n",
       "gene_symbol                                                   \n",
       "ABCC12                        -1.078900           -0.912071   \n",
       "ABHD17C                       -0.923195           -0.887576   \n",
       "ABI1                           0.366138            0.507783   \n",
       "ACTB                          -1.296731           -1.334696   \n",
       "ACTR2                         -1.377474           -1.207922   \n",
       "\n",
       "             Crus II, lateral hemisphere  Crus II, paravermis  \\\n",
       "gene_symbol                                                     \n",
       "ABCC12                         -1.131497            -0.799075   \n",
       "ABHD17C                        -1.122027            -0.926876   \n",
       "ABI1                            0.449606             0.498424   \n",
       "ACTB                           -1.158460            -1.461027   \n",
       "ACTR2                          -1.496018            -1.359462   \n",
       "\n",
       "             Edinger-Westphal nucleus  Heschl's gyrus  ...  \\\n",
       "gene_symbol                                            ...   \n",
       "ABCC12                      -0.009423        0.889132  ...   \n",
       "ABHD17C                      0.339678        0.208174  ...   \n",
       "ABI1                        -1.365242        0.472104  ...   \n",
       "ACTB                         0.088209        0.027236  ...   \n",
       "ACTR2                        0.632764        0.259402  ...   \n",
       "\n",
       "             temporal pole, inferior aspect  temporal pole, medial aspect  \\\n",
       "gene_symbol                                                                 \n",
       "ABCC12                             1.289340                      0.885886   \n",
       "ABHD17C                            0.530983                      0.809101   \n",
       "ABI1                               0.868471                      1.197022   \n",
       "ACTB                              -0.370215                     -0.946920   \n",
       "ACTR2                              0.215683                      0.142880   \n",
       "\n",
       "             temporal pole, superior aspect  transverse gyri  \\\n",
       "gene_symbol                                                    \n",
       "ABCC12                             1.271053         0.650801   \n",
       "ABHD17C                            0.763170        -0.093598   \n",
       "ABI1                               0.914783         0.473846   \n",
       "ACTB                              -0.197363        -0.094468   \n",
       "ACTR2                              0.080649         0.312218   \n",
       "\n",
       "             trochlear nucleus  tuberomammillary nucleus  \\\n",
       "gene_symbol                                                \n",
       "ABCC12               -0.083413                 -0.793237   \n",
       "ABHD17C              -0.313468                 -0.164013   \n",
       "ABI1                 -2.230209                 -0.330684   \n",
       "ACTB                  0.926183                 -0.034337   \n",
       "ACTR2                 1.793563                 -0.028263   \n",
       "\n",
       "             ventral tegmental area  ventromedial hypothalamic nucleus  \\\n",
       "gene_symbol                                                              \n",
       "ABCC12                    -0.499512                          -0.762330   \n",
       "ABHD17C                   -0.537971                           1.169105   \n",
       "ABI1                      -1.153189                          -0.073650   \n",
       "ACTB                       0.803164                          -0.389238   \n",
       "ACTR2                     -0.189930                          -0.389064   \n",
       "\n",
       "             vestibular nuclei  zona incerta  \n",
       "gene_symbol                                   \n",
       "ABCC12               -0.902496     -0.904421  \n",
       "ABHD17C              -0.663245     -0.377181  \n",
       "ABI1                 -1.681602     -1.049258  \n",
       "ACTB                  0.596916      0.110921  \n",
       "ACTR2                -0.484580     -1.478232  \n",
       "\n",
       "[5 rows x 232 columns]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Create a dataframe with only genes that have an expression \n",
    "# value greater than 1.7 in 'CA1 field' \n",
    "desired_column = 'CA1 field'\n",
    "desired_value = 1.7\n",
    "new_gene_df = gene_df[gene_df[desired_column] > desired_value]\n",
    "new_gene_df.head()"
   ]
  },
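  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Filters can also be combined. The sketch below is a minimal illustration on a small made-up dataframe (`toy_df`, with hypothetical gene names and values) and assumes we want genes that exceed the same threshold in more than one region. Each comparison produces a boolean `Series`, and these can be joined with `&` (and) or `|` (or).\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Toy stand-in for gene_df; gene names and values are made up for illustration\n",
    "toy_df = pd.DataFrame(\n",
    "    {'CA1 field': [2.1, 1.8, 0.2, 2.5],\n",
    "     'CA2 field': [0.7, 2.6, 0.5, 2.8]},\n",
    "    index=pd.Index(['GENE_A', 'GENE_B', 'GENE_C', 'GENE_D'], name='gene_symbol'),\n",
    ")\n",
    "\n",
    "# Each comparison produces a boolean Series (a mask) with one value per row\n",
    "high_ca1 = toy_df['CA1 field'] > 1.7\n",
    "high_ca2 = toy_df['CA2 field'] > 1.7\n",
    "\n",
    "# & keeps rows where both masks are True; | keeps rows where either is True\n",
    "both_df = toy_df[high_ca1 & high_ca2]\n",
    "either_df = toy_df[high_ca1 | high_ca2]\n",
    "print(both_df)\n",
    "print(either_df)\n",
    "```\n",
    "\n",
    "If the comparisons are written inline rather than stored in variables, each one needs its own parentheses, for example `toy_df[(toy_df['CA1 field'] > 1.7) & (toy_df['CA2 field'] > 1.7)]`, because `&` binds more tightly than `>`."
   ]
  },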
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## DataFrame Methods"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pandas has many useful methods for summarizing and analyzing your data, including `describe`, `mean`, and `std`. To learn more about the methods available for manipulating and analyzing dataframes, visit the [Pandas User Guide](https://pandas.pydata.org/docs/user_guide/index.html). We will demonstrate some of these methods below."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `describe` method returns descriptive statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for each numeric column in our dataframe."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>CA1 field</th>\n",
       "      <th>CA2 field</th>\n",
       "      <th>CA3 field</th>\n",
       "      <th>CA4 field</th>\n",
       "      <th>Crus I, lateral hemisphere</th>\n",
       "      <th>Crus I, paravermis</th>\n",
       "      <th>Crus II, lateral hemisphere</th>\n",
       "      <th>Crus II, paravermis</th>\n",
       "      <th>Edinger-Westphal nucleus</th>\n",
       "      <th>Heschl's gyrus</th>\n",
       "      <th>...</th>\n",
       "      <th>temporal pole, inferior aspect</th>\n",
       "      <th>temporal pole, medial aspect</th>\n",
       "      <th>temporal pole, superior aspect</th>\n",
       "      <th>transverse gyri</th>\n",
       "      <th>trochlear nucleus</th>\n",
       "      <th>tuberomammillary nucleus</th>\n",
       "      <th>ventral tegmental area</th>\n",
       "      <th>ventromedial hypothalamic nucleus</th>\n",
       "      <th>vestibular nuclei</th>\n",
       "      <th>zona incerta</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>20869.000000</td>\n",
       "      <td>20869.000000</td>\n",
       "      <td>20869.000000</td>\n",
       "      <td>20869.000000</td>\n",
       "      <td>20869.000000</td>\n",
       "      <td>20869.000000</td>\n",
       "      <td>20869.000000</td>\n",
       "      <td>20869.000000</td>\n",
       "      <td>20869.000000</td>\n",
       "      <td>20869.000000</td>\n",
       "      <td>...</td>\n",
       "      <td>20869.000000</td>\n",
       "      <td>20869.000000</td>\n",
       "      <td>20869.000000</td>\n",
       "      <td>20869.000000</td>\n",
       "      <td>20869.000000</td>\n",
       "      <td>20869.000000</td>\n",
       "      <td>20869.000000</td>\n",
       "      <td>20869.000000</td>\n",
       "      <td>20869.000000</td>\n",
       "      <td>20869.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>0.003664</td>\n",
       "      <td>0.017002</td>\n",
       "      <td>0.015315</td>\n",
       "      <td>-0.016633</td>\n",
       "      <td>0.093686</td>\n",
       "      <td>0.088810</td>\n",
       "      <td>0.096118</td>\n",
       "      <td>0.087608</td>\n",
       "      <td>0.047124</td>\n",
       "      <td>-0.042517</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.056050</td>\n",
       "      <td>-0.051731</td>\n",
       "      <td>-0.049905</td>\n",
       "      <td>-0.039528</td>\n",
       "      <td>0.059726</td>\n",
       "      <td>0.014856</td>\n",
       "      <td>0.009535</td>\n",
       "      <td>-0.002853</td>\n",
       "      <td>-0.002527</td>\n",
       "      <td>0.013018</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>0.924456</td>\n",
       "      <td>1.129368</td>\n",
       "      <td>1.078987</td>\n",
       "      <td>0.897192</td>\n",
       "      <td>1.146146</td>\n",
       "      <td>1.118501</td>\n",
       "      <td>1.172986</td>\n",
       "      <td>1.136823</td>\n",
       "      <td>0.973224</td>\n",
       "      <td>0.500526</td>\n",
       "      <td>...</td>\n",
       "      <td>0.567950</td>\n",
       "      <td>0.651098</td>\n",
       "      <td>0.636729</td>\n",
       "      <td>0.494012</td>\n",
       "      <td>1.158916</td>\n",
       "      <td>0.897387</td>\n",
       "      <td>0.686320</td>\n",
       "      <td>0.830602</td>\n",
       "      <td>0.723977</td>\n",
       "      <td>0.725770</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>-4.076424</td>\n",
       "      <td>-5.923691</td>\n",
       "      <td>-5.994731</td>\n",
       "      <td>-3.971984</td>\n",
       "      <td>-2.739924</td>\n",
       "      <td>-2.662897</td>\n",
       "      <td>-2.908676</td>\n",
       "      <td>-2.864308</td>\n",
       "      <td>-3.671242</td>\n",
       "      <td>-1.666268</td>\n",
       "      <td>...</td>\n",
       "      <td>-1.840486</td>\n",
       "      <td>-2.433961</td>\n",
       "      <td>-2.412614</td>\n",
       "      <td>-1.655962</td>\n",
       "      <td>-6.330275</td>\n",
       "      <td>-3.141490</td>\n",
       "      <td>-1.977225</td>\n",
       "      <td>-3.541112</td>\n",
       "      <td>-2.369304</td>\n",
       "      <td>-2.348784</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>-0.570475</td>\n",
       "      <td>-0.644093</td>\n",
       "      <td>-0.631248</td>\n",
       "      <td>-0.573605</td>\n",
       "      <td>-0.802651</td>\n",
       "      <td>-0.778933</td>\n",
       "      <td>-0.824404</td>\n",
       "      <td>-0.794683</td>\n",
       "      <td>-0.651691</td>\n",
       "      <td>-0.414882</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.472481</td>\n",
       "      <td>-0.517562</td>\n",
       "      <td>-0.529458</td>\n",
       "      <td>-0.404416</td>\n",
       "      <td>-0.751329</td>\n",
       "      <td>-0.602619</td>\n",
       "      <td>-0.497988</td>\n",
       "      <td>-0.542164</td>\n",
       "      <td>-0.532177</td>\n",
       "      <td>-0.493921</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>-0.025821</td>\n",
       "      <td>0.011189</td>\n",
       "      <td>0.006140</td>\n",
       "      <td>-0.044566</td>\n",
       "      <td>0.100558</td>\n",
       "      <td>0.098665</td>\n",
       "      <td>0.109358</td>\n",
       "      <td>0.096585</td>\n",
       "      <td>0.022760</td>\n",
       "      <td>-0.057495</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.091029</td>\n",
       "      <td>-0.083971</td>\n",
       "      <td>-0.089572</td>\n",
       "      <td>-0.051758</td>\n",
       "      <td>0.037410</td>\n",
       "      <td>-0.030474</td>\n",
       "      <td>-0.023799</td>\n",
       "      <td>-0.023471</td>\n",
       "      <td>-0.045003</td>\n",
       "      <td>-0.031623</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>0.561571</td>\n",
       "      <td>0.706946</td>\n",
       "      <td>0.706398</td>\n",
       "      <td>0.547903</td>\n",
       "      <td>0.985159</td>\n",
       "      <td>0.951919</td>\n",
       "      <td>1.007791</td>\n",
       "      <td>0.960813</td>\n",
       "      <td>0.736614</td>\n",
       "      <td>0.312739</td>\n",
       "      <td>...</td>\n",
       "      <td>0.352796</td>\n",
       "      <td>0.412321</td>\n",
       "      <td>0.403382</td>\n",
       "      <td>0.310551</td>\n",
       "      <td>0.844723</td>\n",
       "      <td>0.587382</td>\n",
       "      <td>0.503613</td>\n",
       "      <td>0.508410</td>\n",
       "      <td>0.501191</td>\n",
       "      <td>0.492262</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>7.062717</td>\n",
       "      <td>7.387742</td>\n",
       "      <td>6.413603</td>\n",
       "      <td>7.178692</td>\n",
       "      <td>2.679149</td>\n",
       "      <td>2.717237</td>\n",
       "      <td>2.963899</td>\n",
       "      <td>2.857205</td>\n",
       "      <td>7.552440</td>\n",
       "      <td>2.234669</td>\n",
       "      <td>...</td>\n",
       "      <td>2.199291</td>\n",
       "      <td>2.631498</td>\n",
       "      <td>3.065735</td>\n",
       "      <td>2.238555</td>\n",
       "      <td>6.892682</td>\n",
       "      <td>5.968364</td>\n",
       "      <td>7.267837</td>\n",
       "      <td>6.650673</td>\n",
       "      <td>2.723777</td>\n",
       "      <td>2.845665</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>8 rows × 232 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "          CA1 field     CA2 field     CA3 field     CA4 field  \\\n",
       "count  20869.000000  20869.000000  20869.000000  20869.000000   \n",
       "mean       0.003664      0.017002      0.015315     -0.016633   \n",
       "std        0.924456      1.129368      1.078987      0.897192   \n",
       "min       -4.076424     -5.923691     -5.994731     -3.971984   \n",
       "25%       -0.570475     -0.644093     -0.631248     -0.573605   \n",
       "50%       -0.025821      0.011189      0.006140     -0.044566   \n",
       "75%        0.561571      0.706946      0.706398      0.547903   \n",
       "max        7.062717      7.387742      6.413603      7.178692   \n",
       "\n",
       "       Crus I, lateral hemisphere  Crus I, paravermis  \\\n",
       "count                20869.000000        20869.000000   \n",
       "mean                     0.093686            0.088810   \n",
       "std                      1.146146            1.118501   \n",
       "min                     -2.739924           -2.662897   \n",
       "25%                     -0.802651           -0.778933   \n",
       "50%                      0.100558            0.098665   \n",
       "75%                      0.985159            0.951919   \n",
       "max                      2.679149            2.717237   \n",
       "\n",
       "       Crus II, lateral hemisphere  Crus II, paravermis  \\\n",
       "count                 20869.000000         20869.000000   \n",
       "mean                      0.096118             0.087608   \n",
       "std                       1.172986             1.136823   \n",
       "min                      -2.908676            -2.864308   \n",
       "25%                      -0.824404            -0.794683   \n",
       "50%                       0.109358             0.096585   \n",
       "75%                       1.007791             0.960813   \n",
       "max                       2.963899             2.857205   \n",
       "\n",
       "       Edinger-Westphal nucleus  Heschl's gyrus  ...  \\\n",
       "count              20869.000000    20869.000000  ...   \n",
       "mean                   0.047124       -0.042517  ...   \n",
       "std                    0.973224        0.500526  ...   \n",
       "min                   -3.671242       -1.666268  ...   \n",
       "25%                   -0.651691       -0.414882  ...   \n",
       "50%                    0.022760       -0.057495  ...   \n",
       "75%                    0.736614        0.312739  ...   \n",
       "max                    7.552440        2.234669  ...   \n",
       "\n",
       "       temporal pole, inferior aspect  temporal pole, medial aspect  \\\n",
       "count                    20869.000000                  20869.000000   \n",
       "mean                        -0.056050                     -0.051731   \n",
       "std                          0.567950                      0.651098   \n",
       "min                         -1.840486                     -2.433961   \n",
       "25%                         -0.472481                     -0.517562   \n",
       "50%                         -0.091029                     -0.083971   \n",
       "75%                          0.352796                      0.412321   \n",
       "max                          2.199291                      2.631498   \n",
       "\n",
       "       temporal pole, superior aspect  transverse gyri  trochlear nucleus  \\\n",
       "count                    20869.000000     20869.000000       20869.000000   \n",
       "mean                        -0.049905        -0.039528           0.059726   \n",
       "std                          0.636729         0.494012           1.158916   \n",
       "min                         -2.412614        -1.655962          -6.330275   \n",
       "25%                         -0.529458        -0.404416          -0.751329   \n",
       "50%                         -0.089572        -0.051758           0.037410   \n",
       "75%                          0.403382         0.310551           0.844723   \n",
       "max                          3.065735         2.238555           6.892682   \n",
       "\n",
       "       tuberomammillary nucleus  ventral tegmental area  \\\n",
       "count              20869.000000            20869.000000   \n",
       "mean                   0.014856                0.009535   \n",
       "std                    0.897387                0.686320   \n",
       "min                   -3.141490               -1.977225   \n",
       "25%                   -0.602619               -0.497988   \n",
       "50%                   -0.030474               -0.023799   \n",
       "75%                    0.587382                0.503613   \n",
       "max                    5.968364                7.267837   \n",
       "\n",
       "       ventromedial hypothalamic nucleus  vestibular nuclei  zona incerta  \n",
       "count                       20869.000000       20869.000000  20869.000000  \n",
       "mean                           -0.002853          -0.002527      0.013018  \n",
       "std                             0.830602           0.723977      0.725770  \n",
       "min                            -3.541112          -2.369304     -2.348784  \n",
       "25%                            -0.542164          -0.532177     -0.493921  \n",
       "50%                            -0.023471          -0.045003     -0.031623  \n",
       "75%                             0.508410           0.501191      0.492262  \n",
       "max                             6.650673           2.723777      2.845665  \n",
       "\n",
       "[8 rows x 232 columns]"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "gene_df.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `mean` and `std` methods return the mean and standard deviation of each column in the dataframe, respectively."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "CA1 field                            0.003664\n",
       "CA2 field                            0.017002\n",
       "CA3 field                            0.015315\n",
       "CA4 field                           -0.016633\n",
       "Crus I, lateral hemisphere           0.093686\n",
       "                                       ...   \n",
       "tuberomammillary nucleus             0.014856\n",
       "ventral tegmental area               0.009535\n",
       "ventromedial hypothalamic nucleus   -0.002853\n",
       "vestibular nuclei                   -0.002527\n",
       "zona incerta                         0.013018\n",
       "Length: 232, dtype: float64"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "gene_df.mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "CA1 field                            0.924456\n",
       "CA2 field                            1.129368\n",
       "CA3 field                            1.078987\n",
       "CA4 field                            0.897192\n",
       "Crus I, lateral hemisphere           1.146146\n",
       "                                       ...   \n",
       "tuberomammillary nucleus             0.897387\n",
       "ventral tegmental area               0.686320\n",
       "ventromedial hypothalamic nucleus    0.830602\n",
       "vestibular nuclei                    0.723977\n",
       "zona incerta                         0.725770\n",
       "Length: 232, dtype: float64"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "gene_df.std()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's say we have two dataframes that we would like to combine into a single dataframe. We can use either the `merge` or the `join` method to pull all of this data into one dataframe. \n",
    "\n",
    "![](http://www.datasciencemadesimple.com/wp-content/uploads/2017/09/join-or-merge-in-python-pandas-1.png)\n",
    "\n",
    "There are different types of joins/merges you can do in Pandas, illustrated <a href=\"http://www.datasciencemadesimple.com/join-merge-data-frames-pandas-python/\">above</a>. Here, we want to do an **inner** merge, where we only keep entries with indices that are in both dataframes. Alternatively, we could do this merge based on columns.\n",
    "\n",
    "`merge` performs an **inner** merge by default, so we would not need to specify it. `join`, in contrast, aligns on the index and defaults to a **left** join: it keeps every row of the 'left' dataframe, in other words, the dataframe that is executing the `join` method. Because our two dataframes share the same index, both approaches give the same result here; pass `how='inner'` to `join` to request an inner join explicitly.\n",
    "\n",
    "If you need more information, look at the <a href=\"https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html\">join</a> and <a href=\"https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html\">merge</a> documentation: you can use either of these to unite your dataframes, though join will be simpler!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Below is an example of how to join two separate dataframes into one unified dataframe. We start with one dataframe with only entries from the *temporal pole* and another dataframe with only entries from the CA fields of the hippocampus. We can then join the two dataframes together using the syntax `unified_df = df_1.join(df_2)`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>temporal pole, inferior aspect</th>\n",
       "      <th>temporal pole, medial aspect</th>\n",
       "      <th>temporal pole, superior aspect</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>gene_symbol</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>A1BG</th>\n",
       "      <td>0.277830</td>\n",
       "      <td>0.514923</td>\n",
       "      <td>0.733368</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>A1BG-AS1</th>\n",
       "      <td>1.074116</td>\n",
       "      <td>0.821031</td>\n",
       "      <td>1.219272</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>A1CF</th>\n",
       "      <td>-0.030265</td>\n",
       "      <td>-0.187367</td>\n",
       "      <td>-0.428358</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>A2M</th>\n",
       "      <td>-0.058505</td>\n",
       "      <td>0.207109</td>\n",
       "      <td>-0.161808</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>A2ML1</th>\n",
       "      <td>-0.472908</td>\n",
       "      <td>-0.598317</td>\n",
       "      <td>-0.247797</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             temporal pole, inferior aspect  temporal pole, medial aspect  \\\n",
       "gene_symbol                                                                 \n",
       "A1BG                               0.277830                      0.514923   \n",
       "A1BG-AS1                           1.074116                      0.821031   \n",
       "A1CF                              -0.030265                     -0.187367   \n",
       "A2M                               -0.058505                      0.207109   \n",
       "A2ML1                             -0.472908                     -0.598317   \n",
       "\n",
       "             temporal pole, superior aspect  \n",
       "gene_symbol                                  \n",
       "A1BG                               0.733368  \n",
       "A1BG-AS1                           1.219272  \n",
       "A1CF                              -0.428358  \n",
       "A2M                               -0.161808  \n",
       "A2ML1                             -0.247797  "
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Dataframe w/ only Temporal Pole entries \n",
    "temporal_pole_df = gene_df[['temporal pole, inferior aspect', \n",
    "                            'temporal pole, medial aspect', \n",
    "                            'temporal pole, superior aspect']]\n",
    "temporal_pole_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>CA1 field</th>\n",
       "      <th>CA2 field</th>\n",
       "      <th>CA3 field</th>\n",
       "      <th>CA4 field</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>gene_symbol</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>A1BG</th>\n",
       "      <td>0.856487</td>\n",
       "      <td>-1.773695</td>\n",
       "      <td>-0.678679</td>\n",
       "      <td>-0.986914</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>A1BG-AS1</th>\n",
       "      <td>0.257664</td>\n",
       "      <td>-1.373085</td>\n",
       "      <td>-0.619923</td>\n",
       "      <td>-0.636275</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>A1CF</th>\n",
       "      <td>-0.089614</td>\n",
       "      <td>-0.546903</td>\n",
       "      <td>0.282914</td>\n",
       "      <td>-0.528926</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>A2M</th>\n",
       "      <td>0.552415</td>\n",
       "      <td>-0.635485</td>\n",
       "      <td>-0.954995</td>\n",
       "      <td>-0.259745</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>A2ML1</th>\n",
       "      <td>0.758031</td>\n",
       "      <td>1.549857</td>\n",
       "      <td>1.262225</td>\n",
       "      <td>1.338780</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             CA1 field  CA2 field  CA3 field  CA4 field\n",
       "gene_symbol                                            \n",
       "A1BG          0.856487  -1.773695  -0.678679  -0.986914\n",
       "A1BG-AS1      0.257664  -1.373085  -0.619923  -0.636275\n",
       "A1CF         -0.089614  -0.546903   0.282914  -0.528926\n",
       "A2M           0.552415  -0.635485  -0.954995  -0.259745\n",
       "A2ML1         0.758031   1.549857   1.262225   1.338780"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Dataframe w/ only CA field entries \n",
    "CA_field_df = gene_df[['CA1 field', \n",
    "                       'CA2 field', \n",
    "                       'CA3 field', \n",
    "                       'CA4 field']]\n",
    "CA_field_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>temporal pole, inferior aspect</th>\n",
       "      <th>temporal pole, medial aspect</th>\n",
       "      <th>temporal pole, superior aspect</th>\n",
       "      <th>CA1 field</th>\n",
       "      <th>CA2 field</th>\n",
       "      <th>CA3 field</th>\n",
       "      <th>CA4 field</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>gene_symbol</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>A1BG</th>\n",
       "      <td>0.277830</td>\n",
       "      <td>0.514923</td>\n",
       "      <td>0.733368</td>\n",
       "      <td>0.856487</td>\n",
       "      <td>-1.773695</td>\n",
       "      <td>-0.678679</td>\n",
       "      <td>-0.986914</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>A1BG-AS1</th>\n",
       "      <td>1.074116</td>\n",
       "      <td>0.821031</td>\n",
       "      <td>1.219272</td>\n",
       "      <td>0.257664</td>\n",
       "      <td>-1.373085</td>\n",
       "      <td>-0.619923</td>\n",
       "      <td>-0.636275</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>A1CF</th>\n",
       "      <td>-0.030265</td>\n",
       "      <td>-0.187367</td>\n",
       "      <td>-0.428358</td>\n",
       "      <td>-0.089614</td>\n",
       "      <td>-0.546903</td>\n",
       "      <td>0.282914</td>\n",
       "      <td>-0.528926</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>A2M</th>\n",
       "      <td>-0.058505</td>\n",
       "      <td>0.207109</td>\n",
       "      <td>-0.161808</td>\n",
       "      <td>0.552415</td>\n",
       "      <td>-0.635485</td>\n",
       "      <td>-0.954995</td>\n",
       "      <td>-0.259745</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>A2ML1</th>\n",
       "      <td>-0.472908</td>\n",
       "      <td>-0.598317</td>\n",
       "      <td>-0.247797</td>\n",
       "      <td>0.758031</td>\n",
       "      <td>1.549857</td>\n",
       "      <td>1.262225</td>\n",
       "      <td>1.338780</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             temporal pole, inferior aspect  temporal pole, medial aspect  \\\n",
       "gene_symbol                                                                 \n",
       "A1BG                               0.277830                      0.514923   \n",
       "A1BG-AS1                           1.074116                      0.821031   \n",
       "A1CF                              -0.030265                     -0.187367   \n",
       "A2M                               -0.058505                      0.207109   \n",
       "A2ML1                             -0.472908                     -0.598317   \n",
       "\n",
       "             temporal pole, superior aspect  CA1 field  CA2 field  CA3 field  \\\n",
       "gene_symbol                                                                    \n",
       "A1BG                               0.733368   0.856487  -1.773695  -0.678679   \n",
       "A1BG-AS1                           1.219272   0.257664  -1.373085  -0.619923   \n",
       "A1CF                              -0.428358  -0.089614  -0.546903   0.282914   \n",
       "A2M                               -0.161808   0.552415  -0.635485  -0.954995   \n",
       "A2ML1                             -0.247797   0.758031   1.549857   1.262225   \n",
       "\n",
       "             CA4 field  \n",
       "gene_symbol             \n",
       "A1BG         -0.986914  \n",
       "A1BG-AS1     -0.636275  \n",
       "A1CF         -0.528926  \n",
       "A2M          -0.259745  \n",
       "A2ML1         1.338780  "
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Join the two dataframes\n",
    "df_1 = temporal_pole_df\n",
    "df_2 = CA_field_df\n",
    "\n",
    "unified_df = df_1.join(df_2)\n",
    "unified_df.head()"
   ]
  },
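  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the example above both dataframes share the same index, so the kind of join does not change the result. As a minimal sketch of what happens when the indices do *not* fully match, the cell below uses two small made-up dataframes. The names `left_df` and `right_df`, the column names, and the values are hypothetical and are not part of the gene expression dataset. With the default left join, `A1BG` is kept and gets `NaN` for the missing column, while an inner join or merge keeps only the gene symbols found in both dataframes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data (not taken from gene_df)\n",
    "left_df = pd.DataFrame({'region_a': [0.1, 0.2, 0.3]},\n",
    "                       index=['A1BG', 'A1CF', 'A2M'])\n",
    "right_df = pd.DataFrame({'region_b': [1.0, 2.0, 3.0]},\n",
    "                        index=['A1CF', 'A2M', 'A2ML1'])\n",
    "\n",
    "# join defaults to how='left': every row of left_df is kept,\n",
    "# and rows with no match in right_df get NaN\n",
    "left_joined = left_df.join(right_df)\n",
    "\n",
    "# how='inner' keeps only index labels present in both dataframes\n",
    "inner_joined = left_df.join(right_df, how='inner')\n",
    "\n",
    "# merge defaults to how='inner'; here we tell it to match on the index\n",
    "inner_merged = left_df.merge(right_df, left_index=True, right_index=True)\n",
    "\n",
    "print(left_joined)\n",
    "print(inner_joined)\n",
    "print(inner_merged)"
   ]
  },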
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Those are the basics of working with Pandas dataframes! Circle back to this page or the resources linked within if you ever need a refresher. Next, we'll talk about the power of SciPy for scientific analysis in Python.\n",
    "\n",
    "## Additional resources\n",
    "See the [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/03.00-introduction-to-pandas.html) for a more in-depth exploration of Pandas, and of course, the [Pandas documentation](https://pandas.pydata.org/docs/user_guide/index.html)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}