{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Pandas " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this chapter we will discuss the advantages of using the *Pandas* package to analyze large datasets. While NumPy is a useful package, its arrays are designed to hold data of a single datatype. [Pandas](https://pandas.pydata.org/docs/user_guide/index.html#user-guide), however, is a package that helps us manipulate and analyze heterogeneous data.\n", "\n", "We strongly recommend looking at [\"10 minutes to pandas\"](https://pandas.pydata.org/docs/user_guide/10min.html) for a broader overview, but here we'll introduce the main concepts needed for the activities in this textbook." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before we can use pandas, we need to import it. We can also nickname modules when we import them. The convention is to import `pandas` as `pd`. " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Variable Type Data/Info\n", "------------------------------\n", "pd module ages/pandas/__init__.py'>\n" ] } ], "source": [ "# Import packages\n", "import pandas as pd\n", "\n", "# Use whos 'magic command' to see available modules\n", "%whos" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create and Manipulate Dataframes \n", "The two main data structures of Pandas are the `Series` and the `DataFrame`. A `Series` is a one-dimensional object similar to a list. A `DataFrame` can be thought of as a two-dimensional NumPy array or a collection of `Series` objects. Series and dataframes can contain multiple different data types such as integers, strings, and floats, similar to an Excel spreadsheet. Pandas also supports `string` labels, unlike NumPy arrays, which only have numeric indices for their rows and columns. 
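As a minimal sketch of these two structures (the gene names and values below are made up for illustration and are not read from any dataset), a `Series` and a `DataFrame` with string labels could be built like this:

```python
import pandas as pd

# Hypothetical values, chosen only to illustrate the two data structures
expression = pd.Series([0.86, 0.26, -0.09], index=['A1BG', 'A1BG-AS1', 'A1CF'])

# A DataFrame built from a dictionary: each key becomes a column,
# and different columns may hold different data types
toy_df = pd.DataFrame(
    {'expression': [0.86, 0.26, -0.09], 'is_coding': [True, False, True]},
    index=['A1BG', 'A1BG-AS1', 'A1CF'],
)

print(expression['A1CF'])  # look up a Series value by its string label
print(toy_df.dtypes)       # one data type per column
```

Each column of `toy_df` is itself a `Series`, which is why a dataframe can be thought of as a collection of `Series` objects that share the same row labels.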
For a more in-depth explanation, please visit the [Introduction to Data Structures](https://pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html) section in the Pandas User Guide. \n", "\n", "You can create a Pandas dataframe by inputting dictionaries into the Pandas function `pd.DataFrame()`, by reading files, or through functions built into the Pandas package. The function [`pd.read_csv()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) reads a delimited text file (comma-separated by default; pass `sep='\\t'` for tab-separated files) and returns it as a `DataFrame`.\n", "\n", "\n", "### DataFrame example\n", "Below we will create a dataframe by reading the file `brainarea_vs_genes_exp_w_reannotations.tsv`, which contains information on gene expression across multiple brain areas. \n", "\n", ">**About this dataset:**\n", "This dataset was created by Derek Howard and Abigail Mayes for the purpose of accelerating advances in data mining of open brain transcriptome data for polygenetic brain disorders. The data comes from normalized microarray datasets of gene expression from 6 adult human brains that were released by the Allen Brain Institute and then processed into the dataframe we will see below. For more information on this dataset please visit the HBAsets repository. " ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "<p>5 rows × 233 columns</p>\n", "
" ], "text/plain": [ " gene_symbol CA1 field CA2 field CA3 field CA4 field \\\n", "0 A1BG 0.856487 -1.773695 -0.678679 -0.986914 \n", "1 A1BG-AS1 0.257664 -1.373085 -0.619923 -0.636275 \n", "2 A1CF -0.089614 -0.546903 0.282914 -0.528926 \n", "3 A2M 0.552415 -0.635485 -0.954995 -0.259745 \n", "4 A2ML1 0.758031 1.549857 1.262225 1.338780 \n", "\n", " Crus I, lateral hemisphere Crus I, paravermis \\\n", "0 0.826986 0.948039 \n", "1 0.362799 0.353296 \n", "2 0.507916 0.577696 \n", "3 -1.687391 -1.756847 \n", "4 -0.289888 -0.407026 \n", "\n", " Crus II, lateral hemisphere Crus II, paravermis Edinger-Westphal nucleus \\\n", "0 0.935427 1.120774 -1.018554 \n", "1 0.422766 0.346853 -0.812015 \n", "2 0.647671 0.306824 0.089958 \n", "3 -1.640242 -1.733110 -0.091695 \n", "4 -0.358798 -0.589988 0.944684 \n", "\n", " ... temporal pole, inferior aspect temporal pole, medial aspect \\\n", "0 ... 0.277830 0.514923 \n", "1 ... 1.074116 0.821031 \n", "2 ... -0.030265 -0.187367 \n", "3 ... -0.058505 0.207109 \n", "4 ... 
-0.472908 -0.598317 \n", "\n", " temporal pole, superior aspect transverse gyri trochlear nucleus \\\n", "0 0.733368 -0.104286 -0.910245 \n", "1 1.219272 0.901213 -1.522431 \n", "2 -0.428358 -0.465863 -0.136936 \n", "3 -0.161808 0.183630 0.948098 \n", "4 -0.247797 -0.282673 1.396365 \n", "\n", " tuberomammillary nucleus ventral tegmental area \\\n", "0 1.039610 -0.155167 \n", "1 0.598719 -1.709745 \n", "2 1.229487 -0.110680 \n", "3 -0.977692 0.911896 \n", "4 0.945043 0.158202 \n", "\n", " ventromedial hypothalamic nucleus vestibular nuclei zona incerta \n", "0 -0.444398 -0.901361 -0.236790 \n", "1 -0.054156 -1.695843 -1.155961 \n", "2 -0.118175 -0.139776 0.123829 \n", "3 -0.499357 1.469386 0.557998 \n", "4 0.572771 0.073088 -0.886780 \n", "\n", "[5 rows x 233 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Read the tab-separated file into a dataframe\n", "file_name = 'brainarea_vs_genes_exp_w_reannotations.tsv'\n", "gene_df = pd.read_csv(file_name, sep='\\t')\n", "\n", "# '.head()' returns the first 5 rows in the dataframe\n", "gene_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At the moment, the first column of information above, the **index**, just contains a list of numbers. We can reassign the row labels by using the method `set_index()`. We can choose any column in our present dataframe to supply the row labels. Let's set the row labels to the values in the `gene_symbol` column and reassign the dataframe. " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "<p>5 rows × 232 columns</p>\n", "
" ], "text/plain": [ " CA1 field CA2 field CA3 field CA4 field \\\n", "gene_symbol \n", "A1BG 0.856487 -1.773695 -0.678679 -0.986914 \n", "A1BG-AS1 0.257664 -1.373085 -0.619923 -0.636275 \n", "A1CF -0.089614 -0.546903 0.282914 -0.528926 \n", "A2M 0.552415 -0.635485 -0.954995 -0.259745 \n", "A2ML1 0.758031 1.549857 1.262225 1.338780 \n", "\n", " Crus I, lateral hemisphere Crus I, paravermis \\\n", "gene_symbol \n", "A1BG 0.826986 0.948039 \n", "A1BG-AS1 0.362799 0.353296 \n", "A1CF 0.507916 0.577696 \n", "A2M -1.687391 -1.756847 \n", "A2ML1 -0.289888 -0.407026 \n", "\n", " Crus II, lateral hemisphere Crus II, paravermis \\\n", "gene_symbol \n", "A1BG 0.935427 1.120774 \n", "A1BG-AS1 0.422766 0.346853 \n", "A1CF 0.647671 0.306824 \n", "A2M -1.640242 -1.733110 \n", "A2ML1 -0.358798 -0.589988 \n", "\n", " Edinger-Westphal nucleus Heschl's gyrus ... \\\n", "gene_symbol ... \n", "A1BG -1.018554 0.170282 ... \n", "A1BG-AS1 -0.812015 0.903358 ... \n", "A1CF 0.089958 0.149820 ... \n", "A2M -0.091695 0.003428 ... \n", "A2ML1 0.944684 -0.466327 ... 
\n", "\n", " temporal pole, inferior aspect temporal pole, medial aspect \\\n", "gene_symbol \n", "A1BG 0.277830 0.514923 \n", "A1BG-AS1 1.074116 0.821031 \n", "A1CF -0.030265 -0.187367 \n", "A2M -0.058505 0.207109 \n", "A2ML1 -0.472908 -0.598317 \n", "\n", " temporal pole, superior aspect transverse gyri \\\n", "gene_symbol \n", "A1BG 0.733368 -0.104286 \n", "A1BG-AS1 1.219272 0.901213 \n", "A1CF -0.428358 -0.465863 \n", "A2M -0.161808 0.183630 \n", "A2ML1 -0.247797 -0.282673 \n", "\n", " trochlear nucleus tuberomammillary nucleus \\\n", "gene_symbol \n", "A1BG -0.910245 1.039610 \n", "A1BG-AS1 -1.522431 0.598719 \n", "A1CF -0.136936 1.229487 \n", "A2M 0.948098 -0.977692 \n", "A2ML1 1.396365 0.945043 \n", "\n", " ventral tegmental area ventromedial hypothalamic nucleus \\\n", "gene_symbol \n", "A1BG -0.155167 -0.444398 \n", "A1BG-AS1 -1.709745 -0.054156 \n", "A1CF -0.110680 -0.118175 \n", "A2M 0.911896 -0.499357 \n", "A2ML1 0.158202 0.572771 \n", "\n", " vestibular nuclei zona incerta \n", "gene_symbol \n", "A1BG -0.901361 -0.236790 \n", "A1BG-AS1 -1.695843 -1.155961 \n", "A1CF -0.139776 0.123829 \n", "A2M 1.469386 0.557998 \n", "A2ML1 0.073088 -0.886780 \n", "\n", "[5 rows x 232 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "row_index = 'gene_symbol'\n", "gene_df = gene_df.set_index(row_index)\n", "gene_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It would help to know what information is in our dataset. In other words, what is across the columns at the top? We can get the column labels by accessing the `columns` attribute. 
" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['CA1 field', 'CA2 field', 'CA3 field', 'CA4 field',\n", " 'Crus I, lateral hemisphere', 'Crus I, paravermis',\n", " 'Crus II, lateral hemisphere', 'Crus II, paravermis',\n", " 'Edinger-Westphal nucleus', 'Heschl's gyrus',\n", " ...\n", " 'temporal pole, inferior aspect', 'temporal pole, medial aspect',\n", " 'temporal pole, superior aspect', 'transverse gyri',\n", " 'trochlear nucleus', 'tuberomammillary nucleus',\n", " 'ventral tegmental area', 'ventromedial hypothalamic nucleus',\n", " 'vestibular nuclei', 'zona incerta'],\n", " dtype='object', length=232)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Access the columns of our dataframe \n", "gene_df_columns = gene_df.columns \n", "gene_df_columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Indexing Dataframes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Indexing in Pandas works slightly differently than in NumPy. Similar to a dictionary, we can index dataframes by their row and column labels. \n", "\n", "The syntax for indexing single locations in a dataframe is `dataframe.loc[row_label, column_label]`. To index an individual column, we use the shorthand syntax `dataframe[column_label]`. To index an individual row, we use the syntax `dataframe.loc[row_label]`. To index by integer position rather than by label, we use the syntax `dataframe.iloc[row_position]`. Below are some examples of how to access rows, columns, and single values in our dataframe. For more information on indexing dataframes, visit the \"Indexing and selecting data\" section in the Pandas User Guide." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Gene expression values in CA1 field:\n" ] }, { "data": { "text/plain": [ "gene_symbol\n", "A1BG 0.856487\n", "A1BG-AS1 0.257664\n", "A1CF -0.089614\n", "A2M 0.552415\n", "A2ML1 0.758031\n", " ... \n", "ZYG11A -0.496398\n", "ZYG11B -0.856866\n", "ZYX -1.941816\n", "ZZEF1 -0.015748\n", "ZZZ3 -0.924901\n", "Name: CA1 field, Length: 20869, dtype: float64" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Select a single column\n", "column = 'CA1 field'\n", "print('Gene expression values in CA1 field:')\n", "gene_df[column]" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Gene expression of A1BG across brain regions:\n" ] }, { "data": { "text/plain": [ "CA1 field 0.856487\n", "CA2 field -1.773695\n", "CA3 field -0.678679\n", "CA4 field -0.986914\n", "Crus I, lateral hemisphere 0.826986\n", " ... 
\n", "tuberomammillary nucleus 1.039610\n", "ventral tegmental area -0.155167\n", "ventromedial hypothalamic nucleus -0.444398\n", "vestibular nuclei -0.901361\n", "zona incerta -0.236790\n", "Name: A1BG, Length: 232, dtype: float64" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Select a single row by its label\n", "row = 'A1BG'\n", "print(f'Gene expression of {row} across brain regions:')\n", "gene_df.loc[row]" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Gene expression of A1BG in CA1 field:\n" ] }, { "data": { "text/plain": [ "0.8564873784944677" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Select an individual value by row label and column label\n", "print(f'Gene expression of {row} in {column}:')\n", "gene_df.loc[row, column]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To select multiple columns, you can pass a `list` of all your columns of interest, like so:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Gene expression values in multiple regions :\n" ] }, { "data": { "text/html": [ "
\n", "<p>20869 rows × 3 columns</p>\n", "
" ], "text/plain": [ " CA1 field CA3 field Crus I, lateral hemisphere\n", "gene_symbol \n", "A1BG 0.856487 -0.678679 0.826986\n", "A1BG-AS1 0.257664 -0.619923 0.362799\n", "A1CF -0.089614 0.282914 0.507916\n", "A2M 0.552415 -0.954995 -1.687391\n", "A2ML1 0.758031 1.262225 -0.289888\n", "... ... ... ...\n", "ZYG11A -0.496398 0.325555 0.158885\n", "ZYG11B -0.856866 0.701878 0.337138\n", "ZYX -1.941816 -0.681255 0.872683\n", "ZZEF1 -0.015748 0.743609 1.108376\n", "ZZZ3 -0.924901 0.108320 -1.591413\n", "\n", "[20869 rows x 3 columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Select multiple columns\n", "print('Gene expression values in multiple regions :')\n", "columns = ['CA1 field', 'CA3 field', 'Crus I, lateral hemisphere']\n", "gene_df[columns]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Subsetting " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Like NumPy arrays, we can subset our original dataframe to only include data that meets our criteria. Our dataframe has data on multiple different brain areas with many gene expression values. You can filter this dataframe using the following syntax:\n", "```\n", "new_df = original_df[original_df['Column of Interest'] == 'Desired Value']\n", "```\n", "In plain english, what this is saying is: save a dataframe from the original dataframe, where the original dataframe values in my Column of Interest are equal to my Desired Value. For more information on subsetting, visit the \"How do I select a subset of a DataFrame\" section in the Pandas documentation. \n", "\n", "Below we will demonstrate how to execute this by taking a look at the `CA1 field` column in `gene_df`. We will create a dataframe from `gene_df` that only contains genes that showed a certain level of gene expression. " ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "<p>5 rows × 232 columns</p>\n", "
" ], "text/plain": [ " CA1 field CA2 field CA3 field CA4 field \\\n", "gene_symbol \n", "ABCC12 2.089999 0.684837 0.097313 -0.051411 \n", "ABHD17C 1.716973 0.601041 1.132500 1.354679 \n", "ABI1 2.051762 2.571777 2.472188 2.261170 \n", "ACTB 2.489711 2.806688 2.461655 2.340131 \n", "ACTR2 2.655049 2.384445 1.728348 1.413585 \n", "\n", " Crus I, lateral hemisphere Crus I, paravermis \\\n", "gene_symbol \n", "ABCC12 -1.078900 -0.912071 \n", "ABHD17C -0.923195 -0.887576 \n", "ABI1 0.366138 0.507783 \n", "ACTB -1.296731 -1.334696 \n", "ACTR2 -1.377474 -1.207922 \n", "\n", " Crus II, lateral hemisphere Crus II, paravermis \\\n", "gene_symbol \n", "ABCC12 -1.131497 -0.799075 \n", "ABHD17C -1.122027 -0.926876 \n", "ABI1 0.449606 0.498424 \n", "ACTB -1.158460 -1.461027 \n", "ACTR2 -1.496018 -1.359462 \n", "\n", " Edinger-Westphal nucleus Heschl's gyrus ... \\\n", "gene_symbol ... \n", "ABCC12 -0.009423 0.889132 ... \n", "ABHD17C 0.339678 0.208174 ... \n", "ABI1 -1.365242 0.472104 ... \n", "ACTB 0.088209 0.027236 ... \n", "ACTR2 0.632764 0.259402 ... 
\n", "\n", " temporal pole, inferior aspect temporal pole, medial aspect \\\n", "gene_symbol \n", "ABCC12 1.289340 0.885886 \n", "ABHD17C 0.530983 0.809101 \n", "ABI1 0.868471 1.197022 \n", "ACTB -0.370215 -0.946920 \n", "ACTR2 0.215683 0.142880 \n", "\n", " temporal pole, superior aspect transverse gyri \\\n", "gene_symbol \n", "ABCC12 1.271053 0.650801 \n", "ABHD17C 0.763170 -0.093598 \n", "ABI1 0.914783 0.473846 \n", "ACTB -0.197363 -0.094468 \n", "ACTR2 0.080649 0.312218 \n", "\n", " trochlear nucleus tuberomammillary nucleus \\\n", "gene_symbol \n", "ABCC12 -0.083413 -0.793237 \n", "ABHD17C -0.313468 -0.164013 \n", "ABI1 -2.230209 -0.330684 \n", "ACTB 0.926183 -0.034337 \n", "ACTR2 1.793563 -0.028263 \n", "\n", " ventral tegmental area ventromedial hypothalamic nucleus \\\n", "gene_symbol \n", "ABCC12 -0.499512 -0.762330 \n", "ABHD17C -0.537971 1.169105 \n", "ABI1 -1.153189 -0.073650 \n", "ACTB 0.803164 -0.389238 \n", "ACTR2 -0.189930 -0.389064 \n", "\n", " vestibular nuclei zona incerta \n", "gene_symbol \n", "ABCC12 -0.902496 -0.904421 \n", "ABHD17C -0.663245 -0.377181 \n", "ABI1 -1.681602 -1.049258 \n", "ACTB 0.596916 0.110921 \n", "ACTR2 -0.484580 -1.478232 \n", "\n", "[5 rows x 232 columns]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a dataframe with only genes that have an expression \n", "# value greater than 1.7 in 'CA1 field' \n", "desired_column = 'CA1 field'\n", "desired_value = 1.7\n", "new_gene_df = gene_df[gene_df[desired_column] > desired_value]\n", "new_gene_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## DataFrame Methods" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pandas has many useful methods that you can use on your data, including `describe`, `mean`, and more. To learn more about all the different methods that can be used to manipulate and analyze dataframes, please visit the Pandas User Guide. 
We will demonstrate some of these methods below. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `describe` method returns descriptive statistics of all the columns in our dataframe. " ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "<p>8 rows × 232 columns</p>\n", "
" ], "text/plain": [ " CA1 field CA2 field CA3 field CA4 field \\\n", "count 20869.000000 20869.000000 20869.000000 20869.000000 \n", "mean 0.003664 0.017002 0.015315 -0.016633 \n", "std 0.924456 1.129368 1.078987 0.897192 \n", "min -4.076424 -5.923691 -5.994731 -3.971984 \n", "25% -0.570475 -0.644093 -0.631248 -0.573605 \n", "50% -0.025821 0.011189 0.006140 -0.044566 \n", "75% 0.561571 0.706946 0.706398 0.547903 \n", "max 7.062717 7.387742 6.413603 7.178692 \n", "\n", " Crus I, lateral hemisphere Crus I, paravermis \\\n", "count 20869.000000 20869.000000 \n", "mean 0.093686 0.088810 \n", "std 1.146146 1.118501 \n", "min -2.739924 -2.662897 \n", "25% -0.802651 -0.778933 \n", "50% 0.100558 0.098665 \n", "75% 0.985159 0.951919 \n", "max 2.679149 2.717237 \n", "\n", " Crus II, lateral hemisphere Crus II, paravermis \\\n", "count 20869.000000 20869.000000 \n", "mean 0.096118 0.087608 \n", "std 1.172986 1.136823 \n", "min -2.908676 -2.864308 \n", "25% -0.824404 -0.794683 \n", "50% 0.109358 0.096585 \n", "75% 1.007791 0.960813 \n", "max 2.963899 2.857205 \n", "\n", " Edinger-Westphal nucleus Heschl's gyrus ... \\\n", "count 20869.000000 20869.000000 ... \n", "mean 0.047124 -0.042517 ... \n", "std 0.973224 0.500526 ... \n", "min -3.671242 -1.666268 ... \n", "25% -0.651691 -0.414882 ... \n", "50% 0.022760 -0.057495 ... \n", "75% 0.736614 0.312739 ... \n", "max 7.552440 2.234669 ... 
\n", "\n", " temporal pole, inferior aspect temporal pole, medial aspect \\\n", "count 20869.000000 20869.000000 \n", "mean -0.056050 -0.051731 \n", "std 0.567950 0.651098 \n", "min -1.840486 -2.433961 \n", "25% -0.472481 -0.517562 \n", "50% -0.091029 -0.083971 \n", "75% 0.352796 0.412321 \n", "max 2.199291 2.631498 \n", "\n", " temporal pole, superior aspect transverse gyri trochlear nucleus \\\n", "count 20869.000000 20869.000000 20869.000000 \n", "mean -0.049905 -0.039528 0.059726 \n", "std 0.636729 0.494012 1.158916 \n", "min -2.412614 -1.655962 -6.330275 \n", "25% -0.529458 -0.404416 -0.751329 \n", "50% -0.089572 -0.051758 0.037410 \n", "75% 0.403382 0.310551 0.844723 \n", "max 3.065735 2.238555 6.892682 \n", "\n", " tuberomammillary nucleus ventral tegmental area \\\n", "count 20869.000000 20869.000000 \n", "mean 0.014856 0.009535 \n", "std 0.897387 0.686320 \n", "min -3.141490 -1.977225 \n", "25% -0.602619 -0.497988 \n", "50% -0.030474 -0.023799 \n", "75% 0.587382 0.503613 \n", "max 5.968364 7.267837 \n", "\n", " ventromedial hypothalamic nucleus vestibular nuclei zona incerta \n", "count 20869.000000 20869.000000 20869.000000 \n", "mean -0.002853 -0.002527 0.013018 \n", "std 0.830602 0.723977 0.725770 \n", "min -3.541112 -2.369304 -2.348784 \n", "25% -0.542164 -0.532177 -0.493921 \n", "50% -0.023471 -0.045003 -0.031623 \n", "75% 0.508410 0.501191 0.492262 \n", "max 6.650673 2.723777 2.845665 \n", "\n", "[8 rows x 232 columns]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gene_df.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `mean` and `std` methods return the mean and standard deviation of each column in the dataframe, respectively. 
" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "CA1 field 0.003664\n", "CA2 field 0.017002\n", "CA3 field 0.015315\n", "CA4 field -0.016633\n", "Crus I, lateral hemisphere 0.093686\n", " ... \n", "tuberomammillary nucleus 0.014856\n", "ventral tegmental area 0.009535\n", "ventromedial hypothalamic nucleus -0.002853\n", "vestibular nuclei -0.002527\n", "zona incerta 0.013018\n", "Length: 232, dtype: float64" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gene_df.mean()" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "CA1 field 0.924456\n", "CA2 field 1.129368\n", "CA3 field 1.078987\n", "CA4 field 0.897192\n", "Crus I, lateral hemisphere 1.146146\n", " ... \n", "tuberomammillary nucleus 0.897387\n", "ventral tegmental area 0.686320\n", "ventromedial hypothalamic nucleus 0.830602\n", "vestibular nuclei 0.723977\n", "zona incerta 0.725770\n", "Length: 232, dtype: float64" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gene_df.std()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's say we have two different dataframes and we would like to combine the two into one single dataframe. We can use either the `merge` or `join` Pandas methods to pull all of this data into one dataframe. \n", "\n", "![](http://www.datasciencemadesimple.com/wp-content/uploads/2017/09/join-or-merge-in-python-pandas-1.png)\n", "\n", "There are different types of joins/merges you can do in Pandas, illustrated above. Here, we want to do an **inner** merge, where we're only keeping entries with indices that are in both dataframes. Alternatively, we could do this merge based on columns.\n", "\n", "**Inner** is the default for `merge`, so we do not need to specify it there; `join`, in contrast, performs a **left** join unless we pass `how='inner'`. 
Strictly speaking, **inner** is the default for `merge`, while `join` defaults to a **left** join (`how='left'`): it keeps every index of the 'left' dataframe, in other words, the dataframe that is executing the `join` method. Because the two dataframes in our example share the same index, a left join and an inner join give the same result here.\n", "\n", "If you need more information, look at the join and merge documentation: you can use either of these to unite your dataframes, though join will be simpler!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below is an example of how to join two separate dataframes into one unified dataframe. We start with one dataframe with only entries from the *temporal pole* and another dataframe with only entries from the CA fields of the hippocampus. We can then join the two dataframes together using the syntax `unified_df = df_1.join(df_2)`." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
             temporal pole, inferior aspect  temporal pole, medial aspect  temporal pole, superior aspect
gene_symbol
A1BG                               0.277830                      0.514923                        0.733368
A1BG-AS1                           1.074116                      0.821031                        1.219272
A1CF                              -0.030265                     -0.187367                       -0.428358
A2M                               -0.058505                      0.207109                       -0.161808
A2ML1                             -0.472908                     -0.598317                       -0.247797
\n", "
" ], "text/plain": [ " temporal pole, inferior aspect temporal pole, medial aspect \\\n", "gene_symbol \n", "A1BG 0.277830 0.514923 \n", "A1BG-AS1 1.074116 0.821031 \n", "A1CF -0.030265 -0.187367 \n", "A2M -0.058505 0.207109 \n", "A2ML1 -0.472908 -0.598317 \n", "\n", " temporal pole, superior aspect \n", "gene_symbol \n", "A1BG 0.733368 \n", "A1BG-AS1 1.219272 \n", "A1CF -0.428358 \n", "A2M -0.161808 \n", "A2ML1 -0.247797 " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Dataframe w/ only Temporal Pole entries \n", "temporal_pole_df = gene_df[['temporal pole, inferior aspect', \n", " 'temporal pole, medial aspect', \n", " 'temporal pole, superior aspect']]\n", "temporal_pole_df.head()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
             CA1 field  CA2 field  CA3 field  CA4 field
gene_symbol
A1BG          0.856487  -1.773695  -0.678679  -0.986914
A1BG-AS1      0.257664  -1.373085  -0.619923  -0.636275
A1CF         -0.089614  -0.546903   0.282914  -0.528926
A2M           0.552415  -0.635485  -0.954995  -0.259745
A2ML1         0.758031   1.549857   1.262225   1.338780
\n", "
" ], "text/plain": [ " CA1 field CA2 field CA3 field CA4 field\n", "gene_symbol \n", "A1BG 0.856487 -1.773695 -0.678679 -0.986914\n", "A1BG-AS1 0.257664 -1.373085 -0.619923 -0.636275\n", "A1CF -0.089614 -0.546903 0.282914 -0.528926\n", "A2M 0.552415 -0.635485 -0.954995 -0.259745\n", "A2ML1 0.758031 1.549857 1.262225 1.338780" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Dataframe w/ only CA field entries \n", "CA_field_df = gene_df[['CA1 field', \n", " 'CA2 field', \n", " 'CA3 field', \n", " 'CA4 field']]\n", "CA_field_df.head()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
             temporal pole, inferior aspect  temporal pole, medial aspect  temporal pole, superior aspect  CA1 field  CA2 field  CA3 field  CA4 field
gene_symbol
A1BG                               0.277830                      0.514923                        0.733368   0.856487  -1.773695  -0.678679  -0.986914
A1BG-AS1                           1.074116                      0.821031                        1.219272   0.257664  -1.373085  -0.619923  -0.636275
A1CF                              -0.030265                     -0.187367                       -0.428358  -0.089614  -0.546903   0.282914  -0.528926
A2M                               -0.058505                      0.207109                       -0.161808   0.552415  -0.635485  -0.954995  -0.259745
A2ML1                             -0.472908                     -0.598317                       -0.247797   0.758031   1.549857   1.262225   1.338780
\n", "
" ], "text/plain": [ " temporal pole, inferior aspect temporal pole, medial aspect \\\n", "gene_symbol \n", "A1BG 0.277830 0.514923 \n", "A1BG-AS1 1.074116 0.821031 \n", "A1CF -0.030265 -0.187367 \n", "A2M -0.058505 0.207109 \n", "A2ML1 -0.472908 -0.598317 \n", "\n", " temporal pole, superior aspect CA1 field CA2 field CA3 field \\\n", "gene_symbol \n", "A1BG 0.733368 0.856487 -1.773695 -0.678679 \n", "A1BG-AS1 1.219272 0.257664 -1.373085 -0.619923 \n", "A1CF -0.428358 -0.089614 -0.546903 0.282914 \n", "A2M -0.161808 0.552415 -0.635485 -0.954995 \n", "A2ML1 -0.247797 0.758031 1.549857 1.262225 \n", "\n", " CA4 field \n", "gene_symbol \n", "A1BG -0.986914 \n", "A1BG-AS1 -0.636275 \n", "A1CF -0.528926 \n", "A2M -0.259745 \n", "A2ML1 1.338780 " ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Join the two dataframes\n", "df_1 = temporal_pole_df\n", "df_2 = CA_field_df\n", "\n", "unified_df = df_1.join(df_2)\n", "unified_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Those are the basics of working with Pandas dataframes! Circle back to this page or the resources linked within if you ever need a refresher. Next, we'll talk about the power of SciPy for scientific analysis in Python.\n", "\n", "## Additional resources\n", "See the [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/03.00-introduction-to-pandas.html) for a more in depth exploration of Pandas, and of course, the [Pandas documentation](https://pandas.pydata.org/docs/user_guide/index.html)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 4 }