{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Machine Learning workshop\n", "\n", "In this workshop, we will study the GSE53987 dataset on bipolar disorder (BD), major depressive disorder (MDD), and schizophrenia. You can download it [here](https://github.com/BRITE-REU/programming-workshops/blob/master/source/workshops/04_Machine_learning/data/GSE53987_combined.csv).\n", "\n", "In total there are 205 rows from 19 individuals diagnosed with BD, 19 with MDD, 19 with schizophrenia, and 19 controls. Each individual has gene expression measured in up to 3 post-mortem brain tissues. There are 13768 genes (numeric features), 10 meta features, and 1 ID (the GEO sample accession). The meta features are:\n", "\n", "- Age\n", "- Race (W for white and B for black)\n", "- Gender (F for female and M for male)\n", "- Ph is the pH of the brain tissue\n", "- Pmi is the post-mortem interval\n", "- Rin is the RNA integrity number\n", "- Patient is unique for each patient. Each patient has up to 3 tissue samples. The patient ID is written as the disease followed by a number from 1 to 19\n", "- Tissue is the tissue the expression was obtained from\n", "- Disease.state is the class of disease the patient belongs to: bipolar, schizophrenia, depression, or control\n", "- Source.name is the combination of the tissue and Disease.state" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# load data (change location if necessary)\n", "data = pd.read_csv(\"../data/GSE53987_combined.csv\", index_col=0)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | Patient | Source.name | Age | Gender | Race | Pmi | Ph | Rin | Tissue | Disease.state | ... | ZSWIM8.AS1 | ZW10 | ZWILCH | ZWINT | ZXDA | ZXDB | ZXDC | ZYX | ZZEF1 | ZZZ3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GSM1304852 | bipolar_10 | hippocampus, bipolar disorder | 52 | M | W | 23.5 | 6.7 | 6.3 | hippocampus | Bipolar disorder | ... | 5.985163 | 6.428342 | 3.909310 | 6.046175 | 4.277563 | 4.107621 | 6.688651 | 7.228978 | 5.891636 | 7.982137 |
| GSM1304853 | bipolar_11 | hippocampus, bipolar disorder | 50 | F | W | 11.7 | 6.4 | 6.8 | hippocampus | Bipolar disorder | ... | 6.177670 | 6.537507 | 4.552976 | 7.335697 | 4.358375 | 4.132819 | 6.475851 | 7.028054 | 5.905446 | 7.891166 |
| GSM1304854 | bipolar_12 | hippocampus, bipolar disorder | 28 | F | W | 22.3 | 6.3 | 7.7 | hippocampus | Bipolar disorder | ... | 5.544327 | 6.732762 | 5.078011 | 7.470260 | 4.405250 | 4.137028 | 6.020157 | 6.810143 | 5.610422 | 7.940210 |
| GSM1304855 | bipolar_13 | hippocampus, bipolar disorder | 55 | F | W | 17.5 | 6.4 | 7.6 | hippocampus | Bipolar disorder | ... | 5.978466 | 6.913840 | 4.864570 | 7.175861 | 4.206593 | 4.005465 | 6.586425 | 6.818529 | 5.769763 | 7.987298 |
| GSM1304856 | bipolar_14 | hippocampus, bipolar disorder | 58 | M | W | 27.7 | 6.8 | 7.0 | hippocampus | Bipolar disorder | ... | 6.138507 | 6.756435 | 4.203565 | 7.032669 | 4.284513 | 4.128175 | 6.633143 | 7.037504 | 5.926310 | 8.002489 |

5 rows × 13778 columns
\n", "