{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Version 1.0.3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Pandas basics " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hi! In this programming assignment you need to refresh your `pandas` knowledge. You will need to do several [`groupby`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html)s and `join`s to solve the task. " ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import os\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline \n", "\n", "from grader import Grader" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": true }, "outputs": [], "source": [ "DATA_FOLDER = '../readonly/final_project_data/'\n", "\n", "transactions = pd.read_csv(os.path.join(DATA_FOLDER, 'sales_train.csv.gz'))\n", "items = pd.read_csv(os.path.join(DATA_FOLDER, 'items.csv'))\n", "item_categories = pd.read_csv(os.path.join(DATA_FOLDER, 'item_categories.csv'))\n", "shops = pd.read_csv(os.path.join(DATA_FOLDER, 'shops.csv'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The dataset we are going to use is taken from the competition that serves as the final project for this course. You can find the complete data description at the [competition web page](https://www.kaggle.com/c/competitive-data-science-final-project/data). To join the competition, use [this link](https://www.kaggle.com/t/1ea93815dca248e99221df42ebde3540)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Grading" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will create a grader instance below and use it to collect your answers. When the `submit_tag` function is called, the grader will store your answer *locally*. 
The answers will *not* be submitted to the platform immediately, so you can call the `submit_tag` function as many times as you need. \n", "\n", "When you are ready to push your answers to the platform, fill in your credentials and run the `submit` function in the last section of the assignment." ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": true }, "outputs": [], "source": [ "grader = Grader()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Task" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's start with a simple task. \n", "\n", "
\n", "|  | date | date_block_num | shop_id | item_id | item_price | item_cnt_day |\n",
"|---|---|---|---|---|---|---|\n",
"| 0 | 02.01.2013 | 0 | 59 | 22154 | 999.00 | 1.0 |\n",
"| 1 | 03.01.2013 | 0 | 25 | 2552 | 899.00 | 1.0 |\n",
"| 2 | 05.01.2013 | 0 | 25 | 2552 | 899.00 | -1.0 |\n",
"| 3 | 06.01.2013 | 0 | 25 | 2554 | 1709.05 | 1.0 |\n",
"| 4 | 15.01.2013 | 0 | 25 | 2555 | 1099.00 | 1.0 |\n",
"| 5 | 10.01.2013 | 0 | 25 | 2564 | 349.00 | 1.0 |\n",
"| 6 | 02.01.2013 | 0 | 25 | 2565 | 549.00 | 1.0 |\n",
"| 7 | 04.01.2013 | 0 | 25 | 2572 | 239.00 | 1.0 |\n",
"| 8 | 11.01.2013 | 0 | 25 | 2572 | 299.00 | 1.0 |\n",
"| 9 | 03.01.2013 | 0 | 25 | 2573 | 299.00 | 3.0 |\n",
"| 10 | 03.01.2013 | 0 | 25 | 2574 | 399.00 | 2.0 |\n",
"| 11 | 05.01.2013 | 0 | 25 | 2574 | 399.00 | 1.0 |\n",
"| 12 | 07.01.2013 | 0 | 25 | 2574 | 399.00 | 1.0 |\n",
"| 13 | 08.01.2013 | 0 | 25 | 2574 | 399.00 | 2.0 |\n",
"| 14 | 10.01.2013 | 0 | 25 | 2574 | 399.00 | 1.0 |\n", "