{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "ojlhGzdxhkwR"
},
"source": [
"# Pandas. Loading libraries"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "xChor81V6mtD"
},
"source": [
"## Library description and import"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Z22T_R766hsO"
},
"source": [
" - <a href=\"http://pandas.pydata.org/\">Pandas</a> is a library for data processing and analysis. It is designed for data of various kinds: matrix data, panel data, and time series. It aims to be the most powerful and flexible open-source tool for data analysis."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"id": "DqYWosnHhkwU"
},
"outputs": [],
"source": [
"import pandas as pd # Import the pandas module under the alias pd"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ZNKpaic1hkwb"
},
"source": [
"Pandas has two main data structures:\n",
"- Series: a one-dimensional array with a labeled index (usually holding data of a single type)\n",
"- DataFrame: a two-dimensional tabular structure that is easy to resize and can hold data of different types\n",
"\n",
"Both can be created by hand with constructors from the library itself:\n",
"- pandas.Series(data=None, index=None, dtype=None)\n",
"- pandas.DataFrame(data=None, index=None, columns=None, dtype=None)\n",
"\n",
"- **data**: the data to store in the structure\n",
"- **index**: the row labels\n",
"- **columns**: the column names\n",
"- **dtype**: the data type\n",
"\n",
"All of these parameters have default values, but in practice you almost always pass at least data\n"
]
},
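{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of the two constructors described above, the snippet below builds one Series and one DataFrame by hand. The variable names `prices` and `wines` and all values are made up for illustration; they are not part of the lecture dataset.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Series: one-dimensional, every value gets a label from the index\n",
"prices = pd.Series(data=[20.0, 27.0, 15.0], index=['a', 'b', 'c'], dtype='float64')\n",
"\n",
"# DataFrame: two-dimensional, columns may hold different types\n",
"wines = pd.DataFrame(\n",
"    data=[['Italy', 91], ['France', 90]],  # made-up rows, one inner list per row\n",
"    index=[0, 1],\n",
"    columns=['country', 'points'],\n",
")\n",
"\n",
"print(prices['b'])   # the value stored under the label 'b'\n",
"print(wines.shape)   # (number of rows, number of columns)\n",
"```\n",
"\n",
"Passing the arguments by keyword, as here, makes it easy to see which list plays which role."
]
},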
{
"cell_type": "markdown",
"metadata": {
"id": "tMHOWBBWhkwf"
},
"source": [
"Of course, we can build DataFrames ourselves!\n",
"\n",
"For example, someone has found a fragment of data for us and asks us to reproduce this dataset:\n",
"\n",
"<img src=\"https://i.imgur.com/FUCGiKP.png\">\n",
"\n",
"Let's work out what is what here and write it down in a structure we already know: lists."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"id": "9yW-A-fRhkwi"
},
"outputs": [],
"source": [
"columns = ['country', 'province', 'region_1', 'region_2'] # List with the column names\n",
"index = [0, 1, 10, 100] # List with the row labels (the index)\n",
"\n",
"# List with the data; each row of the table is a separate list\n",
"# Note: 'NaN' below is an ordinary string, not a real missing value (float('nan') would be one)\n",
"data = [['Italy', 'Sicily & Sardinia', 'Etna', 'NaN'],\n",
"        ['Portugal', 'Douro', 'NaN', 'NaN'],\n",
"        ['US', 'California', 'Napa Valley', 'Napa'],\n",
"        ['US', 'New York', 'Finger Lakes', 'Finger Lakes']]"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "6jUo7y0uhkwo"
},
"source": [
"Now let's assemble all of this into a DataFrame"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 173
},
"id": "jMEdfOOdhkwp",
"outputId": "b5fae3e6-3e8d-4297-d468-0be74894b070",
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>country</th>\n",
" <th>province</th>\n",
" <th>region_1</th>\n",
" <th>region_2</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Italy</td>\n",
" <td>Sicily &amp; Sardinia</td>\n",
" <td>Etna</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Portugal</td>\n",
" <td>Douro</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>US</td>\n",
" <td>California</td>\n",
" <td>Napa Valley</td>\n",
" <td>Napa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>100</th>\n",
" <td>US</td>\n",
" <td>New York</td>\n",
" <td>Finger Lakes</td>\n",
" <td>Finger Lakes</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" country province region_1 region_2\n",
"0 Italy Sicily & Sardinia Etna NaN\n",
"1 Portugal Douro NaN NaN\n",
"10 US California Napa Valley Napa\n",
"100 US New York Finger Lakes Finger Lakes"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.DataFrame(data, columns=columns, index=index) # Create the DataFrame (we pass the data, the column names and the index)\n",
"df # Display the DataFrame (as the last expression in a cell it renders as a table; no print() needed)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "TIJhU5vEhkwv"
},
"source": [
"## Loading and writing data"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "CjIlX-Ar6vd7"
},
"source": [
"\n",
"- Functions named **pd.read_FORMAT** read data, and DataFrame methods named **to_FORMAT** (for example `df.to_csv`) write it. <br /> The full list is in the documentation:\n",
"https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html\n",
"\n",
"Let's learn to read data in CSV (comma-separated values) format with the function:\n",
"\n",
"- <a href=\"http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html#pandas.read_csv\"> pd.read_csv()</a>\n",
"\n",
"It has a great many arguments; the most important ones are:\n",
" - **filepath_or_buffer**: a string with the name (path or URL) of the file\n",
" - **sep**: the delimiter between values (a comma by default)\n",
" - **header**: the number of the row in the file that holds the column names, or None if there is no such row\n",
" - **names**: a list of column names\n",
" - **index_col**: a column number, a list of column numbers, or nothing; the column(s) to take the row labels from"
]
},
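{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see how the arguments listed above work together, here is a small sketch that reads CSV text from memory instead of a file. The text in `raw`, the column names and the variable `toy` are invented for this example and have nothing to do with wine_base.csv.\n",
"\n",
"```python\n",
"import io\n",
"import pandas as pd\n",
"\n",
"# Made-up CSV text: no header row, ';' as the delimiter\n",
"raw = '''10;Italy;91\n",
"11;France;90\n",
"'''\n",
"\n",
"toy = pd.read_csv(\n",
"    io.StringIO(raw),                   # a file-like object works in place of a path\n",
"    sep=';',                            # delimiter between values\n",
"    header=None,                        # the text has no row with column names\n",
"    names=['id', 'country', 'points'],  # so we supply the names ourselves\n",
"    index_col='id',                     # use the 'id' column as row labels\n",
")\n",
"print(toy)\n",
"```\n",
"\n",
"With a real file, the first argument would simply be the path, and `header` and `names` can usually be left out when the file already has a header row."
]
},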
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"id": "mWdKBTMNhkwx"
},
"outputs": [],
"source": [
"data = pd.read_csv('wine_base.csv') # Load the file wine_base.csv with read_csv and store the result in data"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "g8zunGkmhkw2"
},
"source": [
"**Let's see what was loaded**\n"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"id": "slhGLHJNhkw4",
"outputId": "58af12df-d33f-4a2a-e3f6-5a763ba68831"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Unnamed: 0</th>\n",
" <th>country</th>\n",
" <th>description</th>\n",
" <th>designation</th>\n",
" <th>points</th>\n",
" <th>price</th>\n",
" <th>province</th>\n",
" <th>region_1</th>\n",
" <th>region_2</th>\n",
" <th>variety</th>\n",
" <th>winery</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>150925</th>\n",
" <td>150925</td>\n",
" <td>Italy</td>\n",
" <td>Many people feel Fiano represents southern Ita...</td>\n",
" <td>NaN</td>\n",
" <td>91</td>\n",
" <td>20.0</td>\n",
" <td>Southern Italy</td>\n",
" <td>Fiano di Avellino</td>\n",
" <td>NaN</td>\n",
" <td>White Blend</td>\n",
" <td>Feudi di San Gregorio</td>\n",
" </tr>\n",
" <tr>\n",
" <th>150926</th>\n",
" <td>150926</td>\n",
" <td>France</td>\n",
" <td>Offers an intriguing nose with ginger, lime an...</td>\n",
" <td>Cuvée Prestige</td>\n",
" <td>91</td>\n",
" <td>27.0</td>\n",
" <td>Champagne</td>\n",
" <td>Champagne</td>\n",
" <td>NaN</td>\n",
" <td>Champagne Blend</td>\n",
" <td>H.Germain</td>\n",
" </tr>\n",
" <tr>\n",
" <th>150927</th>\n",
" <td>150927</td>\n",
" <td>Italy</td>\n",
" <td>This classic example comes from a cru vineyard...</td>\n",
" <td>Terre di Dora</td>\n",
" <td>91</td>\n",
" <td>20.0</td>\n",
" <td>Southern Italy</td>\n",
" <td>Fiano di Avellino</td>\n",
" <td>NaN</td>\n",
" <td>White Blend</td>\n",
" <td>Terredora</td>\n",
" </tr>\n",
" <tr>\n",
" <th>150928</th>\n",
" <td>150928</td>\n",
" <td>France</td>\n",
" <td>A perfect salmon shade, with scents of peaches...</td>\n",
" <td>Grand Brut Rosé</td>\n",
" <td>90</td>\n",
" <td>52.0</td>\n",
" <td>Champagne</td>\n",
" <td>Champagne</td>\n",
" <td>NaN</td>\n",
" <td>Champagne Blend</td>\n",
" <td>Gosset</td>\n",
" </tr>\n",
" <tr>\n",
" <th>150929</th>\n",
" <td>150929</td>\n",
" <td>Italy</td>\n",
" <td>More Pinot Grigios should taste like this. A r...</td>\n",
" <td>NaN</td>\n",
" <td>90</td>\n",
" <td>15.0</td>\n",
" <td>Northeastern Italy</td>\n",
" <td>Alto Adige</td>\n",
" <td>NaN</td>\n",
" <td>Pinot Grigio</td>\n",
" <td>Alois Lageder</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Unnamed: 0 country description \\\n",
"150925 150925 Italy Many people feel Fiano represents southern Ita... \n",
"150926 150926 France Offers an intriguing nose with ginger, lime an... \n",
"150927 150927 Italy This classic example comes from a cru vineyard... \n",
"150928 150928 France A perfect salmon shade, with scents of peaches... \n",
"150929 150929 Italy More Pinot Grigios should taste like this. A r... \n",
"\n",
" designation points price province region_1 \\\n",
"150925 NaN 91 20.0 Southern Italy Fiano di Avellino \n",
"150926 Cuvée Prestige 91 27.0 Champagne Champagne \n",
"150927 Terre di Dora 91 20.0 Southern Italy Fiano di Avellino \n",
"150928 Grand Brut Rosé 90 52.0 Champagne Champagne \n",
"150929 NaN 90 15.0 Northeastern Italy Alto Adige \n",
"\n",
" region_2 variety winery \n",
"150925 NaN White Blend Feudi di San Gregorio \n",
"150926 NaN Champagne Blend H.Germain \n",
"150927 NaN White Blend Terredora \n",
"150928 NaN Champagne Blend Gosset \n",
"150929 NaN Pinot Grigio Alois Lageder "
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.tail() # The tail method shows the last 5 rows of the DataFrame"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "TAEcUwXohkw9"
},
"source": [
"Something is off with the first column (Unnamed: 0 just repeats the row index), so let's fix that"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"id": "UQ_ne0wIhkw-"
},
"outputs": [],
"source": [
"data = pd.read_csv('wine_base.csv', index_col=0) # index_col names the column to use as the index of the DataFrame"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 924
},
"id": "u5iBpJ0jhkxC",
"outputId": "b8c9ab01-2747-467a-e833-870c9d83d11b",
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>country</th>\n",
" <th>description</th>\n",
" <th>designation</th>\n",
" <th>points</th>\n",
" <th>price</th>\n",
" <th>province</th>\n",
" <th>region_1</th>\n",
" <th>region_2</th>\n",
" <th>variety</th>\n",
" <th>winery</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>US</td>\n",
" <td>This tremendous 100% varietal wine hails from ...</td>\n",
" <td>Martha's Vineyard</td>\n",
" <td>96</td>\n",
" <td>235.0</td>\n",
" <td>California</td>\n",
" <td>Napa Valley</td>\n",
" <td>Napa</td>\n",
" <td>Cabernet Sauvignon</td>\n",
" <td>Heitz</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Spain</td>\n",
" <td>Ripe aromas of fig, blackberry and cassis are ...</td>\n",
" <td>Carodorum Selección Especial Reserva</td>\n",
" <td>96</td>\n",
" <td>110.0</td>\n",
" <td>Northern Spain</td>\n",
" <td>Toro</td>\n",
" <td>NaN</td>\n",
" <td>Tinta de Toro</td>\n",
" <td>Bodega Carmen Rodríguez</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>US</td>\n",
" <td>Mac Watson honors the memory of a wine once ma...</td>\n",
" <td>Special Selected Late Harvest</td>\n",
" <td>96</td>\n",
" <td>90.0</td>\n",
" <td>California</td>\n",
" <td>Knights Valley</td>\n",
" <td>Sonoma</td>\n",
" <td>Sauvignon Blanc</td>\n",
" <td>Macauley</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>US</td>\n",
" <td>This spent 20 months in 30% new French oak, an...</td>\n",
" <td>Reserve</td>\n",
" <td>96</td>\n",
" <td>65.0</td>\n",
" <td>Oregon</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Pinot Noir</td>\n",
" <td>Ponzi</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>France</td>\n",
" <td>This is the top wine from La Bégude, named aft...</td>\n",
" <td>La Brûlade</td>\n",
" <td>95</td>\n",
" <td>66.0</td>\n",
" <td>Provence</td>\n",
" <td>Bandol</td>\n",
" <td>NaN</td>\n",
" <td>Provence red blend</td>\n",
" <td>Domaine de la Bégude</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Spain</td>\n",
" <td>Deep, dense and pure from the opening bell, th...</td>\n",
" <td>Numanthia</td>\n",
" <td>95</td>\n",
" <td>73.0</td>\n",
" <td>Northern Spain</td>\n",
" <td>Toro</td>\n",
" <td>NaN</td>\n",
" <td>Tinta de Toro</td>\n",
" <td>Numanthia</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Spain</td>\n",
" <td>Slightly gritty black-fruit aromas include a s...</td>\n",
" <td>San Román</td>\n",
" <td>95</td>\n",
" <td>65.0</td>\n",
" <td>Northern Spain</td>\n",
" <td>Toro</td>\n",
" <td>NaN</td>\n",
" <td>Tinta de Toro</td>\n",
" <td>Maurodos</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Spain</td>\n",
" <td>Lush cedary black-fruit aromas are luxe and of...</td>\n",
" <td>Carodorum Único Crianza</td>\n",
" <td>95</td>\n",
" <td>110.0</td>\n",
" <td>Northern Spain</td>\n",
" <td>Toro</td>\n",
" <td>NaN</td>\n",
" <td>Tinta de Toro</td>\n",
" <td>Bodega Carmen Rodríguez</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>US</td>\n",
" <td>This re-named vineyard was formerly bottled as...</td>\n",
" <td>Silice</td>\n",
" <td>95</td>\n",
" <td>65.0</td>\n",
" <td>Oregon</td>\n",
" <td>Chehalem Mountains</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Pinot Noir</td>\n",
" <td>Bergström</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>US</td>\n",
" <td>The producer sources from two blocks of the vi...</td>\n",
" <td>Gap's Crown Vineyard</td>\n",
" <td>95</td>\n",
" <td>60.0</td>\n",
" <td>California</td>\n",
" <td>Sonoma Coast</td>\n",
" <td>Sonoma</td>\n",
" <td>Pinot Noir</td>\n",
" <td>Blue Farm</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Italy</td>\n",
" <td>Elegance, complexity and structure come togeth...</td>\n",
" <td>Ronco della Chiesa</td>\n",
" <td>95</td>\n",
" <td>80.0</td>\n",
" <td>Northeastern Italy</td>\n",
" <td>Collio</td>\n",
" <td>NaN</td>\n",
" <td>Friulano</td>\n",
" <td>Borgo del Tiglio</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>US</td>\n",
" <td>From 18-year-old vines, this supple well-balan...</td>\n",
" <td>Estate Vineyard Wadensvil Block</td>\n",
" <td>95</td>\n",
" <td>48.0</td>\n",
" <td>Oregon</td>\n",
" <td>Ribbon Ridge</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Pinot Noir</td>\n",
" <td>Patricia Green Cellars</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>US</td>\n",
" <td>A standout even in this terrific lineup of 201...</td>\n",
" <td>Weber Vineyard</td>\n",
" <td>95</td>\n",
" <td>48.0</td>\n",
" <td>Oregon</td>\n",
" <td>Dundee Hills</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Pinot Noir</td>\n",
" <td>Patricia Green Cellars</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>France</td>\n",
" <td>This wine is in peak condition. The tannins an...</td>\n",
" <td>Château Montus Prestige</td>\n",
" <td>95</td>\n",
" <td>90.0</td>\n",
" <td>Southwest France</td>\n",
" <td>Madiran</td>\n",
" <td>NaN</td>\n",
" <td>Tannat</td>\n",
" <td>Vignobles Brumont</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>US</td>\n",
" <td>With its sophisticated mix of mineral, acid an...</td>\n",
" <td>Grace Vineyard</td>\n",
" <td>95</td>\n",
" <td>185.0</td>\n",
" <td>Oregon</td>\n",
" <td>Dundee Hills</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Pinot Noir</td>\n",
" <td>Domaine Serene</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>US</td>\n",
" <td>First made in 2006, this succulent luscious Ch...</td>\n",
" <td>Sigrid</td>\n",
" <td>95</td>\n",
" <td>90.0</td>\n",
" <td>Oregon</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Chardonnay</td>\n",
" <td>Bergström</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>US</td>\n",
" <td>This blockbuster, powerhouse of a wine suggest...</td>\n",
" <td>Rainin Vineyard</td>\n",
" <td>95</td>\n",
" <td>325.0</td>\n",
" <td>California</td>\n",
" <td>Diamond Mountain District</td>\n",
" <td>Napa</td>\n",
" <td>Cabernet Sauvignon</td>\n",
" <td>Hall</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>Spain</td>\n",
" <td>Nicely oaked blackberry, licorice, vanilla and...</td>\n",
" <td>6 Años Reserva Premium</td>\n",
" <td>95</td>\n",
" <td>80.0</td>\n",
" <td>Northern Spain</td>\n",
" <td>Ribera del Duero</td>\n",
" <td>NaN</td>\n",
" <td>Tempranillo</td>\n",
" <td>Valduero</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>France</td>\n",
" <td>Coming from a seven-acre vineyard named after ...</td>\n",
" <td>Le Pigeonnier</td>\n",
" <td>95</td>\n",
" <td>290.0</td>\n",
" <td>Southwest France</td>\n",
" <td>Cahors</td>\n",
" <td>NaN</td>\n",
" <td>Malbec</td>\n",
" <td>Château Lagrézette</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>US</td>\n",
" <td>This fresh and lively medium-bodied wine is be...</td>\n",
" <td>Gap's Crown Vineyard</td>\n",
" <td>95</td>\n",
" <td>75.0</td>\n",
" <td>California</td>\n",
" <td>Sonoma Coast</td>\n",
" <td>Sonoma</td>\n",
" <td>Pinot Noir</td>\n",
" <td>Gary Farrell</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" country description \\\n",
"0 US This tremendous 100% varietal wine hails from ... \n",
"1 Spain Ripe aromas of fig, blackberry and cassis are ... \n",
"2 US Mac Watson honors the memory of a wine once ma... \n",
"3 US This spent 20 months in 30% new French oak, an... \n",
"4 France This is the top wine from La Bégude, named aft... \n",
"5 Spain Deep, dense and pure from the opening bell, th... \n",
"6 Spain Slightly gritty black-fruit aromas include a s... \n",
"7 Spain Lush cedary black-fruit aromas are luxe and of... \n",
"8 US This re-named vineyard was formerly bottled as... \n",
"9 US The producer sources from two blocks of the vi... \n",
"10 Italy Elegance, complexity and structure come togeth... \n",
"11 US From 18-year-old vines, this supple well-balan... \n",
"12 US A standout even in this terrific lineup of 201... \n",
"13 France This wine is in peak condition. The tannins an... \n",
"14 US With its sophisticated mix of mineral, acid an... \n",
"15 US First made in 2006, this succulent luscious Ch... \n",
"16 US This blockbuster, powerhouse of a wine suggest... \n",
"17 Spain Nicely oaked blackberry, licorice, vanilla and... \n",
"18 France Coming from a seven-acre vineyard named after ... \n",
"19 US This fresh and lively medium-bodied wine is be... \n",
"\n",
" designation points price province \\\n",
"0 Martha's Vineyard 96 235.0 California \n",
"1 Carodorum Selección Especial Reserva 96 110.0 Northern Spain \n",
"2 Special Selected Late Harvest 96 90.0 California \n",
"3 Reserve 96 65.0 Oregon \n",
"4 La Brûlade 95 66.0 Provence \n",
"5 Numanthia 95 73.0 Northern Spain \n",
"6 San Román 95 65.0 Northern Spain \n",
"7 Carodorum Único Crianza 95 110.0 Northern Spain \n",
"8 Silice 95 65.0 Oregon \n",
"9 Gap's Crown Vineyard 95 60.0 California \n",
"10 Ronco della Chiesa 95 80.0 Northeastern Italy \n",
"11 Estate Vineyard Wadensvil Block 95 48.0 Oregon \n",
"12 Weber Vineyard 95 48.0 Oregon \n",
"13 Château Montus Prestige 95 90.0 Southwest France \n",
"14 Grace Vineyard 95 185.0 Oregon \n",
"15 Sigrid 95 90.0 Oregon \n",
"16 Rainin Vineyard 95 325.0 California \n",
"17 6 Años Reserva Premium 95 80.0 Northern Spain \n",
"18 Le Pigeonnier 95 290.0 Southwest France \n",
"19 Gap's Crown Vineyard 95 75.0 California \n",
"\n",
" region_1 region_2 variety \\\n",
"0 Napa Valley Napa Cabernet Sauvignon \n",
"1 Toro NaN Tinta de Toro \n",
"2 Knights Valley Sonoma Sauvignon Blanc \n",
"3 Willamette Valley Willamette Valley Pinot Noir \n",
"4 Bandol NaN Provence red blend \n",
"5 Toro NaN Tinta de Toro \n",
"6 Toro NaN Tinta de Toro \n",
"7 Toro NaN Tinta de Toro \n",
"8 Chehalem Mountains Willamette Valley Pinot Noir \n",
"9 Sonoma Coast Sonoma Pinot Noir \n",
"10 Collio NaN Friulano \n",
"11 Ribbon Ridge Willamette Valley Pinot Noir \n",
"12 Dundee Hills Willamette Valley Pinot Noir \n",
"13 Madiran NaN Tannat \n",
"14 Dundee Hills Willamette Valley Pinot Noir \n",
"15 Willamette Valley Willamette Valley Chardonnay \n",
"16 Diamond Mountain District Napa Cabernet Sauvignon \n",
"17 Ribera del Duero NaN Tempranillo \n",
"18 Cahors NaN Malbec \n",
"19 Sonoma Coast Sonoma Pinot Noir \n",
"\n",
" winery \n",
"0 Heitz \n",
"1 Bodega Carmen Rodríguez \n",
"2 Macauley \n",
"3 Ponzi \n",
"4 Domaine de la Bégude \n",
"5 Numanthia \n",
"6 Maurodos \n",
"7 Bodega Carmen Rodríguez \n",
"8 Bergström \n",
"9 Blue Farm \n",
"10 Borgo del Tiglio \n",
"11 Patricia Green Cellars \n",
"12 Patricia Green Cellars \n",
"13 Vignobles Brumont \n",
"14 Domaine Serene \n",
"15 Bergström \n",
"16 Hall \n",
"17 Valduero \n",
"18 Château Lagrézette \n",
"19 Gary Farrell "
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.head(20) # The head method shows the first 20 rows of the DataFrame"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "UnU4xLLzhkxG"
},
"source": [
"**Information about the loaded data**:\n",
"\n",
"- Count how many records there are\n",
"- Look at the data types\n",
"- Check whether there are missing values"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"id": "Z-MKWiELhkxP",
"outputId": "68ca424e-83eb-4d17-d779-1ea7f1a6b4ae"
},
"outputs": [
{
"data": {
"text/plain": [
"(150930, 10)"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.shape # The .shape attribute (just like for NumPy arrays) gives the dimensions of the DataFrame: (rows, columns)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"id": "SEr52zb4hkxT",
"outputId": "d72fe356-c89d-4d61-c62b-a64359f748d2"
},
"outputs": [
{
"data": {
"text/plain": [
"1509300"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.size # The .size attribute (just like for NumPy arrays) gives the total number of elements in the DataFrame"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"id": "sw8ATDX1hkxJ",
"outputId": "2cfce98c-00bf-4093-8bf5-6573e0cd909f",
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"country 150925\n",
"description 150930\n",
"designation 105195\n",
"points 150930\n",
"price 137235\n",
"province 150925\n",
"region_1 125870\n",
"region_2 60953\n",
"variety 150930\n",
"winery 150930\n",
"dtype: int64"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.count() # The count method counts the non-missing values in each column"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"id": "OwVkE1MKX6mW",
"outputId": "9a9f1142-de9c-4ffa-e051-bb58af728151"
},
"outputs": [
{
"data": {
"text/plain": [
"country 100\n",
"description 100\n",
"designation 84\n",
"points 100\n",
"price 96\n",
"province 100\n",
"region_1 92\n",
"region_2 43\n",
"variety 100\n",
"winery 100\n",
"dtype: int64"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.head(100).count() # Apply .count() to the first hundred rows of the DataFrame"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5AjpVmYFhkxX"
},
"source": [
"- The info() method also shows the data type of each column"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 306
},
"id": "G8RHx3kvhkxZ",
"outputId": "cf46dd23-3acf-4d2e-c8c4-046fa7b1f8d6"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"Int64Index: 150930 entries, 0 to 150929\n",
"Data columns (total 10 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 country 150925 non-null object \n",
" 1 description 150930 non-null object \n",
" 2 designation 105195 non-null object \n",
" 3 points 150930 non-null int64 \n",
" 4 price 137235 non-null float64\n",
" 5 province 150925 non-null object \n",
" 6 region_1 125870 non-null object \n",
" 7 region_2 60953 non-null object \n",
" 8 variety 150930 non-null object \n",
" 9 winery 150930 non-null object \n",
"dtypes: float64(1), int64(1), object(8)\n",
"memory usage: 12.7+ MB\n"
]
}
],
"source": [
"data.info() # The .info() method shows each column's dtype, its non-null count and the memory used"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"id": "cMRzYwQdhkxd",
"outputId": "4afa01d4-ec65-452a-fc89-c503253c1efa"
},
"outputs": [
{
"data": {
"text/plain": [
"country object\n",
"description object\n",
"designation object\n",
"points int64\n",
"price float64\n",
"province object\n",
"region_1 object\n",
"region_2 object\n",
"variety object\n",
"winery object\n",
"dtype: object"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.dtypes # The .dtypes attribute shows just the dtype of each column"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "S3TniwKUhkxh"
},
"source": [
"Let's start checking for missing values!\n",
"\n",
"- .isnull() returns a table where False means the cell is filled and True means the cell is empty :( The isna() method is an alias that does exactly the same thing"
]
},
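{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick sketch on a toy table (the DataFrame `toy` and its values are invented for illustration) shows what the mask looks like, that isnull() and isna() agree, and that the string 'NaN' is not treated as a missing value.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"toy = pd.DataFrame({\n",
"    'region_1': ['Etna', None, 'Napa Valley'],\n",
"    'region_2': ['NaN', None, 'Napa'],  # 'NaN' is a plain string here\n",
"})\n",
"\n",
"mask = toy.isnull()             # True where a value is missing\n",
"print(mask)\n",
"print(mask.equals(toy.isna()))  # both methods return the same table\n",
"print(mask.sum())               # number of missing values per column\n",
"```\n",
"\n",
"Summing the mask works because True counts as 1 and False as 0."
]
},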
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"id": "uq1iywLbYsxS",
"outputId": "fcc31e0d-6e49-4967-ac34-d03865227b1f"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>country</th>\n",
" <th>description</th>\n",
" <th>designation</th>\n",
" <th>points</th>\n",
" <th>price</th>\n",
" <th>province</th>\n",
" <th>region_1</th>\n",
" <th>region_2</th>\n",
" <th>variety</th>\n",
" <th>winery</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>US</td>\n",
" <td>This tremendous 100% varietal wine hails from ...</td>\n",
" <td>Martha's Vineyard</td>\n",
" <td>96</td>\n",
" <td>235.0</td>\n",
" <td>California</td>\n",
" <td>Napa Valley</td>\n",
" <td>Napa</td>\n",
" <td>Cabernet Sauvignon</td>\n",
" <td>Heitz</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Spain</td>\n",
" <td>Ripe aromas of fig, blackberry and cassis are ...</td>\n",
" <td>Carodorum Selección Especial Reserva</td>\n",
" <td>96</td>\n",
" <td>110.0</td>\n",
" <td>Northern Spain</td>\n",
" <td>Toro</td>\n",
" <td>NaN</td>\n",
" <td>Tinta de Toro</td>\n",
" <td>Bodega Carmen Rodríguez</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>US</td>\n",
" <td>Mac Watson honors the memory of a wine once ma...</td>\n",
" <td>Special Selected Late Harvest</td>\n",
" <td>96</td>\n",
" <td>90.0</td>\n",
" <td>California</td>\n",
" <td>Knights Valley</td>\n",
" <td>Sonoma</td>\n",
" <td>Sauvignon Blanc</td>\n",
" <td>Macauley</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>US</td>\n",
" <td>This spent 20 months in 30% new French oak, an...</td>\n",
" <td>Reserve</td>\n",
" <td>96</td>\n",
" <td>65.0</td>\n",
" <td>Oregon</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Pinot Noir</td>\n",
" <td>Ponzi</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>France</td>\n",
" <td>This is the top wine from La Bégude, named aft...</td>\n",
" <td>La Brûlade</td>\n",
" <td>95</td>\n",
" <td>66.0</td>\n",
" <td>Provence</td>\n",
" <td>Bandol</td>\n",
" <td>NaN</td>\n",
" <td>Provence red blend</td>\n",
" <td>Domaine de la Bégude</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" country description \\\n",
"0 US This tremendous 100% varietal wine hails from ... \n",
"1 Spain Ripe aromas of fig, blackberry and cassis are ... \n",
"2 US Mac Watson honors the memory of a wine once ma... \n",
"3 US This spent 20 months in 30% new French oak, an... \n",
"4 France This is the top wine from La Bégude, named aft... \n",
"\n",
" designation points price province \\\n",
"0 Martha's Vineyard 96 235.0 California \n",
"1 Carodorum Selección Especial Reserva 96 110.0 Northern Spain \n",
"2 Special Selected Late Harvest 96 90.0 California \n",
"3 Reserve 96 65.0 Oregon \n",
"4 La Brûlade 95 66.0 Provence \n",
"\n",
" region_1 region_2 variety \\\n",
"0 Napa Valley Napa Cabernet Sauvignon \n",
"1 Toro NaN Tinta de Toro \n",
"2 Knights Valley Sonoma Sauvignon Blanc \n",
"3 Willamette Valley Willamette Valley Pinot Noir \n",
"4 Bandol NaN Provence red blend \n",
"\n",
" winery \n",
"0 Heitz \n",
"1 Bodega Carmen Rodríguez \n",
"2 Macauley \n",
"3 Ponzi \n",
"4 Domaine de la Bégude "
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.head() # Show the first 5 rows of the DataFrame"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"id": "_oxBR6lAzfgu",
"outputId": "1b2f600d-ea50-4289-cfd6-976ea1526877"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>country</th>\n",
" <th>description</th>\n",
" <th>designation</th>\n",
" <th>points</th>\n",
" <th>price</th>\n",
" <th>province</th>\n",
" <th>region_1</th>\n",
" <th>region_2</th>\n",
" <th>variety</th>\n",
" <th>winery</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" country description designation points price province region_1 \\\n",
"0 False False False False False False False \n",
"1 False False False False False False False \n",
"2 False False False False False False False \n",
"3 False False False False False False False \n",
"4 False False False False False False False \n",
"\n",
" region_2 variety winery \n",
"0 False False False \n",
"1 True False False \n",
"2 False False False \n",
"3 False False False \n",
"4 True False False "
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.isna().head()  # .isna() replaces every value with True (missing, NaN) or False (a real value)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"id": "ZjTn7cM5zyta",
"outputId": "b3889e78-08c3-4bdf-c4ec-260d35eed9ea"
},
"outputs": [
{
"data": {
"text/plain": [
"country 5\n",
"description 0\n",
"designation 45735\n",
"points 0\n",
"price 13695\n",
"province 5\n",
"region_1 25060\n",
"region_2 89977\n",
"variety 0\n",
"winery 0\n",
"dtype: int64"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.isna().sum()  # Count the missing values in each column with .sum()"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"id": "D7aHOhOGY5pe",
"outputId": "ce904bd7-3087-40a2-824e-7f7047f358db"
},
"outputs": [
{
"data": {
"text/plain": [
"country 0\n",
"description 0\n",
"designation 16\n",
"points 0\n",
"price 4\n",
"province 0\n",
"region_1 8\n",
"region_2 57\n",
"variety 0\n",
"winery 0\n",
"dtype: int64"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.head(100).isna().sum()  # Count the missing values in each column for the first 100 rows"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"id": "ZaiGPw-KY9eQ",
"outputId": "3ffbc798-c5bc-4970-c6c3-1608da30afe4"
},
"outputs": [
{
"data": {
"text/plain": [
"country 0\n",
"description 0\n",
"designation 16\n",
"points 0\n",
"price 4\n",
"province 0\n",
"region_1 8\n",
"region_2 57\n",
"variety 0\n",
"winery 0\n",
"dtype: int64"
]
},
"execution_count": 20,
"metadata": {
"tags": []
},
"output_type": "execute_result"
}
],
"source": [
"data.isna().head(100).sum()  # Count the missing values in each column for the first 100 rows (equivalent to the previous cell)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"id": "NHeX2czDhkxi",
"outputId": "9995758b-2f88-47ca-ab63-37cb364875dd"
},
"outputs": [
{
"data": {
"text/plain": [
"country 0.000033\n",
"description 0.000000\n",
"designation 0.303021\n",
"points 0.000000\n",
"price 0.090737\n",
"province 0.000033\n",
"region_1 0.166037\n",
"region_2 0.596151\n",
"variety 0.000000\n",
"winery 0.000000\n",
"dtype: float64"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"round(data.isna().sum() / data.shape[0], 6)  # Fraction of missing values in each column, relative to the number of rows"
]
},
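{
"cell_type": "markdown",
"metadata": {},
"source": [
"The per-column share of missing values can also be obtained with `.mean()`, because the mean of a boolean column equals the fraction of `True` values. A minimal sketch on a small made-up frame (the name `toy` and its values are illustrative assumptions, not part of the wine dataset):\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy frame standing in for the wine data\n",
"toy = pd.DataFrame({\n",
"    'price': [10.0, np.nan, 30.0, np.nan],\n",
"    'country': ['US', 'Spain', None, 'France'],\n",
"})\n",
"\n",
"share_via_sum = toy.isna().sum() / toy.shape[0]  # same approach as in the cell above\n",
"share_via_mean = toy.isna().mean()               # mean of booleans = fraction of True\n",
"\n",
"print(share_via_mean)\n",
"```\n",
"\n",
"Both expressions should give the same Series; `.mean()` is simply shorter to type."
]
},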
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"id": "tvAQTignhkxo",
"outputId": "7d223855-7d97-4529-bb74-572e21ed89a2"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"174477\n"
]
}
],
"source": [
"proc = data.isna().sum().sum()  # Total number of missing values across all columns of the DataFrame\n",
"print(proc)  # Print the total"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"id": "EOZz-GAPhkxr",
"outputId": "b7997cfd-a292-48ff-d431-ff425d210a7c"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"11.6%\n"
]
}
],
"source": [
"# Express the total as a percentage of all cells in the DataFrame\n",
"proc = data.isna().sum().sum() / data.size\n",
"print(round(100*proc,1), '%', sep='')"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "OuW1gRtlhkxz"
},
"source": [
"### How to assess missing values visually"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 735
},
"id": "ToPE3VkWhkx1",
"outputId": "6a5c7213-1a94-4823-e5cf-3d19468a890d"
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1440x864 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"import matplotlib.pyplot as plt  # Import matplotlib.pyplot\n",
"import seaborn as sns  # Import seaborn\n",
"%matplotlib inline\n",
"\n",
"fig, ax = plt.subplots(figsize=(20,12))  # Create the figure and axes for the plot\n",
"sns_heatmap = sns.heatmap(data.isnull(), yticklabels=False, cbar=False, cmap='viridis')  # Visualize the missing values\n",
"plt.show()  # Display the plot"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "E8w2yGJ1hkx6"
},
"source": [
"What can we do about the missing values?\n",
"\n",
"There are not many options: <br>\n",
"\n",
"1) Delete them:\n",
"- dropna(axis=0, how='any'): axis=0 drops rows, axis=1 drops columns; how='any' drops a row or column if at least one of its cells is empty; how='all' drops it only if every one of its cells is empty\n",
"\n",
"2) Fill them in ourselves:\n",
"- fillna(): choosing what to fill with is an art in its own right."
]
},
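{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `dropna` and `fillna` options listed above can be sketched on a small made-up frame. The name `toy`, its values, and the filling strategy (median for numbers, a placeholder for text) are assumptions for illustration only, not a recommendation for the wine data:\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy frame; column names mirror the wine data, values are made up\n",
"toy = pd.DataFrame({\n",
"    'price': [10.0, np.nan, 30.0, np.nan],\n",
"    'region_2': ['Napa', None, None, None],\n",
"    'points': [90, 85, 88, 92],\n",
"})\n",
"\n",
"rows_any = toy.dropna(axis=0, how='any')  # drop rows with at least one gap\n",
"rows_all = toy.dropna(axis=0, how='all')  # drop only rows that are entirely empty\n",
"cols_any = toy.dropna(axis=1, how='any')  # drop columns with at least one gap\n",
"\n",
"# One possible filling strategy: a per-column value passed as a dict\n",
"filled = toy.fillna({'price': toy['price'].median(), 'region_2': 'unknown'})\n",
"\n",
"print(len(rows_any), len(rows_all), list(cols_any.columns))\n",
"```\n",
"\n",
"Neither method changes `toy` itself; each returns a new DataFrame, so the result has to be assigned to a name to be kept."
]
},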
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 495
},
"id": "uZgh1E3nhkx6",
"outputId": "3a6709c7-3fbc-4260-bcd9-69c47228b013"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>country</th>\n",
" <th>description</th>\n",
" <th>designation</th>\n",
" <th>points</th>\n",
" <th>price</th>\n",
" <th>province</th>\n",
" <th>region_1</th>\n",
" <th>region_2</th>\n",
" <th>variety</th>\n",
" <th>winery</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>US</td>\n",
" <td>This tremendous 100% varietal wine hails from ...</td>\n",
" <td>Martha's Vineyard</td>\n",
" <td>96</td>\n",
" <td>235.0</td>\n",
" <td>California</td>\n",
" <td>Napa Valley</td>\n",
" <td>Napa</td>\n",
" <td>Cabernet Sauvignon</td>\n",
" <td>Heitz</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Spain</td>\n",
" <td>Ripe aromas of fig, blackberry and cassis are ...</td>\n",
" <td>Carodorum Selección Especial Reserva</td>\n",
" <td>96</td>\n",
" <td>110.0</td>\n",
" <td>Northern Spain</td>\n",
" <td>Toro</td>\n",
" <td>Python</td>\n",
" <td>Tinta de Toro</td>\n",
" <td>Bodega Carmen Rodríguez</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>US</td>\n",
" <td>Mac Watson honors the memory of a wine once ma...</td>\n",
" <td>Special Selected Late Harvest</td>\n",
" <td>96</td>\n",
" <td>90.0</td>\n",
" <td>California</td>\n",
" <td>Knights Valley</td>\n",
" <td>Sonoma</td>\n",
" <td>Sauvignon Blanc</td>\n",
" <td>Macauley</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>US</td>\n",
" <td>This spent 20 months in 30% new French oak, an...</td>\n",
" <td>Reserve</td>\n",
" <td>96</td>\n",
" <td>65.0</td>\n",
" <td>Oregon</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Pinot Noir</td>\n",
" <td>Ponzi</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>France</td>\n",
" <td>This is the top wine from La Bégude, named aft...</td>\n",
" <td>La Brûlade</td>\n",
" <td>95</td>\n",
" <td>66.0</td>\n",
" <td>Provence</td>\n",
" <td>Bandol</td>\n",
" <td>Python</td>\n",
" <td>Provence red blend</td>\n",
" <td>Domaine de la Bégude</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Spain</td>\n",
" <td>Deep, dense and pure from the opening bell, th...</td>\n",
" <td>Numanthia</td>\n",
" <td>95</td>\n",
" <td>73.0</td>\n",
" <td>Northern Spain</td>\n",
" <td>Toro</td>\n",
" <td>Python</td>\n",
" <td>Tinta de Toro</td>\n",
" <td>Numanthia</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Spain</td>\n",
" <td>Slightly gritty black-fruit aromas include a s...</td>\n",
" <td>San Román</td>\n",
" <td>95</td>\n",
" <td>65.0</td>\n",
" <td>Northern Spain</td>\n",
" <td>Toro</td>\n",
" <td>Python</td>\n",
" <td>Tinta de Toro</td>\n",
" <td>Maurodos</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Spain</td>\n",
" <td>Lush cedary black-fruit aromas are luxe and of...</td>\n",
" <td>Carodorum Único Crianza</td>\n",
" <td>95</td>\n",
" <td>110.0</td>\n",
" <td>Northern Spain</td>\n",
" <td>Toro</td>\n",
" <td>Python</td>\n",
" <td>Tinta de Toro</td>\n",
" <td>Bodega Carmen Rodríguez</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>US</td>\n",
" <td>This re-named vineyard was formerly bottled as...</td>\n",
" <td>Silice</td>\n",
" <td>95</td>\n",
" <td>65.0</td>\n",
" <td>Oregon</td>\n",
" <td>Chehalem Mountains</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Pinot Noir</td>\n",
" <td>Bergström</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>US</td>\n",
" <td>The producer sources from two blocks of the vi...</td>\n",
" <td>Gap's Crown Vineyard</td>\n",
" <td>95</td>\n",
" <td>60.0</td>\n",
" <td>California</td>\n",
" <td>Sonoma Coast</td>\n",
" <td>Sonoma</td>\n",
" <td>Pinot Noir</td>\n",
" <td>Blue Farm</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" country description \\\n",
"0 US This tremendous 100% varietal wine hails from ... \n",
"1 Spain Ripe aromas of fig, blackberry and cassis are ... \n",
"2 US Mac Watson honors the memory of a wine once ma... \n",
"3 US This spent 20 months in 30% new French oak, an... \n",
"4 France This is the top wine from La Bégude, named aft... \n",
"5 Spain Deep, dense and pure from the opening bell, th... \n",
"6 Spain Slightly gritty black-fruit aromas include a s... \n",
"7 Spain Lush cedary black-fruit aromas are luxe and of... \n",
"8 US This re-named vineyard was formerly bottled as... \n",
"9 US The producer sources from two blocks of the vi... \n",
"\n",
" designation points price province \\\n",
"0 Martha's Vineyard 96 235.0 California \n",
"1 Carodorum Selección Especial Reserva 96 110.0 Northern Spain \n",
"2 Special Selected Late Harvest 96 90.0 California \n",
"3 Reserve 96 65.0 Oregon \n",
"4 La Brûlade 95 66.0 Provence \n",
"5 Numanthia 95 73.0 Northern Spain \n",
"6 San Román 95 65.0 Northern Spain \n",
"7 Carodorum Único Crianza 95 110.0 Northern Spain \n",
"8 Silice 95 65.0 Oregon \n",
"9 Gap's Crown Vineyard 95 60.0 California \n",
"\n",
" region_1 region_2 variety \\\n",
"0 Napa Valley Napa Cabernet Sauvignon \n",
"1 Toro Python Tinta de Toro \n",
"2 Knights Valley Sonoma Sauvignon Blanc \n",
"3 Willamette Valley Willamette Valley Pinot Noir \n",
"4 Bandol Python Provence red blend \n",
"5 Toro Python Tinta de Toro \n",
"6 Toro Python Tinta de Toro \n",
"7 Toro Python Tinta de Toro \n",
"8 Chehalem Mountains Willamette Valley Pinot Noir \n",
"9 Sonoma Coast Sonoma Pinot Noir \n",
"\n",
" winery \n",
"0 Heitz \n",
"1 Bodega Carmen Rodríguez \n",
"2 Macauley \n",
"3 Ponzi \n",
"4 Domaine de la Bégude \n",
"5 Numanthia \n",
"6 Maurodos \n",
"7 Bodega Carmen Rodríguez \n",
"8 Bergström \n",
"9 Blue Farm "
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.fillna(\"Python\").head(10)  # Use .fillna() to replace every missing value with the word Python"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "mBsHwML6hkyF"
},
"source": [
"### Descriptive statistics\n",
"\n",
"Now let's see what the data actually contains.\n",
"\n",
"Rather than scanning it by eye, we ask pandas to compute the main descriptive statistics, all at once.\n",
"\n",
"- describe(): a method that returns a table of descriptive statistics. Called without arguments, it covers only the numeric columns"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 297
},
"id": "ZWz60or1hkyG",
"outputId": "134781c0-28b6-4137-85d1-8376216860c6",
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>points</th>\n",
" <th>price</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>150930.000000</td>\n",
" <td>137235.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>87.888418</td>\n",
" <td>33.131482</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>3.222392</td>\n",
" <td>36.322536</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>80.000000</td>\n",
" <td>4.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>86.000000</td>\n",
" <td>16.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>88.000000</td>\n",
" <td>24.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>90.000000</td>\n",
" <td>40.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>100.000000</td>\n",
" <td>2300.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" points price\n",
"count 150930.000000 137235.000000\n",
"mean 87.888418 33.131482\n",
"std 3.222392 36.322536\n",
"min 80.000000 4.000000\n",
"25% 86.000000 16.000000\n",
"50% 88.000000 24.000000\n",
"75% 90.000000 40.000000\n",
"max 100.000000 2300.000000"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.describe()  # Show descriptive statistics for the DataFrame (numeric columns only)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "0aeamrWMhkyK"
},
"source": [
"With one extra argument, the non-numeric columns get descriptive statistics of their own."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 173
},
"id": "jKTF-2BHhkyK",
"outputId": "244a91a6-e8b4-42a9-d464-3728bc5c3dc5",
"scrolled": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>country</th>\n",
" <th>description</th>\n",
" <th>designation</th>\n",
" <th>province</th>\n",
" <th>region_1</th>\n",
" <th>region_2</th>\n",
" <th>variety</th>\n",
" <th>winery</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>150925</td>\n",
" <td>150930</td>\n",
" <td>105195</td>\n",
" <td>150925</td>\n",
" <td>125870</td>\n",
" <td>60953</td>\n",
" <td>150930</td>\n",
" <td>150930</td>\n",
" </tr>\n",
" <tr>\n",
" <th>unique</th>\n",
" <td>48</td>\n",
" <td>97821</td>\n",
" <td>30621</td>\n",
" <td>455</td>\n",
" <td>1236</td>\n",
" <td>18</td>\n",
" <td>632</td>\n",
" <td>14810</td>\n",
" </tr>\n",
" <tr>\n",
" <th>top</th>\n",
" <td>US</td>\n",
" <td>A little bit funky and unsettled when you pop ...</td>\n",
" <td>Reserve</td>\n",
" <td>California</td>\n",
" <td>Napa Valley</td>\n",
" <td>Central Coast</td>\n",
" <td>Chardonnay</td>\n",
" <td>Williams Selyem</td>\n",
" </tr>\n",
" <tr>\n",
" <th>freq</th>\n",
" <td>62397</td>\n",
" <td>6</td>\n",
" <td>2752</td>\n",
" <td>44508</td>\n",
" <td>6209</td>\n",
" <td>13057</td>\n",
" <td>14482</td>\n",
" <td>374</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" country description designation \\\n",
"count 150925 150930 105195 \n",
"unique 48 97821 30621 \n",
"top US A little bit funky and unsettled when you pop ... Reserve \n",
"freq 62397 6 2752 \n",
"\n",
" province region_1 region_2 variety winery \n",
"count 150925 125870 60953 150930 150930 \n",
"unique 455 1236 18 632 14810 \n",
"top California Napa Valley Central Coast Chardonnay Williams Selyem \n",
"freq 44508 6209 13057 14482 374 "
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.describe(include=['O'])  # Show descriptive statistics for the object columns ('O' selects the object dtype, i.e. strings here)"
]
},
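{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two calls above split the summary by column type. As a hedged sketch, `include='all'` asks `describe()` for numeric and non-numeric columns in a single table; statistics that do not apply to a column are left as NaN. The frame `toy` and its values are made up for illustration:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy frame with one text column and one numeric column\n",
"toy = pd.DataFrame({\n",
"    'country': ['US', 'US', 'Spain'],\n",
"    'points': [96, 95, 95],\n",
"})\n",
"\n",
"numeric_stats = toy.describe()           # numeric columns only\n",
"all_stats = toy.describe(include='all')  # every column in one table\n",
"\n",
"print(all_stats)\n",
"```\n",
"\n",
"In the combined table the rows `unique`, `top` and `freq` describe the text column, while `mean`, `std` and the quantiles describe the numeric one."
]
},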
{
"cell_type": "markdown",
"metadata": {
"id": "-IbwBRL_hkyO"
},
"source": [
"### Slicing the data\n",
"\n",
"Suppose we do not need the whole dataset, only certain columns, certain rows, or a combination of both.\n",
"\n",
"\n",
"How do we do that?\n",
"Recall that:\n",
"- columns have names\n",
"- rows have labels\n",
"- if no labels are given, the rows are numbered from zero\n",
"\n",
"With that in mind, let's start selecting data."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 80
},
"id": "4uT9dn4vhkyO",
"outputId": "f57acc81-2f88-41fd-ada0-028d80ed3ca7"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>country</th>\n",
" <th>description</th>\n",
" <th>designation</th>\n",
" <th>points</th>\n",
" <th>price</th>\n",
" <th>province</th>\n",
" <th>region_1</th>\n",
" <th>region_2</th>\n",
" <th>variety</th>\n",
" <th>winery</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>US</td>\n",
" <td>This tremendous 100% varietal wine hails from ...</td>\n",
" <td>Martha's Vineyard</td>\n",
" <td>96</td>\n",
" <td>235.0</td>\n",
" <td>California</td>\n",
" <td>Napa Valley</td>\n",
" <td>Napa</td>\n",
" <td>Cabernet Sauvignon</td>\n",
" <td>Heitz</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" country description \\\n",
"0 US This tremendous 100% varietal wine hails from ... \n",
"\n",
" designation points price province region_1 region_2 \\\n",
"0 Martha's Vineyard 96 235.0 California Napa Valley Napa \n",
"\n",
" variety winery \n",
"0 Cabernet Sauvignon Heitz "
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.head(1)  # Show the first row of the DataFrame"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5b2HPHJwhkyT"
},
"source": [
"#### Selecting columns. Version 1. "
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 221
},
"id": "ocn9YgmnhkyZ",
"outputId": "438dc10b-ff81-42d3-deea-eefa140305d5"
},
"outputs": [
{
"data": {
"text/plain": [
"0 235.0\n",
"1 110.0\n",
"2 90.0\n",
"3 65.0\n",
"4 66.0\n",
" ... \n",
"150925 20.0\n",
"150926 27.0\n",
"150927 20.0\n",
"150928 52.0\n",
"150929 15.0\n",
"Name: price, Length: 150930, dtype: float64"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"array = data['price']  # Select the price column (the result is a Series)\n",
"array"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 221
},
"id": "tBeyZQLPIMIJ",
"outputId": "95d6ac17-ddde-46d9-e9a8-6e265eb12085"
},
"outputs": [
{
"data": {
"text/plain": [
"0 235.0\n",
"1 110.0\n",
"2 90.0\n",
"3 65.0\n",
"4 66.0\n",
" ... \n",
"150925 20.0\n",
"150926 27.0\n",
"150927 20.0\n",
"150928 52.0\n",
"150929 15.0\n",
"Name: price, Length: 150930, dtype: float64"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.price"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 119
},
"id": "YVzV30CQhkyV",
"outputId": "26839d1c-a250-4ec0-a388-50f50e45af89"
},
"outputs": [
{
"data": {
"text/plain": [
"0 235.0\n",
"1 110.0\n",
"2 90.0\n",
"3 65.0\n",
"4 66.0\n",
"Name: price, dtype: float64"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.price.head()  # Show the price column (alternative, attribute-style access)"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"id": "IDhUYDK5hkye",
"outputId": "536a6bdf-0016-4faf-e984-f1bfa8d356ad"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>price</th>\n",
" <th>country</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>235.0</td>\n",
" <td>US</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>110.0</td>\n",
" <td>Spain</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>90.0</td>\n",
" <td>US</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>65.0</td>\n",
" <td>US</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>66.0</td>\n",
" <td>France</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" price country\n",
"0 235.0 US\n",
"1 110.0 Spain\n",
"2 90.0 US\n",
"3 65.0 US\n",
"4 66.0 France"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"new_df = data[['price','country']].head()  # Select the 'price' and 'country' columns\n",
"new_df"
]
},
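{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cells above use two kinds of brackets, and the difference matters: a single label returns a Series, while a list of labels returns a DataFrame, even when the list holds one name. A small sketch on a made-up frame (`toy` and its values are illustrative assumptions):\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy frame with two of the wine columns\n",
"toy = pd.DataFrame({'price': [235.0, 110.0], 'country': ['US', 'Spain']})\n",
"\n",
"as_series = toy['price']   # single label -> Series\n",
"as_frame = toy[['price']]  # list of labels -> DataFrame with one column\n",
"\n",
"print(type(as_series).__name__, type(as_frame).__name__)\n",
"```\n",
"\n",
"Attribute access such as `toy.price` is a shortcut for `toy['price']`; it works only when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method."
]
},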
{
"cell_type": "markdown",
"metadata": {
"id": "Pw7bsVKPhkyg"
},
"source": [
"#### Selecting rows. Version 1. "
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 359
},
"id": "-3ntG2CzhDyV",
"outputId": "799530a2-5339-4ddf-8cbc-ef187d8a148f"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>country</th>\n",
" <th>description</th>\n",
" <th>designation</th>\n",
" <th>points</th>\n",
" <th>price</th>\n",
" <th>province</th>\n",
" <th>region_1</th>\n",
" <th>region_2</th>\n",
" <th>variety</th>\n",
" <th>winery</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Italy</td>\n",
" <td>Elegance, complexity and structure come togeth...</td>\n",
" <td>Ronco della Chiesa</td>\n",
" <td>95</td>\n",
" <td>80.0</td>\n",
" <td>Northeastern Italy</td>\n",
" <td>Collio</td>\n",
" <td>NaN</td>\n",
" <td>Friulano</td>\n",
" <td>Borgo del Tiglio</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>US</td>\n",
" <td>From 18-year-old vines, this supple well-balan...</td>\n",
" <td>Estate Vineyard Wadensvil Block</td>\n",
" <td>95</td>\n",
" <td>48.0</td>\n",
" <td>Oregon</td>\n",
" <td>Ribbon Ridge</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Pinot Noir</td>\n",
" <td>Patricia Green Cellars</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>US</td>\n",
" <td>A standout even in this terrific lineup of 201...</td>\n",
" <td>Weber Vineyard</td>\n",
" <td>95</td>\n",
" <td>48.0</td>\n",
" <td>Oregon</td>\n",
" <td>Dundee Hills</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Pinot Noir</td>\n",
" <td>Patricia Green Cellars</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>France</td>\n",
" <td>This wine is in peak condition. The tannins an...</td>\n",
" <td>Château Montus Prestige</td>\n",
" <td>95</td>\n",
" <td>90.0</td>\n",
" <td>Southwest France</td>\n",
" <td>Madiran</td>\n",
" <td>NaN</td>\n",
" <td>Tannat</td>\n",
" <td>Vignobles Brumont</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>US</td>\n",
" <td>With its sophisticated mix of mineral, acid an...</td>\n",
" <td>Grace Vineyard</td>\n",
" <td>95</td>\n",
" <td>185.0</td>\n",
" <td>Oregon</td>\n",
" <td>Dundee Hills</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Pinot Noir</td>\n",
" <td>Domaine Serene</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>US</td>\n",
" <td>First made in 2006, this succulent luscious Ch...</td>\n",
" <td>Sigrid</td>\n",
" <td>95</td>\n",
" <td>90.0</td>\n",
" <td>Oregon</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Chardonnay</td>\n",
" <td>Bergström</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>US</td>\n",
" <td>This blockbuster, powerhouse of a wine suggest...</td>\n",
" <td>Rainin Vineyard</td>\n",
" <td>95</td>\n",
" <td>325.0</td>\n",
" <td>California</td>\n",
" <td>Diamond Mountain District</td>\n",
" <td>Napa</td>\n",
" <td>Cabernet Sauvignon</td>\n",
" <td>Hall</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>Spain</td>\n",
" <td>Nicely oaked blackberry, licorice, vanilla and...</td>\n",
" <td>6 Años Reserva Premium</td>\n",
" <td>95</td>\n",
" <td>80.0</td>\n",
" <td>Northern Spain</td>\n",
" <td>Ribera del Duero</td>\n",
" <td>NaN</td>\n",
" <td>Tempranillo</td>\n",
" <td>Valduero</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>France</td>\n",
" <td>Coming from a seven-acre vineyard named after ...</td>\n",
" <td>Le Pigeonnier</td>\n",
" <td>95</td>\n",
" <td>290.0</td>\n",
" <td>Southwest France</td>\n",
" <td>Cahors</td>\n",
" <td>NaN</td>\n",
" <td>Malbec</td>\n",
" <td>Château Lagrézette</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>US</td>\n",
" <td>This fresh and lively medium-bodied wine is be...</td>\n",
" <td>Gap's Crown Vineyard</td>\n",
" <td>95</td>\n",
" <td>75.0</td>\n",
" <td>California</td>\n",
" <td>Sonoma Coast</td>\n",
" <td>Sonoma</td>\n",
" <td>Pinot Noir</td>\n",
" <td>Gary Farrell</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" country description \\\n",
"10 Italy Elegance, complexity and structure come togeth... \n",
"11 US From 18-year-old vines, this supple well-balan... \n",
"12 US A standout even in this terrific lineup of 201... \n",
"13 France This wine is in peak condition. The tannins an... \n",
"14 US With its sophisticated mix of mineral, acid an... \n",
"15 US First made in 2006, this succulent luscious Ch... \n",
"16 US This blockbuster, powerhouse of a wine suggest... \n",
"17 Spain Nicely oaked blackberry, licorice, vanilla and... \n",
"18 France Coming from a seven-acre vineyard named after ... \n",
"19 US This fresh and lively medium-bodied wine is be... \n",
"\n",
" designation points price province \\\n",
"10 Ronco della Chiesa 95 80.0 Northeastern Italy \n",
"11 Estate Vineyard Wadensvil Block 95 48.0 Oregon \n",
"12 Weber Vineyard 95 48.0 Oregon \n",
"13 Château Montus Prestige 95 90.0 Southwest France \n",
"14 Grace Vineyard 95 185.0 Oregon \n",
"15 Sigrid 95 90.0 Oregon \n",
"16 Rainin Vineyard 95 325.0 California \n",
"17 6 Años Reserva Premium 95 80.0 Northern Spain \n",
"18 Le Pigeonnier 95 290.0 Southwest France \n",
"19 Gap's Crown Vineyard 95 75.0 California \n",
"\n",
" region_1 region_2 variety \\\n",
"10 Collio NaN Friulano \n",
"11 Ribbon Ridge Willamette Valley Pinot Noir \n",
"12 Dundee Hills Willamette Valley Pinot Noir \n",
"13 Madiran NaN Tannat \n",
"14 Dundee Hills Willamette Valley Pinot Noir \n",
"15 Willamette Valley Willamette Valley Chardonnay \n",
"16 Diamond Mountain District Napa Cabernet Sauvignon \n",
"17 Ribera del Duero NaN Tempranillo \n",
"18 Cahors NaN Malbec \n",
"19 Sonoma Coast Sonoma Pinot Noir \n",
"\n",
" winery \n",
"10 Borgo del Tiglio \n",
"11 Patricia Green Cellars \n",
"12 Patricia Green Cellars \n",
"13 Vignobles Brumont \n",
"14 Domaine Serene \n",
"15 Bergström \n",
"16 Hall \n",
"17 Valduero \n",
"18 Château Lagrézette \n",
"19 Gary Farrell "
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data[10:20]  # Show rows 10 through 19 of the DataFrame (the end of the slice is not included)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 173
},
"id": "DaW5dRU7hkyh",
"outputId": "9665e0f7-f195-4217-b8a1-674700cdc917"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>country</th>\n",
" <th>description</th>\n",
" <th>designation</th>\n",
" <th>points</th>\n",
" <th>price</th>\n",
" <th>province</th>\n",
" <th>region_1</th>\n",
" <th>region_2</th>\n",
" <th>variety</th>\n",
" <th>winery</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Italy</td>\n",
" <td>Elegance, complexity and structure come togeth...</td>\n",
" <td>Ronco della Chiesa</td>\n",
" <td>95</td>\n",
" <td>80.0</td>\n",
" <td>Northeastern Italy</td>\n",
" <td>Collio</td>\n",
" <td>NaN</td>\n",
" <td>Friulano</td>\n",
" <td>Borgo del Tiglio</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>France</td>\n",
" <td>This wine is in peak condition. The tannins an...</td>\n",
" <td>Château Montus Prestige</td>\n",
" <td>95</td>\n",
" <td>90.0</td>\n",
" <td>Southwest France</td>\n",
" <td>Madiran</td>\n",
" <td>NaN</td>\n",
" <td>Tannat</td>\n",
" <td>Vignobles Brumont</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>US</td>\n",
" <td>This blockbuster, powerhouse of a wine suggest...</td>\n",
" <td>Rainin Vineyard</td>\n",
" <td>95</td>\n",
" <td>325.0</td>\n",
" <td>California</td>\n",
" <td>Diamond Mountain District</td>\n",
" <td>Napa</td>\n",
" <td>Cabernet Sauvignon</td>\n",
" <td>Hall</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>US</td>\n",
" <td>This fresh and lively medium-bodied wine is be...</td>\n",
" <td>Gap's Crown Vineyard</td>\n",
" <td>95</td>\n",
" <td>75.0</td>\n",
" <td>California</td>\n",
" <td>Sonoma Coast</td>\n",
" <td>Sonoma</td>\n",
" <td>Pinot Noir</td>\n",
" <td>Gary Farrell</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" country ... winery\n",
"10 Italy ... Borgo del Tiglio\n",
"13 France ... Vignobles Brumont\n",
"16 US ... Hall\n",
"19 US ... Gary Farrell\n",
"\n",
"[4 rows x 10 columns]"
]
},
"execution_count": 34,
"metadata": {
"tags": []
},
"output_type": "execute_result"
}
  ],
   "source": [
    "data[10:20:3]  # Show rows at positions 10 through 19 of the DataFrame with a step of 3"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 359
},
"id": "zXqL-lBEhGkG",
"outputId": "c741c699-f40d-4417-9bb4-bc849da4f19b"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>country</th>\n",
" <th>description</th>\n",
" <th>designation</th>\n",
" <th>points</th>\n",
" <th>price</th>\n",
" <th>province</th>\n",
" <th>region_1</th>\n",
" <th>region_2</th>\n",
" <th>variety</th>\n",
" <th>winery</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>US</td>\n",
" <td>This tremendous 100% varietal wine hails from ...</td>\n",
" <td>Martha's Vineyard</td>\n",
" <td>96</td>\n",
" <td>235.0</td>\n",
" <td>California</td>\n",
" <td>Napa Valley</td>\n",
" <td>Napa</td>\n",
" <td>Cabernet Sauvignon</td>\n",
" <td>Heitz</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Spain</td>\n",
" <td>Deep, dense and pure from the opening bell, th...</td>\n",
" <td>Numanthia</td>\n",
" <td>95</td>\n",
" <td>73.0</td>\n",
" <td>Northern Spain</td>\n",
" <td>Toro</td>\n",
" <td>NaN</td>\n",
" <td>Tinta de Toro</td>\n",
" <td>Numanthia</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Italy</td>\n",
" <td>Elegance, complexity and structure come togeth...</td>\n",
" <td>Ronco della Chiesa</td>\n",
" <td>95</td>\n",
" <td>80.0</td>\n",
" <td>Northeastern Italy</td>\n",
" <td>Collio</td>\n",
" <td>NaN</td>\n",
" <td>Friulano</td>\n",
" <td>Borgo del Tiglio</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>US</td>\n",
" <td>First made in 2006, this succulent luscious Ch...</td>\n",
" <td>Sigrid</td>\n",
" <td>95</td>\n",
" <td>90.0</td>\n",
" <td>Oregon</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Chardonnay</td>\n",
" <td>Bergström</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>US</td>\n",
" <td>Heitz has made this stellar rosé from the rare...</td>\n",
" <td>Grignolino</td>\n",
" <td>95</td>\n",
" <td>24.0</td>\n",
" <td>California</td>\n",
" <td>Napa Valley</td>\n",
" <td>Napa</td>\n",
" <td>Rosé</td>\n",
" <td>Heitz</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>New Zealand</td>\n",
" <td>Yields were down in 2015, but intensity is up,...</td>\n",
" <td>Maté's Vineyard</td>\n",
" <td>94</td>\n",
" <td>57.0</td>\n",
" <td>Kumeu</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Chardonnay</td>\n",
" <td>Kumeu River</td>\n",
" </tr>\n",
" <tr>\n",
" <th>30</th>\n",
" <td>Bulgaria</td>\n",
" <td>This Bulgarian Mavrud presents the nose with s...</td>\n",
" <td>Bergulé</td>\n",
" <td>90</td>\n",
" <td>15.0</td>\n",
" <td>Bulgaria</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Mavrud</td>\n",
" <td>Villa Melnik</td>\n",
" </tr>\n",
" <tr>\n",
" <th>35</th>\n",
" <td>Italy</td>\n",
" <td>Forest floor, tilled soil, mature berry and a ...</td>\n",
" <td>Riserva</td>\n",
" <td>90</td>\n",
" <td>135.0</td>\n",
" <td>Tuscany</td>\n",
" <td>Brunello di Montalcino</td>\n",
" <td>NaN</td>\n",
" <td>Sangiovese</td>\n",
" <td>Carillon</td>\n",
" </tr>\n",
" <tr>\n",
" <th>40</th>\n",
" <td>Spain</td>\n",
" <td>Earthy plum and cherry aromas score points for...</td>\n",
" <td>Amandi</td>\n",
" <td>90</td>\n",
" <td>17.0</td>\n",
" <td>Galicia</td>\n",
" <td>Ribeira Sacra</td>\n",
" <td>NaN</td>\n",
" <td>Mencía</td>\n",
" <td>Don Bernardino</td>\n",
" </tr>\n",
" <tr>\n",
" <th>45</th>\n",
" <td>Italy</td>\n",
" <td>A blend of 90% Sangiovese and 10% Canaiolo, th...</td>\n",
" <td>Vigneto Odoardo Beccari Riserva</td>\n",
" <td>90</td>\n",
" <td>30.0</td>\n",
" <td>Tuscany</td>\n",
" <td>Chianti Classico</td>\n",
" <td>NaN</td>\n",
" <td>Red Blend</td>\n",
" <td>Vignavecchia</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" country ... winery\n",
"0 US ... Heitz\n",
"5 Spain ... Numanthia\n",
"10 Italy ... Borgo del Tiglio\n",
"15 US ... Bergström\n",
"20 US ... Heitz\n",
"25 New Zealand ... Kumeu River\n",
"30 Bulgaria ... Villa Melnik\n",
"35 Italy ... Carillon\n",
"40 Spain ... Don Bernardino\n",
"45 Italy ... Vignavecchia\n",
"\n",
"[10 rows x 10 columns]"
]
},
"execution_count": 35,
"metadata": {
"tags": []
},
"output_type": "execute_result"
}
  ],
   "source": [
    "data[::5].head(10)  # Show every 5th row of the DataFrame (first 10 of them)"
]
},
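  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of how `start:stop:step` row slicing behaves, using a small made-up frame (`toy` is hypothetical and not part of the wine dataset). The `stop` position is excluded, as with ordinary Python slices.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy frame standing in for the wine data\n",
    "toy = pd.DataFrame({'points': range(90, 100)})\n",
    "\n",
    "# Start at position 2, stop before position 9, take every 3rd row\n",
    "print(toy[2:9:3].index.tolist())  # [2, 5, 8]\n",
    "\n",
    "# Omitting start and stop takes every 5th row of the whole frame\n",
    "print(toy[::5].index.tolist())  # [0, 5]\n",
    "```"
   ]
  },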
{
"cell_type": "markdown",
"metadata": {
"id": "pV0kczWfhkyk"
   },
   "source": [
    "#### Selecting columns. Version 2. Still by name"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 495
},
"id": "blyn4oRnJOlm",
"outputId": "cc0258b0-2735-4d7f-cb92-ac9b23b2a83e"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>country</th>\n",
" <th>description</th>\n",
" <th>designation</th>\n",
" <th>points</th>\n",
" <th>price</th>\n",
" <th>province</th>\n",
" <th>region_1</th>\n",
" <th>region_2</th>\n",
" <th>variety</th>\n",
" <th>winery</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>US</td>\n",
" <td>This tremendous 100% varietal wine hails from ...</td>\n",
" <td>Martha's Vineyard</td>\n",
" <td>96</td>\n",
" <td>235.0</td>\n",
" <td>California</td>\n",
" <td>Napa Valley</td>\n",
" <td>Napa</td>\n",
" <td>Cabernet Sauvignon</td>\n",
" <td>Heitz</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Spain</td>\n",
" <td>Ripe aromas of fig, blackberry and cassis are ...</td>\n",
" <td>Carodorum Selección Especial Reserva</td>\n",
" <td>96</td>\n",
" <td>110.0</td>\n",
" <td>Northern Spain</td>\n",
" <td>Toro</td>\n",
" <td>NaN</td>\n",
" <td>Tinta de Toro</td>\n",
" <td>Bodega Carmen Rodríguez</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>US</td>\n",
" <td>Mac Watson honors the memory of a wine once ma...</td>\n",
" <td>Special Selected Late Harvest</td>\n",
" <td>96</td>\n",
" <td>90.0</td>\n",
" <td>California</td>\n",
" <td>Knights Valley</td>\n",
" <td>Sonoma</td>\n",
" <td>Sauvignon Blanc</td>\n",
" <td>Macauley</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>US</td>\n",
" <td>This spent 20 months in 30% new French oak, an...</td>\n",
" <td>Reserve</td>\n",
" <td>96</td>\n",
" <td>65.0</td>\n",
" <td>Oregon</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Pinot Noir</td>\n",
" <td>Ponzi</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>France</td>\n",
" <td>This is the top wine from La Bégude, named aft...</td>\n",
" <td>La Brûlade</td>\n",
" <td>95</td>\n",
" <td>66.0</td>\n",
" <td>Provence</td>\n",
" <td>Bandol</td>\n",
" <td>NaN</td>\n",
" <td>Provence red blend</td>\n",
" <td>Domaine de la Bégude</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Spain</td>\n",
" <td>Deep, dense and pure from the opening bell, th...</td>\n",
" <td>Numanthia</td>\n",
" <td>95</td>\n",
" <td>73.0</td>\n",
" <td>Northern Spain</td>\n",
" <td>Toro</td>\n",
" <td>NaN</td>\n",
" <td>Tinta de Toro</td>\n",
" <td>Numanthia</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Spain</td>\n",
" <td>Slightly gritty black-fruit aromas include a s...</td>\n",
" <td>San Román</td>\n",
" <td>95</td>\n",
" <td>65.0</td>\n",
" <td>Northern Spain</td>\n",
" <td>Toro</td>\n",
" <td>NaN</td>\n",
" <td>Tinta de Toro</td>\n",
" <td>Maurodos</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Spain</td>\n",
" <td>Lush cedary black-fruit aromas are luxe and of...</td>\n",
" <td>Carodorum Único Crianza</td>\n",
" <td>95</td>\n",
" <td>110.0</td>\n",
" <td>Northern Spain</td>\n",
" <td>Toro</td>\n",
" <td>NaN</td>\n",
" <td>Tinta de Toro</td>\n",
" <td>Bodega Carmen Rodríguez</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>US</td>\n",
" <td>This re-named vineyard was formerly bottled as...</td>\n",
" <td>Silice</td>\n",
" <td>95</td>\n",
" <td>65.0</td>\n",
" <td>Oregon</td>\n",
" <td>Chehalem Mountains</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Pinot Noir</td>\n",
" <td>Bergström</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>US</td>\n",
" <td>The producer sources from two blocks of the vi...</td>\n",
" <td>Gap's Crown Vineyard</td>\n",
" <td>95</td>\n",
" <td>60.0</td>\n",
" <td>California</td>\n",
" <td>Sonoma Coast</td>\n",
" <td>Sonoma</td>\n",
" <td>Pinot Noir</td>\n",
" <td>Blue Farm</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" country ... winery\n",
"0 US ... Heitz\n",
"1 Spain ... Bodega Carmen Rodríguez\n",
"2 US ... Macauley\n",
"3 US ... Ponzi\n",
"4 France ... Domaine de la Bégude\n",
"5 Spain ... Numanthia\n",
"6 Spain ... Maurodos\n",
"7 Spain ... Bodega Carmen Rodríguez\n",
"8 US ... Bergström\n",
"9 US ... Blue Farm\n",
"\n",
"[10 rows x 10 columns]"
]
},
"execution_count": 36,
"metadata": {
"tags": []
},
"output_type": "execute_result"
}
],
"source": [
"data.head(10)"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 173
},
"id": "LfRYRSsohkyk",
"outputId": "c7f8b402-bba9-4da4-eebe-6402c24030c2"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>price</th>\n",
" <th>points</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>66.0</td>\n",
" <td>95</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>73.0</td>\n",
" <td>95</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>65.0</td>\n",
" <td>95</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>110.0</td>\n",
" <td>95</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" price points\n",
"4 66.0 95\n",
"5 73.0 95\n",
"6 65.0 95\n",
"7 110.0 95"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
  ],
   "source": [
    "data.loc[4:7, ['price', 'points']]  # Show the 'price' and 'points' columns for rows with index labels 4 through 7 (inclusive)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "CK4-ntzDhkyo"
   },
   "source": [
    "#### Selecting rows. Version 2. Still by label"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 235
},
"id": "eqAQs0YIhkyq",
"outputId": "d0bab15c-91be-41a2-ef6f-82f2faf5e702"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>country</th>\n",
" <th>description</th>\n",
" <th>designation</th>\n",
" <th>points</th>\n",
" <th>price</th>\n",
" <th>province</th>\n",
" <th>region_1</th>\n",
" <th>region_2</th>\n",
" <th>variety</th>\n",
" <th>winery</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>US</td>\n",
" <td>This tremendous 100% varietal wine hails from ...</td>\n",
" <td>Martha's Vineyard</td>\n",
" <td>96</td>\n",
" <td>235.0</td>\n",
" <td>California</td>\n",
" <td>Napa Valley</td>\n",
" <td>Napa</td>\n",
" <td>Cabernet Sauvignon</td>\n",
" <td>Heitz</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Spain</td>\n",
" <td>Ripe aromas of fig, blackberry and cassis are ...</td>\n",
" <td>Carodorum Selección Especial Reserva</td>\n",
" <td>96</td>\n",
" <td>110.0</td>\n",
" <td>Northern Spain</td>\n",
" <td>Toro</td>\n",
" <td>NaN</td>\n",
" <td>Tinta de Toro</td>\n",
" <td>Bodega Carmen Rodríguez</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>US</td>\n",
" <td>Mac Watson honors the memory of a wine once ma...</td>\n",
" <td>Special Selected Late Harvest</td>\n",
" <td>96</td>\n",
" <td>90.0</td>\n",
" <td>California</td>\n",
" <td>Knights Valley</td>\n",
" <td>Sonoma</td>\n",
" <td>Sauvignon Blanc</td>\n",
" <td>Macauley</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>US</td>\n",
" <td>This spent 20 months in 30% new French oak, an...</td>\n",
" <td>Reserve</td>\n",
" <td>96</td>\n",
" <td>65.0</td>\n",
" <td>Oregon</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Pinot Noir</td>\n",
" <td>Ponzi</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>France</td>\n",
" <td>This is the top wine from La Bégude, named aft...</td>\n",
" <td>La Brûlade</td>\n",
" <td>95</td>\n",
" <td>66.0</td>\n",
" <td>Provence</td>\n",
" <td>Bandol</td>\n",
" <td>NaN</td>\n",
" <td>Provence red blend</td>\n",
" <td>Domaine de la Bégude</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Spain</td>\n",
" <td>Deep, dense and pure from the opening bell, th...</td>\n",
" <td>Numanthia</td>\n",
" <td>95</td>\n",
" <td>73.0</td>\n",
" <td>Northern Spain</td>\n",
" <td>Toro</td>\n",
" <td>NaN</td>\n",
" <td>Tinta de Toro</td>\n",
" <td>Numanthia</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" country description \\\n",
"0 US This tremendous 100% varietal wine hails from ... \n",
"1 Spain Ripe aromas of fig, blackberry and cassis are ... \n",
"2 US Mac Watson honors the memory of a wine once ma... \n",
"3 US This spent 20 months in 30% new French oak, an... \n",
"4 France This is the top wine from La Bégude, named aft... \n",
"5 Spain Deep, dense and pure from the opening bell, th... \n",
"\n",
" designation points price province \\\n",
"0 Martha's Vineyard 96 235.0 California \n",
"1 Carodorum Selección Especial Reserva 96 110.0 Northern Spain \n",
"2 Special Selected Late Harvest 96 90.0 California \n",
"3 Reserve 96 65.0 Oregon \n",
"4 La Brûlade 95 66.0 Provence \n",
"5 Numanthia 95 73.0 Northern Spain \n",
"\n",
" region_1 region_2 variety \\\n",
"0 Napa Valley Napa Cabernet Sauvignon \n",
"1 Toro NaN Tinta de Toro \n",
"2 Knights Valley Sonoma Sauvignon Blanc \n",
"3 Willamette Valley Willamette Valley Pinot Noir \n",
"4 Bandol NaN Provence red blend \n",
"5 Toro NaN Tinta de Toro \n",
"\n",
" winery \n",
"0 Heitz \n",
"1 Bodega Carmen Rodríguez \n",
"2 Macauley \n",
"3 Ponzi \n",
"4 Domaine de la Bégude \n",
"5 Numanthia "
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
  ],
   "source": [
    "data.loc[:5, :]  # Show rows with index labels 0 through 5 inclusive (same as data.loc[:5])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "NGugpgJfhkyv"
   },
   "source": [
    "#### Selecting rows and columns. Version 3. By row and column position"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"id": "CG0aSTW8hkyv",
"outputId": "af2503a1-524e-431c-f61e-5d0a5590a9b1",
"scrolled": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>description</th>\n",
" <th>points</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>This tremendous 100% varietal wine hails from ...</td>\n",
" <td>96</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Deep, dense and pure from the opening bell, th...</td>\n",
" <td>95</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Elegance, complexity and structure come togeth...</td>\n",
" <td>95</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>First made in 2006, this succulent luscious Ch...</td>\n",
" <td>95</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>Heitz has made this stellar rosé from the rare...</td>\n",
" <td>95</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" description points\n",
"0 This tremendous 100% varietal wine hails from ... 96\n",
"5 Deep, dense and pure from the opening bell, th... 95\n",
"10 Elegance, complexity and structure come togeth... 95\n",
"15 First made in 2006, this succulent luscious Ch... 95\n",
"20 Heitz has made this stellar rosé from the rare... 95"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
  ],
   "source": [
    "data.iloc[::5, [1, 3]].head()  # Show every 5th row and the columns at positions 1 and 3 (zero-based)"
]
},
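  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A short sketch contrasting the two indexers, assuming a made-up frame `toy` whose index labels differ from row positions (it is hypothetical, not part of the wine dataset). `.loc` slices by label and includes the end label, while `.iloc` slices by position and excludes the end position.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical frame; the index labels deliberately differ from positions\n",
    "toy = pd.DataFrame(\n",
    "    {'price': [10.0, 20.0, 30.0, 40.0], 'points': [85, 90, 95, 100]},\n",
    "    index=[10, 11, 12, 13],\n",
    ")\n",
    "\n",
    "by_label = toy.loc[10:12, ['price']]  # labels 10, 11, 12: end label included\n",
    "by_position = toy.iloc[0:2, [0]]  # positions 0 and 1: end position excluded\n",
    "\n",
    "print(by_label.index.tolist())  # [10, 11, 12]\n",
    "print(by_position.index.tolist())  # [10, 11]\n",
    "```"
   ]
  },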
{
"cell_type": "markdown",
"metadata": {
"id": "sIao6149hkyy"
   },
   "source": [
    "#### Conditional selection\n",
    "\n",
    "What if I need wines that cost more than $15? How do I select them?"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"id": "YsIrLdRnhkyy"
   },
   "outputs": [],
   "source": [
    "# Define a boolean mask: True where the price is above 15\n",
    "mask = data['price'] > 15"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 119
},
"id": "nfVB6YBbhky0",
"outputId": "ebfb750f-f4f2-43a3-c62b-4e39273046de"
},
"outputs": [
{
"data": {
"text/plain": [
"0 True\n",
"1 True\n",
"2 True\n",
"3 True\n",
"4 True\n",
"Name: price, dtype: bool"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
  ],
   "source": [
    "mask.head()  # Show the first rows of the mask"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 359
},
"id": "FADnit0Ghky2",
"outputId": "16cf6881-4c3c-408e-f661-4141182143d5"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>country</th>\n",
" <th>description</th>\n",
" <th>designation</th>\n",
" <th>points</th>\n",
" <th>price</th>\n",
" <th>province</th>\n",
" <th>region_1</th>\n",
" <th>region_2</th>\n",
" <th>variety</th>\n",
" <th>winery</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>US</td>\n",
" <td>This tremendous 100% varietal wine hails from ...</td>\n",
" <td>Martha's Vineyard</td>\n",
" <td>96</td>\n",
" <td>235.0</td>\n",
" <td>California</td>\n",
" <td>Napa Valley</td>\n",
" <td>Napa</td>\n",
" <td>Cabernet Sauvignon</td>\n",
" <td>Heitz</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Spain</td>\n",
" <td>Ripe aromas of fig, blackberry and cassis are ...</td>\n",
" <td>Carodorum Selección Especial Reserva</td>\n",
" <td>96</td>\n",
" <td>110.0</td>\n",
" <td>Northern Spain</td>\n",
" <td>Toro</td>\n",
" <td>NaN</td>\n",
" <td>Tinta de Toro</td>\n",
" <td>Bodega Carmen Rodríguez</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>US</td>\n",
" <td>Mac Watson honors the memory of a wine once ma...</td>\n",
" <td>Special Selected Late Harvest</td>\n",
" <td>96</td>\n",
" <td>90.0</td>\n",
" <td>California</td>\n",
" <td>Knights Valley</td>\n",
" <td>Sonoma</td>\n",
" <td>Sauvignon Blanc</td>\n",
" <td>Macauley</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>US</td>\n",
" <td>This spent 20 months in 30% new French oak, an...</td>\n",
" <td>Reserve</td>\n",
" <td>96</td>\n",
" <td>65.0</td>\n",
" <td>Oregon</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Pinot Noir</td>\n",
" <td>Ponzi</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>France</td>\n",
" <td>This is the top wine from La Bégude, named aft...</td>\n",
" <td>La Brûlade</td>\n",
" <td>95</td>\n",
" <td>66.0</td>\n",
" <td>Provence</td>\n",
" <td>Bandol</td>\n",
" <td>NaN</td>\n",
" <td>Provence red blend</td>\n",
" <td>Domaine de la Bégude</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>150924</th>\n",
" <td>France</td>\n",
" <td>Really fine for a low-acid vintage, there's an...</td>\n",
" <td>Diamant Bleu</td>\n",
" <td>91</td>\n",
" <td>70.0</td>\n",
" <td>Champagne</td>\n",
" <td>Champagne</td>\n",
" <td>NaN</td>\n",
" <td>Champagne Blend</td>\n",
" <td>Heidsieck &amp; Co Monopole</td>\n",
" </tr>\n",
" <tr>\n",
" <th>150925</th>\n",
" <td>Italy</td>\n",
" <td>Many people feel Fiano represents southern Ita...</td>\n",
" <td>NaN</td>\n",
" <td>91</td>\n",
" <td>20.0</td>\n",
" <td>Southern Italy</td>\n",
" <td>Fiano di Avellino</td>\n",
" <td>NaN</td>\n",
" <td>White Blend</td>\n",
" <td>Feudi di San Gregorio</td>\n",
" </tr>\n",
" <tr>\n",
" <th>150926</th>\n",
" <td>France</td>\n",
" <td>Offers an intriguing nose with ginger, lime an...</td>\n",
" <td>Cuvée Prestige</td>\n",
" <td>91</td>\n",
" <td>27.0</td>\n",
" <td>Champagne</td>\n",
" <td>Champagne</td>\n",
" <td>NaN</td>\n",
" <td>Champagne Blend</td>\n",
" <td>H.Germain</td>\n",
" </tr>\n",
" <tr>\n",
" <th>150927</th>\n",
" <td>Italy</td>\n",
" <td>This classic example comes from a cru vineyard...</td>\n",
" <td>Terre di Dora</td>\n",
" <td>91</td>\n",
" <td>20.0</td>\n",
" <td>Southern Italy</td>\n",
" <td>Fiano di Avellino</td>\n",
" <td>NaN</td>\n",
" <td>White Blend</td>\n",
" <td>Terredora</td>\n",
" </tr>\n",
" <tr>\n",
" <th>150928</th>\n",
" <td>France</td>\n",
" <td>A perfect salmon shade, with scents of peaches...</td>\n",
" <td>Grand Brut Rosé</td>\n",
" <td>90</td>\n",
" <td>52.0</td>\n",
" <td>Champagne</td>\n",
" <td>Champagne</td>\n",
" <td>NaN</td>\n",
" <td>Champagne Blend</td>\n",
" <td>Gosset</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>103342 rows × 10 columns</p>\n",
"</div>"
],
"text/plain": [
" country description \\\n",
"0 US This tremendous 100% varietal wine hails from ... \n",
"1 Spain Ripe aromas of fig, blackberry and cassis are ... \n",
"2 US Mac Watson honors the memory of a wine once ma... \n",
"3 US This spent 20 months in 30% new French oak, an... \n",
"4 France This is the top wine from La Bégude, named aft... \n",
"... ... ... \n",
"150924 France Really fine for a low-acid vintage, there's an... \n",
"150925 Italy Many people feel Fiano represents southern Ita... \n",
"150926 France Offers an intriguing nose with ginger, lime an... \n",
"150927 Italy This classic example comes from a cru vineyard... \n",
"150928 France A perfect salmon shade, with scents of peaches... \n",
"\n",
" designation points price province \\\n",
"0 Martha's Vineyard 96 235.0 California \n",
"1 Carodorum Selección Especial Reserva 96 110.0 Northern Spain \n",
"2 Special Selected Late Harvest 96 90.0 California \n",
"3 Reserve 96 65.0 Oregon \n",
"4 La Brûlade 95 66.0 Provence \n",
"... ... ... ... ... \n",
"150924 Diamant Bleu 91 70.0 Champagne \n",
"150925 NaN 91 20.0 Southern Italy \n",
"150926 Cuvée Prestige 91 27.0 Champagne \n",
"150927 Terre di Dora 91 20.0 Southern Italy \n",
"150928 Grand Brut Rosé 90 52.0 Champagne \n",
"\n",
" region_1 region_2 variety \\\n",
"0 Napa Valley Napa Cabernet Sauvignon \n",
"1 Toro NaN Tinta de Toro \n",
"2 Knights Valley Sonoma Sauvignon Blanc \n",
"3 Willamette Valley Willamette Valley Pinot Noir \n",
"4 Bandol NaN Provence red blend \n",
"... ... ... ... \n",
"150924 Champagne NaN Champagne Blend \n",
"150925 Fiano di Avellino NaN White Blend \n",
"150926 Champagne NaN Champagne Blend \n",
"150927 Fiano di Avellino NaN White Blend \n",
"150928 Champagne NaN Champagne Blend \n",
"\n",
" winery \n",
"0 Heitz \n",
"1 Bodega Carmen Rodríguez \n",
"2 Macauley \n",
"3 Ponzi \n",
"4 Domaine de la Bégude \n",
"... ... \n",
"150924 Heidsieck & Co Monopole \n",
"150925 Feudi di San Gregorio \n",
"150926 H.Germain \n",
"150927 Terredora \n",
"150928 Gosset \n",
"\n",
"[103342 rows x 10 columns]"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
  ],
   "source": [
    "# Then select the data\n",
    "temp = data[mask]  # Keep the rows where the mask is True and store them in a new DataFrame, temp\n",
    "temp  # Display temp"
]
},
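  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A small sketch of how a boolean mask treats missing values, on a made-up frame (`toy` is hypothetical and not part of the wine dataset). A comparison with `NaN` evaluates to `False`, so rows with a missing price are dropped by the filter. Several conditions can be combined with `&` (and) or `|` (or), each wrapped in its own parentheses.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical frame with one missing price\n",
    "toy = pd.DataFrame({'price': [10.0, np.nan, 20.0, 400.0], 'points': [88, 95, 91, 97]})\n",
    "\n",
    "mask = toy['price'] > 15\n",
    "print(mask.tolist())  # [False, False, True, True]\n",
    "\n",
    "# Both conditions must hold for a row to be kept\n",
    "both = toy[(toy['price'] > 15) & (toy['points'] > 95)]\n",
    "print(both.index.tolist())  # [3]\n",
    "```"
   ]
  },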
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"id": "EHT3VYtNhky4",
"outputId": "84091e0c-6995-4d73-b63a-c69aeec6ecd4"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>country</th>\n",
" <th>description</th>\n",
" <th>designation</th>\n",
" <th>points</th>\n",
" <th>price</th>\n",
" <th>province</th>\n",
" <th>region_1</th>\n",
" <th>region_2</th>\n",
" <th>variety</th>\n",
" <th>winery</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>US</td>\n",
" <td>This blockbuster, powerhouse of a wine suggest...</td>\n",
" <td>Rainin Vineyard</td>\n",
" <td>95</td>\n",
" <td>325.0</td>\n",
" <td>California</td>\n",
" <td>Diamond Mountain District</td>\n",
" <td>Napa</td>\n",
" <td>Cabernet Sauvignon</td>\n",
" <td>Hall</td>\n",
" </tr>\n",
" <tr>\n",
" <th>898</th>\n",
" <td>Italy</td>\n",
" <td>Aromas of crushed plum, asphalt, oak, toast, e...</td>\n",
" <td>Sorì Tildin</td>\n",
" <td>92</td>\n",
" <td>500.0</td>\n",
" <td>Piedmont</td>\n",
" <td>Langhe</td>\n",
" <td>NaN</td>\n",
" <td>Red Blend</td>\n",
" <td>Gaja</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2145</th>\n",
" <td>France</td>\n",
" <td>Full of ripe fruit, opulent and concentrated, ...</td>\n",
" <td>NaN</td>\n",
" <td>100</td>\n",
" <td>848.0</td>\n",
" <td>Bordeaux</td>\n",
" <td>Pessac-Léognan</td>\n",
" <td>NaN</td>\n",
" <td>Bordeaux-style White Blend</td>\n",
" <td>Château Haut-Brion</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2155</th>\n",
" <td>France</td>\n",
" <td>There is a sense of pure juicy black-currant f...</td>\n",
" <td>NaN</td>\n",
" <td>97</td>\n",
" <td>450.0</td>\n",
" <td>Bordeaux</td>\n",
" <td>Margaux</td>\n",
" <td>NaN</td>\n",
" <td>Bordeaux-style Red Blend</td>\n",
" <td>Château Margaux</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2159</th>\n",
" <td>France</td>\n",
" <td>With seriously dense tannins, this shows great...</td>\n",
" <td>NaN</td>\n",
" <td>97</td>\n",
" <td>330.0</td>\n",
" <td>Bordeaux</td>\n",
" <td>Pessac-Léognan</td>\n",
" <td>NaN</td>\n",
" <td>Bordeaux-style Red Blend</td>\n",
" <td>Château Haut-Brion</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" country description \\\n",
"16 US This blockbuster, powerhouse of a wine suggest... \n",
"898 Italy Aromas of crushed plum, asphalt, oak, toast, e... \n",
"2145 France Full of ripe fruit, opulent and concentrated, ... \n",
"2155 France There is a sense of pure juicy black-currant f... \n",
"2159 France With seriously dense tannins, this shows great... \n",
"\n",
" designation points price province region_1 \\\n",
"16 Rainin Vineyard 95 325.0 California Diamond Mountain District \n",
"898 Sorì Tildin 92 500.0 Piedmont Langhe \n",
"2145 NaN 100 848.0 Bordeaux Pessac-Léognan \n",
"2155 NaN 97 450.0 Bordeaux Margaux \n",
"2159 NaN 97 330.0 Bordeaux Pessac-Léognan \n",
"\n",
" region_2 variety winery \n",
"16 Napa Cabernet Sauvignon Hall \n",
"898 NaN Red Blend Gaja \n",
"2145 NaN Bordeaux-style White Blend Château Haut-Brion \n",
"2155 NaN Bordeaux-style Red Blend Château Margaux \n",
"2159 NaN Bordeaux-style Red Blend Château Haut-Brion "
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
  ],
   "source": [
    "data[data.price > 300].head()  # Alternative: write the condition inline (here, wines priced above 300)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 616
},
"id": "Moi8GwyVhky8",
"outputId": "f4020760-204a-42f0-9df3-28242583e16e"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>country</th>\n",
" <th>description</th>\n",
" <th>designation</th>\n",
" <th>points</th>\n",
" <th>price</th>\n",
" <th>province</th>\n",
" <th>region_1</th>\n",
" <th>region_2</th>\n",
" <th>variety</th>\n",
" <th>winery</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>US</td>\n",
" <td>This tremendous 100% varietal wine hails from ...</td>\n",
" <td>Martha's Vineyard</td>\n",
" <td>96</td>\n",
" <td>235.0</td>\n",
" <td>California</td>\n",
" <td>Napa Valley</td>\n",
" <td>Napa</td>\n",
" <td>Cabernet Sauvignon</td>\n",
" <td>Heitz</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>US</td>\n",
" <td>This blockbuster, powerhouse of a wine suggest...</td>\n",
" <td>Rainin Vineyard</td>\n",
" <td>95</td>\n",
" <td>325.0</td>\n",
" <td>California</td>\n",
" <td>Diamond Mountain District</td>\n",
" <td>Napa</td>\n",
" <td>Cabernet Sauvignon</td>\n",
" <td>Hall</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>France</td>\n",
" <td>Coming from a seven-acre vineyard named after ...</td>\n",
" <td>Le Pigeonnier</td>\n",
" <td>95</td>\n",
" <td>290.0</td>\n",
" <td>Southwest France</td>\n",
" <td>Cahors</td>\n",
" <td>NaN</td>\n",
" <td>Malbec</td>\n",
" <td>Château Lagrézette</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2145</th>\n",
" <td>France</td>\n",
" <td>Full of ripe fruit, opulent and concentrated, ...</td>\n",
" <td>NaN</td>\n",
" <td>100</td>\n",
" <td>848.0</td>\n",
" <td>Bordeaux</td>\n",
" <td>Pessac-Léognan</td>\n",
" <td>NaN</td>\n",
" <td>Bordeaux-style White Blend</td>\n",
" <td>Château Haut-Brion</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2155</th>\n",
" <td>France</td>\n",
" <td>There is a sense of pure juicy black-currant f...</td>\n",
" <td>NaN</td>\n",
" <td>97</td>\n",
" <td>450.0</td>\n",
" <td>Bordeaux</td>\n",
" <td>Margaux</td>\n",
" <td>NaN</td>\n",
" <td>Bordeaux-style Red Blend</td>\n",
" <td>Château Margaux</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2159</th>\n",
" <td>France</td>\n",
" <td>With seriously dense tannins, this shows great...</td>\n",
" <td>NaN</td>\n",
" <td>97</td>\n",
" <td>330.0</td>\n",
" <td>Bordeaux</td>\n",
" <td>Pessac-Léognan</td>\n",
" <td>NaN</td>\n",
" <td>Bordeaux-style Red Blend</td>\n",
" <td>Château Haut-Brion</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2434</th>\n",
" <td>France</td>\n",
" <td>With 83% Sémillon in the blend, this wine has ...</td>\n",
" <td>NaN</td>\n",
" <td>97</td>\n",
" <td>698.0</td>\n",
" <td>Bordeaux</td>\n",
" <td>Pessac-Léognan</td>\n",
" <td>NaN</td>\n",
" <td>Bordeaux-style White Blend</td>\n",
" <td>Château La Mission Haut-Brion</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2673</th>\n",
" <td>France</td>\n",
" <td>As with Clos de Vougeot in red, every producer...</td>\n",
" <td>NaN</td>\n",
" <td>90</td>\n",
" <td>238.0</td>\n",
" <td>Burgundy</td>\n",
" <td>Corton-Charlemagne</td>\n",
" <td>NaN</td>\n",
" <td>Chardonnay</td>\n",
" <td>Jean-Luc and Paul Aegerter</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2733</th>\n",
" <td>France</td>\n",
" <td>Richly endowed, the wine is beautifully concen...</td>\n",
" <td>NaN</td>\n",
" <td>95</td>\n",
" <td>202.0</td>\n",
" <td>Bordeaux</td>\n",
" <td>Pessac-Léognan</td>\n",
" <td>NaN</td>\n",
" <td>Bordeaux-style Red Blend</td>\n",
" <td>Château La Mission Haut-Brion</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2742</th>\n",
" <td>France</td>\n",
" <td>This is a powerfully structured wine from a 18...</td>\n",
" <td>NaN</td>\n",
" <td>95</td>\n",
" <td>250.0</td>\n",
" <td>Bordeaux</td>\n",
" <td>Pomerol</td>\n",
" <td>NaN</td>\n",
" <td>Bordeaux-style Red Blend</td>\n",
" <td>Château Trotanoy</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7796</th>\n",
" <td>France</td>\n",
" <td>This is full of fruit and weighty tannins. Com...</td>\n",
" <td>NaN</td>\n",
" <td>96</td>\n",
" <td>240.0</td>\n",
" <td>Burgundy</td>\n",
" <td>Chambertin</td>\n",
" <td>NaN</td>\n",
" <td>Pinot Noir</td>\n",
" <td>Domaine Rossignol-Trapet</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7798</th>\n",
" <td>France</td>\n",
" <td>Initially, this is a richly ripe wine with tro...</td>\n",
" <td>NaN</td>\n",
" <td>96</td>\n",
" <td>315.0</td>\n",
" <td>Burgundy</td>\n",
" <td>Chevalier-Montrachet</td>\n",
" <td>NaN</td>\n",
" <td>Chardonnay</td>\n",
" <td>Bouchard Père &amp; Fils</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8071</th>\n",
" <td>France</td>\n",
" <td>Rounded and rich, it is full of dark fruits an...</td>\n",
" <td>Clos des Cortons Faiveley</td>\n",
" <td>95</td>\n",
" <td>225.0</td>\n",
" <td>Burgundy</td>\n",
" <td>Corton-Rognet</td>\n",
" <td>NaN</td>\n",
" <td>Pinot Noir</td>\n",
" <td>Domaine Faiveley</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8072</th>\n",
" <td>France</td>\n",
" <td>This is the big one, one of the most treasured...</td>\n",
" <td>NaN</td>\n",
" <td>95</td>\n",
" <td>520.0</td>\n",
" <td>Burgundy</td>\n",
" <td>Montrachet</td>\n",
" <td>NaN</td>\n",
" <td>Chardonnay</td>\n",
" <td>Louis Latour</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8074</th>\n",
" <td>France</td>\n",
" <td>A small, acre-sized parcel gives a wine that h...</td>\n",
" <td>NaN</td>\n",
" <td>95</td>\n",
" <td>300.0</td>\n",
" <td>Burgundy</td>\n",
" <td>Chambertin Clos de Bèze</td>\n",
" <td>NaN</td>\n",
" <td>Pinot Noir</td>\n",
" <td>Chanson Père et Fils</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" country ... winery\n",
"0 US ... Heitz\n",
"16 US ... Hall\n",
"18 France ... Château Lagrézette\n",
"2145 France ... Château Haut-Brion\n",
"2155 France ... Château Margaux\n",
"2159 France ... Château Haut-Brion\n",
"2434 France ... Château La Mission Haut-Brion\n",
"2673 France ... Jean-Luc and Paul Aegerter\n",
"2733 France ... Château La Mission Haut-Brion\n",
"2742 France ... Château Trotanoy\n",
"7796 France ... Domaine Rossignol-Trapet\n",
"7798 France ... Bouchard Père & Fils\n",
"8071 France ... Domaine Faiveley\n",
"8072 France ... Louis Latour\n",
"8074 France ... Chanson Père et Fils\n",
"\n",
"[15 rows x 10 columns]"
]
},
"execution_count": 44,
"metadata": {
"tags": []
},
"output_type": "execute_result"
}
],
"source": [
"data[(data.price > 200) & ((data.country == 'US') | (data.country == 'France'))].head(15) # Compound condition: price above 200 and country is either US or France"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Da0WfR_5hky_"
},
"source": [
"### Multi-indexing"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"id": "gIqtjR45hky_",
"outputId": "48a1fdf1-4c7f-4c3c-8e76-b9e773cb15c7"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>country</th>\n",
" <th>description</th>\n",
" <th>designation</th>\n",
" <th>points</th>\n",
" <th>price</th>\n",
" <th>province</th>\n",
" <th>region_1</th>\n",
" <th>region_2</th>\n",
" <th>variety</th>\n",
" <th>winery</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>US</td>\n",
" <td>This tremendous 100% varietal wine hails from ...</td>\n",
" <td>Martha's Vineyard</td>\n",
" <td>96</td>\n",
" <td>235.0</td>\n",
" <td>California</td>\n",
" <td>Napa Valley</td>\n",
" <td>Napa</td>\n",
" <td>Cabernet Sauvignon</td>\n",
" <td>Heitz</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Spain</td>\n",
" <td>Ripe aromas of fig, blackberry and cassis are ...</td>\n",
" <td>Carodorum Selección Especial Reserva</td>\n",
" <td>96</td>\n",
" <td>110.0</td>\n",
" <td>Northern Spain</td>\n",
" <td>Toro</td>\n",
" <td>NaN</td>\n",
" <td>Tinta de Toro</td>\n",
" <td>Bodega Carmen Rodríguez</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>US</td>\n",
" <td>Mac Watson honors the memory of a wine once ma...</td>\n",
" <td>Special Selected Late Harvest</td>\n",
" <td>96</td>\n",
" <td>90.0</td>\n",
" <td>California</td>\n",
" <td>Knights Valley</td>\n",
" <td>Sonoma</td>\n",
" <td>Sauvignon Blanc</td>\n",
" <td>Macauley</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>US</td>\n",
" <td>This spent 20 months in 30% new French oak, an...</td>\n",
" <td>Reserve</td>\n",
" <td>96</td>\n",
" <td>65.0</td>\n",
" <td>Oregon</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Pinot Noir</td>\n",
" <td>Ponzi</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>France</td>\n",
" <td>This is the top wine from La Bégude, named aft...</td>\n",
" <td>La Brûlade</td>\n",
" <td>95</td>\n",
" <td>66.0</td>\n",
" <td>Provence</td>\n",
" <td>Bandol</td>\n",
" <td>NaN</td>\n",
" <td>Provence red blend</td>\n",
" <td>Domaine de la Bégude</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" country description \\\n",
"0 US This tremendous 100% varietal wine hails from ... \n",
"1 Spain Ripe aromas of fig, blackberry and cassis are ... \n",
"2 US Mac Watson honors the memory of a wine once ma... \n",
"3 US This spent 20 months in 30% new French oak, an... \n",
"4 France This is the top wine from La Bégude, named aft... \n",
"\n",
" designation points price province \\\n",
"0 Martha's Vineyard 96 235.0 California \n",
"1 Carodorum Selección Especial Reserva 96 110.0 Northern Spain \n",
"2 Special Selected Late Harvest 96 90.0 California \n",
"3 Reserve 96 65.0 Oregon \n",
"4 La Brûlade 95 66.0 Provence \n",
"\n",
" region_1 region_2 variety \\\n",
"0 Napa Valley Napa Cabernet Sauvignon \n",
"1 Toro NaN Tinta de Toro \n",
"2 Knights Valley Sonoma Sauvignon Blanc \n",
"3 Willamette Valley Willamette Valley Pinot Noir \n",
"4 Bandol NaN Provence red blend \n",
"\n",
" winery \n",
"0 Heitz \n",
"1 Bodega Carmen Rodríguez \n",
"2 Macauley \n",
"3 Ponzi \n",
"4 Domaine de la Bégude "
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.head() # Display the first rows of our DataFrame"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"id": "JizHrXguhkzC",
"outputId": "4b793963-14d9-4f87-bb3f-b26bb14d2d8e"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>description</th>\n",
" <th>designation</th>\n",
" <th>points</th>\n",
" <th>province</th>\n",
" <th>region_1</th>\n",
" <th>region_2</th>\n",
" <th>variety</th>\n",
" <th>winery</th>\n",
" </tr>\n",
" <tr>\n",
" <th>country</th>\n",
" <th>price</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Albania</th>\n",
" <th>20.0</th>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"7\" valign=\"top\">Argentina</th>\n",
" <th>4.0</th>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5.0</th>\n",
" <td>9</td>\n",
" <td>1</td>\n",
" <td>9</td>\n",
" <td>9</td>\n",
" <td>9</td>\n",
" <td>0</td>\n",
" <td>9</td>\n",
" <td>9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6.0</th>\n",
" <td>74</td>\n",
" <td>19</td>\n",
" <td>74</td>\n",
" <td>74</td>\n",
" <td>74</td>\n",
" <td>0</td>\n",
" <td>74</td>\n",
" <td>74</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7.0</th>\n",
" <td>61</td>\n",
" <td>19</td>\n",
" <td>61</td>\n",
" <td>61</td>\n",
" <td>61</td>\n",
" <td>0</td>\n",
" <td>61</td>\n",
" <td>61</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>230.0</th>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>250.0</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"3\" valign=\"top\">Australia</th>\n",
" <th>5.0</th>\n",
" <td>11</td>\n",
" <td>0</td>\n",
" <td>11</td>\n",
" <td>11</td>\n",
" <td>11</td>\n",
" <td>0</td>\n",
" <td>11</td>\n",
" <td>11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6.0</th>\n",
" <td>9</td>\n",
" <td>2</td>\n",
" <td>9</td>\n",
" <td>9</td>\n",
" <td>9</td>\n",
" <td>0</td>\n",
" <td>9</td>\n",
" <td>9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7.0</th>\n",
" <td>51</td>\n",
" <td>21</td>\n",
" <td>51</td>\n",
" <td>51</td>\n",
" <td>51</td>\n",
" <td>0</td>\n",
" <td>51</td>\n",
" <td>51</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>100 rows × 8 columns</p>\n",
"</div>"
],
"text/plain": [
" description designation points province region_1 \\\n",
"country price \n",
"Albania 20.0 2 0 2 2 0 \n",
"Argentina 4.0 3 2 3 3 3 \n",
" 5.0 9 1 9 9 9 \n",
" 6.0 74 19 74 74 74 \n",
" 7.0 61 19 61 61 61 \n",
"... ... ... ... ... ... \n",
" 230.0 2 2 2 2 2 \n",
" 250.0 1 1 1 1 1 \n",
"Australia 5.0 11 0 11 11 11 \n",
" 6.0 9 2 9 9 9 \n",
" 7.0 51 21 51 51 51 \n",
"\n",
" region_2 variety winery \n",
"country price \n",
"Albania 20.0 0 2 2 \n",
"Argentina 4.0 0 3 3 \n",
" 5.0 0 9 9 \n",
" 6.0 0 74 74 \n",
" 7.0 0 61 61 \n",
"... ... ... ... \n",
" 230.0 0 2 2 \n",
" 250.0 0 1 1 \n",
"Australia 5.0 0 11 11 \n",
" 6.0 0 9 9 \n",
" 7.0 0 51 51 \n",
"\n",
"[100 rows x 8 columns]"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_ = data.groupby(['country', 'price']).count() # Group the data by country first, then by price, and count the non-null values in each column\n",
"data_.head(100) # Display the first 100 rows of the new DataFrame"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 450
},
"id": "hE9aG1imhkzG",
"outputId": "b2abe0e9-93f7-4044-e88c-844918452e52"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>description</th>\n",
" <th>designation</th>\n",
" <th>points</th>\n",
" <th>province</th>\n",
" <th>region_1</th>\n",
" <th>region_2</th>\n",
" <th>variety</th>\n",
" <th>winery</th>\n",
" </tr>\n",
" <tr>\n",
" <th>price</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>4.0</th>\n",
" <td>9</td>\n",
" <td>2</td>\n",
" <td>9</td>\n",
" <td>9</td>\n",
" <td>9</td>\n",
" <td>9</td>\n",
" <td>9</td>\n",
" <td>9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5.0</th>\n",
" <td>33</td>\n",
" <td>16</td>\n",
" <td>33</td>\n",
" <td>33</td>\n",
" <td>33</td>\n",
" <td>32</td>\n",
" <td>33</td>\n",
" <td>33</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6.0</th>\n",
" <td>58</td>\n",
" <td>13</td>\n",
" <td>58</td>\n",
" <td>58</td>\n",
" <td>58</td>\n",
" <td>58</td>\n",
" <td>58</td>\n",
" <td>58</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7.0</th>\n",
" <td>206</td>\n",
" <td>58</td>\n",
" <td>206</td>\n",
" <td>206</td>\n",
" <td>206</td>\n",
" <td>203</td>\n",
" <td>206</td>\n",
" <td>206</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8.0</th>\n",
" <td>458</td>\n",
" <td>188</td>\n",
" <td>458</td>\n",
" <td>458</td>\n",
" <td>456</td>\n",
" <td>444</td>\n",
" <td>458</td>\n",
" <td>458</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>350.0</th>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>5</td>\n",
" <td>5</td>\n",
" <td>5</td>\n",
" <td>4</td>\n",
" <td>5</td>\n",
" <td>5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>450.0</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>500.0</th>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>625.0</th>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013.0</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>149 rows × 8 columns</p>\n",
"</div>"
],
"text/plain": [
" description designation points province region_1 region_2 \\\n",
"price \n",
"4.0 9 2 9 9 9 9 \n",
"5.0 33 16 33 33 33 32 \n",
"6.0 58 13 58 58 58 58 \n",
"7.0 206 58 206 206 206 203 \n",
"8.0 458 188 458 458 456 444 \n",
"... ... ... ... ... ... ... \n",
"350.0 5 0 5 5 5 4 \n",
"450.0 1 0 1 1 1 1 \n",
"500.0 3 2 3 3 3 3 \n",
"625.0 2 2 2 2 2 2 \n",
"2013.0 1 1 1 1 1 1 \n",
"\n",
" variety winery \n",
"price \n",
"4.0 9 9 \n",
"5.0 33 33 \n",
"6.0 58 58 \n",
"7.0 206 206 \n",
"8.0 458 458 \n",
"... ... ... \n",
"350.0 5 5 \n",
"450.0 1 1 \n",
"500.0 3 3 \n",
"625.0 2 2 \n",
"2013.0 1 1 \n",
"\n",
"[149 rows x 8 columns]"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_.loc['US'] # Show all rows for 'US' (the result is indexed by the remaining level, price)"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 170
},
"id": "ZteYvkfehkzL",
"outputId": "938087ed-492f-4ddf-c3f1-16ca3b38f331"
},
"outputs": [
{
"data": {
"text/plain": [
"description 318\n",
"designation 261\n",
"points 318\n",
"province 318\n",
"region_1 318\n",
"region_2 317\n",
"variety 318\n",
"winery 318\n",
"Name: (US, 100.0), dtype: int64"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_.loc['US', 100] # Show the per-column counts for 'US' wines priced at 100"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "6o6JX1OnhkzP"
},
"source": [
"#### How to change values in the table"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"id": "sVulAc0HsaLu",
"outputId": "6fca9bb4-357b-4398-e509-6191a1ee9e74"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>country</th>\n",
" <th>description</th>\n",
" <th>designation</th>\n",
" <th>points</th>\n",
" <th>price</th>\n",
" <th>province</th>\n",
" <th>region_1</th>\n",
" <th>region_2</th>\n",
" <th>variety</th>\n",
" <th>winery</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>US</td>\n",
" <td>This tremendous 100% varietal wine hails from ...</td>\n",
" <td>Martha's Vineyard</td>\n",
" <td>96</td>\n",
" <td>235.0</td>\n",
" <td>California</td>\n",
" <td>Napa Valley</td>\n",
" <td>Napa</td>\n",
" <td>Cabernet Sauvignon</td>\n",
" <td>Heitz</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Spain</td>\n",
" <td>Ripe aromas of fig, blackberry and cassis are ...</td>\n",
" <td>Carodorum Selección Especial Reserva</td>\n",
" <td>96</td>\n",
" <td>110.0</td>\n",
" <td>Northern Spain</td>\n",
" <td>Toro</td>\n",
" <td>NaN</td>\n",
" <td>Tinta de Toro</td>\n",
" <td>Bodega Carmen Rodríguez</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>US</td>\n",
" <td>Mac Watson honors the memory of a wine once ma...</td>\n",
" <td>Special Selected Late Harvest</td>\n",
" <td>96</td>\n",
" <td>90.0</td>\n",
" <td>California</td>\n",
" <td>Knights Valley</td>\n",
" <td>Sonoma</td>\n",
" <td>Sauvignon Blanc</td>\n",
" <td>Macauley</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>US</td>\n",
" <td>This spent 20 months in 30% new French oak, an...</td>\n",
" <td>Reserve</td>\n",
" <td>96</td>\n",
" <td>65.0</td>\n",
" <td>Oregon</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Pinot Noir</td>\n",
" <td>Ponzi</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>France</td>\n",
" <td>This is the top wine from La Bégude, named aft...</td>\n",
" <td>La Brûlade</td>\n",
" <td>95</td>\n",
" <td>66.0</td>\n",
" <td>Provence</td>\n",
" <td>Bandol</td>\n",
" <td>NaN</td>\n",
" <td>Provence red blend</td>\n",
" <td>Domaine de la Bégude</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" country description \\\n",
"0 US This tremendous 100% varietal wine hails from ... \n",
"1 Spain Ripe aromas of fig, blackberry and cassis are ... \n",
"2 US Mac Watson honors the memory of a wine once ma... \n",
"3 US This spent 20 months in 30% new French oak, an... \n",
"4 France This is the top wine from La Bégude, named aft... \n",
"\n",
" designation points price province \\\n",
"0 Martha's Vineyard 96 235.0 California \n",
"1 Carodorum Selección Especial Reserva 96 110.0 Northern Spain \n",
"2 Special Selected Late Harvest 96 90.0 California \n",
"3 Reserve 96 65.0 Oregon \n",
"4 La Brûlade 95 66.0 Provence \n",
"\n",
" region_1 region_2 variety \\\n",
"0 Napa Valley Napa Cabernet Sauvignon \n",
"1 Toro NaN Tinta de Toro \n",
"2 Knights Valley Sonoma Sauvignon Blanc \n",
"3 Willamette Valley Willamette Valley Pinot Noir \n",
"4 Bandol NaN Provence red blend \n",
"\n",
" winery \n",
"0 Heitz \n",
"1 Bodega Carmen Rodríguez \n",
"2 Macauley \n",
"3 Ponzi \n",
"4 Domaine de la Bégude "
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_backup = data.copy() # Make a copy of our DataFrame and store it in the variable data_backup\n",
"data.head()"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 297
},
"id": "eMhSX4jqhkzP",
"outputId": "5ee71b23-0935-46c4-925f-912e28eeda25"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>country</th>\n",
" <th>description</th>\n",
" <th>designation</th>\n",
" <th>points</th>\n",
" <th>price</th>\n",
" <th>province</th>\n",
" <th>region_1</th>\n",
" <th>region_2</th>\n",
" <th>variety</th>\n",
" <th>winery</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>US</td>\n",
" <td>kotiki</td>\n",
" <td>Martha's Vineyard</td>\n",
" <td>96</td>\n",
" <td>235.0</td>\n",
" <td>California</td>\n",
" <td>Napa Valley</td>\n",
" <td>Napa</td>\n",
" <td>Cabernet Sauvignon</td>\n",
" <td>Heitz</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Spain</td>\n",
" <td>Ripe aromas of fig, blackberry and cassis are ...</td>\n",
" <td>Carodorum Selección Especial Reserva</td>\n",
" <td>96</td>\n",
" <td>110.0</td>\n",
" <td>Northern Spain</td>\n",
" <td>Toro</td>\n",
" <td>NaN</td>\n",
" <td>Tinta de Toro</td>\n",
" <td>Bodega Carmen Rodríguez</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>US</td>\n",
" <td>Mac Watson honors the memory of a wine once ma...</td>\n",
" <td>129</td>\n",
" <td>96</td>\n",
" <td>90.0</td>\n",
" <td>California</td>\n",
" <td>Knights Valley</td>\n",
" <td>Sonoma</td>\n",
" <td>Sauvignon Blanc</td>\n",
" <td>Macauley</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>US</td>\n",
" <td>This spent 20 months in 30% new French oak, an...</td>\n",
" <td>new</td>\n",
" <td>new</td>\n",
" <td>new</td>\n",
" <td>Oregon</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Pinot Noir</td>\n",
" <td>Ponzi</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>France</td>\n",
" <td>This is the top wine from La Bégude, named aft...</td>\n",
" <td>new</td>\n",
" <td>new</td>\n",
" <td>new</td>\n",
" <td>Provence</td>\n",
" <td>Bandol</td>\n",
" <td>NaN</td>\n",
" <td>Provence red blend</td>\n",
" <td>Domaine de la Bégude</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Spain</td>\n",
" <td>Deep, dense and pure from the opening bell, th...</td>\n",
" <td>Numanthia</td>\n",
" <td>95</td>\n",
" <td>73.0</td>\n",
" <td>Northern Spain</td>\n",
" <td>Toro</td>\n",
" <td>NaN</td>\n",
" <td>Tinta de Toro</td>\n",
" <td>Numanthia</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Spain</td>\n",
" <td>Slightly gritty black-fruit aromas include a s...</td>\n",
" <td>San Román</td>\n",
" <td>95</td>\n",
" <td>65.0</td>\n",
" <td>Northern Spain</td>\n",
" <td>Toro</td>\n",
" <td>NaN</td>\n",
" <td>Tinta de Toro</td>\n",
" <td>Maurodos</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Spain</td>\n",
" <td>Lush cedary black-fruit aromas are luxe and of...</td>\n",
" <td>Carodorum Único Crianza</td>\n",
" <td>95</td>\n",
" <td>110.0</td>\n",
" <td>Northern Spain</td>\n",
" <td>Toro</td>\n",
" <td>NaN</td>\n",
" <td>Tinta de Toro</td>\n",
" <td>Bodega Carmen Rodríguez</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" country description \\\n",
"0 US kotiki \n",
"1 Spain Ripe aromas of fig, blackberry and cassis are ... \n",
"2 US Mac Watson honors the memory of a wine once ma... \n",
"3 US This spent 20 months in 30% new French oak, an... \n",
"4 France This is the top wine from La Bégude, named aft... \n",
"5 Spain Deep, dense and pure from the opening bell, th... \n",
"6 Spain Slightly gritty black-fruit aromas include a s... \n",
"7 Spain Lush cedary black-fruit aromas are luxe and of... \n",
"\n",
" designation points price province \\\n",
"0 Martha's Vineyard 96 235.0 California \n",
"1 Carodorum Selección Especial Reserva 96 110.0 Northern Spain \n",
"2 129 96 90.0 California \n",
"3 new new new Oregon \n",
"4 new new new Provence \n",
"5 Numanthia 95 73.0 Northern Spain \n",
"6 San Román 95 65.0 Northern Spain \n",
"7 Carodorum Único Crianza 95 110.0 Northern Spain \n",
"\n",
" region_1 region_2 variety \\\n",
"0 Napa Valley Napa Cabernet Sauvignon \n",
"1 Toro NaN Tinta de Toro \n",
"2 Knights Valley Sonoma Sauvignon Blanc \n",
"3 Willamette Valley Willamette Valley Pinot Noir \n",
"4 Bandol NaN Provence red blend \n",
"5 Toro NaN Tinta de Toro \n",
"6 Toro NaN Tinta de Toro \n",
"7 Toro NaN Tinta de Toro \n",
"\n",
" winery \n",
"0 Heitz \n",
"1 Bodega Carmen Rodríguez \n",
"2 Macauley \n",
"3 Ponzi \n",
"4 Domaine de la Bégude \n",
"5 Numanthia \n",
"6 Maurodos \n",
"7 Bodega Carmen Rodríguez "
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.iloc[0,1] = 'kotiki' # Set a new value at row 0, column 1\n",
"data.iloc[2,2] = '129' # Set a new value at row 2, column 2\n",
"data.iloc[3:5,2:5] = 'new' # Set a new value in rows 3-4 and columns 2-4 (the end of an iloc slice is exclusive)\n",
"data.head(8)"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"id": "RZvBCkMCsiOT"
},
"outputs": [],
"source": [
"data = data_backup.copy() # Restore the data from the backup copy"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"id": "SNliNx3TPCTX",
"outputId": "2f0ccacf-df6b-4499-c844-12be866957dc"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>country</th>\n",
" <th>description</th>\n",
" <th>designation</th>\n",
" <th>points</th>\n",
" <th>price</th>\n",
" <th>province</th>\n",
" <th>region_1</th>\n",
" <th>region_2</th>\n",
" <th>variety</th>\n",
" <th>winery</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>US</td>\n",
" <td>This tremendous 100% varietal wine hails from ...</td>\n",
" <td>Martha's Vineyard</td>\n",
" <td>96</td>\n",
" <td>235.0</td>\n",
" <td>California</td>\n",
" <td>Napa Valley</td>\n",
" <td>Napa</td>\n",
" <td>Cabernet Sauvignon</td>\n",
" <td>Heitz</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Spain</td>\n",
" <td>Ripe aromas of fig, blackberry and cassis are ...</td>\n",
" <td>Carodorum Selección Especial Reserva</td>\n",
" <td>96</td>\n",
" <td>110.0</td>\n",
" <td>Northern Spain</td>\n",
" <td>Toro</td>\n",
" <td>NaN</td>\n",
" <td>Tinta de Toro</td>\n",
" <td>Bodega Carmen Rodríguez</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>US</td>\n",
" <td>Mac Watson honors the memory of a wine once ma...</td>\n",
" <td>Special Selected Late Harvest</td>\n",
" <td>96</td>\n",
" <td>90.0</td>\n",
" <td>California</td>\n",
" <td>Knights Valley</td>\n",
" <td>Sonoma</td>\n",
" <td>Sauvignon Blanc</td>\n",
" <td>Macauley</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>US</td>\n",
" <td>This spent 20 months in 30% new French oak, an...</td>\n",
" <td>Reserve</td>\n",
" <td>96</td>\n",
" <td>65.0</td>\n",
" <td>Oregon</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Pinot Noir</td>\n",
" <td>Ponzi</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>France</td>\n",
" <td>This is the top wine from La Bégude, named aft...</td>\n",
" <td>La Brûlade</td>\n",
" <td>95</td>\n",
" <td>66.0</td>\n",
" <td>Provence</td>\n",
" <td>Bandol</td>\n",
" <td>NaN</td>\n",
" <td>Provence red blend</td>\n",
" <td>Domaine de la Bégude</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" country description \\\n",
"0 US This tremendous 100% varietal wine hails from ... \n",
"1 Spain Ripe aromas of fig, blackberry and cassis are ... \n",
"2 US Mac Watson honors the memory of a wine once ma... \n",
"3 US This spent 20 months in 30% new French oak, an... \n",
"4 France This is the top wine from La Bégude, named aft... \n",
"\n",
" designation points price province \\\n",
"0 Martha's Vineyard 96 235.0 California \n",
"1 Carodorum Selección Especial Reserva 96 110.0 Northern Spain \n",
"2 Special Selected Late Harvest 96 90.0 California \n",
"3 Reserve 96 65.0 Oregon \n",
"4 La Brûlade 95 66.0 Provence \n",
"\n",
" region_1 region_2 variety \\\n",
"0 Napa Valley Napa Cabernet Sauvignon \n",
"1 Toro NaN Tinta de Toro \n",
"2 Knights Valley Sonoma Sauvignon Blanc \n",
"3 Willamette Valley Willamette Valley Pinot Noir \n",
"4 Bandol NaN Provence red blend \n",
"\n",
" winery \n",
"0 Heitz \n",
"1 Bodega Carmen Rodríguez \n",
"2 Macauley \n",
"3 Ponzi \n",
"4 Domaine de la Bégude "
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.head()"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"id": "qXlU-wqyhkzT",
"outputId": "b65e6f8a-0562-43d7-b09a-ff2c8f9eab35"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>country</th>\n",
" <th>description</th>\n",
" <th>designation</th>\n",
" <th>points</th>\n",
" <th>price</th>\n",
" <th>province</th>\n",
" <th>region_1</th>\n",
" <th>region_2</th>\n",
" <th>variety</th>\n",
" <th>winery</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>US</td>\n",
" <td>This tremendous 100% varietal wine hails from ...</td>\n",
" <td>Martha's Vineyard</td>\n",
" <td>200</td>\n",
" <td>1000.0</td>\n",
" <td>California</td>\n",
" <td>Napa Valley</td>\n",
" <td>Syberia</td>\n",
" <td>Cabernet Sauvignon</td>\n",
" <td>Heitz</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Spain</td>\n",
" <td>Ripe aromas of fig, blackberry and cassis are ...</td>\n",
" <td>Carodorum Selección Especial Reserva</td>\n",
" <td>200</td>\n",
" <td>1000.0</td>\n",
" <td>Northern Spain</td>\n",
" <td>Toro</td>\n",
" <td>NaN</td>\n",
" <td>Tinta de Toro</td>\n",
" <td>Bodega Carmen Rodríguez</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>US</td>\n",
" <td>Mac Watson honors the memory of a wine once ma...</td>\n",
" <td>Special Selected Late Harvest</td>\n",
" <td>96</td>\n",
" <td>90.0</td>\n",
" <td>California</td>\n",
" <td>Knights Valley</td>\n",
" <td>Syberia</td>\n",
" <td>Sauvignon Blanc</td>\n",
" <td>Macauley</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>US</td>\n",
" <td>This spent 20 months in 30% new French oak, an...</td>\n",
" <td>Reserve</td>\n",
" <td>96</td>\n",
" <td>65.0</td>\n",
" <td>Oregon</td>\n",
" <td>Willamette Valley</td>\n",
" <td>Syberia</td>\n",
" <td>Pinot Noir</td>\n",
" <td>Ponzi</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>France</td>\n",
" <td>This is the top wine from La Bégude, named aft...</td>\n",
" <td>La Brûlade</td>\n",
" <td>95</td>\n",
" <td>66.0</td>\n",
" <td>Provence</td>\n",
" <td>Bandol</td>\n",
" <td>NaN</td>\n",
" <td>Provence red blend</td>\n",
" <td>Domaine de la Bégude</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" country description \\\n",
"0 US This tremendous 100% varietal wine hails from ... \n",
"1 Spain Ripe aromas of fig, blackberry and cassis are ... \n",
"2 US Mac Watson honors the memory of a wine once ma... \n",
"3 US This spent 20 months in 30% new French oak, an... \n",
"4 France This is the top wine from La Bégude, named aft... \n",
"\n",
" designation points price province \\\n",
"0 Martha's Vineyard 200 1000.0 California \n",
"1 Carodorum Selección Especial Reserva 200 1000.0 Northern Spain \n",
"2 Special Selected Late Harvest 96 90.0 California \n",
"3 Reserve 96 65.0 Oregon \n",
"4 La Brûlade 95 66.0 Provence \n",
"\n",
" region_1 region_2 variety winery \n",
"0 Napa Valley Syberia Cabernet Sauvignon Heitz \n",
"1 Toro NaN Tinta de Toro Bodega Carmen Rodríguez \n",
"2 Knights Valley Syberia Sauvignon Blanc Macauley \n",
"3 Willamette Valley Syberia Pinot Noir Ponzi \n",
"4 Bandol NaN Provence red blend Domaine de la Bégude "
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
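"# Set region_2 to 'Syberia' for every wine from the US\n",
"data.loc[data.country == 'US', 'region_2'] = 'Syberia'\n",
"# For wines priced above 100, overwrite points with 200 and then price with 1000\n",
"data.loc[data.price > 100, 'points'] = 200\n",
"data.loc[data.price > 100, 'price'] = 1000\n",
"data.head()"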
]
},
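{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell above overwrites values through `.loc[mask, column] = value`. A minimal sketch of that pattern, assuming a made-up three-row frame (`toy`, `expensive` and all values are hypothetical, not rows from the wine dataset):\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data, not rows from the wine dataset\n",
"toy = pd.DataFrame({\n",
"    'country': ['US', 'Spain', 'US'],\n",
"    'price': [150.0, 40.0, 90.0],\n",
"    'points': [96, 95, 94],\n",
"    'region_2': ['Sonoma', None, None],\n",
"})\n",
"\n",
"expensive = toy['price'] > 100                 # boolean Series, one flag per row\n",
"toy.loc[expensive, 'points'] = 200             # only rows where the mask is True\n",
"toy.loc[toy['country'] == 'US', 'region_2'] = 'Syberia'  # the Spain row keeps its missing value\n",
"\n",
"print(toy)\n",
"```\n",
"\n",
"Storing the mask in a variable such as `expensive` avoids recomputing it when several columns are updated for the same rows."
]
},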
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "6PumqSIU7Q7U"
},
"outputs": [],
"source": [
"## Converting to NumPy\n"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 51
},
"id": "Obs9TzQ9E8ss",
"outputId": "7ea94ae7-0ee5-4773-be88-14ab92522349"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(150930, 10)\n"
]
},
{
"data": {
"text/plain": [
"dtype('O')"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np_data = data.values # Extract the DataFrame's values as a NumPy array and store them in np_data\n",
"print(np_data.shape) # Print the shape of np_data\n",
"np_data.dtype"
]
},
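{
"cell_type": "markdown",
"metadata": {},
"source": [
"The array above has dtype `O` (object) because the DataFrame mixes string and numeric columns, so NumPy falls back to storing generic Python objects. A minimal sketch of the difference, assuming a hypothetical two-row frame `toy`; it uses `DataFrame.to_numpy()`, which returns the same array as `.values` here:\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data standing in for the wine dataset\n",
"toy = pd.DataFrame({\n",
"    'country': ['US', 'Spain'],\n",
"    'points': [96, 95],\n",
"    'price': [90.0, 73.0],\n",
"})\n",
"\n",
"mixed = toy.to_numpy()                         # strings and numbers together\n",
"numeric = toy[['points', 'price']].to_numpy()  # numeric columns only\n",
"\n",
"print(mixed.dtype)    # object\n",
"print(numeric.dtype)  # float64\n",
"```\n",
"\n",
"Selecting only the numeric columns before converting gives an array that NumPy can process with fast vectorized arithmetic."
]
},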
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 105
},
"id": "y1n1cNpdrFqQ",
"outputId": "f6f3c2df-9732-40ed-8569-5d53223b9a24"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['US'\n",
" 'This tremendous 100% varietal wine hails from Oakville and was aged over three years in oak. Juicy red-cherry fruit and a compelling hint of caramel greet the palate, framed by elegant, fine tannins and a subtle minty tone in the background. Balanced and rewarding from start to finish, it has years ahead of it to develop further nuance. Enjoy 2022–2030.'\n",
" \"Martha's Vineyard\" 200 1000.0 'California' 'Napa Valley' 'Syberia'\n",
" 'Cabernet Sauvignon' 'Heitz']\n"
]
}
],
"source": [
"print(np_data[0]) # Print the first row (index 0) of the array"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 717
},
"id": "UsAcU8mBnWwn",
"outputId": "4eec37c8-c418-4140-de98-c4ec5b6bbe8b"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['US'\n",
" 'This tremendous 100% varietal wine hails from Oakville and was aged over three years in oak. Juicy red-cherry fruit and a compelling hint of caramel greet the palate, framed by elegant, fine tannins and a subtle minty tone in the background. Balanced and rewarding from start to finish, it has years ahead of it to develop further nuance. Enjoy 2022–2030.'\n",
" \"Martha's Vineyard\" 200 1000.0 'California' 'Napa Valley' 'Syberia'\n",
" 'Cabernet Sauvignon' 'Heitz']\n",
"['Spain'\n",
" 'Ripe aromas of fig, blackberry and cassis are softened and sweetened by a slathering of oaky chocolate and vanilla. This is full, layered, intense and cushioned on the palate, with rich flavors of chocolaty black fruits and baking spices. A toasty, everlasting finish is heady but ideally balanced. Drink through 2023.'\n",
" 'Carodorum Selección Especial Reserva' 200 1000.0 'Northern Spain' 'Toro'\n",
" nan 'Tinta de Toro' 'Bodega Carmen Rodríguez']\n",
"['US'\n",
" 'Mac Watson honors the memory of a wine once made by his mother in this tremendously delicious, balanced and complex botrytised white. Dark gold in color, it layers toasted hazelnut, pear compote and orange peel flavors, reveling in the succulence of its 122 g/L of residual sugar.'\n",
" 'Special Selected Late Harvest' 96 90.0 'California' 'Knights Valley'\n",
" 'Syberia' 'Sauvignon Blanc' 'Macauley']\n",
"['US'\n",
" \"This spent 20 months in 30% new French oak, and incorporates fruit from Ponzi's Aurora, Abetina and Madrona vineyards, among others. Aromatic, dense and toasty, it deftly blends aromas and flavors of toast, cigar box, blackberry, black cherry, coffee and graphite. Tannins are polished to a fine sheen, and frame a finish loaded with dark chocolate and espresso. Drink now through 2032.\"\n",
" 'Reserve' 96 65.0 'Oregon' 'Willamette Valley' 'Syberia' 'Pinot Noir'\n",
" 'Ponzi']\n",
"['France'\n",
" 'This is the top wine from La Bégude, named after the highest point in the vineyard at 1200 feet. It has structure, density and considerable acidity that is still calming down. With 18 months in wood, the wine has developing an extra richness and concentration. Produced by the Tari family, formerly of Château Giscours in Margaux, it is a wine made for aging. Drink from 2020.'\n",
" 'La Brûlade' 95 66.0 'Provence' 'Bandol' nan 'Provence red blend'\n",
" 'Domaine de la Bégude']\n",
"['Spain'\n",
" 'Deep, dense and pure from the opening bell, this Toro is a winner. Aromas of dark ripe black fruits are cool and moderately oaked. This feels massive on the palate but sensationally balanced. Flavors of blackberry, coffee, mocha and toasty oak finish spicy, smooth and heady. Drink this exemplary Toro through 2023.'\n",
" 'Numanthia' 95 73.0 'Northern Spain' 'Toro' nan 'Tinta de Toro'\n",
" 'Numanthia']\n",
"['Spain'\n",
" \"Slightly gritty black-fruit aromas include a sweet note of pastry along with a hint of prune. Wall-to-wall saturation ensures that all corners of one's mouth are covered. Flavors of blackberry, mocha and chocolate are highly impressive and expressive, while this settles nicely on a long finish. Drink now through 2024.\"\n",
" 'San Román' 95 65.0 'Northern Spain' 'Toro' nan 'Tinta de Toro'\n",
" 'Maurodos']\n",
"['Spain'\n",
" 'Lush cedary black-fruit aromas are luxe and offer notes of marzipan and vanilla. This bruiser is massive and tannic on the palate, but still lush and friendly. Chocolate is a key flavor, while baked berry and cassis flavors are hardly wallflowers. On the finish, this is tannic and deep as a sea trench. Drink this saturated black-colored Toro through 2023.'\n",
" 'Carodorum Único Crianza' 200 1000.0 'Northern Spain' 'Toro' nan\n",
" 'Tinta de Toro' 'Bodega Carmen Rodríguez']\n",
"['US'\n",
" \"This re-named vineyard was formerly bottled as deLancellotti. You'll find striking minerality underscoring chunky black fruits. Accents of citrus and graphite comingle, with exceptional midpalate concentration. This is a wine to cellar, though it is already quite enjoyable. Drink now through 2030.\"\n",
" 'Silice' 95 65.0 'Oregon' 'Chehalem Mountains' 'Syberia' 'Pinot Noir'\n",
" 'Bergström']\n",
"['US'\n",
" 'The producer sources from two blocks of the vineyard for this wine—one at a high elevation, which contributes bright acidity. Crunchy cranberry, pomegranate and orange peel flavors surround silky, succulent layers of texture that present as fleshy fruit. That delicately lush flavor has considerable length.'\n",
" \"Gap's Crown Vineyard\" 95 60.0 'California' 'Sonoma Coast' 'Syberia'\n",
" 'Pinot Noir' 'Blue Farm']\n"
]
}
],
"source": [
"# Print the first 10 rows of np_data\n",
"for i in range(10):\n",
"    print(np_data[i])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "8F6G2AUnKFCA"
},
"source": [
"# **Glossary**\n",
"\n",
"\n",
"pd.DataFrame(data, columns = [column names, optional], index = [row labels, optional]) - create a DataFrame\n",
"\n",
"pd.read_csv(path or URL of the file) - read a .csv file into a DataFrame\n",
"\n",
"------------\n",
"\n",
".head(n) - show the first n rows of the DataFrame (5 by default)\n",
"\n",
".tail(n) - show the last n rows of the DataFrame (5 by default)\n",
"\n",
".columns - column labels of the DataFrame\n",
"\n",
".values - NumPy array holding all values of the DataFrame\n",
"\n",
".index - row labels (index) of the DataFrame\n",
"\n",
".tolist() - convert a Series, Index or NumPy array to a Python list\n",
"\n",
".count() - count the non-missing values in each column\n",
"\n",
".describe() - summary statistics of the DataFrame (numeric columns by default)\n",
"\n",
".shape - shape of the DataFrame as (rows, columns)\n",
"\n",
".size - number of elements in the DataFrame, rows * columns\n",
"\n",
".info() - column names, non-null counts, dtypes and memory usage\n",
"\n",
".dtypes - data type of each column\n",
"\n",
".isnull() - boolean mask marking missing values (None or NaN)\n",
"\n",
".isna() - same as .isnull(); the two names are aliases\n",
"\n",
".dropna() - drop rows or columns that contain missing values\n",
"\n",
".fillna() - replace missing values with a given value\n",
"\n",
".loc[] - select rows and columns by label or boolean mask\n",
"\n",
".iloc[] - select rows and columns by integer position\n",
"\n",
".drop() - remove rows or columns by label\n",
"\n",
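"A short sketch of the missing-value and selection entries above, assuming a hypothetical three-row frame `toy` with string row labels (none of the names or values come from the wine dataset):\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data with one missing price\n",
"toy = pd.DataFrame(\n",
"    {'country': ['US', 'Spain', 'France'], 'price': [90.0, np.nan, 66.0]},\n",
"    index=['a', 'b', 'c'],\n",
")\n",
"\n",
"missing_per_column = toy.isna().sum()   # number of missing values in each column\n",
"filled = toy.fillna({'price': 0.0})     # returns a new frame; toy itself is unchanged\n",
"dropped = toy.dropna()                  # drops row 'b', which has a missing price\n",
"\n",
"by_label = toy.loc['c', 'price']        # row label 'c', column label 'price'\n",
"by_position = toy.iloc[2, 1]            # third row, second column\n",
"\n",
"print(missing_per_column['price'], by_label, by_position)\n",
"```\n",
"\n",
"Here `.loc` and `.iloc` reach the same cell: one addresses it by labels, the other by integer positions.\n",
"\n",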
"--------------\n",
"\n",
"pd.to_datetime(column to convert) - convert a column to datetime values\n",
"\n",
".groupby() - group rows by the values of one or more columns\n",
"\n",
".copy() - create a copy\n",
"\n",
".sort_values() - sort by values\n",
"\n",
"pd.concat([df1, df2]) - concatenate DataFrames\n",
"\n",
".merge(other_dataframe, on = 'common column to join on', how = 'join type: left, right, inner or outer') - join two DataFrames on a common column\n",
"\n",
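"A minimal sketch contrasting `pd.concat`, `.merge` and `.groupby`, assuming two hypothetical frames `wines` and `capitals` (illustrative values only):\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy frames\n",
"wines = pd.DataFrame({'country': ['US', 'Spain', 'US'], 'price': [90.0, 73.0, 65.0]})\n",
"capitals = pd.DataFrame({'country': ['US', 'France'], 'capital': ['Washington', 'Paris']})\n",
"\n",
"stacked = pd.concat([wines, wines], ignore_index=True)     # rows of both frames, one under the other\n",
"joined = wines.merge(capitals, on='country', how='left')   # keeps every row of wines\n",
"mean_price = wines.groupby('country')['price'].mean()      # one mean price per country\n",
"\n",
"print(joined)\n",
"print(mean_price)\n",
"```\n",
"\n",
"With `how='left'` the Spain row stays in the result and gets a missing `capital`, while France, which appears only in `capitals`, is dropped.\n",
"\n",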
"-------------\n",
"\n",
"\n",
".corr() - compute the pairwise correlation between columns\n",
"\n",
".median() - compute the median\n",
"\n",
".cumsum() - compute the cumulative sum\n",
"\n",
".cumprod() - compute the cumulative product\n",
"\n",
".cummax() - compute the cumulative maximum\n",
"\n",
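"The cumulative methods above return a result of the same length as the input, where each element summarizes everything up to and including that position. A small sketch on a hypothetical three-element Series `s`:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"s = pd.Series([2, 1, 3])  # hypothetical values\n",
"\n",
"running_sum = s.cumsum().tolist()       # [2, 3, 6]\n",
"running_product = s.cumprod().tolist()  # [2, 2, 6]\n",
"running_max = s.cummax().tolist()       # [2, 2, 3]\n",
"middle = s.median()                     # 2.0\n",
"\n",
"print(running_sum, running_product, running_max, middle)\n",
"```\n",
"\n",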
"-------------\n",
"\n",
".quantile([]) - compute the quantiles at the listed levels\n",
"\n",
".nunique() - number of distinct values (per column or per row for a DataFrame)\n",
"\n",
".unique() - distinct values of a single column (Series)\n",
"\n",
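"The names `.unique()` and `.nunique()` are easy to mix up: the first returns the distinct values themselves, the second only how many there are. A minimal sketch, assuming hypothetical Series `countries` and `prices`:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"countries = pd.Series(['US', 'Spain', 'US', 'France'])  # hypothetical values\n",
"prices = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])      # hypothetical values\n",
"\n",
"distinct = list(countries.unique())      # ['US', 'Spain', 'France'], in order of first appearance\n",
"n_distinct = countries.nunique()         # 3\n",
"quartiles = prices.quantile([0.25, 0.5]) # Series indexed by the requested levels\n",
"\n",
"print(distinct, n_distinct)\n",
"print(quartiles)\n",
"```\n",
"\n",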
"------------\n",
"\n",
".apply(function) - apply a function to each column or row\n",
"\n",
".agg(functions) - apply one or more aggregation functions to each column or row\n"
]
}
],
"metadata": {
"colab": {
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 1
}