{ "cells": [ { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [], "user_expressions": [] }, "source": [ "# DBSCAN & Practical Issues in Clustering" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "remove-output" ] }, "outputs": [], "source": [ "# Library for manipulating tabular data\n", "import pandas as pd\n", "\n", "# Library for matrix computations\n", "import numpy as np\n", "\n", "# KMeans class from the machine learning library scikit-learn\n", "from sklearn.cluster import KMeans\n", "\n", "# DBSCAN class from the machine learning library scikit-learn\n", "from sklearn.cluster import DBSCAN\n", "\n", "# Plotting libraries\n", "import matplotlib.pyplot as plt\n", "import matplotlib.colors as mcolors\n", "import seaborn as sns\n", "sns.set(style='ticks')\n", "%matplotlib inline\n", "\n", "# For file operations\n", "import os\n", "\n", "# Suppress warning messages\n", "import warnings\n", "warnings.filterwarnings('ignore')" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [], "user_expressions": [] }, "source": [ "\n", "---\n", "## Quiz" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "(L4-Q1)=\n", "### Q1: Within-Cluster Sum of Squared Errors\n", "\n", "Consider applying K-means clustering to [the synthetic data used in Chapter 3](https://mlnote.hontolab.org/content/kmeans-and-hierarchical-clustering.html) (a dataset published by the School of Computing at the University of Eastern Finland).\n", "The optimal number of clusters could be guessed by visualizing the data, but here we assume it is unknown.\n", "\n", "Load the data into a DataFrame and store it in the variable `s1_df`.\n", "Then apply K-means clustering to `s1_df` with three clusters (`K=3`) and compute the within-cluster sum of squared errors (SSE).\n", "\n", "Hint: with scikit-learn, the SSE can be obtained after fitting by accessing the model's `inertia_` attribute ([reference](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html))." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
 | x | y |\n",
"---|---|---|\n",
"0 | 664159 | 550946 |\n",
"1 | 665845 | 557965 |\n",
"2 | 597173 | 575538 |\n",
"3 | 618600 | 551446 |\n",
"4 | 635690 | 608046 |\n",
"5 | 588100 | 557588 |\n",
"6 | 582015 | 546191 |\n",
"7 | 604678 | 574577 |\n",
"8 | 572029 | 518313 |\n",
"9 | 604737 | 574591 |\n", "