ld2daps/Notebooks/Attribute Profiles Classifier.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Attribute Profiles Classification Prototype\n",
    "\n",
    "Ground classification (2D) of LiDAR data with Attribute Profiles (APs) on pre-calculated rasters from LiDAR point cloud processing (DEMs, intensity maps...).\n",
    "\n",
    "We will use the LD2DAPs package to compute profiles and try to generalize the process to automatise the classification."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup\n",
    "\n",
    "### Packages\n",
    "\n",
    "#### Attributes Profile"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "from pathlib import Path\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "sys.path.append(str(Path('..').resolve()))\n",
    "import ld2dap\n",
    "\n",
    "sys.path.append(str(Path('../triskele/python').resolve()))\n",
    "import triskele"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Classifier"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn import metrics\n",
    "from sklearn.ensemble import RandomForestClassifier\n",
    "import pandas as pd\n",
    "import pickle\n",
    "from CrossValidationGenerator import APsCVG"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Fuctions and constants"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "figsize=np.array([16,9]) * 1."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## List of raster files"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "layers_files = [\n",
    "    '../Data/phase1_rasters/DEM+B_C123/UH17_GEM051_TR.tif',\n",
    "    '../Data/phase1_rasters/DEM_C123_3msr/UH17_GEG051_TR.tif',\n",
    "    '../Data/phase1_rasters/DEM_C123_TLI/UH17_GEG05_TR.tif',\n",
    "    '../Data/phase1_rasters/DSM_C12/UH17c_GEF051_TR.tif',\n",
    "    '../Data/phase1_rasters/Intensity_C1/UH17_GI1F051_TR.tif',\n",
    "    '../Data/phase1_rasters/Intensity_C2/UH17_GI2F051_TR.tif',\n",
    "    '../Data/phase1_rasters/Intensity_C3/UH17_GI3F051_TR.tif',\n",
    "    #'../Data/ground_truth/2018_IEEE_GRSS_DFC_GT_TR.tif',\n",
    "    #'../Res/HVR/C123_num_returns_0_5_nearest.tif',\n",
    "    '../Res/HVR noisy/C123_num_returns_0_5_nearest.tif'\n",
    "\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**IDEA:** We could try to combinate rasters into new ones (e.g. $R_{DSM} - R_{DTM}$ to obtain trees and building height map)"
   ]
  },
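  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the raster combination idea above, assuming the two rasters are co-registered and already loaded as NumPy arrays of the same shape. The helper `height_map` and the toy arrays `dsm` and `dtm` are hypothetical and not part of LD2DAP.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def height_map(dsm, dtm):\n",
    "    # Height above ground: surface model minus terrain model\n",
    "    dsm = np.asarray(dsm, dtype=float)\n",
    "    dtm = np.asarray(dtm, dtype=float)\n",
    "    if dsm.shape != dtm.shape:\n",
    "        raise ValueError('rasters must share the same shape')\n",
    "    # Clip negative heights, assumed here to be interpolation noise\n",
    "    return np.clip(dsm - dtm, 0, None)\n",
    "\n",
    "\n",
    "# Toy 2x2 rasters (hypothetical values, in metres)\n",
    "dsm = np.array([[12., 15.], [10., 10.5]])\n",
    "dtm = np.array([[10., 10.], [10., 11.]])\n",
    "print(height_map(dsm, dtm))\n",
    "```\n",
    "\n",
    "The resulting array could be written to a new TIFF and appended to `layers_files` like any other layer."
   ]
  },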
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create the Profiles Pattern"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Basic APs classification flow:\n",
    "\n",
    "- Load rasters\n",
    "- Filter input rasters with a treshold value: for reasons DFC rasters are noisy with very high values\n",
    "- Construct filtered rasters with basic attributes profiles\n",
    "    + Area: [10, 100, 1e3, ..., 1e4]\n",
    "    + ...\n"
   ]
  },
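  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The threshold step of the flow above can be sketched as follows. This is one plausible reading only: values above the threshold are treated as noise and capped. Whether the LD2DAP threshold node caps, masks or replaces such values is not stated here, and `clip_outliers` is a hypothetical helper.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def clip_outliers(raster, threshold=1e4):\n",
    "    # Cap every value above the threshold (assumed to be noise)\n",
    "    raster = np.asarray(raster, dtype=float)\n",
    "    return np.minimum(raster, threshold)\n",
    "\n",
    "\n",
    "# Toy raster with two abnormally high values (hypothetical data)\n",
    "raster = np.array([[1.0, 5e4], [250.0, 1e9]])\n",
    "print(clip_outliers(raster))\n",
    "```"
   ]
  },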
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load and filter rasters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "loader = ld2dap.LoadTIFF(layers_files)\n",
    "dfc_filter = ld2dap.Treshold(1e4)\n",
    "rasters_disp = ld2dap.ShowFig('all')\n",
    "\n",
    "dfc_filter.input = loader\n",
    "rasters_disp.input = dfc_filter\n",
    "\n",
    "loader.run()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Compute APs\n",
    "\n",
    "Choose area filter tresholds."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "areas = [10., 100.]\n",
    "areas.extend([x * 1e3 for x in range(1,100,1)])\n",
    "plt.plot(areas, '.')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Disable previous display then add the APs node and the vectors output to the flow ."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "rasters_disp.input = None\n",
    "\n",
    "aps = ld2dap.AttributeProfiles(area=areas)\n",
    "aps.input = dfc_filter\n",
    "\n",
    "out_vectors = ld2dap.RawOutput()\n",
    "out_vectors.input = aps\n",
    "\n",
    "out_vectors.run()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Classification"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Concatenate filtered rasters into pixel description vector\n",
    "- Split the vectors in train and test sets for cross validation with a spatial approach: random sampling is not good for spatial descriptors!\n",
    "- Random forests"
   ]
  },
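  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of a spatial split, assuming a label image of shape `(H, W)` and attribute vectors of shape `(H, W, F)`. The image is cut into contiguous vertical strips and each strip serves once as the test set. `spatial_folds` is a hypothetical helper: it illustrates the idea and is not necessarily how `APsCVG` is implemented.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def spatial_folds(labels, attributes, n_folds):\n",
    "    # Yield (x_train, x_test, y_train, y_test, test_mask) for each fold\n",
    "    height, width = labels.shape\n",
    "    # Assign each column to one of n_folds contiguous vertical strips\n",
    "    strip = np.arange(width) * n_folds // width\n",
    "    for fold in range(n_folds):\n",
    "        test_mask = np.zeros((height, width), dtype=bool)\n",
    "        test_mask[:, strip == fold] = True\n",
    "        # Train on the other strips, skipping unclassified pixels (label 0)\n",
    "        train_mask = ~test_mask & (labels != 0)\n",
    "        yield (attributes[train_mask], attributes[test_mask],\n",
    "               labels[train_mask], labels[test_mask], test_mask)\n",
    "\n",
    "\n",
    "# Toy data (hypothetical): 2x6 label image, 4 features per pixel\n",
    "labels = np.arange(12).reshape(2, 6) % 3\n",
    "attributes = np.random.default_rng(0).random((2, 6, 4))\n",
    "for xt, xv, yt, yv, mask in spatial_folds(labels, attributes, 3):\n",
    "    print(xt.shape, xv.shape, mask.sum())\n",
    "```\n",
    "\n",
    "Because every pixel belongs to exactly one test strip, the per-fold predictions can be written back into a full-size image through the test mask."
   ]
  },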
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Vectors"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "att = out_vectors.data\n",
    "att.shape, att.dtype"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Ground Truth"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "gt = triskele.read('../Data/ground_truth/2018_IEEE_GRSS_DFC_GT_TR.tif')\n",
    "\n",
    "plt.figure(figsize=figsize)\n",
    "plt.imshow(gt)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Cross Valid"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "prediction = np.zeros_like(gt)\n",
    "\n",
    "for xt, xv, yt, yv, ti in APsCVG(gt, att, 5):\n",
    "    plt.imshow(ti * 1.)\n",
    "    plt.show()\n",
    "    \n",
    "    rfc = RandomForestClassifier(n_jobs=-1, random_state=0, n_estimators=100, verbose=True)\n",
    "    rfc.fit(xt, yt)\n",
    "    \n",
    "    ypred = rfc.predict(xv)\n",
    "    \n",
    "    prediction[ti] = ypred"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=figsize)\n",
    "plt.imshow(prediction)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.imsave('../Res/tmppred.png', prediction)\n",
    "plt.imsave('../Res/gt.png', gt)\n",
    "triskele.write('../Res/tmppred_8.tif', prediction)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X = attributes.reshape(-1, attributes.shape[2])\n",
    "\n",
    "(attributes[0,0] == X[0]).all()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "labels_file = Path('../Data/ground_truth/2018_IEEE_GRSS_DFC_GT_TR.tif')\n",
    "labels = triskele.read(labels_file)\n",
    "display(labels.shape)\n",
    "\n",
    "plt.figure(figsize=(16*2,3*2))\n",
    "plt.imshow(labels)\n",
    "plt.colorbar()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "Y = labels.reshape(-1)\n",
    "\n",
    "X.shape, Y.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Random Forest Classifier"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import importlib\n",
    "from sklearn import metrics\n",
    "from sklearn.ensemble import RandomForestClassifier\n",
    "import pickle\n",
    "sys.path.insert(0, '..')\n",
    "import CrossValidationGenerator as cvg"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "importlib.reload(cvg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn import metrics\n",
    "import pandas as pd\n",
    "\n",
    "\n",
    "def scores(actual, prediction):\n",
    "    ct = pd.crosstab(prediction, actual,\n",
    "            rownames=['Prediction'], colnames=['Reference'],\n",
    "            margins=True, margins_name='Total',\n",
    "            normalize=False # all, index, columns\n",
    "            )\n",
    "    display(ct)\n",
    "    \n",
    "    scores = metrics.precision_recall_fscore_support(actual, prediction)\n",
    "    print(metrics.classification_report(actual, prediction))    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cv_labels = np.zeros(labels[:].shape)\n",
    "\n",
    "for xtrain, xtest, ytrain, ytest, train_index in cvg.CVG(attributes[:], labels[:], 10, 1): \n",
    "    rfc = RandomForestClassifier(n_jobs=-1, random_state=0, n_estimators=100, verbose=True)\n",
    "    rfc.fit(xtrain, ytrain)\n",
    "    \n",
    "    ypred = rfc.predict(xtest)\n",
    "    \n",
    "    display(ytest.shape, ypred.shape)\n",
    "    \n",
    "    scores(ytest, ypred)\n",
    "    \n",
    "    cv_labels[:,train_index == False] = ypred.reshape(cv_labels.shape[0], -1)\n",
    "    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "show(labels)\n",
    "show(cv_labels)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.imsave('../Res/labels.png', labels)\n",
    "plt.imsave('../Res/prediction.png', cv_labels)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Scores"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "scores(actual=labels.reshape(-1), prediction=cv_labels.reshape(-1))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Labels\n",
    "\n",
    "\n",
    "    0 – Unclassified\n",
    "    1 – Healthy grass\n",
    "    2 – Stressed grass\n",
    "    3 – Artificial turf\n",
    "    4 – Evergreen trees\n",
    "    5 – Deciduous trees\n",
    "    6 – Bare earth\n",
    "    7 – Water\n",
    "    8 – Residential buildings\n",
    "    9 – Non-residential buildings\n",
    "    10 – Roads\n",
    "    11 – Sidewalks\n",
    "    12 – Crosswalks\n",
    "    13 – Major thoroughfares\n",
    "    14 – Highways\n",
    "    15 – Railways\n",
    "    16 – Paved parking lots\n",
    "    17 – Unpaved parking lots\n",
    "    18 – Cars\n",
    "    19 – Trains\n",
    "    20 – Stadium seats\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}