{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Quickstart\n",
    "\n",
    "This library provides functions to hone in on coincident datasets and visualize their coverage. \n",
    "\n",
    "The current recommended workflow is to:\n",
    "\n",
    "1. start with a lidar dataset \n",
    "1. reduce to region with coincident maxar stereo within an acceptable temporal range\n",
    "1. optionally reduce further with additional datasets such as icesat-2 and gedi altimetry\n",
    "\n",
    "This notebook provides an example starting from USGS 3DEP LiDAR in Colorado, USA"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import geopandas as gpd\n",
    "import coincident\n",
    "\n",
    "print(coincident.__version__)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define an area of interest\n",
    "\n",
    "Searches often start with an spatial subset, in this notebook we know we are interested in datasets in Colorado. We recommend restricting searches to areas at the 'State' scale rather than Country or Global scales"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "aoi = gpd.read_file(\n",
    "    \"https://raw.githubusercontent.com/unitedstates/districts/refs/heads/gh-pages/states/CO/shape.geojson\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "aoi.explore(color=\"black\", style_kwds=dict(fill=False))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```{note}\n",
    "this polygon for the state of Colorado is simple and has only 4 vertices in the corners, but it's good practice good to check the number of vertices you have before searching. The simpler your search polygons are the faster your searches will be!\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Original number of vertices:\", aoi.count_coordinates().iloc[0])\n",
    "aoi = aoi.simplify(0.01)\n",
    "print(\"Simplified number of vertices:\", aoi.count_coordinates().iloc[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Uniform search method\n",
    "\n",
    "the `coincident` package provides a [search()](#coincident.search.search) method that has the same syntax regardless of which dataset you are searching. Behind the scenes, polygons intersecting your area of interest are efficiently located and returned as a geodataframe. \n",
    "\n",
    "For 3DEP LiDAR, we start by searching bounding boxes for each 'workunit'. Once we identify a workunit, we load a precise polygon delineating the extent of LiDAR measurements.\n",
    "\n",
    "The search function is essentially a [pystac_client.ItemSearch](#pystac_client.ItemSearch) with an extra argument `dataset`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "coincident.datasets.aliases"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "gf = coincident.search.search(\n",
    "    dataset=\"3dep\",\n",
    "    intersects=aoi,\n",
    "    datetime=[\"2018\", \"2024\"],\n",
    ")\n",
    "gf.explore(column=\"workunit\", popup=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# From this search we identify a specific lidar acquisition of interest\n",
    "gf = gf[gf.workunit == \"CO_WestCentral_2019\"]\n",
    "gf"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each Workunit has a unique 'Feature ID', or 'FID' that can be used to efficiently retrieve the full-resultion detailed MultiPolygon footprint from the USGS's WESM.gpkg file in AWS"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "gf.index.values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### USGS 3DEP Lidar\n",
    "\n",
    "Some datasets have additional functions available to load auxiliary data. For example, we can load original high resolution polygon. In addition to loading \"swath\" polygons to understand exact days when acquisitions were made.\n",
    "\n",
    "```{warning}\n",
    "Swath polygons are not available for all LiDAR workunits. They are more likely to exist for acquisitions after 2018.\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "gf_wesm = coincident.search.wesm.load_by_fid(fids=gf.index.values)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "gf_wesm.explore(column=\"workunit\", popup=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "gf_wesm[[\"workunit\", \"start_datetime\", \"end_datetime\", \"duration\"]]  # duration in days"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Within this 'workunit' data was collected over 29 days, in order to see when data was collected within this polygon we need to load a corresponding 'swath' polygon. `coincident` provides a helper function for this, which loads the swath polygon for a given workunit\n",
    "\n",
    "```{note}\n",
    "Swath polygons have detailed timing information for individual LiDAR flight lines composing a given 'workunit'. Not all workunits have swath polygons. They tend to be available for data collected after 2019.\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# NOTE: be patient here, reading shapefiles from s3 can be slow depending on your bandwidth\n",
    "gf_swath = coincident.search.wesm.get_swath_polygons(gf.workunit.iloc[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Number of swath polygons:\", len(gf_swath))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot them, simplify first for faster plotting\n",
    "gf_swath[\"geometry\"] = gf_swath.simplify(0.01)\n",
    "gf_swath.explore(column=\"dayofyear\", cmap=\"plasma\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see in the above figure, the actual day of observation for any point on the ground can vary in complex ways based on the flight paths taken during the LiDAR collection period."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Maxar stereo search\n",
    "\n",
    "Now that we understand our lidar collection, let's return to the uniform search function, and now search for Maxar stereo pairs. We are only interested in pairs in the same date range of the lidar. Note that searching by a simplified polygon (in this case back to the convex hull we started with) is recommended as most APIs have limits on the number of polygon vertices."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Simplify search polygon and add date range\n",
    "pad = gpd.pd.Timedelta(days=14)\n",
    "date_range = [gf.start_datetime.iloc[0] - pad, gf.end_datetime.iloc[0] + pad]\n",
    "aoi = gf.geometry\n",
    "\n",
    "gf_maxar = coincident.search.search(\n",
    "    dataset=\"maxar\",\n",
    "    intersects=aoi,\n",
    "    datetime=date_range,\n",
    "    filter=\"eo:cloud_cover < 20\",\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(len(gf_maxar))\n",
    "gf_maxar.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "with gpd.pd.option_context(\"display.max_rows\", None, \"display.max_columns\", None):\n",
    "    print(gf_maxar.iloc[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Group by stereo pair id (NOTE: assumes single id per mono acquisition)\n",
    "gf_maxar[\"stereo_pair_id\"] = gf_maxar.stereo_pair_identifiers.str[0]\n",
    "\n",
    "gf_stereo = gf_maxar.dissolve(by=\"stereo_pair_id\", as_index=False)\n",
    "gf_stereo.explore(column=\"stereo_pair_identifiers\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m = gf_wesm.explore(color=\"black\")\n",
    "gf_maxar.explore(column=\"dayofyear\", cmap=\"plasma\", m=m, popup=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Restrict polygon footprints to area of overlap\n",
    "\n",
    "We will make regular use of the GeoPandas overlay() function to shrink original acquisition footprints so that they only overlap an area of interest."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Consider area of overlap\n",
    "m = gf_wesm.explore(color=\"black\")\n",
    "# NOTE: don't really need wesm attributes in result, so just use geodataframe w/ geometry column\n",
    "# gf_i = gf_wesm[['geometry']].overlay(gf_stereo, how='intersection')\n",
    "gf_i = gf_stereo.overlay(gf_wesm[[\"geometry\"]], how=\"intersection\")\n",
    "gf_i.explore(column=\"dayofyear\", cmap=\"plasma\", m=m)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exclude very small overlaps\n",
    "\n",
    "Consider further filtering results by a minimum overlap area criteria, or by a more precise estimate of the number of days elapsed for the maxar stereo acquisition relative to lidar estimated from swath polygons"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "min_area_km2 = 20\n",
    "gf_i = coincident.overlaps.subset_by_minimum_area(gf_i, min_area_km2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Look at intersection with swath polygons for exact date difference\n",
    "# Since both have 'dayofyear', these cols get expanded to dayofyear_1 and dayofyear_2\n",
    "stereo_pair = gf_i.iloc[[0]]\n",
    "gf_dt = gf_swath.overlay(stereo_pair, how=\"intersection\")\n",
    "print(\"Maxar Stereo Acquisition DOY - Swath Lidar DOY:\")\n",
    "print(stereo_pair.stereo_pair_id.values[0])\n",
    "(gf_dt.dayofyear_2 - gf_dt.dayofyear_1).describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## GEDI\n",
    "\n",
    "Search for Coincident GEDI L2A."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# We've refined our search polygon again, so use a new AOI\n",
    "aoi = gpd.GeoSeries(gf_i.union_all().convex_hull, crs=\"EPSG:4326\")\n",
    "\n",
    "gf_gedi = coincident.search.search(\n",
    "    dataset=\"gedi\",\n",
    "    intersects=aoi,\n",
    "    datetime=date_range,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(len(gf_gedi))\n",
    "gf_gedi.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "gf_gedi.explore()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Altimeter granules span a large geographic area! So let's again cut results down to the area of intersection."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Normalize colormap across both dataframes\n",
    "vmin, vmax = gpd.pd.concat([gf_i.dayofyear, gf_gedi.dayofyear]).agg([\"min\", \"max\"])\n",
    "cmap = \"plasma\"\n",
    "\n",
    "# NOTE: here we just take the boundary of the union of all maxar+lidar regions to avoid many overlay geometries\n",
    "union = gpd.GeoDataFrame(geometry=[gf_i.union_all()], crs=\"EPSG:4326\")\n",
    "m = gf_i.explore(column=\"dayofyear\", cmap=cmap, vmin=vmin, vmax=vmax)\n",
    "gf_gedi = union.overlay(gf_gedi, how=\"intersection\")\n",
    "gf_gedi.explore(m=m, column=\"dayofyear\", cmap=cmap, vmin=vmin, vmax=vmax, legend=False)\n",
    "m"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Immediately we see some potential for close-in time acquisitions. \n",
    "\n",
    "```{note}\n",
    "As an additional step, you might want to add another filtering step to remove tiny polygons\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# For each Stereo pair, describe number of altimeter passes and day offset\n",
    "stereo_pair = gf_i.iloc[[0]]\n",
    "gf_dt = stereo_pair.overlay(gf_gedi, how=\"intersection\")\n",
    "m = stereo_pair.explore(\n",
    "    column=\"dayofyear\", vmin=vmin, vmax=vmax, cmap=cmap, legend=False\n",
    ")\n",
    "gf_dt.explore(m=m, column=\"dayofyear_2\", vmin=vmin, vmax=vmax, cmap=cmap)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Maxar Stereo Acquisition DOY - GEDI DOY:\")\n",
    "print(stereo_pair.stereo_pair_identifiers.values[0])\n",
    "(gf_dt.dayofyear_1 - gf_dt.dayofyear_2).describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Consider just the minimum offset for each stereo pair\n",
    "# NOTE: probably a fancier way to do this using a multiindex/groupby\n",
    "print(\"Maxar Stereo Acquisition DOY - GEDI DOY\")\n",
    "for i in range(len(gf_i)):\n",
    "    stereo_pair = gf_i.iloc[[i]]\n",
    "    gf_dt = stereo_pair.overlay(gf_gedi, how=\"intersection\")\n",
    "    min_dt = (gf_dt.dayofyear_1 - gf_dt.dayofyear_2).min()\n",
    "    print(stereo_pair.stereo_pair_identifiers.values[0], min_dt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```{note}\n",
    "In this case, we have some altimeter acquisitions that were just 3 days after a maxar stereo acquisition!\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## ICESat-2\n",
    "\n",
    "Search for Coincident ICESat-2 Altimetry (ATL03)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "gf_is2 = coincident.search.search(\n",
    "    dataset=\"icesat-2\",\n",
    "    intersects=aoi,\n",
    "    datetime=date_range,\n",
    ")\n",
    "print(len(gf_is2))\n",
    "gf_is2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Normalize colormap across both dataframes\n",
    "vmin, vmax = gpd.pd.concat([gf_i.dayofyear, gf_is2.dayofyear]).agg([\"min\", \"max\"])\n",
    "cmap = \"plasma\"\n",
    "\n",
    "# NOTE: here we just take the boundary of the union of all maxar+lidar regions to avoid many overlay geometries\n",
    "union = gpd.GeoDataFrame(geometry=[gf_i.union_all()], crs=\"EPSG:4326\")\n",
    "m = gf_i.explore(\n",
    "    column=\"dayofyear\", cmap=cmap, vmin=vmin, vmax=vmax, style_kwds=dict(color=\"black\")\n",
    ")\n",
    "gf_is2 = union.overlay(gf_is2, how=\"intersection\")\n",
    "gf_is2.explore(m=m, column=\"dayofyear\", cmap=cmap, vmin=vmin, vmax=vmax, legend=False)\n",
    "m"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# For each Stereo pair, describe number of altimeter passes and day offset\n",
    "stereo_pair = gf_i.iloc[[0]]\n",
    "gf_dt = stereo_pair.overlay(gf_is2, how=\"intersection\")\n",
    "m = stereo_pair.explore(\n",
    "    column=\"dayofyear\",\n",
    "    vmin=vmin,\n",
    "    vmax=vmax,\n",
    "    cmap=cmap,\n",
    "    legend=False,\n",
    "    style_kwds=dict(color=\"black\"),\n",
    ")\n",
    "gf_dt.explore(m=m, column=\"dayofyear_2\", vmin=vmin, vmax=vmax, cmap=cmap)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Maxar Stereo Acquisition DOY - ICESat-2 DOY\")\n",
    "for i in range(len(gf_i)):\n",
    "    stereo_pair = gf_i.iloc[[i]]\n",
    "    gf_dt = stereo_pair.overlay(gf_is2, how=\"intersection\")\n",
    "    min_dt = (gf_dt.dayofyear_1 - gf_dt.dayofyear_2).min()\n",
    "    print(stereo_pair.stereo_pair_identifiers.values[0], min_dt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "- Any of these Maxar stereo pairs could be worth ordering. But if the goal is to have coincident acquisitions as close as possible in time '11a11283-6661-4f91-9bcb-5dab3c6a5d02-inv' seems like a good bet:\n",
    "    - There is an overlapping ICESat-2 track 2 days later\n",
    "    - There is an overlapping GEDI track 3 days earlier\n",
    "    - 3DEP Lidar was acquired 27 to 36 days earlier"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Can save this dataframe, or just note STAC ids to work with later ('102001008EC5AC00','102001008BE9BB00')\n",
    "pair_id = \"11a11283-6661-4f91-9bcb-5dab3c6a5d02-inv\"\n",
    "full_frame = gf_maxar[gf_maxar.stereo_pair_id == pair_id]\n",
    "subset = gf_i[gf_i.stereo_pair_id == pair_id]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Save the original STAC metadata and polygon\n",
    "full_frame.to_parquet(\"/tmp/maxar-mono-images.parquet\")\n",
    "# Also save the subsetted region for this stereo pair\n",
    "subset.to_parquet(\"/tmp/stereo-subset.parquet\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "dev",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}