Load in Precursor Coincident Dataset sites On-Demand

coincident supports both reading into memory and downloading the Precursor Coincident Dataset (PCD) sites described at https://coincident.readthedocs.io/en/latest/datasets.

You can read individual sites into memory by their PCD ID, or you can download the spatiotemporal metadata for all sites at once.

from coincident import pcd_fixtures
import matplotlib.pyplot as plt
pcd_fixtures.read_pcd_site?

As a reminder, the PCD sites are:

| Provider | PCD Site Identifier | Description | Four-way Overlap Area (km²) | Aerial Lidar Start Date | Aerial Lidar End Date |
|---|---|---|---|---|---|
| USGS | CA_SanFrancisco_1_B23 | Urban over San Francisco | 55 | 2023-04-20 | 2023-04-20 |
| USGS | AZ_PimaCo_2_2021 | Desert / Mine in southern Arizona | 53 | 2021-09-27 | 2021-11-17 |
| NEON | REDB | Deciduous / Conifer in northern Utah | 25 | 2021-05-20 | 2021-05-21 |
| USGS | NE_Northeast_Phase2_2_2020 | Cropland in eastern Nebraska | 540 | 2020-11-16 | 2020-12-09 |
| USGS | WI_Brown_2_2020 | Urban / Wetlands in Green Bay, Wisconsin | 89 | 2020-05-07 | 2020-05-07 |
| USGS | GA_Central_3_2019 | Mixed LULC in southern Georgia | 745 | 2020-02-02 | 2020-03-28 |
| NCALM | OTLAS.092021.32611.1 | Southern San Andreas fault line | 12 | 2020-02-15 | 2020-02-18 |
| USGS | CA_YosemiteNP_2019 | Coniferous / Mountainous in northern Yosemite National Park | 84 | 2019-10-07 | 2019-10-23 |
| USGS | TX_DesertMountains_B1_2018 | Shrubland / Grassland in western Texas | 165 | 2019-09-11 | 2019-10-20 |
| NEON | BART | Mixed hardwood forest in eastern New Hampshire | 32 | 2019-08-25 | 2019-08-25 |
| USGS | CO_WestCentral_2019 | Coniferous / Mountainous in the Colorado Rockies | 184 | 2019-08-21 | 2019-09-19 |
| USGS | WY_FEMA_East_B9_2019 | Glaciers / Mountainous in western Wyoming | 681 | 2019-07-26 | 2019-09-22 |
| NEON | WREF | Conifer forest in southern Washington state | 18 | 2019-07-12 | 2019-07-15 |
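When choosing a site ID to pass to read_pcd_site, it can help to hold the site table as plain Python data and filter it. The following is a minimal sketch, not part of coincident: PCDSite and sites_by are hypothetical helpers, and the values are transcribed from the PCD site table (description column omitted).

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PCDSite:
    """One row of the PCD site table (description column omitted)."""

    provider: str
    pcd_id: str
    overlap_km2: int
    lidar_start: date
    lidar_end: date

    @property
    def lidar_days(self) -> int:
        # Inclusive length of the aerial lidar window, in days
        return (self.lidar_end - self.lidar_start).days + 1


# Values transcribed from the PCD site table
SITES = [
    PCDSite("USGS", "CA_SanFrancisco_1_B23", 55, date(2023, 4, 20), date(2023, 4, 20)),
    PCDSite("USGS", "AZ_PimaCo_2_2021", 53, date(2021, 9, 27), date(2021, 11, 17)),
    PCDSite("NEON", "REDB", 25, date(2021, 5, 20), date(2021, 5, 21)),
    PCDSite("USGS", "NE_Northeast_Phase2_2_2020", 540, date(2020, 11, 16), date(2020, 12, 9)),
    PCDSite("USGS", "WI_Brown_2_2020", 89, date(2020, 5, 7), date(2020, 5, 7)),
    PCDSite("USGS", "GA_Central_3_2019", 745, date(2020, 2, 2), date(2020, 3, 28)),
    PCDSite("NCALM", "OTLAS.092021.32611.1", 12, date(2020, 2, 15), date(2020, 2, 18)),
    PCDSite("USGS", "CA_YosemiteNP_2019", 84, date(2019, 10, 7), date(2019, 10, 23)),
    PCDSite("USGS", "TX_DesertMountains_B1_2018", 165, date(2019, 9, 11), date(2019, 10, 20)),
    PCDSite("NEON", "BART", 32, date(2019, 8, 25), date(2019, 8, 25)),
    PCDSite("USGS", "CO_WestCentral_2019", 184, date(2019, 8, 21), date(2019, 9, 19)),
    PCDSite("USGS", "WY_FEMA_East_B9_2019", 681, date(2019, 7, 26), date(2019, 9, 22)),
    PCDSite("NEON", "WREF", 18, date(2019, 7, 12), date(2019, 7, 15)),
]


def sites_by(provider=None, min_area_km2=0):
    """Return matching PCD IDs, largest four-way overlap area first."""
    matches = [
        s
        for s in SITES
        if (provider is None or s.provider == provider) and s.overlap_km2 >= min_area_km2
    ]
    return [s.pcd_id for s in sorted(matches, key=lambda s: s.overlap_km2, reverse=True)]


print(sites_by(provider="NEON"))  # → ['BART', 'REDB', 'WREF']
print(sites_by(min_area_km2=500))
# → ['GA_Central_3_2019', 'WY_FEMA_East_B9_2019', 'NE_Northeast_Phase2_2_2020']
```

Any ID returned by sites_by is a string you could pass as the pcd_id argument of pcd_fixtures.read_pcd_site.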

Note

The NCALM site’s “ground truth” is not aerial lidar but dense aerial structure-from-motion (SfM) data collected from a drone.

%%time

site = "GA_Central_3_2019"

dict_GA_Central_3_2019 = pcd_fixtures.read_pcd_site(site)
CPU times: user 263 ms, sys: 92.2 ms, total: 356 ms
Wall time: 3.94 s
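read_pcd_site can raise ValueError("ItemCollection is empty, cannot convert to GeoDataFrame") when one of its catalog searches (for example, the Vantor search) returns no items, which leaves dict_GA_Central_3_2019 undefined for the cells that follow. A minimal sketch of a guard, assuming you would rather skip such a site than stop the notebook: read_site_or_none and fake_reader are hypothetical, and fake_reader only stands in for pcd_fixtures.read_pcd_site.

```python
import logging

# Substring of the error raised for an empty STAC search result
EMPTY_SEARCH_MSG = "ItemCollection is empty"


def read_site_or_none(reader, pcd_id):
    """Call reader(pcd_id), returning None if a catalog search came back empty.

    `reader` stands in for pcd_fixtures.read_pcd_site. Any other ValueError is re-raised.
    """
    try:
        return reader(pcd_id)
    except ValueError as err:
        if EMPTY_SEARCH_MSG not in str(err):
            raise
        logging.warning("Skipping PCD site %s: %s", pcd_id, err)
        return None


def fake_reader(pcd_id):
    """Hypothetical stand-in that raises the same error as an empty STAC search."""
    if pcd_id == "GA_Central_3_2019":
        raise ValueError("ItemCollection is empty, cannot convert to GeoDataFrame")
    return {"als": [], "vantor": [], "is2": [], "gedi": [], "overlap": []}


print(read_site_or_none(fake_reader, "GA_Central_3_2019"))  # → None
print(sorted(read_site_or_none(fake_reader, "BART")))
# → ['als', 'gedi', 'is2', 'overlap', 'vantor']
```

In the notebook you would pass pcd_fixtures.read_pcd_site as reader. Matching on the message text is brittle if the wording changes, so treat it as a convenience for exploration rather than production error handling.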
dict_GA_Central_3_2019.keys()

Note

Read times vary by site, from a few seconds to about a minute, depending mainly on the size of the overlap area and the length of the date range.
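%%time is an IPython cell magic, so it is not available in a plain Python script. To compare read times across sites outside a notebook, a minimal sketch could use time.perf_counter for wall-clock time and time.process_time for the CPU time of the current process. timed_call and slow_stub are hypothetical; slow_stub only imitates a call that spends its time waiting.

```python
import time


def timed_call(func, *args, **kwargs):
    """Run func(*args, **kwargs); return (result, wall_seconds, cpu_seconds).

    Wall time uses time.perf_counter and CPU time uses time.process_time,
    roughly mirroring the "Wall time" and "CPU times" lines printed by %%time.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args, **kwargs)
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return result, wall_seconds, cpu_seconds


def slow_stub(pcd_id, delay=0.05):
    """Hypothetical stand-in for a network-bound call such as read_pcd_site."""
    time.sleep(delay)  # sleeping consumes wall time but almost no CPU time
    return pcd_id


result, wall, cpu = timed_call(slow_stub, "REDB")
print(f"{result}: wall {wall:.2f} s, CPU {cpu:.3f} s")
```

The gap between wall and CPU time in the %%time output earlier (3.94 s wall vs 356 ms CPU) suggests that reading a site is dominated by waiting on remote catalog requests rather than local computation.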

gf_als = dict_GA_Central_3_2019["als"]
gf_vantor = dict_GA_Central_3_2019["vantor"]
gf_is2 = dict_GA_Central_3_2019["is2"]
gf_gedi = dict_GA_Central_3_2019["gedi"]
gf_overlap = dict_GA_Central_3_2019["overlap"]
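The five keys unpacked above (als, vantor, is2, gedi, overlap) each map to a GeoDataFrame. A quick way to see which datasets came back, and how many rows each has, is a small helper built on len(). This is a sketch: summarize_site is hypothetical, and the example dictionary uses lists in place of GeoDataFrames.

```python
EXPECTED_KEYS = ("als", "vantor", "is2", "gedi", "overlap")


def summarize_site(site_dict):
    """Map each expected key to its row count, or None if the key is missing."""
    return {
        key: len(site_dict[key]) if key in site_dict else None
        for key in EXPECTED_KEYS
    }


# Hypothetical stand-in: lists play the role of GeoDataFrames (both support len())
example = {"als": [1], "vantor": [1, 2], "is2": [], "gedi": [1, 2, 3]}
print(summarize_site(example))
# → {'als': 1, 'vantor': 2, 'is2': 0, 'gedi': 3, 'overlap': None}
```

Calling summarize_site(dict_GA_Central_3_2019) before unpacking would surface a missing or empty dataset up front instead of as a later KeyError or an empty plot.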
gf_als.head()
gf_vantor.head()
gf_is2.head()
gf_gedi.head()
gf_overlap

Visualize the acquisition times of the PCD site

fig, ax = plt.subplots(figsize=(12, 5))

als_start = gf_als["start_datetime"].iloc[0]
als_end = gf_als["end_datetime"].iloc[0]
ax.axvspan(als_start, als_end, color="gray", alpha=0.3, label="ALS Window")

ax.scatter(
    gf_vantor["datetime"],
    ["Vantor"] * len(gf_vantor),
    marker="D",
    s=80,
    label="Vantor Stereo",
)
ax.scatter(
    gf_is2["datetime"], ["ICESat-2"] * len(gf_is2), marker="D", s=80, label="ICESat-2"
)
ax.scatter(gf_gedi["datetime"], ["GEDI"] * len(gf_gedi), marker="D", s=80, label="GEDI")

ax.set_title(f"Data Availability for Site: {site}")  # title follows the site loaded above
ax.set_xlabel("Date")
ax.set_ylabel("Data Collection")
ax.legend(loc="best")
ax.grid(axis="x", linestyle=":", alpha=0.4)
fig.autofmt_xdate()
plt.tight_layout();
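The plot shows each acquisition relative to the ALS window; the same comparison can be expressed as a number of days outside that window. The following is a minimal sketch: days_outside_window is hypothetical, the window is the GA_Central_3_2019 ALS window from the site table, and the acquisition times are made up for illustration only.

```python
from datetime import datetime

SECONDS_PER_DAY = 86_400


def days_outside_window(t, start, end):
    """Days between acquisition time t and the window [start, end]; 0.0 if t falls inside."""
    if t < start:
        return (start - t).total_seconds() / SECONDS_PER_DAY
    if t > end:
        return (t - end).total_seconds() / SECONDS_PER_DAY
    return 0.0


# ALS window for GA_Central_3_2019, taken from the PCD site table
window_start = datetime(2020, 2, 2)
window_end = datetime(2020, 3, 28)

# Hypothetical acquisition times, for illustration only
acquisitions = {
    "ICESat-2": datetime(2020, 2, 20),
    "GEDI": datetime(2020, 4, 7),
    "Vantor": datetime(2020, 1, 12),
}
offsets = {
    name: days_outside_window(t, window_start, window_end)
    for name, t in acquisitions.items()
}
print(offsets)  # → {'ICESat-2': 0.0, 'GEDI': 10.0, 'Vantor': 21.0}
```

pandas Timestamps are datetime subclasses, so values from the datetime columns should work here, but comparing timezone-aware timestamps with naive window bounds raises TypeError; make both sides aware (or both naive) first. Treating the end date as midnight means acquisitions later on the final day count as a fraction of a day outside.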

See the spatial extent

style_args = {"fillOpacity": 0.15, "weight": 2.5}

m = gf_als.explore(
    name="ALS", color="gray", style_kwds=style_args, tiles="Esri.WorldImagery"
)
m = gf_vantor.explore(m=m, name="Vantor", color="blue", style_kwds=style_args)
m = gf_is2.explore(m=m, name="ICESat-2", color="orange", style_kwds=style_args)
m = gf_gedi.explore(m=m, name="GEDI", color="green", style_kwds=style_args)
m = gf_overlap.explore(m=m, name="Overlap Area", color="black", style_kwds=style_args)
m

Downloading

You can also download the PCD files for all sites using the coincident.pcd_fixtures module: coincident.pcd_fixtures.download_pcd_files() does this by querying the respective STAC catalogs and ALS endpoints.

Alternatively, the script in src/coincident/scripts, run as shown below, downloads these PCD files from the latest released GitHub assets.

pixi run python src/coincident/scripts/generate_pcd.py

What’s the difference between the two?

coincident.pcd_fixtures.download_pcd_files() retrieves the metadata for all PCD sites as provided by the respective STAC catalogs and API endpoints. src/coincident/scripts/generate_pcd.py instead pulls the latest GitHub release assets, which include more complex overlap-area geometries, land use/land cover (LULC) and elevation statistics over those geometries, and extended ALS metadata. The difference is that the PCD files in the GitHub assets carry this extra metadata, which was determined manually for each site (by reading individual lidar metadata reports, defining overlap geometries by hand from filtered data, using code that lives outside of coincident, and so on).

As a result, running coincident.pcd_fixtures.download_pcd_files() takes minutes, because it queries the catalogs for every site, while running src/coincident/scripts/generate_pcd.py takes seconds, because it only downloads the prebuilt release files.

pcd_fixtures.download_pcd_files?
# pcd_fixtures.download_pcd_files("/tmp")

Note

This takes about 4 minutes to run, and the output for all files totals about 31 MB (the Parquet files sum to ~8.5 MB and the GeoJSON files to ~22 MB).
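To check a download against these sizes, a minimal sketch that totals file sizes per extension with pathlib is below. size_by_extension is hypothetical, the file names are made up, and it reports decimal megabytes (10^6 bytes), which may not be the unit behind the figures above.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def size_by_extension(directory):
    """Sum file sizes under `directory` by lowercase extension, in decimal MB (1e6 bytes)."""
    totals = defaultdict(int)
    for path in Path(directory).rglob("*"):
        if path.is_file():
            totals[path.suffix.lower()] += path.stat().st_size
    # Sort by extension so the output order does not depend on the filesystem
    return {ext: size / 1e6 for ext, size in sorted(totals.items())}


# Hypothetical file names and sizes, written to a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "site_a.parquet").write_bytes(b"\0" * 2_000_000)
    Path(tmp, "site_a.geojson").write_bytes(b"\0" * 500_000)
    sizes = size_by_extension(tmp)

print(sizes)  # → {'.geojson': 0.5, '.parquet': 2.0}
```

Assuming the argument to download_pcd_files is the output directory, point size_by_extension at that directory; a dedicated directory works best, since any unrelated files in it would be counted too.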