Load in Precursor Coincident Dataset sites On-Demand#
coincident supports both reading into memory and downloading the Precursor Coincident Dataset (PCD) sites described in https://coincident.readthedocs.io/en/latest/datasets.
You can read an individual site into memory by its PCD ID, or download the spatiotemporal metadata for all sites.
from coincident import pcd_fixtures
import matplotlib.pyplot as plt
pcd_fixtures.read_pcd_site?
Remember, our sites are below:
| Provider | PCD Site Identifier | Description | Fourway Overlap Area (km²) | Aerial Lidar Start Date | Aerial Lidar End Date |
|---|---|---|---|---|---|
| USGS | | Urban over San Francisco | 55 | 2023-04-20 | 2023-04-20 |
| USGS | | Desert / Mine in southern Arizona | 53 | 2021-09-27 | 2021-11-17 |
| NEON | | Deciduous / Conifer in northern Utah | 25 | 2021-05-20 | 2021-05-21 |
| USGS | | Cropland in eastern Nebraska | 540 | 2020-11-16 | 2020-12-09 |
| USGS | | Urban / Wetlands in Green Bay, Wisconsin | 89 | 2020-05-07 | 2020-05-07 |
| USGS | | Mixed LULC in southern Georgia | 745 | 2020-02-02 | 2020-03-28 |
| NCALM | | Southern San Andreas fault line | 12 | 2020-02-15 | 2020-02-18 |
| USGS | | Coniferous / Mountainous in northern Yosemite National Park | 84 | 2019-10-07 | 2019-10-23 |
| USGS | | Shrubland / Grassland in western Texas | 165 | 2019-09-11 | 2019-10-20 |
| NEON | | Mixed hardwood forest in eastern New Hampshire | 32 | 2019-08-25 | 2019-08-25 |
| USGS | | Coniferous / Mountainous in the Colorado Rockies | 184 | 2019-08-21 | 2019-09-19 |
| USGS | | Glaciers / Mountainous in western Wyoming | 681 | 2019-07-26 | 2019-09-22 |
| NEON | | Conifer forest in southern Washington state | 18 | 2019-07-12 | 2019-07-15 |
Note
The NCALM site’s “ground truth” is not aerial lidar but dense aerial structure-from-motion (SfM) data collected from a drone.
%%time
site = "GA_Central_3_2019"
dict_GA_Central_3_2019 = pcd_fixtures.read_pcd_site(site)
CPU times: user 263 ms, sys: 92.2 ms, total: 356 ms
Wall time: 3.94 s
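Because `read_pcd_site` queries remote catalogs at call time, a search that returns no items can raise a `ValueError` ("ItemCollection is empty, cannot convert to GeoDataFrame"). A minimal sketch of guarding against that is shown here. `read_site_or_none` is a hypothetical helper, not part of coincident, and it takes the reader function as an argument so that it can be tried with a stand-in.

```python
import logging
from typing import Any, Callable, Optional


def read_site_or_none(
    reader: Callable[[str], dict[str, Any]], pcd_id: str
) -> Optional[dict[str, Any]]:
    """Call reader(pcd_id); return None instead of raising on a ValueError."""
    try:
        return reader(pcd_id)
    except ValueError as err:
        logging.warning("Could not read PCD site %s: %s", pcd_id, err)
        return None


# Stand-in reader (hypothetical) that mimics an empty catalog search.
def _empty_search_reader(pcd_id: str) -> dict[str, Any]:
    message = "ItemCollection is empty, cannot convert to GeoDataFrame"
    raise ValueError(message)


result = read_site_or_none(_empty_search_reader, "GA_Central_3_2019")
print(result)
```

With coincident installed, the equivalent call would presumably be `read_site_or_none(pcd_fixtures.read_pcd_site, site)`, followed by a check for `None` before using the result.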
dict_GA_Central_3_2019.keys()
Note
Read times vary by site, from a few seconds to about 1 minute, depending mainly on the overlap area and the length of the date range.
gf_als = dict_GA_Central_3_2019["als"]
gf_vantor = dict_GA_Central_3_2019["vantor"]
gf_is2 = dict_GA_Central_3_2019["is2"]
gf_gedi = dict_GA_Central_3_2019["gedi"]
gf_overlap = dict_GA_Central_3_2019["overlap"]
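The returned dictionary holds one table per data source under the keys `als`, `vantor`, `is2`, `gedi`, and `overlap`. A quick way to confirm that every source returned rows is to loop over those keys. The sketch below uses a hypothetical helper, `count_rows`, and plain lists as stand-ins for the tables, since it relies only on `len()`.

```python
from collections.abc import Mapping, Sized

EXPECTED_KEYS = ("als", "vantor", "is2", "gedi", "overlap")


def count_rows(site_data: Mapping[str, Sized]) -> dict[str, int]:
    """Return the number of rows per expected key; raise if a key is missing."""
    missing = [key for key in EXPECTED_KEYS if key not in site_data]
    if missing:
        raise KeyError(f"Missing entries: {missing}")
    return {key: len(site_data[key]) for key in EXPECTED_KEYS}


# Stand-in data (hypothetical): lists play the role of the per-source tables.
fake_site = {
    "als": [1],
    "vantor": [1, 2],
    "is2": [1, 2, 3],
    "gedi": [],
    "overlap": [1],
}
row_counts = count_rows(fake_site)
print(row_counts)
```

A count of zero for a source (here the made-up `gedi` entry) is a hint that later plotting or mapping calls for that source will show nothing.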
gf_als.head()
gf_vantor.head()
gf_is2.head()
gf_gedi.head()
gf_overlap
Visualize the acquisition times of the PCD site
fig, ax = plt.subplots(figsize=(12, 5))

# Shade the aerial lidar (ALS) acquisition window
als_start = gf_als["start_datetime"].iloc[0]
als_end = gf_als["end_datetime"].iloc[0]
ax.axvspan(als_start, als_end, color="gray", alpha=0.3, label="ALS Window")

# One row of markers per satellite data source
ax.scatter(
    gf_vantor["datetime"],
    ["Vantor"] * len(gf_vantor),
    marker="D",
    s=80,
    label="Vantor Stereo",
)
ax.scatter(
    gf_is2["datetime"], ["ICESat-2"] * len(gf_is2), marker="D", s=80, label="ICESat-2"
)
ax.scatter(gf_gedi["datetime"], ["GEDI"] * len(gf_gedi), marker="D", s=80, label="GEDI")

# Use the site ID that was actually read in, rather than a hard-coded one
ax.set_title(f"Data Availability for Site: {site}")
ax.set_xlabel("Date")
ax.set_ylabel("Data Collection")
ax.legend(loc="best")
ax.grid(axis="x", linestyle=":", alpha=0.4)
fig.autofmt_xdate()
plt.tight_layout();
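The plot shows at a glance how close each satellite acquisition is to the ALS window. The same comparison can be made numerically. The sketch below uses a hypothetical helper, `days_from_window`, with the ALS dates listed for the southern Georgia site in the table above and made-up acquisition timestamps chosen only for illustration.

```python
from datetime import datetime


def days_from_window(acquired: datetime, start: datetime, end: datetime) -> int:
    """Whole days between an acquisition and the ALS window.

    Returns 0 inside the window, a negative number before it,
    and a positive number after it.
    """
    if acquired < start:
        return -(start - acquired).days
    if acquired > end:
        return (acquired - end).days
    return 0


# ALS window taken from the site table (southern Georgia row).
als_start = datetime(2020, 2, 2)
als_end = datetime(2020, 3, 28)

# Made-up acquisition times, for illustration only.
acquisitions = [
    datetime(2020, 1, 30, 12),  # before the window
    datetime(2020, 3, 1),       # inside the window
    datetime(2020, 4, 2),       # after the window
]

offsets = [days_from_window(t, als_start, als_end) for t in acquisitions]
print(offsets)
```

Partial days are truncated, so an acquisition 2.5 days before the window is reported as -2.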
See the spatial extent
style_args = {"fillOpacity": 0.15, "weight": 2.5}
m = gf_als.explore(
    name="ALS", color="gray", style_kwds=style_args, tiles="Esri.WorldImagery"
)
m = gf_vantor.explore(m=m, name="Vantor", color="blue", style_kwds=style_args)
m = gf_is2.explore(m=m, name="ICESat-2", color="orange", style_kwds=style_args)
m = gf_gedi.explore(m=m, name="GEDI", color="green", style_kwds=style_args)
m = gf_overlap.explore(m=m, name="Overlap Area", color="black", style_kwds=style_args)
m
Downloading#
You can also download the PCD files for all sites using the coincident.pcd_fixtures module. coincident.pcd_fixtures.download_pcd_files() supports this by streaming calls to the respective STAC catalogs and ALS endpoints.
Alternatively, the script in coincident/scripts, run as shown below, downloads the latest released GitHub assets for these PCD files.
pixi run python src/coincident/scripts/generate_pcd.py
What’s the difference between the two?
coincident.pcd_fixtures.download_pcd_files() grabs the metadata for all PCD sites as provided by the respective STAC catalogs and API endpoints. src/coincident/scripts/generate_pcd.py pulls from the latest GitHub assets, which include more complex overlap area geometries, LULC and elevation statistics over these geometries, and extended ALS metadata. The difference exists because the PCD sites in the latest GitHub assets carry extra metadata that was determined manually for each site (by reading individual lidar metadata reports, manually defining overlap geometries based on filtered data, using code that exists outside of coincident, etc.).
Because of this, running coincident.pcd_fixtures.download_pcd_files() will take minutes and running src/coincident/scripts/generate_pcd.py will take seconds.
pcd_fixtures.download_pcd_files?
# pcd_fixtures.download_pcd_files("/tmp")
Note
This takes ~4 minutes to run, and the total output size for all files is 31 MB (parquet files sum to ~8.5 MB and GeoJSON files sum to ~22 MB).
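After a download finishes, it can be useful to check what was written and how large it is. The sketch below sums file sizes per extension using only the standard library. `size_by_suffix` is a hypothetical helper, and the file names in the demonstration are dummies rather than actual coincident output names.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def size_by_suffix(directory: str) -> dict[str, int]:
    """Sum file sizes in bytes per file extension under `directory`."""
    totals: defaultdict[str, int] = defaultdict(int)
    for path in Path(directory).rglob("*"):
        if path.is_file():
            totals[path.suffix] += path.stat().st_size
    return dict(totals)


# Demonstration on a temporary directory holding two dummy files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "site.parquet").write_bytes(b"x" * 10)
    (Path(tmp) / "site.geojson").write_bytes(b"y" * 25)
    totals = size_by_suffix(tmp)

print(totals)
```

Pointing `size_by_suffix` at the directory passed to `download_pcd_files` (for example `"/tmp"` in the commented-out call above) should give totals comparable to the sizes quoted in the note, assuming nothing else is stored in that directory.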