This notebook shows you how to work with the spatio-temporal library that is pre-installed on all Spark environments in Watson Studio. You can use the spatio-temporal library to expand your data science analysis in Python notebooks to include location analytics by gathering, manipulating, and displaying imagery, GPS, satellite photography and historical data.
In this notebook, you will learn how to get started with using the library, how to read and write data using pyst
which supports most of the common geospatial formats, which includes shapefile, GeoJSON and Well-Known Text (WKT). You will learn how to use topological relations to confine the returned results of your location data analysis, which geohashing functions to use for proximity search (encoding latitude and longitude and grouping nearby points), and how to calculate the distance between points using ellipsoidal metrics.
This notebook runs on the latest Python and Spark.
Before you can start using any of the spatio-temporal library functions in your notebook, you must register STContext
to access the st
functions. The STContext
is linked to the Spark session.
To register STContext
:
from pyst import STContext
# Register STContext, which is the main entry point
stc = STContext(spark.sparkContext._gateway)
Watson Studio supports most of the common geospatial formats, including shapefile, GeoJSON and Well-Known Text (WKT).
Each data reader function can take various types of input to its .read()
method. For example, a URL pointing to a resource file, a path to a local GeoJSON file, streaming bytes from a GeoJSON file, or content in GeoJSON format from a Python dictionary.
In this notebook, you will learn how to work with data in GeoJSON and WKT format.
To work with files in GeoJSON format, first create a GeoJSON reader and writer.
The geojson_reader
returns a pandas DataFrame when calling the .read()
method. And the returned pandas DataFrame has one column called geometry
that contains the spatial objects of the geometries while the remaining columns contain the properties of geometries.
geojson_reader = stc.geojson_reader()
geojson_writer = stc.geojson_writer()
The following code snippets show reading US counties polygon data from various kinds of sources.
!wget -q "https://api.dataplatform.cloud.ibm.com/v2/gallery-assets/entries/1ec43d48a694c6c1d052ddca4d68bdc4/data?accessKey=c6a5e2bce9822ecbff831eedc9a8dde7" -O counties.geojson
df = geojson_reader.read('counties.geojson')
df.head(3)
import json
data = json.load(open('counties.geojson'))
df = geojson_reader.read(data)
df.head(3)
df = geojson_reader.read(open('counties.geojson', 'rb').read())
df.head(3)
# @hidden cell
# This code cell is commented out as COS for each user is not public.
# If you wish to read data from your COS bucket, uncomment the code below and replace the ... with your COS object to run the read.
# Get streamingbody from your COS bucket
#streaming_body = ...
#df = geojson_reader.read(streaming_body.read())
#df.head(3)
To work with WKT strings in a Python notebook, first create a WKT reader and writer:
wkt_reader = stc.wkt_reader()
wkt_writer = stc.wkt_writer()
Read a WKT string to a geometry object and then query different values of the object:
westchester_WKT = 'POLYGON((-73.984 41.325,-73.948 41.33,-73.78 41.346,-73.625 41.363,-73.545 41.37,-73.541 41.368,-73.547 41.297,-73.485 41.223,-73.479 41.215,-73.479 41.211,-73.493 41.203,-73.509 41.197,-73.623 41.144,-73.628 41.143,-73.632 41.14,-73.722 41.099,-73.714 41.091,-73.701 41.073,-73.68 41.049,-73.68 41.047,-73.673 41.041,-73.672 41.038,-73.668 41.035,-73.652 41.015,-73.651 41.011,-73.656 41,-73.655 40.998,-73.656 40.995,-73.654 40.994,-73.654 40.987,-73.617 40.952,-73.618 40.946,-73.746 40.868,-73.751 40.868,-73.821 40.887,-73.826 40.886,-73.84 40.89,-73.844 40.896,-73.844 40.9,-73.85 40.903,-73.853 40.903,-73.854 40.9,-73.859 40.896,-73.909 40.911,-73.92 40.912,-73.923 40.914,-73.923 40.918,-73.901 40.979,-73.894 41.023,-73.893 41.043,-73.896 41.071,-73.894 41.137,-73.94 41.207,-73.965 41.24,-73.973 41.244,-73.975 41.247,-73.976 41.257,-73.973 41.266,-73.95 41.288,-73.966 41.296,-73.98 41.309,-73.984 41.311,-73.987 41.315,-73.987 41.322,-73.984 41.325))'
westchester = wkt_reader.read(westchester_WKT)
westchester.area()
westchester.centroid()
westchester.get_bounding_box()
Write a geometry object to a WKT string:
wkt_writer.write(westchester)
Besides reading geometries from different sources, direct input is also supported and each geometry type comes with a constructor.
These are the currently supported geometries types and their constructors:
point(lat, lon)
line_segment(start_point, end_point)
line_string([point_1, point_2, ...])
and line_string([line_segment_1, line_segment_2, ...])
ring([point_1, point_2, ...])
and ring([line_segment_1, line_segment_2, ...])
polygon(exterior_ring, [interior_ring_1, interior_ring_2, ...])
multi_geometry(geom_1, geom_2, ...)
multi_point(point_1, point_2, ...)
multi_line_string(line_string_1, line_string_2, ...)
multi_polygon(polygon_1, polygon_2, ...)
null_geometry()
full_earth()
bounding_box(lower_lat, lower_lon, upper_lat, upper_lon)
Here are some examples of supported geometry types:
point_1 = stc.point(37.3, -74.4)
point_2 = stc.point(37.5,-74.1)
point_3 = stc.point(38.1, -74.4)
line_segment = stc.line_segment(point_1, point_2)
line_string = stc.line_string([point_1, point_2, point_3])
ring = stc.ring([point_1, point_2, point_3, point_1])
poly = stc.polygon(ring)
multi_points = stc.multi_point([point_1, point_2, point_3])
poly.area()
multi_points
next(multi_points)
next(multi_points)
next(multi_points)
You can use topological relations to confine the returned results of your location data analysis.
The Python code snippets in this notebook, which show how to use the different topological functions, run on test data. Begin by preparing the test data in WKT string format and reading it:
westchester_WKT = 'POLYGON((-73.984 41.325,-73.948 41.33,-73.78 41.346,-73.625 41.363,-73.545 41.37,-73.541 41.368,-73.547 41.297,-73.485 41.223,-73.479 41.215,-73.479 41.211,-73.493 41.203,-73.509 41.197,-73.623 41.144,-73.628 41.143,-73.632 41.14,-73.722 41.099,-73.714 41.091,-73.701 41.073,-73.68 41.049,-73.68 41.047,-73.673 41.041,-73.672 41.038,-73.668 41.035,-73.652 41.015,-73.651 41.011,-73.656 41,-73.655 40.998,-73.656 40.995,-73.654 40.994,-73.654 40.987,-73.617 40.952,-73.618 40.946,-73.746 40.868,-73.751 40.868,-73.821 40.887,-73.826 40.886,-73.84 40.89,-73.844 40.896,-73.844 40.9,-73.85 40.903,-73.853 40.903,-73.854 40.9,-73.859 40.896,-73.909 40.911,-73.92 40.912,-73.923 40.914,-73.923 40.918,-73.901 40.979,-73.894 41.023,-73.893 41.043,-73.896 41.071,-73.894 41.137,-73.94 41.207,-73.965 41.24,-73.973 41.244,-73.975 41.247,-73.976 41.257,-73.973 41.266,-73.95 41.288,-73.966 41.296,-73.98 41.309,-73.984 41.311,-73.987 41.315,-73.987 41.322,-73.984 41.325))'
white_plains_WKT = 'POLYGON((-73.792 41.024,-73.794 41.031,-73.779 41.046,-73.78 41.049,-73.779 41.052,-73.776 41.054,-73.775 41.057,-73.767 41.058,-73.769 41.062,-73.768 41.067,-73.762 41.073,-73.759 41.074,-73.748 41.069,-73.746 41.056,-73.742 41.056,-73.74 41.053,-73.74 41.049,-73.749 41.04,-73.748 41.035,-73.739 41.034,-73.729 41.029,-73.725 41.025,-73.72 41.016,-73.717 41.015,-73.716 41.006,-73.718 41.002,-73.732 40.988,-73.732 40.985,-73.739 40.979,-73.745 40.978,-73.749 40.981,-73.749 40.986,-73.751 40.986,-73.756 40.991,-73.759 40.991,-73.76 40.993,-73.765 40.994,-73.769 40.997,-73.774 41.002,-73.775 41.006,-73.788 41.018,-73.792 41.024))'
manhattan_WKT = 'POLYGON((-74.023 40.709,-74.02 40.723,-74.018 40.725,-74.019 40.731,-74.016 40.737,-74.017 40.741,-74.014 40.755,-74.011 40.757,-74.011 40.761,-74.006 40.767,-74.006 40.769,-73.998 40.778,-73.996 40.778,-73.995 40.783,-73.991 40.784,-73.991 40.788,-73.98 40.8,-73.966 40.822,-73.964 40.823,-73.961 40.83,-73.957 40.832,-73.954 40.836,-73.951 40.845,-73.951 40.853,-73.947 40.855,-73.94 40.863,-73.936 40.87,-73.936 40.873,-73.93 40.881,-73.924 40.882,-73.919 40.879,-73.909 40.877,-73.906 40.873,-73.907 40.867,-73.912 40.86,-73.918 40.855,-73.931 40.835,-73.93 40.81,-73.925 40.803,-73.925 40.795,-73.939 40.781,-73.938 40.774,-73.968 40.741,-73.969 40.733,-73.967 40.73,-73.967 40.726,-73.969 40.717,-73.973 40.709,-73.977 40.706,-73.996 40.705,-73.999 40.701,-74.008 40.696,-74.016 40.696,-74.017 40.698,-74.019 40.698,-74.023 40.703,-74.023 40.709))'
westchester = wkt_reader.read(westchester_WKT)
white_plains = wkt_reader.read(white_plains_WKT)
manhattan = wkt_reader.read(manhattan_WKT)
You can use the following topological relation functions in your notebook:
contains
returns the exact opposite of within
.westchester.contains(white_plains)
within
returns the exact opposite of contains
.white_plains.within(westchester)
intersects
is the opposite of disjoint
.westchester.intersects(manhattan)
disjoint
is the opposite of intersects
.westchester.disjoint(manhattan)
linestring
, polygon
, multilinestring
, or multipolygon
.westchester.touch(manhattan)
westchester.overlap(white_plains)
westchester.cross(white_plains)
westchester.equality(westchester)
You can use the following topological operations:
westchester.intersection(white_plains)
westchester.intersection(white_plains).equality(white_plains)
westchester.union(white_plains)
westchester.union(white_plains).equality(westchester)
westchester.difference(white_plains)
westchester.symmetric_difference(white_plains)
westchester.centroid()
westchester.buffer(10)
westchester.buffer(10).contains(westchester)
westchester.get_bounding_box()
You can use the following functions for topological metrics:
white_plains.area()
white_plains.distance(manhattan)
white_plains.get_topological_dimensionality()
You can use the following aggregation functions:
white_plains_bbox = white_plains.get_bounding_box()
westchester_bbox = westchester.get_bounding_box()
manhattan_bbox = manhattan.get_bounding_box()
aggregated_bbox = white_plains_bbox.get_containing_bb(westchester_bbox).get_containing_bb(manhattan_bbox)
white_plains_points = white_plains.get_exterior_ring().get_points()
westchester_points = westchester.get_exterior_ring().get_points()
manhattan_points = manhattan.get_exterior_ring().get_points()
all_points = white_plains_points + westchester_points + manhattan_points
hull = stc.convex_hull.compute_convex_hull(all_points)
The spatio-temporal library includes geohashing functions for proximity search (encoding latitude and longitude and grouping nearby points) in location data analysis.
Before you can begin calculating and grouping location data, create a geohash engine:
geohash = stc.geohash
Encode a geographic location:
p = stc.point(37, -74)
Encode a geographic location to a string of base-32 encoded characters:
geohash.string_hash_encode(p)
Encode a geographic location to a string of characters and letter:
geohash.string_hash_encode(p, precision=5)
Encode a geographic location to a BitVector:
geohash.number_hash_encode(p)
Decode a base-32 encoded string to its decimal longitude or latitude coordinates:
geohash.string_hash_decode('dqeh4')
Decode base-32 encoded characters to reveal the geographic location:
bv = geohash.number_hash_encode(p)
geohash.number_hash_decode(bv)
The geohash neighbors functions returns the neighboring geohash codes around a specified code.
Get the geohash neighbors for the given latitude, longitude and bit_depth
:
geohash.get_all_neighbors(70, -40, 25)
Get the geohash neighbors for the given latitude, longitude and bit_depth
, but only return results that are within the given distance:
geohash.get_all_neighbors(70, -40, 25, distance_precision=1000)
Other useful geohash functions include:
geohash.expand('ezs42')
geohash.neighbors('ezs42')
geohash.get_east('ezs42')
bv = geohash.number_hash_encode(p)
bv.truncate(25)
geohash.expand(bv)
geohash.neighbors(bv)
geohash.get_north(bv)
To calculate a set of geohashes that wholly covers the bounding box:
test_wkt = 'POLYGON((-73.76223024988917 41.04173285255264,-73.7749331917837 41.04121496082817,-73.78197130823878 41.02748934524744,-73.76476225519923 41.023733725449326,-73.75218805933741 41.031633228865495,-73.7558787789419 41.03752486433286,-73.76223024988917 41.04173285255264))'
poly = wkt_reader.read(test_wkt)
cover = geohash.geohash_cover_at_bit_depth(poly, 36)
buffered_cover = stc.geohash.geohash_cover_at_bit_depth(poly, 36, distance=50)
raw_cover = geohash.geohash_cover_at_bit_depth(poly, 36)
compact_cover = stc.geohash.geohash_compression(raw_cover)
You can use ellipsoidal metrics to calculate the distance between points.
Before you can begin using ellipsoidal metrics, create a metrics object:
eg_metric = stc.eg_metric
Examples of metrics you can compute:
azimuth
:p1 = stc.point(47.1, -73.5)
p2 = stc.point(47.6, -72.9)
eg_metric.azimuth(p1, p2)
eg_metric.distance(p1, p2)
point = p1
heading = eg_metric.azimuth(p1, p2)
distance = eg_metric.distance(p1, p2)
eg_metric.destination_point(p1, heading, distance)
In this notebook, you learnt how to get started with the geospatio-temporal library. You were shown the following:
Linsong Chu, Research Engineer at IBM Research
Copyright © 2019-2024 IBM. This notebook and its source code are released under the terms of the MIT License.