Data Skipping Sample for Python
Decision Optimization
May 14, 2020

Learn how to improve SQL query performance with data skipping, a performance optimization technique. Summary metadata stored for each data object is used to skip objects whose data has no relevance to the query. All Spark native data formats are supported, including Parquet, ORC, CSV, JSON, and Avro. This notebook runs on Spark with Python 3.6.
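
To make the idea concrete, here is a minimal sketch of data skipping in plain Python. It is not the notebook's code and does not use any Spark or data skipping library API. It assumes a simple min/max index per data object, and every name in it (`DataObject`, `build_minmax_index`, `query_greater_than`, the `temp` column) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class DataObject:
    """Hypothetical stand-in for one stored file (e.g. a Parquet object)."""
    name: str
    rows: List[dict]


def build_minmax_index(objects: List[DataObject], column: str) -> Dict[str, Tuple[int, int]]:
    """Collect summary metadata: the min and max of `column` per object."""
    index = {}
    for obj in objects:
        values = [row[column] for row in obj.rows]
        index[obj.name] = (min(values), max(values))
    return index


def query_greater_than(objects, index, column, threshold):
    """Return rows where column > threshold and the names of objects read."""
    matching_rows = []
    scanned = []
    for obj in objects:
        _, max_value = index[obj.name]
        if max_value <= threshold:
            # The metadata proves no row can match, so skip the object.
            continue
        scanned.append(obj.name)
        matching_rows.extend(row for row in obj.rows if row[column] > threshold)
    return matching_rows, scanned


objects = [
    DataObject("part-0", [{"temp": 10}, {"temp": 15}]),
    DataObject("part-1", [{"temp": 20}, {"temp": 35}]),
    DataObject("part-2", [{"temp": 5}, {"temp": 8}]),
]
index = build_minmax_index(objects, "temp")
rows, scanned = query_greater_than(objects, index, "temp", 18)
print(rows)     # rows with temp > 18
print(scanned)  # only the objects that had to be read
```

In this sketch only `part-1` is read, because the recorded maximum of the other two objects is already at or below the threshold. A real system would keep the metadata separately from the data so that skipped objects are never opened at all.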