Clone this repository or download it as a zip archive. Note: plsql_utilities and app_html_table_pkg are provided as Git submodules, so use the clone command with recurse ...
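The following is a minimal Python sketch of the clone step. It assumes that the elided option is Git's standard `--recurse-submodules` flag. It also assumes a hypothetical repository URL, because the actual URL is not given here. The helper names `clone_command`, `clone_with_submodules` and `init_submodules` are illustrative only and are not part of this project.

```python
import subprocess
from typing import List, Optional


def clone_command(repo_url: str, target_dir: Optional[str] = None) -> List[str]:
    """Build a `git clone` command line that also fetches all submodules."""
    # --recurse-submodules initialises and checks out every submodule
    # during the clone, so the submodule directories are not left empty.
    cmd = ["git", "clone", "--recurse-submodules", repo_url]
    if target_dir is not None:
        cmd.append(target_dir)
    return cmd


def clone_with_submodules(repo_url: str, target_dir: Optional[str] = None) -> None:
    """Run the clone; check=True raises CalledProcessError if git fails."""
    subprocess.run(clone_command(repo_url, target_dir), check=True)


def init_submodules(repo_dir: str) -> None:
    """Fetch submodules for a repository that was cloned without the flag."""
    subprocess.run(
        ["git", "submodule", "update", "--init", "--recursive"],
        cwd=repo_dir,
        check=True,
    )


if __name__ == "__main__":
    # Hypothetical URL: replace it with this repository's real clone URL.
    print(" ".join(clone_command("https://example.com/your/repo.git")))
```

If you already cloned the repository without the flag, `git submodule update --init --recursive` (wrapped by `init_submodules` above) fetches the missing submodules afterwards.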
Build and process the Common Crawl index table: an index to WARC files in a columnar data format (Apache Parquet). This step is not part of this project; please have a look at cc-pyspark for examples of how to query ...
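The following is a rough sketch of how the index table is typically used: you select rows that point into WARC files, then fetch each record by byte range. The columns `url`, `url_host_tld`, `warc_filename`, `warc_record_offset`, `warc_record_length`, `crawl` and `subset` come from the Common Crawl columnar index schema. The table name `ccindex` is an assumption: use whatever name your Spark or Athena setup registers. The helper functions are hypothetical and are not taken from cc-pyspark.

```python
import re
from typing import Dict


def build_index_query(tld: str, crawl: str, limit: int = 10) -> str:
    """Return SQL that selects WARC record locations for one TLD and crawl.

    Assumes the Parquet index is registered as a table named `ccindex`.
    Inputs are validated because they are interpolated into the SQL text.
    """
    if not re.fullmatch(r"[a-z0-9-]+", tld):
        raise ValueError(f"unexpected TLD: {tld!r}")
    if not re.fullmatch(r"CC-MAIN-\d{4}-\d{2}", crawl):
        raise ValueError(f"unexpected crawl id: {crawl!r}")
    return (
        "SELECT url, warc_filename, warc_record_offset, warc_record_length\n"
        "FROM ccindex\n"
        f"WHERE crawl = '{crawl}' AND subset = 'warc' AND url_host_tld = '{tld}'\n"
        f"LIMIT {int(limit)}"
    )


def warc_range_header(offset: int, length: int) -> Dict[str, str]:
    """HTTP Range header that fetches exactly one WARC record.

    Byte ranges are inclusive at both ends, so the last byte is
    offset + length - 1.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


if __name__ == "__main__":
    print(build_index_query("is", "CC-MAIN-2024-10", limit=5))
    print(warc_range_header(1000, 250))
```

Common Crawl WARC files are gzip-compressed per record. The bytes returned for such a range can therefore be decompressed on their own, for example with `gzip.decompress`.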