TPC-DS Benchmark With Trino
TPC-DS (Transaction Processing Performance Council Decision Support Benchmark) is recognized as an industry-standard benchmark that evaluates the performance of data warehousing and analytics systems. The main objective of TPC-DS is to replicate real-world decision support workloads. It measures the query time and throughput through a series of intricate queries and data operations, including joins, aggregations, sorting, filtering, and subqueries. These queries represent a spectrum of scenarios, from straightforward report generation to sophisticated data mining and OLAP (Online Analytical Processing). For more information about TPC-DS please refer to its web page.
This document mainly introduces the performance of Alluxio DA 3.2 on the TPC-DS 100G test set.
For the 99 queries on the TPC-DS standard test data set, we compared the performance for the following versions and scenarios.
With S3 UFS
Case | Alluxio version | S3 location | TLS enabled |
---|---|---|---|
1 (baseline) | DA 3.2 | Same region | False |
2 | No Alluxio | Remote region | False |
3 | No Alluxio | Same region | False |
4 | DA 2.10 | Same region | False |
5 | DA 3.2 | Same region | True |
With HDFS UFS
Case | Alluxio version | HDFS location | Ranger enabled |
---|---|---|---|
1 (baseline) | DA 3.2 | Same region | False |
2 | No Alluxio | Same region | True |
Hardware environment
- Worker hardware
- Instance type: c4d.9xlarge
- CPU: 36 cores
- Memory: 72G
- Alluxio Cluster
- Number of coordinators: 1
- Number of workers: 3
- JVM heap memory: 8G
- Trino Cluster
- Number of coordinators: 1
- Number of workers: 3
- Max heap size: 64G
- Max memory per node: 40G
Software environment
Component | Version |
---|---|
Trino | 449 |
Hive Metastore | 3.1.3 |
Iceberg | V1/V2 |
Alluxio | DA 3.2 |
SQL queries
TPC-DS 99 test query statements: TPC-DS-Query-SQL
Test result
In the test, we use query time (ms) as the main performance indicator. We will compare DA 3.2 query time with different scenarios and calculate the percentage improvement. Charts will be shown in pairs, where the first chart shows the raw query times and the second chart shows the percentage difference.
Results using S3 UFS
Alluxio DA 3.2 vs No Alluxio remote region S3 (case 1 vs case 2)
Alluxio DA 3.2 vs No Alluxio remote region - query times (ms)
Alluxio DA 3.2 vs No Alluxio remote region S3 - percentage difference
Alluxio DA 3.2 vs No Alluxio same region S3 (case 1 vs case 3)
Alluxio DA 3.2 vs No Alluxio same region S3 - query times (ms)
Alluxio DA 3.2 vs No Alluxio same region S3 - percentage difference
Alluxio DA 3.2 vs DA 2.10 on same region S3 (case 1 vs case 4)
Alluxio DA 3.2 vs DA 2.10 on same region S3 - query times (ms)
Alluxio DA 3.2 vs DA 2.10 on same region S3 - percentage difference
Alluxio DA 3.2 vs DA 3.2 TLS enabled (case 1 vs case 5)
Alluxio DA 3.2 vs DA 3.2 TLS enabled - query times (ms)
Alluxio DA 3.2 vs DA 3.2 TLS enabled - percentage difference
Results using HDFS UFS
With Iceberg V1, Alluxio DA 3.2 with Ranger vs DA 3.2 without Ranger (case 1 vs case 2)
DA 3.2 vs DA 3.2 with Ranger - query times (ms)
DA 3.2 vs DA 3.2 with Ranger - percentage difference
With Iceberg V2, Alluxio DA 3.2 with Ranger vs DA 3.2 without Ranger (case 1 vs case 2)
DA 3.2 vs DA 3.2 with Ranger - query times (ms)
DA 3.2 vs DA 3.2 with Ranger - percentage difference
Environment preparation
Summary
Overall we see varying levels of performance improvement with Alluxio DA 3.2. Taking the average across all queries, we conclude that:
- When reading from S3 in a remote region, queries run about twice as fast with Alluxio vs without.
- When reading from S3 in the same region, there is little difference in performance with or without Alluxio.
- The performance between Alluxio 2.10 and Alluxio DA 3.2 is similar.
- When TLS is enabled, the performance drops as expected, by about 15%.
- When Ranger is enabled, there is a negligible performance difference of 1%.
These conclusions are derived from the following numbers
With S3 UFS
Case | Alluxio version | Net Improvement |
---|---|---|
1 | DA 3.2 vs No Alluxio remote region S3 | 93.9% |
2 | DA 3.2 vs No Alluxio same region S3 | 4.8% |
3 | DA 3.2 vs DA 2.10 | 2.9% |
4 | DA 3.2 vs DA 3.2 TLS enabled | -13.9% |
With HDFS UFS
Case | Alluxio version | Net Improvement |
---|---|---|
1 | DA 3.2 with Ranger vs without Ranger with Iceberg V1 | -1.0% |
2 | DA 3.2 with Ranger vs without Ranger with Iceberg V2 | -1.7% |