What is the problem the feature request solves?
I observed that Comet does not fully utilize the Spark cluster's parallelism when scanning.
Input: 1200 HDFS files; number of tasks planned by Spark: 1800. Every file is splittable, so Spark uses all 1800 tasks for scanning and writing the shuffle, whereas Comet uses only 1200 tasks, leaving 600 idle.
I was not able to reproduce this locally; I will try on a local HDFS later.
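
To illustrate how a gap like 1800 vs 1200 could arise, here is a rough sketch of the way Spark plans file scan partitions (compute a maximum split size, split each file, then pack the splits greedily). It is a simplified model, not Comet's or Spark's actual code. The per-file size (188 MiB), the `min_partition_num` value, and the idea that the Comet scan ends up with one unsplit chunk per file are all assumptions chosen only to match the task counts above; the 128 MiB and 4 MiB values are the defaults of `spark.sql.files.maxPartitionBytes` and `spark.sql.files.openCostInBytes`.

```python
MIB = 1024 * 1024

def max_split_bytes(file_sizes, max_partition_bytes, open_cost, min_partition_num):
    # min(maxPartitionBytes, max(openCostInBytes, bytesPerCore))
    total = sum(size + open_cost for size in file_sizes)
    bytes_per_core = total // min_partition_num
    return min(max_partition_bytes, max(open_cost, bytes_per_core))

def split_files(file_sizes, split_bytes, splittable):
    # Cut every splittable file into chunks of at most split_bytes.
    splits = []
    for size in file_sizes:
        if not splittable:
            splits.append(size)
            continue
        offset = 0
        while offset < size:
            splits.append(min(split_bytes, size - offset))
            offset += split_bytes
    return splits

def count_partitions(splits, split_bytes, open_cost):
    # Greedy packing, largest splits first; each partition becomes one task.
    partitions = 0
    current_size = 0
    current_count = 0
    for length in sorted(splits, reverse=True):
        if current_count > 0 and current_size + length > split_bytes:
            partitions += 1
            current_size = 0
            current_count = 0
        current_size += length + open_cost
        current_count += 1
    if current_count > 0:
        partitions += 1
    return partitions

def planned_tasks(splittable):
    file_sizes = [188 * MIB] * 1200   # hypothetical: real file sizes are unknown
    open_cost = 4 * MIB               # default spark.sql.files.openCostInBytes
    split_bytes = max_split_bytes(
        file_sizes,
        max_partition_bytes=128 * MIB,  # default spark.sql.files.maxPartitionBytes
        open_cost=open_cost,
        min_partition_num=200,          # hypothetical cluster parallelism
    )
    splits = split_files(file_sizes, split_bytes, splittable)
    return count_partitions(splits, split_bytes, open_cost)

print(planned_tasks(splittable=True))   # 1800: files are split and packed
print(planned_tasks(splittable=False))  # 1200: one task per file
```

Under these assumptions each file yields a 128 MiB split and a 60 MiB split; the small splits are packed in pairs, giving 1200 + 600 = 1800 tasks, while treating files as unsplittable gives exactly one task per file.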
Describe the potential solution
No response
Additional context
No response