Cluster randomized trials (CRTs) are studies in which treatment is randomized at the cluster or group level. When CRTs are conducted in pragmatic settings, diverse population characteristics may moderate treatment effects, giving rise to heterogeneous treatment effects (HTEs). Pre-specified, hypothesis-driven HTE analyses in CRTs enable a rigorous understanding of how interventions may affect outcomes for important subpopulations. Guidance on planning pragmatic CRTs with pre-specified HTE analyses is scarce, especially on how to ensure sufficient power for confirmatory HTE analyses. Planning is made more difficult by the need to specify values for two different intraclass correlation coefficients (ICCs), which are rarely known. In addition, the population average treatment effect (ATE) is often of interest as well, and the sample sizes required to properly power the two sets of analyses, HTE and ATE, do not align. In this article, we derive formulas for the cluster size and number of clusters that achieve the locally optimal design, that is, the design that minimizes the HTE variance given a budget constraint and ICC values. We then extend the maximin design to HTE estimators and identify the combination of cluster size and total number of clusters that maximizes the relative efficiency of an HTE analysis, with respect to its locally optimal design, in the worst scenarios of high ICC values. We also develop a multiple-objective maximin design that weights the HTE and ATE objectives to determine the optimal design across ranges of ICC values.
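As a purely illustrative aid, the maximin idea summarized above can be sketched as a grid search. This is a minimal sketch under stated assumptions, not the closed-form derivation of this article: the function names, the linear budget model (a fixed cost per cluster plus a cost per participant), and the candidate grid are all hypothetical. Because the HTE variance expression is not reproduced here, the example plugs in the textbook ATE variance with design effect 1 + (m - 1)ρ as a stand-in; an HTE variance function taking two ICCs could be passed in the same way.

```python
def ate_variance(n, m, rho):
    """Stand-in variance: textbook ATE variance factor for a balanced
    two-arm CRT with n clusters of size m, ICC rho, unit outcome variance."""
    return 4.0 * (1.0 + (m - 1.0) * rho) / (n * m)


def clusters_for_budget(budget, cluster_cost, subject_cost, m):
    """Assumed linear cost model: budget = n * (cluster_cost + subject_cost * m).
    The number of clusters n is treated as continuous for simplicity."""
    return budget / (cluster_cost + subject_cost * m)


def maximin_cluster_size(variance, icc_grid, budget, cluster_cost,
                         subject_cost, sizes):
    """Return (m, worst-case relative efficiency) for the maximin design.

    variance : callable (n, m, *iccs) -> variance of the target estimator
    icc_grid : list of tuples, each one ICC scenario (one or more ICCs)
    sizes    : iterable of candidate cluster sizes
    """
    sizes = list(sizes)

    def var_at(m, iccs):
        n = clusters_for_budget(budget, cluster_cost, subject_cost, m)
        return variance(n, m, *iccs)

    # Locally optimal (minimum) variance for each ICC scenario.
    local_min = {iccs: min(var_at(m, iccs) for m in sizes)
                 for iccs in icc_grid}

    best_m, best_worst_re = None, -1.0
    for m in sizes:
        # Relative efficiency versus the locally optimal design,
        # evaluated at the least favorable ICC scenario for this m.
        worst_re = min(local_min[iccs] / var_at(m, iccs)
                       for iccs in icc_grid)
        if worst_re > best_worst_re:
            best_m, best_worst_re = m, worst_re
    return best_m, best_worst_re


if __name__ == "__main__":
    scenarios = [(0.01,), (0.05,), (0.20,)]  # hypothetical ICC range
    m, re = maximin_cluster_size(ate_variance, scenarios, budget=10_000,
                                 cluster_cost=19, subject_cost=1,
                                 sizes=range(2, 101))
    n = clusters_for_budget(10_000, 19, 1, m)
    print(f"maximin cluster size m={m}, clusters n={n:.1f}, worst RE={re:.3f}")
```

With a single ICC scenario the search reduces to the locally optimal design, whose relative efficiency is 1 by construction. A multiple-objective variant could, under the same assumptions, replace the per-scenario relative efficiency with a weighted combination of the HTE and ATE relative efficiencies.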