Cluster randomized trials (CRTs) are studies where treatment is randomized at the cluster level but outcomes are typically collected at the individual level. When CRTs are employed in pragmatic settings, baseline population characteristics may moderate treatment effects, leading to what is known as heterogeneous treatment effects (HTEs). Pre-specified, hypothesis-driven HTE analyses in CRTs can enable an understanding of how interventions may impact subpopulation outcomes. While closed-form sample size formulas have recently been proposed, assuming known intracluster correlation coefficients (ICCs) for both the covariate and outcome, guidance on optimal cluster randomized designs to ensure maximum power with pre-specified HTE analyses has not yet been developed. We derive new design formulas to determine the cluster size and number of clusters to achieve the locally optimal design (LOD) that minimizes variance for estimating the HTE parameter given a budget constraint. Given the LODs are based on covariate and outcome-ICC values that are usually unknown, we further develop the maximin design for assessing HTE, identifying the combination of design resources that maximize the relative efficiency of the HTE analysis in the worst case scenario. In addition, given the analysis of the average treatment effect is often of primary interest, we also establish optimal designs to accommodate multiple objectives by combining considerations for studying both the average and heterogeneous treatment effects. We illustrate our methods using the context of the Kerala Diabetes Prevention Program CRT, and provide an R Shiny app to facilitate calculation of optimal designs under a wide range of design parameters.