Guidance on planning cluster randomized trials (CRTs) with prespecified analyses of heterogeneous treatment effects (HTEs) is scarce, particularly on how to ensure sufficient power for confirmatory HTE analyses, which are needed to rigorously understand an intervention's effect in important subpopulations. Planning is further complicated by the need to specify two different intraclass correlation coefficients (ICCs), whose values are rarely known. In this presentation, we derive new design formulas that determine the cluster size and number of clusters of the locally optimal design, which minimizes the variance of the HTE estimator under a budget constraint when the ICC values are known. We then develop a maximin design: the combination of cluster size and number of clusters that maximizes the efficiency of the HTE analysis, relative to its locally optimal design, under the least favorable ICC values in the range considered. We further extend the optimal designs to accommodate multiple objectives. We illustrate our methods in the context of the Kerala Diabetes Prevention Program CRT.
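To make the locally optimal and maximin ideas concrete, the Python sketch below searches a grid of cluster sizes. Everything in it is an illustrative assumption, not the formulas derived in the presentation: a linear cost model budget = n(c1 + c2·m), with per-cluster cost c1 and per-participant cost c2; an assumed design-effect form for an individual-level binary effect modifier, with outcome ICC `rho_y` and effect-modifier ICC `rho_x`; a continuous number of clusters n; and the hypothetical function names.

```python
from itertools import product


# ASSUMPTION (illustrative only): the HTE variance for an individual-level
# binary effect modifier is proportional to this design effect divided by the
# total sample size n * m. rho_y = outcome ICC, rho_x = effect-modifier ICC.
def design_effect(m, rho_y, rho_x):
    return ((1 - rho_y) * (1 + (m - 1) * rho_y)
            / (1 + (m - 2) * rho_y - (m - 1) * rho_x * rho_y))


# ASSUMPTION: budget = n * (c1 + c2 * m); n is treated as continuous.
def hte_variance(m, rho_y, rho_x, c1, c2, budget):
    n = budget / (c1 + c2 * m)  # clusters affordable at cluster size m
    return design_effect(m, rho_y, rho_x) / (n * m)


def locally_optimal_m(rho_y, rho_x, c1, c2, budget, m_grid):
    """Cluster size in m_grid minimizing HTE variance when the ICCs are known."""
    return min(m_grid, key=lambda m: hte_variance(m, rho_y, rho_x, c1, c2, budget))


def relative_efficiency(m, rho_y, rho_x, c1, c2, budget, m_grid):
    """Var(locally optimal design) / Var(design with cluster size m), in (0, 1]."""
    m_opt = locally_optimal_m(rho_y, rho_x, c1, c2, budget, m_grid)
    return (hte_variance(m_opt, rho_y, rho_x, c1, c2, budget)
            / hte_variance(m, rho_y, rho_x, c1, c2, budget))


def maximin_m(rho_y_values, rho_x_values, c1, c2, budget, m_grid):
    """Cluster size whose worst-case relative efficiency over the ICC grid is largest."""
    icc_grid = list(product(rho_y_values, rho_x_values))

    def worst_case_re(m):
        return min(relative_efficiency(m, ry, rx, c1, c2, budget, m_grid)
                   for ry, rx in icc_grid)

    best_m = max(m_grid, key=worst_case_re)
    n_clusters = budget / (c1 + c2 * best_m)
    return best_m, n_clusters, worst_case_re(best_m)


if __name__ == "__main__":
    m, n, re = maximin_m([0.01, 0.05, 0.10], [0.0, 0.3, 0.6],
                         c1=20.0, c2=1.0, budget=10_000.0, m_grid=range(2, 101))
    print(f"cluster size {m}, about {n:.1f} clusters, worst-case RE {re:.3f}")
```

Because each relative efficiency compares two designs under the same ICC pair and budget, the budget cancels; only the shape of the variance in m matters. `m_grid` should be a reusable sequence such as a `range`, and its bounds should reflect practical limits: under this assumed variance form the optimum can fall at the largest allowed cluster size (for example, when `rho_x` does not exceed `rho_y`).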