Imbalanced data distribution in crowd counting datasets leads to severe under-estimation and over-estimation problems, which has been less investigated in existing works. In this paper, we tackle this challenging problem by proposing a simple but effective locality-based learning paradigm to produce generalizable features by alleviating sample bias. Our proposed method is locality-aware in two aspects. First, we introduce a locality-aware data partition (LADP) approach to group the training data into different bins via locality-sensitive hashing. As a result, a more balanced data batch is then constructed by LADP. To further reduce the training bias and enhance the collaboration with LADP, a new data augmentation method called locality-aware data augmentation (LADA) is proposed where the image patches are adaptively augmented based on the loss. The proposed method is independent of the backbone network architectures, and thus could be smoothly integrated with most existing deep crowd counting approaches in an end-to-end paradigm to boost their performance. We also demonstrate the versatility of the proposed method by applying it for adversarial defense. Extensive experiments verify the superiority of the proposed method over the state of the arts.