Progress on mathematical reasoning in large language models (LLMs) is often limited by the scarcity of large-scale, high-quality, domain-specific training data. To address this gap, we introduce StackMathQA, a dataset of nearly 2 million question-and-answer pairs sourced from the Stack Exchange network. It aggregates discussions among experts and enthusiasts from Math Stack Exchange, MathOverflow, Statistics Stack Exchange, and Physics Stack Exchange. We provide the data in multiple formats, along with curated subsets created through importance resampling, to serve research needs ranging from large-scale pre-training to targeted fine-tuning. This report details the dataset's construction methodology, structure, content, and potential applications, and positions StackMathQA as a resource for advancing machine reasoning in quantitative domains.