Alzheimer’s disease (AD) classification models typically segment the whole brain image into voxel blocks and assign each block the label of the entire image, even though not every block is closely related to the disease. To address this, an AD auxiliary-diagnosis framework based on weakly supervised multi-instance learning (MIL) and multi-scale feature fusion is proposed; the framework is designed at three levels: within voxel blocks, between voxel blocks, and across high-confidence voxel blocks. First, a three-dimensional convolutional neural network extracts deep features within each voxel block; then, position encoding and an attention mechanism capture spatial correlation information between voxel blocks; finally, high-confidence voxel blocks are selected and a multi-scale feature-fusion strategy integrates the key features for the classification decision. The model was evaluated on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and Open Access Series of Imaging Studies (OASIS) datasets. Experimental results show that, on the two tasks of AD classification and mild cognitive impairment conversion classification, the proposed framework improves ACC and AUC by 3% and 4% on average compared with other mainstream frameworks, and can identify the key disease-related voxel blocks, providing an effective basis for AD auxiliary diagnosis.
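To make the three-level design concrete, the following is a minimal PyTorch sketch of the described pipeline: a 3D CNN encoder for within-block features, positional encoding plus self-attention for between-block correlation, and top-k selection of high-confidence blocks fused with a bag-level summary for the final decision. All layer sizes, shapes, the learned positional encoding, and the top-k rule (class names such as `PatchEncoder3D` and `MILClassifier` included) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of the MIL pipeline; shapes and layer choices are assumed.
import torch
import torch.nn as nn


class PatchEncoder3D(nn.Module):
    """Level 1: 3D CNN extracting a deep feature vector from each voxel block."""

    def __init__(self, dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool3d(1),
        )
        self.proj = nn.Linear(64, dim)

    def forward(self, blocks):                                # (B, N, 1, D, H, W)
        b, n = blocks.shape[:2]
        x = self.conv(blocks.flatten(0, 1)).flatten(1)        # (B*N, 64)
        return self.proj(x).view(b, n, -1)                    # (B, N, dim)


class MILClassifier(nn.Module):
    """Levels 2-3: position encoding + attention over blocks, then selection of
    high-confidence blocks and multi-scale fusion for the classification head."""

    def __init__(self, dim=128, n_blocks=64, top_k=8, n_classes=2):
        super().__init__()
        self.encoder = PatchEncoder3D(dim)
        self.pos = nn.Parameter(torch.zeros(1, n_blocks, dim))  # learned position encoding
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.score = nn.Linear(dim, 1)                           # per-block confidence
        self.head = nn.Linear(dim * 2, n_classes)                # classifier on fused features
        self.top_k = top_k

    def forward(self, blocks):
        local = self.encoder(blocks)                             # within-block features
        q = local + self.pos                                     # inject block positions
        ctx, _ = self.attn(q, q, q)                              # between-block correlation
        conf = self.score(ctx).squeeze(-1)                       # (B, N) block confidence
        idx = conf.topk(self.top_k, dim=1).indices               # high-confidence blocks
        idx = idx.unsqueeze(-1).expand(-1, -1, ctx.size(-1))
        picked = torch.gather(ctx, 1, idx).mean(dim=1)           # summary of selected blocks
        fused = torch.cat([picked, local.mean(dim=1)], dim=-1)   # fuse local + contextual scales
        return self.head(fused), conf


# Toy usage: 2 subjects, 64 voxel blocks of size 32^3 each.
logits, block_conf = MILClassifier()(torch.randn(2, 64, 1, 32, 32, 32))
```

In this sketch the per-block confidence scores (`block_conf`) double as an interpretability signal, which is how the framework could surface the key voxel blocks referred to in the abstract.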