After 500 epochs the multilayer perceptron fails on 33 of the MNIST training set images. Here are they. The y-label is their array index, while the title contains the predicted value and the (obviously different) expected value according to MNIST.
The above image is was obtained running test_3(data_set='train') from mlp_test/test_mlp.py after creating and saving the model best_model_mlp_500_zero.pkl with mlp_modified-py. Some of the expected values are truly surprising. It is hard to see how 10994 (second row, first column) can be read as a 3. The trained network identifies it, imo correctly, as a 3. The same can be said of 43454 (5th row, 5th col) and of several other images. Some ones and sevens are hard to tell apart. . I've tried training the model on a training set where the naughty images above are replaced by duplicates from the rest of training set (see load_data in mlp_modified.py to see how it's done). However, this alone does not improve performance on the test set.