The Semantic "MNIST" - Counting the Number of Dots in an Image

Have you seen the movie Rain Man? Do you remember the scene where Dustin Hoffman counts the exact number of toothpicks on the floor in the blink of an eye? That scene inspired the idea of turning it into a machine learning example. Instead of toothpicks, we count dots in an image.
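To make the task concrete, here is a minimal C++ sketch of a single labeled sample. It is not taken from smnistg.cpp. The 10x10 image size, the 1-pixel dots, and the names make_sample and count_dots are assumptions chosen for illustration.

```cpp
// Sketch of one SMNIST-style sample. Assumptions (not from the original
// sources): 10x10 image, 1-pixel dots, label = number of dots.
#include <algorithm>
#include <array>
#include <cassert>
#include <numeric>
#include <random>

constexpr int kSide = 10;  // assumed image side length
using Image = std::array<unsigned char, kSide * kSide>;

// Places n dots on n distinct random pixels; the label of the image is n.
Image make_sample(int n, std::mt19937& rng) {
    assert(n >= 0 && n <= kSide * kSide);
    std::array<int, kSide * kSide> idx;
    std::iota(idx.begin(), idx.end(), 0);       // pixel indices 0..99
    std::shuffle(idx.begin(), idx.end(), rng);  // random order, no repeats
    Image img{};                                // all pixels start at 0
    for (int i = 0; i < n; ++i) img[idx[i]] = 255;
    return img;
}

// Ground-truth counter: with 1-pixel dots on distinct pixels,
// the number of dots equals the number of lit pixels.
int count_dots(const Image& img) {
    return static_cast<int>(std::count(img.begin(), img.end(), 255));
}
```

For example, drawing a label n from 0 to 9 (again an assumption about the label range) and calling make_sample(n, rng) yields an image whose correct answer is n; count_dots recovers it because the dots never overlap.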

For further details, see the blog post: https://bhaxor.blog.hu/2019/03/10/the_semantic_mnist
The sources can be found at https://gitlab.com/nbatfai/smnist
Here is an introductory video: https://youtu.be/-tSRwJgVpJk

SMNIST for Humans

The experiments called SMNIST for Humans are intended to measure the capacity of the parallel individuation system in humans.

SMNIST for Machines

The experiments called SMNIST for Machines are similar to the previous ones, but they investigate computer programs instead of humans.

Downloads

Series 1

Naive

Source: smnistg.cpp

No-centering

Source: smnistg-no-centering.cpp

Disjunct

Source: smnistg-disjunct-test-set-no-centering.cpp

Disjunct 1PX

Source: smnistg-disjunct-1px-test-set-no-centering.cpp

Hard

Source: smnistg-hard-disjunct-test-set-no-centering.cpp

Hard 1PX

Source: smnistg-hard-1px-disjunct-test-set-no-centering.cpp

Series 2

Disjunct 10x10, 1PX

Source: smnistg-s2-disjunct.cpp

Hard 10x10, 1PX

Source: smnistg-s2-hard.cpp

Disjunct 10x10, 1PX, pow 10 2x+

Source: smnistg-s2-disjunct-pow102x+.cpp

Hard 10x10, 1PX pow 10 2x+

Source: smnistg-s2-hard-pow102x+.cpp

Hard 10x10, 1PX pow 10 2x+, m=4

Hard 10x10, 1PX pow 10 2x+, m=5

Hard 10x10, 1PX pow 10 2x+, m=6

Hard 10x10, 1PX pow 10 2x+, m=7

Hard 10x10, 1PX pow 10 2x+, m=8

SMNIST for Anyone

Downloads

Series 1

Hard 10x10, 1PX pow 10 2x+, m=4-9, 60,000/10,000

Source: smnistg-sX5-hard-pow10x+.cpp

m=4

m=5

m=6

m=7

m=8

m=9

Series 2

Hard 10x10, 1PX pow 10 2x+, m=4-9, 60,000/10,000

Source: smnistg-s5-hard-pow10x+.cpp

m=4

m=5

m=6

m=7

m=8

m=9

Version history

Bugfix: 1544 of the 60,000 training images contained dots with identical coordinates.
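One plausible way such collisions arise, assuming each dot's coordinates are drawn independently, is that two dots land on the same pixel, so the image shows fewer dots than its label says. The hypothetical C++ sketch below (naive_dots, has_duplicates, and distinct_dots are illustrative names, not functions from the sources) detects the problem and avoids it by redrawing any coordinate pair that is already taken.

```cpp
// Hypothetical illustration of the duplicate-coordinate bug and a fix.
#include <cassert>
#include <random>
#include <set>
#include <utility>
#include <vector>

using Coord = std::pair<int, int>;

// Naive: each dot gets independent coordinates, so repeats are possible.
std::vector<Coord> naive_dots(int n, int side, std::mt19937& rng) {
    std::uniform_int_distribution<int> d(0, side - 1);
    std::vector<Coord> dots;
    for (int i = 0; i < n; ++i) dots.emplace_back(d(rng), d(rng));
    return dots;
}

// True if two dots share a coordinate pair (the image then shows < n dots).
bool has_duplicates(const std::vector<Coord>& dots) {
    std::set<Coord> seen(dots.begin(), dots.end());
    return seen.size() != dots.size();
}

// Fix: redraw any coordinate pair that is already used.
std::vector<Coord> distinct_dots(int n, int side, std::mt19937& rng) {
    assert(n >= 0 && n <= side * side);  // otherwise the loop never finishes
    std::uniform_int_distribution<int> d(0, side - 1);
    std::set<Coord> used;
    std::vector<Coord> dots;
    while (static_cast<int>(dots.size()) < n) {
        Coord c{d(rng), d(rng)};
        if (used.insert(c).second) dots.push_back(c);  // keep only new pairs
    }
    return dots;
}
```

Rejection stays cheap while the number of dots is small relative to the number of pixels; shuffling all pixel indices and taking the first n is an alternative that needs no retries.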

At Viktor Simkó's suggestion, random images that already appeared in the training set have been filtered out of the test set. The code satisfies an even stronger property than the one Viktor proposed: all generated random images are unique. It follows that no training image can appear among the test images.
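A minimal sketch of the uniqueness property, assuming images are compared pixel by pixel: every newly generated image is checked against the set of all images produced so far and rejected if it was seen before. The names random_image and unique_images, the 10x10 size, and the string encoding are hypothetical.

```cpp
// Hypothetical sketch: all generated images are pairwise distinct, so any
// split of the generated sequence into training and test parts is disjoint.
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <set>
#include <string>
#include <vector>

constexpr int kPixels = 10 * 10;  // assumed 10x10 image
using Image = std::string;        // one char per pixel: '0' or '1'

// Random image with n dots on distinct pixels.
Image random_image(int n, std::mt19937& rng) {
    assert(n >= 0 && n <= kPixels);
    std::vector<int> idx(kPixels);
    std::iota(idx.begin(), idx.end(), 0);
    std::shuffle(idx.begin(), idx.end(), rng);
    Image img(kPixels, '0');
    for (int i = 0; i < n; ++i) img[idx[i]] = '1';
    return img;
}

// Generates `count` distinct images with n dots each, rejecting repeats.
// Precondition: count must not exceed the number of possible images
// (100 choose n), otherwise the loop never ends.
std::vector<Image> unique_images(int count, int n, std::mt19937& rng) {
    std::set<Image> seen;
    std::vector<Image> out;
    while (static_cast<int>(out.size()) < count) {
        Image img = random_image(n, rng);
        if (seen.insert(img).second) out.push_back(img);  // first occurrence only
    }
    return out;
}
```

Because the returned images are pairwise distinct, taking the first part of the sequence as the training set and the rest as the test set cannot place the same image in both.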

hard: the set of pixel coordinate pairs is split into two disjoint sets by the function void init_drnd(double p_train). The training images are then generated from one set and the test images from the other.
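The following C++17 sketch illustrates the idea behind the hard split. It is not the original init_drnd: split_pixels, sample_from, and PixelSplit are hypothetical, and treating p_train as the probability that a pixel coordinate is assigned to the training side is an assumption.

```cpp
// Hypothetical sketch of the "hard" split (not the original init_drnd):
// pixel coordinates are partitioned into two disjoint pools; training images
// draw dots only from one pool, test images only from the other.
#include <algorithm>
#include <cassert>
#include <iterator>
#include <random>
#include <utility>
#include <vector>

using Coord = std::pair<int, int>;

struct PixelSplit {
    std::vector<Coord> train;  // coordinates usable in training images
    std::vector<Coord> test;   // coordinates usable in test images
};

// Assumption: each pixel goes to the training pool with probability p_train.
PixelSplit split_pixels(int side, double p_train, std::mt19937& rng) {
    std::bernoulli_distribution to_train(p_train);
    PixelSplit s;
    for (int y = 0; y < side; ++y)
        for (int x = 0; x < side; ++x)
            (to_train(rng) ? s.train : s.test).emplace_back(x, y);
    return s;
}

// Picks n distinct coordinates from one pool (std::sample, C++17).
std::vector<Coord> sample_from(const std::vector<Coord>& pool, int n,
                               std::mt19937& rng) {
    assert(n >= 0 && n <= static_cast<int>(pool.size()));
    std::vector<Coord> dots;
    std::sample(pool.begin(), pool.end(), std::back_inserter(dots), n, rng);
    return dots;
}
```

Since no coordinate pair is shared between the two pools, every dot position in a test image is one that never occurs in any training image.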

Regenerating missing images.

"1px": in the earlier datasets, dots were 3x3 pixels; now they are 1 pixel.

The case of 0 dots is handled separately.
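The "1px" change can be illustrated with the C++ sketch below. Canvas and draw_dot are hypothetical, and centering a 3x3 dot on its sampled coordinate with clipping at the image border is an assumption. With 1-pixel dots on distinct positions, the number of dots equals the number of lit pixels, whereas 3x3 dots can overlap or be clipped, so the lit-pixel count is not simply 9 times the number of dots.

```cpp
// Hypothetical sketch contrasting 1-pixel and 3x3 dots; not from the sources.
#include <cassert>
#include <vector>

struct Canvas {
    int side;
    std::vector<unsigned char> px;  // row-major, 0 = background
    explicit Canvas(int s) : side(s), px(s * s, 0) {}
    int lit() const {               // number of non-background pixels
        int c = 0;
        for (auto p : px) c += (p != 0);
        return c;
    }
};

// radius 0 draws a 1px dot; radius 1 draws a 3x3 dot centered at (x, y),
// clipped at the image border.
void draw_dot(Canvas& cv, int x, int y, int radius) {
    assert(radius >= 0);
    for (int dy = -radius; dy <= radius; ++dy)
        for (int dx = -radius; dx <= radius; ++dx) {
            int cx = x + dx, cy = y + dy;
            if (cx >= 0 && cx < cv.side && cy >= 0 && cy < cv.side)
                cv.px[cy * cv.side + cx] = 255;
        }
}
```

For instance, two 3x3 dots centered at (4, 4) and (5, 5) share a 2x2 block, so they light 14 pixels rather than 18.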

Uniform test set

Last modified: 13 June 2019

Norbert Bátfai, PhD
batfai.norbert@inf.unideb.hu
https://arato.inf.unideb.hu/batfai.norbert/
IT Dept, UD