2. Dataset loading utilities
This package also features helpers to fetch larger datasets and parameters commonly used by the machine learning community to benchmark algorithms on data that comes from the ‘real world’.
2.1. Sample images
sklearn-theano embeds sample JPEG images that their authors published under a Creative Commons license. These images are useful for testing algorithms and pipelines on images and other multidimensional data.
By default, images are stored with the uint8 dtype to save memory. Machine learning algorithms often work best if the input is first converted to a floating-point representation. Also, if you plan to use pylab.imshow, don’t forget to scale the values to the range 0 to 1.
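A minimal sketch of the uint8-to-float conversion, assuming NumPy. The random array stands in for a loaded sample image and is hypothetical. `to_unit_float` is an illustrative helper and is not part of sklearn-theano. Dividing by 255 maps the full uint8 range onto 0 to 1. The sketch also shows the memory cost of the float copy, which is why images are stored as uint8 in the first place.

```python
import numpy as np

def to_unit_float(image):
    """Convert a uint8 image (values 0-255) to float32 values in [0, 1]."""
    image = np.asarray(image)
    if image.dtype != np.uint8:
        raise TypeError("expected a uint8 image, got %s" % image.dtype)
    # 255 is the largest uint8 value, so the result lies in [0, 1].
    return image.astype(np.float32) / 255.0

# Hypothetical stand-in for a loaded sample image: a 4x4 RGB uint8 array.
rng = np.random.RandomState(0)
image = rng.randint(0, 256, size=(4, 4, 3)).astype(np.uint8)

float_image = to_unit_float(image)
print(image.dtype, image.nbytes)              # uint8 48
print(float_image.dtype, float_image.nbytes)  # float32 192 (4x the memory)
# float_image is now in the 0-1 range expected for float input to pylab.imshow.
```

Using float32 rather than float64 is a design choice here. It keeps the float copy at four times the uint8 size instead of eight times.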
2.2. Sample generators
In addition, sklearn-theano includes various random sample generators that can be used to build artificial datasets of controlled size.
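This section does not list the generators themselves. The sketch below only illustrates what "artificial datasets of controlled size" means in practice, assuming NumPy. `make_gaussian_blobs` and its parameters are hypothetical and are not sklearn-theano's API. The caller fixes the number of samples, features and classes, and a seed makes the output reproducible.

```python
import numpy as np

def make_gaussian_blobs(n_samples=100, n_features=2, n_classes=3,
                        cluster_std=1.0, random_state=None):
    """Hypothetical generator: isotropic Gaussian clusters of a chosen size.

    Returns X with shape (n_samples, n_features) and integer labels y
    with shape (n_samples,).
    """
    rng = np.random.RandomState(random_state)
    # One random centre per class, drawn uniformly from a wide box.
    centers = rng.uniform(-10.0, 10.0, size=(n_classes, n_features))
    # Spread samples over classes as evenly as possible, then shuffle order.
    y = np.arange(n_samples) % n_classes
    rng.shuffle(y)
    # Each sample is its class centre plus Gaussian noise.
    X = centers[y] + cluster_std * rng.randn(n_samples, n_features)
    return X, y

X, y = make_gaussian_blobs(n_samples=500, n_features=4, n_classes=5,
                           random_state=42)
print(X.shape, y.shape)  # (500, 4) (500,)
```

All randomness comes from a single `RandomState` seeded by `random_state`. As a result, the same arguments always reproduce the same dataset, which keeps benchmark runs comparable.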
2.3. Larger datasets
sklearn-theano also includes downloaders for larger datasets that can be used for something closer to “real world” testing.
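Downloaders for large datasets commonly fetch a file once and reuse the local copy on later calls. Here is a rough sketch of that download-once-then-cache pattern using only the Python standard library. `fetch_cached`, its `data_home` default and the example URL are all hypothetical. They do not describe sklearn-theano's actual downloaders or where they store data.

```python
import os
import tempfile
from urllib.request import urlretrieve

def fetch_cached(url, data_home=None, filename=None):
    """Hypothetical downloader: fetch `url` once, then reuse the local copy.

    The default data_home (a folder under the system temp directory) is an
    assumption made for this sketch only.
    """
    if data_home is None:
        data_home = os.path.join(tempfile.gettempdir(),
                                 "sklearn_theano_data_sketch")
    os.makedirs(data_home, exist_ok=True)
    if filename is None:
        # Default to the last path component of the URL.
        filename = url.rstrip("/").split("/")[-1]
    path = os.path.join(data_home, filename)
    if not os.path.exists(path):
        # Download under a temporary name first so an interrupted transfer
        # never leaves a truncated file that looks like a valid cache hit.
        partial = path + ".part"
        urlretrieve(url, partial)
        os.replace(partial, path)
    return path
```

Calling `fetch_cached("https://example.com/datasets/archive.zip")` with a hypothetical URL would download the file on the first call. Every later call would return the cached path without touching the network.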