2. Dataset loading utilities

This package also features helpers to fetch larger datasets and parameters commonly used by the machine learning community to benchmark algorithms on data that comes from the ‘real world’.

2.1. Sample images

sklearn-theano embeds sample JPEG images published under a Creative Commons license by their authors. These images can be useful for testing algorithms and pipelines on images and other multidimensional data.



The default coding of images is based on the uint8 dtype to save memory. Machine learning algorithms often work best if the input is first converted to a floating point representation. Also, if you plan to use pylab.imshow, don’t forget to scale the values to the range 0 - 1, as done in the following example.


  • plot_single_localization.py
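
The uint8-to-float conversion described above can be sketched with plain NumPy. This is a minimal illustration rather than code from the example script: the small random array is a synthetic stand-in for a loaded sample image, and the names image_uint8 and image_float are chosen here for illustration only.

```python
import numpy as np

# Assumption: a synthetic height x width x RGB array stands in for a
# sample image, which would be stored as uint8 with values in [0, 255].
rng = np.random.RandomState(0)
image_uint8 = rng.randint(0, 256, size=(4, 6, 3)).astype(np.uint8)

# Convert to floating point and rescale from [0, 255] to [0, 1].
# Dividing by 255 maps the largest uint8 value exactly to 1.0.
image_float = image_uint8.astype(np.float32) / 255.0

print(image_uint8.dtype, image_float.dtype)
print(image_float.min() >= 0.0, image_float.max() <= 1.0)
```

The resulting float array has the same shape as the original and can be passed to pylab.imshow, which expects floating point image data in the range 0 - 1.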

2.2. Sample generators

In addition, sklearn-theano includes various random sample generators that can be used to build artificial datasets of controlled size.


2.3. Larger datasets

sklearn-theano also includes downloaders for larger datasets that can be used for something closer to “real world” testing.