Believe it or not, loading the entire dataset into memory is NOT the best idea. If you're dealing with a small dataset it might work, but it's a waste of resources, and worse, if you're working on a huge dataset like ImageNet, it won't work at all. Python generators are lazy, which means they are iterables that give you the data upon request, unlike regular lists that just keep all of the data in memory all the time.

TensorFlow Keras has a Sequence class that can be used for this purpose. Working with images is a good example, so let's say that you have pictures of objects that you need to localize. Your features are the images, your labels are (x, y, h, w), the coordinates and dimensions of the bounding box, and the labels and image names are stored in a CSV file.

Let's define an initializer. It is going to take the information needed to get the data, such as:

- The CSV file in which image names and labels are stored.
- The directory containing all of the images.
- The image output size after preprocessing.

It will also take a shuffle flag and the size of each batch.

```python
from tensorflow.keras.utils import Sequence

class DataGenerator(Sequence):
    def __init__(self, csv_file, base_dir, output_size, shuffle=False, batch_size=10):
        """
        :param csv_file: file in which image names and numeric labels are stored
        :param base_dir: the directory in which all images are stored
        :param output_size: image output size after preprocessing
        :param shuffle: shuffle the data after each epoch
        :param batch_size: the size of each batch returned by __getitem__
        """
```

Now let's define some special methods, starting with the one called in the initializer: on_epoch_end(), which, as the name may suggest, is called after each epoch, duh!
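To make the overall shape of the class concrete before we fill in each method, here is a minimal runnable sketch of such a generator. The CSV row layout (image name followed by x, y, h, w), the placeholder `__getitem__` body, and the `ImportError` fallback for environments without TensorFlow are assumptions for illustration, not the final implementation:

```python
import csv
import math
import random

try:
    from tensorflow.keras.utils import Sequence
except ImportError:
    # Fallback so the sketch runs without TensorFlow installed;
    # in a real project you would subclass the actual Sequence class.
    Sequence = object

class DataGenerator(Sequence):
    """Lazy batch provider: data is served per batch, on request."""

    def __init__(self, csv_file, base_dir, output_size, shuffle=False, batch_size=10):
        super().__init__()
        self.base_dir = base_dir
        self.output_size = output_size
        self.shuffle = shuffle
        self.batch_size = batch_size
        # Assumed CSV layout: image_name, x, y, h, w (one sample per row).
        with open(csv_file, newline="") as f:
            self.samples = [row for row in csv.reader(f)]
        self.on_epoch_end()

    def on_epoch_end(self):
        # Keras calls this after every epoch; reshuffle if requested.
        if self.shuffle:
            random.shuffle(self.samples)

    def __len__(self):
        # Number of batches per epoch (last batch may be smaller).
        return math.ceil(len(self.samples) / self.batch_size)

    def __getitem__(self, idx):
        # Placeholder: a real implementation would load the images from
        # base_dir and resize them to output_size here.
        start = idx * self.batch_size
        return self.samples[start:start + self.batch_size]
```

With three rows in the CSV and `batch_size=2`, `len(gen)` is 2 and the second batch holds the single leftover sample, which is exactly the laziness we want: nothing beyond the CSV rows is held in memory until a batch is requested.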