Data


source

SLDataModule

 SLDataModule (parq_file, data_path='', train_tfms=None, test_tfms=None,
               train_size=0.8, valid_size=0.2, test_size=0.1,
               img_size=512, channels=3, batch_size=64, num_workers=6,
               pin_memory=False, calc_stats=False, stats_img_size=224,
               stats_file='img_stats.pkl', seed=42, **kwargs)

DataModule for single label classification; see the Usage section below for a full end-to-end example.


source

SLDataset

 SLDataset (data, data_path='', tfms=None, img_idx=0, label_idx=1,
            channels=3, class_names=None, **kwargs)

Dataset for single label classification.
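
A minimal sketch of building an SLDataset by hand, assuming data accepts a pandas DataFrame with image paths (relative to data_path) in column 0 and integer labels in column 1, as the img_idx and label_idx defaults suggest; the file names, transform, and class names are illustrative.

import pandas as pd
from timm.data import create_transform

# hypothetical frame: column 0 holds image paths relative to data_path, column 1 the integer label
df = pd.DataFrame({'image': ['images/0001.png', 'images/0002.png'],
                   'label': [0, 1]})

ds = SLDataset(df, data_path='../img_data/',
               tfms=create_transform(224, mean=[0.5], std=[0.5]),  # illustrative single-channel stats
               img_idx=0, label_idx=1, channels=1,
               class_names=['1 Columns', '2 Columns'])
sample = ds[0]  # expected to be a dict with 'image' and 'label' keys, matching the batch keys in Usage below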


source

StatsDataset

 StatsDataset (data, data_path='', img_idx=0, tfms=None, channels=3,
               **kwargs)

Dataset for calculating the mean and std of a dataset.
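
A minimal sketch of constructing a StatsDataset directly, assuming the same DataFrame layout as above (image paths in column 0); the transform is an illustrative resize-and-tensorize pipeline chosen so the raw pixel statistics are left untouched.

import pandas as pd
from torch.utils.data import DataLoader
from torchvision import transforms

df = pd.DataFrame({'image': ['images/0001.png', 'images/0002.png']})

# resize + ToTensor only, no Normalize, so the raw statistics can be measured
stats_tfms = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

stats_ds = StatsDataset(df, data_path='../img_data/', img_idx=0, tfms=stats_tfms, channels=1)
stats_dl = DataLoader(stats_ds, batch_size=32, num_workers=4)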


source

get_data_stats

 get_data_stats (df, data_path='', img_idx=0, img_size=224, channels=3,
                 stats_percentage=0.7, bs=32, num_workers=4, device=None)

Calculates the mean and std of a dataset.
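
A minimal sketch of calling get_data_stats on its own, assuming df carries image paths in column 0 and that stats_percentage controls the fraction of rows sampled for the estimate; the return format is not shown on this page, so it is captured generically.

import pandas as pd

df = pd.read_parquet('../img_data/data.parquet')  # image paths in column 0

# single-channel stats estimated over a 70% sample of the rows
stats = get_data_stats(df, data_path='../img_data/', img_idx=0, img_size=224,
                       channels=1, stats_percentage=0.7, bs=32, num_workers=4)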


source

split_df

 split_df (train_df, test_size=0.15, stratify_idx=1)

Splits a dataframe into train and test sets, stratified on the column at stratify_idx.
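
A minimal sketch, assuming split_df returns the two resulting frames (the return value is not documented here); the path is illustrative.

import pandas as pd

df = pd.read_parquet('../img_data/data.parquet')

# hold out 15% of the rows, stratified on the label column at index 1
train_df, test_df = split_df(df, test_size=0.15, stratify_idx=1)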

Usage

from pathlib import Path
from timm.data import create_transform

data_path = Path('../img_data/')
imgs_path = data_path/'images'
parq_file = data_path/'data.parquet'
img_size = 512
channels = 1
batch_size = 4
# The mean/std below are placeholders: with calc_stats=True the real dataset stats are
# computed during prepare_data and end up in the Normalize transform (see the output below).

train_tfms = create_transform(img_size, color_jitter=None, hflip=0.5, vflip=0.5, scale=(0.8,1.0),
                              is_training=True, mean=[1,2,3], std=[4,5,6])

test_tfms = create_transform(img_size, mean=[1,2,3], std=[4,5,6])

dm = SLDataModule(parq_file, data_path=data_path, img_size=img_size, batch_size=batch_size,
                  train_tfms=train_tfms, test_tfms=test_tfms, channels=channels, num_workers=6, calc_stats=True)
dm.prepare_data()
dm.train_tfms
Global seed set to 42
Calculating dataset mean and std. This may take a while.

Mean loop:
Batch: 1/1

Std loop:
Batch: 1/1

Done.
Compose(
    RandomResizedCropAndInterpolation(size=(512, 512), scale=(0.8, 1.0), ratio=(0.75, 1.3333), interpolation=bilinear)
    RandomHorizontalFlip(p=0.5)
    RandomVerticalFlip(p=0.5)
    ToTensor()
    Normalize(mean=tensor([0.7367]), std=tensor([0.4174]))
)
dm.setup()
tb = next_batch(dm.train_dataloader())
img = tb['image'][0]
tb['image'].shape, img.shape
(torch.Size([4, 1, 512, 512]), torch.Size([1, 512, 512]))
preds = [f'Label: {l}\nClass: {dm.idx_to_class[int(l)]} Columns' for l in tb['label']]
preds
['Label: 1\nClass: 2 Columns',
 'Label: 0\nClass: 1 Columns',
 'Label: 1\nClass: 2 Columns',
 'Label: 2\nClass: 3 Columns']
show_img([img for img in tb['image']], cmap='gray', titles=preds)
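
The seed message and the prepare_data/setup calls follow the usual Lightning DataModule flow, and the train_size/valid_size/test_size arguments imply validation and test splits as well. A sketch assuming the standard val_dataloader hook is implemented:

vb = next_batch(dm.val_dataloader())  # assumes the standard Lightning val_dataloader hook exists
vb['image'].shape, vb['label'].shape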