A Generative Adversarial Network-based Method for High Fidelity Synthetic Data Augmentation
J. Bier, S. Sridharan, S. Sornapudi, Q. Hu, S. Kumpatla
Corteva Agriscience

Digital Agriculture has led to new phenotyping methods that apply artificial intelligence and machine learning to image and video data collected from lab, greenhouse, and field environments. The availability of accurately annotated image and video data remains a bottleneck for developing most machine learning and deep learning models. Typically, deep learning models require thousands of unique samples to accurately learn a given task. However, manual annotation of a large dataset will either take a long time if done by a single annotator or drive up costs significantly if done by many expert annotators. To provide some relief from this data bottleneck, automatic augmentation algorithms can be employed; however, traditional techniques such as rotation, cropping, resampling, and adjusting color, white balance, and contrast are too simplistic and have significant limitations. For example, if the orientation of the samples is important, then rotation is not an available technique. Clearly, a better approach to data augmentation is required, one that addresses these issues while maintaining the original properties of the data. The Generative Adversarial Network (GAN), a recent development in deep learning, offers such a solution: two neural networks, one that generates data and one that detects fake/synthetic data, compete against each other, improving the accuracy of both. A carefully trained GAN model can generate high-fidelity images that can be used as a tool for massive data augmentation. We propose a novel data augmentation method that leverages the CycleGAN and Mask R-CNN models to generate unique, synthetic data that are indistinguishable from real data. A few hundred manually annotated samples are used to train a Mask R-CNN model to generate additional semantic segmentation masks. These generated masks and their corresponding real images are then used to train the CycleGAN model.
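The adversarial idea underlying a GAN can be illustrated with a minimal sketch: a generator G maps noise to samples, a discriminator D scores samples as real or fake, and alternating updates push each network against the other. This is a toy illustration of the general GAN principle, not the authors' CycleGAN; the tiny affine generator, logistic discriminator, and 1-D Gaussian "real" data are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# "Real" data the generator should learn to imitate: samples from N(3, 0.5).
def real_batch(n):
    return rng.normal(3.0, 0.5, size=(n, 1))

# Generator: affine map from 1-D noise, G(z) = z @ g_w + g_b.
g_w, g_b = np.array([[1.0]]), np.zeros(1)
# Discriminator: logistic regression, D(x) = sigmoid(x @ d_w + d_b).
d_w, d_b = np.array([[0.1]]), np.zeros(1)

lr, batch = 0.05, 64
for step in range(2000):
    z = rng.normal(size=(batch, 1))
    fake = z @ g_w + g_b
    real = real_batch(batch)

    # Discriminator update: push D(real) -> 1 and D(fake) -> 0.
    for x, label in ((real, 1.0), (fake, 0.0)):
        p = sigmoid(x @ d_w + d_b)
        grad = p - label                      # d(BCE)/d(logit)
        d_w -= lr * (x.T @ grad) / batch
        d_b -= lr * grad.mean(axis=0)

    # Generator update: push D(G(z)) -> 1, i.e. fool the discriminator.
    fake = z @ g_w + g_b
    p = sigmoid(fake @ d_w + d_b)
    grad_logit = (p - 1.0) * d_w[0, 0]        # chain rule through D
    g_w -= lr * (z.T @ grad_logit) / batch
    g_b -= lr * grad_logit.mean(axis=0)

# After training, generated samples should cluster near the real mean (~3).
print("mean of generated samples:",
      float((rng.normal(size=(1000, 1)) @ g_w + g_b).mean()))
```

In the paper's setting the same competition plays out between a CycleGAN generator producing leaf images from segmentation masks and a discriminator judging them against real lab images.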
Once the CycleGAN has been trained, a virtually unlimited quantity of diverse training samples and annotations can be generated for other downstream processes. To improve the quality of the segmentation masks, the Mask R-CNN model is retrained with the generated samples and their corresponding annotations. This process can be performed iteratively to improve the accuracy of both the Mask R-CNN and CycleGAN models. In this paper, we also present Toodle, a Python Dash web app integrated with the CycleGAN model for interactive synthetic data generation on lab images collected to study insect damage to maize and soy leaf samples.
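The iterative bootstrap described above can be sketched as a simple loop. The training and generation helpers below are hypothetical stubs standing in for the real Mask R-CNN and CycleGAN steps; the seed size of 300 and three iterations are illustrative assumptions, not the authors' configuration.

```python
def train_mask_rcnn(images, masks):
    """Stub: would fit a Mask R-CNN segmentation model on (image, mask) pairs."""
    return {"n_train": len(images)}

def predict_masks(model, images):
    """Stub: would run Mask R-CNN inference to produce segmentation masks."""
    return [f"mask_for_{img}" for img in images]

def train_cyclegan(masks, images):
    """Stub: would fit a CycleGAN to translate between masks and images."""
    return {"pairs": len(masks)}

def generate_synthetic(cyclegan, n):
    """Stub: would sample n new (mask, image) pairs from the trained CycleGAN."""
    return [(f"synth_mask_{i}", f"synth_img_{i}") for i in range(n)]

# Seed: a few hundred manually annotated (image, mask) pairs.
images = [f"img_{i}" for i in range(300)]
masks = [f"mask_{i}" for i in range(300)]

for it in range(3):  # rounds of mutual refinement
    seg_model = train_mask_rcnn(images, masks)
    masks = predict_masks(seg_model, images)        # refreshed masks
    cyclegan = train_cyclegan(masks, images)
    synth = generate_synthetic(cyclegan, n=1000)    # unlimited in principle
    # Fold synthetic pairs back in to retrain Mask R-CNN next round.
    masks += [m for m, _ in synth]
    images += [x for _, x in synth]
```

Each round grows the pool of annotated pairs, so the segmentation model and the generator can keep improving each other, which is the mutual-refinement loop the abstract describes.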

Keywords: digital agriculture, big data, data augmentation, machine learning, deep learning, generative adversarial networks, GAN