Brian Lee, Logan Eisenbeiser
Virginia Tech National Security Institute, 2022
ITEA Article
Ever since their introduction in 2014 by Ian Goodfellow and colleagues,
generative adversarial networks (GANs) have been a popular subject of study
in the field of generative modeling. They have gained particular
popularity in the community because of the comparatively high realism
and originality of the samples they can generate.
Despite the exciting results GANs have achieved, the training process is
often challenging, fraught with known pitfalls that result in poor model
behavior. Some of these training failures arise from the adversarial
nature of the model's two networks, in which one network's performance improves
at the expense of the other's. Other training failures arise from computational
constraints: not only must training run long enough, but the training data
must also be sufficiently diverse and abundant to produce realistic results.
In this project, I worked with my supervisor, Logan Eisenbeiser, to study
how to train GANs capable of realistic image generation in less training time
and with less data by way of transfer learning.
Transfer learning is the practice of continuing the training of a pretrained
network on a new dataset, called the target domain.
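To make the idea concrete, here is a minimal sketch of the transfer learning pattern
in PyTorch. It fine-tunes a classifier rather than a GAN, purely to show the mechanics;
the random batch stands in for a small target-domain dataset.

```python
import torch
import torchvision

# Load a network pretrained on a large source domain (ImageNet here),
# then continue training it on target-domain data.
model = torchvision.models.resnet18(pretrained=True)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

# Dummy stand-in for a batch from a small target-domain dataset
# (8 RGB images at 224x224 with random labels).
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 1000, (8,))

# One fine-tuning step: because the pretrained weights are the starting
# point, far fewer updates (and examples) are needed than training from scratch.
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```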
First, we needed a way to curate small datasets of images. To do so,
we created a data pipeline to automate the scraping and filtering of images.
For the scraping portion of the pipeline, we scraped images
from public Instagram posts by hashtag using the Python Instascrape package.
Images from Instagram emulate the kind of variability that exists in domains
limited in specificity or in the abundance of examples.
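A rough sketch of the scraping step is below. Instascrape's interface has changed
across versions and Instagram frequently changes its page structure (newer versions
may require session headers), so treat the calls here as illustrative rather than exact.

```python
from instascrape import Hashtag

# Scrape metadata for a hashtag page, then download images from its
# recent public posts into a raw dataset directory.
tag = Hashtag("https://www.instagram.com/explore/tags/beachsunsets/")
tag.scrape()

for i, post in enumerate(tag.get_recent_posts()):
    post.scrape()
    post.download(fp=f"raw_dataset/beachsunsets_{i}.png")
```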
After scraping and storing these images, the filter portion of the pipeline
sorts through the raw dataset and discards poor training examples, along with
images containing elements that are difficult for the GAN to train on. This entails
keeping only images of sufficient resolution and sharpness, and automatically rejecting
images that contain text or human faces or that are grayscale. What is left behind
is a small dataset ready for training.
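The sketch below illustrates these filters with OpenCV. The thresholds are
illustrative, not the exact values from our pipeline, and the text-rejection
step (e.g., via an OCR library) is omitted for brevity.

```python
import cv2
import numpy as np

# Illustrative thresholds; the pipeline's actual values were tuned separately.
MIN_SIDE = 256         # minimum width/height in pixels
MIN_SHARPNESS = 100.0  # variance of the Laplacian; lower means blurrier

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def keep_image(path: str) -> bool:
    img = cv2.imread(path)
    if img is None:
        return False  # unreadable file

    # Reject low-resolution images.
    h, w = img.shape[:2]
    if min(h, w) < MIN_SIDE:
        return False

    # Reject grayscale images (all color channels nearly identical).
    b, g, r = cv2.split(img)
    if np.allclose(b, g, atol=5) and np.allclose(g, r, atol=5):
        return False

    # Reject blurry images via the variance of the Laplacian.
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    if cv2.Laplacian(gray, cv2.CV_64F).var() < MIN_SHARPNESS:
        return False

    # Reject images containing human faces (Haar cascade detector).
    if len(face_cascade.detectMultiScale(gray, 1.1, 5)) > 0:
        return False

    return True
```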
Training experiments were then run on Virginia Tech Advanced Research
Computing (ARC) GPU clusters. We used NVIDIA's StyleGAN3, a model whose
style-based architecture is known for fine-grained control over the appearance
of generated images. Using batch jobs of Python scripts, GANs were trained with
hyperparameters found through experimentation, along with a chosen pretrained
network, dataset, and training length.
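A single run looked roughly like the command below, wrapped here in Python.
The flags come from NVIDIA's public stylegan3 repository (train.py); the paths
and values are illustrative, and on ARC each command was submitted as a batch job.

```python
import subprocess

# Launch one StyleGAN3 training run (flags per the public stylegan3 train.py).
subprocess.run([
    "python", "train.py",
    "--outdir=training-runs",          # where snapshots and metrics are written
    "--cfg=stylegan3-t",               # StyleGAN3 configuration
    "--data=datasets/bettafish.zip",   # curated target-domain dataset
    "--gpus=1",
    "--batch=32",
    "--gamma=8.2",                     # R1 regularization weight
    "--kimg=1000",                     # training length (thousands of real images shown)
    "--snap=10",                       # snapshot interval
    "--resume=pretrained/source.pkl",  # transfer learning: start from pretrained weights
], check=True)
```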
The performance of these trained GANs, and the realism of their generated images,
is measured by the Fréchet Inception Distance (FID), a score that measures the
statistical distance between the distributions of real and generated images.
The model generates example images at snapshots during the training process,
allowing performance metrics to be assembled throughout training.
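Concretely, FID compares feature statistics from an Inception network: with μ_r, Σ_r
the mean and covariance of features extracted from real images, and μ_g, Σ_g those
from generated images, the standard definition is

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

A lower FID means the feature distribution of the generated images is closer
to that of the real dataset.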
As hypothesized, certain datasets resulted in better-quality generations
than others. In addition, the choice of pretrained network seemed to
yield different training results as well.
First, I chose images from "#beachsunsets" to serve as a control dataset because of
its relatively simple content and geometry. While some datasets performed worse
than this control, with higher FID scores, others performed better.
A dataset scraped from "#bettafish" saw roughly a 21% reduction in FID score compared
to the control dataset as training continued. Sample generations from the networks
seemed to confirm this improvement, as the generations appeared consistent and lifelike,
with the generated fish possessing anatomically correct features.