NVIDIA breakthrough emulates images from small datasets for groundbreaking AI training

Lead Editor — Mon, 07 Dec 2020 16:08:23 +0000

NVIDIA’s latest breakthrough generates new images that emulate those in small existing datasets, with truly groundbreaking potential for AI training.

The company demonstrated its latest AI model using a small dataset of artwork from the Metropolitan Museum of Art, just a fraction of the size typically used to train a Generative Adversarial Network (GAN).

From this dataset, NVIDIA’s AI created new images that replicate the style of the original artworks. These images can then be used to help train further AI models.

The AI achieved this feat by applying a breakthrough neural network training technique to NVIDIA’s popular StyleGAN2 model.

The technique, called Adaptive Discriminator Augmentation (ADA), reduces the number of training images required by 10 to 20 times while still producing high-quality results, according to NVIDIA.

David Luebke, VP of Graphics Research at NVIDIA, said:

“These results mean people can use GANs to tackle problems where vast quantities of data are too time-consuming or difficult to obtain.
I can’t wait to see what artists, medical experts and researchers use it for.”

Healthcare is a particularly exciting field where NVIDIA’s research could be applied. For example, it could help to create cancer histology images to train other AI models.

The breakthrough could also help address problems with many of today’s datasets.

Large datasets are often required for AI training but aren’t always available. And even when they are, it is difficult to ensure that their content is suitable and does not unintentionally lead to algorithmic bias.

Earlier this year, MIT was forced to remove a large dataset called 80 Million Tiny Images. The dataset had been popular for training AI models but was found to contain images labelled with racist, misogynistic, and other unacceptable terms.

A statement on MIT’s website claims it was unaware of the offensive labels and they were “a consequence of the automated data collection procedure that relied on nouns from WordNet.”

The statement goes on to explain that, because the dataset contains 80 million images measuring just 32×32 pixels each, manual inspection would be almost impossible and could not guarantee that all offensive images were removed.
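To put that scale in perspective, suppose (purely as an illustrative assumption, not a figure from MIT) that a reviewer could check one image per second, nonstop. The arithmetic for the full dataset, and for a small curated set of 2,000 images, works out roughly as follows:

```latex
\begin{aligned}
80\,000\,000 \times 1\,\text{s} &= 8 \times 10^{7}\,\text{s} \approx \frac{8 \times 10^{7}}{86\,400}\,\text{days} \approx 926\ \text{days} \approx 2.5\ \text{years} \\
2\,000 \times 1\,\text{s} &= 2\,000\,\text{s} \approx 33\ \text{minutes}
\end{aligned}
```

Even at this optimistic pace, checking every image in 80 Million Tiny Images would take years, whereas a set of a couple of thousand images could be reviewed in well under an hour.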

By starting with a small dataset that can feasibly be checked by hand, a technique like NVIDIA’s ADA could be used to create new images that emulate the originals, scaling the dataset up to the size required for training AI models.

In a blog post, NVIDIA wrote:

“It typically takes 50,000 to 100,000 training images to train a high-quality GAN. But in many cases, researchers simply don’t have tens or hundreds of thousands of sample images at their disposal.
With just a couple thousand images for training, many GANs would falter at producing realistic results. This problem, called overfitting, occurs when the discriminator simply memorizes the training images and fails to provide useful feedback to the generator.”
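Based on our reading of the ADA paper, its answer to this overfitting problem is to augment every image the discriminator sees (real and generated) with some probability p, and to tune p automatically during training. One heuristic the paper describes tracks r_t, the average sign of the discriminator’s outputs on real images, which drifts towards 1 as the discriminator starts memorising them; p is nudged up when r_t exceeds a target (0.6 in the paper) and down otherwise, every four minibatches. The following is a minimal, hypothetical Python sketch of that control loop, not NVIDIA’s implementation: the class name, `maybe_flip`, and the default values are our assumptions, and the real pipeline uses far more augmentations than a single horizontal flip.

```python
import random


def sign(x: float) -> float:
    """Scalar sign: -1.0, 0.0 or 1.0."""
    return float((x > 0) - (x < 0))


class AdaptiveAugmentProbability:
    """Hypothetical controller for the augmentation probability p.

    Loosely follows the r_t overfitting heuristic described for ADA:
    r_t = E[sign(D(x_real))] is near 0 when the discriminator is unsure
    about real images and approaches 1 as it starts to memorise them.
    """

    def __init__(self, batch_size: int, target: float = 0.6,
                 interval: int = 4, ramp_images: int = 500_000):
        self.p = 0.0              # start with no augmentation at all
        self.target = target      # desired value of r_t (assumed default)
        self.interval = interval  # adjust p once every `interval` minibatches
        # Step size chosen so p could climb from 0 to 1 after `ramp_images` real images.
        self.step = batch_size * interval / ramp_images
        self._sign_sum = 0.0
        self._count = 0
        self._batches = 0

    def observe(self, real_logits) -> float:
        """Record D's raw outputs for one minibatch of real images; return the current p."""
        for logit in real_logits:
            self._sign_sum += sign(logit)
            self._count += 1
        self._batches += 1
        if self._batches % self.interval == 0 and self._count:
            r_t = self._sign_sum / self._count  # estimate of E[sign(D(x_real))]
            # r_t above target suggests memorisation: augment more; otherwise augment less.
            self.p += sign(r_t - self.target) * self.step
            self.p = min(max(self.p, 0.0), 1.0)  # keep p a valid probability
            self._sign_sum, self._count = 0.0, 0
        return self.p


def maybe_flip(image, p, rng=random):
    """Toy augmentation: mirror a 2-D image (a list of rows) with probability p.

    The same p-controlled augmentation would be applied to real *and*
    generated images before the discriminator sees them.
    """
    if rng.random() < p:
        return [row[::-1] for row in image]
    return image


# Usage: a discriminator that is confidently "right" on every real image pushes p up.
ada = AdaptiveAugmentProbability(batch_size=64)
for _ in range(4):
    ada.observe([2.5] * 64)  # all positive logits -> r_t = 1.0, above the 0.6 target
print(ada.p)                 # one step: 64 * 4 / 500_000
```

Adjusting p in small, fixed steps rather than jumping straight to a value keeps the augmentation strength changing smoothly as training progresses.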

You can find NVIDIA’s full research paper here (PDF). The paper is being presented at this year’s NeurIPS conference as one of a record 28 NVIDIA Research papers accepted to the prestigious conference.

The post NVIDIA breakthrough emulates images from small datasets for groundbreaking AI training appeared first on AI News.

MIT has removed a dataset which leads to misogynistic, racist AI models

Lead Editor — Thu, 02 Jul 2020 15:43:05 +0000

MIT has apologised for, and taken offline, a dataset that can train AI models to exhibit misogynistic and racist tendencies.

The dataset in question is called 80 Million Tiny Images and was created in 2008. Designed for training AIs to detect objects, the dataset is a huge collection of pictures which are individually labelled based on what they feature.

Machine-learning models are trained using these images and their labels. When an image of a street is fed into an AI trained on such a dataset, the model could tell you about the things it contains, such as cars, streetlights, pedestrians, and bikes.

Two researchers – Vinay Prabhu, chief scientist at UnifyID, and Abeba Birhane, a PhD candidate at University College Dublin in Ireland – analysed the images and found thousands of concerning labels.

MIT’s training set was found to label women as “bitches” or “whores,” and people from BAME communities with the kind of derogatory terms I’m sure you don’t need me to write. The Register notes the dataset also contained close-up images of female genitalia labelled with the C-word.

The Register alerted MIT to the issues Prabhu and Birhane had found with the dataset, and the institute promptly took it offline. MIT went a step further and urged anyone using the dataset to stop using it and delete any copies.

A statement on MIT’s website claims it was unaware of the offensive labels and they were “a consequence of the automated data collection procedure that relied on nouns from WordNet.”

The statement goes on to explain that, because the dataset contains 80 million images measuring just 32×32 pixels each, manual inspection would be almost impossible and cannot guarantee that all offensive images will be removed.

“Biases, offensive and prejudicial images, and derogatory terminology alienates an important part of our community – precisely those that we are making efforts to include. It also contributes to harmful biases in AI systems trained on such data,” wrote Antonio Torralba, Rob Fergus, and Bill Freeman from MIT.

“Additionally, the presence of such prejudicial images hurts efforts to foster a culture of inclusivity in the computer vision community. This is extremely unfortunate and runs counter to the values that we strive to uphold.”

You can find a full preprint of Prabhu and Birhane’s paper here (PDF).

The post MIT has removed a dataset which leads to misogynistic, racist AI models appeared first on AI News.