Facebook created more than 10,000 ‘hateful memes’ to help researchers learn how to spot hate speech

  • Facebook has built a dataset of thousands of “hateful memes” for researchers.
  • Researchers will be able to use the dataset to learn how to identify online hate speech and better protect against it.
  • The dataset contains “multimodal content” – pieces of content whose meaning comes from combining multiple elements, which can be difficult for AI to interpret.
  • Facebook also released its latest community enforcement report on Tuesday.
Facebook has made more than 10,000 racist, sexist, and hateful memes – and it’s for a good cause.

On Tuesday, Facebook announced it has created a dataset of more than 10,000 “hateful memes” that will be made available to select researchers working to tackle hate speech online. The dataset was announced alongside the company’s latest community enforcement report – a report detailing the volume of harmful content that the social networking giant detects and takes down from its platform, from hate speech to illegal material.

Modern content moderation relies heavily on machine learning and artificial intelligence – but such systems typically need to be trained on numerous examples of a particular kind of content before they can recognise it reliably. The “hateful memes” dataset created by Facebook is intended to provide a readily available corpus that researchers can analyse to build technology that better detects hateful content in the future. Facebook also rebuilt the memes using licensed imagery from Getty Images to avoid copyright issues.

The dataset includes material that is racist, sexist, and incites violence, Facebook said in a blog post: “Our examples also cover a wide variety of protected categories (such as religion, gender, and sexual orientation) and types of attacks (such as inciting violence or portraying types of people as criminals or terrorists). The distribution in the data set reflects the real-world distribution found in the original examples.”

The memes are specifically examples of what is called “multimodal content” – content whose full meaning only emerges when its different elements (e.g. text and imagery) are taken into account together. A meme might have an inoffensive caption and a generic photo, but combined in a certain way the two become insulting or hateful.

Very mild examples of multimodal memes shared by Facebook (source: Facebook)

In one very mild example Facebook shared, a photo of an empty desert is captioned “look how many people love you.” Either element taken in isolation would be innocuous – but once combined, they become insulting.
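To illustrate why this kind of content trips up systems that analyse text or images in isolation, here is a minimal sketch, assuming a simple late-fusion approach in PyTorch, of how a multimodal classifier might combine the two signals. The encoders, dimensions, and labels below are illustrative placeholders, not Facebook’s actual baseline models.

```python
# Illustrative sketch only: a minimal late-fusion meme classifier in PyTorch.
# All dimensions and labels are placeholders, not Facebook's actual baselines.
import torch
import torch.nn as nn

class LateFusionMemeClassifier(nn.Module):
    def __init__(self, text_dim=768, image_dim=2048, hidden_dim=512):
        super().__init__()
        # Project each modality into a shared space, then classify the
        # concatenation, so the prediction depends on both signals jointly.
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.image_proj = nn.Linear(image_dim, hidden_dim)
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(2 * hidden_dim, 2),  # benign vs. hateful
        )

    def forward(self, text_emb, image_emb):
        fused = torch.cat(
            [self.text_proj(text_emb), self.image_proj(image_emb)], dim=-1
        )
        return self.classifier(fused)

# Stand-in embeddings for one meme (in practice these would come from trained
# text and image encoders).
text_emb = torch.randn(1, 768)    # caption, e.g. "look how many people love you"
image_emb = torch.randn(1, 2048)  # photo, e.g. an empty desert
logits = LateFusionMemeClassifier()(text_emb, image_emb)
print(logits)
```

In a real system the random tensors would be replaced by embeddings from trained encoders; the point of the sketch is that the classification is a function of both inputs jointly, which is exactly the property the desert-photo example exploits and that unimodal filters miss.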

Given the sensitivity of the dataset, Facebook says it will only be made available to researchers who agree to terms of use governing how they use, share, and store it. It will not be available for the general public to download.

Facebook is also launching a contest for researchers – the Hateful Memes Challenge – with a $100,000 prize pool to encourage them to develop AI models using the dataset.

Do you work at Facebook? Contact Business Insider reporter Rob Price via encrypted messaging app Signal (+1 650-636-6268), encrypted email (robaeprice@protonmail.com), standard email (rprice@businessinsider.com), Telegram/Wickr/WeChat (robaeprice), or Twitter DM (@robaeprice). We can keep sources anonymous. Use a non-work device to reach out. PR pitches by standard email only, please.
