While machine learning models need large amounts of data and computing power, most progress in AI is measured by evaluating models against standardized tests. These tests are called benchmarks.
So, Facebook has created a platform, Dynabench, that essentially rethinks those AI benchmarks through a first-of-its-kind procedure called ‘dynamic adversarial data collection.’ Through this procedure, Dynabench will evaluate different AI models and their benchmarks, using both humans and models to create newer, more challenging data sets that will lead to more flexible, more efficient AI.
To put it simply, Dynabench will evaluate how easily AI systems are fooled by humans. It aims to provide a better indicator of an AI model’s quality than current benchmarks, which have become static and saturated over time. Dynabench will reflect how AI models perform in circumstances where their pre-set data is not enough and humans demand that they act and respond in complex, varied ways.
Suppose a student learns everything and aces the written test, but starts fumbling when questioned in person, because he has ‘learned’ things without accurately ‘understanding’ them. This is what Dynabench is going to help with. AI models with pre-set data points and benchmarks are like that student, and humans are their examiners, posing different questions and challenges through Dynabench to find where a model starts fumbling or gets fooled into making incorrect predictions. The ‘fumbling points’ of these AI models will feed Dynabench’s new, more challenging data sets, which will then be used to train the next generation of AI models. These models can in turn be benchmarked with Dynabench, meaning they become tests for the next generation of AI. This creates a virtuous cycle in the progress and advancement of AI research.
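The cycle described above can be sketched in a few lines of code. This is a minimal, illustrative toy, not Dynabench’s actual API: the keyword-matching “model” and the function names are assumptions made up for this example. The key idea is that only the examples where a human fools the model are kept, and those become training data for the next model generation.

```python
def toy_model(text):
    """A deliberately weak sentiment 'model' based on keyword matching."""
    return "positive" if "good" in text.lower() else "negative"

def collect_adversarial_examples(model, human_attempts):
    """Keep only the examples where the human fooled the model.

    human_attempts is a list of (text, true_label) pairs written by people
    trying to trip the model up.
    """
    fooling = []
    for text, true_label in human_attempts:
        if model(text) != true_label:      # model got it wrong: a 'fumbling point'
            fooling.append((text, true_label))
    return fooling

# Humans probing the model through the (hypothetical) interface:
attempts = [
    ("This movie was good", "positive"),   # model answers correctly
    ("Not good at all", "negative"),       # model is fooled by the word 'good'
    ("Terrible acting", "negative"),       # model answers correctly
]

new_training_data = collect_adversarial_examples(toy_model, attempts)
# → [("Not good at all", "negative")]
# These fooling examples would train the next generation of the model,
# which is then put back in front of humans, closing the loop.
```

In the real platform, the scoring and data collection happen at much larger scale, but the loop has the same shape: humans probe, failures are recorded, and the failures become the next benchmark.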
Dynabench uses people to interrogate AI models: it invites them to visit a website, quiz the models behind it, and score their answers. Today, this kind of testing happens informally, for example when people probe the limits of GPT-3 or evaluate whether chatbots can pass as human. With Dynabench’s platform, however, the failures of these AI models will automatically be used to create benchmarks for future models, helping them evolve and improve over time.
For the time being, Dynabench works only with language models, because they are the easiest kind of AI for humans to interact with. Eventually, though, it may expand to other types of neural networks, such as speech and image recognition systems.