The creators of a revolutionary AI system that can write news stories and works of fiction—dubbed "deepfakes for text"—have taken the unusual step of not releasing their research publicly, for fear of potential misuse.
OpenAI, a nonprofit research company backed by Elon Musk, Reid Hoffman, Sam Altman, and others, says its new AI model, called GPT2, is so good and the risk of malicious use so high that it is breaking from its normal practice of releasing the full research to the public in order to allow more time to discuss the ramifications of the technological breakthrough.
At its core, GPT2 is a text generator. The AI system is fed text, anything from a few words to a whole page, and asked to write the next few sentences based on its predictions of what should come next. The system is pushing the boundaries of what was thought possible, both in terms of the quality of the output, and the wide variety of potential uses.
From a research standpoint, GPT2 is groundbreaking in two ways. One is its size, says Dario Amodei, OpenAI’s research director. The models "were 12 times bigger, and the dataset was 15 times bigger and much broader" than the previous state-of-the-art AI model. It was trained on a dataset containing about 10m articles, selected by trawling the social news site Reddit for links with more than three votes. The vast collection of text weighed in at 40 GB, enough to store about 35,000 copies of Moby Dick.
The amount of data GPT2 was trained on directly affected its quality, giving it more knowledge of how to understand written text. It also led to the second breakthrough. GPT2 is far more general purpose than previous text models. By structuring the text that is input, it can perform tasks including translation and summarization, and pass simple reading comprehension tests, often performing as well as or better than other AIs that have been built specifically for those tasks.
That quality, however, has also led OpenAI to go against its remit of pushing AI forward and keep GPT2 behind closed doors for the immediate future while it assesses what malicious users might be able to do with it. "We need to perform experimentation to find out what they can and can’t do," said Jack Clark, the charity’s head of policy. "If you can’t anticipate all the abilities of a model, you have to prod it to see what it can do. There are many more people than us who are better at thinking what it can do maliciously."
Instead, the goal is to show what is possible to prepare the world for what will be mainstream in a year or two’s time. "I have a term for this. The escalator from hell," Clark said.
"It’s always bringing the technology down in cost and down in price. The rules by which you can control technology have fundamentally changed."
"We’re not saying we know the right thing to do here, we’re not laying down the line and saying ‘this is the way’ … We are trying to develop more rigorous thinking here."
被称为“文本深度伪造”的革命性人工智能系统能够撰写新闻故事和小说作品。因为存在被恶意滥用的风险,它的创造者们这次采取了不同寻常的举措,决定暂不公开研究成果。
开放智能(OpenAI)是一家非营利研究公司,它得到埃隆·马斯克(Elon Musk)、里德·霍夫曼(Reid Hoffman)、萨姆·奥尔特曼(Sam Altman)等人的赞助。它表示,因为其旗下一款名为“文本生成二代”(GPT2)的新一代人工智能模型性能非常强大,被恶意使用的风险非常之高,因此,该公司决定打破向公众公开完整研究报告的常规做法,以便有更多的时间讨论这种技术突破可能带来的负面后果。
据开放智能公司的研究主管达里奥·阿莫代(Dario Amodei)介绍,从研究的角度来看,“文本生成二代”在两方面具有开创性意义,其中之一就在于它的规模。该模型“比先前最先进的人工智能模型大12倍,数据集容量大15倍,且包含更多的数据”。“文本生成二代”在一个包含约1000万篇文章的数据集上进行训练,这些文章是通过检索社交新闻网站红迪网(Reddit)上获得超过三个赞的文章,然后挑选出来的。这个庞大的文本集达400亿个字节,足够存储大约35,000份《白鲸记》(Moby Dick)。
“文本生成二代”接收的这些数据能够帮助它理解书面文本,因而它接受的数据量也直接影响了它的产出质量。这也带来了第二个技术突破。“文本生成二代”比先前的文本模型更通用。通过调整输入文本的结构,它可以执行包括翻译、概括文章内容等任务,简单的阅读理解测试它也能通过,它与专门为这些任务设计的人工智能性能相当,甚至更好。
然而,这一技术优势也导致了开放智能公司违背了它推动人工智能前进的使命,在评估恶意用户可能利用“文本生成二代”做出何种举措的同时,该公司决定不在近期公开“文本生成二代”的研究成果。“我们需要进行实验研究,看看用户能够利用‘文本生成二代’实现什么功能。”该慈善机构的政策主管杰克·克拉克(Jack Clark)说,“如果你不能预测一个模型的所有功能,你必须进行测试,看看它能做什么。比我们更善于思考它能做什么坏事的人要多得多。”
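1. In paragraph 1, of a revolutionary AI system that can write news stories and works of fiction—dubbed "deepfakes for text" is a postmodifier of The creators. Because this element is long, it should not be rendered as an attributive in Chinese. Instead, split it off as an independent sentence and place it at the start as background on the AI system. Then translate The creators together with what follows the second dash (what the creators did), so that the translation reads more smoothly.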
2. The sentence in paragraph 2 is long. Its opening, OpenAI, a nonprofit research company backed by Elon Musk, Reid Hoffman, Sam Altman, and others (“开放智能(OpenAI)是一家非营利研究公司,它得到埃隆·马斯克(Elon Musk)、里德·霍夫曼(Reid Hoffman)、萨姆·奥尔特曼(Sam Altman)等人的赞助”), introduces OpenAI’s background. In translation, this part can be split off as an independent sentence.
3. In the third sentence of paragraph 3, in terms of is a fixed expression meaning "with regard to; concerning" (“关于……”), so it can be rendered as “从……来看”.
4. In the first sentence of paragraph 6, behind closed doors is a fixed expression meaning "without the public being allowed to attend or know what is happening" (“公众不被允许参加或知道正在发生的事情”). In this context, it can be rendered as “不公开”.
