🚨 AI Policy Alert: The German Federal Office for Information Security (BSI) has published the report "Generative AI Models - Opportunities and Risks for Industry and Authorities." Quotes & comments:
"LLMs are trained based on huge text corpora. The origin of these texts and their quality are generally not fully verified due to the large amount of data. Therefore, personal or copyrighted data, as well as texts with questionable, false, or discriminatory content (e.g., disinformation, propaganda, or hate messages), may be included in the training set. When generating outputs, these contents may appear in these outputs either verbatim or slightly altered (Weidinger, et al., 2022). Imbalances in the training data can also lead to biases in the model" (page 9)
-
"If individual data points are disproportionately present in the training data, there is a risk that the model cannot adequately learn the desired data distribution and, depending on the extent, tends to produce repetitive, one-sided, or incoherent outputs (known as model collapse). It is expected that this problem will increasingly occur in the future, as LLM-generated data becomes more available on the internet and is used to train new LLMs (Shumailov, et al., 2023). This could lead to self-reinforcing effects, which is particularly critical in cases where texts with abuse potential have been generated, or when a bias in text data becomes entrenched. This happens, for example, as more and more relevant texts are produced and used again for training new models, which in turn generate a multitude of texts (Bender, et al., 2021)." (page 10)
-
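To make the self-reinforcing effect in the model-collapse quote above concrete, here is a toy sketch of my own, not something taken from the report. It assumes a "model" is just a frequency table over 100 hypothetical tokens, and that each new generation is trained only on a finite sample drawn from the previous one. The function name `simulate_collapse` and all parameters are illustrative assumptions.

```python
import random
from collections import Counter


def simulate_collapse(weights, generations=20, sample_size=50, seed=0):
    """Toy illustration: refit a token distribution on its own samples.

    Returns the number of tokens with non-zero probability after each
    generation (index 0 is the original distribution).
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    tokens = list(range(len(weights)))
    support_sizes = [sum(1 for w in weights if w > 0)]

    for _ in range(generations):
        # "Generate text": draw a finite sample from the current model.
        sample = rng.choices(tokens, weights=weights, k=sample_size)
        # "Train the next model": use the empirical frequencies only.
        counts = Counter(sample)
        weights = [counts[t] / sample_size for t in tokens]
        support_sizes.append(sum(1 for w in weights if w > 0))

    return support_sizes


# Assumed starting point: a long-tailed distribution over 100 tokens.
initial = [1.0 / (rank + 1) for rank in range(100)]
sizes = simulate_collapse(initial)
print(sizes)
```

A token that is missing from one generation's sample gets probability zero and can never come back, so the number of surviving tokens can only stay flat or shrink. Real LLM training is far more complex, but this is the basic shape of the loop the quote describes.
-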
"The high linguistic quality of the model outputs, combined with user-friendly access via APIs and the enormous flexibility of responses from currently popular LLMs, makes it easier for criminals to misuse the models for a targeted generation of misinformation (De Angelis, et al., 2023), propaganda texts, hate messages, product reviews, or posts for social media."
➡️ According to the report, special attention should be given to the following aspects:
➵ Raising awareness of users;
➵ Testing;
➵ Handling sensitive data;
➵ Establishing transparency;
➵ Auditing of inputs and outputs;
➵ Paying attention to (indirect) prompt injections;
➵ Selection and management of training data;
➵ Developing practical expertise.
➡️ Of the dozens of AI reports published lately, this one is especially detailed regarding AI-related risk and potential countermeasures.
➡️ The document is a must-read for people developing AI or working on AI policymaking and regulation, especially pages 8-28.
➡️ Link to the @BSI_Bund report below.
➡️ For more information on AI policy and regulation, subscribe to my weekly newsletter (link in bio).