Unpacking the Potential Risks of Generative AI Chatbots on Local Government Websites
Published: 05/01/24
Author Name: Kristi Nickodem
Local governments are beginning to experiment with using generative artificial intelligence (AI) to communicate directly with members of the public. Generative AI powered chatbots (or “AI-enhanced search” functions) are starting to pop up on local government websites across the country. Using this emerging technology for direct, public-facing communications poses some distinct risks for local governments, as discussed in this post.
How are newer chatbots powered by generative AI different from rule-based chatbots?
Chatbots are not new technology. Some North Carolina cities and counties have already featured chatbots on their websites for a number of years. Historically, these have been built as “rule-based” chatbots (also known as “decision-tree” chatbots), meaning they follow an established set of rules and provide answers from a predetermined set of responses. These chatbots generally follow an “if-then” type of logic: if an individual’s question contains certain keywords or phrases, then the chatbot follows a set of predefined rules to select from a number of predetermined responses to answer that question. These rule-based chatbots are built with varying degrees of complexity in how they interpret and respond to questions, but they are all built around predetermined rules and responses. The answers are predictable and the chatbot cannot make up new information in response to a user question.
Chatbots powered by generative AI work differently. These newer chatbots rely on large language models (LLMs) to generate natural language answers in response to user prompts. Instead of using fixed responses, a generative AI chatbot predicts the most fitting words to reply by analyzing patterns in its extensive training data. These responses are more flexible and adaptive to the user’s prompt and are written in more natural human language than the responses from a rules-based chatbot. However, a generative AI chatbot’s responses are also inherently more unpredictable than those of a rule-based chatbot.
What are the risks when a local government uses a generative AI chatbot on its website?
Unlike rule-based chatbots, chatbots powered by generative AI (those built using LLMs) are known to “hallucinate” responses. These tools are capable of generating responses that are made-up, inaccurate, misleading, and sometimes potentially defamatory. New York City’s generative AI chatbot, built using Microsoft’s Azure AI services, recently provided a number of responses that either encouraged users to violate the law or gave inaccurate information about the law. For example, as reported by the Associated Press, the NYC chatbot confirmed that it was legal for an employer to fire a worker who complains about sexual harassment (it is not) and stated that a restaurant may serve cheese to customers if it has rat bites in it (I suspect NYC’s public health inspectors would disagree). A list of examples of other instances in which the NYC chatbot provided inaccurate information about the law is available here.
This is certainly not the only instance of generative AI chatbots producing inaccurate information. A report by the Washington Post found that generative AI chatbots integrated into TurboTax and H&R Block tax-prep software provided a number of misleading, inaccurate, or simply unhelpful answers in response to questions about tax preparation. Publicly available chatbots like Gemini and GPT-4 have been shown to provide misleading or inaccurate information about elections and voting. Researchers at Stanford and Yale have demonstrated that hallucinations are “alarmingly prevalent” when LLMs are used for legal research.
Chatbots based on LLMs are also vulnerable to adversarial attacks in a way that rule-based chatbots are not. With sufficient prompting, bad actors can “jailbreak” a generative AI chatbot, tricking it into violating its own content restrictions and providing harmful information. A recent research paper showed that by asking AI chatbots (including GPT-4 and Claude 2) to take on different “personas” through a series of prompts, researchers were able to get the chatbots to provide detailed instructions on synthesizing methamphetamine, bomb-building, laundering money, and self-harm.
How could this lead to problems for a local government?
Consider the recent cautionary tale of Air Canada, which faced consequences for inaccurate information provided to a customer by its AI chatbot. When the customer, Jake Moffatt, asked the Air Canada customer service chatbot about Air Canada’s bereavement travel policy, the chatbot told Moffatt he could apply for a bereavement fare retroactively. This was inaccurate advice, as Air Canada’s actual policy stated the airline would not refund bereavement travel after a flight was booked. Relying on the chatbot’s advice, Moffatt booked flights and then attempted to request a refund, which Air Canada refused to provide. Moffatt filed a small claims complaint in the British Columbia Civil Resolution Tribunal. The Tribunal determined that Moffat proved a claim of negligent misrepresentation against Air Canada based on the actions of the chatbot, rejecting Air Canada’s argument that the chatbot was a separate legal entity responsible for its own actions. As the Tribunal’s decision stated, “While a chatbot has an interactive component, it is still just a part of Air Canada’s website. It should be obvious to Air Canada that it is responsible for all the information on its website. It makes no difference whether the information comes from a static page or a chatbot.” The Tribunal ruled that Moffat was entitled to a partial refund, damages to cover interest on the airfare, and his tribunal fees.
A ruling from the British Columbia Civil Resolution Tribunal obviously has no precedential value in United States courts. However, a similar scenario could lead to negative outcomes for a local government. Imagine a citizen who asks her city’s generative AI chatbot whether a building permit is needed before constructing an accessory dwelling unit, receives incorrect guidance from the chatbot, and acts in reliance on that guidance when building the ADU in her backyard. Or imagine a builder who asks his county’s generative AI chatbot what setback and permitting requirements apply to septic systems, receives inaccurate information from the chatbot, and acts in reliance on that information. When a local government attempts to enforce a code, ordinance, or rule against these individuals, they could point to a screenshot from the local government’s own chatbot as the (incorrect) basis for their actions.
North Carolina courts have not yet addressed whether a local government could be held liable for the inaccurate statements of a chatbot on its website. This would depend on a wide variety of factors, including the nature of the plaintiff’s claim, whether the unit of government has waived its governmental immunity through the purchase of insurance, and whether the operation of the chatbot was determined by a court to be a governmental or proprietary function of the local government.
Given the uncertainty of the legal landscape, if a local government chooses to build or purchase a generative AI chatbot for its website, it should consider posting a clear warning to users regarding the chatbot’s potential to produce inaccurate information, along with a reminder to verify information with a local government official or employee prior to acting on it. This begs the question: does a generative AI chatbot play a useful role on a local government’s website if a second layer of verification is necessary to trust its responses?
What if a chatbot uses retrieval augmented generation that limits the chatbot to using a local government’s website as the “source of truth” for its answers?
A generative AI chatbot built using retrieval augmented generation (RAG) has a far better chance of producing accurate answers than one built without it. Using a RAG system helps to limit the universe of information from which the chatbot can generate its answers. For example, a chatbot built using a RAG system might be constructed to retrieve information from a local government’s website when creating its answers.
Nonetheless, a RAG system does not fully constrain a generative AI chatbot from producing information that is inaccurate, misleading, or unhelpful. In some cases, the system may struggle to retrieve and process information from a local government’s website because of the way in which the website was created or the way data is encoded on the site. The extent to which a RAG system can help a chatbot produce accurate answers varies widely depending on how the relevant data (from which the chatbot is retrieving the basis for its response) is stored and organized. If a local government’s website contains information that is ambiguous, outdated, contradictory, incomplete, or poorly organized, those same flaws will be apparent when a RAG-based chatbot attempts to provide answers based on that information.
Even a well-constructed RAG system does not completely ensure that the chatbot might not hallucinate or misstate some elements of the response. Again, a feature of chatbots built using LLMs is that their answers have an element of unpredictability. This unpredictability of generative AI chatbots makes them nimble in responding to a wide variety of prompts but can also lead to unexpected and undesirable responses.
What should a local government consider when vetting a generative AI chatbot?
A local government considering investing in a generative AI chatbot for its website should vet the product carefully by testing how it responds to a wide variety of different prompts. If possible, test out publicly available chatbots that the vendor has already built for other local governments. Is the chatbot able to provide accurate responses to questions that are easily answered from a traditional search of the local government’s website? How does the chatbot respond to prompts that ask for harmful information? How does it handle questions about local ordinances or state laws? What does the chatbot produce when faced with a question that cannot be answered through information available on the local government’s website? If the chatbot is asked the same question multiple times in a row, will it provide different answers?
Local governments considering a generative AI chatbot should also consider whether the tool is designed as a model that only allows the user to ask a single question and receive a single answer based on that input, or if the chatbot allows a back-and-forth conversation with the user by “remembering” earlier parts of the conversation when generating new responses. There are pros and cons to each approach.
A single-question/single-answer model is unlikely to provide the most helpful information for the user, as generative AI tools tend to function better when they can understand context around a particular question through multiple prompts. Conversely, a conversational chatbot with a context window that allows the chatbot to refine and improve its answers based on multiple prompts from a user has a better chance of generating a more robust and helpful response. However, using a chatbot that allows multiple prompts and responses in a single conversation also increases the chances for the chatbot to generate incorrect information. It may also make it more challenging to evaluate how the chatbot will respond to particular questions, given that its response to a certain prompt may change depending on the context of the conversation around that prompt.
Before investing in a product, a local government’s IT staff should examine any mechanisms that a vendor promises will produce reliable answers, including how the RAG system built into the chatbot (if any) will be constructed and what role the local government will play in collecting or organizing data to be retrieved by the chatbot. The local government should also ask for information about what foundation model the chatbot is using to generate its responses (e.g. GPT-4, Llama 3, Gemini, Claude 3) and how the final product is fine-tuned before it is deployed on the local government’s website. This should include seeking information about how the tool is designed to provide factual responses and any guardrails the vendor claims will limit inappropriate or inaccurate responses. Finally, local governments should also understand the costs associated with updating and retraining the chatbot, which may require significant investments of resources over time.
Conclusion
This technology will continue to improve at a rapid pace, but for the time being, it still poses some risks when used for direct communication between a local government and the public it serves. Even if a local government could theoretically guard itself from liability through careful disclaimers and warnings on its chatbot, leaders will need to decide whether using this technology to communicate with the public aligns with the local government’s values and ethics. A disclaimer on New York City’s generative AI chatbot warns users that it “may occasionally produce incorrect, harmful or biased content.” If a local government employee had a pattern of producing incorrect, harmful, or biased content, that employee would likely face eventual discipline and dismissal. Is it prudent to invest in a generative AI chatbot that poses these same risks when directly communicating with the public on a local government’s website? Local governments will have to make this cost-benefit analysis when deciding whether to invest in these tools for public-facing communication and vet such tools carefully to ensure they serve the aims of good governance.
1
Coates’ Canons NC Local Government Law
Unpacking the Potential Risks of Generative AI Chatbots on Local Government Websites
Published: 05/01/24
Author Name: Kristi Nickodem
Local governments are beginning to experiment with using generative artificial intelligence (AI) to communicate directly with members of the public. Generative AI powered chatbots (or “AI-enhanced search” functions) are starting to pop up on local government websites across the country. Using this emerging technology for direct, public-facing communications poses some distinct risks for local governments, as discussed in this post.
How are newer chatbots powered by generative AI different from rule-based chatbots?
Chatbots are not new technology. Some North Carolina cities and counties have already featured chatbots on their websites for a number of years. Historically, these have been built as “rule-based” chatbots (also known as “decision-tree” chatbots), meaning they follow an established set of rules and provide answers from a predetermined set of responses. These chatbots generally follow an “if-then” type of logic: if an individual’s question contains certain keywords or phrases, then the chatbot follows a set of predefined rules to select from a number of predetermined responses to answer that question. These rule-based chatbots are built with varying degrees of complexity in how they interpret and respond to questions, but they are all built around predetermined rules and responses. The answers are predictable and the chatbot cannot make up new information in response to a user question.
Chatbots powered by generative AI work differently. These newer chatbots rely on large language models (LLMs) to generate natural language answers in response to user prompts. Instead of using fixed responses, a generative AI chatbot predicts the most fitting words to reply by analyzing patterns in its extensive training data. These responses are more flexible and adaptive to the user’s prompt and are written in more natural human language than the responses from a rules-based chatbot. However, a generative AI chatbot’s responses are also inherently more unpredictable than those of a rule-based chatbot.
What are the risks when a local government uses a generative AI chatbot on its website?
Unlike rule-based chatbots, chatbots powered by generative AI (those built using LLMs) are known to “hallucinate” responses. These tools are capable of generating responses that are made-up, inaccurate, misleading, and sometimes potentially defamatory. New York City’s generative AI chatbot, built using Microsoft’s Azure AI services, recently provided a number of responses that either encouraged users to violate the law or gave inaccurate information about the law. For example, as reported by the Associated Press, the NYC chatbot confirmed that it was legal for an employer to fire a worker who complains about sexual harassment (it is not) and stated that a restaurant may serve cheese to customers if it has rat bites in it (I suspect NYC’s public health inspectors would disagree). A list of examples of other instances in which the NYC chatbot provided inaccurate information about the law is available here.
This is certainly not the only instance of generative AI chatbots producing inaccurate information. A report by the Washington Post found that generative AI chatbots integrated into TurboTax and H&R Block tax-prep software provided a number of misleading, inaccurate, or simply unhelpful answers in response to questions about tax preparation. Publicly available chatbots like Gemini and GPT-4 have been shown to provide misleading or inaccurate information about elections and voting. Researchers at Stanford and Yale have demonstrated that hallucinations are “alarmingly prevalent” when LLMs are used for legal research.
Chatbots based on LLMs are also vulnerable to adversarial attacks in a way that rule-based chatbots are not. With sufficient prompting, bad actors can “jailbreak” a generative AI chatbot, tricking it into violating its own content restrictions and providing harmful information. A recent research paper showed that by asking AI chatbots (including GPT-4 and Claude 2) to take on different “personas” through a series of prompts, researchers were able to get the chatbots to provide detailed instructions on synthesizing methamphetamine, bomb-building, laundering money, and self-harm.
How could this lead to problems for a local government?
Consider the recent cautionary tale of Air Canada, which faced consequences for inaccurate information provided to a customer by its AI chatbot. When the customer, Jake Moffatt, asked the Air Canada customer service chatbot about Air Canada’s bereavement travel policy, the chatbot told Moffatt he could apply for a bereavement fare retroactively. This was inaccurate advice, as Air Canada’s actual policy stated the airline would not refund bereavement travel after a flight was booked. Relying on the chatbot’s advice, Moffatt booked flights and then attempted to request a refund, which Air Canada refused to provide. Moffatt filed a small claims complaint in the British Columbia Civil Resolution Tribunal. The Tribunal determined that Moffat proved a claim of negligent misrepresentation against Air Canada based on the actions of the chatbot, rejecting Air Canada’s argument that the chatbot was a separate legal entity responsible for its own actions. As the Tribunal’s decision stated, “While a chatbot has an interactive component, it is still just a part of Air Canada’s website. It should be obvious to Air Canada that it is responsible for all the information on its website. It makes no difference whether the information comes from a static page or a chatbot.” The Tribunal ruled that Moffat was entitled to a partial refund, damages to cover interest on the airfare, and his tribunal fees.
A ruling from the British Columbia Civil Resolution Tribunal obviously has no precedential value in United States courts. However, a similar scenario could lead to negative outcomes for a local government. Imagine a citizen who asks her city’s generative AI chatbot whether a building permit is needed before constructing an accessory dwelling unit, receives incorrect guidance from the chatbot, and acts in reliance on that guidance when building the ADU in her backyard. Or imagine a builder who asks his county’s generative AI chatbot what setback and permitting requirements apply to septic systems, receives inaccurate information from the chatbot, and acts in reliance on that information. When a local government attempts to enforce a code, ordinance, or rule against these individuals, they could point to a screenshot from the local government’s own chatbot as the (incorrect) basis for their actions.
North Carolina courts have not yet addressed whether a local government could be held liable for the inaccurate statements of a chatbot on its website. This would depend on a wide variety of factors, including the nature of the plaintiff’s claim, whether the unit of government has waived its governmental immunity through the purchase of insurance, and whether the operation of the chatbot was determined by a court to be a governmental or proprietary function of the local government.
Given the uncertainty of the legal landscape, if a local government chooses to build or purchase a generative AI chatbot for its website, it should consider posting a clear warning to users regarding the chatbot’s potential to produce inaccurate information, along with a reminder to verify information with a local government official or employee prior to acting on it. This begs the question: does a generative AI chatbot play a useful role on a local government’s website if a second layer of verification is necessary to trust its responses?
What if a chatbot uses retrieval augmented generation that limits the chatbot to using a local government’s website as the “source of truth” for its answers?
A generative AI chatbot built using retrieval augmented generation (RAG) has a far better chance of producing accurate answers than one built without it. Using a RAG system helps to limit the universe of information from which the chatbot can generate its answers. For example, a chatbot built using a RAG system might be constructed to retrieve information from a local government’s website when creating its answers.
Nonetheless, a RAG system does not fully constrain a generative AI chatbot from producing information that is inaccurate, misleading, or unhelpful. In some cases, the system may struggle to retrieve and process information from a local government’s website because of the way in which the website was created or the way data is encoded on the site. The extent to which a RAG system can help a chatbot produce accurate answers varies widely depending on how the relevant data (from which the chatbot is retrieving the basis for its response) is stored and organized. If a local government’s website contains information that is ambiguous, outdated, contradictory, incomplete, or poorly organized, those same flaws will be apparent when a RAG-based chatbot attempts to provide answers based on that information.
Even a well-constructed RAG system does not completely ensure that the chatbot might not hallucinate or misstate some elements of the response. Again, a feature of chatbots built using LLMs is that their answers have an element of unpredictability. This unpredictability of generative AI chatbots makes them nimble in responding to a wide variety of prompts but can also lead to unexpected and undesirable responses.
What should a local government consider when vetting a generative AI chatbot?
A local government considering investing in a generative AI chatbot for its website should vet the product carefully by testing how it responds to a wide variety of different prompts. If possible, test out publicly available chatbots that the vendor has already built for other local governments. Is the chatbot able to provide accurate responses to questions that are easily answered from a traditional search of the local government’s website? How does the chatbot respond to prompts that ask for harmful information? How does it handle questions about local ordinances or state laws? What does the chatbot produce when faced with a question that cannot be answered through information available on the local government’s website? If the chatbot is asked the same question multiple times in a row, will it provide different answers?
Local governments considering a generative AI chatbot should also consider whether the tool is designed as a model that only allows the user to ask a single question and receive a single answer based on that input, or if the chatbot allows a back-and-forth conversation with the user by “remembering” earlier parts of the conversation when generating new responses. There are pros and cons to each approach.
A single-question/single-answer model is unlikely to provide the most helpful information for the user, as generative AI tools tend to function better when they can understand context around a particular question through multiple prompts. Conversely, a conversational chatbot with a context window that allows the chatbot to refine and improve its answers based on multiple prompts from a user has a better chance of generating a more robust and helpful response. However, using a chatbot that allows multiple prompts and responses in a single conversation also increases the chances for the chatbot to generate incorrect information. It may also make it more challenging to evaluate how the chatbot will respond to particular questions, given that its response to a certain prompt may change depending on the context of the conversation around that prompt.
Before investing in a product, a local government’s IT staff should examine any mechanisms that a vendor promises will produce reliable answers, including how the RAG system built into the chatbot (if any) will be constructed and what role the local government will play in collecting or organizing data to be retrieved by the chatbot. The local government should also ask for information about what foundation model the chatbot is using to generate its responses (e.g. GPT-4, Llama 3, Gemini, Claude 3) and how the final product is fine-tuned before it is deployed on the local government’s website. This should include seeking information about how the tool is designed to provide factual responses and any guardrails the vendor claims will limit inappropriate or inaccurate responses. Finally, local governments should also understand the costs associated with updating and retraining the chatbot, which may require significant investments of resources over time.
Conclusion
This technology will continue to improve at a rapid pace, but for the time being, it still poses some risks when used for direct communication between a local government and the public it serves. Even if a local government could theoretically guard itself from liability through careful disclaimers and warnings on its chatbot, leaders will need to decide whether using this technology to communicate with the public aligns with the local government’s values and ethics. A disclaimer on New York City’s generative AI chatbot warns users that it “may occasionally produce incorrect, harmful or biased content.” If a local government employee had a pattern of producing incorrect, harmful, or biased content, that employee would likely face eventual discipline and dismissal. Is it prudent to invest in a generative AI chatbot that poses these same risks when directly communicating with the public on a local government’s website? Local governments will have to make this cost-benefit analysis when deciding whether to invest in these tools for public-facing communication and vet such tools carefully to ensure they serve the aims of good governance.
All rights reserved. This blog post is published and posted online by the School of Government to address issues of interest to government officials. This blog post is for educational and informational use and may be used for those purposes without permission by providing acknowledgment of its source. Use of this blog post for commercial purposes is prohibited. To browse a complete catalog of School of Government publications, please visit the School’s website at www.sog.unc.edu or contact the Bookstore, School of Government, CB# 3330 Knapp-Sanders Building, UNC Chapel Hill, Chapel Hill, NC 27599-3330; e-mail sales@sog.unc.edu; telephone 919.966.4119; or fax 919.962.2707.