+63 (02) 8971 8926

The decline of ChatGPT: Is ChatGPT’s accuracy getting worse?

September 11, 2023

ChatGPT, developed by OpenAI, was hyped for its remarkable ability to generate human-like text and engage in context-based conversations. Its potential to revolutionise various sectors, including customer service, content creation, and language translation, was widely recognised. 

However, with the recent findings from researchers, it is crucial to reassess the current state of ChatGPT's performance and quality. Understanding the causes and implications of any degradation is vital for users, AI developers, researchers, and organisations relying on this technology for their applications and services. Learn more about ChatGPT’s accuracy below!

Discussing the impact of ChatGPT decline on user experience and satisfaction

Users who previously relied on ChatGPT for various tasks, such as drafting emails, brainstorming ideas, or learning new topics, now find it much less reliable and helpful. Many users feel frustrated because ChatGPT's recent responses are often incorrect or nonsensical, which, in turn, significantly decreases their trust and confidence in it.

Besides the loss of trust, users' speed and productivity have also declined. Users report spending more time correcting or rephrasing prompts to elicit the desired responses rather than engaging in productive conversations. This inefficiency degrades the overall user experience and makes it harder to complete tasks and achieve desired outcomes.

Understanding ChatGPT’s performance decline

Over time, researchers have noticed a decline in ChatGPT's performance, quality, and accuracy. Take a closer look at the specific aspects of ChatGPT's performance that are said to have worsened:

Response accuracy

One of the key areas where ChatGPT's performance has declined is in response accuracy. Earlier versions provided more accurate and relevant responses to user queries. However, recent iterations have shown a decrease in accuracy, with ChatGPT sometimes generating incorrect or nonsensical answers.

User queries

Another aspect that has suffered is ChatGPT's ability to understand user queries effectively. Initially, the model exhibited a strong understanding of context and could generate coherent responses. However, as time progressed, the model's performance in understanding user queries has declined, resulting in often unrelated or off-topic responses.

Evidence of decline

Several examples and evidence highlight the decline in ChatGPT's performance. Users have reported instances where the model failed to grasp the context of a conversation, producing confusing or irrelevant responses. Furthermore, benchmark tests have shown a decrease in accuracy scores over multiple evaluation metrics, indicating a noticeable decline in performance.


Experts’ analysis of ChatGPT’s decline

Some researchers hypothesise that the increasing complexity of the model might make it harder for the system to generate accurate responses. Additionally, the lack of diverse and extensive training data could limit the model's ability to effectively handle a wide range of queries. These factors, along with others, could be potential explanations for the observed decline in performance.

The researchers evaluated the performance of GPT-4 and GPT-3.5 on certain tasks to determine whether the models improved or declined over time. To measure their capabilities, they used the following safety and performance benchmarks:

  • Solving math problems
  • Answering sensitive/dangerous questions
  • Answering opinion surveys
  • Generating and formatting code
  • Visual reasoning

Illustration 1: Comparison of GPT-4 and GPT-3.5 Performance (March 2023 vs. June 2023)

Image and Research Study Credits: “How Is ChatGPT’s Behavior Changing over Time?” by Lingjiao Chen, Matei Zaharia, James Zou

GPT-4 and GPT-3.5: Solving math problems

Matei Zaharia, one of the researchers, shared the research findings on Twitter. He noted the drastic decrease in GPT-4's accuracy at identifying prime numbers, from 97.6% in March to 2.4% in June.

In comparison, the accuracy of GPT-3.5 increased from 7.4% in March to 86.8% in June. This marked a clear improvement in the performance of the older model, while GPT-4 had seen a severe decline.
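To make figures like these concrete, here is a minimal sketch of how accuracy on a prime-identification task could be scored. It is not the researchers' code: the `ask_model` callable, the `always_no` stand-in, and the sample numbers are all hypothetical, and the sketch assumes the model replies with text starting with "yes" or "no".

```python
from typing import Callable, Iterable


def is_prime(n: int) -> bool:
    """Ground-truth primality check by trial division."""
    if n < 2:
        return False
    if n < 4:
        return True
    if n % 2 == 0:
        return False
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def prime_accuracy(ask_model: Callable[[int], str], numbers: Iterable[int]) -> float:
    """Fraction of numbers where the model's yes/no answer matches ground truth."""
    numbers = list(numbers)
    correct = 0
    for n in numbers:
        # Assumption: the reply begins with "yes" when the model thinks n is prime.
        said_prime = ask_model(n).strip().lower().startswith("yes")
        if said_prime == is_prime(n):
            correct += 1
    return correct / len(numbers)


def always_no(n: int) -> str:
    """Hypothetical stand-in for a model that answers "No" to everything."""
    return "No"


# Hypothetical balanced sample: four primes and four composites.
sample = [7, 9, 11, 15, 17, 21, 23, 25]
print(prime_accuracy(always_no, sample))
```

On this balanced sample, the stand-in that always answers "No" still scores 0.5, which shows how much the reported accuracy depends on the mix of primes and non-primes in the test set.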


Illustration 2: Comparison of GPT-4 and GPT-3.5 Performance in Solving Math Problems

Image and Research Study Credits: “How Is ChatGPT’s Behavior Changing over Time?” by Lingjiao Chen, Matei Zaharia, James Zou

GPT-4 and GPT-3.5: Answering sensitive/dangerous questions

The research revealed that ChatGPT's responses on sensitive topics, such as ethnicity and gender, had become shorter, more direct and more definite in their refusal to provide an answer.

The earlier version gave detailed reasons why it could not answer questions that were considered sensitive. But by June, it no longer gave any sort of reasoning, only apologising for its inability to answer.


Illustration 3: Comparison of GPT-4 and GPT-3.5 Performance in Answering Sensitive/Dangerous Questions

Image and Research Study Credits: “How Is ChatGPT’s Behavior Changing over Time?” by Lingjiao Chen, Matei Zaharia, James Zou

GPT-4 and GPT-3.5: Answering opinion surveys

The researchers gained further insights by taking a closer look at the changes in opinion. In March 2023, GPT-4 expressed the opinion that the United States would be less important in the world. By June, the model refused to answer the question, saying that it was too subjective for an AI to answer. This change in GPT-4's attitude towards subjective questions shows a considerable alteration in its behaviour.


Illustration 4: Comparison of GPT-4 and GPT-3.5 Performance in Answering Opinion Surveys

Image and Research Study Credits: “How Is ChatGPT’s Behavior Changing over Time?” by Lingjiao Chen, Matei Zaharia, James Zou

GPT-4 and GPT-3.5: Generating and formatting code

A similar drop in performance was also observed for code generation. The team submitted the updated version's answers to LeetCode, a coding learning platform, but only 10% of the code functioned as specified by the platform. In the March version, 50% of the code was executable. When it came to generating new lines of code, the abilities of both models got worse between March and June.
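As a rough illustration of what an "executable" rate can mean, the sketch below counts how many raw model outputs parse as Python without any cleanup. This is an assumption-laden simplification: it checks syntax only, not whether the code solves the problem or passes the platform's judge, and the function names and sample outputs are hypothetical.

```python
import ast


def is_directly_executable(generation: str) -> bool:
    """Return True if the raw model output parses as Python with no cleanup."""
    try:
        ast.parse(generation)
        return True
    except SyntaxError:
        return False


def executable_rate(generations: list[str]) -> float:
    """Fraction of raw outputs that parse as valid Python."""
    return sum(is_directly_executable(g) for g in generations) / len(generations)


# Hypothetical outputs: two plain code answers, one wrapped in a
# markdown fence, and one preceded by a sentence of explanation.
outputs = [
    "def add(a, b):\n    return a + b\n",
    "```python\ndef add(a, b):\n    return a + b\n```",
    "Here is the solution:\ndef add(a, b):\n    return a + b\n",
    "def sub(a, b):\n    return a - b\n",
]
print(executable_rate(outputs))
```

In this example, the wrapped and annotated answers contain the same working function as the first one, yet they fail the check. A metric of this kind therefore measures formatting as much as coding ability.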


Illustration 5: Comparison of GPT-4 and GPT-3.5 Performance in Generating and Formatting Code

Image and Research Study Credits: “How Is ChatGPT’s Behavior Changing over Time?” by Lingjiao Chen, Matei Zaharia, James Zou

GPT-4 and GPT-3.5: Visual reasoning

The final tests, on visual reasoning, showed a general improvement of 2% for the LLMs. From March to June, both LLMs provided the same answers for visual puzzle queries more than 90% of the time. Overall performance remained low, however, with GPT-4 scoring 27.4% and GPT-3.5 scoring 12.2%.
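The two kinds of figures here, answer overlap between versions and accuracy, measure different things. The sketch below shows one way each could be computed; the function names and the toy answers are hypothetical and are not taken from the study.

```python
def answer_overlap(earlier: list[str], later: list[str]) -> float:
    """Fraction of queries where two model versions gave the same answer."""
    if len(earlier) != len(later):
        raise ValueError("both versions must answer the same queries")
    same = sum(a.strip() == b.strip() for a, b in zip(earlier, later))
    return same / len(earlier)


def exact_match_accuracy(answers: list[str], expected: list[str]) -> float:
    """Fraction of answers that exactly match the expected answer."""
    if len(answers) != len(expected):
        raise ValueError("answers and expected must have the same length")
    hits = sum(a.strip() == e.strip() for a, e in zip(answers, expected))
    return hits / len(answers)


# Hypothetical toy data: ten multiple-choice puzzles.
expected = ["A", "B", "C", "D", "A", "B", "C", "D", "A", "B"]
march = ["A", "B", "C", "A", "B", "C", "D", "A", "B", "C"]
june = ["A", "B", "C", "A", "B", "C", "D", "A", "B", "B"]

print(answer_overlap(march, june))           # 0.9
print(exact_match_accuracy(march, expected))  # 0.3
print(exact_match_accuracy(june, expected))   # 0.4
```

In this toy data the two versions agree on nine of ten puzzles while both remain mostly wrong, so a high overlap can sit alongside a low score.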


Illustration 6: Comparison of GPT-4 and GPT-3.5 Performance in Visual Reasoning

Image and Research Study Credits: “How Is ChatGPT’s Behavior Changing over Time?” by Lingjiao Chen, Matei Zaharia, James Zou


What was the experts’ take on the study?

So, has ChatGPT gotten worse? Several experts have published their opinions on the recently published research to highlight what’s going on in this field of AI. Transparency is needed, especially since numerous industries are gradually recognising the need for and importance of AI in their fields.

Recent reports suggest that GPT-4's performance has, in fact, decreased. OpenAI has denied these claims. Peter Welinder, the VP of Product and Partnerships at OpenAI, recently tweeted that the new versions of GPT-4 are smarter than the previous ones.

Arvind Narayanan, a professor of computer science at Princeton University, raised doubts about the conclusions of the research. He said in a tweet that the researchers only looked at whether the code created by GPT-4 could be executed immediately and did not test its accuracy. AI researcher Simon Willison shares similar criticisms, pointing to the study's method and the novelty of LLMs wearing off.

OpenAI's head of developer relations, Logan Kilpatrick, also responded. He tweeted that the team is looking into the reported regressions.

To help solve the issue, AI researcher Sasha Luccioni of Hugging Face has a suggestion. She said that model creators should provide access to underlying models for audit purposes. She added that standardised benchmarks should be included with model releases, which Willison agreed with, emphasising the need for more transparency and release notes.

Potential factors behind ChatGPT’s quality decline: What experts say about it

Identifying the potential causes of the decrease in ChatGPT's performance and quality is essential to improving it and to answering the question of why ChatGPT is getting worse. There may be a variety of reasons behind the observed ChatGPT decline, such as:

Technical factors

Several technical aspects could be behind the decrease in ChatGPT's performance. These could be due to inefficient coding, the complexity of algorithms, or a lack of adequate hardware resources.

Implementation issues

OpenAI's GPT-3.5 and GPT-4 language models have been updated without prior notice. While these updates may have been meant to enhance performance, they appear to have had the opposite effect. The models are now more cautious and less creative in their responses. Introducing new features or updates into the system could lead to over-optimisation, which can result in integration and architecture miscalculations that cause errors and slow down the system.

The problem is that it is unclear when and how GPT-3.5 and GPT-4 are updated and what each update does to the models' behaviour. This can lead to conversations becoming predictable and less engaging.

Training data limitations

ChatGPT's performance depends on the quality of the data it is trained on. If the data is outdated, biased, insufficient, or lacks variety, the output of the AI model will suffer. As the internet constantly changes, it can be difficult for ChatGPT to keep up with the latest and most diverse data. For ChatGPT to remain up to date and effective, regular updates to the training dataset must be made. This helps ensure accuracy and timeliness.

Fine-tuning challenges

It is necessary to adjust AI models through fine-tuning to make them suitable for particular assignments. If fine-tuning is done incorrectly, it can produce politically biased or even false content. Model creators must look into ways to enhance the fine-tuning process to improve the performance of the model.

Scalability challenges

The scalability of ChatGPT should be evaluated to accommodate a larger user base as its popularity increases. If the system is not equipped to bear the load of more users, it could result in slow response times and subpar performance.

Grammar and syntax issues

Using ChatGPT for content creation may have setbacks. It is vulnerable to grammatical mistakes and a lack of proper sentence structure, which can hinder its performance. Also, the repetition of template responses can lead to a lack of originality and creativity. The model has limited contextual understanding. The result is responses that make little sense or are completely irrelevant.

Ethical dilemmas

ChatGPT is designed to ensure that the content it shares is not discriminatory, inappropriate, or otherwise unacceptable, and to prevent misinformation or biased content from being spread. These safeguards matter because the model relies on a large amount of data.

To ensure ChatGPT's output aligns with ethical values, content creators must review and verify it. It is essential to be mindful of the potential for plagiarism or violations of intellectual property rights. Make sure to give proper credit to original sources and to respect intellectual property.

Cost-cutting

Some experts have speculated that cost is a factor for OpenAI and other AI companies. Many believe that the expense of operating their most advanced chatbot models might be too great for these firms to release them. This could lead to smaller, more specialised GPT-4 models that respond faster but possibly at reduced quality.


Why it’s crucial to assess ChatGPT's performance over time

It is essential to regularly test ChatGPT's performance since users want to maintain accuracy, reliability, and user satisfaction. Analysing the performance when new data, algorithms, and updates are applied can help find potential for optimisation and improvement, as you can see below:

Impact on user experience

It is clear that the ChatGPT decline in quality has had a negative impact on the user experience. To understand the root causes of this decrease and to come up with ways to improve, research is essential to ensure user satisfaction.

Impact on businesses

The quality of ChatGPT and other AI models must be constantly monitored and improved to ensure user satisfaction and trust. If the quality of these AI models deteriorates, it can have a negative impact on businesses that rely on them. 

OpenAI and other AI developers must prioritise quality control and make sure their AI models are accurate and up to standard. This is essential for businesses that use ChatGPT to meet customer expectations and keep user trust and engagement.
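For teams that rely on these models, monitoring can be as simple as recording benchmark scores for each model version and flagging large drops. The sketch below is one hypothetical way to do that. The `find_regressions` helper and the 0.05 tolerance are illustrative assumptions, while the scores are the prime-identification accuracies quoted earlier in this article.

```python
def find_regressions(history: dict[str, list[float]], tolerance: float = 0.05) -> dict[str, float]:
    """Return tasks whose latest score fell more than `tolerance` below the previous one.

    `history` maps a task name to its scores in chronological order.
    The returned value for each flagged task is the size of the drop.
    """
    flagged = {}
    for task, scores in history.items():
        if len(scores) < 2:
            continue  # nothing to compare against yet
        drop = scores[-2] - scores[-1]
        if drop > tolerance:
            flagged[task] = round(drop, 3)
    return flagged


# March and June prime-identification accuracy as reported in the study.
history = {
    "gpt-4 prime identification": [0.976, 0.024],
    "gpt-3.5 prime identification": [0.074, 0.868],
}
print(find_regressions(history))
```

Only the GPT-4 entry is flagged here, since GPT-3.5 improved on the same task. In practice the tolerance would need tuning per task, because some run-to-run variation is normal.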

Mitigation strategies to address ChatGPT decline in quality and performance

To improve ChatGPT's performance and response times, several strategies have been proposed, including the following:

Algorithm optimisation

ChatGPT's algorithms can be optimised to ensure smoother and more effective conversations. This optimisation will lead to quicker and more precise responses from ChatGPT.

Hardware upgrades

Upgrading to powerful processors, more RAM and larger storage capacity will provide a variety of advantages. These include improved operations, faster response times, and enhanced performance.

Dataset expansion

In order for the AI model to have a broader knowledge of a variety of topics, it's important to include a larger amount of data from various sources. This will lead to more precise and contextualised responses from the AI.

Incremental and continuous learning

Incremental learning techniques are critical for maintaining the accuracy and responsiveness of ChatGPT. This will guarantee that it remains current with changing data and user feedback. Regularly updating the model will ensure that it can keep up with changes.

Computational optimisation

Optimise computational processes to increase speed while maintaining accuracy. This could result in quicker response times.

Feedback loop

By considering user feedback, ChatGPT's improvement cycle is kept in an ongoing loop. It is possible to track the changes in AI and make any adjustments to ensure an ideal user experience.

Human-in-the-loop system

Human reviewers are necessary to ensure accuracy. This input provides ongoing knowledge and development, as human judgment is still essential.

Implementing these strategies can improve ChatGPT's quality and performance.

Embrace the power of human-AI synergy with QWERTYLABS

In summary, checking AI models to maintain accuracy and user satisfaction is important. We must be aware that AI has its limits. When combined with human intelligence, it can be a powerful tool for navigating the constantly changing AI landscape. With the right combination of AI and human creativity, we can use technology to our advantage.

At QWERTYLABS, we believe in the power of collaboration between AI and human intelligence. We want to go beyond providing SEO and digital marketing services since we aim to create a strategic partnership with our clients that boosts their brand's message. If you’re looking to improve your brand in SEO, content, and design, contact QWERTYLABS today!

Frequently asked questions

Will OpenAI's efforts guarantee the restoration of ChatGPT's previous quality?

Currently, OpenAI is attempting to tackle the concerns, so there’s no specific guarantee at the moment.

How can users mitigate and cope with the impact of ChatGPT's decline in quality?

Users can do this by cross-referencing ChatGPT’s responses against multiple reliable sources.

Should businesses stop using ChatGPT?

Businesses can still use ChatGPT while being mindful of its limitations. It’s also worth keeping in mind that OpenAI's efforts to enhance the model's quality point to positive changes in the future.

