
Weakness of AI Models in Terms of Mathematical Problem Solving

by Vision NP (@induction), June 27th, 2023

Too Long; Didn't Read

I conducted a series of inquiries to assess the accuracy of AI solutions. I began with simple quadratic equations and gradually progressed to more complex problems. Considering OpenAI's caution regarding potentially incorrect or harmful responses from ChatGPT, my article will contribute to the improvement and proper training of AI models in the future.

Do you have complete confidence in the responses provided by ChatGPT or Google Bard when it comes to solving mathematical problems? Well, let me present you with some important findings. To evaluate the problem-solving capabilities of AI, I conducted a series of inquiries that began with simple quadratic equations and gradually progressed to more complex problems. OpenAI itself cautions that ChatGPT may produce incorrect or harmful responses, and I hope this article contributes to the improvement and proper training of AI models in the future. It's worth noting that Google Bard is still in its experimental phase and has shown comparatively more errors than ChatGPT.

So, let’s get started with a few examples.


First example:

Initially, I asked ChatGPT to solve the following problem:


“Check whether  the roots of x obtained from the equation 5^(x*x)*5^x-15625=0 satisfies other equations 9^(x*x)*9^x-81=0 or not.”


ChatGPT:

⚠️Look carefully: for x = -3, it calculated the value 531360, which is not equal to zero, yet it declared that the equation is satisfied. Amazing!


Now let’s head to Google Bard. I asked the same question:

⚠️Oh, no! It incorrectly calculated the roots. Check the code snippets.


✅Both roots obtained from the first equation, x = 2 and x = -3, fail to satisfy the second equation: for each of them x*x + x = 6, so the left-hand side evaluates to 9^6 - 81 = 531360, not 0. Yet look at how ChatGPT and Google Bard responded. They are not as mature as you might think in terms of mathematical problem-solving.
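
The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch for verification, not what either chatbot ran; the function names first_eq and second_eq are made up here for illustration, and the two factors are merged into a single power so that everything stays in exact integer arithmetic.

```python
# Hypothetical helper names; they simply encode the two equations from the prompt.
def first_eq(x):
    # 5^(x*x) * 5^x - 15625, written as one power: 5^(x*x + x) - 15625
    return 5 ** (x * x + x) - 15625


def second_eq(x):
    # 9^(x*x) * 9^x - 81, written as one power: 9^(x*x + x) - 81
    return 9 ** (x * x + x) - 81


# 15625 = 5^6, so the roots solve x*x + x = 6, i.e. x = 2 and x = -3.
roots = [2, -3]

for x in roots:
    print(x, first_eq(x), second_eq(x))
# prints:
# 2 0 531360
# -3 0 531360
```

A result of 0 means the equation is satisfied, so both roots satisfy the first equation and neither satisfies the second.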


Second example:

Here, I posed a 12th-grade mathematical problem to test the AI responses.


“If logx(1 / 8) = -3 / 4, than what is x? (x is base of log)”


ChatGPT:


⚠️Here, it makes a mistake in the highlighted area; the correct calculation should be (½)^4.


Google Bard:


⚠️It made a mistake in the third step of the code snippet.


✅The correct calculation gives x = 16, but the AI models failed to reach it.

Let’s see:

The correct solution:

Rewriting the equation in the exponential form gives:

x^(- 3 / 4) = 1 / 8

x= (1 / 8)^(- 4 / 3)

x= 2^4 = 16
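
The result can be confirmed by substituting it back into the original logarithm. The following is a minimal sketch using Python's standard math module; floating-point rounding means the values come out only approximately equal to 16 and -0.75, so they are rounded before printing.

```python
import math

# Solve log_x(1/8) = -3/4 by rewriting it as x^(-3/4) = 1/8.
x = (1 / 8) ** (-4 / 3)

# Substitute back: the logarithm of 1/8 to base x should be close to -0.75.
check = math.log(1 / 8, x)

print(round(x, 9), round(check, 9))
```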


Third example:

Here, I posed a B.Sc.-level mathematical problem to test the AI responses.


“The area bounded by the parabola y^2=4px and the line x = a revolves about the x-axis. Find the exact value of the volume generated.”


ChatGPT:


⚠️Here, the formula used by ChatGPT to calculate V (in the highlighted area) is incorrect; it should be V = ∫πy²dx, within the limits 0 to a.


Google Bard:


⚠️Here, Google Bard also generated an incorrect answer, (8/3)*a^4.


✅The correct solution to the question is:

Volume (V) = ∫πy²dx; within the limits 0 to a

= π ∫4px dx; since y² = 4px

= 4pπ [x²/2]; within the limits 0 to a

= 4pπ.(a²/2)

= 2πpa² (which becomes 2πa³ in the special case p = a)
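
The integral can also be checked numerically. The sketch below assumes the parabola y² = 4px exactly as stated in the question, for which the closed form is 2πpa² (equal to 2πa³ when p = a). The function name volume_of_revolution, the midpoint rule, and the sample values are illustrative choices, not part of the original problem.

```python
import math


def volume_of_revolution(p, a, steps=1000):
    """Approximate V = integral from 0 to a of pi * y^2 dx, with y^2 = 4*p*x."""
    width = a / steps
    total = 0.0
    for i in range(steps):
        x_mid = (i + 0.5) * width  # midpoint of the i-th slice
        total += math.pi * 4 * p * x_mid * width  # volume of one thin disc
    return total


p, a = 1.5, 2.0  # arbitrary sample values
numeric = volume_of_revolution(p, a)
closed_form = 2 * math.pi * p * a ** 2
print(numeric, closed_form)
```

Because the integrand is linear in x, the midpoint rule agrees with the closed form up to floating-point rounding, so the two printed values should match closely.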

Fourth example:

Here, I asked the AI bots to solve another math problem to check their responses:


“A semicircular piece of paper is folded to make a cone with the centre of the semicircle as the apex. The half-angle of the resulting cone would be:”


ChatGPT:


⚠️Here, ChatGPT made a mistake: the arc length of the semicircle, “πr”, should be equal to the circumference of the base of the cone, “2πl*sin(θ)”, where the slant height l equals the radius r of the semicircle.

Google Bard:

⚠️Surprisingly, Google Bard gave 90 degrees as the half-angle of the resulting cone, citing an incorrect reference.


The correct answer in this case:

Let’s say R is the radius of the semicircle, then the slant height of the cone made from it is also R. If r is the radius of the base of the cone, then,

2πr = πR

So, r = R/2

Let A be the half angle of the cone, then,

sin A = (R/2)/R = 1/2

A = 30 degrees
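
The same steps can be followed in code. This is a minimal sketch with an arbitrary semicircle radius R = 1; since the ratio r/R does not depend on R, any positive value gives the same angle. The variable names are illustrative.

```python
import math

R = 1.0  # radius of the semicircle, which becomes the slant height of the cone
arc_length = math.pi * R  # curved edge of the semicircle
r = arc_length / (2 * math.pi)  # base radius of the cone, from 2*pi*r = pi*R
half_angle = math.degrees(math.asin(r / R))  # sin A = r / R

print(round(half_angle, 6))
```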


Despite its several mistakes in the calculations above, ChatGPT has correctly solved some integration-based problems, for example:


Question: Solve ∫xtan^2(x)dx
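
For reference, writing tan²(x) = sec²(x) - 1 and integrating by parts gives x*tan(x) + ln|cos(x)| - x²/2 + C. A candidate antiderivative like this one can be checked numerically by differentiating it and comparing against the integrand. The sketch below does that with a central difference; the function names and sample points are illustrative assumptions, and this is not a reproduction of ChatGPT's output.

```python
import math


def integrand(x):
    return x * math.tan(x) ** 2


def antiderivative(x):
    # Candidate result from integration by parts (constant of integration omitted).
    return x * math.tan(x) + math.log(abs(math.cos(x))) - x ** 2 / 2


def numeric_derivative(f, x, h=1e-5):
    # Central difference approximation of f'(x).
    return (f(x + h) - f(x - h)) / (2 * h)


# Sample points chosen inside (0, pi/2), away from the singularity of tan(x).
for x in (0.3, 0.7, 1.1):
    print(x, integrand(x), numeric_derivative(antiderivative, x))
```

If the antiderivative is right, the last two numbers on each printed line should agree to several decimal places.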



So, the examples above indicate that AI chatbots are still at an experimental stage when it comes to solving mathematical problems. As they are backed by huge datasets, chatbots should be properly trained to perform mathematical calculations without errors. In the above examples, ChatGPT exhibited mistakes ranging from minor to severe.


If you receive incorrect solutions or information for your problems, I suggest that you “Regenerate” the response until the chatbot gives you an accurate solution; otherwise, you can supply the correct solution yourself so that the bot may answer correctly next time. For the mistakes above, I provided detailed feedback to ChatGPT, and in subsequent attempts I received the correct solutions to these problems. You can also give it a try to test whether it shares accurate information or repeats the same mistakes.


Conclusion:

Chatbots do not always give accurate information or solutions for your mathematical problems. Are you leaving your homework to AI chatbots to solve? Be aware that they are not yet mature enough in mathematical problem-solving; however, the bots are slowly but steadily improving their responses. If the training data of the bots is incorrect, they cannot give you precise solutions. In addition, AI chatbots like ChatGPT and Google Bard may not have received extensive training specifically in mathematics, so they still cannot reliably apply advanced mathematical reasoning. So, if you are a student, teacher, or expert in a related field and you notice mistakes, help the AI models by offering feedback so that a huge number of users will benefit in the future.