For a long time, computer scientists struggled to develop artificial intelligence that could solve difficult symbolic math. At best, it could solve high school math problems, and not even well enough to pass those classes. That disappointed Iddo Drori, a computer science lecturer at the Massachusetts Institute of Technology. A couple of years ago, the 500 students in one of his classes had more questions than he had time to answer. Enticed by the possibility that artificial intelligence might address this tutoring gap, he and his team set to work on a machine learning model that could solve calculus, differential equations and linear algebra problems sourced from undergraduate math courses at MIT and Columbia University.
Now, the team has introduced a neural network—an AI algorithm inspired by the brain’s structure and function—that solves college-level math problems at a human level in seconds, according to their Proceedings of the National Academy of Sciences paper published last month. The model can also explain the solutions and generate new problems that students found indistinguishable from human-generated problems. The development could assist faculty in creating new course content and automate grading in large in-person or massive open online courses. And it could tutor students by explaining the steps in difficult problems.
But some scholars are concerned that the explanations provided by the algorithm are not yet on par with those offered by humans. And others worry that the algorithm introduces new ways for students to cheat or the prospect of other unintended consequences.
“We’re now working on an AI that will graduate from MIT in computer science,” Drori said. “It won’t earn a degree formally, but it would complete [and pass] the courses.”
Neural networks excel at solving problems by way of pattern recognition. They train themselves by looking at large data sets—the larger the better—after which they generate new examples. They can produce realistic images of faces, for example, after looking at many images of real faces. But math with symbols such as the integrals found in calculus demands precision rather than approximation. This stands in contrast to number crunching, at which computers and calculators excel. That technology often produces approximate solutions that work well enough for engineers or physicists.
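To make the contrast between symbolic precision and number crunching concrete, here is a minimal Python sketch. It is an illustration only, not code from the study, and the function names `midpoint_integral` and `exact_power_integral` are invented for this example. It computes the area under x² between 0 and 1 two ways: numerically, which lands near the answer, and by applying the antiderivative rule with exact fractions, which gives the answer itself.

```python
from fractions import Fraction


def midpoint_integral(f, a, b, n=1000):
    """Number crunching: approximate the integral of f on [a, b]
    by summing n thin rectangles (midpoint rule)."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h


def exact_power_integral(k, a, b):
    """Symbolic-style answer for the integral of x**k on [a, b],
    using the rule that the antiderivative of x**k is x**(k+1)/(k+1).
    Exact fractions avoid any rounding."""
    a, b = Fraction(a), Fraction(b)
    return (b ** (k + 1) - a ** (k + 1)) / (k + 1)


approx = midpoint_integral(lambda x: x ** 2, 0.0, 1.0)
exact = exact_power_integral(2, 0, 1)

print(approx)  # a floating-point value close to, but not exactly, one third
print(exact)   # the exact fraction 1/3
```

The numerical result is good enough for many engineering purposes, but a calculus exam asks for the exact value, and that is the kind of answer a pattern-matching system has historically struggled to produce.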
Faculty members can already use the new algorithm, which is available open source on GitHub, to build curricular content. In addition to recovering time for them to focus on human tasks such as coaching and mentoring, professors could use the data it produces to understand whether their course prerequisites are, in fact, the right prerequisites. That could help ensure that the students they encounter in class are set up for success.
“I’m surprised this wasn’t done sooner, since undergrad mathematics courses are such closed systems,” said Doug Ensley, a math professor at Shippensburg University who has held a Mathematical Association of America leadership position. “These developments that more and more confound the traditional lecture-homework-test framework actually provide more and more reasons for instructors to shift toward more active classroom strategies.”
But some experts see limitations in the neural network. The algorithm’s explanations, according to some, are a high-level summary of steps rather than a deep account of the process and underlying concepts.
“This may be useful in generating practice problems and evaluating student answers to problems,” said Jay McClelland, professor and director at Stanford University’s Center for Mind, Brain, Computation and Technology. “But there is still a long way to go to better support helping students understand the concepts being taught in these math courses.”
Drori is aware that his team still has work ahead. Though the students rated the machine-generated questions on par with human-written questions, his team has not yet evaluated students’ perceptions of the algorithm’s explanations. The AI also cannot handle questions that rely on images such as graphs or questions that involve mathematical proofs. But he is pleased with the results so far and optimistic about making more progress.
“We improved a high school math benchmark from 8 percent to 80 percent accuracy, and we solved university-level course problems for the first time and at a human level,” Drori said. “It’s not every day that you move the needle by an order of magnitude.”
One member of the research team, however, expressed reservations about the work.
“It might be even a scary step forward,” said Avi Shporer, an MIT astronomer and co-author of the study. “If a machine can answer these questions, then how do we know that the student really did answer those questions?” Shporer noted that the work was done in a controlled laboratory setting and could have unintended consequences when released to the real world.
Not everyone is concerned about students using the tool to cheat.
“COVID taught us that this is already pretty easy,” Ensley said. “It certainly reminds us that there must be more to our courses than problem sets and exams.”
Since not all learning in math is about obtaining numerical answers, Drori sometimes encourages students to use the tool to solve problems in his classes. This way, they take minutes rather than hours to solve problems, which provides opportunities to ponder large, conceptual questions and learn tech skills that may transfer to future careers.
“They still solve plain vanilla exercises without the tools,” Drori said, before adding that “it’s a part of progress.” He likened the advancement to the development of calculators and self-driving cars, which also free humans from tedious tasks and encourage the development of new skills.
Drori’s machine learning breakthrough was likely influenced by his willingness to think differently. Instead of asking the AI to solve the symbolic math problems, he asked the program to complete programming tasks. For example, “find the difference between two points” could be asked as “write a program that finds the difference between two points.” (Not every problem was this simple; in some cases, the neural network needed context to understand the problem.) Then, instead of pretraining the neural network by showing it millions of examples of text-based problems as is typically done, he pretrained it on millions of examples of text and code.
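The rephrasing idea can be sketched in a few lines of Python. This is a hypothetical illustration under simple assumptions, not the team's code: the helper `as_programming_task` and the function `difference` are names made up here, and the verb handling is deliberately naive. The first function turns a question into a program-writing prompt; the second shows the kind of short program such a prompt might yield for the article's example.

```python
def as_programming_task(question: str) -> str:
    """Rephrase a math question as a request to write a program.

    Hypothetical helper: it assumes the question starts with a regular
    verb whose third-person form just adds an "s" ("find" -> "finds").
    """
    first, _, rest = question.strip().rstrip(".").partition(" ")
    verb = first.lower() + "s"
    return f"write a program that {verb} {rest}"


def difference(p, q):
    """One plausible program for the prompt: the coordinate-wise
    difference between points p and q, given as equal-length tuples."""
    return tuple(b - a for a, b in zip(p, q))


prompt = as_programming_task("find the difference between two points")
print(prompt)
print(difference((1, 2), (4, 6)))
```

Running a generated program gives an exact, checkable answer, which is one reason framing a math question as a coding task can suit a model trained on both text and code.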
The work is not intended to replace human professors, according to Drori. Rather, he sees the advancement as an opportunity for faculty members to be more thoughtful and creative in their teaching and research.
“Every time you solve something, someone will come up with a harder question,” he said in an MIT press release. “But this work opens the field for people to start solving harder and harder questions with machine learning.”