ICMS 2026

Mathematical Knowledge Management

Session Organizers:

Ilka Agricola (Marburg),
Patrick Ion (Ann Arbor),
Moritz Schubotz (FIZ Karlsruhe),
Stephen Watt (University of Waterloo).

Session Abstract:

This session is about the opportunities and technologies that software provides to facilitate communication and development of mathematics and access to mathematical knowledge.

Topics (including, but not limited to)

What is mathematical knowledge and how can it be represented?
Software for manipulation, storage and retrieval of mathematical knowledge
Recovery of machine-usable mathematical knowledge from traditional publications
Integration of machine-usable mathematical knowledge in new publication
Integrating levels of formalism in the treatment of mathematical knowledge
The use of generative AI, computer algebra and formal deduction systems for mathematical knowledge management
Mathematical search (for definitions, results, explanations, etc)
Automated summary of mathematical documents
Automated classification of mathematical documents
Mathematical knowledge network navigation
Indexing and meta-knowledge
Rights management
Fraud detection
Authority in mathematical knowledge management
Collaboration tools
Curation and the life cycle of mathematical knowledge bases
Scalability of mathematical knowledge systems

Session Talks (as of May 1):

Click on triangle (▶) to open talk abstract.

Fraudulent publishing in the mathematical sciences: A glimpse at the parallel universe of fake science
Ilka Agricola (U Marburg)

Abstract In November 2023, Clarivate Plc announced that it had excluded the entire field of mathematics from the most recent edition of its influential list of authors of highly cited papers because of massive citation manipulation, which in return influences the so-called "Shanghai ranking" of top universities (or those claiming to be top). While most mathematicians would probably not care, the exclusion is in fact the tip of the iceberg of a parallel universe of predatory and mega-journals whose main purpose is to offer publishing opportunities for whoever is willing to pay the right price. I will explain how the system works, why we should care, and what measures we can all take against. In preparation, I invite you to think about the following questions: How often have you been contacted in the past months to attend a conference not in your field / submit a paper to or edit a special issue in a journal you don't know / review an article within 10 days or so? What do you know about the following journals: "Mathematics", "Axioms" (published by MDPI), "Chaos, solitons, fractals" (Elsevier), "Advances in Difference Equations" (Springer)? Do you know the following mathematicians: Abdon Atangana, Dumitru Baleanu, Hari M. Srivastava?

Math for visually impaired people
Ilka Agricola (U Marburg)

Abstract The most interesting part of mathematics is not linear text, whether it be formulas, matrices, abstract sketches, diagrams or tables. I will introduce the challenges of making maths accessible to everyone, explaining why it is not just a software problem and how modern tools are changing the traditional way of thinking about accessible maths.

Multi-Label Classification of Mathematical Software Communities Using Graph-Based Modeling of Structured Domain Metadata
Maxence Azzouz-Thuderoz, Yuni Susanti, Moritz Schubotz (FIZ Karlsruhe)

Abstract Mathematical research software forms distinct functional communities that span traditional disciplinary boundaries. We present a graph-based approach to classify research publications into software communities using the swMATH database (32,000 software packages, 628,019 citations). Using Louvain community detection, we identify 197 software communities exhibiting a heavy-tailed distribution. We then com pare two paper representations for predicting community membership: expert-assigned Mathematics Subject Classification (MSC) codes versus SPECTER neural embeddings of paper titles. Under a balanced precision–recall regime (neural network and logistic regression classifiers), MSC codes reach 77% micro-recall against 49–59% for title embeddings, while degenerate high-recall/near-zero-precision regimes (Complement NB, Passive Aggressive) are reached by both representations and discussed separately. Our results indicate that, at comparable precision, low-dimensional structured domain metadata is competitive with—and in our setup outperforms—title-only neural embeddings for software community prediction, with direct implications for software discovery and recommendation in computational mathematics.

The landscape of collections of mathematical objects
Katja Berčič (U Ljubljana)

Abstract The landscape of mathematical data, including collections of mathematical objects and properties, remains fragmented and under charted, yet has been gaining visibility thanks to new opportunities opened by rapid advances in formalized mathematics and AI. We take stock of what currently exists: what kinds of mathematical knowledge these collections encode, how they relate to each other, and where significant gaps remain. Along the way, we examine different types of collections through the lens of formalization and AI, and their potential in these directions. We also touch on questions of data models, curation, and long-term sustainability. We close with a call for greater coordination within the MKM community around shared standards and discovery infrastructure, all the more urgent as formal methods and AI raise both the stakes and the possibilities for mathematical data at scale.

Mathlib: A formally-verified repository of mathematical knowledge
Kevin M Buzzard (Imperial College London)

Abstract I will talk about "mathlib", a formalized repository of mathematical definitions and theorems, written in Lean. I will say a little about the history of the project (including an explanation of what Lean is). I will discuss applications to AI, and also the question of why we should formalize mathematics at all. I will discuss why I personally think that having a formalized repository of mathematics is a really important idea, and I will also discuss objections which people have raised when I have made this argument.

Scaling Accessible Mathematics on arXiv: HTML Conversion and MathML 4
Ginev Deyan (arXiv.org)

Abstract arXiv introduced an HTML format for its article collection several years ago, targeting improved accessibility for its readers. This talk offers an update on that effort, with particular emphasis on the adoption of the forthcoming MathML 4 specification.

The project's overarching objectives remain to achieve at least 90% corpus coverage and to stabilize both the HTML representation and its surrounding tooling ecosystem.

Keywords MathML, accessible mathematics, scholarly HTML, LaTeXML

How to Build a Formal Mathematical Library in which Communication is Prioritized over Certification
William M. Farmer (McMaster U)

A formal mathematical library is a repository of formalized mathematical knowledge organized as a network of interconnected axiomatic theories. Most formal mathematical libraries are built using the standard approach to formal mathematics in which mathematics is done with a proof assistant and all details are formally proved and mechanically checked. The overarching objective of the standard approach is to certify mathematical results. However, most mathematics practitioners – mathematicians, computing professionals, engineers, and scientists – are much more interested in communicating mathematical ideas than in formally certifying their correctness. This is particularly true for applications that involve well-understood mathematics, the kind of mathematics that arises in mathematics education and routine applications. Thus the typical formal mathematical library, which is structured to support certification, does not serve well the typical mathematics practitioner who is more interested in the communication of mathematical knowledge than in its certification.

This talk will describe how to build a formal mathematical library in which communication is prioritized over certification. The key idea is to utilize the free approach to formal mathematics in which mathematics is done within the framework of a formal logic free of the obligation to formally prove and mechanically check all details using a proof assistant. The free approach is characterized by the following four attributes: (1) The underlying logic is fully formal and supports standard mathematical practice. (2) Proofs can be traditional, formal, or a combination of the two. (3) There is support for organizing mathematical knowledge as a theory graph whose nodes are theories and whose directed edges are theory morphisms. (4) There are several levels of supporting software from just LaTeX support to a full proof assistant. These attributes enable the mathematical library builder to (1) stay close to mathematical practice, (2) use traditional proofs that are usually easier to read and write than formal proofs and better suited for communicating the ideas behind proofs, (3) maximize clarity and minimize redundancy using the little theories method, and (4) employ software systems that are much simpler and thus easier to both implement and learn how to use than a typical proof assistant. Pursuing this approach will significantly lower the bar for developing formal mathematical libraries. The talk will end by encouraging the mathematics software community to develop the logics and software needed to support the construction of communication-oriented formal mathematical libraries for applications in which the high assurance of correctness achieved by fully certified formal mathematics is unnecessary.

NuFrameIT: Serious Games via Computation and Reasoning
Stefanie Heim, Michael Kohlhase, Silas Kuder, Florian Rabe (FAU Erlangen-Nürnberg)

Abstract In the UFrameIT framework for serious games we have previously experimented with ideas for co-maintaining virtual worlds (such as those maintained by game engines like Unity) with logical worlds (such as those maintained by knowledge representation and reasoning systems).

The former employ complete concrete representations, including e.g. positions, velocities, and bounding boxes, that is well-suited for simulating an immersive learning environment for the user.

The latter focus on a knowledge-oriented high-level perspective, including e.g. trajectories and exact geometric shapes, that is well-suited for logical problem-solving.

By treating the virual world as the standard model of the logical world, we achieve a coupling that justifies the players ability to freely extend the logical information that is available for problem-solving with measurements taken in the virtual world.

In this paper we present a major reconceptualization, NuFrameIT. It uses a new representation language for the logical worlds that combines both axiomatic and efficiently executable knowledge descriptions, and allows for a more dynamic and fine-grained maintenance of the logical world.

It is coupled systematically to the virtual world, which is seen as the standard model of the logical world, and players can freely apply definitions and measurements to extend the logical information that is available for problem-solving.

We show a simple, but fully functional game based on the new framework.

Can we have a Global Digital Mathematics Library?
Patrick D. F. Ion (U Michigan & IMKT)

Abstract The ideal way to answer the title question would be to produce a useful GDML. This has not yet been done, in spite of years passing since such a thing was first mooted.. There are a good number of reasons that can be suggested for this state of affairs. The topics listed as relevant in this session's description provide much of a framework for the necessary discussions. This piece will bring forward concrete illustrations of several aspects noted in arriving at the present day.

Extend your Math with Semantics and Services
Michael Kohlhase, Dennis Müller (FAU Erlangen-Nürnberg)

Abstract In the last decades we have developed the sTeX framework into a practical publication ecosystem and tested it in large deployments. sTeX can be used as a LaTeX package to produce PDF, but more importantly can generate instrumented HTML that hosts added-value services driven by the mathematical knowledge management engine FLAMS.

In this paper we survey recent improvements in tooling, semantic authoring support, and in-situ services that help readers understand the content. We also show how just anyone can use sTeX to incrementally add semantics to mathematical articles or course materials so that their readers and students can directly profit from added-value services that build on these annotations.

Finally, we show that semantic mathematics is much more FAIR (Findable, Accessible, Interoperable, and Reusable); we could even say that the addition of semantics turns it into mathematical (research) data.

Keywords LaTeX, semantics, MKM, added-value-services

Improving Research Software Discoverability in Mathematical Literature at zbMATH Open
Ankit Satpute*, Maxence Azzouz-Thuderoz*, André Greiner-Petter+, Daniel Mietchen@, Olaf Teschke*, Moritz Schubotz*
(* FIZ Karlsruhe, + U Göttingen, @ U Virginia)

Abstract esearch software is central to modern mathematical research, yet implementations of algorithms and computational methods are often difficult to locate because they are scattered across repositories and rarely linked to the literature. We present ongoing work to improve mathematical software discoverability within zbMATH Open by identifying and linking software implementations associated with mathematical publications. Our aim is to connect zbMATH open with related software implementations present at software hosting platforms including GitHub, Zenodo, and Software Heritage. To support this, we explore mathematics-specific metadata Mathematical Subject Classification (MSC) codes4, named algorithms, and extracted formulae to identify candidate implementations and connect them with relevant publications. This enables discovery of mathematical software across disciplinary boundaries and supports a more integrated ecosystem linking literature, algorithms, and software.

Keywords Research Software, Content Similarity, zbMATH Open

5 Years Mathematical Resarch Data Initative
Moritz Schubotz (FIZ Karlsruhe – Leibniz Institute for Information Infrastructure)

Abstract This talk recapitulates the inaugural lustrum of Germany’s Mathematical Research Data Initiative (MaRDI), established to axiomatize research data as a third primordial entity alongside articles and software. Our data ingestion strategy originates from anthropogenic, high-fidelity sources such as the NIST Digital Library of Mathematical Functions and the zbMATH Open metadata collections, while integrating emergent formal corpora like Lean’s mathlib. We employ stochastic heuristics to approximate solutions to user queries, yet enforce rigorous verification against our curated knowledge graph. Additionally, we delineate a strategy for long-term permanence, proposing the repurposing of Fair Digital Objects and the MediaWiki framework to orthogonalize the nuances of mathematical knowledge management from baseline infrastructure concerns.

Keywords MKM-Session, short abstract, Mathematical Research Data Management, IMKT

Author Intent: The Next Step in Advancing the Accessibility of Mathematics
Neil Soiffer (Talking Cat Software)

Abstract MathML has long been a standard for representing mathematical expressions on the web. It can be exported by a wide variety of authoring systems such as MS Word, LaTeX, and Adobe Illustrator. All major browsers support the display of MathML and all major screen readers support speaking math encoded in MathML. However, some notations such as “|x|” (absolute value? determinant? cardinality?) and “(1,2)” (open interval? coordinate pair? list? GCD?) are ambiguous. Accessibility software can guess at how it should be spoken, but the software is only guessing and could be wrong. However, authors know what they mean when they write something and have a good idea how it should be spoken. MathML 4 now allows authors to convey their intent within MathML so that screen readers correctly speak the notation. The latest versions of LaTeX contain macros that allow authors to provide an intent that gets exported. These are picked up and used by recent versions of the leading screen readers.

We will discuss how author intent fits in MathML, design decisions made in incorporating it, and how it is already being used to significantly improve the readability of math being generated by LaTeX.

Comparing human perception, computer-algebra-based algorithmic approaches, and generative AI for ranking the complexity/difficulty/interest of elementary geometry statements
P. Tolmos (U Rey Juan Carlos), P. Velez, T. Recio (U Nebrija)

Abstract This communication reports on recent developments in our ongoing work exploring the potential synergy between GeoGebra Discovery's computer-algebra-based automated reasoning tools and generative AI programs. Specifically, we describe and analyze generated data comparing the complexity/difficulty/interest ranking of geometric statements across five contexts: (1) data collecting the intuitively perceived, by math teachers and students, via questionnaire, difficulty of each of the items in a collection of standard, secondary education level, theorems, (2) performance data of mathematical olympiad teams when attempting to solve a certain geometry problem, (3) automated ranking by the GeoGebra Discovery ShowProof command, utilizing Gröbner bases and the Rabinowitsch trick to determine statement complexity as the degree of the polynomials multiplying the given hypotheses and the negation of the thesis to get 1, (4) alternative automated raking approaches by other programs such as JGEX, and (5) success rates of generative AI programs such as ChatGPT in proving the same statements. The final goal of this research is twofold: to provide educators with a reliable tool for automated difficulty ranking suitable for classroom decision-making, and to enable automatic cooperation between GeoGebra Discovery and generative AI when the complexity ranking suggests a high likelihood of error, whereby GeoGebra Discovery alerts generative AI to check —using GeoGebra Discovery automated reasoning tools— its own proof steps.

Where Should the IMKT be at Age 10?
Stephen M. Watt (U Waterloo)

Abstract The International Mathematical Knowledge Trust (IMKT) emerged from a long-standing recognition that the global corpus of mathematical knowledge—spanning centuries of books and journals, and now millions of digitally available documents—remains only partially accessible, unevenly organized, and difficult to navigate in a coherent way. Despite large-scale digitization and indexing efforts, the mathematical literature still lacks the level of structural integration, semantic connectivity, and machine accessibility needed to function as a true global knowledge resource.

Early work in Mathematical Knowledge Management (MKM), including the development of standards such as MathML and OpenMath, digital library initiatives, and advances in formal representation, established important foundations but did not fully realize an interoperable, scalable knowledge infrastructure. As IMKT approaches its tenth year, this talk revisits the motivations that led to its formation: the need for a coordinated international effort, sustainable governance, and an open, evolving mathematical knowledge commons grounded in the literature itself.

We review the current landscape, including the emergence of large-scale open resources such as zbMathOpen, growing semantic enrichment of mathematical texts, and the transformative impact of machine learning systems trained over vast mathematical corpora, including arXiv-scale datasets. At the same time, changing economic and technological pressures on scholarly publishing have sharpened both the opportunities and constraints for building shared infrastructure.

Looking forward, we ask: where should IMKT be at age ten? Rather than an idealized end state, we emphasize realistic, impactful goals that enhance what users can actually do with the literature: improved discoverability, reliable semantic linking across documents, and integration with everyday research workflows. We outline a pragmatic path toward a Global Digital Mathematics Library that prioritizes incremental utility, interoperability, and adoption.