The Distributed Data Masses (MDD) 2026 thematic school, dedicated to “Data Management for Language Models”, brought together researchers, PhD students, engineers and AI regulation stakeholders in Cargèse, Corsica. Co-organised with the European ARMADA network, the event explored scientific, technical and societal challenges linked to language models and data management.
From vector indexing to Green AI and AI regulation, the programme combined lectures, workshops, poster sessions and scientific exchanges across multiple disciplines.
A thematic school focused on language models and data management
Held at the Institute for Scientific Studies of Cargèse at the University of Corsica, the 2026 edition of the “Distributed Data Masses” (MDD) thematic school continued a long-standing initiative connected to the French database research community.
Created as a complementary educational event to the national BDA conference (“Data Management — Principles, Technologies, and Applications”), the MDD school has been organised regularly since 2010. Previous editions took place in Les Houches, Aussois, Oléron, Urrugne, Bastia and Ceillac-en-Queyras.
The 2026 edition focused on a central topic in contemporary artificial intelligence research: Data Management for Language Models. The programme addressed both the technical foundations of large language models and broader questions related to sustainability, explainability, regulation and data governance.
A multidisciplinary audience from research and industry
The school welcomed master’s students, PhD candidates, postdoctoral researchers, research engineers and faculty members interested in the latest developments in AI and data management.
The event also opened discussions to participants outside computer science, particularly professionals involved in AI governance and regulation. This multidisciplinary approach encouraged exchanges between technical researchers, legal experts and innovation stakeholders.
MDD 2026 was jointly organised with ARMADA, a European International Training Network dedicated to training researchers in data management and AI-related topics. Poster sessions, PhD presentations and informal scientific discussions contributed to a collaborative environment throughout the week.
Scientific topics at the intersection of AI and data
The programme covered several themes linked to language models and large-scale data systems:
High-dimensional vector indexing and similarity search
As language models increasingly rely on embeddings and vector databases, efficient indexing and similarity search remain major research topics. Sessions led by Themis Palpanas addressed techniques for managing high-dimensional vector spaces and improving large-scale retrieval systems.
Data provenance and explainable AI
Questions related to transparency and traceability were discussed through sessions on data provenance and explainable AI. Katja Hose, Silviu Maniu and Maxime Jakubowski presented approaches for tracking data usage and understanding model outputs, while Jasmina Bogojeska explored data modalities and explainability challenges.
Green AI and data reduction
The environmental impact of AI systems formed another major topic of discussion. Nicolas Burger presented research perspectives related to Green AI and resource-efficient machine learning systems. Sessions led by Yannis Velegrakis focused on data reduction methods designed to optimise storage and computation requirements.
AI agents and regulation
The programme also examined the growing role of AI agents in human-AI collaboration and software development. Behrooz Omidvar-Tehrani led sessions on coding agents and language models applied to software engineering.
Legal and regulatory dimensions were addressed by Robin Plique, whose lectures focused on AI regulation and governance frameworks.
Workshops on experimentation and scientific communication
Alongside the scientific lectures, participants attended workshops on research methodologies, experimentation, and scientific dissemination.
Nicolas Travers, Full Professor (HDR) and Deputy Director of the DVRC lab at ESILV, led a workshop on conducting experiments and on scientific practices. The programme also included discussions on scientific writing, publications, open data sharing and research dissemination.
These sessions aimed to support young researchers in structuring experimental protocols, publishing scientific results and developing collaborative research practices.
A collaborative environment for the research community
The school created opportunities for networking and interdisciplinary exchange between participants from different institutions and research backgrounds. Discussions extended beyond the formal lectures through poster sessions and informal gatherings held throughout the week.
The organisers highlighted the importance of creating spaces dedicated to scientific dialogue around emerging AI challenges and data management practices.
MDD 2026 was organised with the support of several academic and research partners, including ARMADA, MIAI Cluster IA, De Vinci Research Center, CEDRIC-CNAM and LabEx PERSYVAL.
A growing research focus on responsible AI systems
As language models continue to evolve rapidly, topics such as sustainability, explainability, data governance and AI regulation are becoming central research priorities.
By bringing together researchers, engineers and policy stakeholders, the MDD 2026 thematic school contributed to ongoing discussions about the future of responsible and scalable AI systems.
Learn more about AI and data science research at ESILV and De Vinci Research Center.
This post was last modified on 19 May 2026 2:44 pm