Inférence de langages réguliers

Bernard Bel

Journées Françaises de l'Apprentissage (1990), Lannion, France : 5-27
https://hal.science/hal-00275789v2

Résumé

Cet exposé présente une méthode générale d'acquisition de connaissances dans un domaine formalisé à l'aide d'automates finis (langages réguliers). A partir d'un échantillon d'exemples le système construit un automate "presque minimal" qui n'est pas nécessairement déterministe. Cette construction peut être contrainte par des connaissances sur la segmentation des exemples que le système peut acquérir en questionnant l'informateur. Dans une deuxième phase, le système généralise l'automate à partir de propriétés connues du langage ou(et) à partir d'hypothèses validées à l'aide d'oracles. Une liste non exhaustive d'heuristiques générales est proposée. La démonstration s'appuie sur un cas réel de modélisation d'apprentissage de schémas d'improvisation musicale.

Excerpts of an AI review of this paper (June 2025)

Summary of the Work

The paper explores a methodology for inferring regular languages (langages réguliers) from examples. It discusses both theoretical underpinnings — particularly Gold’s theorem and subsequent refinements — along with an incremental approach to building “almost minimal” automata. The author addresses central notions such as the canonical acceptor, the prefix-tree acceptor, and the importance of maintaining correctness when moving from a set of examples to a generalized automaton. The study also showcases an application to musical improvisation schemata, illustrating how domain-specific rules can constrain and refine grammatical inference.

Main Contributions

Incremental Learning Framework

The paper offers a structured, step-by-step method: starting with an acceptor built from examples alone (the prefix-tree or “arborescent” automaton), merging states to reduce complexity, and applying constraints so as not to overgeneralize. This incremental perspective is well-motivated by real-world use cases where examples become available over time.

Formal Properties and Correctness

The author places strong emphasis on ensuring the correctness of generalization. By referencing Gold’s theorem, k-reversible languages, and the concept of learning from positive data only, the paper grounds its approach in established theoretical results.

Use of Domain Constraints

A significant portion of the paper deals with how additional domain knowledge (e.g., known segmentation rules for a music notation system) can guide the automaton’s construction. This introduces practical heuristics that make grammatical inference more tractable in real-world scenarios.

Detailed Mathematical Rigor

The proofs, definitions, and lemmas (for instance, those on heads and tails, or minimal canonical acceptors) are stated thoroughly. This rigor is helpful both for ensuring correctness and for conveying the method’s theoretical reliability.

Illustrative Examples

The application to musical improvisation, specifically to Indian drumming traditions, serves as a concrete illustration. It nicely demonstrates the interplay between abstract formalism (like merging states) and domain-specific knowledge (like rhythmic “words” or onomatopoeic syllables).

Clarity and Structure

  • The paper is logically well-structured: it begins with a theoretical foundation, then introduces incremental methods for automaton construction, culminating in a discussion of domain constraints and examples.
  • The textual organization, with numerous definitions in the appendix, is beneficial for reference but can require multiple cross-referencing steps while reading. A concise reminder of key definitions in the main text can further aid comprehension.

Strengths

  • Thorough grounding in established theory (Gold’s theorem, reversible languages).
  • A practical approach that balances systematic inference with interactive questioning or oracular feedback.
  • Clear demonstration of how domain-specific heuristics reduce the search space, making learning more feasible.
  • Rigor in explaining each step of constructing and refining the automaton.
  • Engaging case study that shows the real-world potential of grammar inference outside of purely linguistic contexts.

Overall Assessment

The paper provides a solid combination of theory and practice in grammatical inference for regular languages. By carefully detailing each step — building initial automata, constraining merges using domain knowledge, and verifying new generalizations through queries — the approach is shown to be systematic and adaptable. Readers interested in interactive or incremental language learning, particularly those with domain-specific constraints, will likely find this discussion instructive and thorough.

Download this paper

Skip to PDF content

The identification and modelling of a percussion ‘language’

James Kippen & Bernard Bel

Computers and the Humanities (1989), 23, 3: 119-214

Abstract

In experimental research into percussion ‘languages', an interactive computer system, the Bol Processor, has been developed by the authors to analyse the performances of expert musicians and generate its own musical items that were assessed for quality and accuracy by the informants. The problem of transferring knowledge from a human expert to a machine in this context is the focus of this paper. A prototypical grammatical inferencer named QAVAID (Question Answer Validated Analytical Inference Device, an acronym also meaning ‘grammar' in Arabic/Urdu) is described and its operation in a real experimental situation is demonstrated. The paper concludes on the nature of the knowledge acquired and the scope and limitations of a cognitive-computational approach to music.

Excerpts of an AI review of this paper (Academia, June 2025)

Summary

This paper explores a novel approach to modeling North Indian tabla drumming as a “percussion language” by applying formal language theory, machine learning, and interactive generative/analytic computer methods. The authors discuss two systems— Bol Processor and QAVAID — that each plays a distinct role in analyzing and generating rhythmic patterns (termed “sentences”) under the guidance of expert informants. They examine how knowledge is incrementally acquired and formalized as a grammar, how alternative segmentations can be evaluated, and how probabilistic modeling may be employed to generate original musical sentences for expert evaluation. The work’s ethnomusicological perspective unites computational formalization with the real-world practice of tabla improvisation and teaching, raising broader questions about the nature of knowledge transfer between human expert, machine learner, and cultural context.

Contribution and Strengths

Interdisciplinary Framework

The paper positions itself at the intersection of musicology, cognitive science, computational linguistics, and ethnography. This breadth underscores the complexity of “music as language” and effectively highlights the idea that music may be formally scrutinized with methods akin to those in computer science.

Formal Language Techniques

By grounding the analysis in the Chomskian hierarchy (regular and context-free grammars) and referencing Gold’s concept of “identification in the limit,” the authors tie their ethnomusicological observations to well-established theoretical underpinnings. These connections help clarify why a systematic, incremental approach to grammar inference is suitable for modeling the improvisational components of North Indian tabla drumming.

Attention to Vocabulary and Segmentation

The discussion on how the system learns segmentation and defines “words” in the drumming lexicon is illuminating. Though segmenting tabla phrases is not analogous to segmenting words in spoken languages, the authors show how incremental analysis can propose, refine, or discard potential lexical boundaries in a principled manner.

Interactive and Incremental Learning

A significant feature is the interactive model: the system generates output strings that are validated or rejected by the human informant, thereby triggering incremental adjustments to the grammar. This mimics student-teacher interactions and demonstrates a strong attempt to reflect authentic learning and teaching processes.

Probabilistic Aspect

Introducing stochasticity in synthesis breaks from purely deterministic methods. It points to a more realistic reflection of the ways in which live performance might involve creative, non-deterministic choices, while maintaining constraints guided by the learned grammar.

Methodological Observations

Data Representation

The authors clearly define the symbol inventory (bols like dha, ge, ti, etc.) and acknowledge the complexity of how these symbols relate to sonic events. By limiting the approach to frequency-based segmentation and grammar inference, the system operating within a “text presentation protocol” remains suitably rigorous.

User–System Dialogue

Illustrations of the QAVAID question–answer mechanism highlight practical aspects of grammar construction. This is valuable for explaining how the system backs up, modifies rules, or infers new chunks based on partial disagreements from the expert and how it tests repeated merges or segmentations for consistency.

Scalability Considerations

The experiments presented involve a limited number of examples. The authors note computational constraints and carefully frame how repeated merges, lexical expansions, and negative examples (machine outputs the user rejects) unfold in realistic time on a microcomputer. This transparency about performance considerations is commendable.

Comparison to Existing Tools

While the authors reference formal language theory, it could be helpful to situate the QAVAID approach more explicitly alongside other grammar-inference systems (or music cognition models) in terms of efficiency and success rates. This might provide additional context about how QAVAID’s tight-fit methodology differs from existing machine-learning strategies in music.

Suggestions for Future Work

Integration of Connectionist Approaches

A deeper investigation into how sub-symbolic learning algorithms (e.g., neural networks) might coexist or complement a symbolic grammar-inference approach could shed light on whether deeper hierarchical or pattern-based musical structures can be discovered automatically.

Temporal and Metric Awareness

Incorporating real-time constraints, including an explicit model of cycle boundaries and tempo variations, might enable QAVAID or a successor system to handle performances that deviate subtly from rigorously measured durations.

Generative Evaluation

Extending the system to produce longer performance sequences and evaluating how coherent or context-appropriate they sound in extended improvisation might reveal new facets of pattern synergy that short examples do not expose.

Cross-Cultural Applicability

The strategies deployed here for tabla might prove adaptable to other deeply mnemonic or improvisatory musical traditions (e.g., West African drumming, Middle Eastern percussion). Investigating how the model generalizes across cultures could underscore the method’s versatility and reveal new limitations.

Conclusion

By merging formal language theory with ethnomusicological fieldwork and machine learning, the authors propose a powerful model for capturing core aspects of tabla improvisation. The framework encourages close human–computer collaboration through dynamic questioning and incremental grammar building. This approach not only advances a cognitive-computational perspective on music but also opens a pathway for further inquiries into cross-cultural applications, time-sensitive performance modeling, and creative composition within implicit musical grammars.

Download this paper

Skip to PDF content