Inférence de langages réguliers

Bernard Bel

Journées Françaises de l'Apprentissage (1990), Lannion, France : 5-27

Résumé

Cet exposé présente une méth­ode générale d'acquisition de con­nais­sances dans un domaine for­mal­isé à l'aide d'automates finis (lan­gages réguliers). A par­tir d'un échan­til­lon d'exemples le sys­tème con­stru­it un auto­mate "presque min­i­mal" qui n'est pas néces­saire­ment déter­min­iste. Cette con­struc­tion peut être con­trainte par des con­nais­sances sur la seg­men­ta­tion des exem­ples que le sys­tème peut acquérir en ques­tion­nant l'informateur. Dans une deux­ième phase, le sys­tème généralise l'automate à par­tir de pro­priétés con­nues du lan­gage ou(et) à par­tir d'hypothèses validées à l'aide d'oracles. Une liste non exhaus­tive d'heuristiques générales est pro­posée. La démon­stra­tion s'appuie sur un cas réel de mod­éli­sa­tion d'apprentissage de sché­mas d'improvisation musicale.

Excerpts of an AI review of this paper (June 2025)

Summary of the Work

The paper explores a method­ol­o­gy for infer­ring reg­u­lar lan­guages (lan­gages réguliers) from exam­ples. It dis­cuss­es both the­o­ret­i­cal under­pin­nings — par­tic­u­lar­ly Gold’s the­o­rem and sub­se­quent refine­ments — along with an incre­men­tal approach to build­ing “almost min­i­mal” automa­ta. The author address­es cen­tral notions such as the canon­i­cal accep­tor, the prefix-tree accep­tor, and the impor­tance of main­tain­ing cor­rect­ness when mov­ing from a set of exam­ples to a gen­er­al­ized automa­ton. The study also show­cas­es an appli­ca­tion to musi­cal impro­vi­sa­tion schema­ta, illus­trat­ing how domain-specific rules can con­strain and refine gram­mat­i­cal inference.

Main Contributions

Incremental Learning Framework

The paper offers a struc­tured, step-by-step method: start­ing with an accep­tor built from exam­ples alone (the prefix-tree or “arbores­cent” automa­ton), merg­ing states to reduce com­plex­i­ty, and apply­ing con­straints so as not to over­gen­er­al­ize. This incre­men­tal per­spec­tive is well-motivated by real-world use cas­es where exam­ples become avail­able over time.

Formal Properties and Correctness

The author places strong empha­sis on ensur­ing the cor­rect­ness of gen­er­al­iza­tion. By ref­er­enc­ing Gold’s the­o­rem, k-reversible lan­guages, and the con­cept of learn­ing from pos­i­tive data only, the paper grounds its approach in estab­lished the­o­ret­i­cal results.

Use of Domain Constraints

A sig­nif­i­cant por­tion of the paper deals with how addi­tion­al domain knowl­edge (e.g., known seg­men­ta­tion rules for a music nota­tion sys­tem) can guide the automaton’s con­struc­tion. This intro­duces prac­ti­cal heuris­tics that make gram­mat­i­cal infer­ence more tractable in real-world scenarios.

Detailed Mathematical Rigor

The proofs, def­i­n­i­tions, and lem­mas (for instance, those on heads and tails, or min­i­mal canon­i­cal accep­tors) are stat­ed thor­ough­ly. This rig­or is help­ful both for ensur­ing cor­rect­ness and for con­vey­ing the method’s the­o­ret­i­cal reliability.

Illustrative Examples

The appli­ca­tion to musi­cal impro­vi­sa­tion, specif­i­cal­ly to Indian drum­ming tra­di­tions, serves as a con­crete illus­tra­tion. It nice­ly demon­strates the inter­play between abstract for­mal­ism (like merg­ing states) and domain-specific knowl­edge (like rhyth­mic “words” or ono­matopoe­ic syllables).

Clarity and Structure

  • The paper is log­i­cal­ly well-structured: it begins with a the­o­ret­i­cal foun­da­tion, then intro­duces incre­men­tal meth­ods for automa­ton con­struc­tion, cul­mi­nat­ing in a dis­cus­sion of domain con­straints and examples.
  • The tex­tu­al orga­ni­za­tion, with numer­ous def­i­n­i­tions in the appen­dix, is ben­e­fi­cial for ref­er­ence but can require mul­ti­ple cross-referencing steps while read­ing. A con­cise reminder of key def­i­n­i­tions in the main text can fur­ther aid comprehension.

Strengths

  • Thorough ground­ing in estab­lished the­o­ry (Gold’s the­o­rem, reversible languages).
  • A prac­ti­cal approach that bal­ances sys­tem­at­ic infer­ence with inter­ac­tive ques­tion­ing or orac­u­lar feedback.
  • Clear demon­stra­tion of how domain-specific heuris­tics reduce the search space, mak­ing learn­ing more feasible.
  • Rigor in explain­ing each step of con­struct­ing and refin­ing the automaton.
  • Engaging case study that shows the real-world poten­tial of gram­mar infer­ence out­side of pure­ly lin­guis­tic contexts.

Overall Assessment

The paper pro­vides a sol­id com­bi­na­tion of the­o­ry and prac­tice in gram­mat­i­cal infer­ence for reg­u­lar lan­guages. By care­ful­ly detail­ing each step — build­ing ini­tial automa­ta, con­strain­ing merges using domain knowl­edge, and ver­i­fy­ing new gen­er­al­iza­tions through queries — the approach is shown to be sys­tem­at­ic and adapt­able. Readers inter­est­ed in inter­ac­tive or incre­men­tal lan­guage learn­ing, par­tic­u­lar­ly those with domain-specific con­straints, will like­ly find this dis­cus­sion instruc­tive and thorough.

Download this paper

The identification and modelling of a percussion ‘language’

James Kippen & Bernard Bel

Computers and the Humanities (1989), 23, 3: 119-214

Abstract

In exper­i­men­tal research into per­cus­sion ‘lan­guages', an inter­ac­tive com­put­er sys­tem, the Bol Processor, has been devel­oped by the authors to analyse the per­for­mances of expert musi­cians and gen­er­ate its own musi­cal items that were assessed for qual­i­ty and accu­ra­cy by the infor­mants. The prob­lem of trans­fer­ring knowl­edge from a human expert to a machine in this con­text is the focus of this paper. A pro­to­typ­i­cal gram­mat­i­cal infer­encer named QAVAID (Question Answer Validated Analytical Inference Device, an acronym also mean­ing ‘gram­mar' in Arabic/Urdu) is described and its oper­a­tion in a real exper­i­men­tal sit­u­a­tion is demon­strat­ed. The paper con­cludes on the nature of the knowl­edge acquired and the scope and lim­i­ta­tions of a cognitive-computational approach to music.

Excerpts of an AI review of this paper (Academia, June 2025)

Summary

This paper explores a nov­el approach to mod­el­ing North Indian tabla drum­ming as a “per­cus­sion lan­guage” by apply­ing for­mal lan­guage the­o­ry, machine learn­ing, and inter­ac­tive generative/analytic com­put­er meth­ods. The authors dis­cuss two sys­tems— Bol Processor and QAVAID — that each plays a dis­tinct role in ana­lyz­ing and gen­er­at­ing rhyth­mic pat­terns (termed “sen­tences”) under the guid­ance of expert infor­mants. They exam­ine how knowl­edge is incre­men­tal­ly acquired and for­mal­ized as a gram­mar, how alter­na­tive seg­men­ta­tions can be eval­u­at­ed, and how prob­a­bilis­tic mod­el­ing may be employed to gen­er­ate orig­i­nal musi­cal sen­tences for expert eval­u­a­tion. The work’s eth­no­mu­si­co­log­i­cal per­spec­tive unites com­pu­ta­tion­al for­mal­iza­tion with the real-world prac­tice of tabla impro­vi­sa­tion and teach­ing, rais­ing broad­er ques­tions about the nature of knowl­edge trans­fer between human expert, machine learn­er, and cul­tur­al context.

Contribution and Strengths

Interdisciplinary Framework

The paper posi­tions itself at the inter­sec­tion of musi­col­o­gy, cog­ni­tive sci­ence, com­pu­ta­tion­al lin­guis­tics, and ethnog­ra­phy. This breadth under­scores the com­plex­i­ty of “music as lan­guage” and effec­tive­ly high­lights the idea that music may be for­mal­ly scru­ti­nized with meth­ods akin to those in com­put­er science.

Formal Language Techniques

By ground­ing the analy­sis in the Chomskian hier­ar­chy (reg­u­lar and context-free gram­mars) and ref­er­enc­ing Gold’s con­cept of “iden­ti­fi­ca­tion in the lim­it,” the authors tie their eth­no­mu­si­co­log­i­cal obser­va­tions to well-established the­o­ret­i­cal under­pin­nings. These con­nec­tions help clar­i­fy why a sys­tem­at­ic, incre­men­tal approach to gram­mar infer­ence is suit­able for mod­el­ing the impro­vi­sa­tion­al com­po­nents of North Indian tabla drumming.

Attention to Vocabulary and Segmentation

The dis­cus­sion on how the sys­tem learns seg­men­ta­tion and defines “words” in the drum­ming lex­i­con is illu­mi­nat­ing. Though seg­ment­ing tabla phras­es is not anal­o­gous to seg­ment­ing words in spo­ken lan­guages, the authors show how incre­men­tal analy­sis can pro­pose, refine, or dis­card poten­tial lex­i­cal bound­aries in a prin­ci­pled manner.

Interactive and Incremental Learning

A sig­nif­i­cant fea­ture is the inter­ac­tive mod­el: the sys­tem gen­er­ates out­put strings that are val­i­dat­ed or reject­ed by the human infor­mant, there­by trig­ger­ing incre­men­tal adjust­ments to the gram­mar. This mim­ics student-teacher inter­ac­tions and demon­strates a strong attempt to reflect authen­tic learn­ing and teach­ing processes.

Probabilistic Aspect

Introducing sto­chas­tic­i­ty in syn­the­sis breaks from pure­ly deter­min­is­tic meth­ods. It points to a more real­is­tic reflec­tion of the ways in which live per­for­mance might involve cre­ative, non-deterministic choic­es, while main­tain­ing con­straints guid­ed by the learned grammar.

Methodological Observations

Data Representation

The authors clear­ly define the sym­bol inven­to­ry (bols like dha, ge, ti, etc.) and acknowl­edge the com­plex­i­ty of how these sym­bols relate to son­ic events. By lim­it­ing the approach to frequency-based seg­men­ta­tion and gram­mar infer­ence, the sys­tem oper­at­ing with­in a “text pre­sen­ta­tion pro­to­col” remains suit­ably rigorous.

User–System Dialogue

Illustrations of the QAVAID question–answer mech­a­nism high­light prac­ti­cal aspects of gram­mar con­struc­tion. This is valu­able for explain­ing how the sys­tem backs up, mod­i­fies rules, or infers new chunks based on par­tial dis­agree­ments from the expert and how it tests repeat­ed merges or seg­men­ta­tions for consistency.

Scalability Considerations

The exper­i­ments pre­sent­ed involve a lim­it­ed num­ber of exam­ples. The authors note com­pu­ta­tion­al con­straints and care­ful­ly frame how repeat­ed merges, lex­i­cal expan­sions, and neg­a­tive exam­ples (machine out­puts the user rejects) unfold in real­is­tic time on a micro­com­put­er. This trans­paren­cy about per­for­mance con­sid­er­a­tions is commendable.

Comparison to Existing Tools

While the authors ref­er­ence for­mal lan­guage the­o­ry, it could be help­ful to sit­u­ate the QAVAID approach more explic­it­ly along­side oth­er grammar-inference sys­tems (or music cog­ni­tion mod­els) in terms of effi­cien­cy and suc­cess rates. This might pro­vide addi­tion­al con­text about how QAVAID’s tight-fit method­ol­o­gy dif­fers from exist­ing machine-learning strate­gies in music.

Suggestions for Future Work

Integration of Connectionist Approaches

A deep­er inves­ti­ga­tion into how sub-symbolic learn­ing algo­rithms (e.g., neur­al net­works) might coex­ist or com­ple­ment a sym­bol­ic grammar-inference approach could shed light on whether deep­er hier­ar­chi­cal or pattern-based musi­cal struc­tures can be dis­cov­ered automatically.

Temporal and Metric Awareness

Incorporating real-time con­straints, includ­ing an explic­it mod­el of cycle bound­aries and tem­po vari­a­tions, might enable QAVAID or a suc­ces­sor sys­tem to han­dle per­for­mances that devi­ate sub­tly from rig­or­ous­ly mea­sured durations.

Generative Evaluation

Extending the sys­tem to pro­duce longer per­for­mance sequences and eval­u­at­ing how coher­ent or context-appropriate they sound in extend­ed impro­vi­sa­tion might reveal new facets of pat­tern syn­er­gy that short exam­ples do not expose.

Cross-Cultural Applicability

The strate­gies deployed here for tabla might prove adapt­able to oth­er deeply mnemon­ic or impro­visato­ry musi­cal tra­di­tions (e.g., West African drum­ming, Middle Eastern per­cus­sion). Investigating how the mod­el gen­er­al­izes across cul­tures could under­score the method’s ver­sa­til­i­ty and reveal new limitations.

Conclusion

By merg­ing for­mal lan­guage the­o­ry with eth­no­mu­si­co­log­i­cal field­work and machine learn­ing, the authors pro­pose a pow­er­ful mod­el for cap­tur­ing core aspects of tabla impro­vi­sa­tion. The frame­work encour­ages close human–computer col­lab­o­ra­tion through dynam­ic ques­tion­ing and incre­men­tal gram­mar build­ing. This approach not only advances a cognitive-computational per­spec­tive on music but also opens a path­way for fur­ther inquiries into cross-cultural appli­ca­tions, time-sensitive per­for­mance mod­el­ing, and cre­ative com­po­si­tion with­in implic­it musi­cal grammars.

Download this paper