at_grammar_induction.git
(ref: stassa_normal_form)
spatsant [Tue, 6 Oct 2015 17:31:44 +0000]
Stassa Normal Form- ie, bollox :P
* Just an experiment to see how I can add more bells and whistles to
  derived_productions/5 without messing up the main loop. It's not _too_
  easy because you still go through the same steps as for the main loop.
  But yeah, it's a bit configurable. If I want to experiment with other
  normal forms though I really need to refactor that main loop thoroughly
  to make it as abstract and configurable as I can. Oh dear.

spatsant [Tue, 6 Oct 2015 17:12:58 +0000]
Added guard against badly spelled production_composition/1 options.
* This one in particular because it tends to make things go infinite if
  it's not recognised. To be fair, I should be doing this more formally
  for all options, with some kind of type checking.

spatsant [Tue, 6 Oct 2015 16:51:30 +0000]
Fixed precision test for "strings"; did some housekeeping.

spatsant [Tue, 6 Oct 2015 16:45:54 +0000]
Used examples_corpus/1 where I could; still more places to do.

stassa [Tue, 6 Oct 2015 08:39:27 +0000]
Moved examples_count/1 to utilities; lotsa WIP in evaluation format.
* There are two WIP bits:
* One, I need to use examples_corpus/1 and examples_count/1 where I had
  findall/3 and length/2 calls before (boo) (well, don't boo- I still use
  those but I wrapped them in examples_corpus/1 and examples_count/1,
  that's all).
* Two, precision_test/3 for test protocol "strings" goes bananas. Gotta
  figure out why and fix.

stassa [Tue, 6 Oct 2015 08:16:23 +0000]
Moved examples_corpus/1 to utilities (because it's generally needed).
* Note: I just noticed two things.
* One, utilities has some non-ASCII characters, specifically (tm)
  (which I thought was ASCII but isn't), so its first character is the
  UTF-8 BOM. It showed up in the diff just now and freaked me out, but it's
  perfectly safe (as long as you follow instructions carefully).
* Two, some corpus files (eg, all-destroy one-liners) have some duplicate
  lines in them- product of an error in tokenisation which I'll need to
  fix. Until then though the safest thing to do when gathering examples
  (if you want to avoid duplicates, that is) is to get them using setof/3
  rather than findall/3. I'll change that in a bit btw.

stassa [Tue, 6 Oct 2015 07:31:07 +0000]
grammar_evaluation/0 now prints to configured stream.

stassa [Tue, 6 Oct 2015 07:14:24 +0000]
WIP on grammar evaluation output formatting.
* Changed names of testing_protocol/1 options to something more sensible.
  Will need to update notes.

stassa [Tue, 6 Oct 2015 07:00:19 +0000]
WIP on grammar evaluation output formatting.
* Changed name of configuration option output_stream/2 to
  output_file_name/2 because that's way more accurate and it doesn't hurt
  my brain to translate it anymore.

stassa [Tue, 6 Oct 2015 06:53:15 +0000]
WIP; working on grammar evaluation output formatting.

spatsant [Mon, 5 Oct 2015 16:33:40 +0000]
Working on grammar_evaluation/4 output.

spatsant [Mon, 5 Oct 2015 11:52:28 +0000]
Reduced more boilerplate around precision & recall tests.

spatsant [Mon, 5 Oct 2015 11:40:42 +0000]
Reduced boilerplate in grammar_evaluation/[0,1] (/1 is now /3).

spatsant [Mon, 5 Oct 2015 11:32:27 +0000]
Unified grammar evaluation formatting options to one predicate.

stassa [Sun, 4 Oct 2015 20:30:19 +0000]
Housekeeping.
* Removed all the backed-up (read: packrated) versions of
  derived_productions/5. Might be needed again. Remember.

stassa [Sun, 4 Oct 2015 20:23:59 +0000]
Actually fixed issues with non-atomic rule names.
* Basically, just atomise them. ZZZZZZAP!

stassa [Sun, 4 Oct 2015 17:21:37 +0000]
Rickety lexicalisation; getting there; WIP but works.
* Works in that it does produce a lexicalised grammar and my metrics
  show good recall _and_ precision- though rather crap generalisation
  (ungrammatical generalisation to be precise). I think this is because of
  short-distance dependencies being learned only. I'll need to see what I
  can do to lengthen them a bit.
* The problem is that valid_nonterminal/2 now actually results in some
  invalid nonterminals. I need to fix that, otherwise we get errors and
  exit before we can actually learn the grammar.

stassa [Sun, 4 Oct 2015 13:34:23 +0000]
Yet more fiddling with ancestry stack; stable non-lex; WIP.

stassa [Sun, 4 Oct 2015 12:30:28 +0000]
Still fiddling with the production ancestry stack for lexicalisation.
* Stable non-lexicalisation version but WIP and going infinite on
  lexicalisation.

stassa [Sun, 4 Oct 2015 02:20:42 +0000]
Stabilised non-lexicalised output; rickety hacked lexicalised output.
* Lexicalised output goes infinite when it fails to lexicalise because of
  duplicate branch-head productions. These are actually in the
  non-lexicalised output too, but there I'm sorting them (to avoid
  discontiguous rules and also to remove the duplicates). Which I
  shouldn't, because I shouldn't be producing duplicates to begin with.
  There's an error in my calculations.

stassa [Sun, 4 Oct 2015 01:54:09 +0000]
Intermediary WIP state; working on lexicalisation again.

stassa [Sun, 4 Oct 2015 01:14:12 +0000]
Went back to non-lexicalised productions again.
* On the one hand because I think I have a better way to do the
  lexicalisation but also because I didn't have a stable version of this
  (non-lexicalised but with a nice stack building up).

stassa [Sat, 3 Oct 2015 19:35:22 +0000]
WIP and rickety lexicalisation.
* Also removed duplicate example from pot_puri.
* Still lots of confusion around what gets added where (stack, Ps) in the
  main loop. Will need careful consideration.

stassa [Sat, 3 Oct 2015 12:37:49 +0000]
Fixed bug with leaves and branches with common head.
* In short it's safer to look for leaves where there is an example with a
  single token even when there are more examples in the same
  branch-corpus, where before I would identify a leaf only when there was
  a single example in the corpus. That certainly makes sense now that I'm
  splitting off node-corpora for each branch.

stassa [Sat, 3 Oct 2015 11:34:42 +0000]
Fleshed out sanitising of nonterminal identifiers.
* Also added new option for reusing previously learned productions. The
  goal is to both avoid the usual issue with learning different
  productions that cover the same tokens and also to allow background
  knowledge into the induction process, in the form of a given grammar.
* Also also added mtg_pot_puri.pl, a corpus of various AT examples
  hand-picked for diversity. The idea is to test both how THELEMA does at
  learning productions for different types of ability and also to test how
  it fares against corpora that don't have the kind of regularities I've
  assumed so far (where I was training with corpora of a single type of
  ability etc).

spatsant [Sat, 3 Oct 2015 08:06:02 +0000]
Example file with various abilities.

spatsant [Fri, 2 Oct 2015 17:00:11 +0000]
Reduced some boilerplate. Just a bit.

spatsant [Fri, 2 Oct 2015 16:30:25 +0000]
Added predicate and options to rename built-in a-likes.
* Productions like "is --> [is] ... " cause trouble because, when the DCG
  compiler adds its two arguments to their head (and body), they compile
  to built-ins (is//0 becomes is/2). The new predicate deals with that, but
  it pollutes my head-space (whoa there) so I added an option to control it
  or turn it off. Basically, I think it's probably not needed when we do
  lexicalisation. Probably. So I don't want to keep calling it when
  lexicalising.

spatsant [Fri, 2 Oct 2015 15:29:53 +0000]
Housekeeping.

spatsant [Fri, 2 Oct 2015 14:36:41 +0000]
Got GNF working; and by that I mean: no danglers, no orphans, nothing.
* And no broken branches either. Basically the big thing was that I was
  doing two things at once that complicated er, things: splitting the
  corpus along branch-heads and deriving the rules for those branch-heads.
  The upshot of this was that because I immediately derived rules for only
  the first branch-head at a given depth, the remaining branch-heads were
  processed differently. OK, that's a very high-level explanation but
  that's the gist of it.
* Anyway it now works, like I say: no dangling refs, no orphans, no broken
  branches, no nothing- although of course there's still recursion (though
  not left recursion).

spatsant [Fri, 2 Oct 2015 10:51:07 +0000]
WIP; going back to basics with Greibach Normal Form.
* This is still part of going back to basics and I'm now trying to get GNF
  output without any of the problems that plagued each previous minor
  version of version 3. Added new production_augmentation/1 option for
  GNF and assorted augmented_node_head_production/3 clauses. At this point
  I'm getting some broken branches but no orphans or dangling pointers, at
  least not from the handsim dataset.

spatsant [Thu, 1 Oct 2015 17:12:40 +0000]
WIP- trying to produce Greibach normal form only.
* And then: lexicalisation etc.

spatsant [Wed, 30 Sep 2015 18:07:47 +0000]
Refactored print_grammar_file/5 to print out compressed stuff correctly.
* Ish. The problem is that we need to decide what to do with the start
  symbol every time we see it and that complicates things. See notes from
  today.
* This is still WIP and borken. It won't even work correctly I think
  because there's nonterminals getting trapped inside leaves. See notes.

spatsant [Wed, 30 Sep 2015 17:26:57 +0000]
Nice little kludge to output compressed corpora to corpus/ directory.
* Which needs renaming to "corpora" btw.

spatsant [Wed, 30 Sep 2015 17:12:00 +0000]
Housekeeping mostly; plus configs. WIP and half-borken.
* Only functional change was to remove an undocumented cut from the
  compression_grammar//0 term; no idea what it was doing there but it cut
  useful branches.

spatsant [Wed, 30 Sep 2015 15:34:51 +0000]
Replaced "chunks" option with "tags" (in grammar_printing).
* Because I guess what that does is (try to) print POS tags.

spatsant [Wed, 30 Sep 2015 15:16:58 +0000]
Printing compressed corpus; well, ish.
* On the one hand, the longer examples go infinite, I think because of
  what I thought was left-recursion (and is instead just plain-old
  recursion due to clause selection). On the other hand, even with the
  handsim dataset, where we don't go infinite, the compressed corpus looks
  like this:
  example_string([destroy]).
  example_string([destroy]).
  example_string([destroy]).

* And that's like I feared- the grammar is hierarchical so everything can
  be reduced to its start symbol. Anyway keeping a version to see if I can
  still do something here.

spatsant [Mon, 28 Sep 2015 17:45:04 +0000]
More boilerplate reduction. Now with less boilerplate.

spatsant [Mon, 28 Sep 2015 17:37:30 +0000]
Added configuration option for (compressed) corpus output stream.
* This added a parameter to output_stream/1 (now /2). If something breaks,
  you know where to look.

spatsant [Mon, 28 Sep 2015 17:28:23 +0000]
More comments.

spatsant [Mon, 28 Sep 2015 17:25:39 +0000]
Some comments on printing compression grammars.

spatsant [Mon, 28 Sep 2015 17:22:50 +0000]
Now printing compression grammar. With, um, compression.

spatsant [Mon, 28 Sep 2015 17:01:01 +0000]
Reduced boilerplate. We won't go down without a fight!

spatsant [Mon, 28 Sep 2015 16:50:46 +0000]
Added printing for chunks grammar; towards printing a compression one.

spatsant [Mon, 28 Sep 2015 16:04:48 +0000]
Simplified print_grammar/0 and print_grammar_file/4.

spatsant [Mon, 28 Sep 2015 15:59:50 +0000]
Still working on printing a compression grammar.
* Added some parameters to printing predicates; also some comments etc.

spatsant [Mon, 28 Sep 2015 15:49:08 +0000]
Added option to print different types of grammar.
* This is in preparation for printing a compression grammar.

spatsant [Mon, 28 Sep 2015 15:18:30 +0000]
Added lexicalisation strategy to output file names.

stassa [Tue, 22 Sep 2015 20:25:37 +0000]
Added more detailed precision & recall metric.

stassa [Tue, 22 Sep 2015 19:08:11 +0000]
Refactored bare-bones grammar evaluation; some housekeeping etc.
* Preparing grammar evaluation for counting actual parsed/generated
  examples.

stassa [Tue, 22 Sep 2015 16:03:12 +0000]
Think I fixed the bare-bones grammar evaluation reporting.
* As in fixed it to report left-recursion (ish).

stassa [Tue, 22 Sep 2015 16:02:15 +0000]
Small natural language experiment.

stassa [Mon, 21 Sep 2015 20:13:19 +0000]
Actually committing grammar evaluation module. Well done.

stassa [Mon, 21 Sep 2015 15:13:35 +0000]
Added <module> comments to a couple of modules.
* Also to production_induction and grammar_evaluation, committed in the
  last commit.

stassa [Mon, 21 Sep 2015 15:12:33 +0000]
Added bare-bones precision & recall grammar evaluation.
* Still a bit of an issue with reporting left-recursion, because I'm an
  idiot mostly.

stassa [Sun, 20 Sep 2015 22:09:06 +0000]
Grammar evaluation module (empty).

stassa [Sun, 20 Sep 2015 14:10:34 +0000]
Ensured everything runs OK after reverting.
* Minor issue with an option in augmented_node_head_production/3 that was
  not yet supported at the commit I reverted to; also added the
  production_augmentation/1 option for future use and replaced the
  previous option in augmented_node_head_production/3 with the new one.

stassa [Sun, 20 Sep 2015 11:01:37 +0000]
Revert "Working on lexicalisation; WIP and borken."

This reverts commit a535b14b1a1c675f8bc8976746e672b7a3fe8515.

* This actually reverts all commits from
  ac7dda19f4163ad06061e45a8f401465e4a016e3 to
  a535b14b1a1c675f8bc8976746e672b7a3fe8515. The auto-gen message says that
  a535b is being reverted because there was a merge conflict at that
  point, between a535b and its parent,
  da186aa833cbe5a7b93d021ab1b07c8dbb4d1cf2. I'm guessing this commit is
  just a record of the conflict resolution merge and the revert will
  continue after that, so there will probably be another commit once this
  is done. Which is to say, right about no-

stassa [Sat, 19 Sep 2015 19:25:20 +0000]
Housekeeping, mostly.
* Also removed two lines that are probably vestigial, checking the
  lexicalisation_strategy/1 value in the body of clause 3 of
  derived_productions/5. I think I left this behind after briefly
  considering adding more derived_productions/5 clauses to branch on
  strategy. Truth be told, the way I've done it now, that predicate is a
  bit of a mess.

stassa [Sat, 19 Sep 2015 19:23:01 +0000]
Added set_spy_points/0 to duh.
* Usually need to set spy points on you_are_here/1. I need some logging...

stassa [Sat, 19 Sep 2015 12:07:48 +0000]
Also done right branches with node-head production stack.
* Done and tested also, with destroy-short dataset. Should test with
  shorter one also.

stassa [Sat, 19 Sep 2015 11:42:16 +0000]
Stems and leaves with node-head production stack done right.

stassa [Sat, 19 Sep 2015 10:43:22 +0000]
Yeah, like this one for example.
* See previous commit- I said if I found a better way to keep the original
  functionality of lexicalisation_strategy/1 = none, I would implement it.
  Added a clause to parameterised_node_head_production/3 instead, which
  makes more sense. This just checks the lexicalisation strategy value and
  if it's "none" it simply binds the input production to the output. We'll
  sort it later. Literally.

stassa [Sat, 19 Sep 2015 10:37:05 +0000]
Saving WIP version that works with lexicalisation strategy = none.
* This was borken yesterday as I was working on lexicalisation. Added a
  new clause to fire only when lexicalisation_strategy/1 option is set to
  "none". Kind of sucks how clauses are beginning to proliferate but if I
  can think of something better, I'll do it.

spatsant [Fri, 18 Sep 2015 16:03:10 +0000]
WIP and borken; working on lexicalisation.

spatsant [Fri, 18 Sep 2015 15:22:32 +0000]
Added node-production stack argument to derived_productions/5; WIP
* This is still on the road to lexicalisation. The stack is supposed to
  keep track of the last node-head production for this node that was
  actually augmented, so that when we find out the next token of the
  _current_ node-head production we can look back and update its mommy
  with a reference to the correct lexical parameter as a ground term (or
  more than one).
* Tested this with the findall-test also.

spatsant [Fri, 18 Sep 2015 14:52:53 +0000]
Working on lexicalisation; WIP and borken.
* But committing a version anyway to keep track of stuff and keep it under
  control. Note that the borken stuff is around lexicalisation- I keep
  testing with the destroy-short dataset before each commit to make sure
  the non-lexicalised grammar's performance doesn't degrade... or at least
  that it can still parse the whole corpus.

spatsant [Fri, 18 Sep 2015 10:53:08 +0000]
Small correction to configuration options; housekeeping.
* Last commit added the configuration option for augmented productions to
  node-head production creation by mistake; this corrects that and keeps
  the ability to configure node-head production creation, which we might
  want to change in the future.
* Also added new configuration option, production_composition/1 as above.
* Also also added some comments etc.

spatsant [Fri, 18 Sep 2015 10:20:34 +0000]
Added lexicalisation_strategy/1 option; some housekeeping.
* Also added logic to use the new option in production_induction module
  (and tested that it works with both handsim and destroy-short example
  sets).
* Also also added new derived_productions/4 predicate, as an interface to
  derived_productions/5. Because why not.

spatsant [Fri, 18 Sep 2015 09:53:54 +0000]
Housekeeping: comments, soft-cuts, cleanups.
* Soft-cuts added to clause 4 of derived_productions/5 to stop unproductive
  backtracking when the beheaded node-corpus is not []. At that point we
  want to fall through to the next clause and we may even get some garbage
  results if we backtrack to a different possible result; in fact, there
  shouldn't be any different results.

spatsant [Fri, 18 Sep 2015 09:24:58 +0000]
Some experimenting with a compression grammar for v 3. Hint: no.
* Basically, it doesn't work with v.3 because it already creates
  productions at a higher level than the mere fragments of version 2. I
  could always go back and fix version 2 at some point. Though, you know,
  all that dynamic db madness hurt my head and I'm not too eager to see it
  again too soon.

(ref: disjunctions)
stassa [Sat, 12 Sep 2015 21:24:01 +0000]
Config changes; new corpus file; comments etc.

stassa [Mon, 7 Sep 2015 07:55:15 +0000]
Stable version; still WIP; looking good, rather.
* Still left-recursive and I do get some fragments but generally I like
  it. I'll need some feedback obviously.

stassa [Mon, 7 Sep 2015 03:07:02 +0000]
Going on with simplified version: makes nice handsim rules.
* Not sure if it looks as nice with longer examples though. See notes from
  today.

stassa [Sat, 5 Sep 2015 14:55:30 +0000]
Changed checks for atomic/nonatomic augmentation tokens.
* Previously used atomic(H) to decide whether a token passed to
  augmented_node_head_production/3 is a terminal or not, but that misses
  cases like 'and/or' (as in "target ... and/or target ... "), which is
  read as the compound term /(and, or) rather than as an atom. Changed
  that to use \+ is_list(H). It's just a signal really. Oh btw, the
  algorithm was going infinite before this fix with the short-destroy
  dataset (because it had cases like this and failed forever). It's OK
  now.

stassa [Sat, 5 Sep 2015 14:44:56 +0000]
Squished bug with beheaded_node_corpus/2 binding empty examples to output.

stassa [Fri, 4 Sep 2015 16:27:13 +0000]
Keeping WIP version that works ish (only for small handsim corpus)
* If you uncomment the longer stem-y example, it fails :/ Need to
  understand why but this is a simple version (including all the commented
  out clauses in derived_productions/5 that I need to work up from).
* Btw, added some stuff in loader module to load v.1 correctly though I
  noticed it gets weird results that mean the stable version is somewhere
  in the past- need to find and restore it probably.

stassa [Thu, 3 Sep 2015 13:00:29 +0000]
Tweaks to deal with empty node-heads.
* Tweaks in beheaded_node_corpus/2 and node_heads/2.

stassa [Thu, 3 Sep 2015 12:55:25 +0000]
Another WIP version.
* Still branching incorrectly- the handsim "exile target creature" branch
  (the one with the single example) gets all fragmented unlike what I
  expected.

stassa [Thu, 3 Sep 2015 12:34:10 +0000]
Saving working but WIP version.
* Working in that a run goes all the way to producing rules, though not
  all the rules I expected.

stassa [Thu, 3 Sep 2015 11:27:17 +0000]
Working on version 3.0; saving working v.1, v.2 for archiving itch.
* Version 3.0 (unnumbered production_induction.pl source file) is a big
  WIP- currently working on production-composition rules and would like to
  keep a so-far copy handy.

stassa [Mon, 31 Aug 2015 17:56:19 +0000]
Not sure what I did here exactly.
* Seems I tweaked the logic around clauses of branch-composition rules in
  branch_productions/8 to add the branch-head production to the
  productions set in second clause (for leaf productions) and to behead an
  example in the third clause.
* Also the usual config changes and a longer example to see what happens
  with stem nodes; mostly, they make a long string of terminals, which I
  guess is OK.

stassa [Tue, 25 Aug 2015 22:53:36 +0000]
Added some comments and stuff.

stassa [Tue, 25 Aug 2015 19:45:47 +0000]
Squished a bug; changed configs.
* Bug was around treatment of examples with a single token left. Note: not
  leaf nodes, which are corpora with a single single-token example left.
  Anyway, fixed by adding the relevant clause to beheaded_example/2.

stassa [Tue, 25 Aug 2015 18:30:33 +0000]
Missed a spot last commit.

stassa [Tue, 25 Aug 2015 18:28:00 +0000]
Added proper configs; also, printing to listener, file.
* Language and example files are copy/pasta'ed from production_learning
  directory; I'll need to move them all to directories under the project
  root directory. Until then, leave other examples and languages in this
  directory uncommitted so as to minimise duplication eh.

stassa [Mon, 24 Aug 2015 09:37:59 +0000]
Unconfused branch head and body/leaf productions.
* Will need some more logic to figure out at which point on a branch we
  are and create/augment rules accordingly.

stassa [Mon, 24 Aug 2015 09:36:13 +0000]
Trying to unsnafu non-UTF-8 chars in source file.

stassa [Mon, 24 Aug 2015 09:29:14 +0000]
Continued with disjunctions experiments; actually, went off to v 3.0 now.

stassa [Sun, 23 Aug 2015 20:24:11 +0000]
Got a first draft of v 2.0 working; weird bug with lopsided branch.
* Well, not a bug as such- there's a node that should be expanding to a
  nonterminal but stays as a terminal. Uh oh.

stassa [Thu, 13 Aug 2015 21:41:33 +0000]
Renamed experiments directory to Experiments; duh.
* Also added experiment with disjunction augmentations.
* Plus some config changes.

stassa [Thu, 13 Aug 2015 19:28:49 +0000]
Forgot this- it's a grammar made up of compressed examples.
* Though it's a one-to-one mapping between an example and a rule, it's
  actually usable. Just a bit meh.

stassa [Thu, 13 Aug 2015 19:21:47 +0000]
Fixed couple of bugs; added test files and ran experiments.
* Please refer to your notes for details on experiments. And don't ask me
  why.
* Why not?
* Don't ask me why not either.
* Fixed bug in given_productions/1; it was not actually finding given
  productions because it expected them to be asserted by
  assert_given_productions/0, which expected to get them from
  given_productions/1. Ha ha. No, not funny.
* After fixing this, the compression grammar started failing because
  nonterminal references appeared in the right-hand side of nonterminal//1
  clauses (inside the compression grammar file) as ordinary nonterminal
  references. They need to be wrapped in nonterminal//1 terms because they
  are not declared as DCG clauses in the compression grammar source file.
  Added compression_nonterminal/2 to deal with that. Might take a bit of
  renaming.

spatsant [Wed, 12 Aug 2015 18:13:18 +0000]
Added some debugging to scoring rules.

stassa.patsantzis [Wed, 12 Aug 2015 16:17:24 +0000]
More WIP'ery: continuing with printing compressed corpus.
* Printing compressed corpus OK and then learning on it, but I'm not
  getting the results I thought. There's still production fragments and I
  don't get the complete grammar I was expecting. Cause seems to be in
  scoring with strategy set to mode; when learning on the compressed
  corpus this should keep rules that fully explain a single example but it
  seems to be scoring some of them with 0.75, instead of 1. Which means
  they're dropped. That needs fixin'.
* Otherwise, it looks good. As in, it looks like a bloody old mess and I
  need to refactor the hell out of it.

stassa.patsantzis [Wed, 12 Aug 2015 14:31:57 +0000]
Big big hacky WIP. Printing compression grammar module.
* It being a bunch of DCG rules that parse a list of tokens to a rule
  name, so we get the compressed tokens in the output (bound to the first
  argument of compression_grammar/1).

stassa.patsantzis [Wed, 12 Aug 2015 12:32:37 +0000]
Added corpus compression & compressed corpus printing.
* This turned up a few bugs here and there, particularly in the way
  second order grammars are printed (better to use write_term/3 than
  format/3, to allow for predicate names that need quoting, particularly
  ones with escaped ''s). Fixed those where I found them.
* There's a pair of new config options in configuration module; that's in
  preparation for moving the directive calling register_world/2 out of this
  module and just leaving clear config options in there. Will probably
  have to move output_stream/1 and the new
  compressed_corpus_output_stream/1 also.
* Btw, compression is borked now- terminals are added to the compressed
  examples unbracketed, so when we try to parse we treat them as
  nonterminals and fail with errors. Will fix.

* Amending to add that I also added a bit to
  production_compressed_string/4 to make sure that we can tell fully
  unparsed strings apart from everything else.

stassa.patsantzis [Wed, 12 Aug 2015 10:48:54 +0000]
Refactored production_compressed_string/4.
* Removed third clause - the one that was specific to a single token being
  left in the uncompressed string. This doesn't seem to be necessary; the
  second clause will deal with it anyway. compressed_string/4 was now also
  redundant so that's gone also.

stassa.patsantzis [Wed, 12 Aug 2015 10:48:23 +0000]
Added productions_compressed_strings/3 to duh.

stassa.patsantzis [Tue, 11 Aug 2015 16:41:25 +0000]
Big pile o' WIP, but printing out first & second order grammar pair.
* Added bunch of configuration options; config is now getting nicely
  overcomplicated and hard to read. Aaaw. It's just like web dev :D
* Added two clauses to print_grammar/3 for each of first and second order
  grammar files in the output. Second order one is rather a mess, due to
  the two big chunks of spaghetti used to print out first_order_phrase/3
  and second_order_phrase/2 (the phrase/[2,3] extensions for those
  grammars). These will need new predicates. In fact, the whole
  print_grammar/3 stuff should go to its own module. I guess.
* Och aye: there's still a bug in production_induction module, where it
  keeps terminal//0 terms from previous runs, and so prints them in the
  grammar output file. Need to fix that.