at_grammar_induction.git
YeGoblynQueenne@splinter [Sun, 31 Jul 2016 15:40:00 +0000]
Merge branch 'probabilities'

YeGoblynQueenne@splinter [Sun, 31 Jul 2016 15:38:56 +0000]
Added small note. graph_fitting needs some tlc.

YeGoblynQueenne@splinter [Sun, 31 Jul 2016 15:35:35 +0000]
Added another short mtg examples file.
* This one has just a handful of abilities with a bit of variety,
  unlike the handsim dataset.

YeGoblynQueenne@splinter [Fri, 29 Jul 2016 10:42:23 +0000]
Small tweak to dynamic_configuration/0 to unclutter listing.
* dynamic_configuration/0 will now only list the production_arity/3 value
  when transformation_format/1 is set to gnf. In right-regular grammars
  production_arity is always 1 for nonterminals and determined by
  graph_arity/1 for terminals.

YeGoblynQueenne@splinter [Fri, 29 Jul 2016 10:27:12 +0000]
Added lexicalisation suffix to grammar output file names.

YeGoblynQueenne@splinter [Mon, 18 Jul 2016 20:35:51 +0000]
Added bags directory for bootstrapped grammars.

YeGoblynQueenne@splinter [Sun, 17 Jul 2016 22:04:29 +0000]
Tweaked bootstrap grammar & bags naming format.

YeGoblynQueenne@splinter [Sun, 17 Jul 2016 14:42:53 +0000]
Refactored sample/5: split to set_sample/5, sample/4.
* set_sample/5 expects a set and gives a sample and a set remainder (the
  set remainder of the input and the sample). sample/4 instead makes no
  assumption about ordering or uniqueness and does not construct a
  remainder.
* There's a tradeoff: set_sample/5 needs to ensure that the sample and
  the remainder are in the same order as in the input. This means we
  need to do some list reversing at the end of processing. This happens
  in one of the business-end predicates (sample_/5) and at that point it
  would be faster to just sort the results, but then we wouldn't be able
  to use sample_/5 with sample/4, since sample/4 must preserve
  duplicates and ordering of the data. Expect some optimisation around
  that in the next commit.
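
A rough Python sketch of the contract described above, not the project's Prolog. It assumes sampling without replacement by position; the function names mirror the two predicates, but the signatures and the use of Python's random module are assumptions. Appending to Python lists keeps input order directly, so the sketch needs no final reversal step.

```python
import random

def set_sample(items, k, rng):
    """Hypothetical analogue of set_sample/5: split `items` into a k-element
    sample and the remainder, both kept in input order."""
    chosen = set(rng.sample(range(len(items)), k))  # k distinct positions
    sample, remainder = [], []
    for i, x in enumerate(items):
        # Appending preserves input order, so nothing needs reversing.
        (sample if i in chosen else remainder).append(x)
    return sample, remainder

def sample(items, k, rng):
    """Hypothetical analogue of sample/4: keeps duplicates and input order,
    and builds no remainder."""
    chosen = set(rng.sample(range(len(items)), k))
    return [x for i, x in enumerate(items) if i in chosen]

rng = random.Random(42)
print(set_sample([1, 2, 3, 4, 5, 6], 3, rng))
print(sample(["a", "a", "b", "a"], 2, rng))
```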

YeGoblynQueenne@splinter [Sat, 16 Jul 2016 21:52:52 +0000]
Added start symbol arity config option.
* This is a rule so it's not meant to be modified on-the-fly, or at all
  even. Rules are basically reporting config options, right? This one's
  used to print the start symbol of grammars with the correct arity so I
  don't have to keep fixing it by hand, stupidly.

YeGoblynQueenne@splinter [Sat, 16 Jul 2016 20:20:57 +0000]
Stopped sorting examples in examples_corpus/1.
* I'm pretty sure this gives us better probabilities.

YeGoblynQueenne@splinter [Sat, 16 Jul 2016 14:43:48 +0000]
Added slimmed-down names to config; made new output file name config.
* New file naming predicate includes graph and production arities, which
  are basically algorithm hyperparams and are useful to have because
  they make it much easier to know what the difference is
  between two grammar files.

YeGoblynQueenne@splinter [Sat, 16 Jul 2016 14:41:35 +0000]
Slimmed-down corpus file names. Like, at long last.

YeGoblynQueenne@splinter [Thu, 14 Jul 2016 21:53:19 +0000]
Fixed bug in let_configuration_options/2
* That messed up changing of multi-clause options on the fly. Should work
  better now (but still keep an eye out 'cause today I'm knackered).

YeGoblynQueenne@splinter [Thu, 14 Jul 2016 21:20:50 +0000]
Some maintenance; also, added Lovecraft corpus stuff (but not itself)
* Refactored config option output_file_name/2 to include graph and
  production arities.
* Printing some information at loadup, mainly dynamic_configuration/0
  output.
* Configured corpus_utilities to tokenise lovecraft corpus; might want
  to give that a config file all of its own. Or not?

YeGoblynQueenne@splinter [Thu, 14 Jul 2016 21:18:20 +0000]
Fixed bug in skeleton transformation (production/8).
* This one stopped me from making a stochastic gnf with a certain arity
  (3 in this case, but it's corpus-dependent) because it nested
  nonterminals inside trees, like so: ((nt1, nt2),nt3). This failed when
  it hit uneev in weighted_productions/3 (I think). The bug was in
  production/8. Er, that.
* Oh, and stochastic GNFs work alright now.
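
The malformed bodies described above nest conjunctions to the left, as in ((nt1, nt2),nt3). The real fix went into production/8; purely as an illustration, here is a Python sketch that flattens such a nested body into a flat sequence, with Prolog conjunctions modelled as 2-tuples. The function name and the tuple encoding are hypothetical, not the project's code.

```python
def flatten_body(term):
    """Flatten a conjunction modelled as nested 2-tuples, e.g.
    (("nt1", "nt2"), "nt3") -> ["nt1", "nt2", "nt3"]."""
    if isinstance(term, tuple):
        flat = []
        for part in term:
            flat.extend(flatten_body(part))
        return flat
    return [term]  # a single nonterminal or terminal

print(flatten_body((("nt1", "nt2"), "nt3")))  # ['nt1', 'nt2', 'nt3']
```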

YeGoblynQueenne@splinter [Wed, 13 Jul 2016 22:20:04 +0000]
Got stochastic gnf working; WIP but working

YeGoblynQueenne@splinter [Wed, 13 Jul 2016 08:43:54 +0000]
Merge branch 'reconfigure' into probabilities

YeGoblynQueenne@splinter [Wed, 13 Jul 2016 08:43:17 +0000]
Made cnf_split_divisor/1 static as foretold in the prophecies of old.

YeGoblynQueenne@splinter [Wed, 13 Jul 2016 08:41:27 +0000]
Clarified config option split_divisor/1 as cnf_split_divisor/1.
* This is used in CNF induction, which doesn't really work currently
  anyway, but I'm keeping it in for now, although might change it to
  static (so it's not reported by dynamic_configuration/0) in next
  commit.

YeGoblynQueenne@splinter [Wed, 13 Jul 2016 08:38:40 +0000]
Added dynamic_configuration/0 to report options modifiable on-the-fly

YeGoblynQueenne@splinter [Wed, 13 Jul 2016 07:05:31 +0000]
Merge branch 'reconfigure' into probabilities

YeGoblynQueenne@splinter [Wed, 13 Jul 2016 06:59:30 +0000]
Cleanups - added comments, moved bits around etc.

YeGoblynQueenne@splinter [Wed, 13 Jul 2016 06:30:53 +0000]
Refactored graph_probabilities/2 to allow variable edge arities; WIP.
* This is like, considerably slower than the previous version, except of
  course it allows for any graph_arity/1 value. It also uses the dynamic
  database, whereas the previous version didn't. I'm keeping the old
  version in also and probably using that when graph_arity(1) is true,
  until I find a way to make the new version as fast as the old.

YeGoblynQueenne@splinter [Mon, 11 Jul 2016 22:17:36 +0000]
Added some comments to let_configuration_option/[2,3].
* Also a tiny bit of clearer user prompting.

YeGoblynQueenne@splinter [Mon, 11 Jul 2016 21:12:30 +0000]
Reconfigured configuration: added reconfigure/0, training_set module.
* training_set is the module where examples and language data are loaded
  into (example_string/1, start//0, terminal//0 and nonterminal//0). In
  other words, everything that we previously loaded into configuration
  module and accessed through it via explicit module references. This
  was not ideal but it was the easiest thing to do, however it turned up
  a few issues now that I'm trying to do configuration changes
  on-the-fly. So, now there's a training_set module and it works as
  configuration module did except it only has training data, not
  configuration predicates. OK?
* OK, don't get mad at me. I just wanted to say: also removed the
  directive calling register_world/3 from configuration module and
  added a reconfigure/0 predicate in load_configuration that does the
  same thing. This is called from the new training_set module to reload
  training data as before in configuration module. The upside of this is
  that now configuration module doesn't have any functionality in it,
  just configuration options (well, except for the configuration options
  that have a body but yeah).
* Also tested that we can change examples modules by hand (not on the
  fly yet) and found out that there's an error from grammar_evaluation
  module, that comes up when we train, evaluate, switch modules and
  train again. At that point, because we load the examples module by
  hand (and reexport its predicates) in the grammar evaluation module,
  there is a redefinition error. Sucks but it's just the evaluation
  module. I'll have to fix it at some point.

YeGoblynQueenne@splinter [Sun, 10 Jul 2016 20:34:55 +0000]
Refactored dynamic directives to one. Why did I have two?

YeGoblynQueenne@splinter [Sun, 10 Jul 2016 20:29:43 +0000]
Moved register_world/3 directive to load_configuration
* Not sure what it was doing there. Oh, actually- I do. Having it there
  sort of took care of not having to load configuration and
  load_configuration in a specific order, but I think this horse has
  bolted and anyway moving it to load_configuration makes it even less
  easy for the user to mess up by removing it inadvertently. There's a
  new directive in configuration of course, that calls to a new
  predicate in load_configuration that does exactly the same thing but
  it's a single line (in configuration) so it makes it less easy to mess
  up. I hope. I think. Maybe.

YeGoblynQueenne@splinter [Sun, 10 Jul 2016 20:22:12 +0000]
Hid compression grammar, compression corpus stuff. Need fixing first.
* Specifically need to make sure they work with the new graph
  transformation bits. I think there's also a bit of fiddling to do with
  sanitised graph vertices.

YeGoblynQueenne@splinter [Sun, 10 Jul 2016 19:34:22 +0000]
configuration module is now imported only from project load file.
* This unifies the way configuration options are accessed from any
  module. Previously some modules accessed config options by explicitly
  referring to the configuration module, while some also called config
  option predicates without the module qualifier but imported the
  configuration module directly. So I'm making sure everything follows
  the same format from now on.
* To be fair this is a bit fiddly and it depends a lot on the order by
  which project files are loaded, but it's workable so.

* Yeah yeah. Forgot to commit the actual files. Mumble mumble.

YeGoblynQueenne@splinter [Sun, 10 Jul 2016 04:07:02 +0000]
Implemented dynamic configuration options; WIP and untested.

YeGoblynQueenne@splinter [Sun, 10 Jul 2016 00:32:32 +0000]
Removed vestigial config options; cleaned up.
* Note well that cleanup includes:
a) predicates in graph_fitting module that called out to one of the
   vestigial options removed, specifically sort_graph/2 that never
   sorted the graph, because we're now leaving that up to
   transformation rules to manage.
b) Some attempts at GNF version 1 that didn't quite go as planned.
   Hopefully this won't end like Resident Evil begins.

YeGoblynQueenne@splinter [Sat, 9 Jul 2016 21:48:14 +0000]
Slightly tweaked gnf transformation to remove broken pointers.
* Still happening though, with the full corpus. Will need to investigate.

YeGoblynQueenne@splinter [Wed, 6 Jul 2016 19:46:12 +0000]
Version 2 GNF; previous one was fragmented and had various issues. WIP.
* WIP and unstable still.

YeGoblynQueenne@splinter [Wed, 6 Jul 2016 17:23:42 +0000]
Added production_arity/3 config option for GNF and possibly others.

YeGoblynQueenne@splinter [Wed, 6 Jul 2016 17:10:17 +0000]
Added GNF; like, for real this time. WIP and shaky but standing up.

YeGoblynQueenne@splinter [Wed, 6 Jul 2016 15:58:41 +0000]
Removed vestigial s_rGNF transformation ruleset.

* Amending to also remove transformation/3 clause for s_rGNF

YeGoblynQueenne@splinter [Wed, 6 Jul 2016 09:12:46 +0000]
Added stochastic right-regular grammar format; cleaned up.
* Stochastic rr format currently only works with graph_arity(1), or
  rather weighing the skeleton graph with probabilities only works if
  k=1. Will need to fix that.

YeGoblynQueenne@splinter [Wed, 6 Jul 2016 08:09:19 +0000]
Added lexicalisation; tested it works with whole M:tG corpus- it does.

YeGoblynQueenne@splinter [Wed, 6 Jul 2016 07:02:09 +0000]
Generalised transformation formats to right-regular with variable length.
* This should cover my earlier rGNF and xGNF. Haven't tested much yet
  and may need some optimisation.

YeGoblynQueenne@splinter [Tue, 5 Jul 2016 21:02:11 +0000]
Stabilised graph fitting; added comments; configuration option.
* skeleton_graph/2 is now arity-3, with a new argument for K, the number
  of right-vertices of edges in the skeleton graph (actually, the
  maximum number of such edges). Anything that calls skeleton_graph/3
  should pass it the value of the new configuration option
  graph_arity/1.
* Made sure that's the case with prin_skeleton_graph/1 and
  un/load_skeleton_graph/0 (though I haven't tested with them and they
  might not work).
* Also added configuration option to skeleton_transformation module (in
  corpus_productions/2).
* Also refactored skeleton_graph/2 to make it a bit more tidy and also
  to make sure the order by which edges are created is preserved, to
  allow for lexicalisation.
* Finally, tested that the main production composition formats work
  (rGNF, rGNF_lex, xGNF, xGNF_lex and s_rGNF), all with graph_arity(1).
  I'll need to make some changes probably to make them work with larger
  arities but at that point I might as well generalise them to a
  right-regular production format and that new actual-GNF one.

YeGoblynQueenne@splinter [Tue, 5 Jul 2016 09:09:39 +0000]
Added graph fitting, production transformation with k-power; big WIP
* Also added subsequence/6 that also binds the elements of a list before
  a split, thus yielding a suffix, infix and prefix. It's a bit
  expensive but it may come in handy.
* K-pow stuff is very WIP and a bit of a mess and on top of all that not
  very Prolog-y. There may be more Prologuish ways to do all that, or
  maybe it's just better to give up on Prolog and use a language that
  does arrays better, like C.
* Yeah right.
* Oh but: made some experiments and I can make graphs and grammars with
  K of my choice. Which is not a miracle but it's what I wanted to do so
  let's see about getting actual GNF power next.

YeGoblynQueenne@splinter [Mon, 4 Jul 2016 23:05:41 +0000]
Added subsequence/5, to split lists from an index to another.

YeGoblynQueenne@splinter [Mon, 4 Jul 2016 21:57:44 +0000]
Merge branch 'cfg_induction' into probabilities

* Had to resolve and merge conflicts by hand. Just a couple of lines but
  keep an eye out.

stassa [Sun, 3 Jul 2016 14:33:49 +0000]
Merge branch 'probabilities' of goblinopera.com:/home1/goblinop/public_html/repos/at_grammar_induction into probabilities

stassa [Sun, 3 Jul 2016 14:27:09 +0000]
Script to extract TODO tags from project notes file; EATS DATA.

stassa [Sun, 3 Jul 2016 14:26:19 +0000]
Supervised learning experiment; can't remember how it went, check notes.

stassa [Sun, 3 Jul 2016 14:23:48 +0000]
CNF experiment, top-down version. Not sure how it works. See notes?

stassa [Sun, 3 Jul 2016 14:19:36 +0000]
Mostly configuration changes.
* Also, new transformation rules implementing a general GNF (now just a
  right-regular grammar...) format as notated in the dissertation report.

YeGoblynQueenne@splinter [Sat, 2 Jul 2016 15:04:47 +0000]
Added CFG power; experimental, wip, needs integration.

Stassa Patsantzis [Fri, 3 Jun 2016 22:45:35 +0000]
Added anbn language experiment. Not happy

stassa [Mon, 30 May 2016 22:42:12 +0000]
Removed bias updates and added some notes saying you're doing it wrong.
* Basically, I'm doing something really stupid and I'm not training
  weights for a whole problem, just for a single input at a time. This
  needs more work.

stassa [Mon, 30 May 2016 22:28:19 +0000]
Noted that results go down if I try to train the bias.
* I think maybe that's because I'm not exactly doing regression (least
  squares or whatever) there, but - a perceptron? Dunno.

stassa [Mon, 30 May 2016 22:02:27 +0000]
Added bias to weight updates in regression grammar.

stassa [Mon, 23 May 2016 13:55:47 +0000]
Regression experiment: swapped order of X,Ys to get Y more easily.

stassa [Mon, 23 May 2016 13:36:51 +0000]
Added regression grammar experiment.
* Even though I'm in the middle of something... sigh...

stassa [Sat, 21 May 2016 15:03:38 +0000]
Added predicates to print n or print to stream.
* Also started to make the name of examples predicates generic but that
  broke a whole bunch of stuff so I'm not committing all that yet.
  query_interface should not be affected but if all goes to hell, revert
  before this.

spatsant [Tue, 3 May 2016 20:28:45 +0000]
Made corpus dumper generic (not tied to Brown no more).
* Still need to move configs to configuration module.

spatsant [Sat, 30 Apr 2016 19:57:25 +0000]
Refactored the way naming of grammar files works a bit.
* Particularly the bootstrapped grammars' files, to add the number of bags
  and ratio of samples to corpus etc.
* Also removed the '(not_)lexicalised' at the end of grammar filenames
  since this information is er, inherent, in the grammar formalism and
  long names just end up being harder to read.
* Also added a couple of comments and a bit of output to show when a
  bootstrapping bag is done. This will need proper logging though.

spatsant [Fri, 29 Apr 2016 22:13:11 +0000]
Trying to fix pdcg_derivation/5 going infinite... again.
* I put some you_are_here/1s in the code and some cuts and so on, but this
  time the fault is really with list_length/2... probably. Particularly
  the min/1 bit. I'll need to fix that.

spatsant [Fri, 29 Apr 2016 21:40:05 +0000]
Added dcg_derivation/3, deterministic counterpart to pdcg_derivation/5.
* And incidentally fixed the infinite backtracking that plagues the
  stochastic version in this one. And with half my brain asleep!

spatsant [Fri, 29 Apr 2016 20:03:39 +0000]
Added pdcg_derivation/5 to query_interface module; fixed a couple of bugs.
* Still a big old mess. Basically, the user interface to the grammar is
  all over the bloody place and spread across two or three different
  modules and works in a whole bunch of different ways. Need to unify a
  bit. But first I need to do the ANLE stuff.

spatsant [Fri, 29 Apr 2016 19:54:24 +0000]
Trying to do some python integration. It's a big old mess in here.

spatsant [Thu, 28 Apr 2016 17:06:29 +0000]
Missed a spot.
* Updated python interface with new name of user interface module
  (query_interface).

spatsant [Thu, 28 Apr 2016 16:51:04 +0000]
Continuing with query interface.
* Renamed user_interface module to query_interface. Because I can.
* Updated project load file with new module name.
* Moved pdcg_print_metrics/3, pdcg_print/5 and pdcg_print/4 from
  grammar_utilities to query_interface module and renamed.
* Added module qualifier to pdcg_parse/4, to point it to the currently
  loaded grammar module. That was handled by pdcg_print/[4,5] before.
  Adding the module qualifier makes it more clear that the predicate
  expects a module to be loaded.
* Added corpus_utilities to load_libs module.
* Moved sentence_completion/2 and friends from project_utilities to
  query_interface.
* Added stochastic_grammar_module/2 to grammar_utilities, stochastic
  counterpart to grammar_module/2. These will need some abstraction/
  configuration to unify back to a single predicate. The problem is
  basically that stochastic grammars' start symbols expect an argument to
  bind probabilities, whereas deterministic grammars' have 0 arity. I
  should make it so the configuration determines the arity of start
  symbols in grammars (depending on anything relevant, including
  stochastic or not).

spatsant [Thu, 28 Apr 2016 16:09:07 +0000]
Added user interface and pythonic interface. still very WIP
* Also, continued with query interface stuff and realised it's basically
  borked, so looking forward to unborking. Probably move everything to
  user interface so it's less of a mess in utilities.

spatsant [Thu, 28 Apr 2016 14:18:56 +0000]
Continuing with unifying the user input interface.

spatsant [Thu, 28 Apr 2016 13:45:09 +0000]
Replacing old sanitisation machinery with new. Working WIP
* The idea is to present a unified interface to the user (duh) regardless
  of the kind of grammar. Basically, I made a different (slightly more
  user-friendly) interface for the probabilistic grammar, so I now need to
  unify it with the old one, for the deterministic ones.

spatsant [Thu, 28 Apr 2016 11:57:13 +0000]
Missed a spot. 'End of an era'...

spatsant [Thu, 28 Apr 2016 11:56:40 +0000]
Removed production_induction module (and precursors). End of an era innit.

stassa [Thu, 28 Apr 2016 00:51:08 +0000]
Two supervised learning experiments.
* supervised_learning2 seems to be working ok, except the mini grammar it
  learned is a bit shit. Hopefully it's the kind of tree I trained on
  that's screwing it.

spatsant [Wed, 27 Apr 2016 20:15:10 +0000]
Adding (un-)sanitisation to pdcg_print/5
* Basically, add a prefix before sending the user's input to the grammar,
  then remove it again to display it nice and tidy.
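
A minimal Python sketch of the prefix round trip, assuming tokens are plain strings. The prefix value "t_" and both function names are hypothetical; the commit does not say which prefix pdcg_print/5 adds. The point it shows is that stripping exactly one prefix undoes the sanitising step, even for input that already happens to start with the prefix.

```python
PREFIX = "t_"  # hypothetical prefix, not the project's actual one

def sanitise(tokens):
    """Add the prefix before passing user input to the grammar."""
    return [PREFIX + t for t in tokens]

def unsanitise(tokens):
    """Strip one prefix again for display; tokens without it pass through."""
    return [t[len(PREFIX):] if t.startswith(PREFIX) else t for t in tokens]

words = ["draw", "a", "card"]
print(sanitise(words))              # ['t_draw', 't_a', 't_card']
print(unsanitise(sanitise(words)))  # ['draw', 'a', 'card']
```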

spatsant [Wed, 27 Apr 2016 15:35:34 +0000]
Removed sanitisation from skeleton transformation (left to graph fitting)
* Also fixed little bug that should have bit me much harder, in
  sanitised_edge/2, plus cleanups and so on.

stassa [Wed, 27 Apr 2016 11:31:37 +0000]
sanitised_edge/2 - WIP but working.
* Also changed start symbol of english language module, because it seems
  to clash with the token 'sentence' in the corpus, and confuse training.
  The point is that if I declare the english language start symbol to be
  'sentence' then I don't see a rule 'sentence' at the position where I'd
  expect the start symbol, top of the grammar tree. If I give it as
  'english_sentence' as I do with this change, then I do get a normal
  grammar. Which, duh: hints at a bug, but can't be bovvered to identify
  it and correct it right now.

spatsant [Wed, 27 Apr 2016 08:27:14 +0000]
Trying to sanitise edges; WIP and broken.

stassa [Mon, 25 Apr 2016 23:24:49 +0000]
Put back configuration to default-ish.

stassa [Mon, 25 Apr 2016 23:20:19 +0000]
Fleshing out grammar querying with pdcg_print/5 and friends. WIP.
* term_utilities module was somehow saved in utf-16 with er, long endian?
  Something weird. I hope it's OK now, but we'll see how it goes.
* Also, fixed list_length/2, where passing a maximum length would
  previously make it go infinite. The fix may mean clients will now be
  going infinite, so need to keep an eye out.

stassa [Mon, 25 Apr 2016 22:07:29 +0000]
Added corpus utilities: tokenisers and corpus-dumping.

stassa [Sun, 24 Apr 2016 21:00:55 +0000]
Commented out start of refactoring from previous time; cleanups
* You are not yet ready to refactor.

stassa.patsantzis [Tue, 19 Apr 2016 16:54:29 +0000]
Fixed production duplication bug; some cleanups and stuff.
* Production duplication bug was really edge duplication bug. I was
  sorting skeleton graph edges only on the first argument of edge/2 terms,
  because, er, probably because I was sleepy when I did that :P
* Anyway, the useful sorting is to sort on the whole edge/2 term 'cause
  then we have both a sort on the first argument and on the whole term, as
  required by counting vertices and edges, respectively. So now we're fine
  and dandy and we only count each edge(V1,V2) term once to get
  probabilities.
* Which btw means my scary recurse-in-tandem stuff in edge_counts/5
  worked! Woohoo.
* Lots of nice reasons to woohoo in this commit, but it's also going to
  cause conflicts with what's on the home machine 'cause I didn't push
  from home yesterday. Oh well.
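
A Python sketch of the counting idea, assuming edges arrive as (v1, v2) pairs with one entry per observation, and that an edge's probability is its count divided by the count of its left vertex (a relative-frequency estimate). Sorting on the whole pair also groups by the first element, so both counts fall out of a single sorted list. This illustrates the technique only; it is not the project's edge_counts/5 or graph_probabilities/2.

```python
from itertools import groupby

def edge_probabilities(edges):
    """Return {(v1, v2): count(v1, v2) / count(v1)} for observed edges."""
    ordered = sorted(edges)  # sort on the whole pair, not just on v1
    probs = {}
    for _v1, by_vertex in groupby(ordered, key=lambda e: e[0]):
        by_vertex = list(by_vertex)
        vertex_count = len(by_vertex)  # times v1 occurs as a left vertex
        for edge, same in groupby(by_vertex):  # runs of identical pairs
            probs[edge] = sum(1 for _ in same) / vertex_count
    return probs

edges = [("a", "b"), ("a", "c"), ("a", "b"), ("b", "c")]
print(edge_probabilities(edges))
```

Each distinct pair ends up as exactly one key, so the outgoing probabilities of every left vertex sum to 1.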

spatsant [Mon, 18 Apr 2016 19:55:19 +0000]
pdcg_print_metrics/3 and stuff.

stassa [Mon, 18 Apr 2016 06:27:27 +0000]
Continuing with probabilities as head-arguments rather than pushbacks.
* Added Prolog calls {in curly braces} to bodies of rules, used to
  calculate their probabilities (in s_production/_at_least_7).
* Added pdcg_parse/5 and pdcg_print/4 predicates to grammar_utilities to
  work with the new format.
* Renamed earlier pdcg_parses/3 and pdcg_parse/2 to pushback_pdcg_parse/s
  to distinguish from the newer versions.
* Added s_rGNF transformation_format/1 option to configuration.
* Added numbervars(true) option to write_term/3 calls in body of
  print_term/2 in grammar_printing module. This is so the variables in
  probabilistic productions that bind to the values of rules'
  probabilities can be printed in the output file as named variables
  rather than Prolog auto-generated ones, which are a little hard on the
  eyes.

stassa [Sun, 17 Apr 2016 08:51:25 +0000]
Stochastic grammars - very WIP and not really working yet.
* Added graph_probabilities/2 and friends, and s_rGNF transformation
  clause and production composition rules to go with it.
* This version of probabilistic DCGs works with an argument at the head of
  the rule denoting its probability. A Prolog call (to is/2) in the end of
  the body of the rule will compute the product of probabilities of the
  rule and its successors (not done yet).
* Tested with the usual corpora and it seems to be either going infinite
  on the full M:tG corpus, or going infinite. Will need to fix this.
* Anyway transformation is not complete. Ha ha. Keeping before I mess it
  up to sort it out.
* Also added sanitise_constituent/2 to interface to /3 (since it's now
  called directly from the production composition rules for s_rGNF).
* I started the probabilistic branch with an attempt to use pushback lists
  to do the same, but that proved to be very unwieldy, not least because
  you need to have a probabilistic parser to go with the grammar and
  pushbacks make it very fiddly to get this right (because they push their
  probabilities into the input so it's very hard to know when to stop
  parsing, basically). There's some vestiges of that in grammar_utilities
  which I will have to think of what to do with. At on to from whence. See
  project notes from yesterday for an expose-y.
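
A Python sketch of the arithmetic only, not of the DCG encoding: each rule carries a probability, and a derivation's probability is the product of the probability of the rule used at the root and those of its sub-derivations, which is what the is/2 call at the end of each rule body is meant to compute. The toy grammar, its numbers and the tree representation are all hypothetical.

```python
from math import prod

# Hypothetical toy grammar: nonterminal -> list of (probability, body).
# Body symbols that are not keys of GRAMMAR are terminals.
GRAMMAR = {
    "ability": [(0.6, ["cost", "effect"]), (0.4, ["effect"])],
    "cost": [(1.0, ["tap"])],
    "effect": [(0.5, ["draw"]), (0.5, ["discard"])],
}

def derivation_probability(tree):
    """tree = (nonterminal, rule_index, children); terminal children are
    plain strings and contribute no probability of their own."""
    nonterminal, rule_index, children = tree
    p, _body = GRAMMAR[nonterminal][rule_index]
    sub = [derivation_probability(c) for c in children if isinstance(c, tuple)]
    return p * prod(sub)  # prod([]) == 1, so rules with no nonterminals give p

tree = ("ability", 0, [("cost", 0, ["tap"]), ("effect", 0, ["draw"])])
print(derivation_probability(tree))  # 0.3
```

The sketch does not check that a tree's children match the chosen rule's body; it only multiplies.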

stassa [Fri, 15 Apr 2016 19:24:31 +0000]
Moved utility predicates to their own files.
* Also, just noticed I'm still on bootstrapping branch, which might
  explain why I thought a while ago that I had committed some stuff when I
  hadn't. Or thought I hadn't. We'll see when I switch back to master.

stassa [Fri, 15 Apr 2016 18:02:20 +0000]
Moved utilities to its own, lib/ dir; added Shannon entropy module
* Shannon module has stuff like run-length encoding and counting relative
  frequency of strings that will come in handy for counting stuff. There
  may be more efficient methods to do them though.
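
A Python sketch of why run-length encoding helps with counting: encoding a sorted list yields each distinct item with its count, dividing by the list length gives relative frequencies, and Shannon entropy follows from those. Function names and signatures here are hypothetical, not the Shannon module's.

```python
from itertools import groupby
from math import log2

def run_length_encode(xs):
    """[a, a, b, c, c] -> [(a, 2), (b, 1), (c, 2)]"""
    return [(x, sum(1 for _ in run)) for x, run in groupby(xs)]

def relative_frequencies(xs):
    """Run-length encode the sorted input, so each run is one distinct item."""
    n = len(xs)
    return {x: count / n for x, count in run_length_encode(sorted(xs))}

def shannon_entropy(xs):
    """Entropy in bits of the empirical distribution of xs."""
    return -sum(p * log2(p) for p in relative_frequencies(xs).values())

print(run_length_encode("aaabcc"))            # [('a', 3), ('b', 1), ('c', 2)]
print(shannon_entropy(["a", "a", "b", "b"]))  # 1.0
```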

stassa [Mon, 11 Apr 2016 05:47:30 +0000]
Adding back inference limit message to sentence_completion/3

stassa [Thu, 7 Apr 2016 21:47:27 +0000]
Merge branch 'bootstrapping' of goblinopera.com:/home1/goblinop/public_html/repos/at_grammar_induction into bootstrapping

stassa [Thu, 7 Apr 2016 21:46:38 +0000]
Actually committing current_configuration/0.

stassa.patsantzis [Thu, 7 Apr 2016 16:03:12 +0000]
Added extended GNF; yay, another full CFG.
* This one even works with the whole M:tG corpus (which CNF does not).

stassa.patsantzis [Thu, 7 Apr 2016 08:47:15 +0000]
Merge branch 'bootstrapping' of 127.0.0.1:/home1/goblinop/public_html/repos/at_grammar_induction into bootstrapping

Conflicts:
tree_learning/configuration.pl

* Merge conflict from config option - I thought I'd get one from the
  changes to utilities.pl but no, ok.

stassa.patsantzis [Thu, 7 Apr 2016 08:44:41 +0000]
Re-added failure message to sentence_completion/3.

stassa [Thu, 7 Apr 2016 06:56:40 +0000]
Added current_configuration/0 to stop calling current_configuration(_).

stassa.patsantzis [Wed, 6 Apr 2016 15:24:41 +0000]
Working bootstrapping, but rickety and rather hacky.
* Had to hack the way skeleton_transformation module finds the start symbol
  for each bootstrap grammar. See module source for details, you can't
  miss the comments.
* Tested with 4 samples of 0.3 size, I think. That works alright,
  particularly with lexicalised grammars (rGNF_lex) but I tried it with 10
  samples of 0.3 size and I waited for ever for this to terminate. I got
  better, but it still didn't terminate.

stassa.patsantzis [Wed, 6 Apr 2016 15:22:56 +0000]
skeleton_transformation merged from master branch.

stassa [Wed, 6 Apr 2016 06:39:12 +0000]
Bootstrapping experiment; WIP, stable but not really working yet.

stassa [Tue, 5 Apr 2016 20:29:03 +0000]
Added comments.

stassa [Mon, 4 Apr 2016 22:47:21 +0000]
Slight refactoring of sample/5 et al, to see if I can shave a ms off it.
* Doesn't look like it, but might pay off with larger data.

stassa [Mon, 4 Apr 2016 22:39:03 +0000]
Random subsampling; WIP but works.

stassa [Sun, 3 Apr 2016 20:56:11 +0000]
Refactored sentence_completion/3; added CNF experiment; english language.
* The english language module is from way back, not sure why I hadn't
  committed it earlier. It should be in the repo I'm pretty sure.
* Also added a couple of new config options, one for sentence_completion/2
  (to limit inferences) and one for cnf learning (but a bit useless).
  Configuration needs a good spring cleaning.

stassa [Sat, 2 Apr 2016 12:09:20 +0000]
Added CNF generation; bit borked, totally WIP, needs comments.
* So, this does work, I guess, but it's a bit meh. Essentially you get one
  rule per example, which is not particularly impressive. You also only
  get one _derivation_ per example, at least so far. Er. But, the worst
  thing is that it dies under the weight of the whole M:tG corpus,
  probably because of abuse of name/constituent sanitisation. Need to
  figure out what to do about that.

stassa [Sat, 2 Apr 2016 12:07:23 +0000]
Added sentence_completion/3 predicate. WIP but generated from full corpus.
* It's WIP in that it's still a bit finicky when you load a different
  grammar module after generating from another one. Still - it works,
  bitches! The secret is to not generate at random, but give it a sentence
  length and some starting string.