Merge lp://staging/~onboard/onboard/word-completion into lp://staging/~onboard/onboard/main

Proposed by marmuta
Status: Merged
Merged at revision: 518
Proposed branch: lp://staging/~onboard/onboard/word-completion
Merge into: lp://staging/~onboard/onboard/main
Diff against target: 111635 lines (+26524/-67988)
124 files modified
.bzrignore (+3/-0)
Onboard/Config.py (+38/-5)
Onboard/KeyCommon.py (+31/-12)
Onboard/KeyGtk.py (+146/-10)
Onboard/Keyboard.py (+308/-27)
Onboard/KeyboardGTK.py (+3/-2)
Onboard/KeyboardSVG.py (+95/-42)
Onboard/Layout.py (+17/-0)
Onboard/OnboardGtk.py (+5/-0)
Onboard/WordPredictor.py (+242/-0)
data/onboard.gschema.xml (+23/-2)
layouts/Full Keyboard-Alpha.svg (+1/-4)
layouts/Full Keyboard.onboard (+184/-170)
po/ace.po (+0/-842)
po/af.po (+264/-791)
po/am.po (+0/-867)
po/ar.po (+230/-781)
po/ast.po (+264/-866)
po/az.po (+0/-883)
po/be.po (+257/-797)
po/bg.po (+264/-848)
po/bn.po (+244/-875)
po/br.po (+264/-790)
po/bs.po (+242/-879)
po/ca.po (+267/-868)
po/ca@valencia.po (+0/-986)
po/cs.po (+240/-878)
po/cy.po (+0/-840)
po/da.po (+266/-801)
po/de.po (+270/-868)
po/el.po (+271/-806)
po/en_AU.po (+272/-868)
po/en_CA.po (+251/-813)
po/en_GB.po (+266/-862)
po/eo.po (+264/-860)
po/es.po (+269/-864)
po/et.po (+228/-769)
po/eu.po (+245/-783)
po/fi.po (+266/-854)
po/fil.po (+226/-722)
po/fo.po (+0/-847)
po/fr.po (+271/-871)
po/ga.po (+226/-724)
po/gl.po (+274/-871)
po/he.po (+272/-810)
po/hi.po (+256/-812)
po/hr.po (+231/-736)
po/hu.po (+262/-860)
po/hy.po (+0/-841)
po/id.po (+238/-824)
po/is.po (+231/-730)
po/it.po (+269/-865)
po/ja.po (+250/-856)
po/kk.po (+260/-808)
po/km.po (+0/-844)
po/kn.po (+231/-738)
po/ko.po (+246/-822)
po/ku.po (+226/-722)
po/ky.po (+0/-840)
po/lt.po (+241/-782)
po/lv.po (+263/-860)
po/ml.po (+233/-749)
po/mr.po (+0/-840)
po/ms.po (+268/-862)
po/my.po (+0/-840)
po/nb.po (+255/-815)
po/ne.po (+0/-885)
po/nl.po (+279/-876)
po/nn.po (+0/-846)
po/oc.po (+263/-864)
po/onboard.pot (+345/-569)
po/pl.po (+253/-882)
po/pms.po (+226/-722)
po/pt.po (+263/-833)
po/pt_BR.po (+277/-872)
po/ro.po (+270/-806)
po/ru.po (+260/-858)
po/si.po (+0/-846)
po/sk.po (+267/-802)
po/sl.po (+253/-880)
po/sn.po (+226/-722)
po/sq.po (+263/-862)
po/sr.po (+268/-874)
po/sv.po (+263/-859)
po/ta.po (+0/-861)
po/te.po (+0/-845)
po/th.po (+239/-766)
po/tl.po (+226/-722)
po/tr.po (+267/-862)
po/ug.po (+262/-860)
po/uk.po (+261/-799)
po/vi.po (+264/-800)
po/zh_CN.po (+262/-800)
po/zh_HK.po (+260/-856)
po/zh_TW.po (+259/-855)
prediction/gpredict (+456/-0)
prediction/makemodels (+233/-0)
prediction/pypredict/Makefile (+31/-0)
prediction/pypredict/README (+119/-0)
prediction/pypredict/__init__.py (+2/-0)
prediction/pypredict/analyze (+337/-0)
prediction/pypredict/entropy (+62/-0)
prediction/pypredict/ksr (+70/-0)
prediction/pypredict/lm.cpp (+409/-0)
prediction/pypredict/lm.h (+263/-0)
prediction/pypredict/lm_dynamic.cpp (+48/-0)
prediction/pypredict/lm_dynamic.h (+756/-0)
prediction/pypredict/lm_dynamic_cached.h (+471/-0)
prediction/pypredict/lm_dynamic_impl.h (+935/-0)
prediction/pypredict/lm_dynamic_kn.h (+393/-0)
prediction/pypredict/lm_merged.cpp (+223/-0)
prediction/pypredict/lm_merged.h (+130/-0)
prediction/pypredict/lm_python.cpp (+1781/-0)
prediction/pypredict/ngram-test (+252/-0)
prediction/pypredict/optimize (+217/-0)
prediction/pypredict/pool_allocator.cpp (+377/-0)
prediction/pypredict/predict (+98/-0)
prediction/pypredict/pypredict.py (+357/-0)
prediction/pypredict/setup.py (+21/-0)
prediction/pypredict/split_corpus (+72/-0)
prediction/pypredict/test_pypredict.py (+295/-0)
prediction/pypredict/train (+66/-0)
prediction/test-client (+44/-0)
setup.py (+1/-0)
To merge this branch: bzr merge lp://staging/~onboard/onboard/word-completion
Reviewer: Onboard Devel Team
Review type: preview
Status: Pending
Review via email: mp+12908@code.staging.launchpad.net
marmuta (marmuta) wrote :

Hi Chris, Fernando et al.!

I'm working on word completion/prediction in onboard, and there is now a partially working prototype in this branch, ready for a first benevolent look. I'd be glad if you could take a moment out of your busy schedules and try it. Mind you, this is work in progress and far from being mergeable, but it should at least give a first impression and help weed out design flaws and omissions. Code review is very welcome too; I learned a lot last time.

If you want to try it, run ./makedicts from the project home; it ought to download training texts and create dictionaries. Then run onboard from the project home as well and select the Word Completion layout. Type away and click the words in the top row: left click inserts with auto punctuation, right click without. It does completion only, no prediction yet. There is no learn mode yet either; the Learn, Punct and Dict buttons aren't wired up.

Languages other than English are supported, but for now only one at a time, and you can switch only by changing the dictionary file in WordPredictor.py. If you need more dictionaries, install additional language packages for aspell and rerun makedicts, although training texts are so far only downloaded for en, es and de.

Cheers, let me know what you think.

marmuta (marmuta) wrote :

> Hi Chris, Fernando et al.!
>
Uh-oh, that would be Francesco, sorry for that <:I

191. By marmuta

Some cleanup, added early support for multiple dictionaries.

192. By marmuta

Added auto-learning (always on) and dictionary saving (still too often, needs timed auto save)

193. By marmuta

Learn and Dict buttons are working now and hopefully properly connected to gconf.
Gconf schema has changed to add a new folder 'word_completion'.

Chris Jones (tortoise) wrote :

Sorry this has taken a while for me to look at.

I haven't looked at the code in detail yet but it works well.

A couple of things that occur to me:
1. The dictionaries are quite big. Would it be possible to re-use the Firefox or OpenOffice dictionaries?
2. I know this is just a prototype, but wouldn't it be better to keep as much of the code as possible in a separate library? Other applications/input methods might find it useful.

What do you think about a soft dependency on AT-SPI? Yuck, I know, but people are working on it. It would allow the word completion engine to detect widget focus changes and caret movement.

Cheers, Chris

Francesco Fumanti (frafu) wrote :

> A couple of things that occur to me:
> 1. The dictionaries are quite big would it be possible to re-use the
> firefox or openoffice dictionaries?

The dictionaries will probably come in a separate Debian package, so size will probably not be a problem for the LiveCD. If there are other reasons for having smaller dictionaries or for using those from Firefox, then that is another topic.

Francesco Fumanti (frafu) wrote :

> 2. I know this is just a prototype but wouldn't it be better to keep as much
> of the code as possible in a separate library? Other applications/input
> methods might find it useful.

In GNOME they are planning to start with a port of GOK to Python, so they might be interested in a shared prediction library. (As they are concentrating primarily on switch users, they excluded onboard as a suitable starting point.)

However, I would prefer that we concentrate for the moment on creating a well-working word completion/prediction for onboard. Wouldn't it be easier to develop it directly in onboard instead of starting out with an external library?

194. By marmuta

updated makedicts to support options (see makedicts -h) and a list of languages on the command line.

195. By marmuta

Word completion keeps better track of recent input now and can restart at any point when backspacing.
Added auto-save for saving modified dictionaries, default is every 10min and on exit. New gconf key word_completion/auto_save_interval.

marmuta (marmuta) wrote :

> Sorry this has taken a while for me to look at.
No problem,

> I haven't looked at the code in detail yet but it works well.
>
> A couple of things that occur to me:
> 1. The dictionaries are quite big would it be possible to re-use the
> firefox or openoffice dictionaries?
Word completion needs word frequencies to be useful. I looked at myspell, ispell and aspell and didn't find any frequency-based weighting. Aspell has the advantage that it can dump its dictionaries to stdout, which is why I'm using it as the basis for the frequency counting. So currently I don't see an alternative to separate dictionaries for onboard. I do believe that the dictionary sizes can be reduced, though. Running makedicts with -f gives dictionaries of <200kB each, and they could even be compressed on disk. They still have 15,000-20,000 words, and considering that GOK's dictionary is around 3000 words, that seems like a good enough starting point.
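
To illustrate why a plain spelling dictionary is not enough, here is a minimal sketch of frequency counting against a fixed vocabulary. The names `count_frequencies` and `complete` and the tokenizing regex are made up for this example and are not the makedicts code; the vocabulary argument merely stands in for a word list dumped from a spell checker.

```python
import re
from collections import Counter

def count_frequencies(text, vocabulary):
    """Count how often each vocabulary word occurs in a training text.

    Hypothetical helper: words that are not in 'vocabulary' are ignored.
    """
    vocab = set(vocabulary)
    # Simplistic tokenizer (assumption): runs of letters, optionally
    # joined by apostrophes.
    tokens = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower())
    return Counter(t for t in tokens if t in vocab)

def complete(prefix, frequencies, limit=3):
    """Return the most frequent known words starting with 'prefix'."""
    matches = [(w, n) for w, n in frequencies.items() if w.startswith(prefix)]
    # Most frequent first, alphabetical order breaks ties.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [w for w, _ in matches[:limit]]
```

Without the counts, every word matching a prefix would be an equally good candidate, so the ranking step is what makes the top row useful.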

> 2. I know this is just a prototype but wouldn't it be better to keep as much
> of the code as possible in a separate library? Other applications/input
> methods might find it useful.
I believe it is too early for this. I'm constantly changing interfaces, so it would just slow things down at the moment. I'm trying to keep the core of the completion and punctuation reasonably separate anyway, so this should be doable later. The only dependency that currently isn't built in is KeyCommon.

> What do you think about a soft dependency on AT-SPI, Yuk I know but people are
> working on it, that would allow the word completion engine to detect widget
> focus change and caret movement?
I think we should try that. The word completion is currently trying to keep track of what's happening, but there just isn't enough information to get it right. I enabled AT a while ago just to see how it feels, and I hardly see a difference at all. So I guess I'll look into it at some point.

> Cheers, Chris

Cheers

196. By marmuta

Only load dictionaries when the layout has use for them.
Removed dependency on KeyCommon from WordPredictor.py.
Fixed spurious "U"s in auto punctuation.
WC keeps track of additional editing keys: del, cursor left/right.
Toggling punctuation doesn't reset input line anymore.

197. By marmuta

Added additional weighting of words based on their usage order. A new gconf key 'frequency_time_ratio' controls the ratio between the old frequency-based weighting and the time of last use.
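
A rough sketch of how such a ratio could blend the two scores. The linear blend, the function name `combined_weight` and the reading of the key as a percentage are assumptions for illustration; the revision does not spell out the actual formula.

```python
def combined_weight(frequency, max_frequency, last_use, newest_use,
                    frequency_time_ratio=50):
    """Blend a frequency score with a recency score (hypothetical formula).

    frequency_time_ratio is read as a percentage: 100 ranks by frequency
    only, 0 by time of last use only. 'last_use' and 'newest_use' are
    positions in the usage order (larger means more recent).
    """
    ratio = frequency_time_ratio / 100.0
    freq_score = frequency / float(max_frequency) if max_frequency else 0.0
    time_score = last_use / float(newest_use) if newest_use else 0.0
    return ratio * freq_score + (1.0 - ratio) * time_score
```

With a ratio of 50, a rarely used word that was typed a moment ago can outrank a common word that has not been used for a long time.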

198. By marmuta

- fixed save on exit
- learn button discards input line when turning off, keeps it when turning on
- added Francesco's learning texts for french and italian
- makedicts defaults to "expand affixes", "don't include infrequent words" -> dictionary sizes around 200kB
- exclude Project Gutenberg license headers and footers from training data
- esc key clears input buffer
- don't learn words with more than 3 repeated characters
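
The repeated-character rule from the last item could look roughly like this. It assumes that "more than 3 repeated characters" means four or more identical characters in a row; `is_learnable` is a hypothetical name.

```python
import re

# Four or more identical characters in a row (assumed reading of
# "more than 3 repeated characters").
_REPEATS = re.compile(r"(.)\1{3,}")

def is_learnable(word):
    """Reject words such as 'nooooo' that are most likely not real words."""
    return _REPEATS.search(word) is None
```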

199. By marmuta

added ability to toggle word completion including its ui via new gconf key enable_word_completion

200. By marmuta

experimental detection of mouse clicks outside of onboard; reset word completion on every detected click.

201. By marmuta

- reworked the punctuation logic again to get key feedback; hopefully fixing the issues with ; and : in the process
- added Francesco's fixes to gconf schemas and delete button
- fixed up experimental outside click detection

202. By marmuta

work around memory leak in pangocairo.CairoContext.create_layout() (gnome #599730)

203. By marmuta

- added input history with color highlighting (negotiable ;) blue: ignored, yellow/red: new word to learn
- added stealth button + new gconf key stealth_mode
- fixed wrong default values for auto_save_interval and frequency_time_ratio when gconf keys are missing
- replaced word_completion with word_prediction in gconf schemas and almost everywhere else
- increased default frequency_time_ratio from 50 to 75
- another small update of the keyboard logic, potential for slightly fewer bugs

204. By marmuta

added word prediction test application, n-grams of arbitrary order, various smoothing algorithms incl. Kneser-Ney interpolation

205. By marmuta

Merge from main branch

206. By marmuta

fixed crasher on start with unavailable dictionaries

207. By marmuta

- fixed updating problem with the color highlighting of the history line
- allow single letter words into the dictionaries for a, I,...

208. By marmuta

- learning with incremental parameter calculation for kneser-ney smoothing
- around 10 times speed-up of prediction queries

209. By marmuta

switched ngram-test from strings to indices, another 2x speed-up

210. By marmuta

switched ngram-test data structures to python trie as preparation for a C implementation
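
For readers unfamiliar with the data structure, a minimal sketch of a trie holding n-gram counts keyed by word indices. The class names `TrieNode` and `CountTrie` are invented for this example and this is not the ngram-test code.

```python
class TrieNode(object):
    """One node per word index; 'count' belongs to the n-gram ending here."""
    __slots__ = ("count", "children")

    def __init__(self):
        self.count = 0
        self.children = {}

class CountTrie(object):
    """Hypothetical n-gram count store; n-grams are tuples of word indices."""

    def __init__(self):
        self.root = TrieNode()

    def _find(self, ngram):
        node = self.root
        for word_id in ngram:
            node = node.children.get(word_id)
            if node is None:
                return None
        return node

    def add(self, ngram, increment=1):
        node = self.root
        for word_id in ngram:
            if word_id not in node.children:
                node.children[word_id] = TrieNode()
            node = node.children[word_id]
        node.count += increment

    def count(self, ngram):
        node = self._find(ngram)
        return node.count if node is not None else 0

    def successors(self, context):
        """Map each word index following 'context' to its count."""
        node = self._find(context)
        if node is None:
            return {}
        return dict((w, child.count) for w, child in node.children.items())
```

N-grams sharing a history share the nodes of that history, and looking up all successors of a context is a single walk down the tree.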

211. By marmuta

new (temporary) sub-project lm, python extension for a dynamically updatable n-gram language model

212. By marmuta

prediction:
- some clean up of the c++ code and improved code comments
- use python memory manager as often as possible
- added pool allocator for maybe 15% less memory usage
- added save/load to depth-first file format: too little improvement in loading speed, left the old arpa-like one in place
- added python tools split_corpus, entropy/perplexity, ksr (keystroke savings ratio, see README)
- added a minimal d-bus prediction server + test-client
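
As an illustration of what a keystroke savings measurement does, here is a sketch using a commonly used definition (keys saved relative to typing everything by hand). The README in the branch is the reference for what the ksr tool actually computes; the `predict` callable and the cost model below are assumptions.

```python
def keystroke_savings_ratio(words, predict, limit=3):
    """Percentage of keystrokes saved by picking words from a suggestion list.

    predict(context, prefix) is a hypothetical callable returning a ranked
    list of candidate words. Typing a word by hand costs len(word) + 1 keys
    (the extra key is the separating space); picking a suggestion costs one
    key and is assumed to insert the space as well.
    """
    typed = 0
    total = 0
    for i, word in enumerate(words):
        total += len(word) + 1
        context = words[:i]
        for n in range(len(word) + 1):
            if word in predict(context, word[:n])[:limit]:
                typed += n + 1          # n letters plus one selection
                break
        else:
            typed += len(word) + 1      # never suggested: type it all
    return 100.0 * (1.0 - typed / float(total)) if total else 0.0
```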

213. By user <user@dingsdale>

- initial word prediction support for onboard, prediction and learning through D-Bus calls
- prediction service loads, caches and saves language models
- added linear interpolation of language models
- reworked tokenization and moved it into the python extension in pypredict.py
- simplified and fixed all python tools: split_corpus, train, predict, entropy, ksr
- plenty of bug fixes, still more to do
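
Linear interpolation of language models, sketched for plain word-to-probability dictionaries. The function name and the dictionary representation are assumptions; the models in the branch are C++ objects.

```python
def interpolate_linear(distributions, weights):
    """Mix several word -> probability dicts using normalized weights."""
    total = float(sum(weights))
    merged = {}
    for dist, weight in zip(distributions, weights):
        for word, p in dist.items():
            merged[word] = merged.get(word, 0.0) + (weight / total) * p
    return merged
```

If every input distribution sums to one, so does the result, which keeps the merged model usable without renormalizing.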

214. By user <user@dingsdale>

- added log-linear interpolation and onboard's simple overlay-algo for merging lms
- comments and cleanups
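
Log-linear interpolation multiplies weighted powers of the probabilities instead of adding weighted probabilities, and then renormalizes. A sketch under the same assumptions as before (plain dicts, invented function name); the `floor` for words missing from a model is an additional assumption, and onboard's overlay algorithm is not shown.

```python
import math

def interpolate_log_linear(distributions, weights, floor=1e-9):
    """Combine word -> probability dicts as a normalized weighted product."""
    vocab = set()
    for dist in distributions:
        vocab.update(dist)
    scores = {}
    for word in vocab:
        # A word unknown to one model gets the small 'floor' probability
        # there, otherwise its product would collapse to zero.
        log_p = sum(w * math.log(max(dist.get(word, 0.0), floor))
                    for dist, w in zip(distributions, weights))
        scores[word] = math.exp(log_p)
    norm = sum(scores.values())
    return dict((word, s / norm) for word, s in scores.items())
```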

215. By user <user@dingsdale>

- added two new smoothing options: Witten-Bell and Absolute Discounting
- reworked Kneser-Ney smoothing for more robust normalization
- changed default smoothing to Absolute Discounting
- added new tool analyze for plotting pretty entropy and ksr charts, needs matplotlib
- split_corpus supports additional parameters to influence the size of split texts
- added python unit tests for tokenization and language model normalization
- fixed word insertion bug in onboard; tokenization is fully done via D-Bus now
- fixed a crasher in PoolAllocator
- random fixes and comment updates throughout the code
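
For reference, a bigram-only sketch of interpolated absolute discounting: a fixed discount is subtracted from every seen bigram count and the freed probability mass is handed to the unigram distribution. The function names and the discount of 0.5 are assumptions; the models in the branch handle arbitrary orders.

```python
from collections import Counter, defaultdict

def train_bigrams(tokens):
    """Collect unigram counts and per-context successor counts."""
    unigrams = Counter(tokens)
    bigrams = defaultdict(Counter)
    for prev, word in zip(tokens, tokens[1:]):
        bigrams[prev][word] += 1
    return unigrams, bigrams

def prob_absolute_discounting(word, prev, unigrams, bigrams, discount=0.5):
    """P(word | prev) with absolute discounting, interpolated with unigrams."""
    total = sum(unigrams.values())
    p_unigram = unigrams[word] / float(total) if total else 0.0
    successors = bigrams.get(prev)
    if not successors:
        return p_unigram            # unseen context: unigram model only
    context_count = float(sum(successors.values()))
    discounted = max(successors[word] - discount, 0.0) / context_count
    # Mass removed by discounting, redistributed via the unigram model.
    backoff_weight = discount * len(successors) / context_count
    return discounted + backoff_weight * p_unigram
```

For a seen context the probabilities over the vocabulary still sum to one, because the discount taken from each seen successor is exactly the weight given to the unigram distribution.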

216. By user <user@dingsdale>

fixed traceback at service startup when the models directory wasn't found

217. By user <user@dingsdale>

- try multiple encodings in pypredict.read_corpus before giving up, default is [utf-8, latin-1]
- use timeout_add_seconds instead of timeout_add for the autosave timer in gpredict to allow for grouping wakeups
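
The encoding fallback from the first item, sketched as a standalone function. `read_text_with_fallback` is a made-up name and not the pypredict.read_corpus implementation.

```python
import io

def read_text_with_fallback(filename, encodings=("utf-8", "latin-1")):
    """Return the file's text, decoded with the first encoding that works."""
    for encoding in encodings:
        try:
            with io.open(filename, encoding=encoding) as f:
                return f.read()
        except UnicodeDecodeError:
            continue    # wrong guess, try the next encoding
    raise ValueError("could not decode %r with any of %r"
                     % (filename, list(encodings)))
```

Note that latin-1 maps every byte to a character and therefore never fails, so in this order it acts as a catch-all after utf-8.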

218. By user <user@dingsdale>

- reworked pypredict.split_sentences() and prettified the results of sentence splitting
- fixed erroneous joining of sentences when using texts generated by the split_corpus tool

219. By user <user@dingsdale>

don't commit_input_line() when scrolling with the mouse wheel

220. By user <user@dingsdale>

wrapped DynamicModel and NGramTrie in templates to allow for alternative memory layouts, i.e. recency caching and no kneser-ney parameters if they aren't needed

221. By marmuta

experimental workaround for traceback at reset_clip on lucid

222. By marmuta

Added a new model type CachedDynamicModel for recency based ngram-caching with exponential fall-off over time.
The prediction now remembers recently used ngrams. The current parameters were found by trial and error; what works best needs to be investigated more thoroughly later.
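
A small sketch of exponential fall-off with a half-life, to show the idea. `RecencyCache` is a hypothetical class and not CachedDynamicModel; measuring age in "number of n-grams seen since last use" is an assumption.

```python
class RecencyCache(object):
    """Remember when n-grams were last used and weight them by age."""

    def __init__(self, halflife):
        self.halflife = float(halflife)   # floats are fine, e.g. 2.5
        self.clock = 0
        self.last_used = {}

    def touch(self, ngram):
        """Record that 'ngram' was just used."""
        self.clock += 1
        self.last_used[ngram] = self.clock

    def weight(self, ngram):
        """1.0 for the newest n-gram, halved every 'halflife' steps of age."""
        if ngram not in self.last_used:
            return 0.0
        age = self.clock - self.last_used[ngram]
        return 0.5 ** (age / self.halflife)
```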

223. By marmuta

Merge with onboard main

224. By marmuta

Added new D-Bus method lookup_text to get onboard's input line display working again.

225. By marmuta

Removed all traces of dictionary auto saving from onboard. The D-Bus service has been saving language models for a while.

226. By Francesco Fumanti

Use utf-8 coding to avoid problems with build_i18n

227. By marmuta

Added makemodels script to create language models for all available aspell dictionaries. Filter models based on the aspell vocabulary.

228. By marmuta

Experimented with loading models in a separate thread with mixed results, disabled again. Python's global interpreter lock complicates things.

229. By marmuta

- Added color feedback to the mouse click buttons
- Fixed old bug in Keyboard.iter_keys() that led to always returning to the main pane when pushing click buttons
- Set color of bright checked buttons to the same as buttons that are "on".

230. By marmuta

- Extended the analyze tool to investigate caching parameters
- Added an optimize tool that tries to find better caching parameters with simulated annealing
- Set new, marginally better caching parameters for recency caching
- Allow floats in addition to ints for recency_halflife property of CachedDynamicModel
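
For context, a generic simulated annealing loop of the kind such a tool might use. This is a textbook sketch with invented names and defaults, not the optimize script; in that setting `cost` would presumably be something like the entropy of a test text for a given set of caching parameters.

```python
import math
import random

def simulated_annealing(cost, start, step=0.1, temperature=1.0,
                        cooling=0.95, iterations=500, rng=None):
    """Minimize cost(params) for a list of floats; returns (best, best_cost)."""
    rng = rng or random.Random(0)       # fixed seed for repeatable runs
    current = list(start)
    current_cost = cost(current)
    best, best_cost = list(current), current_cost
    for _ in range(iterations):
        candidate = [x + rng.uniform(-step, step) for x in current]
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse candidates with a
        # probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
        temperature = max(temperature * cooling, 1e-12)
    return best, best_cost
```

Accepting occasional worse candidates early on is what lets the search leave a local minimum, which a plain hill climber cannot do.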

231. By marmuta

Merge with onboard main

232. By marmuta

Fixed oversized key labels for small window heights. Fallout from last merge.

233. By marmuta

Merge from main. Needs additional work.

234. By marmuta

Merged with main, additional changes
- Converted to GTK3/gnome introspection
- Moved word prediction gconf keys to gsettings
- Kept mouse click polling for button updates and word learning
- Always convert key labels to unicode to avoid breaking calls to the word predictor
- Regression: input line display disabled because of broken get_char_extents, https://bugzilla.gnome.org/show_bug.cgi?id=654343

235. By marmuta

- Disabled more of the input line; Pango introspection is in bad shape
- Fixed merge mistake in classic layout

236. By marmuta

Merge from trunk. Word prediction technically still works, but the ui needs lots of polishing.

237. By marmuta

Unbreak auto-punctuation.

238. By marmuta

Partially bring back the input line. Introspection of pango attributes is still utterly broken, use parse_markup instead.
