What are trigrams (machine translation)? Thread poster: Samuel Murray
| Samuel Murray Netherlands Local time: 05:16 Member (2006) English to Afrikaans + ...
G'day everyone! There is an open-source machine translation program called PMST at http://www.geocities.com/bryanmceleney/psmt.htm. It speaks of trigrams, which are a requirement for making the program learn a new language. Does anyone know of some resources on trigrams in a machine translation context for me? Thanks! (PS. Where's Jeff?)
| Gad Harel Israel Local time: 06:16 English to German + ...
A trigram is a sequence of three words, and a trigram model is one way of training a statistical (corpus-taught) natural language system, in this case MT. The idea is that, given the first two words in any three-word sequence, the system can better predict the third. A more traditional approach is the bigram model, which looks only at two-word sequences, whereby the first word in the sequence can help predict the second. For example, in the bigram "laptop computer," you can see how "laptop" clues us in to the following word "computer." "Computer" has a high probability of following "laptop" in a text, much more so than "rabbit" or "book," or even other parts of speech - "happy," "quickly," "forever," etc. A trigram model simply goes a step deeper, looking at two words before predicting the third.
The probabilities we get from a bigram or trigram model are mostly obvious to humans who know the language, but they are tremendously important in helping a computer to "understand" language and grammar through statistical examples in a corpus. In machine translation, trigrams add a layer of context. Rather than translate word-for-word, software can use trigrams to select the most probable word, given the previous two words in the sequence. This improves overall accuracy.
Offhand, I don't know of any sources online to help you, but I do suggest searching for "bigram" or "n-gram" in addition to "trigram." It's all the same concept.
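To make the prediction idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the toy corpus and the function names `train_trigram_model` and `predict_next` are invented for this example and are not taken from PMST or any other MT package. It counts which word follows each pair of words and then picks the most frequent follower.

```python
from collections import Counter, defaultdict


def train_trigram_model(tokens):
    """Count how often each word follows each pair of preceding words."""
    counts = defaultdict(Counter)
    for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
        counts[(w1, w2)][w3] += 1
    return counts


def predict_next(counts, w1, w2):
    """Return (word, probability) for the likeliest word after 'w1 w2'.

    Returns None if the pair was never seen in the training text.
    """
    followers = counts.get((w1, w2))
    if not followers:
        return None
    word, freq = followers.most_common(1)[0]
    return word, freq / sum(followers.values())


# Hypothetical toy corpus; a real system would need a large target-language corpus.
corpus = (
    "i bought a laptop computer . "
    "she bought a laptop computer . "
    "he bought a laptop bag ."
).split()

model = train_trigram_model(corpus)
print(predict_next(model, "a", "laptop"))
```

In this toy corpus "a laptop" is followed by "computer" twice and by "bag" once, so the sketch picks "computer" with an estimated probability of 2/3. Real systems also smooth these counts so that unseen word pairs do not simply return nothing.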
Daniel Jurafsky and James Martin's book "Speech and Language Processing" has a well-written chapter on n-grams. (It's a well-known book in the field, so I would expect most libraries to carry it.) I would expect that the software needs a large training corpus (in the target language) from which it will extract trigrams on its own.
| Lia Fail (X) Spain Local time: 05:16 Spanish to English + ... See this link | Apr 24, 2006 |
Hi Samuel, I came across this program a long time ago when looking for concordancers. I wonder would it be of interest? http://www.kwicfinder.com/kfNgram/kfNgramHelp.html
kfNgram is a free stand-alone Windows program for linguistic research which generates lists of n-grams in text and HTML files. Here n-gram is understood as a sequence of either n words, where n can be any positive integer, also known as lexical bundles, chains, wordgrams, and, in WordSmith, clusters, or else of n characters, also known as chargrams. When not further specified here, n-gram refers to wordgrams. kfNgram also produces and displays lists of "phrase-frames", i.e. groups of wordgrams identical but for a single word.
See also http://www.kwicfinder.com/KWiCFinder.html
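The three notions described above (wordgrams, chargrams, and phrase-frames) can be sketched in a few lines of Python. This is a rough illustration only, not how kfNgram itself works: the function names, the `*` slot marker, and the sample sentence are all assumptions made for this example.

```python
from collections import defaultdict


def word_ngrams(tokens, n):
    """All runs of n consecutive words (wordgrams)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """All runs of n consecutive characters (chargrams)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def phrase_frames(ngrams):
    """Group wordgrams that are identical except for one word.

    The varying position is marked with '*'; only frames with at least
    two different fillers are kept.
    """
    frames = defaultdict(set)
    for gram in set(ngrams):
        for slot in range(len(gram)):
            frame = gram[:slot] + ("*",) + gram[slot + 1:]
            frames[frame].add(gram[slot])
    return {f: sorted(v) for f, v in frames.items() if len(v) > 1}


# Hypothetical sample text, chosen so that two trigrams differ by one word.
tokens = "at the end of the day at the start of the day".split()
trigrams = word_ngrams(tokens, 3)
frames = phrase_frames(trigrams)

for frame, fillers in sorted(frames.items()):
    print(" ".join(frame), "->", fillers)
```

Here "at the end" and "at the start" share the frame "at the *", with "end" and "start" as the two fillers.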
| Jeff Allen France Local time: 05:16 Multiple languages + ... n-grams and MT | Apr 24, 2006 |
Samuel Murray wrote: Does anyone know of some resources on trigrams in machine translation context for me? (PS. Where's Jeff?)
Hi Samuel, sorry for my absence. I indicated in another forum on ProZ a couple of weeks ago that I've been preparing the company I work for for a major audit of the entire R&D division. I also had a bunch of other deadlines recently in parallel. I've been popping in and out of ProZ when I have a few minutes, but that has been rare lately. Shame that there isn't a "busy" or "gone" status button on ProZ profiles.
Jennifer Baldwin's explanation higher above in this thread is a good intro to n-gram analysis. I've used it mainly for speech data processing, and only somewhat for MT systems, since most MT packages I work with are rule-based systems and are locked up in a commercial package. Statistical and Example-based MT use these n-gram methods a lot. Search on Andy Way at Dublin City University and Michael Carl at the IAI at Saarbrücken with regard to Example-based MT. They surely give plenty of examples of their use of n-grams.
Jeff
========
Jeff Allen, PhD
http://www.geocities.com/jeffallenpubs/