What are trigrams (machine translation)?
Thread poster: Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 05:16
Member (2006)
English to Afrikaans
Apr 24, 2006

G'day everyone!

There is an open-source machine translation program called PMST at http://www.geocities.com/bryanmceleney/psmt.htm. The page mentions trigrams as a requirement for making the program learn a new language. Does anyone know of any resources on trigrams in a machine translation context?

Thanks!

(PS. Where's Jeff?)


 
Gad Harel
Israel
Local time: 06:16
English to German
What are trigrams Apr 24, 2006

Hi Samuel,

Thanks once again for your last reply about LocStudio.

Now, on your question about trigrams, have a look at these pages:

http://www.cs.cmu.edu/afs/cs/project/link/www/homepage.html
http://citeseer.ist.psu.edu/304481.html

http://en.wikipedia.org/wiki/Trigrams

http://www.coli.uni-saarland.de/~thorsten/tnt/

http://recycledknowledge.blogspot.com/2005/09/trigrams.html

http://cslu.cse.ogi.edu/nsf/isgw97/reports/sleator.html


regards
gad


 
Jennifer Baldwin  Identity Verified
Local time: 20:16
French to English
Trigrams Apr 24, 2006

A trigram is a three-word sequence, and a trigram model is a statistical (corpus-taught) language model used in natural language systems, in this case MT. The idea is that, given the first two words in any three-word sequence, the system can better predict the third. A simpler, more traditional approach is the bigram model, which looks only at two-word sequences, so that the first word in the sequence helps predict the second.

For example, in the bigram "laptop computer," you can see how "laptop" clues us in to the following word "computer." "Computer" has a high probability of following "laptop" in a text, much more so than "rabbit" or "book," or even words from other parts of speech such as "happy," "quickly," or "forever." A trigram model simply goes a step deeper, looking at the two preceding words before predicting the third.
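To make the counting concrete, here is a minimal Python sketch of maximum-likelihood bigram and trigram estimates over a tiny, made-up corpus. The function names and sentences are illustrative assumptions, not taken from PMST or any particular MT system, and the smoothing and sentence-boundary markers a real model would use are left out.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def train(sentences, n):
    """Count n-grams and how often each (n-1)-word history is followed by a word."""
    grams, histories = Counter(), Counter()
    for sentence in sentences:
        for gram in ngrams(sentence.lower().split(), n):
            grams[gram] += 1
            histories[gram[:-1]] += 1
    return grams, histories

def prob(grams, histories, *words):
    """Maximum-likelihood estimate of P(last word | preceding words)."""
    history = tuple(words[:-1])
    if histories[history] == 0:
        return 0.0  # history never seen: no estimate without smoothing
    return grams[tuple(words)] / histories[history]

# Hypothetical toy corpus.
corpus = [
    "she bought a laptop computer",
    "my laptop computer is slow",
    "the laptop bag is black",
    "she bought a rabbit",
]
bi, bi_hist = train(corpus, 2)
tri, tri_hist = train(corpus, 3)

print(round(prob(bi, bi_hist, "laptop", "computer"), 3))  # 0.667
print(prob(bi, bi_hist, "laptop", "rabbit"))              # 0.0
print(prob(tri, tri_hist, "bought", "a", "laptop"))       # 0.5
```

"Laptop" is followed by another word three times in this corpus, twice by "computer," so P(computer | laptop) comes out as 2/3. "Rabbit" never follows "laptop" and gets zero, which is exactly the gap that smoothing is meant to fill in a real system.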

The probabilities we get from a bigram or trigram model are mostly obvious to humans who know the language, but they are tremendously important in helping a computer to "understand" language and grammar through statistical examples in a corpus.

In machine translation, trigrams add a layer of context. Rather than translating word for word, the software can use trigrams to select the most probable word given the previous two words in the sequence. This generally improves overall accuracy.
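Here is a minimal sketch of that word-selection step, assuming the translation component has already proposed candidate target words for a source word. The corpus, candidate lists, and function names are hypothetical and are not meant to describe PMST's actual interface. Add-one (Laplace) smoothing is used here only because it is the simplest option; better smoothing methods, such as Kneser-Ney, are preferred in practice.

```python
from collections import Counter

def trigram_counts(sentences):
    """Count trigrams and their two-word histories in a target-language corpus."""
    tri, hist, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = sentence.lower().split()
        vocab.update(tokens)
        for i in range(len(tokens) - 2):
            tri[tuple(tokens[i:i + 3])] += 1
            hist[tuple(tokens[i:i + 2])] += 1
    return tri, hist, vocab

def smoothed_prob(tri, hist, vocab, w1, w2, w3):
    """Add-one (Laplace) estimate of P(w3 | w1 w2); unseen trigrams get a small non-zero score."""
    return (tri[(w1, w2, w3)] + 1) / (hist[(w1, w2)] + len(vocab))

def pick_translation(tri, hist, vocab, w1, w2, candidates):
    """Return the candidate the trigram model rates most likely after w1 w2."""
    return max(candidates, key=lambda w: smoothed_prob(tri, hist, vocab, w1, w2, w))

# Hypothetical target-language (English) training corpus.
corpus = [
    "i bought a laptop computer",
    "a laptop computer is portable",
    "she uses a pocket calculator",
]
tri, hist, vocab = trigram_counts(corpus)

# Suppose a source word has two dictionary translations; the two
# previously chosen target words decide between them.
print(pick_translation(tri, hist, vocab, "a", "laptop", ["calculator", "computer"]))  # computer
print(pick_translation(tri, hist, vocab, "a", "pocket", ["computer", "calculator"]))  # calculator
```

The same candidate word can win or lose depending on the two words before it. That is the "layer of context" a word-for-word lookup cannot provide.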

Offhand, I don't know of any sources online to help you, but I do suggest searching for "bigram" or "n-gram" in addition to "trigram." It's all the same concept. Daniel Jurafsky and James Martin's book "Speech and Language Processing" has a well-written chapter on n-grams. (It's a well-known book in the field, so I would expect most libraries to carry it.)

I would expect that the software needs a large training corpus (in the target language) from which it will extract trigrams on its own.


 
Lia Fail (X)  Identity Verified
Spain
Local time: 05:16
Spanish to English
See this link Apr 24, 2006

Hi Samuel,

I came across this program a long time ago when looking for concordancers.

I wonder whether it would be of interest.

http://www.kwicfinder.com/kfNgram/kfNgramHelp.html

kfNgram is a free stand-alone Windows program for linguistic research which generates lists of n-grams in text and HTML files. Here n-gram is understood as a sequence of either n words, where n can be any positive integer, also known as lexical bundles, chains, wordgrams, and, in WordSmith, clusters, or else of n characters, also known as chargrams. When not further specified here, n-gram refers to wordgrams. kfNgram also produces and displays lists of "phrase-frames", i.e. groups of wordgrams identical but for a single word.
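kfNgram itself is a Windows program. The Python below is only a rough sketch of the three kinds of list described above (wordgrams, chargrams, and phrase-frames) and should not be read as kfNgram's actual algorithm. The function names and sample sentence are made up for illustration.

```python
from collections import Counter, defaultdict

def word_ngrams(text, n):
    """Wordgrams: runs of n consecutive words."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def char_ngrams(text, n):
    """Chargrams: runs of n consecutive characters."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def phrase_frames(text, n, min_variants=2):
    """Group distinct wordgrams that are identical except for one word, marked '*'."""
    frames = defaultdict(set)
    for gram in set(word_ngrams(text, n)):
        words = gram.split()
        for i in range(n):
            frame = " ".join(words[:i] + ["*"] + words[i + 1:])
            frames[frame].add(words[i])
    # Keep only frames that at least min_variants different words fill.
    return {f: sorted(v) for f, v in frames.items() if len(v) >= min_variants}

text = "at the end of the day and at the start of the day"
print(Counter(word_ngrams(text, 3)).most_common(2))  # [('of the day', 2), ('at the end', 1)]
print(char_ngrams("trigram", 3))                     # ['tri', 'rig', 'igr', 'gra', 'ram']
print(phrase_frames(text, 4)["at the * of"])         # ['end', 'start']
```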

See also http://www.kwicfinder.com/KWiCFinder.html


 
Jeff Allen  Identity Verified
France
Local time: 05:16
Multiple languages
n-grams and MT Apr 24, 2006

Samuel Murray wrote:
Does anyone know of any resources on trigrams in a machine translation context?

(PS. Where's Jeff?)


Hi Samuel,
sorry for my absence. As I mentioned in another ProZ forum a couple of weeks ago, I've been preparing the company I work for for a major audit of the entire R&D division.
I've also had a bunch of other deadlines recently in parallel.
I've been popping in and out of ProZ when I have a few minutes, but there have been few of those lately.
Shame that there isn't a "busy" or "gone" status button on ProZ profiles.

Jennifer Baldwin's explanation higher up in this thread is a good intro to n-gram analysis. I've used it mainly for speech data processing, and only somewhat for MT systems, since most MT packages I work with are rule-based systems locked up in commercial packages. Statistical and example-based MT make heavy use of n-gram methods.
Search for Andy Way at Dublin City University and Michael Carl at the IAI in Saarbrücken with regard to example-based MT. They surely give plenty of examples of how they use n-grams.

Jeff

========
Jeff Allen, PhD
http://www.geocities.com/jeffallenpubs/


 

