era, member since Jun 19, 2006
1 - 33 of 33
On-line demo of Xerox's language identifier (commercial)
47 languages, not terribly actively maintained. I believe this was originally created by one of their Finnish researchers at XRCE Grenoble once upon a time ... I also got the impression that this one was the first to make a conscious effort at supporting different character set encodings. Fun Observation: the Danish sample Sentence uses ancient German-Style Capitalization Rules (-: ... and the Norwegian is (predictably) unlabelled, although I believe it's Bokmål. And it's incorrectly punctuated.
by era 2006-06-19 01:25 history · language · language.identification · server · tool · 20060619-0123
http://www.xrce.xerox.com/competencies/content-analysis/tools/guesser-ISO-8859-1.en.html
Yet another verb conjugator
by era 2006-06-19 01:25 language · language.generation · server · tool · web · 20060619-0123
http://www.verbix.com/webverbix/index.asp
by era 2006-06-19 01:25 english · language · reference · 20060619-0123
http://www.usingenglish.com/
by era 2006-06-19 01:25 certification · language · 20060619-0123
http://www.toeic-europe.com/
Hilarious Chinese restaurant menu
by era 2006-06-19 01:24 blog · erablog · humor · language · language.translation · 20060619-0123
http://www.rahoi.com/2006/03/may-i-take-your-order.php
Very comprehensive site with links to both tools and corpora
by era 2006-06-19 01:24 03a · corpus · download · language · linguistics · reference · 20060619-0123
http://www-nlp.stanford.edu/links/statnlp.html
WWW search interfaces for translators, using Google hacks -- glossary search, parallel text search, idiom search
by era 2006-06-19 01:24 03a · language · search · server · tool · translation · 20060619-0123
http://www.multilingual.ch/search_interfaces.htm
C reimplementation of TextCat, open source
More solid language models than TextCat, and they use a similar format, so you can use the mguesser models with TextCat and vice versa. It's written in C, so it's faster, too. The web page is hideous, but the tool is good. This is available as a Debian package as well. See also
by era 2006-06-19 01:24 02a · download · language · language.identification · opensource · tool · 20060619-0123
http://www.mnogosearch.org/guesser/
Japanese katakana character chart
by era 2006-06-19 01:24 japanese · language · writing · 20060619-0123
http://www.kids-japan.com/kata-chart.htm
Is the language "Persian" or "Farsi"? (Apparently, "Persian," really.)
by era 2006-06-19 01:24 article · language · language.identification · persian · 20060619-0123
http://www.iranian.com/Features/Dec97/Persian/
Download the SUSANNE, CHRISTINE, and LUCY English corpora
by era 2006-06-19 01:24 corpus · download · language · linguistics · 20060619-0123
http://www.grsampson.net/index.html
The Pirahã, a Brazilian tribe, cannot count beyond three and their language lacks recursion
The comments from readers are outrageous, as usual.
by era 2006-06-19 01:24 blog · culture · erablog · language · science · society · 20060619-0123
http://www.damninteresting.com/?p=545
Compounding errors in Finnish, and hints for how to learn to write compounds correctly
by era 2006-06-19 01:24 03a · advocacy · finnish · language · reference · writing · yucca · 20060619-0123
http://www.cs.tut.fi/~jkorpela/suomi/yhdyssanat.html
ACL SIGLEX Resource Lists - Electronic Dictionaries
by era 2006-06-19 01:24 corpus · download · language · linguistics · reference · 20060619-0123
http://www.clres.com/dict.html
Vector-space-based language identification (commercial)
There's a link there to a paper which was also published at the 32nd Hawaii International Conference on System Sciences (1999) -- I'll try to find that and submit it to CiteSeer too. The vector-space cosine distance measure makes more theoretical sense to me than the others I've seen, but I haven't had the time to compare their performance head-to-head.
by era 2006-06-19 01:24 language · language.analysis · language.identification · tool · 20060619-0123
http://www-306.ibm.com/software/globalization/topics/linguini/welcome.jsp
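For readers unfamiliar with the cosine measure mentioned in the note above, here is a minimal Python sketch of the general idea: treat each text as a vector of character-trigram counts and pick the language whose reference vector points in the most similar direction. This is not the tool's actual implementation, which is not described here; the function names, the choice of trigrams, and the `profiles` dictionary are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def char_trigram_counts(text):
    """Count overlapping character trigrams in lower-cased text (hypothetical helper)."""
    text = text.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def guess_language(text, profiles):
    """Return the key of the reference profile most similar to the text.

    `profiles` maps a language label to a trigram Counter built from
    training text; the labels and training data are up to the caller.
    """
    vector = char_trigram_counts(text)
    return max(profiles, key=lambda lang: cosine_similarity(vector, profiles[lang]))
```

Because the cosine normalizes by vector length, a short query and a long training text can be compared directly, which may be part of why the measure feels theoretically cleaner than rank-based scores.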
More compound-word sense of humor (in Finnish)
by era 2006-06-19 01:24 finnish · humor · language · language.generation · 20060619-0123
http://venko.net/naapuri/oksennuspussilakana.html
Retroactively compiled on-line archive of linguistic nerd humor
by era 2006-06-19 01:23 deepsite · humor · language · language.analysis · satire · science · writing · 20060619-0123
http://specgram.com/
The software which runs the languid site, apparently
by era 2006-06-19 01:23 02a · download · language · language.identification · module · opensource · perl · tool · 20060619-0123
http://search.cpan.org/~mceglows/Language-Guess-0.01/
Gertjan van Noord's language identification tool in Perl, with a demo
See also the "competitors" page for links to more similar tools.
by era 2006-06-19 01:23 02a · download · language · language.identification · module · opensource · perl · server · tool · 20060619-0123
http://odur.let.rug.nl/~vannoord/TextCat/
Parallel corpus collected e.g. from Linux manuals
by era 2006-06-19 01:23 corpus · download · language · linguistics · opensource · 20060619-0123
http://logos.uio.no/opus/
Massive but variable quality && messy
by era 2006-06-19 01:23 corpus · download · language · linguistics · reference · 20060619-0123
http://lingo.lancs.ac.uk/devotedto/corpora/software.htm
UTF-8 language guesser, sort of TextCat-based (?)
... or so it sez on the TextCat site. It also says the code is GPL, but I haven't figured out where to download it and/or the language models. See also
by era 2006-06-19 01:23 02a · language · language.identification · server · tool · 20060619-0123
http://languid.cantbedone.org/
More on the Pirahã
by era 2006-06-19 01:23 blog · culture · erablog · language · science · society · 20060619-0123
http://itre.cis.upenn.edu/~myl/languagelog/archives/001387.html
Markus Juutin yhdyssanageneraattori (a Finnish compound-word generator)
Apparently it's fine to spread the link ... so I'll post it here as well. Sample output: luolamieskuoro konekielisuukko viivakoodiorja banaanivaltiotiede järjestötoimintaelokuva yhdyssanaripuli
by era 2006-06-19 01:23 blog · erablog · finnish · humor · language · language.generation · site · 20060619-0123
http://horna.kicks-ass.net/~setae/tmp/yhdyssanahirvio.php
Comments on a TV show about the savant syndrome
I saw the show on TV a couple of months ago. Amazing, but maybe learning basic Icelandic in a week is not *that* exciting. If you got sponsored to do nothing else for a week, I imagine you could learn some Icelandic as well (though probably not as well as this guy, for most values of "you"). Too lazy to create a digg account just to comment on this, but it's incredible what sorts of conceptions some people have about languages: "Icelandic has linguistic connections many thousands of years back to Lithuanian and Estonian, which probably helped him a little." Hello? Icelandic has more in common with English than with either of those two. (And Icelandic didn't exist as a language "thousands of years" ago, although if you go back far enough, Icelandic and Lithuanian have a common ancestor language.)
by era 2006-06-19 01:23 blog · erablog · language · 20060619-0123
http://digg.com/science/Autistic_Savant_Learns_Icelandic_Language_in_Just_7_Days
As the site grows, it will be increasingly useful to be able to focus on languages you understand
Ideally, the site would be able to supply a meaningful default guess for every field, and a user preference for which languages to display and/or suggest. See also the Accept-Language HTTP header. Gertjan van Noord's TextCat is a fairly popular Perl-based language identification module. (It's not actually a proper module, but you can get a modularized version e.g. from the SpamAssassin sources.) Samma på svenska. Ja suomeksikin. ("Same in Swedish. And in Finnish too.")
by era 2006-06-19 01:23 blog · bugs · deliriousbugs · deliriouswishlist · erablog · language · language.identification · rubric_0.09 · rubric_0.10 · 20060619-0123
http://de.lirio.us/rubric/entry/5407
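As a rough illustration of how a site might turn the Accept-Language header mentioned above into a default display language, here is a simplified Python sketch. It is not taken from the site's code; the function names and the `available` set are hypothetical, and the parsing handles only the common `tag;q=value` form rather than every detail of the HTTP specification.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language value, most preferred first.

    Simplified: tags with q=0 are dropped, a missing q counts as 1.0,
    and ties keep the order in which the tags appeared in the header.
    """
    prefs = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0].lower()
        if not tag:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # unparsable weight: treat the tag as not acceptable
        if q > 0:
            prefs.append((-q, position, tag))
    return [tag for _, _, tag in sorted(prefs)]


def choose_language(header, available, default="en"):
    """Pick the first preferred tag (or its primary subtag) found in `available`."""
    for tag in parse_accept_language(header):
        if tag == "*":
            return default
        primary = tag.split("-")[0]
        for candidate in (tag, primary):
            if candidate in available:
                return candidate
    return default
```

A per-user preference, as the note suggests, would still be needed on top of this, since the header only reflects the browser's configuration and says nothing about the language of each bookmarked page.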
Wyard, Rose (1997)
by era 2006-06-19 01:23 article · citeseer · corpus · language · language.analysis · language.identification · science · similarity · theory · 20060619-0123
http://citeseer.ist.psu.edu/wyard97internet.html
Penelope Sibun, Jeffrey C. Reynar (1996)
by era 2006-06-19 01:23 article · citeseer · language · language.analysis · language.identification · science · similarity · theory · 20060619-0123
http://citeseer.ist.psu.edu/sibun96language.html
Kenneth Beesley (1988)
Very crude, but hey, it's very old, too
by era 2006-06-19 01:23 article · citeseer · history · language · language.analysis · language.identification · science · similarity · theory · 20060619-0123
http://citeseer.ist.psu.edu/beesley88language.html
Cavnar, Trenkle (1994) - the popular paper behind TextCat et al.
The ranking algorithm is kind of screwy, until you think of it as edit distance in an alphabet where each n-gram is a distinct symbol. Maybe it's still screwy.
by era 2006-06-19 01:23 article · citeseer · language · language.analysis · language.identification · science · similarity · theory · 20060619-0123
http://citeseer.ist.psu.edu/68861.html
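For context on the ranking algorithm discussed in the entry above, here is a minimal Python sketch of a rank-order ("out-of-place") comparison of n-gram profiles, in the spirit of the Cavnar and Trenkle paper. It is a simplification, not TextCat's code: the tokenization, the underscore padding, the tie-breaking, and all function names are assumptions made for illustration.

```python
from collections import Counter


def ngram_profile(text, max_n=5, size=300):
    """Return the `size` most frequent character n-grams (n = 1..max_n), most frequent first."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count; break ties alphabetically so results are deterministic.
    ranked = sorted(counts, key=lambda gram: (-counts[gram], gram))
    return ranked[:size]


def out_of_place(doc_profile, lang_profile):
    """Sum, over the document's n-grams, of how far each rank is from its rank in the language profile.

    An n-gram missing from the language profile costs a fixed maximum
    penalty, taken here to be the length of the language profile.
    """
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(i - rank[gram]) if gram in rank else max_penalty
        for i, gram in enumerate(doc_profile)
    )


def classify(text, lang_profiles):
    """Return the label of the language profile with the smallest out-of-place distance."""
    doc = ngram_profile(text)
    return min(lang_profiles, key=lambda lang: out_of_place(doc, lang_profiles[lang]))
```

Only the rank positions matter here, not the underlying counts, which is what makes the measure look odd next to vector-based scores.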
Carter (1994)
Spoken language models, but still
by era 2006-06-19 01:23 article · citeseer · language · language.analysis · language.identification · science · similarity · theory · 20060619-0123
http://citeseer.ist.psu.edu/23437.html
Kevin P. Scannell's project and software
This is what aspell are planning to use for many new languages.
by era 2006-06-19 01:23 03a · corpus · download · language · linguistics · 20060619-0123
http://borel.slu.edu/crubadan/
Blankenhorn asserts Kannada is "unpopular"
How can a language with more speakers than Dutch, Danish, Norwegian, Swedish, and Finnish combined be called "unpopular"? (To select just one largish geographical region in Europe with several national languages. I'm omitting Icelandic, Frisian, Sami, and a few others which would merely obscure the point. Feel free to include them in the count if you're keeping score.) http://www.ethnologue.com/show_language.asp?code=kan
by era 2006-06-19 01:23 advocacy · blog · erablog · language · opensource · 20060619-0123
http://blogs.zdnet.com/open-source/wp-trackback.php?p=521