libtextcat - Language guessing by N-Gram-Based Text Categorization

Property              Value
Distribution          FreeBSD 11
Repository            FreeBSD Ports Latest amd64
Package filename      libtextcat-2.2_6.txz
Package name          libtextcat
Package version       2.2
Package release       6
Package architecture  amd64
Package type          txz
Category              textproc
Download size         132.29 KB
Installed size        441.38 KB

Libtextcat is a library with functions that implement the classification
technique described in Cavnar & Trenkle, "N-Gram-Based Text Categorization" [1].
It was primarily developed for language guessing, a task on which it is known to
perform with near-perfect accuracy.

The central idea of the Cavnar & Trenkle technique is to calculate a
"fingerprint" of a document with an unknown category, and compare this with the
fingerprints of a number of documents of which the categories are known. The
categories of the closest matches are output as the classification. A
fingerprint is a list of the most frequent n-grams occurring in a document,
ordered by frequency. Fingerprints are compared with a simple out-of-place
metric.

[1] The document that started it all: William B. Cavnar & John M. Trenkle (1994),
"N-Gram-Based Text Categorization".
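
The fingerprint comparison described above can be illustrated with a minimal
Python sketch. It is not libtextcat's code or API: the function names, the '_'
word padding, the n-gram lengths (1 to 5), the fingerprint size (400) and the
penalty for an n-gram missing from a category fingerprint are all assumptions
chosen for illustration.

```python
from collections import Counter


def fingerprint(text, max_n=5, top=400):
    """Return the most frequent character n-grams of text, most frequent first.

    max_n and top are illustrative defaults, not values taken from libtextcat.
    """
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries (assumption of this sketch)
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count; ties are broken alphabetically so the
    # ordering is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top]]


def out_of_place(doc_fp, cat_fp):
    """Sum, over the document's n-grams, of how far each one's rank is
    from its rank in the category fingerprint."""
    rank = {gram: i for i, gram in enumerate(cat_fp)}
    max_penalty = len(cat_fp)  # assumed penalty for an n-gram not in cat_fp
    return sum(
        abs(i - rank[gram]) if gram in rank else max_penalty
        for i, gram in enumerate(doc_fp)
    )


def classify(text, category_fps):
    """Return the name of the category whose fingerprint is closest to text."""
    doc_fp = fingerprint(text)
    return min(category_fps, key=lambda name: out_of_place(doc_fp, category_fps[name]))
```

To use the sketch, build one fingerprint per known category from sample text,
for example fps = {"english": fingerprint(english_text)}, and call
classify(unknown_text, fps). A lower out-of-place distance means a closer
match, and very small samples give unreliable results.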
- DOCS: on


Package               Version  Architecture  Repository
libtextcat-2.2_6.txz  2.2      i386          FreeBSD Ports Quarterly
libtextcat-2.2_6.txz  2.2      amd64         FreeBSD Ports Quarterly
libtextcat-2.2_6.txz  2.2      i386          FreeBSD Ports Latest




Type            URL
Binary Package  libtextcat-2.2_6.txz
Source Package  textproc/libtextcat

Install Howto

Install libtextcat txz package:

# pkg install libtextcat

See Also

Package Description
libtextstyle-0.20.1.txz Text styling library
libthai-0.1.28.txz Thai language support library
libtheora-1.1.1_7.txz Theora video codec for the Ogg multimedia streaming system
libthmap-g2019052401_1.txz Concurrent trie-hash map library
libticables2-1.3.5_1.txz TI calculator link cables library
libticalcs2-1.1.9.txz TI calculator library
libticonv-1.1.5.txz TI calculator character set library
libtifiles2-1.1.7.txz TI calculator file types library
libtmcg-1.3.18.txz C++ library for creating secure and fair online card games
libtnl-1.5.0_7.txz Robust, secure, easy to use cross-platform C++ networking API
libtomcrypt-1.18.2_1.txz Comprehensive, modular, and portable cryptographic toolkit
libtommath-1.1.0_2.txz Comprehensive, modular, and portable mathematical routines
libtool-2.4.6_1.txz Generic shared library support script
libtorrent-0.13.8.txz BitTorrent Library written in C++
libtorrent-rasterbar-1.1.10_5.txz C++ library implementing a BitTorrent client