AUTINDEX is a commercial text mining software package based on linguistic analysis. It grew out of research in information extraction and is a product of the Institute of Applied Information Sciences (IAI), a non-profit institute that has been researching and developing language technology since its foundation in 1985. IAI is affiliated with Saarland University in Saarbrücken, Germany. AUTINDEX is the result of a number of research projects funded by the EU (Project BINDEX), by the Deutsche Forschungsgemeinschaft and by the German Ministry for Economy. The latter include the projects LinSearch and WISSMER; see also the IAI website. The basic function of AUTINDEX is to extract key words from a document that represent the document's semantics. Ideally the system is integrated with a thesaurus that defines the standardised terms to be used for key word assignment. AUTINDEX is used in library applications (e.g. integrated in dandelon.com) as well as in high-quality expert information systems and in document management and content management environments. AUTINDEX comes with a number of additional software components: an integration with Apache Solr / Lucene that provides a complete information retrieval environment, a classification and categorisation system based on machine learning that assigns domains to a document, and a system for searching with semantically similar terms that are collected in so-called tag clouds.
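As a rough illustration of thesaurus-controlled key word assignment, the following Python sketch maps terms found in a document onto standardised descriptors and returns the most frequent ones. It is not AUTINDEX's algorithm: the tiny thesaurus, the function name `assign_keywords` and the plain token matching are all assumptions made for this example, whereas the real system relies on linguistic analysis.

```python
import re
from collections import Counter

# Hypothetical thesaurus: surface variant -> standardised descriptor.
THESAURUS = {
    "car": "automobile",
    "cars": "automobile",
    "automobile": "automobile",
    "engine": "engine",
    "engines": "engine",
    "motor": "engine",
}


def assign_keywords(text: str, top_n: int = 3) -> list[str]:
    """Return the most frequent thesaurus descriptors found in the text."""
    # Naive tokenisation; a linguistic system would lemmatise instead.
    tokens = re.findall(r"[a-zäöüß]+", text.lower())
    counts = Counter(THESAURUS[t] for t in tokens if t in THESAURUS)
    return [descriptor for descriptor, _ in counts.most_common(top_n)]


doc = "The car has a new motor. Older cars used smaller engines, but this car does not."
print(assign_keywords(doc))
```

Terms that are not in the thesaurus are simply ignored here, which mirrors the idea that only standardised terms are assigned as key words.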
Trazzler
Trazzler is a travel destination app that specializes in unique and local destinations. The initial concept was developed by Adam Rugel and Biz Stone in 2006 at Twitter's original offices under the name "71 miles". More than 10,000 writers and photographers have contributed, and more than $350,000 in freelance contracts have been issued as a result of Trazzler's weekly writing and photography contests. Investors in the company include SV Angel, AOL founder Steve Case, and the Twitter founders Evan Williams, Jack Dorsey, and Biz Stone. The company's partners are the City of Chicago, Hawaii Tourism Authority, Fairmont Hotels & Resorts, Salon.com, and Air New Zealand. Trazzler is designed for use on iOS, Android, and Facebook.
Inbenta
Inbenta is an AI company that originated in Barcelona, Spain, in 2005. Inbenta is currently headquartered in Allen, Texas, with additional offices in Spain; São Paulo, Brazil; Toulouse, France; and Tokyo, Japan. Inbenta provides natural language processing and semantic search through artificial intelligence. == History == Inbenta raised $12 million in its Series B funding round to extend the reach of its artificial intelligence for business solutions. In 2023 Inbenta's new chief executive officer, Melissa Solis, moved Inbenta's headquarters from Foster City, California, to One Bethany West in Allen, Texas. == Controversy == On 23 June 2018, Ticketmaster UK identified malicious software on a customer support product hosted by Inbenta Technologies, compromising personal data and payment details for thousands of Ticketmaster customers. Three days later, Inbenta's CEO issued a message about the incident to convey the full scope of the breach. In its FAQ section, Inbenta also stated that "After a careful analysis of all clues and snapshots from our systems, the technical team at Inbenta discovered that the script had been implemented on the payment page. We were unaware of this, and would have advised against doing so had we known, as it presents a point of vulnerability". On 13 November 2020, the Information Commissioner's Office (ICO) fined Ticketmaster UK Limited £1.25 million for failing to protect customers' payment details. According to the ICO, "It was because of Ticketmaster's business decision to include the [Inbenta] chat bot on its payment page that the chat bot was able to unlawfully process the personal data of customers."
Sense Networks
Sense Networks is a New York City-based company with a focus on applications that analyze big data from mobile phones, carrier networks, and taxicabs, particularly by using machine learning technology to make sense of large amounts of location (latitude/longitude) data. In 2009, Sense was named one of "The 25 Most Intriguing Startups in the World" by Bloomberg Businessweek and was called "The Next Google" on the cover of Newsweek. In 2014, Sense Networks was acquired by YP, "the local search and advertising company owned by Cerberus Capital Management and AT&T." It was subsequently sold to Verve in 2017. == History == Sense Networks was founded by Greg Skibiski in February 2006 (2003?) near his home in Northampton, Massachusetts. After establishing an office in NoHo, New York City, near Silicon Alley, Skibiski recruited Alex Pentland, Director of Human Dynamics Research and former Academic Head of the MIT Media Lab; Tony Jebara, Associate Professor and Head of the Machine Learning Laboratory at Columbia University; and Christine Lemke, who would later become co-founders. Sense Networks investors include Intel Capital, Javelin Venture Partners, and Kenan Altunis. Founder Greg Skibiski was pushed out by lead investor Intel Capital in November 2009 following the company's B round of financing. During the same week, the company won the Emerging Communications Conference "Company to Watch" Award. The company has three published patent applications for analyzing sensor data streams: System and Method of Performing Location Analytics (US 20090307263), Comparing Spatial-Temporal Trails in Location Analytics (US 20100079336), and Anomaly Detection in Sensor Analytics (US 20100082301). The company was acquired in 2014 by the Yellow Pages (YP), a marketing conglomerate under AT&T and Cerberus Capital Management.
== Products and services == The Citysense consumer application, which shows hotspots of human activity in real time from mobile phone location and taxicab GPS data, was named by ReadWriteWeb (in The New York Times) as one of the "Top 10 Internet of Things Products of 2009". The Cabsense consumer application, which shows the best place to catch a New York City taxicab based on GPS data from the vehicles, was launched in March 2010. The Macrosense platform lets mobile application providers and mobile phone carriers analyze billions of customer location data points for predictive analytics in advertising and churn management applications. == Privacy and data ownership == The company allows users to opt out of its service through its website, and users may monitor their profile through its application. The company does not collect identifiable data (such as phone numbers or names); it collects data received from cellphones to construct anonymous profiles of consumers. These anonymous profiles may then be sold to third parties. The company's privacy and data ownership policies are based on The New Deal on Data, as advocated by Alex "Sandy" Pentland, head of the Human Dynamics group at MIT.
Liveness test
A liveness test, liveness check or liveness detection is an automated method for determining whether a subject is a real person or part of a spoofing attack. The technique is used as part of know your customer (KYC) checks in financial services and during facial age estimation. Liveness detection is a cornerstone of digital safety. == Test process == The threat in face spoofing attacks is that "the attacker only needs to find a good face swap library on Github and understand how to inject the model into the camera feed during the KYC process". Fraudsters usually buy stolen IDs on the dark web to start a deepfake attack. An AI-powered generative adversarial network (GAN) can then generate the face-swapping model that many online verification services fail to detect. Low-level hackers may use face-swapping apps such as SwapFace, DeepFaceLive, and Swapstream (interest in these apps increased in 2023, according to Google Trends). In a video liveness test, users are typically asked to look into a camera and to move, smile or blink, and features of their moving face may then be compared with those of a still image. Artificial intelligence is used to counter presentation attacks such as deepfakes or users wearing hyperrealistic masks, or video injection attacks. Other forms of liveness test include checking for a pulse when using a fingerprint scanner, or checking during speaker recognition that a person's voice is not a recording or artificially generated. == Adoption and certification == A 2022 report published by the security firm Sensity demonstrated that the liveness tests of most US banks were easily defeated with new and publicly available AI-powered techniques. Many of these banks disregarded the results of the report. In the first half of 2023, the security firm iProov detected a 704% increase in face-swap attacks.
In 2023, in the UK, many Ryanair customers were upset at having to go through multiple ID verification checks, including liveness tests, before boarding, as the airline was using them as a means to deter customers from buying tickets through third-party websites. In the first half of 2024, iBeta Quality Assurance issued 18 new ISO/IEC 30107-3 Presentation Attack Detection certificates, raising the cumulative total to 85 since 2018. In January 2024, the Department of Homeland Security (DHS) opened applications for vendors to have their liveness tests evaluated. Identity fraud peaked during the COVID-19 lockdown, leading government agencies to take reinforced measures to secure their digital applications.
Trigram
Trigrams are a special case of the n-gram, where n is 3. They are often used in natural language processing for performing statistical analysis of texts and in cryptography for control and use of ciphers and codes. See results of analysis of "Letter Frequencies in the English Language". == Frequency == Context is very important: different rankings and percentages are easily derived by drawing on different sample sizes, different authors, different document types (poetry, science fiction, technical documentation), and different writing levels (stories for children versus adults, military orders, and recipes). Typical cryptanalytic frequency analysis finds that the 16 most common character-level trigrams in English are: Because encrypted messages sent by telegraph often omit punctuation and spaces, cryptographic frequency analysis of such messages includes trigrams that straddle word boundaries. This causes trigrams such as "edt" to occur frequently, even though they may never occur within any one word of those messages. == Examples == The sentence "the quick red fox jumps over the lazy brown dog" has the following word-level trigrams: "the quick red", "quick red fox", "red fox jumps", "fox jumps over", "jumps over the", "over the lazy", "the lazy brown", "lazy brown dog". The word-level trigram "the quick red" has the following character-level trigrams (where an underscore "_" marks a space): the, he_, e_q, _qu, qui, uic, ick, ck_, k_r, _re, red.
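The word-level and character-level examples above can be reproduced with a short Python sketch. The function names `ngrams`, `word_trigrams` and `char_trigrams` are chosen for this illustration only; splitting on whitespace and replacing spaces with underscores follow the conventions of the examples rather than any standard.

```python
def ngrams(items, n=3):
    """Return every run of n consecutive items as a tuple."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


def word_trigrams(sentence):
    """Word-level trigrams: the sentence is split on whitespace."""
    return [" ".join(gram) for gram in ngrams(sentence.split())]


def char_trigrams(text):
    """Character-level trigrams, with spaces shown as underscores."""
    marked = text.replace(" ", "_")
    return ["".join(gram) for gram in ngrams(marked)]


sentence = "the quick red fox jumps over the lazy brown dog"
print(word_trigrams(sentence))       # 8 trigrams, from "the quick red" to "lazy brown dog"
print(char_trigrams("the quick red"))  # 11 trigrams, from "the" to "red"
```

A sequence of k items yields k - 2 trigrams, so the ten-word sentence gives eight word-level trigrams and the 13-character string gives eleven character-level ones.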
GermaNet
GermaNet is a semantic network for the German language. It relates nouns, verbs, and adjectives semantically by grouping lexical units that express the same concept into synsets and by defining semantic relations between these synsets. GermaNet is free for academic use, after signing a license. GermaNet shares much in common with the English WordNet and can be viewed as an online thesaurus or a light-weight ontology. GermaNet has been developed and maintained at the University of Tübingen since 1997 within the research group for General and Computational Linguistics. It has been integrated into EuroWordNet, a multilingual lexical-semantic database. == Database == === Contents === GermaNet partitions the lexical space into a set of concepts that are interlinked by semantic relations. A semantic concept is modeled by a synset. A synset is a set of words (called lexical units) where all the words are taken to have the same or almost the same meaning. Thus, a synset is a set of synonyms grouped under one definition, or "gloss". In addition to the gloss, synsets are labeled with their syntactic function and accompanied by example sentences for each distinct meaning in the synset. Just as in WordNet, for each word category the semantic space is divided into a number of semantic fields closely related to major nodes in the semantic network: Ort, or "location", Körper, or "body", etc. As of version 20.0 (released November 2025), GermaNet contains: synsets: 179438; lexical units: 231500; literals: 216517; lexical units per synset: 1.29; conceptual relations: 194367; lexical relations: 13602 (synonymy excluded); split compounds: 130901; Interlingual Index (ILI) records: 28561; Wiktionary sense descriptions: 29539. === Format === All GermaNet data is stored in a PostgreSQL relational database.
The database schema follows the internal structure of GermaNet: there are tables to store synsets, lexical units, conceptual and lexical relations, etc. GermaNet data is distributed both in this database format and as XML files. In the XML data, two types of files, one for synsets and the other for relations, represent all data available in the GermaNet database. == Interfaces == There are software libraries and APIs available for Java and Python. These programs are distributed under free-software licenses and provide easy access to all information in various versions of GermaNet. GermaNet Rover is an on-line application that can be used to search for synsets in GermaNet, explore the data associated with them, and calculate the semantic similarity of pairs of synsets. It features visualizations of the hypernym relation and advanced filtering options for synset searching. == Licenses == GermaNet 20.0 (released November 2025) can be distributed under one of the following types of license agreements: Academic Research License Agreement: for the purpose of research at academic institutions. There is no license fee for academic use. Licenses are not given to individual students, and those seeking a license are required to talk to an academic advisor. Research and Development License Agreement: applies to non-academic institutions and research consortia. To be used strictly for technology development and internal research. Commercial License Agreement: applies to non-academic institutions and commercial enterprises. It permits technology development and internal research, as well as giving the non-exclusive right to distribute and market any derived product or service. == Alternatives == Open-de-WordNet is a freely available alternative to GermaNet which is compatible with WordNet. 
== Linguistic applications == GermaNet has been used for a variety of applications, including semantic analysis, shallow recognition of implicit document structure, compound analysis, analysis of selectional preferences, and word sense disambiguation.
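The synset model described under Contents (lexical units grouped under a gloss and linked by conceptual relations such as hypernymy) can be sketched in Python as follows. This is a minimal illustration and not the GermaNet API or database schema: the `Synset` class, its field names, the identifiers and the three sample entries are all invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """Hypothetical stand-in for a synset record."""
    synset_id: str
    category: str             # word category, e.g. "noun"
    gloss: str                # shared definition
    lexical_units: list[str]  # words taken to have (almost) the same meaning
    hypernyms: list[str] = field(default_factory=list)  # ids of more general synsets


# Invented sample data, not taken from GermaNet.
SYNSETS = {
    s.synset_id: s
    for s in [
        Synset("s1", "noun", "something that can be eaten", ["Lebensmittel", "Nahrungsmittel"]),
        Synset("s2", "noun", "edible fruit", ["Frucht"], hypernyms=["s1"]),
        Synset("s3", "noun", "orange (the citrus fruit)", ["Apfelsine", "Orange"], hypernyms=["s2"]),
    ]
}


def hypernym_path(synset_id):
    """Follow the first hypernym link upwards until a root synset is reached."""
    path = [synset_id]
    while SYNSETS[path[-1]].hypernyms:
        path.append(SYNSETS[path[-1]].hypernyms[0])
    return path


def units_per_synset(synsets):
    """Average number of lexical units per synset."""
    return sum(len(s.lexical_units) for s in synsets) / len(synsets)


print(hypernym_path("s3"))
print(units_per_synset(list(SYNSETS.values())))
# The ratio reported for release 20.0 follows from the published counts:
print(round(231500 / 179438, 2))
```

The last line shows how the "lexical units per synset" figure relates to the synset and lexical unit counts listed for version 20.0.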