AI Face Upscale

AI Face Upscale — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • IWork

    IWork

    iWork is an office suite of applications created by Apple for its macOS, iPadOS, and iOS operating systems, and also available cross-platform through the iCloud website. iWork includes the presentation application Keynote, the word-processing and desktop-publishing application Pages, and the spreadsheet application Numbers. Apple's design goals in creating iWork have been to allow Mac users to easily create attractive documents and spreadsheets, making use of macOS's extensive font library, integrated spelling checker, sophisticated graphics APIs and its AppleScript automation framework. The equivalent Microsoft Office applications to Pages, Numbers, and Keynote are Word, Excel, and PowerPoint, respectively. Although Microsoft Office applications cannot open iWork documents, iWork applications can open Office documents for editing, and export documents from iWork's native formats (.pages, .numbers, .key) to Microsoft Office formats (.docx, .xlsx, .pptx, etc.) as well as to PDF files. The oldest application in iWork is Keynote, first released as a standalone application in 2003 for use by Steve Jobs in his presentations. Steve Jobs announced Keynote saying "It's for when your presentation really matters". Pages was released with the first iWork bundle in 2004; Numbers was added in 2007 with the release of iWork '08. The next release, iWork '09, also included beta access to iWork.com, an online service that allowed users to upload and share documents on the web, now integrated into Apple's iCloud service. A version of iWork for iOS was released in 2010 with the first iPad, and the apps have been regularly updated since, including the addition of iPhone support. In 2013, Apple launched iWork web apps in iCloud; even years later, however, their functionality is somewhat limited compared to equivalents on the desktop. iWork was initially sold as a suite for $79, then later at $19.99 per app on OS X and $9.99 per app on iOS. Apple announced in October 2013 that all iOS and OS X devices purchased onwards, whether new or refurbished, would be eligible for a free download of all three iWork apps: after device setup, the user can "claim" the apps on the App Store, after which they are permanently linked to the user’s Apple ID. iWork for iCloud, which also incorporates a document hosting service, is free to all iCloud users. iWork was released for free on macOS and iOS (including older or resold devices) in April 2017. In September 2016, Apple announced that the real-time collaboration feature would be available for all iWork apps. == History == The first version of iWork, iWork '05, was announced on January 11, 2005 at the Macworld Conference & Expo and made available on January 22 in the United States and on January 29 worldwide. iWork '05 comprised two applications: Keynote 2, a presentation creation program, and Pages, a word processor. iWork '05 was sold for US$79. A 30-day trial was also made available for download on Apple's website. Originally IGG Software held the rights to the name iWork. While iWork was billed by Apple as "a successor to AppleWorks", it does not replicate AppleWorks's database and drawing tools. However, iWork integrates with existing applications from Apple's iLife suite through the Media Browser, which allows users to drag and drop music from iTunes, movies from iMovie, and photos from iPhoto and Aperture directly into iWork documents. iWork '06 was released on January 10, 2006 and contained updated versions of both Keynote and Pages. Both programs were released as universal binaries for the first time, allowing them to run natively on both PowerPC processors and the Intel processors used in the new iMac desktop computers and MacBook Pro notebooks which had been announced on the same day as the new iWork suite. The next version of the suite, iWork '08, was announced and released on August 7, 2007 at a special media event at Apple's campus in Cupertino, California. iWork '08, like previous updates, contained updated versions of Keynote and Pages. A new spreadsheet application, Numbers, was also introduced. Numbers differed from other spreadsheet applications, including Microsoft Excel, in that it allowed users to create documents containing multiple spreadsheets on a flexible canvas using a number of built-in templates. iWork '09, was announced on January 6, 2009 and released the same day. It contains updated versions of all three applications in the suite. iWork '09 also included access to a beta version of the iWork.com service, which allowed users to share documents online until that service was decommissioned at the end of July 2012. Users of iWork '09 could upload a document directly from Pages, Keynote, or Numbers and invite others to view it online. Viewers could write notes and comments in the document, and download a copy in iWork, Microsoft Office, or PDF formats. iWork '09 was also released with the Mac App Store on January 6, 2011 at $19.99 per application, and received regular updates after this point, including links to iCloud and a high-DPI version designed to match Apple's MacBook Pro with Retina Display. On January 27, 2010, Apple announced iWork for iPad, to be available as three separate $9.99 applications from the App Store. This version has also received regular updates including a version for pocket iPhone and iPod Touch devices, and an update to take advantage of Retina Display devices and the larger screens of recent iPhones. On October 22, 2013, Apple announced an overhaul of the iWork software for both the Mac and iOS. Both suites were made available via the respective App Stores. The update is free for current iWork owners and was also made available free of charge for anyone purchasing an OS X or iOS device after October 1, 2013. Any user activating the newly free iWork apps on a qualifying device can download the same apps on another iOS or OS X device logged into the same App Store account. The new OS X versions have been criticized for losing features such as multiple selection, linked text boxes, bookmarks, 2-up page views, mail merge, searchable comments, ability to read/export RTF files, default zoom and page count, integration with AppleScript. Apple has provided a road-map for feature re-introduction, stating that it hopes to reintroduce some missing features within the next six months. As of April 1, 2014 a few features—e.g., the ability to set the default zoom—had been reintroduced, though scores had not. Due to using a completely new file format that can work across macOS, Windows, and in most web browsers by using the online iCloud web apps, versions of iWork beginning with iWork 13 and later do not open or allow editing of documents created in versions prior to iWork '09, with users who attempt to open older iWork files being given a pop-up in the new iWork 13 app versions telling them to use the previous iWork '09 (which users may or may not have on their machine) in order to open and edit such files. Accordingly, the current version for OS X (which was initially only compatible with OS X Mavericks 10.9 onwards) moves any previously installed iWork '09 apps to an iWork '09 folder on the users machine (in /Applications/iWork '09/), as a work-around to allow users continued use of the earlier suite in order to open and edit older iWork documents locally on their machine. In October 2015, Apple released an update to mitigate this issue, allowing users to open documents saved in iWork '06 and iWork '08 formats in the latest version of Pages. In 2016, Apple announced that the real-time collaboration feature would be available for all iWork apps, instead of being constrained to using iWork for iCloud. The feature is comparable to Google Docs. == Versions == === Major releases === === Updates === iWork '09 received several updates: iWork 9.0.3 DVD (for Mac OS X 10.5.6 "Leopard" or newer; released August 26, 2010) iWork 9.0.4 (for Mac OS X 10.5.6 "Leopard" or newer; released August 26, 2010) iWork 9.1 (for Mac OS X 10.6.6 "Snow Leopard" or newer; released July 20, 2011) iWork 9.3 (for Mac OS X 10.7.4 "Lion" or newer; released December 4, 2012) The Mac App Store version of iWork was updated on October 15, 2015 for 10.10 "Yosemite" or newer. It is the final release to support 10.10 "Yosemite" and 10.11 "El Capitan". Keynote 6.6, Pages 5.6 and Numbers 3.6 are included. iWork received a major update again on March 28, 2019 with Keynote 9.0, Pages 8.0 and Numbers 6.0. == Components == === Common components === Products in the iWork suite share a number of components, largely as a result of sharing underlying code from the Cocoa and similar shared application programming interfaces (APIs). Among these are the well known universal multilingual spell checker, which can also be found in products like Safari and Mail. Grammar checking, find and replace, style and color pickers are similar examples of design features found throughout the Apple application space. Moreover, the applications

    Read more →
  • Machine translation software usability

    Machine translation software usability

    The sections below give objective criteria for evaluating the usability of machine translation software output. == Stationarity or canonical form == Do repeated translations converge on a single expression in both languages? I.e. does the translation method show stationarity or produce a canonical form? Does the translation become stationary without losing the original meaning? This metric has been criticized as not being well correlated with BLEU (BiLingual Evaluation Understudy) scores. == Adaptive to colloquialism, argot or slang == Is the system adaptive to colloquialism, argot or slang? The French language has many rules for creating words in the speech and writing of popular culture. Two such rules are: (a) The reverse spelling of words such as femme to meuf. (This is called verlan.) (b) The attachment of the suffix -ard to a noun or verb to form a proper noun. For example, the noun faluche means "student hat". The word faluchard formed from faluche colloquially can mean, depending on context, "a group of students", "a gathering of students" and "behavior typical of a student". The Google translator as of 28 December 2006 doesn't derive the constructed words as for example from rule (b), as shown here: Il y a une chorale falucharde mercredi, venez nombreux, les faluchards chantent des paillardes! ==> There is a choral society falucharde Wednesday, come many, the faluchards sing loose-living women! French argot has three levels of usage: familier or friendly, acceptable among friends, family and peers but not at work grossier or swear words, acceptable among friends and peers but not at work or in family verlan or ghetto slang, acceptable among lower classes but not among middle or upper classes The United States National Institute of Standards and Technology conducts annual evaluations [1] Archived 2009-03-22 at the Wayback Machine of machine translation systems based on the BLEU-4 criterion [2]. A combined method called IQmt which incorporates BLEU and additional metrics NIST, GTM, ROUGE and METEOR has been implemented by Gimenez and Amigo [3]. == Well-formed output == Is the output grammatical or well-formed in the target language? Using an interlingua should be helpful in this regard, because with a fixed interlingua one should be able to write a grammatical mapping to the target language from the interlingua. Consider the following Arabic language input and English language translation result from the Google translator as of 27 December 2006 [4]. This Google translator output doesn't parse using a reasonable English grammar: وعن حوادث التدافع عند شعيرة رمي الجمرات -التي كثيرا ما يسقط فيها العديد من الضحايا- أشار الأمير نايف إلى إدخال "تحسينات كثيرة في جسر الجمرات ستمنع بإذن الله حدوث أي تزاحم". ==> And incidents at the push Carbuncles-throwing ritual, which often fall where many of the victims - Prince Nayef pointed to the introduction of "many improvements in bridge Carbuncles God would stop the occurrence of any competing." == Semantics preservation == Do repeated re-translations preserve the semantics of the original sentence? For example, consider the following English input passed multiple times into and out of French using the Google translator as of 27 December 2006: Better a day earlier than a day late. ==> Améliorer un jour plus tôt qu'un jour tard. ==> To improve one day earlier than a day late. ==> Pour améliorer un jour plus tôt qu'un jour tard. ==> To improve one day earlier than a day late. As noted above and in, this kind of round-trip translation is a very unreliable method of evaluation. == Trustworthiness and security == An interesting peculiarity of Google Translate as of 24 January 2008 (corrected as of 25 January 2008) is the following result when translating from English to Spanish, which shows an embedded joke in the English-Spanish dictionary which has some added poignancy given recent events: Heath Ledger is dead ==> Tom Cruise está muerto This raises the issue of trustworthiness when relying on a machine translation system embedded in a Life-critical system in which the translation system has input to a Safety Critical Decision Making process. Conjointly it raises the issue of whether in a given use the software of the machine translation system is safe from hackers. It is not known whether this feature of Google Translate was the result of a joke/hack or perhaps an unintended consequence of the use of a method such as statistical machine translation. Reporters from CNET Networks asked Google for an explanation on January 24, 2008; Google said only that it was an "internal issue with Google Translate". The mistranslation was the subject of much hilarity and speculation on the Internet. If it is an unintended consequence of the use of a method such as statistical machine translation, and not a joke/hack, then this event is a demonstration of a potential source of critical unreliability in the statistical machine translation method. In human translations, in particular on the part of interpreters, selectivity on the part of the translator in performing a translation is often commented on when one of the two parties being served by the interpreter knows both languages. This leads to the issue of whether a particular translation could be considered verifiable. In this case, a converging round-trip translation would be a kind of verification.

    Read more →
  • Language resource

    Language resource

    In linguistics and language technology, a language resource is a "[composition] of linguistic material used in the construction, improvement and/or evaluation of language processing applications, (...) in language and language-mediated research studies and applications." According to Bird & Simons (2003), this includes data, i.e. "any information that documents or describes a language, such as a published monograph, a computer data file, or even a shoebox full of handwritten index cards. The information could range in content from unanalyzed sound recordings to fully transcribed and annotated texts to a complete descriptive grammar", tools, i.e., "computational resources that facilitate creating, viewing, querying, or otherwise using language data", and advice, i.e., "any information about what data sources are reliable, what tools are appropriate in a given situation, what practices to follow when creating new data". The latter aspect is usually referred to as "best practices" or "(community) standards". In a narrower sense, language resource is specifically applied to resources that are available in digital form, and then, "encompassing (a) data sets (textual, multimodal/multimedia and lexical data, grammars, language models, etc.) in machine readable form, and (b) tools/technologies/services used for their processing and management". == Typology == As of May 2020, no widely used standard typology of language resources has been established (current proposals include the LREMap, METASHARE, and, for data, the LLOD classification). Important classes of language resources include data lexical resources, e.g., machine-readable dictionaries, linguistic corpora, i.e., digital collections of natural language data, linguistic data bases such as the Cross-Linguistic Linked Data collection, tools linguistic annotations and tools for creating such annotations in a manual or semiautomated fashion (e.g., tools for annotating interlinear glossed text such as Toolbox and FLEx, or other language documentation tools), applications for search and retrieval over such data (corpus management systems), for automated annotation (part-of-speech tagging, syntactic parsing, semantic parsing, etc.), metadata and vocabularies vocabularies, repositories of linguistic terminology and language metadata, e.g., MetaShare (for language resource metadata), the ISO 12620 data category registry (for linguistic features, data structures and annotations within a language resource), or the Glottolog database (identifiers for language varieties and bibliographical database). == Language resource publication, dissemination and creation == A major concern of the language resource community has been to develop infrastructures and platforms to present, discuss and disseminate language resources. Selected contributions in this regard include: a series of International Conferences on Language Resources and Evaluation (LREC), the European Language Resources Association (ELRA, EU-based), and the Linguistic Data Consortium (LDC, US-based), which represent commercial hosting and dissemination platforms for language resources, the Open Languages Archives Community (OLAC), which provides and aggregates language resource metadata, the Language Resources and Evaluation Journal (LREJ), the European Language Grid is a European platform for language technologies (eg services), data and resources. As for the development of standards and best practices for language resources, these are subject of several community groups and standardization efforts, including ISO Technical Committee 37: Terminology and other language and content resources (ISO/TC 37), developing standards for all aspects of language resources, W3C Community Group Best Practices for Multilingual Linked Open Data (BPMLOD), working on best practice recommendations for publishing language resources as Linked Data or in RDF, W3C Community Group Linked Data for Language Technology (LD4LT), working on linguistic annotations on the web and language resource metadata, W3C Community Group Ontology-Lexica (OntoLex), working on lexical resources, the Open Linguistics working group of the Open Knowledge Foundation, working on conventions for publishing and linking open language resources, developing the Linguistic Linked Open Data cloud, the Text Encoding Initiative (TEI), working on XML-based specifications for language resources and digitally edited text.

    Read more →
  • Deep Learning Indaba

    Deep Learning Indaba

    The Deep Learning Indaba is an annual conference and educational event that aims to strengthen machine learning and artificial intelligence (AI) capacity across Africa. Launched in 2017, it brings together students, researchers, industry practitioners, and policymakers from across the African continent. == History == The Deep Learning Indaba began in 2017 at the University of the Witwatersrand with over 300 participants from 23 African countries, offering tutorials in advanced AI topics and featuring notable speakers like Nando de Freitas. In 2018, it expanded to 650 delegates at Stellenbosch University, introducing parallel sessions to encourage collaboration. The 2019 edition in Nairobi, Kenya, reflected further growth, with increasing sponsorship and support from major tech companies like Google and Microsoft. === Deep Learning IndabaX ===

    Read more →
  • Couchbase Server

    Couchbase Server

    Couchbase Server, originally known as Membase, is a source-available, distributed (shared-nothing architecture) multi-model NoSQL document-oriented database software package optimized for interactive applications. These applications may serve many concurrent users by creating, storing, retrieving, aggregating, manipulating and presenting data. In support of these kinds of application needs, Couchbase Server is designed to provide easy-to-scale key-value, or JSON document access, with low latency and high sustainability throughput. It is designed to be clustered from a single machine to very large-scale deployments spanning many machines. Couchbase Server provided client protocol compatibility with memcached, but added disk persistence, data replication, live cluster reconfiguration, rebalancing and multitenancy with data partitioning. == Product history == Membase was developed by several leaders of the memcached project, who had founded a company, NorthScale, to develop a key-value store with the simplicity, speed, and scalability of memcached, but also the storage, persistence and querying capabilities of a database. The original membase source code was contributed by NorthScale, and project co-sponsors Zynga and Naver Corporation (then known as NHN) to a new project on membase.org in June 2010. On February 8, 2011, the Membase project founders and Membase, Inc. announced a merger with CouchOne (a company with many of the principal players behind CouchDB) with an associated project merger. The merged company was called Couchbase, Inc. In January 2012, Couchbase released Couchbase Server 1.8. In September of 2012, Orbitz said it had changed some of its systems to use Couchbase. In December of 2012, Couchbase Server 2.0 (announced in July 2011) was released and included a new JSON document store, indexing and querying, incremental MapReduce and replication across data centers. == Architecture == Every Couchbase node consists of a data service, index service, query service, and cluster manager component. Starting with the 4.0 release, the three services can be distributed to run on separate nodes of the cluster if needed. In the parlance of Eric Brewer's CAP theorem, Couchbase is normally a CP type system meaning it provides consistency and partition tolerance, or it can be set up as an AP system with multiple clusters. === Cluster manager === The cluster manager supervises the configuration and behavior of all the servers in a Couchbase cluster. It configures and supervises inter-node behavior like managing replication streams and re-balancing operations. It also provides metric aggregation and consensus functions for the cluster, and a RESTful cluster management interface. The cluster manager uses the Erlang programming language and the Open Telecom Platform. ==== Replication and fail-over ==== Data replication within the nodes of a cluster can be controlled with several parameters. In December of 2012, support was added for replication between different data centers. === Data manager === The data manager stores and retrieves documents in response to data operations from applications. It asynchronously writes data to disk after acknowledging to the client. In version 1.7 and later, applications can optionally ensure data is written to more than one server or to disk before acknowledging a write to the client. Parameters define item ages that affect when data is persisted, and how max memory and migration from main-memory to disk is handled. It supports working sets greater than a memory quota per "node" or "bucket". External systems can subscribe to filtered data streams, supporting, for example, full text search indexing, data analytics or archiving. ==== Data format ==== A document is the most basic unit of data manipulation in Couchbase Server. Documents are stored in JSON document format with no predefined schemas. Non-JSON documents can also be stored in Couchbase Server (binary, serialized values, XML, etc.) ==== Object-managed cache ==== Couchbase Server includes a built-in multi-threaded object-managed cache that implements memcached compatible APIs such as get, set, delete, append, prepend etc. ==== Storage engine ==== Couchbase Server has a tail-append storage design that is immune to data corruption, OOM killers or sudden loss of power. Data is written to the data file in an append-only manner, which enables Couchbase to do mostly sequential writes for update, and provide an optimized access patterns for disk I/O. === Performance === A performance benchmark done by Altoros in 2012, compared Couchbase Server with other technologies. Cisco Systems published a benchmark that measured the latency and throughput of Couchbase Server with a mixed workload in 2012. == Licensing and support == Couchbase Server is a packaged version of Couchbase's open source software technology and is available in a community edition without recent bug fixes with an Apache 2.0 license and an edition for commercial use. Couchbase Server builds are available for Ubuntu, Debian, Red Hat, SUSE, Oracle Linux, Microsoft Windows and macOS operating systems. Couchbase has supported software developers' kits for the programming languages .NET, PHP, Ruby, Python, C, Node.js, Java, Go, and Scala. == SQL++ == A query language called SQL++ (formerly called N1QL), is used for manipulating the JSON data in Couchbase, just like SQL manipulates data in RDBMS. It has SELECT, INSERT, UPDATE, DELETE, MERGE statements to operate on JSON data. It was initially announced in March 2015 as "SQL for documents". The SQL++ data model is non-first normal form (N1NF) with support for nested attributes and domain-oriented normalization. The SQL++ data model is also a proper superset and generalization of the relational model. === Example === Like query SELECT FROM `bucket` WHERE email LIKE "%@example.org"; Array query SELECT FROM `bucket` WHERE ANY x IN friends SATISFIES x.name = "Pavan" END; == Couchbase Mobile == Couchbase Mobile / Couchbase Lite is a mobile database providing data replication. Couchbase Lite (originally TouchDB) provides native libraries for offline-first NoSQL databases with built-in peer-to-peer or client-server replication mechanisms. Sync Gateway manages secure access and synchronization of data between Couchbase Lite and Couchbase Server. Couchbase Lite added support for Vector Search in version 3.2, allowing cloud to edge support for vector search in mobile applications. == Uses == Couchbase began as an evolution of Memcached, a high-speed data cache, and can be used as a drop-in replacement for Memcached, providing high availability for memcached application without code changes. Couchbase is used to support applications where a flexible data model, easy scalability, and consistent high performance are required, such as tracking real-time user activity or providing a store of user preferences or online applications. Couchbase Mobile, which stores data locally on devices (usually mobile devices) is used to create “offline-first” applications that can operate when a device is not connected to a network and synchronize with Couchbase Server once a network connection is re-established. The Catalyst Lab at Northwestern University uses Couchbase Mobile to support the Evo application, a healthy lifestyle research program where data is used to help participants improve dietary quality, physical activity, stress, or sleep. Amadeus uses Couchbase with Apache Kafka to support their “open, simple, and agile” strategy to consume and integrate data on loyalty programs for airline and other travel partners. High scalability is needed when disruptive travel events create a need to recognize and compensate high value customers. Starting in 2012, it played a role in LinkedIn's caching systems, including backend caching for recruiter and jobs products, counters for security defense mechanisms, for internal applications. == Alternatives == For caching, Couchbase competes with Memcached and Redis. For document databases, Couchbase competes with other document-oriented database systems. It is commonly compared with MongoDB, Amazon DynamoDB, Oracle RDBMS, DataStax, Google Bigtable, MariaDB, IBM Cloudant, Redis Enterprise, SingleStore, and MarkLogic.

    Read more →
  • Uncertain data

    Uncertain data

    In computer science, uncertain data is data that contains noise that makes it deviate from the correct, intended or original values. In the age of big data, uncertainty or data veracity is one of the defining characteristics of data. Data is constantly growing in volume, variety, velocity and uncertainty (1/veracity). Uncertain data is found in abundance today on the web, in sensor networks, within enterprises both in their structured and unstructured sources. For example, there may be uncertainty regarding the address of a customer in an enterprise dataset, or the temperature readings captured by a sensor due to aging of the sensor. In 2012 IBM called out managing uncertain data at scale in its global technology outlook report that presents a comprehensive analysis looking three to ten years into the future seeking to identify significant, disruptive technologies that will change the world. In order to make confident business decisions based on real-world data, analyses must necessarily account for many different kinds of uncertainty present in very large amounts of data. Analyses based on uncertain data will have an effect on the quality of subsequent decisions, so the degree and types of inaccuracies in this uncertain data cannot be ignored. Uncertain data is found in the area of sensor networks; text where noisy text is found in abundance on social media, web and within enterprises where the structured and unstructured data may be old, outdated, or plain incorrect; in modeling where the mathematical model may only be an approximation of the actual process. When representing such data in a database, an appropriate uncertain database model needs to be selected. == Example data model for uncertain data == One way to represent uncertain data is through probability distributions. Let us take the example of a relational database. There are three main ways to do represent uncertainty as probability distributions in such a database model. In attribute uncertainty, each uncertain attribute in a tuple is subject to its own independent probability distribution. For example, if readings are taken of temperature and wind speed, each would be described by its own probability distribution, as knowing the reading for one measurement would not provide any information about the other. In correlated uncertainty, multiple attributes may be described by a joint probability distribution. For example, if readings are taken of the position of an object, and the x- and y-coordinates stored, the probability of different values may depend on the distance from the recorded coordinates. As distance depends on both coordinates, it may be appropriate to use a joint distribution for these coordinates, as they are not independent. In tuple uncertainty, all the attributes of a tuple are subject to a joint probability distribution. This covers the case of correlated uncertainty, but also includes the case where there is a probability of a tuple not belonging in the relevant relation, which is indicated by all the probabilities not summing to one. For example, assume we have the following tuple from a probabilistic database: Then, the tuple has 10% chance of not existing in the database.

    Read more →
  • PropBank

    PropBank

    PropBank is a corpus that is annotated with verbal propositions and their arguments—a "proposition bank". Although "PropBank" refers to a specific corpus produced by Martha Palmer et al., the term propbank is also coming to be used as a common noun referring to any corpus that has been annotated with propositions and their arguments. The PropBank project has played a role in research in natural language processing, and has been used in semantic role labelling. == Comparison == PropBank differs from FrameNet, the resource to which it is most frequently compared, in several ways. PropBank is a verb-oriented resource, while FrameNet is centered on the more abstract notion of frames, which generalizes descriptions across similar verbs (e.g. "describe" and "characterize") as well as nouns and other words (e.g. "description"). PropBank does not annotate events or states of affairs described using nouns. PropBank commits to annotating all verbs in a corpus, whereas the FrameNet project chooses sets of example sentences from a large corpus and only in a few cases has annotated longer continuous stretches of text. PropBank-style annotations often remain close to the syntactic level, while FrameNet-style annotations are sometimes more semantically motivated. From the start, PropBank was developed with the idea of serving as training data for machine learning-based semantic role labeling systems in mind. It requires that all arguments to a verb be syntactic constituents and different senses of a word are only distinguished if the differences bear on the arguments. Due to such differences, semantic role labeling with respect to PropBank is often a somewhat easier task than producing FrameNet-style annotations.

    Read more →
  • Visual descriptor

    Visual descriptor

    In computer vision, visual descriptors or image descriptors are descriptions of the visual features of the contents in images, videos, or algorithms or applications that produce such descriptions. They describe elementary characteristics such as the shape, the color, the texture or the motion, among others. == Introduction == As a result of the new communication technologies and the massive use of Internet in our society, the amount of audio-visual information available in digital format is increasing considerably. Therefore, it has been necessary to design some systems that allow us to describe the content of several types of multimedia information in order to search and classify them. The audio-visual descriptors are in charge of the contents description. These descriptors have a good knowledge of the objects and events found in a video, image or audio and they allow the quick and efficient searches of the audio-visual content. This system can be compared to the search engines for textual contents. Although it is relatively easy to find text with a computer, it is much more difficult to find concrete audio and video parts. For instance, imagine somebody searching a scene of a happy person. The happiness is a feeling and it is not evident its shape, color and texture description in images. The description of the audio-visual content is not a superficial task and it is essential for the effective use of this type of archives. The standardization system that deals with audio-visual descriptors is the MPEG-7 (Motion Picture Expert Group - 7). == Types == Descriptors are the first step to find out the connection between pixels contained in a digital image and what humans recall after having observed an image or a group of images after some minutes. Visual descriptors are divided in two main groups: General information descriptors: contain low level descriptors which give a description about color, shape, regions, textures and motion. Specific domain information descriptors: give information about objects and events in the scene. A concrete example would be face recognition. === General information descriptors === General information descriptors consist of a set of descriptors that covers different basic and elementary features like: color, texture, shape, motion, location and others. This description is automatically generated by means of signal processing. ==== Color ==== It's the most basic quality of visual content. Five tools are defined to describe color. The three first tools represent the color distribution and the last ones describe the color relation between sequences or group of images: Dominant color descriptor (DCD) Scalable color descriptor (SCD) Color structure descriptor (CSD) Color layout descriptor (CLD) Group of frame (GoF) or group-of-pictures (GoP) ==== Texture ==== It's an important quality in order to describe an image. The texture descriptors characterize image textures or regions. They observe the region homogeneity and the histograms of these region borders. The set of descriptors is formed by: Homogeneous texture descriptor (HTD) Texture browsing descriptor (TBD) Edge histogram descriptor (EHD) ==== Shape ==== It contains important semantic information due to human's ability to recognize objects through their shape. However, this information can only be extracted by means of a segmentation similar to the one that the human visual system implements. Nowadays, such a segmentation system is not available yet, however there exists a serial of algorithms which are considered to be a good approximation. These descriptors describe regions, contours and shapes for 2D images and for 3D volumes. The shape descriptors are the following ones: Region-based shape descriptor (RSD) Contour-based shape descriptor (CSD) 3-D shape descriptor (3-D SD) ==== Motion ==== It's defined by four different descriptors which describe motion in video sequence. Motion is related to the objects motion in the sequence and to the camera motion. This last information is provided by the capture device, whereas the rest is implemented by means of image processing. The descriptor set is the following one: Motion activity descriptor (MAD) Camera motion descriptor (CMD) Motion trajectory descriptor (MTD) Warping and parametric motion descriptor (WMD and PMD) ==== Location ==== Elements location in the image is used to describe elements in the spatial domain. In addition, elements can also be located in the temporal domain: Region locator descriptor (RLD) Spatio temporal locator descriptor (STLD) === Specific domain information descriptors === These descriptors, which give information about objects and events in the scene, are not easily extractable, even more when the extraction is to be automatically done. Nevertheless, they can be manually processed. As mentioned before, face recognition is a concrete example of an application that tries to automatically obtain this information. == Descriptors applications == Among all applications, the most important ones are: Multimedia documents search engines and classifiers. Digital library: visual descriptors allow a very detailed and concrete search of any video or image by means of different search parameters. For instance, the search of films where a known actor appears, the search of videos containing the Everest mountain, etc. Personalized electronic news service. Possibility of an automatic connection to a TV channel broadcasting a soccer match, for example, whenever a player approaches the goal area. Control and filtering of concrete audiovisual content, like violent or pornographic material. Also, authorization for some multimedia content.

    Read more →
  • AsoSoft text corpus

    AsoSoft text corpus

    The AsoSoft text corpus is the first large-scale Kurdish text corpus, collected and processed by the AsoSoft research and development group. It contains 458,000 documents (188 million tokens) that are collected from sources such as websites, news agencies, books, and magazines. The corpus is partially tagged by topic, so it can be used for topic identification tasks. Also, it is applicable for extracting language model and computational lexicon information. Part of the corpus (75 million tokens) is available online for non-commercial use. The corpus uses the TEI format.

    Read more →
  • Natural Language Toolkit

    Natural Language Toolkit

    The Natural Language Toolkit, or more commonly NLTK, is a suite of libraries and programs for symbolic and statistical natural language processing (NLP) for English written in the Python programming language. It supports classification, tokenization, stemming, tagging, parsing, and semantic reasoning functionalities. It was developed by Steven Bird and Edward Loper in the Department of Computer and Information Science at the University of Pennsylvania. NLTK includes graphical demonstrations and sample data. It is accompanied by a book that explains the underlying concepts behind the language processing tasks supported by the toolkit, plus a cookbook. NLTK is intended to support research and teaching in NLP or closely related areas, including empirical linguistics, cognitive science, artificial intelligence, information retrieval, and machine learning. NLTK has been used successfully as a teaching tool, as an individual study tool, and as a platform for prototyping and building research systems. == Library highlights == Discourse representation Lexical analysis: Word and text tokenizer n-gram and collocations Part-of-speech tagger Tree model and Text chunker for capturing Named-entity recognition

    Read more →
  • Ugly duckling theorem

    Ugly duckling theorem

    The ugly duckling theorem is an argument showing that classification is not really possible without some sort of bias. More particularly, it assumes finitely many properties combinable by logical connectives, and finitely many objects; it asserts that any two different objects share the same number of (extensional) properties. The theorem is named after Hans Christian Andersen's 1843 story "The Ugly Duckling", because it shows that a duckling is just as similar to a swan as two swans are to each other. It was derived by Satosi Watanabe in 1969. == Mathematical formula == Suppose there are n things in the universe, and one wants to put them into classes or categories. One has no preconceived ideas or biases about what sorts of categories are "natural" or "normal" and what are not. So one has to consider all the possible classes that could be, all the possible ways of making a set out of the n objects. There are 2 n {\displaystyle 2^{n}} such ways, the size of the power set of n objects. One can use that to measure the similarity between two objects, and one would see how many sets they have in common. However, one cannot. Any two objects have exactly the same number of classes in common if we can form any possible class, namely 2 n − 1 {\displaystyle 2^{n-1}} (half the total number of classes there are). To see this is so, one may imagine each class is represented by an n-bit string (or binary encoded integer), with a zero for each element not in the class and a one for each element in the class. As one finds, there are 2 n {\displaystyle 2^{n}} such strings. As all possible choices of zeros and ones are there, any two bit-positions will agree exactly half the time. One may pick two elements and reorder the bits so they are the first two, and imagine the numbers sorted lexicographically. The first 2 n / 2 {\displaystyle 2^{n}/2} numbers will have bit #1 set to zero, and the second 2 n / 2 {\displaystyle 2^{n}/2} will have it set to one. Within each of those blocks, the top 2 n / 4 {\displaystyle 2^{n}/4} will have bit #2 set to zero and the other 2 n / 4 {\displaystyle 2^{n}/4} will have it as one, so they agree on two blocks of 2 n / 4 {\displaystyle 2^{n}/4} or on half of all the cases, no matter which two elements one picks. So if we have no preconceived bias about which categories are better, everything is then equally similar (or equally dissimilar). The number of predicates simultaneously satisfied by two non-identical elements is constant over all such pairs. Thus, some kind of inductive bias is needed to make judgements to prefer certain categories over others. === Boolean functions === Let x 1 , x 2 , … , x n {\displaystyle x_{1},x_{2},\dots ,x_{n}} be a set of vectors of k {\displaystyle k} booleans each. The ugly duckling is the vector which is least like the others. Given the booleans, this can be computed using Hamming distance. However, the choice of boolean features to consider could have been somewhat arbitrary. Perhaps there were features derivable from the original features that were important for identifying the ugly duckling. The set of booleans in the vector can be extended with new features computed as boolean functions of the k {\displaystyle k} original features. The only canonical way to do this is to extend it with all possible Boolean functions. The resulting completed vectors have 2 k {\displaystyle 2^{k}} features. The ugly duckling theorem states that there is no ugly duckling because any two completed vectors will either be equal or differ in exactly half of the features. Proof. Let x and y be two vectors. If they are the same, then their completed vectors must also be the same because any Boolean function of x will agree with the same Boolean function of y. If x and y are different, then there exists a coordinate i {\displaystyle i} where the i {\displaystyle i} -th coordinate of x {\displaystyle x} differs from the i {\displaystyle i} -th coordinate of y {\displaystyle y} . Now the completed features contain every Boolean function on k {\displaystyle k} Boolean variables, with each one exactly once. Viewing these Boolean functions as polynomials in k {\displaystyle k} variables over GF(2), segregate the functions into pairs ( f , g ) {\displaystyle (f,g)} where f {\displaystyle f} contains the i {\displaystyle i} -th coordinate as a linear term and g {\displaystyle g} is f {\displaystyle f} without that linear term. Now, for every such pair ( f , g ) {\displaystyle (f,g)} , x {\displaystyle x} and y {\displaystyle y} will agree on exactly one of the two functions. If they agree on one, they must disagree on the other and vice versa. (This proof is believed to be due to Watanabe.) == Discussion == A possible way around the ugly duckling theorem would be to introduce a constraint on how similarity is measured by limiting the properties involved in classification, for instance, between A and B. However Medin et al. (1993) point out that this does not actually resolve the arbitrariness or bias problem since in what respects A is similar to B: "varies with the stimulus context and task, so that there is no unique answer, to the question of how similar is one object to another". For example, "a barberpole and a zebra would be more similar than a horse and a zebra if the feature striped had sufficient weight. Of course, if these feature weights were fixed, then these similarity relations would be constrained". Yet the property "striped" as a weight 'fix' or constraint is arbitrary itself, meaning: "unless one can specify such criteria, then the claim that categorization is based on attribute matching is almost entirely vacuous". Stamos (2003) remarked that some judgments of overall similarity are non-arbitrary in the sense they are useful: "Presumably, people's perceptual and conceptual processes have evolved that information that matters to human needs and goals can be roughly approximated by a similarity heuristic... If you are in the jungle and you see a tiger but you decide not to stereotype (perhaps because you believe that similarity is a false friend), then you will probably be eaten. In other words, in the biological world stereotyping based on veridical judgments of overall similarity statistically results in greater survival and reproductive success." Unless some properties are considered more salient, or 'weighted' more important than others, everything will appear equally similar, hence Watanabe (1986) wrote: "any objects, in so far as they are distinguishable, are equally similar". In a weaker setting that assumes infinitely many properties, Murphy and Medin (1985) give an example of two putative classified things, plums and lawnmowers: "Suppose that one is to list the attributes that plums and lawnmowers have in common in order to judge their similarity. It is easy to see that the list could be infinite: Both weigh less than 10,000 kg (and less than 10,001 kg), both did not exist 10,000,000 years ago (and 10,000,001 years ago), both cannot hear well, both can be dropped, both take up space, and so on. Likewise, the list of differences could be infinite… any two entities can be arbitrarily similar or dissimilar by changing the criterion of what counts as a relevant attribute." According to Woodward, the ugly duckling theorem is related to Schaffer's Conservation Law for Generalization Performance, which states that all algorithms for learning of boolean functions from input/output examples have the same overall generalization performance as random guessing. The latter result is generalized by Woodward to functions on countably infinite domains.

    Read more →
  • Mark V. Shaney

    Mark V. Shaney

    Mark V. Shaney is a synthetic Usenet user whose postings in the net.singles newsgroups were generated by Markov chain techniques, based on text from other postings. The username is a play on the words "Markov chain". Many readers were fooled into thinking that the quirky, sometimes uncannily topical posts were written by a real person. The system was designed by Rob Pike with coding by Bruce Ellis. Don P. Mitchell wrote the Markov chain code, initially demonstrating it to Pike and Ellis using the Tao Te Ching as a basis. They chose to apply it to the net.singles netnews group. The program is fairly simple. It ingests the sample text (the Tao Te Ching, or the posts of a Usenet group) and creates a massive list of every sequence of three successive words (triplet) which occurs in the text. It then chooses two words at random, and looks for a word which follows those two in one of the triplets in its massive list. If there is more than one, it picks at random (identical triplets count separately, so a sequence which occurs twice is twice as likely to be picked as one which only occurs once). It then adds that word to the generated text. Then, in the same way, it picks a triplet that starts with the second and third words in the generated text, and that gives a fourth word. It adds the fourth word, then repeats with the third and fourth words, and so on. This algorithm is called a third-order Markov chain (because it uses sequences of three words). == Examples == A classic example, from 1984, originally sent as a mail message, later posted to net.singles is reproduced here: >From mvs Fri Nov 16 17:11 EST 1984 remote from alice It looks like Reagan is going to say? Ummm... Oh yes, I was looking for. I'm so glad I remembered it. Yeah, what I have wondered if I had committed a crime. Don't eat with your assessment of Reagon and Mondale. Up your nose with a guy from a firm that specifically researches the teen-age market. As a friend of mine would say, "It really doesn't matter"... It looks like Reagan is holding back the arms of the American eating public have changed dramatically, and it got pretty boring after about 300 games. People, having a much larger number of varieties, and are very different from what one can find in Chinatowns across the country (things like pork buns, steamed dumplings, etc.) They can be cheap, being sold for around 30 to 75 cents apiece (depending on size), are generally not greasy, can be adequately explained by stupidity. Singles have felt insecure since we came down from the Conservative world at large. But Chuqui is the way it happened and the prices are VERY reasonable. Can anyone think of myself as a third sex. Yes, I am expected to have. People often get used to me knowing these things and then a cover is placed over all of them. Along the side of the $$ are spent by (or at least for ) the girls. You can't settle the issue. It seems I've forgotten what it is, but I don't. I know about violence against women, and I really doubt they will ever join together into a large number of jokes. It showed Adam, just after being created. He has a modem and an autodial routine. He calls my number 1440 times a day. So I will conclude by saying that I can well understand that she might soon have the time, it makes sense, again, to get the gist of my argument, I was in that (though it's a Republican administration). _-_-_-_-Mark Other quotations from Mark's Usenet posts are: "I spent an interesting evening recently with a grain of salt." (Alternatively reported as "While at a conference a few weeks back, I spent an interesting evening with a grain of salt.") "I hope that there are sour apples in every bushel." (see also sour grapes) == History == In The Usenet Handbook Mark Harrison writes that after September 1981, students joined Usenet en masse, "creating the USENET we know today: endless dumb questions, endless idiots posing as savants, and (of course) endless victims for practical jokes." In December, Rob Pike created the netnews group net.suicide as prank, "a forum for bad jokes". Some users thought it was a legitimate forum, some discussed "riding motorcycles without helmets". At first, most posters were "real people", but soon "characters" began posting. Pike created a "vicious" character named Bimmler. At its peak, net.suicide had ten frequent posters; nine were "known to be characters." But ultimately, Pike deleted the newsgroup because it was too much work to maintain; Bimmler messages were created "by hand". The "obvious alternative" was software, running on a Bell Labs computer created by Bruce Ellis, based on the Markov code by Don Mitchell, which became the online character Mark V. Shaney. Kernighan and Pike listed Mark V. Shaney in the acknowledgements in The Practice of Programming, noting its roots in Mitchell's markov, which, adapted as shaney, was used for "humorous deconstructionist activities" in the 1980s. Dewdney pointed out "perhaps Mark V. Shaney's magnum opus: a 20-page commentary on the deconstructionist philosophy of Jean Baudrillard" directed by Pike, with assistance from Henry S. Baird and Catherine Richards, to be distributed by email. The piece was based on Jean Baudrillard's "The Precession of Simulacra", published in Simulacra and Simulation (1981). == Reception == The program was discussed by A. K. Dewdney in the Scientific American "Computer Recreations" column in 1989, by Penn Jillette in his PC Computing column in 1991, and in several books, including the Usenet Handbook, Bots: the Origin of New Species, Hippo Eats Dwarf: A Field Guide to Hoaxes and Other B.S., and non-computer-related journals such as Texas Studies in Literature and Language. Dewdney wrote about the program's output, "The overall impression is not unlike what remains in the brain of an inattentive student after a late-night study session. Indeed, after reading the output of Mark V. Shaney, I find ordinary writing almost equally strange and incomprehensible!" He noted the reactions of newsgroup users, who have "shuddered at Mark V. Shaney's reflections, some with rage and others with laughter:" The opinions of the new net.singles correspondent drew mixed reviews. Serious users of the bulletin board's services sensed satire. Outraged, they urged that someone "pull the plug" on Mark V. Shaney's monstrous rantings. Others inquired almost admiringly whether the program was a secret artificial intelligence project that was being tested in a human conversational environment. A few may even have thought that Mark V. Shaney was a real person, a tortured schizophrenic desperately seeking a like-minded companion. Concluding, Dewdney wrote, "If the purpose of computer prose is to fool people into thinking that it was written by a sane person, Mark V. Shaney probably falls short." A 2012 article in Observer compared Mark V. Shaney's "strangely beautiful" postings to the Horse_ebooks account on Twitter and music reviews at Pitchfork, saying that "this mash-up of gibberish and human sentiment" is what "made Mark V. Shaney so endlessly fascinating".

    Read more →
  • Amaryllo

    Amaryllo

    Amaryllo Inc. is a multinational company founded in Amsterdam, the Netherlands, and now headquartered in the United States. It operates as a cloud service platform, providing cloud storage and cloud computing solutions to enterprises and brand companies. Amaryllo began with Skype IP camera development, pioneering biometric robotic technologies, encrypted P2P network, and secure cloud storage. Amaryllo was founded by Band of Angels member, Marcus Yang to develop patents for a new type of robotic cameras that is claimed to "talk, hear, sense, recognize human faces, and track intruders". It also claims to have made the world's first security robot based on the WebRTC protocol, Icam PRO FHD, and won the 2015 CES Best of Innovation Award under Embedded Technology category. Its home security robots claim to employ 256-bit encryption and run on the WebRTC protocol. Amaryllo products are sold in over 100 Countries across 6 Continents. == History == Amaryllo revealed its first smart home security products at Internationale Funkausstellung Berlin (IFA) 2013 with a Skype-enabled IP camera called iCam HD. Amaryllo announced its second Skype-certified smart home product, iBabi HD, at CES 2014. The company was chosen as a "Cool Vendor" by Gartner in Connected Home 2014. Amaryllo introduced WebRTC-based smart home products after Microsoft terminated embedded Skype services in mid 2014. Since then, Amaryllo has been developing camera robots with auto-tracking and facial recognition technologies. Its camera robots, ATOM AR3 and ATOM AR3S, were introduced in late 2016. It focuses on wired and wireless technology based on AI services. == Cloud Service Platform == Amaryllo offers prepaid cloud storage through digital codes and gift cards, distributed via InComm Payments, Blackhawk Network, and other partners. It provides high-performance cloud computing service through Rescale partnership. Amaryllo provides free cameras under an annual cloud storage subscription on its website. == Global Supercomputing Network (GSN) == The Global Supercomputing Network (GSN) is a distributed high-performance computing (HPC) platform developed by Amaryllo. The network is designed to provide scalable Infrastructure as a Service (IaaS) by connecting a global array of data centers to offer GPU computing resources for specialized industrial and scientific applications. === Architecture and Technology === GSN operates as a decentralized distributed network of servers rather than a single centralized supercomputer. The platform integrates an artificial intelligence assistant named Genie, also developed by Amaryllo. Genie's primary function is to manage computing allocation, helping users identify and connect to available resources across the network’s various nodes based on the specific requirements of their tasks. === Services === The network primarily focuses on the rental of GPU processing resources, catering to fields that require massive parallel processing capabilities, including: Artificial Intelligence and Machine Learning: Training large language models (LLMs) and neural networks. Scientific Simulations: Executing complex calculations in physics, chemistry, and bioinformatics. Data Analytics: Processing large-scale datasets. By utilizing a rental model, GSN allows organizations to access high-end hardware without the capital expenditure associated with purchasing and maintaining physical server infrastructure. === Infrastructure and Partnerships === The network’s physical footprint is expanded through strategic partnerships with data center operators. GSN collaborates with MettaDC and Cyber DC to provide colocation services. These partnerships facilitate the deployment of Nvidia server clusters within secure, Tier-rated facilities, ensuring high availability and connectivity for GSN users. == Official Brand Licensee of HP == Amaryllo Inc. is an official licensee of HP Inc., managing both B2B and B2C cloud services under the HP brand. Through this partnership, Amaryllo offers a range of secure and scalable cloud solutions, including HP Cloud, which provides subscription and one-time payment storage for reliable data backup and storage for individuals, families, and businesses. HP Cloud employs cloud computing technologies to create smart albums for users.

    Read more →
  • XLNet

    XLNet

    The XLNet was an autoregressive Transformer designed as an improvement over BERT, with 340M parameters and trained on 33 billion words. It was released on 19 June 2019, under the Apache 2.0 license. It achieved state-of-the-art results on a variety of natural language processing tasks, including language modeling, question answering, and natural language inference. == Architecture == The main idea of XLNet is to model language autoregressively like the GPT models, but allow for all possible permutations of a sentence. Concretely, consider the following sentence:My dog is cute.In standard autoregressive language modeling, the model would be tasked with predicting the probability of each word, conditioned on the previous words as its context: We factorize the joint probability of a sequence of words x 1 , … , x T {\displaystyle x_{1},\ldots ,x_{T}} using the chain rule: Pr ( x 1 , … , x T ) = Pr ( x 1 ) Pr ( x 2 | x 1 ) Pr ( x 3 | x 1 , x 2 ) … Pr ( x T | x 1 , … , x T − 1 ) . {\displaystyle \Pr(x_{1},\ldots ,x_{T})=\Pr(x_{1})\Pr(x_{2}|x_{1})\Pr(x_{3}|x_{1},x_{2})\ldots \Pr(x_{T}|x_{1},\ldots ,x_{T-1}).} For example, the sentence "My dog is cute" is factorized as: Pr ( My , dog , is , cute ) = Pr ( My ) Pr ( dog | My ) Pr ( is | My , dog ) Pr ( cute | My , dog , is ) . {\displaystyle \Pr({\text{My}},{\text{dog}},{\text{is}},{\text{cute}})=\Pr({\text{My}})\Pr({\text{dog}}|{\text{My}})\Pr({\text{is}}|{\text{My}},{\text{dog}})\Pr({\text{cute}}|{\text{My}},{\text{dog}},{\text{is}}).} Schematically, we can write it as → My → My dog → My dog is → My dog is cute . {\displaystyle {\texttt {}}{\texttt {}}{\texttt {}}{\texttt {}}\to {\text{My }}{\texttt {}}{\texttt {}}{\texttt {}}\to {\text{My dog }}{\texttt {}}{\texttt {}}\to {\text{My dog is }}{\texttt {}}\to {\text{My dog is cute}}.} However, for XLNet, the model is required to predict the words in a randomly generated order. Suppose we have sampled a randomly generated order 3241, then schematically, the model is required to perform the following prediction task: is dog is dog is cute → My dog is cute {\displaystyle {\texttt {}}{\texttt {}}{\texttt {}}{\texttt {}}\to {\texttt {}}{\texttt {}}{\text{is }}{\texttt {}}\to {\texttt {}}{\text{dog is }}{\texttt {}}\to {\texttt {}}{\text{dog is cute}}\to {\text{My dog is cute}}} By considering all permutations, XLNet is able to capture longer-range dependencies and better model the bidirectional context of words. === Two-Stream Self-Attention === To implement permutation language modeling, XLNet uses a two-stream self-attention mechanism. The two streams are: Content stream: This stream encodes the content of each word, as in standard causally masked self-attention. Query stream: This stream encodes the content of each word in the context of what has gone before. In more detail, it is a masked cross-attention mechanism, where the queries are from the query stream, and the key-value pairs are from the content stream. The content stream uses the causal mask M causal = [ 0 − ∞ − ∞ … − ∞ 0 0 − ∞ … − ∞ 0 0 0 … − ∞ ⋮ ⋮ ⋮ ⋱ ⋮ 0 0 0 … 0 ] {\displaystyle M_{\text{causal}}={\begin{bmatrix}0&-\infty &-\infty &\dots &-\infty \\0&0&-\infty &\dots &-\infty \\0&0&0&\dots &-\infty \\\vdots &\vdots &\vdots &\ddots &\vdots \\0&0&0&\dots &0\end{bmatrix}}} permuted by a random permutation matrix to P M causal P − 1 {\displaystyle PM_{\text{causal}}P^{-1}} . The query stream uses the cross-attention mask P ( M causal − ∞ I ) P − 1 {\displaystyle P(M_{\text{causal}}-\infty I)P^{-1}} , where the diagonal is subtracted away specifically to avoid the model "cheating" by looking at the content stream for what the current masked token is. Like the causal masking for GPT models, this two-stream masked architecture allows the model to train on all tokens in one forward pass. == Training == Two models were released: XLNet-Large, cased: 110M parameters, 24-layer, 1024-hidden, 16-heads XLNet-Base, cased: 340M parameters, 12-layer, 768-hidden, 12-heads. It was trained on a dataset that amounted to 32.89 billion tokens after tokenization with SentencePiece. The dataset was composed of BooksCorpus, and English Wikipedia, Giga5, ClueWeb 2012-B, and Common Crawl. It was trained on 512 TPU v3 chips, for 5.5 days. At the end of training, it still under-fitted the data, meaning it could have achieved lower loss with more training. It took 0.5 million steps with an Adam optimizer, linear learning rate decay, and a batch size of 8192.

    Read more →
  • Graph cuts in computer vision and artificial intelligence

    Graph cuts in computer vision and artificial intelligence

    As applied in the field of computer vision, graph cut optimization can be employed to efficiently solve a wide variety of low-level computer vision problems (early vision), such as image smoothing, the stereo correspondence problem, image segmentation, object co-segmentation, numerous military applications (eg Automatic target recognition) and many other problems that can be formulated in terms of energy minimization (eg Climate Science and Environmental modelling). Graph cut techniques are now increasingly being used in combination with more general spatial Artificial intelligence techniques (eg to enforce structure in Large language model output to sharpen tumour boundaries and similarly for various Augmented reality, Self-driving car, Robotics, Google Maps applications etc). Many of these energy minimization problems can be approximated by solving a maximum flow problem in a graph (and thus, by the max-flow min-cut theorem, define a minimal cut of the graph). Under most formulations of such problems in computer vision, the minimum energy solution corresponds to the maximum a posteriori estimate of a solution. Although many computer vision algorithms involve cutting a graph (e.g. normalized cuts), the term "graph cuts" is applied specifically to those models which employ a max-flow/min-cut optimization (other graph cutting algorithms may be considered as graph partitioning algorithms). "Binary" problems (such as denoising a binary image) can be solved exactly using this approach; problems where pixels can be labeled with more than two different labels (such as stereo correspondence, or denoising of a grayscale image) cannot be solved exactly, but solutions produced are usually near the global optimum. == History == The foundational theory of graph cuts in computer vision was first developed by Margaret Greig, Bruce Porteous and Allan Seheult (GPS) of Durham University in a now legendary discussion contribution to Julian Besag's 1986 paper and a more detailed follow on paper in 1989. In the Bayesian statistical context of smoothing noisy images, using a Markov random field as the image prior distribution, they showed with a mathematically beautiful proof how the maximum a posteriori estimate of a binary image can be obtained exactly by maximizing the flow through an associated image network, or graph, involving the introduction of a source and sink and Log-likelihood ratios. The problem was shown to be efficiently solvable exactly, an unexpected result as the problem was believed to be computationally intractable (NP hard). GPS also addressed the computational cost of the max-flow algorithm on large graphs, a significant concern at the time. They proposed a partitioning algorithm (see Section 4 of GPS) involving the recursive amalgamation of non-overlapping blocks, or tiles, which gave a 12X increase in speed. This approach recursively solved and amalgamated independent sub-graphs until the whole graph was solved. While contemporaries like Geman and Geman had advocated Parallel computing in the context of Simulated annealing, the GPS blocking strategy offered a deterministic structure amenable to parallelisation and anticipated modern artificial intelligence design across multiple GPUs. However, until recently, this aspect of the paper was largely ignored and subsequent research focused on Serial computer global search trees, such as the Boykov-Kolmogorov algorithm. Although the general k {\displaystyle k} -colour problem is NP hard for k > 2 , {\displaystyle k>2,} the GPS approach has turned out to have very wide applicability in general computer vision problems. This was first demonstrated by Boykov, Veksler and Zabih who, in a seminal paper published more than 10 years after the original GPS paper, and in other important works, lit the blue touch paper for the general adoption of graph cut techniques in computer vision. They showed that, for general problems, the GPS approach can be applied iteratively to sequences of binary problems, using their now ubiquitous alpha-expansion algorithm, yielding near optimal solutions. Prior to these results, approximate local optimisation techniques such as simulated annealing (as proposed by the Geman brothers) or iterated conditional modes (a type of greedy algorithm suggested by Julian Besag) were used to solve such image smoothing problems. Building on these advancements, GPS graph cut optimization was subsequently adapted for interactive image segmentation, most notably through the "GrabCut" algorithm introduced by Carsten Rother, Vladimir Kolmogorov, and Andrew Blake of Microsoft Research, Cambridge. GrabCut extended earlier interactive graph cut methods by replacing monochrome image histograms with Gaussian mixture models to estimate colour distributions, and by employing an iterative GPS energy minimisation scheme. This approach significantly simplified user interaction, requiring only a rough bounding box around the target object rather than detailed user-drawn strokes, and it quickly became a standard tool in both academic research and commercial image editing software. The GPS paper connected and bridged profound ideas from Mathematical statistics (Bayes' theorem, Markov random field), Physics (Ising model), Optimisation (Energy function) and Computer science (Network flow problem) and led the move away from approximate local and slow optimisation approaches (eg simulated annealing) to more powerful exact, or near exact, faster global optimisation techniques. It is now recognised as seminal as it was well ahead of its time and, in particular, was published years before the computing power revolution of Moore's law and GPUs. Significantly, GPS was published in a mathematical statistics (rather than a computer vision) journal, and this led to it being overlooked by the computer vision community for many years. It is unofficially known as "The Velvet Underground" paper of computer vision (ie although very few computer vision people read the paper [bought the record], those that did, most importantly Boykov, Veksler and Zabih, started new and important research [formed a band]). This is confirmed by GPS' very large amplification ratio (2nd order citations/first order citations), estimated at well in excess of 100. Despite the foundational nature of the GPS work, formal recognition from the computer vision community has predominantly gone to the researchers who followed to extend and popularise the graph cut method. For example, Boykov, Veksler and Zabih deservedly received a Helmholtz Prize from the ICCV in 2011. This prize recognises ICCV papers from 10 or more years earlier that have had a significant impact on computer vision research. In 2011, Couprie et al. proposed a general image segmentation framework, called the "Power Watershed", that minimized a real-valued indicator function from [0,1] over a graph, constrained by user seeds (or unary terms) set to 0 or 1, in which the minimization of the indicator function over the graph is optimized with respect to an exponent p {\displaystyle p} . When p = 1 {\displaystyle p=1} , the Power Watershed is optimized by graph cuts, when p = 0 {\displaystyle p=0} the Power Watershed is optimized by shortest paths, p = 2 {\displaystyle p=2} is optimized by the random walker algorithm and p = ∞ {\displaystyle p=\infty } is optimized by the watershed algorithm. In this way, the Power Watershed may be viewed as a generalization of graph cuts that provides a straightforward connection with other energy optimization segmentation/clustering algorithms. == Binary segmentation of images == === Notation === Image: x ∈ { R , G , B } N {\displaystyle x\in \{R,G,B\}^{N}} Output: Segmentation (also called opacity) S ∈ R N {\displaystyle S\in R^{N}} (soft segmentation). For hard segmentation S ∈ { 0 for background , 1 for foreground/object to be detected } N {\displaystyle S\in \{0{\text{ for background}},1{\text{ for foreground/object to be detected}}\}^{N}} Energy function: E ( x , S , C , λ ) {\displaystyle E(x,S,C,\lambda )} where C is the color parameter and λ is the coherence parameter. E ( x , S , C , λ ) = E c o l o r + E c o h e r e n c e {\displaystyle E(x,S,C,\lambda )=E_{\rm {color}}+E_{\rm {coherence}}} Optimization: The segmentation can be estimated as a global minimum over S: arg ⁡ min S E ( x , S , C , λ ) {\displaystyle {\arg \min }_{S}E(x,S,C,\lambda )} === Existing methods === Standard Graph cuts: optimize energy function over the segmentation (unknown S value). Iterated Graph cuts: First step optimizes over the color parameters using K-means. Second step performs the usual graph cuts algorithm. These 2 steps are repeated recursively until convergence Dynamic graph cuts:Allows to re-run the algorithm much faster after modifying the problem (e.g. after new seeds have been added by a user). === Energy function === Pr ( x ∣ S ) = K − E {\displaystyle \Pr(x\mid S)=K^{-E}} where the energy E {\displaystyle E} is composed of two different mod

    Read more →