AI Data Flow Diagram Generator

AI Data Flow Diagram Generator — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Security of the Java software platform

    Security of the Java software platform

    The Java software platform provides a number of features designed for improving the security of Java applications. This includes enforcing runtime constraints through the use of the Java Virtual Machine (JVM), a security manager that sandboxes untrusted code from the rest of the operating system, and a suite of security APIs that Java developers can utilise. Despite this, criticism has been directed at the programming language, and Oracle, due to an increase in malicious programs that revealed security vulnerabilities in the JVM, which were subsequently not properly addressed by Oracle in a timely manner. == Security features == === The JVM === The binary form of programs running on the Java platform is not native machine code but an intermediate bytecode. The JVM performs verification on this bytecode before running it to prevent the program from performing unsafe operations such as branching to incorrect locations, which may contain data rather than instructions. It also allows the JVM to enforce runtime constraints such as array bounds checking. This means that Java programs are significantly less likely to suffer from memory safety flaws such as buffer overflow than programs written in languages such as C which do not provide such memory safety guarantees. The platform does not allow programs to perform certain potentially unsafe operations such as pointer arithmetic or unchecked type casts. It manages memory allocation and initialization and provides automatic garbage collection which in many cases (but not all) relieves the developer from manual memory management. This contributes to type safety and memory safety. === Security manager === The platform provides a security manager which allows users to run untrusted bytecode in a "sandboxed" environment designed to protect them from malicious or poorly written software by preventing the untrusted code from accessing certain platform features and APIs. For example, untrusted code might be prevented from reading or writing files on the local filesystem, running arbitrary commands with the current user's privileges, accessing communication networks, accessing the internal private state of objects using reflection, or causing the JVM to exit. The security manager also allows Java programs to be cryptographically signed; users can choose to allow code with a valid digital signature from a trusted entity to run with full privileges in circumstances where it would otherwise be untrusted. Users can also set fine-grained access control policies for programs from different sources. For example, a user may decide that only system classes should be fully trusted, that code from certain trusted entities may be allowed to read certain specific files, and that all other code should be fully sandboxed. === Security APIs === The Java Class Library provides a number of APIs related to security, such as standard cryptographic algorithms, authentication, and secure communication protocols. === The sun.misc.Unsafe class === sun.misc.Unsafe is an internal utility class in the Java programming language which is a collection of low-level unsafe operations. While it is not a part of the official Java Class Library, it is called internally by the Java libraries. It resides in an unofficial Java module named jdk.unsupported. Beginning in Java 11, it has been partially migrated to jdk.internal.misc.Unsafe (which resides in module java.base). Its primary feature is to allow direct memory management (similar to C memory management) and memory address manipulation, manipulating objects and fields, thread manipulation, and concurrency primitives. Its declaration is: public final class Unsafe;, and it is a singleton class with a private constructor. It contains the following methods, many of which are declared native (invoking Java Native Interface): static Unsafe getUnsafe(): retrieves the Unsafe instance. It uses sun.reflect.Reflection to do so. int getInt(Object o, long offset): fetches a value (a field or array element) in the object at the given offset. (There are corresponding getBoolean(), getByte(), getShort(), getChar(), getLong(), getFloat(), and getDouble() methods as well.) void putInt(Object o, long offset, int x): stores a value into an object at the given offset. (There are corresponding putBoolean(), putByte(), putShort(), putChar(), putLong(), putFloat(), and putDouble() methods as well.) Object getObject(Object o, long offset): fetches a reference value from an object at the given offset. void putObject(Object o, long offset, Object x): stores a reference value into an object at the given offset. int getInt(long address): fetches a value at the given address. (There are corresponding getBoolean(), getByte(), getShort(), getChar(), getLong(), getFloat(), and getDouble() methods as well.) void putInt(long address, int x): stores a value into the given address. (There are corresponding putBoolean(), putByte(), putShort(), putChar(), putLong(), putFloat(), and putDouble() methods as well.) long getAddress(long address): fetches a native pointer from a given address. void putAddress(long address, long x): stores a native pointer into a given address. long allocateMemory(long bytes): allocates a block of native memory of the given size (similar to malloc()). long reallocateMemory(long address, long bytes): resizes a block of native memory to the given size (similar to realloc()). void setMemory(Object o, long offset, long bytes, byte value), void setMemory(long address, long bytes, byte value): sets all bytes in a block of memory to a fixed value (similar to memset()). void copyMemory(Object srcBase, long srcOffset, Object destBase, long destOffset, long bytes), void copyMemory(long srcAddress, long destAddress, long bytes): sets all bytes in a given block of memory to a copy of another block (similar to memcpy()). void freeMemory(long address): deallocates a block of native memory obtained from allocateMemory() or reallocateMemory(), similar to free()). long staticFieldOffset(Field f): obtains the location of a given field in the storage allocation of its class. long objectFieldOffset(Field f): obtains the location of a given static field in conjunction with staticFieldBase(). Object staticFieldBase(Field f): obtains the location of a given static field in conjunction with staticFieldOffset(). void ensureClassInitialized(Class c): ensures the given class has been initialized. int arrayBaseOffset(Class arrayClass): obtains the offset of the first element in the storage allocation of a given array class. int arrayIndexScale(Class arrayClass): obtains the scale factor for addressing elements in the storage allocation of a given array class. static int addressSize(): obtains the size (in bytes) of a native pointer. int pageSize(): obtains the size (in bytes) of a native memory page. Class defineClass(String name, byte[] b, int off, int len, ClassLoader loader, ProtectionDomain protectionDomain): signals to the JVM to define a class without security checks. Class defineAnonymousClass(Class hostClass, byte[] data, Object[] cpPatches): signals to the JVM to define a class but do not make it known to the class loader or system directory. Object allocateInstance(Class cls) throws InstantiationException: allocates an instance of a class without running its constructor. void monitorEnter(Object o): locks an object. void monitorExit(Object o): unlocks an object. boolean tryMonitorEnter(Object o): tries to lock an object, returning whether the lock succeeded. void throwException(Throwable ee): throws an exception without telling the verifier. final boolean compareAndSwapInt(Object o, long offset, int expected, int x): updates a variable to x if it is holding expected, returning whether the operation succeeded. (There are corresponding compareAndSwapLong() and compareAndSwapObject() methods as well.) int getIntVolatile(Object o, long offset): volatile version of getInt(). (There are corresponding getBooleanVolatile(), getByteVolatile(), getShortVolatile(), getCharVolatile(), getLongVolatile(), getFloatVolatile(), getDoubleVolatile(), and getObjectVolatile() methods as well.) void putIntVolatile(Object o, long offset, int x): volatile version of putInt(). (There are corresponding putBooleanVolatile(), putByteVolatile(), putShortVolatile(), putCharVolatile(), putLongVolatile(), putFloatVolatile(), putDoubleVolatile(), and putObjectVolatile() methods as well.) void putOrderedInt(Object o, long offset, int x): version of putIntVolatile() not guaranteeing immediate visibility of storage to other threads. (There are corresponding putOrderedLong() and putOrderedObject() methods as well.) void unpark(Object thread): unblocks a thread. void park(boolean isAbsolute, long time): blocks the current thread. int getLoadAverage(double[] loadavg, int nelems): gets the load average in the system run queue assigned to available processors averaged over various periods of time. void invokeCleaner(ByteBuffe

    Read more →
  • Meta AI

    Meta AI

    Meta AI is a research division of Meta (formerly Facebook) that develops artificial intelligence and augmented reality technologies. == History == Meta AI was founded in 2013 as Facebook Artificial Intelligence Research (FAIR). It has workspaces in Menlo Park, London, New York City, Paris, Seattle, Pittsburgh, Tel Aviv, and Montreal as of 2025. In 2016, FAIR partnered with Google, Amazon, IBM, and Microsoft in creating the Partnership on Artificial Intelligence to Benefit People and Society. Meta AI was directed by Yann LeCun until 2018, when Jérôme Pesenti succeeded the role. Pesenti is formerly the CTO of IBM's big data group. FAIR's research includes self-supervised learning, generative adversarial networks, document classification and translation, and computer vision. FAIR released Torch deep-learning modules as well as PyTorch in 2017, an open-source machine learning framework, which was subsequently used in several deep learning technologies, such as Tesla's autopilot and Uber's Pyro. That same year, a pair of chatbots were falsely rumored to be discontinued for developing a language that was unintelligible to humans. FAIR clarified that the research had been shut down because they had accomplished their initial goal to understand how languages are generated by their models, rather than out of fear. FAIR was renamed Meta AI following the rebranding that changed Facebook, Inc. to Meta Platforms Inc. On October 1, 2025, Facebook announced "We will soon use your interactions with AI at Meta to personalize the content and ads you see". == Virtual assistant == Meta AI is also the name of the virtual assistant developed by the team, now integrated as a chatbot into Meta's social networking products. It is also available as a subscription-based stand-alone app. The virtual assistant was pre-installed on the second generation of Ray-Ban Meta smartglasses, and can incorporate inputs from the glasses' cameras after an update. It is also available on Quest 2 and newer HMDs. Since May 2024, the chatbot has summarized news from various outlets without linking directly to original articles, including in Canada, where news links are banned on its platforms. This use of news content without compensation and attribution has raised ethical and legal concerns, especially as Meta continues to reduce news visibility on its platforms. == Current research == === Natural language processing and chatbot === Natural language processing is the ability for machines to understand and generate natural language. The team is also researching unsupervised machine translation and multilingual chatbots. ==== Galactica ==== Galactica is a large language model (LLM) designed for generating scientific text. It was available for three days from 15 November 2022, before being withdrawn for generating racist and inaccurate content. ==== Llama ==== Llama is an LLM released in February 2023. As of January 2026, the most recent release is the Llama 4. === Hardware === Meta used CPUs and in-house custom chips before 2022; they switched to Nvidia GPUs since then. MTIA v1, one of their early chips, is designed for the company's content recommendation algorithms. It was fabricated on TSMC's 7 nm process technology and consumed 25W, capable of 51.2 TFlops FP16. == Controversy == The French media outlet Mediapart reports that in 2022, Facebook's parent company illegally used works accumulated by the pirate site LibGen to train its artificial intelligence.

    Read more →
  • Ed (chatbot)

    Ed (chatbot)

    Ed was a chatbot co-developed by the Los Angeles Unified School District and AllHere Education. Described as a learning acceleration platform, it was the first personal assistant for students in the United States. Part of the district's Individual Acceleration Plan, it was able to interact with students both verbally and visually, offering support in 100 languages. The chatbot was launched on March 20, 2024, as part of the district's plan for academic recovery from the COVID-19 pandemic and to improve overall academic performance. Utilizing artificial intelligence, Ed organizes data and reports on grades, test scores, and attendance, creating individualized plans for each student. After the company behind it, AllHere, collapsed, the district shuttered operations of the chatbot on June 14, 2024. The firm is under investigation by the US Federal Bureau of Investigation. == History == On February 14, 2022, Alberto M. Carvalho became the Superintendent of the Los Angeles Unified School District, pledging to give the district a full academic recovery from the COVID-19 pandemic. In December 2022, he announced the Individual Acceleration Plan for the district, which aimed to provide each student with a unique progress report and help them determine if they were on track to graduate. The district faced criticism from disability advocates for its management of Individualized Education Programs, and in April 2022, the United States Department of Education announced that the district had failed to provide appropriate educational services to students with disabilities during the pandemic. The district had been grappling with significant absenteeism issues since the pandemic, which led to declining academic performance and disengagement among students. On February 17, 2023, the district issued a request for proposals to develop a fully integrated portal system. Later that year, they signed a $6 million, five-year contract with AllHere Education, a Boston-based company founded in 2016. The introduction of Ed follows the public launch of ChatGPT, which has been utilized by both teachers and students in educational settings. On August 4, 2023, during an annual address at the Walt Disney Concert Hall, Carvalho and the Los Angeles Unified School District announced the launch of Ed. The district invested $4 million into the chatbot, with Carvalho noting that this cost would be halved thanks to donor and grant funding. The chatbot was launched on March 20, 2024. Following its launch, a press conference was held to address security and technology concerns. Carvalho stated that the district had collaborated with security companies and incorporated filters to screen for threatening language. Months after its launch, AllHere Education furloughed most of its staff on June 14, citing their “current financial position” on its website as the reason. After learning about the furlough, the district terminated its dealings with AllHere Education. However, it stated its intention to bring the chatbot back in the future once officials determine the best course of action. Carvalho announced that he would appoint an independent task force to review what went wrong with AllHere Education and the chatbot. On February 25, 2026, the FBI served a search warrant on Carvalho’s home and office in connection with AllHere. The FBI also raided the LAUSD's headquarters. == Service == The chatbot was described as a personal assistant and a "one-stop shop for parents and students" who want to see information about a student's attendance and grades, as well as other resources from the district. Additionally, the application can function as an alarm clock, provide daily lunch menus from the school cafeteria, and offer updates on the location of school buses. The chatbot also helps students and parents who do not speak English as their first language by translating displayed information into approximately 100 different languages. The application can also help with submitting applications and give updates on progress and upcoming assignments. The district stated that the primary goal of Ed was to actively motivate students to complete homework and other tasks. == Reception == The chatbot received a mostly positive reception among parents and observers upon its launch. Some parents and teachers expressed caution about the technology, voicing concerns that the district's push for its implementation lacked public accountability. Rob Nelson from the University of Pennsylvania described the district's strategy as risky, saying that the release felt "like the beginning of a Clippy-level disaster". After the chatbot's shutdown, The 74 criticized it for misusing student data. Chris Whiteley, a former software engineer at AllHere Education, alleged that the data collected by the chatbot likely violated the district's data privacy rules.

    Read more →
  • Seq2seq

    Seq2seq

    Seq2seq is a family of machine learning approaches used for natural language processing. Originally developed by Lê Viết Quốc, a Vietnamese computer scientist and a machine learning pioneer at Google Brain, this framework has become foundational in many modern AI systems. Applications include language translation, image captioning, conversational models, speech recognition, and text summarization. Seq2seq uses sequence transformation: it turns one sequence into another sequence. == History == One naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say: 'This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode. seq2seq is an approach to machine translation (or more generally, sequence transduction) with roots in information theory, where communication is understood as an encode-transmit-decode process, and machine translation can be studied as a special case of communication. This viewpoint was elaborated, for example, in the noisy channel model of machine translation. In practice, seq2seq maps an input sequence into a real-numerical vector by using a neural network (the encoder), and then maps it back to an output sequence using another neural network (the decoder). The idea of encoder-decoder sequence transduction had been developed in the early 2010s. The papers most commonly cited as the originators that produced seq2seq are two papers from 2014. In the seq2seq as proposed by them, both the encoder and the decoder were LSTMs. This had the "bottleneck" problem, since the encoding vector has a fixed size, so for long input sequences, information would tend to be lost, as they are difficult to fit into the fixed-length encoding vector. The attention mechanism, proposed in 2014, resolved the bottleneck problem. They called their model RNNsearch, as it "emulates searching through a source sentence during decoding a translation". A problem with seq2seq models at this point was that recurrent neural networks are difficult to parallelize. The 2017 publication of Transformers resolved the problem by replacing the encoding RNN with self-attention Transformer blocks ("encoder blocks"), and the decoding RNN with cross-attention causally-masked Transformer blocks ("decoder blocks"). === Priority dispute === One of the papers cited as the originator for seq2seq is (Sutskever et al 2014), published at Google Brain while they were on Google's machine translation project. The research allowed Google to overhaul Google Translate into Google Neural Machine Translation in 2016. Tomáš Mikolov claims to have developed the idea (before joining Google Brain) of using a "neural language model on pairs of sentences... and then [generating] translation after seeing the first sentence"—which he equates with seq2seq machine translation, and to have mentioned the idea to Ilya Sutskever and Quoc Le (while at Google Brain), who failed to acknowledge him in their paper. Mikolov had worked on RNNLM (using RNN for language modelling) for his PhD thesis, and is more notable for developing word2vec. == Architecture == The main reference for this section is. === Encoder === The encoder is responsible for processing the input sequence and capturing its essential information, which is stored as the hidden state of the network and, in a model with attention mechanism, a context vector. The context vector is the weighted sum of the input hidden states and is generated for every time instance in the output sequences. === Decoder === The decoder takes the context vector and hidden states from the encoder and generates the final output sequence. The decoder operates in an autoregressive manner, producing one element of the output sequence at a time. At each step, it considers the previously generated elements, the context vector, and the input sequence information to make predictions for the next element in the output sequence. Specifically, in a model with attention mechanism, the context vector and the hidden state are concatenated together to form an attention hidden vector, which is used as an input for the decoder. The seq2seq method developed in the early 2010s uses two neural networks: an encoder network converts an input sentence into numerical vectors, and a decoder network converts those vectors to sentences in the target language. The Attention mechanism was grafted onto this structure in 2014 and is shown below. Later it was refined into the encoder-decoder Transformer architecture of 2017. === Training vs prediction === There is a subtle difference between training and prediction. During training time, both the input and the output sequences are known. During prediction time, only the input sequence is known, and the output sequence must be decoded by the network itself. Specifically, consider an input sequence x 1 : n {\displaystyle x_{1:n}} and output sequence y 1 : m {\displaystyle y_{1:m}} . The encoder would process the input x 1 : n {\displaystyle x_{1:n}} step by step. After that, the decoder would take the output from the encoder, as well as the as input, and produce a prediction y ^ 1 {\displaystyle {\hat {y}}_{1}} . Now, the question is: what should be input to the decoder in the next step? A standard method for training is "teacher forcing". In teacher forcing, no matter what is output by the decoder, the next input to the decoder is always the reference. That is, even if y ^ 1 ≠ y 1 {\displaystyle {\hat {y}}_{1}\neq y_{1}} , the next input to the decoder is still y 1 {\displaystyle y_{1}} , and so on. During prediction time, the "teacher" y 1 : m {\displaystyle y_{1:m}} would be unavailable. Therefore, the input to the decoder must be y ^ 1 {\displaystyle {\hat {y}}_{1}} , then y ^ 2 {\displaystyle {\hat {y}}_{2}} , and so on. It is found that if a model is trained purely by teacher forcing, its performance would degrade during prediction time, since generation based on the model's own output is different from generation based on the teacher's output. This is called exposure bias or a train/test distribution shift. A 2015 paper recommends that, during training, randomly switch between teacher forcing and no teacher forcing. === Attention for seq2seq === The attention mechanism is an enhancement introduced by Bahdanau et al. in 2014 to address limitations in the basic Seq2Seq architecture where a longer input sequence results in the hidden state output of the encoder becoming irrelevant for the decoder. It enables the model to selectively focus on different parts of the input sequence during the decoding process. At each decoder step, an alignment model calculates the attention score using the current decoder state and all of the attention hidden vectors as input. An alignment model is another neural network model that is trained jointly with the seq2seq model used to calculate how well an input, represented by the hidden state, matches with the previous output, represented by attention hidden state. A softmax function is then applied to the attention score to get the attention weight. In some models, the encoder states are directly fed into an activation function, removing the need for alignment model. An activation function receives one decoder state and one encoder state and returns a scalar value of their relevance. Consider the seq2seq language English-to-French translation task. To be concrete, let us consider the translation of "the zone of international control ", which should translate to "la zone de contrôle international ". Here, we use the special token as a control character to delimit the end of input for both the encoder and the decoder. An input sequence of text x 0 , x 1 , … {\displaystyle x_{0},x_{1},\dots } is processed by a neural network (which can be an LSTM, a Transformer encoder, or some other network) into a sequence of real-valued vectors h 0 , h 1 , … {\displaystyle h_{0},h_{1},\dots } , where h {\displaystyle h} stands for "hidden vector". After the encoder has finished processing, the decoder starts operating over the hidden vectors, to produce an output sequence y 0 , y 1 , … {\displaystyle y_{0},y_{1},\dots } , autoregressively. That is, it always takes as input both the hidden vectors produced by the encoder, and what the decoder itself has produced before, to produce the next output word: ( h 0 , h 1 , … {\displaystyle h_{0},h_{1},\dots } , "") → "la" ( h 0 , h 1 , … {\displaystyle h_{0},h_{1},\dots } , " la") → "la zone" ( h 0 , h 1 , … {\displaystyle h_{0},h_{1},\dots } , " la zone") → "la zone de" ... ( h 0 , h 1 , … {\displaystyle h_{0},h_{1},\dots } , " la zone de contrôle international") → "la zone de contrôle international " Here, we use the special token as a control character to delimit the start of input for the decoder. The decoding terminates as soon as "" appears in the decoder output. ==

    Read more →
  • Bitstrips

    Bitstrips

    Bitstrips, Inc. was a Canadian media and technology company based in Toronto, founded in 2007 by Jacob Blackstock, David Kennedy, Shahan Panth, Dorian Baldwin, and Jesse Brown. The company created and offered a web application, Bitstrips.com, which allowed users to create comic strips using personalized avatars, and preset templates and poses. Brown and Blackstock explained that the service was meant to enable self-expression without the need to have artistic skills. Bitstrips was first presented in 2008 at South by Southwest in Austin, Texas, and the service later piloted and launched a version designed for use as educational software. The service achieved increasing prominence following the launch of versions for Facebook and mobile platforms. In 2014, Bitstrips launched a spin-off app known as Bitmoji, which allows users to create personalized stickers for use in instant messaging. In July 2016, Snapchat Inc. announced that it had acquired the company; the Bitstrips comic service was shut down, but Bitmoji remains operational, and has subsequently been given greater prominence within Snapchat's overall platform. == History == Bitstrips was co-developed by Toronto-based comic artist Jacob Blackstock and his high school friend, journalist Jesse Brown. The service was originally envisioned as a means to allow anyone to create their own comic strip without needing artistic skills. Brown explained that "it's so difficult and time-consuming to tell a story in comic book form, drawing the same characters again and again in these tiny little panels, and just the amount of craftsmanship required. And even if you can do it well, which I never could, it takes years to make a story." Brown stated that the service would be "groundwork for a whole new way to communicate", and went as far as describing the service as being a "YouTube for comics". Blackstock explained that the concept of Bitstrips was influenced by his own use of comics as a form of socialization; a student, Blackstock and his friends drew comics featuring each other and shared them during classes. He felt that Bitstrips was a "medium for self-expression", stating that "It's not just about you making the comics, but since you and your friends star in these comics, it's like you're the medium. The visual nature of comics just speaks so much louder than text." The service was publicly unveiled at South by Southwest in 2008. In 2009, the service introduced a version oriented towards the educational market, Bitstrips for Schools, which was initially piloted at a number of schools in Ontario. The service was praised by educators for being engaging to students, especially within language classes. Brown noted that students were using the service to create comics outside of class as well, stating that it was "so gratifying and shocking what people do with your tool to make their own stories in ways that you never would have anticipated. Some of them are just brilliant." In December 2012, Bitstrips launched a version for Facebook; by July 2013, Bitstrips had 10 million unique users on Facebook, having created over 50 million comics. In October 2013, Bitstrips launched a mobile app; in two months, Bitstrips became a top-downloaded app in 40 countries, and over 30 million avatars had been created with it. In November 2013, Bitstrips secured a round of funding from Horizons Ventures and Li Ka-shing. In October 2014, Bitstrips launched Bitmoji, a spin-off app that allows users to create stickers featuring Bitstrips characters in various templates. In July 2016, following unconfirmed reports earlier in the year, Snapchat Inc. announced that it had acquired Bitstrips. The company's staff continue to operate out of Toronto, but the original Bitstrips comic service was shut down in favour of focusing exclusively on Bitmoji, leaving many Bitstrips users to call for a reboot of the comic service.

    Read more →
  • Version space learning

    Version space learning

    Version space learning is a logical approach to machine learning, specifically binary classification. Version space learning algorithms search a predefined space of hypotheses, viewed as a set of logical sentences. Formally, the hypothesis space is a disjunction H 1 ∨ H 2 ∨ . . . ∨ H n {\displaystyle H_{1}\lor H_{2}\lor ...\lor H_{n}} (i.e., one or more of hypotheses 1 through n are true). A version space learning algorithm is presented with examples, which it will use to restrict its hypothesis space; for each example x, the hypotheses that are inconsistent with x are removed from the space. This iterative refining of the hypothesis space is called the candidate elimination algorithm, the hypothesis space maintained inside the algorithm, its version space. == The version space algorithm == In settings where there is a generality-ordering on hypotheses, it is possible to represent the version space by two sets of hypotheses: (1) the most specific consistent hypotheses, and (2) the most general consistent hypotheses, where "consistent" indicates agreement with observed data. The most specific hypotheses (i.e., the specific boundary SB) cover the observed positive training examples, and as little of the remaining feature space as possible. These hypotheses, if reduced any further, exclude a positive training example, and hence become inconsistent. These minimal hypotheses essentially constitute a (pessimistic) claim that the true concept is defined just by the positive data already observed: Thus, if a novel (never-before-seen) data point is observed, it should be assumed to be negative. (I.e., if data has not previously been ruled in, then it's ruled out.) The most general hypotheses (i.e., the general boundary GB) cover the observed positive training examples, but also cover as much of the remaining feature space without including any negative training examples. These, if enlarged any further, include a negative training example, and hence become inconsistent. These maximal hypotheses essentially constitute a (optimistic) claim that the true concept is defined just by the negative data already observed: Thus, if a novel (never-before-seen) data point is observed, it should be assumed to be positive. (I.e., if data has not previously been ruled out, then it's ruled in.) Thus, during learning, the version space (which itself is a set – possibly infinite – containing all consistent hypotheses) can be represented by just its lower and upper bounds (maximally general and maximally specific hypothesis sets), and learning operations can be performed just on these representative sets. After learning, classification can be performed on unseen examples by testing the hypothesis learned by the algorithm. If the example is consistent with multiple hypotheses, a majority vote rule can be applied. == Historical background == The notion of version spaces was introduced by Mitchell in the early 1980s as a framework for understanding the basic problem of supervised learning within the context of solution search. Although the basic "candidate elimination" search method that accompanies the version space framework is not a popular learning algorithm, there are some practical implementations that have been developed (e.g., Sverdlik & Reynolds 1992, Hong & Tsang 1997, Dubois & Quafafou 2002). A major drawback of version space learning is its inability to deal with noise: any pair of inconsistent examples can cause the version space to collapse, i.e., become empty, so that classification becomes impossible. One solution of this problem is proposed by Dubois and Quafafou that proposed the Rough Version Space, where rough sets based approximations are used to learn certain and possible hypothesis in the presence of inconsistent data.

    Read more →
  • Phase congruency

    Phase congruency

    Phase congruency is a measure of feature significance in computer images, a method of edge detection that is particularly robust against changes in illumination and contrast. == Foundations == Phase congruency reflects the behaviour of the image in the frequency domain. It has been noted that edgelike features have many of their frequency components in the same phase. The concept is similar to coherence, except that it applies to functions of different wavelength. For example, the Fourier decomposition of a square wave consists of sine functions, whose frequencies are odd multiples of the fundamental frequency. At the rising edges of the square wave, each sinusoidal component has a rising phase; the phases have maximal congruency at the edges. This corresponds to the human-perceived edges in an image where there are sharp changes between light and dark. == Definition == Phase congruency compares the weighted alignment of the Fourier components of a signal A n {\displaystyle A_{\rm {n}}} with the sum of the Fourier components. P C ( t ) = max ϕ ¯ ∑ n A n cos ⁡ ( ϕ n ( t ) − ϕ ¯ ) ∑ n A n {\displaystyle PC(t)=\max _{\bar {\phi }}{\frac {\sum _{\rm {n}}A_{\rm {n}}\cos(\phi _{\rm {n}}(t)-{\bar {\phi }})}{\sum _{\rm {n}}A_{n}}}} where ϕ n {\displaystyle \phi _{\rm {n}}} is the local or instantaneous phase as can be calculated using the Hilbert transform and A n {\displaystyle A_{\rm {n}}} are the local amplitude, or energy, of the signal. When all the phases are aligned, this is equal to 1. Several ways of implementing phase congruency have been developed, of which two versions are available in open source, one written for MATLAB and the other written in Java as a plugin for the ImageJ software. Given the different notations used for its formulation, a unified version has been recently presented, where a methodology for the parameter tuning is also presented. == Advantages == The square-wave example is naive in that most edge detection methods deal with it equally well. For example, the first derivative has a maximal magnitude at the edges. However, there are cases where the perceived edge does not have a sharp step or a large derivative. The method of phase congruency applies to many cases where other methods fail. A notable example is an image feature consisting of a single line, such as the letter "l". Many edge-detection algorithms will pick up two adjacent edges: the transitions from white to black, and black to white. On the other hand, the phase congruency map has a single line. A simple Fourier analogy of this case is a triangle wave. In each of its crests there is a congruency of crests from different sinusoidal functions. == Disadvantages == Calculating the phase congruency map of an image is very computationally intensive, and sensitive to image noise. Techniques of noise reduction are usually applied prior to the calculation.

    Read more →
  • Open information extraction

    Open information extraction

    In natural language processing, open information extraction (OIE) is the task of generating a structured, machine-readable representation of the information in text, usually in the form of triples or n-ary propositions. == Overview == A proposition can be understood as truth-bearer, a textual expression of a potential fact (e.g., "Dante wrote the Divine Comedy"), represented in an amenable structure for computers [e.g., ("Dante", "wrote", "Divine Comedy")]. An OIE extraction normally consists of a relation and a set of arguments. For instance, ("Dante", "passed away in" "Ravenna") is a proposition formed by the relation "passed away in" and the arguments "Dante" and "Ravenna". The first argument is usually referred as the subject while the second is considered to be the object. The extraction is said to be a textual representation of a potential fact because its elements are not linked to a knowledge base. Furthermore, the factual nature of the proposition has not yet been established. In the above example, transforming the extraction into a full fledged fact would first require linking, if possible, the relation and the arguments to a knowledge base. Second, the truth of the extraction would need to be determined. In computer science transforming OIE extractions into ontological facts is known as relation extraction. In fact, OIE can be seen as the first step to a wide range of deeper text understanding tasks such as relation extraction, knowledge-base construction, question answering, semantic role labeling. The extracted propositions can also be directly used for end-user applications such as structured search (e.g., retrieve all propositions with "Dante" as subject). OIE was first introduced by TextRunner developed at the University of Washington Turing Center headed by Oren Etzioni. Other methods introduced later such as Reverb, OLLIE, ClausIE or CSD helped to shape the OIE task by characterizing some of its aspects. At a high level, all of these approaches make use of a set of patterns to generate the extractions. Depending on the particular approach, these patterns are either hand-crafted or learned. == OIE systems and contributions == Reverb suggested the necessity to produce meaningful relations to more accurately capture the information in the input text. For instance, given the sentence "Faust made a pact with the devil", it would be erroneous to just produce the extraction ("Faust", "made", "a pact") since it would not be adequately informative. A more precise extraction would be ("Faust", "made a pact with", "the devil"). Reverb also argued against the generation of overspecific relations. OLLIE stressed two important aspects for OIE. First, it pointed to the lack of factuality of the propositions. For instance, in a sentence like "If John studies hard, he will pass the exam", it would be inaccurate to consider ("John", "will pass", "the exam") as a fact. Additionally, the authors indicated that an OIE system should be able to extract non-verb mediated relations, which account for significant portion of the information expressed in natural language text. For instance, in the sentence "Obama, the former US president, was born in Hawaii", an OIE system should be able to recognize a proposition ("Obama", "is", "former US president"). ClausIE introduced the connection between grammatical clauses, propositions, and OIE extractions. The authors stated that as each grammatical clause expresses a proposition, each verb mediated proposition can be identified by solely recognizing the set of clauses expressed in each sentence. This implies that to correctly recognize the set of propositions in an input sentence, it is necessary to understand its grammatical structure. The authors studied the case in the English language that only admits seven clause types, meaning that the identification of each proposition only requires defining seven grammatical patterns. The finding also established a separation between the recognition of the propositions and its materialization. In a first step, the proposition can be identified without any consideration of its final form, in a domain-independent and unsupervised way, mostly based on linguistic principles. In a second step, the information can be represented according to the requirements of the underlying application, without conditioning the identification phase. Consider the sentence "Albert Einstein was born in Ulm and died in Princeton". The first step will recognize the two propositions ("Albert Einstein", "was born", "in Ulm") and ("Albert Einstein", "died", "in Princeton"). Once the information has been correctly identified, the propositions can take the particular form required by the underlying application [e.g., ("Albert Einstein", "was born in", "Ulm") and ("Albert Einstein", "died in", "Princeton")]. CSD introduced the idea of minimality in OIE. It considers that computers can make better use of the extractions if they are expressed in a compact way. This is especially important in sentences with subordinate clauses. In these cases, CSD suggests the generation of nested extractions. For example, consider the sentence "The Embassy said that 6,700 Americans were in Pakistan". CSD generates two extractions [i] ("6,700 Americans", "were", "in Pakistan") and [ii] ("The Embassy", "said", "that [i]"). This is usually known as reification.

    Read more →
  • Embodied cognitive science

    Embodied cognitive science

    Embodied cognitive science is an interdisciplinary field of research, the aim of which is to explain the mechanisms underlying intelligent behavior. It comprises three main methodologies: the modeling of psychological and biological systems in a holistic manner that considers the mind and body as a single entity; the formation of a common set of general principles of intelligent behavior; and the experimental use of robotic agents in controlled environments. == Contributors == Embodied cognitive science borrows heavily from embodied philosophy and the related research fields of cognitive science, psychology, neuroscience and artificial intelligence. Contributors to the field include: From the perspective of neuroscience, Gerald Edelman of the Neurosciences Institute at La Jolla, Francisco Varela of CNRS in France, and J. A. Scott Kelso of Florida Atlantic University From the perspective of psychology, Lawrence Barsalou, Michael Turvey, Vittorio Guidano and Eleanor Rosch From the perspective of linguistics, Gilles Fauconnier, George Lakoff, Mark Johnson, Leonard Talmy and Mark Turner From the perspective of language acquisition, Eric Lenneberg and Philip Rubin at Haskins Laboratories From the perspective of anthropology, Edwin Hutchins, Bradd Shore, James Wertsch and Merlin Donald. From the perspective of autonomous agent design, early work is sometimes attributed to Rodney Brooks or Valentino Braitenberg From the perspective of artificial intelligence, Understanding Intelligence by Rolf Pfeifer and Christian Scheier or How the Body Shapes the Way We Think, by Rolf Pfeifer and Josh C. Bongard From the perspective of philosophy, Andy Clark, Dan Zahavi, Shaun Gallagher, and Evan Thompson In 1950, Alan Turing proposed that a machine may need a human-like body to think and speak: It can also be maintained that it is best to provide the machine with the best sense organs that money can buy, and then teach it to understand and speak English. That process could follow the normal teaching of a child. Things would be pointed out and named, etc. Again, I do not know what the right answer is, but I think both approaches should be tried. == Traditional cognitive theory == Embodied cognitive science is an alternative theory to cognition in which it minimizes appeals to computational theory of mind in favor of greater emphasis on how an organism's body determines how and what it thinks. Traditional cognitive theory is based mainly around symbol manipulation, in which certain inputs are fed into a processing unit that produces an output. These inputs follow certain rules of syntax, from which the processing unit finds semantic meaning. Thus, an appropriate output is produced. For example, a human's sensory organs are its input devices, and the stimuli obtained from the external environment are fed into the nervous system which serves as the processing unit. From here, the nervous system is able to read the sensory information because it follows a syntactic structure, thus an output is created. This output then creates bodily motions and brings forth behavior and cognition. Of particular note is that cognition is sealed away in the brain, meaning that mental cognition is cut off from the external world and is only possible by the input of sensory information. == The embodied cognitive approach == Embodied cognitive science differs from the traditionalist approach in that it denies the input-output system. This is chiefly due to the problems presented by the Homunculus argument, which concluded that semantic meaning could not be derived from symbols without some kind of inner interpretation. If some little man in a person's head interpreted incoming symbols, then who would interpret the little man's inputs? Because of the specter of an infinite regress, the traditionalist model began to seem less plausible. Thus, embodied cognitive science aims to avoid this problem by defining cognition in three ways. === Physical attributes of the body === The first aspect of embodied cognition examines the role of the physical body, particularly how its properties affect its ability to think. This part attempts to overcome the symbol manipulation component that is a feature of the traditionalist model. Depth perception, for instance, can be better explained under the embodied approach due to the sheer complexity of the action. Depth perception requires that the brain detect the disparate retinal images obtained by the distance of the two eyes. In addition, body and head cues complicate this further. When the head is turned in a given direction, objects in the foreground will appear to move against objects in the background. From this, it is said that some kind of visual processing is occurring without the need of any kind of symbol manipulation. This is because the objects appearing to move the foreground are simply appearing to move. This observation concludes then that depth can be perceived with no intermediate symbol manipulation necessary. A more poignant example exists through examining auditory perception. Generally speaking the greater the distance between the ears, the greater the possible auditory acuity. Also relevant is the amount of density in between the ears, for the strength of the frequency wave alters as it passes through a given medium. The brain's auditory system takes these factors into account as it process information, but again without any need for a symbolic manipulation system. This is because the distance between the ears for example does not need symbols to represent it. The distance itself creates the necessary opportunity for greater auditory acuity. The amount of density between the ears is similar, in that it is the actual amount itself that simply forms the opportunity for frequency alteration. Thus under consideration of the physical properties of the body, a symbolic system is unnecessary and an unhelpful metaphor. === The body's role in the cognitive process === The second aspect draws heavily from George Lakoff's and Mark Johnson's work on concepts. They argued that humans use metaphors whenever possible to better explain their external world. Humans also have a basic stock of concepts in which other concepts can be derived from. These basic concepts include spatial orientations such as up, down, front, and back. Humans can understand what these concepts mean because they can directly experience them from their own bodies. For example, because human movement revolves around standing erect and moving the body in an up-down motion, humans innately have these concepts of up and down. Lakoff and Johnson contend this is similar with other spatial orientations such as front and back too. As mentioned earlier, these basic stocks of spatial concepts are the basis in which other concepts are constructed. Happy and sad for instance are seen now as being up or down respectively. When someone says they are feeling down, what they are really saying is that they feel sad for example. Thus the point here is that true understanding of these concepts is contingent on whether one can have an understanding of the human body. So the argument goes that if one lacked a human body, they could not possibly know what up or down could mean, or how it could relate to emotional states. [I]magine a spherical being living outside of any gravitational field, with no knowledge or imagination of any other kind of experience. What could UP possibly mean to such a being? While this does not mean that such beings would be incapable of expressing emotions in other words, it does mean that they would express emotions differently from humans. Human concepts of happiness and sadness would be different because human would have different bodies. So then an organism's body directly affects how it can think, because it uses metaphors related to its body as the basis of concepts. === Interaction of local environment === A third component of the embodied approach looks at how agents use their immediate environment in cognitive processing. Meaning, the local environment is seen as an actual extension of the body's cognitive process. The example of a personal digital assistant (PDA) is used to better imagine this. Echoing functionalism (philosophy of mind), this point claims that mental states are individuated by their role in a much larger system. So under this premise, the information on a PDA is similar to the information stored in the brain. So then if one thinks information in the brain constitutes mental states, then it must follow that information in the PDA is a cognitive state too. Consider also the role of pen and paper in a complex multiplication problem. The pen and paper are so involved in the cognitive process of solving the problem that it seems ridiculous to say they are somehow different from the process, in very much the same way the PDA is used for information like the brain. Another example examines how humans control and manipulate their environment

    Read more →
  • SemEval

    SemEval

    SemEval (Semantic Evaluation) is an ongoing series of evaluations of computational semantic analysis systems; it evolved from the Senseval word sense evaluation series. The evaluations are intended to explore the nature of meaning in language. While meaning is intuitive to humans, transferring those intuitions to computational analysis has proved elusive. This series of evaluations provides a mechanism to characterize in more precise terms exactly what is necessary to compute in meaning. As such, the evaluations provide an emergent mechanism to identify the problems and solutions for computations with meaning. These exercises have evolved to articulate more of the dimensions that are involved in our use of language. They began with apparently simple attempts to identify word senses computationally. They have evolved to investigate the interrelationships among the elements in a sentence (e.g., semantic role labeling), relations between sentences (e.g., coreference), and the nature of what we are saying (semantic relations and sentiment analysis). The purpose of the SemEval and Senseval exercises is to evaluate semantic analysis systems. "Semantic Analysis" refers to a formal analysis of meaning, and "computational" refer to approaches that in principle support effective implementation. The first three evaluations, Senseval-1 through Senseval-3, were focused on word sense disambiguation (WSD), each time growing in the number of languages offered in the tasks and in the number of participating teams. Beginning with the fourth workshop, SemEval-2007 (SemEval-1), the nature of the tasks evolved to include semantic analysis tasks outside of word sense disambiguation. Triggered by the conception of the SEM conference, the SemEval community had decided to hold the evaluation workshops yearly in association with the SEM conference. It was also the decision that not every evaluation task will be run every year, e.g. none of the WSD tasks were included in the SemEval-2012 workshop. == History == === Early evaluation of algorithms for word sense disambiguation === From the earliest days, assessing the quality of word sense disambiguation algorithms had been primarily a matter of intrinsic evaluation, and “almost no attempts had been made to evaluate embedded WSD components”. Only very recently (2006) had extrinsic evaluations begun to provide some evidence for the value of WSD in end-user applications. Until 1990 or so, discussions of the sense disambiguation task focused mainly on illustrative examples rather than comprehensive evaluation. The early 1990s saw the beginnings of more systematic and rigorous intrinsic evaluations, including more formal experimentation on small sets of ambiguous words. === Senseval to SemEval === In April 1997, Martha Palmer and Marc Light organized a workshop entitled Tagging with Lexical Semantics: Why, What, and How? in conjunction with the Conference on Applied Natural Language Processing. At the time, there was a clear recognition that manually annotated corpora had revolutionized other areas of NLP, such as part-of-speech tagging and parsing, and that corpus-driven approaches had the potential to revolutionize automatic semantic analysis as well. Kilgarriff recalled that there was "a high degree of consensus that the field needed evaluation", and several practical proposals by Resnik and Yarowsky kicked off a discussion that led to the creation of the Senseval evaluation exercises. === SemEval's 3, 2 or 1 year(s) cycle === After SemEval-2010, many participants feel that the 3-year cycle is a long wait. Many other shared tasks such as Conference on Natural Language Learning (CoNLL) and Recognizing Textual Entailments (RTE) run annually. For this reason, the SemEval coordinators gave the opportunity for task organizers to choose between a 2-year or a 3-year cycle. The SemEval community favored the 3-year cycle. Although the votes within the SemEval community favored a 3-year cycle, organizers and coordinators had settled to split the SemEval task into 2 evaluation workshops. This was triggered by the introduction of the new SEM conference. The SemEval organizers thought it would be appropriate to associate our event with the SEM conference and collocate the SemEval workshop with the SEM conference. The organizers got very positive responses (from the task coordinators/organizers and participants) about the association with the yearly SEM, and 8 tasks were willing to switch to 2012. Thus was born SemEval-2012 and SemEval-2013. The current plan is to switch to a yearly SemEval schedule to associate it with the SEM conference but not every task needs to run every year. ==== List of Senseval and SemEval Workshops ==== Senseval-1 took place in the summer of 1998 for English, French, and Italian, culminating in a workshop held at Herstmonceux Castle, Sussex, England on September 2–4. Senseval-2 took place in the summer of 2001, and was followed by a workshop held in July 2001 in Toulouse, in conjunction with ACL 2001. Senseval-2 included tasks for Basque, Chinese, Czech, Danish, Dutch, English, Estonian, Italian, Japanese, Korean, Spanish and Swedish. Senseval-3 took place in March–April 2004, followed by a workshop held in July 2004 in Barcelona, in conjunction with ACL 2004. Senseval-3 included 14 different tasks for core word sense disambiguation, as well as identification of semantic roles, multilingual annotations, logic forms, subcategorization acquisition. SemEval-2007 (Senseval-4) took place in 2007, followed by a workshop held in conjunction with ACL in Prague. SemEval-2007 included 18 different tasks targeting the evaluation of systems for the semantic analysis of text. A special issue of Language Resources and Evaluation is devoted to the result. SemEval-2010 took place in 2010, followed by a workshop held in conjunction with ACL in Uppsala. SemEval-2010 included 18 different tasks targeting the evaluation of semantic analysis systems. SemEval-2012 took place in 2012; it was associated with the new SEM, First Joint Conference on Lexical and Computational Semantics, and co-located with NAACL, Montreal, Canada. SemEval-2012 included 8 different tasks targeting at evaluating computational semantic systems. However, there was no WSD task involved in SemEval-2012, the WSD related tasks were scheduled in the upcoming SemEval-2013. SemEval-2013 was associated with NAACL 2013, North American Association of Computational Linguistics, Georgia, USA and took place in 2013. It included 13 different tasks targeting at evaluating computational semantic systems. SemEval-2014 took place in 2014. It was co-located with COLING 2014, 25th International Conference on Computational Linguistics and SEM 2014, Second Joint Conference on Lexical and Computational Semantics, Dublin, Ireland. There were 10 different tasks in SemEval-2014 evaluating various computational semantic systems. SemEval-2015 took place in 2015. It was co-located with NAACL-HLT 2015, 2015 Conference of the North American Chapter of the Association for Computational Linguistics – Human Language Technologies and SEM 2015, Third Joint Conference on Lexical and Computational Semantics, Denver, USA. There were 17 different tasks in SemEval-2015 evaluating various computational semantic systems. == SemEval Workshop framework == The framework of the SemEval/Senseval evaluation workshops emulates the Message Understanding Conferences (MUCs) and other evaluation workshops ran by ARPA (Advanced Research Projects Agency, renamed the Defense Advanced Research Projects Agency (DARPA)). Stages of SemEval/Senseval evaluation workshops Firstly, all likely participants were invited to express their interest and participate in the exercise design. A timetable towards a final workshop was worked out. A plan for selecting evaluation materials was agreed. 'Gold standards' for the individual tasks were acquired, often human annotators were considered as a gold standard to measure precision and recall scores of computer systems. These 'gold standards' are what the computational systems strive towards. In WSD tasks, human annotators were set on the task of generating a set of correct WSD answers (i.e. the correct sense for a given word in a given context) The gold standard materials, without answers, were released to participants, who then had a short time to run their programs over them and return their sets of answers to the organizers. The organizers then scored the answers and the scores were announced and discussed at a workshop. == Semantic evaluation tasks == Senseval-1 & Senseval-2 focused on evaluation WSD systems on major languages that were available corpus and computerized dictionary. Senseval-3 looked beyond the lexemes and started to evaluate systems that looked into wider areas of semantics, such as Semantic Roles (technically known as Theta roles in formal semantics), Logic Form Transformation (commonly semantics of phrases, clauses or sentences were represented

    Read more →
  • Graph cut optimization

    Graph cut optimization

    Graph cut optimization is a combinatorial optimization method applicable to a family of functions of discrete variables, named after the concept of cut in the theory of flow networks. Thanks to the max-flow min-cut theorem, determining the minimum cut over a graph representing a flow network is equivalent to computing the maximum flow over the network. Given a pseudo-Boolean function f {\displaystyle f} , if it is possible to construct a flow network with positive weights such that each cut C {\displaystyle C} of the network can be mapped to an assignment of variables x {\displaystyle \mathbf {x} } to f {\displaystyle f} (and vice versa), and the cost of C {\displaystyle C} equals f ( x ) {\displaystyle f(\mathbf {x} )} (up to an additive constant) then it is possible to find the global optimum of f {\displaystyle f} in polynomial time by computing a minimum cut of the graph. The mapping between cuts and variable assignments is done by representing each variable with one node in the graph and, given a cut, each variable will have a value of 0 if the corresponding node belongs to the component connected to the source, or 1 if it belong to the component connected to the sink. Not all pseudo-Boolean functions can be represented by a flow network, and in the general case the global optimization problem is NP-hard. There exist sufficient conditions to characterise families of functions that can be optimised through graph cuts, such as submodular quadratic functions. Graph cut optimization can be extended to functions of discrete variables with a finite number of values, that can be approached with iterative algorithms with strong optimality properties, computing one graph cut at each iteration. Graph cut optimization is an important tool for inference over graphical models such as Markov random fields or conditional random fields, and it has applications in computer vision problems such as image segmentation, denoising, registration and stereo matching. == Representability == A pseudo-Boolean function f : { 0 , 1 } n → R {\displaystyle f:\{0,1\}^{n}\to \mathbb {R} } is said to be representable if there exists a graph G = ( V , E ) {\displaystyle G=(V,E)} with non-negative weights and with source and sink nodes s {\displaystyle s} and t {\displaystyle t} respectively, and there exists a set of nodes V 0 = { v 1 , … , v n } ⊂ V − { s , t } {\displaystyle V_{0}=\{v_{1},\dots ,v_{n}\}\subset V-\{s,t\}} such that, for each tuple of values ( x 1 , … , x n ) ∈ { 0 , 1 } n {\displaystyle (x_{1},\dots ,x_{n})\in \{0,1\}^{n}} assigned to the variables, f ( x 1 , … , x n ) {\displaystyle f(x_{1},\dots ,x_{n})} equals (up to a constant) the value of the flow determined by a minimum cut C = ( S , T ) {\displaystyle C=(S,T)} of the graph G {\displaystyle G} such that v i ∈ S {\displaystyle v_{i}\in S} if x i = 0 {\displaystyle x_{i}=0} and v i ∈ T {\displaystyle v_{i}\in T} if x i = 1 {\displaystyle x_{i}=1} . It is possible to classify pseudo-Boolean functions according to their order, determined by the maximum number of variables contributing to each single term. All first order functions, where each term depends upon at most one variable, are always representable. Quadratic functions f ( x ) = w 0 + ∑ i w i ( x i ) + ∑ i < j w i j ( x i , x j ) . {\displaystyle f(\mathbf {x} )=w_{0}+\sum _{i}w_{i}(x_{i})+\sum _{i 0 {\displaystyle p>0} then w i j k ( x i , x j , x k ) = w i j k ( 0 , 0 , 0 ) + p 1 ( x i − 1 ) + p 2 ( x j − 1 ) + p 3 ( x k − 1 ) + p 23 ( x j − 1 ) x k + p 31 x i ( x k − 1 ) + p 12 ( x i − 1 ) x j − p x i x j x k {\displaystyle w_{ijk}(x_{i},x_{j},x_{k})=w_{ijk}(0,0,0)+p_{1}(x_{i}-1)+p_{2}(x_{j}-1)+p_{3}(x_{k}-1)+p_{23}(x_{j}-1)x_{k}+p_{31}x_{i}(x_{k}-1)+p_{12}(x_{i}-1)x_{j}-px_{i}x_{j}x_{k}} with p 1 = w i j k ( 1 , 0 , 1 ) − w i j k ( 0 , 0 , 1 ) p 2 = w i j k ( 1 , 1 , 0 ) − w i j k ( 1 , 0 , 1 ) p 3 = w i j k ( 0 , 1 , 1 ) − w i j k ( 0 , 1 , 0 ) p 23 = w i j k ( 0 , 0 , 1 ) + w i j k ( 0 , 1 , 0 ) − w i j k ( 0 , 0 , 0 ) − w i j k ( 0 , 1 , 1 ) p 31 = w i j k ( 0 , 0 , 1 ) + w i j k ( 1 , 0 , 0 ) − w i j k ( 0 , 0 , 0 ) − w i j k ( 1 , 0 , 1 ) p 12 = w i j k ( 0 , 1 , 0 ) + w i j k ( 1 , 0 , 0 ) − w i j k ( 0 , 0 , 0 ) − w i j k ( 1 , 1 , 0 ) . {\displaystyle {\begin{aligned}p_{1}&=w_{ijk}(1,0,1)-w_{ijk}(0,0,1)\\p_{2}&=w_{ijk}(1,1,0)-w_{ijk}(1,0,1)\\p_{3}&=w_{ijk}(0,1,1)-w_{ijk}(0,1,0)\\p_{23}&=w_{ijk}(0,0,1)+w_{ijk}(0,1,0)-w_{ijk}(0,0,0)-w_{ijk}(0,1,1)\\p_{31}&=w_{ijk}(0,0,1)+w_{ijk}(1,0,0)-w_{ijk}(0,0,0)-w_{ijk}(1,0,1)\\p_{12}&=w_{ijk}(0,1,0)+w_{ijk}(1,0,0)-w_{ijk}(0,0,0)-w_{ijk}(1,1

    Read more →
  • Racter

    Racter

    Racter is an artificial intelligence program that generates English language prose at random. It was published by Mindscape for IBM PC compatibles in 1984, then for the Apple II, Mac, and Amiga. An expanded version of the software, not the one released through Mindscape, was used to generate the text for the published book The Policeman's Beard Is Half Constructed. == History == Racter, short for raconteur, was written by William Chamberlain and Thomas Etter. Racter's initial creation was the short story Soft Ions, which appeared in the October 1981 issue of Omni (magazine). The publication's editors bought the story in January 1980, before it had even been written. In exchange for the rights, the editors offered financial support to Chamberlain and Etter so the two could refine Racter. In 1983, Racter produced a book called The Policeman's Beard Is Half Constructed (ISBN 0-446-38051-2). The program originally was written for an OSI which only supported file names at most six characters long, causing the name to be shorted to Racter and it was later adapted to run on a CP/M machine where it was written in "compiled ASIC on a Z80 microcomputer with 64K of RAM." This version, the program that allegedly wrote the book, was not released to the general public. The sophistication claimed for the program was likely exaggerated, as could be seen by investigation of the template system of text generation. In 1984, Mindscape released an interactive version of Racter, developed by Inrac Corporation, for IBM PC compatibles, and it was ported to the Apple II, Mac, and Amiga. The published Racter was similar to a chatterbot. The BASIC program that was released by Mindscape was far less sophisticated than anything that could have written the fairly sophisticated prose of The Policeman's Beard. The commercial version of Racter could be likened to a computerized version of Mad Libs, the game in which you fill in the blanks in advance and then plug them into a text template to produce a surrealistic tale. The commercial program attempted to parse text inputs, identifying significant nouns and verbs, which it would then regurgitate to create "conversations", plugging the input from the user into phrase templates which it then combined, along with modules that conjugated English verbs. By contrast, the text in The Policeman's Beard, apart from being edited from a large amount of output, would have been the product of Chamberlain's own specialized templates and modules, which were not included in the commercial release of the program. == Reception == The Boston Phoenix called the story Soft Ions "schematic nonsense. But the scheme is obvious enough and the nonsense accessible enough to an attentive reader that one can almost believe Chamberlain when he predicts that before long Racter will be ready to write for the pulp-reading public." PC Magazine described some of Policeman's Beard's scenes as "surprising for their frankness" and "reflective". It concluded that the book was "whimsical and wise and sometimes fun". Computer Gaming World described Racter as "a diversion into another dimension that might best be seen before paying the price of a ticket. (Try before you buy!)" A 1985 review of the program in The New York Times notes that, "As computers move ever closer to artificial intelligence, Racter is on the edge of artificial insanity." It also states that Racter's "always-changing sentences are grammatically correct, often funny and, for a computer, sometimes profound." The article includes examples showing interaction with Racter, most often Racter asking the user questions. == Reviews == Jeux & Stratégie #47

    Read more →
  • Neural field

    Neural field

    In machine learning, a neural field (also known as implicit neural representation, neural implicit, or coordinate-based neural network), is a mathematical field that is fully or partially parametrized by a neural network. Initially developed to tackle visual computing tasks, such as rendering or reconstruction (e.g., neural radiance fields), neural fields emerged as a promising strategy to deal with a wider range of problems, including surrogate modelling of partial differential equations, such as in physics-informed neural networks. Differently from traditional machine learning algorithms, such as feed-forward neural networks, convolutional neural networks, or transformers, neural fields do not work with discrete data (e.g. sequences, images, tokens), but map continuous inputs (e.g., spatial coordinates, time) to continuous outputs (i.e., scalars, vectors, etc.). This makes neural fields not only discretization independent, but also easily differentiable. Moreover, dealing with continuous data allows for a significant reduction in space complexity, which translates to a much more lightweight network. == Formulation and training == According to the universal approximation theorem, provided adequate learning, sufficient number of hidden units, and the presence of a deterministic relationship between the input and the output, a neural network can approximate any function to any degree of accuracy. Hence, in mathematical terms, given a field y = Φ ( x ) {\textstyle {\boldsymbol {y}}=\Phi ({\boldsymbol {x}})} , with x ∈ R n {\displaystyle {\boldsymbol {x}}\in \mathbb {R} ^{n}} and y ∈ R m {\displaystyle {\boldsymbol {y}}\in \mathbb {R} ^{m}} , a neural field Ψ θ {\displaystyle \Psi _{\theta }} , with parameters θ {\displaystyle {\boldsymbol {\theta }}} , is such that: Ψ θ ( x ) = y ^ ≈ y {\displaystyle \Psi _{\theta }({\boldsymbol {x}})={\hat {\boldsymbol {y}}}\approx {\boldsymbol {y}}} === Training === For supervised tasks, given N {\displaystyle N} examples in the training dataset (i.e., ( x i , y i ) ∈ D t r a i n , i = 1 , … , N {\displaystyle ({\boldsymbol {x_{i}}},{\boldsymbol {y_{i}}})\in {\mathcal {D_{train}}},i=1,\dots ,N} ), the neural field parameters can be learned by minimizing a loss function L {\displaystyle {\mathcal {L}}} (e.g., mean squared error). The parameters θ ~ {\displaystyle {\tilde {\theta }}} that satisfy the optimization problem are found as: θ ~ = argmin θ 1 N ∑ ( x i , y i ) ∈ D t r a i n L ( Ψ θ ( x i ) , y i ) {\displaystyle {\tilde {\boldsymbol {\theta }}}={\underset {\boldsymbol {\theta }}{\text{argmin}}}\;{\frac {1}{N}}\sum _{({\boldsymbol {x_{i}}},{\boldsymbol {y_{i}}})\in {\mathcal {D_{train}}}}{\mathcal {L}}(\Psi _{\theta }({\boldsymbol {x}}_{i}),{\boldsymbol {y}}_{i})} Notably, it is not necessary to know the analytical expression of Φ {\displaystyle \Phi } , for the previously reported training procedure only requires input-output pairs. Indeed, a neural field is able to offer a continuous and differentiable surrogate of the true field, even from purely experimental data. Moreover, neural fields can be used in unsupervised settings, with training objectives that depend on the specific task. For example, physics-informed neural networks may be trained on just the residual. === Spectral bias === As for any artificial neural network, neural fields may be characterized by a spectral bias (i.e., the tendency to preferably learn the low frequency content of a field), possibly leading to a poor representation of the ground truth. In order to overcome this limitation, several strategies have been developed. For example, SIREN uses sinusoidal activations, while the Fourier-features approach embeds the input through sines and cosines. == Conditional neural fields == In many real-world cases, however, learning a single field is not enough. For example, when reconstructing 3D vehicle shapes from Lidar data, it is desirable to have a machine learning model that can work with arbitrary shapes (e.g., a car, a bicycle, a truck, etc.). The solution is to include additional parameters, the latent variables (or latent code) z ∈ R d {\displaystyle {\boldsymbol {z}}\in \mathbb {R} ^{d}} , to vary the field and adapt it to diverse tasks. === Latent code production === When dealing with conditional neural fields, the first design choice is represented by the way in which the latent code is produced. Specifically, two main strategies can be identified: Encoder: the latent code is the output of a second neural network, acting as an encoder. During training, the loss function is the objective used to learn the parameters of both the neural field and the encoder. Auto-decoding: each training example has its own latent code, jointly trained with the neural field parameters. When the model has to process new examples (i.e., not originally present in the training dataset), a small optimization problem is solved, keeping the network parameters fixed and only learning the new latent variables. Since the latter strategy requires additional optimization steps at inference time, it sacrifices speed, but keeps the overall model smaller. Moreover, despite being simpler to implement, an encoder may harm the generalization capabilities of the model. For example, when dealing with a physical scalar field f : R 2 → R {\displaystyle f:\mathbb {R} ^{2}\rightarrow \mathbb {R} } (e.g., the pressure of a 2D fluid), an auto-decoder-based conditional neural field can map a single point to the corresponding value of the field, following a learned latent code z {\displaystyle {\boldsymbol {z}}} . However, if the latent variables were produced by an encoder, it would require access to the entire set of points and corresponding values (e.g. as a regular grid or a mesh graph), leading to a less robust model. === Global and local conditioning === In a neural field with global conditioning, the latent code does not depend on the input and, hence, it offers a global representation (e.g., the overall shape of a vehicle). However, depending on the task, it may be more useful to divide the domain of x {\displaystyle {\boldsymbol {x}}} in several subdomains, and learn different latent codes for each of them (e.g., splitting a large and complex scene in sub-scenes for a more efficient rendering). This is called local conditioning. === Conditioning strategies === There are several strategies to include the conditioning information in the neural field. In the general mathematical framework, conditioning the neural field with the latent variables is equivalent to mapping them to a subset θ ∗ {\displaystyle {\boldsymbol {\theta }}^{}} of the neural field parameters: θ ∗ = Γ ( z ) {\displaystyle {\boldsymbol {\theta }}^{}=\Gamma ({\boldsymbol {z}})} In practice, notable strategies are: Concatenation: the neural field receives, as input, the concatenation of the original input x {\displaystyle {\boldsymbol {x}}} with the latent codes z {\displaystyle {\boldsymbol {z}}} . For feed-forward neural networks, this is equivalent to setting θ ∗ {\displaystyle {\boldsymbol {\theta }}^{}} as the bias of the first layer and Γ ( z ) {\displaystyle \Gamma ({\boldsymbol {z}})} as an affine transformation. Hypernetworks: a hypernetwork is a neural network that outputs the parameters of another neural network. Specifically, it consists of approximating Γ ( z ) {\displaystyle \Gamma ({\boldsymbol {z}})} with a neural network Γ ^ γ ( z ) {\displaystyle {\hat {\Gamma }}_{\gamma }({\boldsymbol {z}})} , where γ {\displaystyle {\boldsymbol {\gamma }}} are the trainable parameters of the hypernetwork. This approach is the most general, as it allows to learn the optimal mapping from latent codes to neural field parameters. However, hypernetworks are associated to larger computational and memory complexity, due to the large number of trainable parameters. Hence, leaner approaches have been developed. For example, in the Feature-wise Linear Modulation (FiLM), the hypernetwork only produces scale and bias coefficients for the neural field layers. === Meta-learning === Instead of relying on the latent code to adapt the neural field to a specific task, it is also possible to exploit gradient-based meta-learning. In this case, the neural field is seen as the specialization of an underlying meta-neural-field, whose parameters are modified to fit the specific task, through a few steps of gradient descent. An extension of this meta-learning framework is the CAVIA algorithm, that splits the trainable parameters in context-specific and shared groups, improving parallelization and interpretability, while reducing meta-overfitting. This strategy is similar to the auto-decoding conditional neural field, but the training procedure is substantially different. == Applications == Thanks to the possibility of efficiently modelling diverse mathematical fields with neural networks, neural fields have been applied to a wide range of problems: 3D scene reconstruction: neural fields can be used to model t

    Read more →
  • Legendre moment

    Legendre moment

    In mathematics, Legendre moments are a type of image moment and are achieved by using the Legendre polynomial. Legendre moments are used in areas of image processing including: pattern and object recognition, image indexing, line fitting, feature extraction, edge detection, and texture analysis. Legendre moments have been studied as a means to reduce image moment calculation complexity by limiting the amount of information redundancy through approximation. == Legendre moments == Source: With order of m + n, and object intensity function f(x,y): L m n = ( 2 m + 1 ) ( 2 n + 1 ) 4 ∫ − 1 1 ∫ − 1 1 P m ( x ) P n ( y ) f ( x , y ) d x d y {\displaystyle L_{mn}={\frac {(2m+1)(2n+1)}{4}}\int \limits _{-1}^{1}\int \limits _{-1}^{1}P_{m}(x)P_{n}(y)f(x,y)\,dx\,dy} where m,n = 1, 2, 3, ...∞ with the nth-order Legendre polynomials being: P n ( x ) = ∑ k = 0 n a k , n x k = ( − 1 ) n 2 n n ! ( d d x ) [ ( 1 − x 2 ) n ] {\displaystyle P_{n}(x)=\sum _{k=0}^{n}a_{k,n}x^{k}={\frac {(-1)^{n}}{2^{n}n!}}\left({\frac {d}{dx}}\right)[(1-x^{2})^{n}]} which can also be written: P n ( x ) = ∑ k = 0 D ( n ) ( − 1 ) k ( 2 n − 2 k ) ! 2 n k ! ( n − k ) ! ( n − 2 k ) ! x n − 2 k = ( 2 n ) ! 2 n ( n ! ) 2 x n − ( 2 n − 2 ) ! 2 n 1 ! ( n − 1 ) ! ( n − 2 ) ! x n − 2 + ⋯ {\displaystyle {\begin{aligned}P_{n}(x)&=\sum _{k=0}^{D(n)}(-1)^{k}{\frac {(2n-2k)!}{2^{n}k!(n-k)!(n-2k)!}}x^{n-2k}\\[5pt]&={\frac {(2n)!}{2^{n}(n!)^{2}}}x^{n}-{\frac {(2n-2)!}{2^{n}1!(n-1)!(n-2)!}}x^{n-2}+\cdots \end{aligned}}} where D(n) = floor(n/2). The set of Legendre polynomials {Pn(x)} form an orthogonal set on the interval [−1,1]: ∫ − 1 1 P n ( x ) P m ( x ) d x = 2 2 n + 1 δ n m {\displaystyle \int _{-1}^{1}P_{n}(x)P_{m}(x)\,dx={\frac {2}{2n+1}}\delta _{nm}} A recurrence relation can be used to compute the Legendre polynomial: ( n + 1 ) P n + 1 ( x ) − ( 2 n + 1 ) x P n ( x ) + n P n − 1 ( x ) = 0 {\displaystyle (n+1)P_{n+1}(x)-(2n+1)xP_{n}(x)+nP_{n-1}(x)=0} f(x,y) can be written as an infinite series expansion in terms of Legendre polynomials [−1 ≤ x,y ≤ 1.]: f ( x , y ) = ∑ m = 0 ∞ ∑ n = 0 ∞ λ m n P m ( x ) P n ( y ) {\displaystyle f(x,y)=\sum _{m=0}^{\infty }\sum _{n=0}^{\infty }\lambda _{mn}P_{m}(x)P_{n}(y)}

    Read more →
  • Natural language understanding

    Natural language understanding

    Natural language understanding (NLU) or natural language interpretation (NLI) is a subset of natural language processing in artificial intelligence that deals with machine reading comprehension. NLU has been considered an AI-hard problem. There is considerable commercial interest in the field because of its application to automated reasoning, machine translation, question answering, news-gathering, text categorization, voice-activation, archiving, and large-scale content analysis. == History == The program STUDENT, written in 1964 by Daniel Bobrow for his PhD dissertation at MIT, is one of the earliest known attempts at NLU by a computer. Eight years after John McCarthy coined the term artificial intelligence, Bobrow's dissertation (titled Natural Language Input for a Computer Problem Solving System) showed how a computer could understand simple natural language input to solve algebra word problems. A year later, in 1965, Joseph Weizenbaum at MIT wrote ELIZA, an interactive program that carried on a dialogue in English on any topic, the most popular being psychotherapy. ELIZA worked by simple parsing and substitution of key words into canned phrases and Weizenbaum sidestepped the problem of giving the program a database of real-world knowledge or a rich lexicon. Yet ELIZA gained surprising popularity as a toy project and can be seen as a very early precursor to current commercial systems such as those used by Ask.com. In 1969, Roger Schank at Stanford University introduced the conceptual dependency theory for NLU. This model, partially influenced by the work of Sydney Lamb, was extensively used by Schank's students at Yale University, such as Robert Wilensky, Wendy Lehnert, and Janet Kolodner. In 1970, William A. Woods introduced the augmented transition network (ATN) to represent natural language input. Instead of phrase structure rules ATNs used an equivalent set of finite-state automata that were called recursively. ATNs and their more general format called "generalized ATNs" continued to be used for a number of years. In 1971, Terry Winograd finished writing SHRDLU for his PhD thesis at MIT. SHRDLU could understand simple English sentences in a restricted world of children's blocks to direct a robotic arm to move items. The successful demonstration of SHRDLU provided significant momentum for continued research in the field. Winograd continued to be a major influence in the field with the publication of his book Language as a Cognitive Process. At Stanford, Winograd would later advise Larry Page, who co-founded Google. In the 1970s and 1980s, the natural language processing group at SRI International continued research and development in the field. A number of commercial efforts based on the research were undertaken, e.g., in 1982 Gary Hendrix formed Symantec Corporation originally as a company for developing a natural language interface for database queries on personal computers. However, with the advent of mouse-driven graphical user interfaces, Symantec changed direction. A number of other commercial efforts were started around the same time, e.g., Larry R. Harris at the Artificial Intelligence Corporation and Roger Schank and his students at Cognitive Systems Corp. In 1983, Michael Dyer developed the BORIS system at Yale which bore similarities to the work of Roger Schank and W. G. Lehnert. The third millennium saw the introduction of systems using machine learning for text classification, such as the IBM Watson. However, experts debate how much "understanding" such systems demonstrate: e.g., according to John Searle, Watson did not even understand the questions. John Ball, cognitive scientist and inventor of the Patom Theory, supports this assessment. Natural language processing has made inroads for applications to support human productivity in service and e-commerce, but this has largely been made possible by narrowing the scope of the application. There are thousands of ways to request something in a human language that still defies conventional natural language processing. According to Wibe Wagemans, "To have a meaningful conversation with machines is only possible when we match every word to the correct meaning based on the meanings of the other words in the sentence – just like a 3-year-old does without guesswork." == Scope and context == The umbrella term "natural language understanding" can be applied to a diverse set of computer applications, ranging from small, relatively simple tasks such as short commands issued to robots, to highly complex endeavors such as the full comprehension of newspaper articles or poetry passages. Many real-world applications fall between the two extremes, for instance text classification for the automatic analysis of emails and their routing to a suitable department in a corporation does not require an in-depth understanding of the text, but needs to deal with a much larger vocabulary and more diverse syntax than the management of simple queries to database tables with fixed schemata. Throughout the years various attempts at processing natural language or English-like sentences presented to computers have taken place at varying degrees of complexity. Some attempts have not resulted in systems with deep understanding, but have helped overall system usability. For example, Wayne Ratliff originally developed the Vulcan program with an English-like syntax to mimic the English speaking computer in Star Trek. Vulcan later became the dBase system whose easy-to-use syntax effectively launched the personal computer database industry. Systems with an easy-to-use or English-like syntax are, however, quite distinct from systems that use a rich lexicon and include an internal representation (often as first order logic) of the semantics of natural language sentences. Hence the breadth and depth of "understanding" aimed at by a system determine both the complexity of the system (and the implied challenges) and the types of applications it can deal with. The "breadth" of a system is measured by the sizes of its vocabulary and grammar. The "depth" is measured by the degree to which its understanding approximates that of a fluent native speaker. At the narrowest and shallowest, English-like command interpreters require minimal complexity, but have a small range of applications. Narrow but deep systems explore and model mechanisms of understanding, but they still have limited application. Systems that attempt to understand the contents of a document such as a news release beyond simple keyword matching and to judge its suitability for a user are broader and require significant complexity, but they are still somewhat shallow. Systems that are both very broad and very deep are beyond the current state of the art. == Components and architecture == Regardless of the approach used, most NLU systems share some common components. The system needs a lexicon of the language and a parser and grammar rules to break sentences into an internal representation. The construction of a rich lexicon with a suitable ontology requires significant effort, e.g., the Wordnet lexicon required many person-years of effort. The system also needs theory from semantics to guide the comprehension. The interpretation capabilities of a language-understanding system depend on the semantic theory it uses. Competing semantic theories of language have specific trade-offs in their suitability as the basis of computer-automated semantic interpretation. These range from naive semantics or stochastic semantic analysis to the use of pragmatics to derive meaning from context. Semantic parsers convert natural-language texts into formal meaning representations. Advanced applications of NLU also attempt to incorporate logical inference within their framework. This is generally achieved by mapping the derived meaning into a set of assertions in predicate logic, then using logical deduction to arrive at conclusions. Therefore, systems based on functional languages such as Lisp need to include a subsystem to represent logical assertions, while logic-oriented systems such as those using the language Prolog generally rely on an extension of the built-in logical representation framework. The management of context in NLU can present special challenges. A large variety of examples and counter examples have resulted in multiple approaches to the formal modeling of context, each with specific strengths and weaknesses.

    Read more →