AI Generator App Free

AI Generator App Free — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Automated negotiation

    Automated negotiation

    Automated negotiation is a form of interaction in systems that are composed of multiple autonomous agents, in which the aim is to reach agreements through an iterative process of making offers. Automated negotiation can be employed for many tasks human negotiators regularly engage in, such as bargaining and joint decision making. The main topics in automated negotiation revolve around the design of protocols and negotiating strategies. == History == Through digitization, the beginning of the 21st century has seen a growing interest in the automation of negotiation and e-negotiation systems, for example in the setting of e-commerce. This interest is fueled by the promise of automated agents being able to negotiate on behalf of human negotiators, and to find better outcomes than human negotiators. == Examples == Examples of automated negotiation include: Online dispute resolution, in which disagreements between parties are settled. Sponsored search auction, where bids are placed on advertisement keywords. Content negotiation, in which user agents negotiate over HTTP about how to best represent a web resource. Negotiation support systems, in which negotiation decision-making activities are supported by an information system.

    Read more →
  • Chelsea Finn

    Chelsea Finn

    Chelsea Finn (born October 8, 1992) is an American computer scientist and assistant professor at Stanford University. Her research investigates intelligence through the interactions of robots, with the hope to create robotic systems that can learn how to learn. She previously worked for Google and currently is a co-founder of the startup Physical Intelligence. == Early life and education == Finn was an undergraduate student in electrical engineering and computer science at Massachusetts Institute of Technology. She then moved to the University of California, Berkeley, where she earned her Ph.D. in 2018 under Pieter Abbeel and Sergey Levine. Her work in the Berkeley Artificial Intelligence Lab (BAIR) focused on gradient based algorithms . Such algorithms allow machines to 'learn to learn', more akin to human learning than traditional machine learning systems. These “meta-learning” techniques train machines to quickly adapt, such that when they encounter new scenarios they can learn quickly. As a doctoral student she worked as an intern at Google Brain, where she worked on robot learning algorithms from deep predictive models. She delivered a massive open online course on deep reinforcement learning. She was the first woman to win the C.V. & Daulat Ramamoorthy Distinguished Research Award. == Research and career == Finn investigates the capabilities of robots to develop intelligence through learning and interaction. She has made use of deep learning algorithms to simultaneously learn visual perception and control robotic skills. She developed meta-learning approaches to train neural networks to take in student code and output useful feedback. She showed that the system could quickly adapt without too much input from the instructor. She trialled the programme on Code in Place, a 12,000 student course delivered by Stanford University every year. She found that 97.9% of the time the students agreed with the feedback being given. == Awards and honors == 2016 C.V. & Daulat Ramamoorthy Distinguished Research Award 2017 Electrical engineering and computer science rising star 2018 MIT Technology Review 35 Under 35 2018 ACM Doctoral Dissertation Award 2020 Samsung Advanced Institute of Technology AI Researcher of the Year 2020 Intel Rising Star Faculty Award 2021 Office of Naval Research Young Investigator Award 2022 IEEE Robotics and Automation Society Early Academic Career Award == Select publications == Finn, Chelsea; Abbeel, Pieter; Levine, Sergey (2017-07-17). "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks". International Conference on Machine Learning. PMLR: 1126–1135. arXiv:1703.03400. Sergey Levine; Chelsea Finn; Trevor Darrell; Pieter Abbeel (2016). "End-to-End Training of Deep Visuomotor Policies". Journal of Machine Learning Research. 17 (39): 1–40. arXiv:1504.00702. ISSN 1533-7928. Wikidata Q90313375. Chelsea Finn; Ian Goodfellow; Sergey Levine (2016). "Unsupervised Learning for Physical Interaction through Video Prediction" (PDF). Advances in Neural Information Processing Systems 29. Advances in Neural Information Processing Systems. Wikidata Q46993574.

    Read more →
  • Markov partition

    Markov partition

    A Markov partition in mathematics is a tool used in dynamical systems theory, allowing the methods of symbolic dynamics to be applied to the study of hyperbolic dynamics. By using a Markov partition, the system can be made to resemble a discrete-time Markov process, with the long-term dynamical characteristics of the system represented as a Markov shift. The appellation 'Markov' is appropriate because the resulting dynamics of the system obeys the Markov property. The Markov partition thus allows standard techniques from symbolic dynamics to be applied, including the computation of expectation values, correlations, topological entropy, topological zeta functions, Fredholm determinants and the like. == Motivation == Let ( M , φ ) {\displaystyle (M,\varphi )} be a discrete dynamical system. A basic method of studying its dynamics is to find a symbolic representation: a faithful encoding of the points of M {\displaystyle M} by sequences of symbols such that the map φ {\displaystyle \varphi } becomes the shift map. Suppose that M {\displaystyle M} has been divided into a number of pieces E 1 , E 2 , … , E r {\displaystyle E_{1},E_{2},\ldots ,E_{r}} which are thought to be as small and localized, with virtually no overlaps. The behavior of a point x {\displaystyle x} under the iterates of φ {\displaystyle \varphi } can be tracked by recording, for each n {\displaystyle n} , the part E i {\displaystyle E_{i}} which contains φ n ( x ) {\displaystyle \varphi ^{n}(x)} . This results in an infinite sequence on the alphabet { 1 , 2 , … , r } {\displaystyle \{1,2,\ldots ,r\}} which encodes the point. In general, this encoding may be imprecise (the same sequence may represent many different points) and the set of sequences which arise in this way may be difficult to describe. Under certain conditions, which are made explicit in the rigorous definition of a Markov partition, the assignment of the sequence to a point of M {\displaystyle M} becomes an almost one-to-one map whose image is a symbolic dynamical system of a special kind called a shift of finite type. In this case, the symbolic representation is a powerful tool for investigating the properties of the dynamical system ( M , φ ) {\displaystyle (M,\varphi )} . == Formal definition == A Markov partition is a finite cover of the invariant set of the manifold by a set of curvilinear rectangles { E 1 , E 2 , … , E r } {\displaystyle \{E_{1},E_{2},\ldots ,E_{r}\}} such that For any pair of points x , y ∈ E i {\displaystyle x,y\in E_{i}} , that W s ( x ) ∩ W u ( y ) ∈ E i {\displaystyle W_{s}(x)\cap W_{u}(y)\in E_{i}} Int ⁡ E i ∩ Int ⁡ E j = ∅ {\displaystyle \operatorname {Int} E_{i}\cap \operatorname {Int} E_{j}=\emptyset } for i ≠ j {\displaystyle i\neq j} If x ∈ Int ⁡ E i {\displaystyle x\in \operatorname {Int} E_{i}} and φ ( x ) ∈ Int ⁡ E j {\displaystyle \varphi (x)\in \operatorname {Int} E_{j}} , then φ [ W u ( x ) ∩ E i ] ⊃ W u ( φ x ) ∩ E j {\displaystyle \varphi \left[W_{u}(x)\cap E_{i}\right]\supset W_{u}(\varphi x)\cap E_{j}} φ [ W s ( x ) ∩ E i ] ⊂ W s ( φ x ) ∩ E j {\displaystyle \varphi \left[W_{s}(x)\cap E_{i}\right]\subset W_{s}(\varphi x)\cap E_{j}} Here, W u ( x ) {\displaystyle W_{u}(x)} and W s ( x ) {\displaystyle W_{s}(x)} are the unstable and stable manifolds of x, respectively, and Int ⁡ E i {\displaystyle \operatorname {Int} E_{i}} simply denotes the interior of E i {\displaystyle E_{i}} . These last two conditions can be understood as a statement of the Markov property for the symbolic dynamics; that is, the movement of a trajectory from one open cover to the next is determined only by the most recent cover, and not the history of the system. It is this property of the covering that merits the 'Markov' appellation. The resulting dynamics is that of a Markov shift; that this is indeed the case is due to theorems by Yakov Sinai (1968) and Rufus Bowen (1975), thus putting symbolic dynamics on a firm footing. Variants of the definition are found, corresponding to conditions on the geometry of the pieces E i {\displaystyle E_{i}} . == Examples == Markov partitions have been constructed in several situations. Anosov diffeomorphisms of the torus. Dynamical billiards, in which case the covering is countable. Markov partitions make homoclinic and heteroclinic orbits particularly easy to describe. The system ( [ 0 , 1 ) , x ↦ 2 x m o d 1 ) {\displaystyle ([0,1),x\mapsto 2x\ mod\ 1)} has the Markov partition E 0 = ( 0 , 1 / 2 ) , E 1 = ( 1 / 2 , 1 ) {\displaystyle E_{0}=(0,1/2),E_{1}=(1/2,1)} , and in this case the symbolic representation of a real number in [ 0 , 1 ) {\displaystyle [0,1)} is its binary expansion. For example: x ∈ E 0 , T x ∈ E 1 , T 2 x ∈ E 1 , T 3 x ∈ E 1 , T 4 x ∈ E 0 ⇒ x = ( 0.01110... ) 2 {\displaystyle x\in E_{0},Tx\in E_{1},T^{2}x\in E_{1},T^{3}x\in E_{1},T^{4}x\in E_{0}\Rightarrow x=(0.01110...)_{2}} . The assignment of points of [ 0 , 1 ) {\displaystyle [0,1)} to their sequences in the Markov partition is well defined except on the dyadic rationals - morally speaking, this is because ( 0.01111 … ) 2 = ( 0.10000 … ) 2 {\displaystyle (0.01111\dots )_{2}=(0.10000\dots )_{2}} , in the same way as 1 = 0.999 … {\displaystyle 1=0.999\dots } in decimal expansions.

    Read more →
  • Jürgen Schmidhuber

    Jürgen Schmidhuber

    Jürgen Schmidhuber (born 17 January 1963) is a German computer scientist noted for his work in the field of artificial intelligence, specifically artificial neural networks. He has been described by media outlets as a leading pioneer of modern artificial intelligence. He is a scientific director of the Dalle Molle Institute for Artificial Intelligence Research in Switzerland. He is also director of the Artificial Intelligence Initiative and professor of the Computer Science program in the Computer, Electrical, and Mathematical Sciences and Engineering (CEMSE) division at the King Abdullah University of Science and Technology (KAUST) in Saudi Arabia. He is best known for his work on long short-term memory (LSTM), a type of neural network architecture which was the dominant technique for various natural language processing tasks in research and commercial applications in the 2010s. He also introduced principles of dynamic neural networks, meta-learning, generative adversarial networks and linear transformers, all of which are widespread in modern AI. == Career == Schmidhuber completed his undergraduate (1987) and PhD (1991) studies at the Technical University of Munich in Munich, Germany. His PhD advisors were Wilfried Brauer and Klaus Schulten. He taught there from 2004 until 2009. From 2009 to 2021, he was a professor of artificial intelligence at the Università della Svizzera Italiana in Lugano, Switzerland. He has served as the director of Dalle Molle Institute for Artificial Intelligence Research (IDSIA), a Swiss AI lab, since 1995. Since 2021, he has also been the director of the AI Initiative at the King Abdullah University of Science and Technology (KAUST). In 2014, Schmidhuber formed a company, NNAISENSE, to work on commercial applications of artificial intelligence in fields such as finance, heavy industry and self-driving cars. Sepp Hochreiter, Jaan Tallinn, and Marcus Hutter are advisers to the company. Sales were under US$11 million in 2016; however, Schmidhuber states that the current emphasis is on research and not revenue. NNAISENSE raised its first round of capital funding in January 2017. Schmidhuber's overall goal is to create an all-purpose AI by training a single AI in sequence on a variety of narrow tasks, but as of 2026 he has said that the focus of NNAISENSE has shifted from artificial general intelligence to asset management. == Research == In the 1980s, backpropagation did not work well for deep learning with long credit assignment paths in artificial neural networks. To overcome this problem, Schmidhuber (1991) proposed a hierarchy of recurrent neural networks (RNNs) pre-trained one level at a time by self-supervised learning. It uses predictive coding to learn internal representations at multiple self-organizing time scales, facilitating downstream deep learning. The RNN hierarchy can be collapsed into a single RNN, by distilling a higher level chunker network into a lower level automatizer network. In 1993, a chunker solved a deep learning task whose depth exceeded 1000. In 1991, Schmidhuber published adversarial neural networks that contest with each other in the form of a zero-sum game, where one network's gain is the other network's loss. The first network is a generative model that models a probability distribution over output patterns. The second network learns by gradient descent to predict the reactions of the environment to these patterns. This was called "artificial curiosity". In 2014, this principle was used in the creation of the generative adversarial network, which Schmidhuber describes as a special case of artificial curiosity where the environmental reaction is 1 or 0 depending on whether the first network's output is in a given set. Schmidhuber supervised the 1991 diploma thesis of his student Sepp Hochreiter which he considered "one of the most important documents in the history of machine learning". It studied the neural history compressor and analyzed and overcame the vanishing gradient problem. This led to the creation of long short-term memory (LSTM), a type of recurrent neural network. The name LSTM was introduced in a tech report in 1995, leading to the most cited LSTM publication, published in 1997 and co-authored by Hochreiter and Schmidhuber. The standard LSTM architecture was introduced in 2000 by Felix Gers, Schmidhuber, and Fred Cummins. Today's "vanilla LSTM" using backpropagation through time was published with his student Alex Graves in 2005, and its connectionist temporal classification (CTC) training algorithm in 2006. CTC was applied to end-to-end speech recognition with LSTM. In 2014, the state of the art was training “very deep neural network” with 20 to 30 layers. Stacking too many layers led to a steep reduction in training accuracy, known as the "degradation" problem. In May 2015, Rupesh Kumar Srivastava, Klaus Greff, and Schmidhuber used LSTM principles to create the highway network, a feedforward neural network with hundreds of layers, much deeper than previous networks. In Dec 2015, the residual neural network (ResNet) was published, which is a variant of the highway network. In 1992, Schmidhuber published fast weights programmer, an alternative to recurrent neural networks. It has a slow feedforward neural network that learns by gradient descent to control the fast weights of another neural network through outer products of self-generated activation patterns, and the fast weights network itself operates over inputs. This was later shown to be equivalent to the unnormalized linear transformer. In 2011, Schmidhuber's team at IDSIA with his postdoc Dan Ciresan also achieved dramatic speedups of convolutional neural networks (CNNs) using graphics processing units (GPUs), based on CNN designs introduced much earlier by Kunihiko Fukushima. An earlier CNN on GPU by Chellapilla et al. (2006) was 4 times faster than an equivalent implementation on CPU. The deep CNN of Dan Ciresan et al. (2011) at IDSIA was 60 times faster and achieved the first superhuman performance in a computer vision contest in August 2011. Between 15 May 2011 and 10 September 2012, these CNNs won four more image competitions and improved the state of the art on multiple image benchmarks. The approach has become central to the field of computer vision. == Credit disputes == Schmidhuber has controversially argued that he and other researchers have been denied adequate recognition for their contribution to the field of deep learning, in favour of Geoffrey Hinton, Yoshua Bengio and Yann LeCun, who shared the 2018 Turing Award for their work in deep learning. He wrote a "scathing" 2015 article arguing that Hinton, Bengio and LeCun "heavily cite each other" but "fail to credit the pioneers of the field". In a statement to the New York Times, Yann LeCun wrote that "Jürgen is manically obsessed with recognition and keeps claiming credit he doesn't deserve for many, many things... It causes him to systematically stand up at the end of every talk and claim credit for what was just presented, generally not in a justified manner." Schmidhuber replied that LeCun did this "without any justification, without providing a single example", and published details of numerous priority disputes with Hinton, Bengio and LeCun. The term "schmidhubered" has been jokingly used in the AI community to describe Schmidhuber's habit of publicly challenging the originality of other researchers' work, a practice seen by some in the AI community as a "rite of passage" for young researchers. Some suggest that Schmidhuber's significant accomplishments have been underappreciated due to his confrontational personality. == Recognition == Schmidhuber received the Helmholtz Award of the International Neural Network Society in 2013, and the Neural Networks Pioneer Award of the IEEE Computational Intelligence Society in 2016 for "pioneering contributions to deep learning and neural networks." He is a member of the European Academy of Sciences and Arts. He has been referred to as the "father of modern AI", the "father of generative AI", and the "father of deep learning". Schmidhuber himself, however, has called Alexey Grigorevich Ivakhnenko the "father of deep learning", and gives credit to many even earlier AI pioneers. The New York Times ran a profile under the headline "When A.I. Matures, It May Call Jürgen Schmidhuber 'Dad'", highlighting his early work on deep learning and his long‑term vision for self‑improving AI. == Views == Schmidhuber is a proponent of open source AI, and believes that they will become competitive against commercial closed-source AI. Since the 1970s, Schmidhuber wanted to create "intelligent machines that could learn and improve on their own and become smarter than him within his lifetime." He differentiates between two types of AIs: tool AI, such as those for improving healthcare, and autonomous AIs that set their own goals, perform their own research, and explore the universe. He has worked on both types for de

    Read more →
  • Non-photorealistic rendering

    Non-photorealistic rendering

    Non-photorealistic rendering (NPR) is an area of computer graphics that focuses on enabling a wide variety of expressive styles for digital art, in contrast to traditional computer graphics, which focuses on photorealism. NPR is inspired by other artistic modes such as painting, drawing, technical illustration, and animated cartoons. NPR has appeared in movies and video games in the form of cel-shaded animation (also known as "toon" shading) as well as in scientific visualization, architectural illustration and experimental animation. == History and criticism of the term == The term non-photorealistic rendering is believed to have been coined by the SIGGRAPH 1990 papers committee, who held a session entitled "Non Photo Realistic Rendering". The term has received some criticism: The term "photorealism" has different meanings for graphics researchers (see "photorealistic rendering") and artists. For artists—who are the target consumers of NPR techniques—it refers to a school of painting that focuses on reproducing the effect of a camera lens, with all the distortion and hyper-reflections that it creates. For graphics researchers, however, it refers to an image that is visually indistinguishable from reality. In fact, graphics researchers lump the kinds of visual distortions that are used by photorealist painters into "non-photorealism". Describing something by what it is not is problematic. Equivalent (made-up) comparisons might be "non-elephant biology" or "non-geometric mathematics". NPR researchers have stated that they expect the term will disappear eventually and be replaced by the now more general term "computer graphics", with "photorealistic graphics" being the term used to describe "traditional" computer graphics. Many techniques that are used to create 'non-photorealistic' images are not rendering techniques. They are modelling techniques, or post-processing techniques. While the latter are coming to be known as 'image-based rendering', sketch-based modelling techniques, cannot technically be included under this heading, which is very inconvenient for conference organisers. The first conference on non-photorealistic animation and rendering included a discussion of possible alternative names. Among those suggested were "expressive graphics", "artistic rendering", "non-realistic graphics", "art-based rendering", and "psychographics". All of these terms have been used in various research papers on the topic, but the "non-photorealistic" term seems to have nonetheless taken hold. The first technical meeting dedicated to NPR was the ACM-sponsored Symposium on Non-Photorealistic Rendering and Animation(NPAR) in 2000. NPAR is traditionally co-located with the Annecy Animated Film Festival, running on even numbered years. From 2007 onward, NPAR began to also run on odd-numbered years, co-located with ACM SIGGRAPH. == 3D == Three-dimensional NPR is the style that is most commonly seen in video games and movies. The output from this technique is almost always a 3D model that has been modified from the original input model to portray a new artistic style. In many cases, the geometry of the model is identical to the original geometry, and only the material applied to the surface is modified. With increased availability of programmable GPU's, shaders have allowed NPR effects to be applied to the rasterised image that is to be displayed to the screen. The majority of NPR techniques applied to 3D geometry are intended to make the scene appear two-dimensional. NPR techniques for 3D images include cel shading and Gooch shading. Many methods can be used to draw stylized outlines and strokes from 3D models, including occluding contours and Suggestive contours. For enhanced legibility, the most useful technical illustrations for technical communication are not necessarily photorealistic. Non-photorealistic renderings, such as exploded view diagrams, greatly assist in showing placement of parts in a complex system. Cartoon rendering, also called cel shading or toon shading, is a non-photorealistic rendering technique used to give 3D computer graphics a flat, cartoon-like appearance. Its defining feature is the use of distinct shading colors rather than smooth gradients, producing a look reminiscent of comic books or animated films. This technique is often used to blend 3D objects and environments with 2D hand-animated elements while maintaining a consistent look. Treasure Planet movie by Disney is an example of blending these techniques. == 2D == The input to a two dimensional NPR system is typically an image or video. The output is a typically an artistic rendering of that input imagery (for example in a watercolor, painterly or sketched style) although some 2D NPR serves non-artistic purposes e.g. data visualization. The artistic rendering of images and video (often referred to as image stylization) traditionally focused upon heuristic algorithms that seek to simulate the placement of brush strokes on a digital canvas. Arguably, the earliest example of 2D NPR is Paul Haeberli's 'Paint by Numbers' at SIGGRAPH 1990. This (and similar interactive techniques) provide the user with a canvas that they can "paint" on using the cursor — as the user paints, a stylized version of the image is revealed on the canvas. This is especially useful for people who want to simulate different sizes of brush strokes according to different areas of the image. Subsequently, basic image processing operations using gradient operators or statistical moments were used to automate this process and minimize user interaction in the late nineties (although artistic control remains with the user via setting parameters of the algorithms). This automation enabled practical application of 2D NPR to video, for the first time in the living paintings of the movie What Dreams May Come (1998). More sophisticated image abstractions techniques were developed in the early 2000s harnessing computer vision operators e.g. image salience, or segmentation operators to drive stroke placement. Around this time, machine learning began to influence image stylization algorithms notably image analogy that could learn to mimic the style of an existing artwork. The advent of deep learning has re-kindled activity in image stylization, notably with neural style transfer (NST) algorithms that can mimic a wide gamut of artistic styles from single visual examples. These algorithms underpin mobile apps capable of the same e.g. Prisma In addition to the above stylization methods, a related class of techniques in 2D NPR address the simulation of artistic media. These methods include simulating the diffusion of ink through different kinds of paper, and also of pigments through water for simulation of watercolor. == Artistic rendering == Artistic rendering is the application of visual art styles to rendering. For photorealistic rendering styles, the emphasis is on accurate reproduction of light-and-shadow and the surface properties of the depicted objects, composition, or other more generic qualities. When the emphasis is on unique interpretive rendering styles, visual information is interpreted by the artist and displayed accordingly using the chosen art medium and level of abstraction in abstract art. In computer graphics, interpretive rendering styles are known as non-photorealistic rendering styles, but may be used to simplify technical illustrations. Rendering styles that combine photorealism with non-photorealism are known as hyperrealistic rendering styles. == Notable films and games == This section lists some seminal uses of NPR techniques in films, games and software. See cel-shaded animation for a list of uses of toon-shading in games and movies.

    Read more →
  • Intelligent character recognition

    Intelligent character recognition

    Intelligent character recognition (ICR) is a method of extracting handwritten text from images. It is a more sophisticated type of OCR technology that recognizes different handwriting styles and fonts to intelligently interpret data from physical documents. ICR is used to organize paper-based unstructured data by scanning documents, extracting information, and adapting extracted data for database storage. ICR algorithms collaborate with OCR to automate data entry from forms by removing the need for keystrokes. It has a high degree of accuracy and is a dependable method for processing various handwritten media quickly. == Capabilities == Most ICR software has a self-learning neural network-based algorithms, which automatically update the recognition database for new handwriting patterns. It extends the usefulness of scanning devices for the purpose of document processing, from printed character recognition (a function of OCR) to hand-written matter recognition. Because this process is involved in recognizing hand writing, accuracy levels may, in some circumstances, not be very good but can achieve 97%+ accuracy rates in reading handwriting in structured forms. Often to achieve these high recognition rates several read engines are used within the software and each is given elective voting rights to determine the true reading of characters. In numeric fields, engines which are designed to read numbers take preference, while in alpha fields, engines designed to read hand written letters have higher elective rights. When used in conjunction with a bespoke interface hub, hand-written data can be automatically populated into a back office system avoiding laborious manual keying and can be more accurate than traditional human data entry. === Automated forms processing === An important development of ICR was the invention of automated forms processing in 1993 by Joseph Corcoran who was awarded a patent on the invention. This involved a three-stage process of capturing the image of the form to be processed by ICR and preparing it to enable the ICR engine to give best results, then capturing the information using the ICR engine and finally processing the results to automatically validate the output from the ICR engine. This application of ICR increased the usefulness of the technology and made it applicable for use with real world forms in normal business applications. Modern software applications use ICR as a technology of recognizing text in forms filled in by hand (hand-printed). == Differences between ICR and OCR == === OCR === Optical character recognition (OCR) is commonly considered to apply to any recognition technique that reads machine printed text. An example of a traditional OCR use case would be to translate the characters from an image of a printed document, such as a book page, newspaper clipping, or legal contract, into a separate file that could be searched and updated with a word processor or document viewer. It's also quite helpful for automating the processing of forms. Information can be swiftly extracted from form fields and entered into another application, like a spreadsheet or database, by zonally applying the OCR engine to those fields. Yet, data is typically manually input rather than typed into form fields. Character identification becomes even more challenging while reading handwritten material. The diversity of more than 700,000 printed font variants is tiny compared to the near unlimited variations in hand-printed characters. The recognition program must take into account not just stylistic differences but also the kind of writing implement used, the standard of the paper, errors, hand stability, and smudges or running ink. === ICR === Intelligent character recognition (ICR) makes use of continuously improving algorithms to collect more information about the variances in hand-printed characters and more precisely identify them. ICR, which was created in the early 1990s to aid in the automation of forms processing, enables the conversion of manually entered data into text that is simple to read, search for, and change. When used to read characters that are obviously divided into distinct areas or zones, such as fixed fields seen on many structured forms, it works best. Both OCR and ICR can be configured to read a variety of languages; however, limiting the expected character set to a smaller number of languages will produce better recognition outcomes. ICR cannot read cursive handwriting since it must still be able to assess each character individually. While writing in cursive, it might be difficult to tell where one character ends and another one begins, and there are more differences across samples than when hand-printing text. A more recent method called intelligent word recognition (IWR) focuses on reading a word in context rather than recognizing individual characters. == Intelligent word recognition == Intelligent word recognition (IWR) can recognize and extract not only printed-handwritten information, cursive handwriting as well. ICR recognizes on the character-level, whereas IWR works with full words or phrases. Capable of capturing unstructured information from every day pages, IWR is said to be more evolved than hand print ICR. Not meant to replace conventional ICR and OCR systems, IWR is optimized for processing real-world documents that contain mostly free-form, hard-to-recognize data fields that are inherently unsuitable for ICR. This means that the highest and best use of IWR is to eliminate a high percentage of the manual entry of handwritten data and run-on hand print fields on documents that otherwise could be keyed only by humans.

    Read more →
  • Generalized filtering

    Generalized filtering

    Generalized filtering is a generic Bayesian filtering scheme for nonlinear state-space models. It is based on a variational principle of least action, formulated in generalized coordinates of motion. Note that "generalized coordinates of motion" are related to—but distinct from—generalized coordinates as used in (multibody) dynamical systems analysis. Generalized filtering furnishes posterior densities over hidden states (and parameters) generating observed data using a generalized gradient descent on variational free energy, under the Laplace assumption. Unlike classical (e.g. Kalman-Bucy or particle) filtering, generalized filtering eschews Markovian assumptions about random fluctuations. Furthermore, it operates online, assimilating data to approximate the posterior density over unknown quantities, without the need for a backward pass. Special cases include variational filtering, dynamic expectation maximization and generalized predictive coding. == Definition == Definition: Generalized filtering rests on the tuple ( Ω , U , X , S , p , q ) {\displaystyle (\Omega ,U,X,S,p,q)} : A sample space Ω {\displaystyle \Omega } from which random fluctuations ω ∈ Ω {\displaystyle \omega \in \Omega } are drawn Control states U ∈ R {\displaystyle U\in \mathbb {R} } – that act as external causes, input or forcing terms Hidden states X : X × U × Ω → R {\displaystyle X:X\times U\times \Omega \to \mathbb {R} } – that cause sensory states and depend on control states Sensor states S : X × U × Ω → R {\displaystyle S:X\times U\times \Omega \to \mathbb {R} } – a probabilistic mapping from hidden and control states Generative density p ( s ~ , x ~ , u ~ ∣ m ) {\displaystyle p({\tilde {s}},{\tilde {x}},{\tilde {u}}\mid m)} – over sensory, hidden and control states under a generative model m {\displaystyle m} Variational density q ( x ~ , u ~ ∣ μ ~ ) {\displaystyle q({\tilde {x}},{\tilde {u}}\mid {\tilde {\mu }})} – over hidden and control states with mean μ ~ ∈ R {\displaystyle {\tilde {\mu }}\in \mathbb {R} } Here ~ denotes a variable in generalized coordinates of motion: u ~ = [ u , u ′ , u ″ , … ] T {\displaystyle {\tilde {u}}=[u,u',u'',\ldots ]^{T}} === Generalized filtering === The objective is to approximate the posterior density over hidden and control states, given sensor states and a generative model – and estimate the (path integral of) model evidence p ( s ~ ( t ) | m ) {\displaystyle p({\tilde {s}}(t)\vert m)} to compare different models. This generally involves an intractable marginalization over hidden states, so model evidence (or marginal likelihood) is replaced with a variational free energy bound. Given the following definitions: μ ~ ( t ) = a r g m i n μ ~ { F ( s ~ ( t ) , μ ~ ) } {\displaystyle {\tilde {\mu }}(t)={\underset {\tilde {\mu }}{\operatorname {arg\,min} }}\{F({\tilde {s}}(t),{\tilde {\mu }})\}} G ( s ~ , x ~ , u ~ ) = − ln ⁡ p ( s ~ , x ~ , u ~ | m ) {\displaystyle G({\tilde {s}},{\tilde {x}},{\tilde {u}})=-\ln p({\tilde {s}},{\tilde {x}},{\tilde {u}}\vert m)} Denote the Shannon entropy of the density q {\displaystyle q} by H [ q ] = E q [ − log ⁡ ( q ) ] {\displaystyle H[q]=E_{q}[-\log(q)]} . We can then write the variational free energy in two ways: F ( s ~ , μ ~ ) = E q [ G ( s ~ , x ~ , u ~ ) ] − H [ q ( x ~ , u ~ | μ ~ ) ] = − ln ⁡ p ( s ~ | m ) + D K L [ q ( x ~ , u ~ | μ ~ ) | | p ( x ~ , u ~ | s ~ , m ) ] {\displaystyle F({\tilde {s}},{\tilde {\mu }})=E_{q}[G({\tilde {s}},{\tilde {x}},{\tilde {u}})]-H[q({\tilde {x}},{\tilde {u}}\vert {\tilde {\mu }})]=-\ln p({\tilde {s}}\vert m)+D_{KL}[q({\tilde {x}},{\tilde {u}}\vert {\tilde {\mu }})\vert \vert p({\tilde {x}},{\tilde {u}}\vert {\tilde {s}},m)]} The second equality shows that minimizing variational free energy (i) minimizes the Kullback-Leibler divergence between the variational and true posterior density and (ii) renders the variational free energy (a bound approximation to) the negative log evidence (because the divergence can never be less than zero). Under the Laplace assumption q ( x ~ , u ~ ∣ μ ~ ) = N ( μ ~ , C ) {\displaystyle q({\tilde {x}},{\tilde {u}}\mid {\tilde {\mu }})={\mathcal {N}}({\tilde {\mu }},C)} the variational density is Gaussian and the precision that minimizes free energy is C − 1 = Π = ∂ μ ~ μ ~ G ( μ ~ ) {\displaystyle C^{-1}=\Pi =\partial _{{\tilde {\mu }}{\tilde {\mu }}}G({\tilde {\mu }})} . This means that free-energy can be expressed in terms of the variational mean (omitting constants): F = G ( μ ~ ) + 1 2 ln ⁡ | ∂ μ ~ μ ~ G ( μ ~ ) | {\displaystyle F=G({\tilde {\mu }})+\textstyle {1 \over 2}\ln \vert \partial _{{\tilde {\mu }}{\tilde {\mu }}}G({\tilde {\mu }})\vert } The variational means that minimize the (path integral) of free energy can now be recovered by solving the generalized filter: μ ~ ˙ = D μ ~ − ∂ μ ~ F ( s ~ , μ ~ ) {\displaystyle {\dot {\tilde {\mu }}}=D{\tilde {\mu }}-\partial _{\tilde {\mu }}F({\tilde {s}},{\tilde {\mu }})} where D {\displaystyle D} is a block matrix derivative operator of identify matrices such that D u ~ = [ u ′ , u ″ , … ] T {\displaystyle D{\tilde {u}}=[u',u'',\ldots ]^{T}} === Variational basis === Generalized filtering is based on the following lemma: The self-consistent solution to μ ~ ˙ = D μ ~ − ∂ μ ~ F ( s , μ ~ ) {\displaystyle {\dot {\tilde {\mu }}}=D{\tilde {\mu }}-\partial _{\tilde {\mu }}F(s,{\tilde {\mu }})} satisfies the variational principle of stationary action, where action is the path integral of variational free energy S = ∫ d t F ( s ~ ( t ) , μ ~ ( t ) ) {\displaystyle S=\int dt\,F({\tilde {s}}(t),{\tilde {\mu }}(t))} Proof: self-consistency requires the motion of the mean to be the mean of the motion and (by the fundamental lemma of variational calculus) μ ~ ˙ = D μ ~ ⇔ ∂ μ ~ F ( s ~ , μ ~ ) = 0 ⇔ δ μ ~ S = 0 {\displaystyle {\dot {\tilde {\mu }}}=D{\tilde {\mu }}\Leftrightarrow \partial _{\tilde {\mu }}F({\tilde {s}},{\tilde {\mu }})=0\Leftrightarrow \delta _{\tilde {\mu }}S=0} Put simply, small perturbations to the path of the mean do not change variational free energy and it has the least action of all possible (local) paths. Remarks: Heuristically, generalized filtering performs a gradient descent on variational free energy in a moving frame of reference: μ ~ ˙ − D μ ~ = − ∂ μ ~ F ( s , μ ~ ) {\displaystyle {\dot {\tilde {\mu }}}-D{\tilde {\mu }}=-\partial _{\tilde {\mu }}F(s,{\tilde {\mu }})} , where the frame itself minimizes variational free energy. For a related example in statistical physics, see Kerr and Graham who use ensemble dynamics in generalized coordinates to provide a generalized phase-space version of Langevin and associated Fokker-Planck equations. In practice, generalized filtering uses local linearization over intervals Δ t {\displaystyle \Delta t} to recover discrete updates Δ μ ~ = ( exp ⁡ ( Δ t ⋅ J ) − I ) J − 1 μ ~ ˙ J = ∂ μ ~ μ ~ ˙ = D − ∂ μ ~ μ ~ F ( s ~ , μ ~ ) {\displaystyle {\begin{aligned}\Delta {\tilde {\mu }}&=(\exp(\Delta t\cdot J)-I)J^{-1}{\dot {\tilde {\mu }}}\\J&=\partial _{\tilde {\mu }}{\dot {\tilde {\mu }}}=D-\partial _{{\tilde {\mu }}{\tilde {\mu }}}F({\tilde {s}},{\tilde {\mu }})\end{aligned}}} This updates the means of hidden variables at each interval (usually the interval between observations). == Generative (state-space) models in generalized coordinates == Usually, the generative density or model is specified in terms of a nonlinear input-state-output model with continuous nonlinear functions: s = g ( x , u ) + ω s x ˙ = f ( x , u ) + ω x {\displaystyle {\begin{aligned}s&=g(x,u)+\omega _{s}\\{\dot {x}}&=f(x,u)+\omega _{x}\end{aligned}}} The corresponding generalized model (under local linearity assumptions) obtains the from the chain rule s ~ = g ~ ( x ~ , u ~ ) + ω ~ s s = g ( x , u ) + ω s s ′ = ∂ x g ⋅ x ′ + ∂ u g ⋅ u ′ + ω s ′ s ″ = ∂ x g ⋅ x ″ + ∂ u g ⋅ u ″ + ω s ″ ⋮ x ~ ˙ = f ~ ( x ~ , u ~ ) + ω ~ x x ˙ = f ( x , u ) + ω x x ˙ ′ = ∂ x f ⋅ x ′ + ∂ u f ⋅ u ′ + ω x ′ x ˙ ″ = ∂ x f ⋅ x ″ + ∂ u f ⋅ u ″ + ω x ″ ⋮ {\displaystyle {\begin{aligned}{\tilde {s}}&={\tilde {g}}({\tilde {x}},{\tilde {u}})+{\tilde {\omega }}_{s}\\\\s&=g(x,u)+\omega _{s}\\s'&=\partial _{x}g\cdot x'+\partial _{u}g\cdot u'+\omega '_{s}\\s''&=\partial _{x}g\cdot x''+\partial _{u}g\cdot u''+\omega ''_{s}\\&\vdots \\\end{aligned}}\qquad {\begin{aligned}{\dot {\tilde {x}}}&={\tilde {f}}({\tilde {x}},{\tilde {u}})+{\tilde {\omega }}_{x}\\\\{\dot {x}}&=f(x,u)+\omega _{x}\\{\dot {x}}'&=\partial _{x}f\cdot x'+\partial _{u}f\cdot u'+\omega '_{x}\\{\dot {x}}''&=\partial _{x}f\cdot x''+\partial _{u}f\cdot u''+\omega ''_{x}\\&\vdots \end{aligned}}} Gaussian assumptions about the random fluctuations ω {\displaystyle \omega } then prescribe the likelihood and empirical priors on the motion of hidden states p ( s ~ , x ~ , u ~ | m ) = p ( s ~ | x ~ , u ~ , m ) p ( D x ~ | x , u ~ , m ) p ( x | m ) p ( u ~ | m ) p ( s ~ | x ~ , u ~ , m ) = N ( g ~ ( x ~ , u ~ ) , Σ ~ ( x ~ , u ~ ) s ) p ( D x ~ | x , u ~ , m ) = N ( f ~ ( x ~ , u ~ ) , Σ ~ ( x ~ , u ~ ) x ) {\displayst

    Read more →
  • Joseph Keshet

    Joseph Keshet

    Joseph (Yossi) Keshet (Hebrew: יוסף (יוסי) קשת; born: 28 February 1973) is an Israeli professor in the Electrical and Computer Engineering Faculty of the Technion, where he is the director of the Speech, Language, and Deep Learning Lab. His research focuses on human speech processing and machine learning. == Early life and education == Keshet was born in Tel-Aviv. He graduated from the Amal School and began his academic studies at the Department of Electrical Engineering-Systems at Tel-Aviv University in 1991 and received his B.Sc. (Cum Laude) in 1994. Keshet served in the IDF Unit 8200 from 1995 to 2002 as the head of the speech processing research section in the R&D Center. During his service, he received a national award from the Administration for the Development of Weapons and Technological Infrastructure (Maf’at). Keshet was award his M.Sc. from the same department after he completed his Israel Defense Force service in 2002. His Dissertation was titled: Stop consonant spotting in continuous speech and was supervised by Dan Chazan from IBM Research Labs, Haifa. He continued his Ph.D. studies at the Hebrew University of Jerusalem until 2008. Prof. Yoram Singer supervised his thesis on Large Margin Algorithms for Discriminative Continuous Speech. == Career == Keshet was a Research Associate (postdoc) at IDIAP Research Institute, Martigny, Switzerland in 2007, and joined the TTI-Chicago and Department of Computer Science, University of Chicago, Chicago, IL in 2009 as Research Assistant Professor. In 2013, he returned to Israel and joined the Computer Science department at Bar-Ilan University as a senior lecturer and head of the Speech, Language, and Deep Learning Lab. In 2020, Keshet became a Founding Venture Partner at the Disruptive AI Venture Capital. In the same year, he also joined Amazon in Tel-Aviv as an Amazon Scholar. In 2022, Keshet joined the Faculty of Electrical and Computer Engineering at the Technion. == Research == Keshet's research work focuses on both machine learning and computational study of human speech and language. His work on speech and language concentrates on speech processing, speech recognition, acoustic phonetics, and pathological speech. In machine learning, Keshet is focused on deep learning and structured tasks. According to Google Scholar (September 2020), Keshet is one of the 15 most cited researchers in the field of spoken language processing. The algorithms that were developed in the Speech, Language, and Deep Learning Lab can analyze different pathological conditions in the throat and vocal cords based on the subject's voice. Other algorithms showed that the voice can be used to estimate physical and emotional state of the speaker. Another research led by Keshet suggested that it is possible to fool structured AI systems (like Google Voice). == Membership in professional societies == Keshet is the founder and chair of the Machine Learning for Speech and Language Processing Special Interest Group (SIGML) of the International Speech Communication Association (ISCA), from 2011. He is a senior member of the IEEE Signal Processing Society since 2018 and a member of ISCA since 2002. == Publications == Prof. Keshet has authored more than 70 scientific publications and edited one book. === Book === Joseph Keshet and Samy Bengio, Eds., Automatic Speech and Speaker Recognition: Large Margin and Kernel Methods, John Wiley & Sons, March 2009. === Selected articles === Jacob T. Cohen, Alma Cohen, Limor Benyamini, Yossi Adi, Joseph Keshet, Predicting glottal closure insufficiency using fundamental frequency contour analysis, Head & Neck, Journal of the Sciences and Specialities of the Head and Neck, Volume 41, Issue 7, pp. 2324–2331, July 2019. Yehoshua Dissen, Jacob Goldberger, and Joseph Keshet, Formant Estimation and Tracking: A Deep Learning Approach, Journal of the Acoustical Society of America, 145 (2), February 2019. Joseph Keshet, Automatic speech recognition: A primer for speech-language pathology researchers, International Journal of Speech-Language Pathology, Vol. 20 No. 6, pp. 599–609, 2018. Yossi Adi, Carsten Baum, Moustapha Cisse, Benny Pinkas, Joseph Keshet, Turning Your Weakness Into a Strength: Watermarking Deep Neural Networks by Backdooring, Usenix, 2018. Tzeviya Fuchs, Joseph Keshet, Spoken Term Detection Automatically Adjusted for a Given Threshold, IEEE Journal of Selected Topics in Signal Processing, Dec 2017, Volume 11, Issue 8, pp. 1–8. Moustapha Cisse, Yossi Adi, Natalia Neverova, Joseph Keshet, Houdini: Fooling Deep Structured Visual and Speech Recognition Models with Adversarial Examples, Neural Information and Processing Systems (NIPS), 2017. Joseph Keshet, Subhransu Maji, Tamir Hazan, and Tommi Jaakkola, Perturbation Models and PAC-Bayesian Generalization Bounds, in Perturbations, Optimization, and Statistics, Tamir Hazan, George Papandreou, and Daniel Tarlow, Eds., The MIT Press, 2016. Matthew Goldrick, Joseph Keshet, Erin Gustafson, Jordana Heller, and Jeremy Needle, Automatic Analysis of Slips of the Tongue: Insights into the Cognitive Architecture of Speech Production, Cognition, 149, 31–39, 2016. Joseph Keshet, Optimizing the Measure of Performance in Structured Prediction, in Advanced Structured Prediction, Sebastian Nowozin, Peter V. Gehler, Jeremy January, and Christoph H. Lampert, Eds., The MIT Press, 2014. Morgan Sonderegger and Joseph Keshet, Automatic Measurement of Voice Onset Time using Discriminative Structured Prediction, Journal of the Acoustical Society of America, Vol. 132, Issue 6, pp. 3965−3979, 2012. David McAllester, Tamir Hazan and Joseph Keshet, Direct Loss Minimization for Structured Prediction, The 24th Annual Conference on Neural Information Processing Systems (NIPS), 2010. Joseph Keshet, David Grangier and Samy Bengio, Discriminative Keyword Spotting, Speech Communication, Volume 51, Issue 4, pp. 317–329, April 2009. == Personal life == Keshet is married to Lital. They have three children.

    Read more →
  • Alice AI (AI model family)

    Alice AI (AI model family)

    Alice AI is a neural network family developed by the Russian company Yandex LLC. Alice AI can create and revise texts, generate new ideas and capture the context of the conversation with the user. Alice AI is trained using a dataset which includes information from books, magazines, newspapers and other open sources available on the internet. The neural network may get facts wrong and hallucinate, but as it learns, it will produce increasingly accurate answers. == Usage == YandexGPT is integrated into virtual assistant Alice (an analog of Siri and Alexa) and is available in Yandex services and applications. The company gives businesses access to the neural network’s API through the public cloud platform Yandex Cloud and develops its own B2B solutions on its basis. Since July 2023, 800 companies have participated in the closed testing of YandexGPT. IT developers, banks, retail businesses, and companies from other industries can use the technology in two modes — API and Playground (an interface in the Yandex Cloud console for testing models and hypotheses). Two model versions are available to businesses: one works in asynchronous mode and is better able to handle complex tasks, while the other is suitable for creating quick responses in real time. As a result, YandexGPT has been tested in dozens of scenarios such as content tasks, tech support, creating chatbots, virtual assistants, etc. == History == In February 2023, Yandex announced that it was working on its own version of the ChatGPT generative neural network while developing a language model from the YaLM (Yet another Language Model) family. The project was tentatively named YaLM 2.0, which was later changed to YandexGPT. On May 17, the company unveiled a neural network called YandexGPT (YaGPT) and enabled its virtual assistant Alice to interact with the new language model. On June 15, 2023, Yandex added the YandexGPT language model to the image generation application Shedevrum. This enabled its users to create fully-fledged posts complete with a title, text, and relevant illustration. In July 2023, YandexGPT launched new features enabling businesses to create virtual assistants and chatbots, as well as generate and structure texts. On September 7, 2023, Yandex presented a new version of the language model, YandexGPT 2, at the Practical ML Conf. Compared to the previous one, the new version is able to perform more types of tasks, and the quality of answers has improved. The developers claimed that YandexGPT 2 answered user questions better than the first version in 67% of cases. From October 6, 2023, YandexGPT can create short retellings of online Russian-language videos on the Internet. It can summarize videos that are from two minutes to four hours long and contain speech.

    Read more →
  • AI Photo Editors Reviews: What Actually Works in 2026

    AI Photo Editors Reviews: What Actually Works in 2026

    Curious about the best AI photo editor? An AI photo editor is software that uses machine learning to help you get more done — it combines speed, accuracy, and an interface that just works. Hands-on testing shows real-world results vary, so a short free trial is the smartest way to decide. Whether you are a beginner or a pro, the right AI photo editor slots into your workflow and pays for itself fast. This guide breaks down the top picks, their pros and cons, and who each one is best for.

    Read more →
  • Jian Ma (computational biologist)

    Jian Ma (computational biologist)

    Jian Ma (Chinese: 马坚) is an American computer scientist and computational biologist. He is the Ray and Stephanie Lane Professor of Computational Biology in the School of Computer Science at Carnegie Mellon University. He is a faculty member in the Ray and Stephanie Lane Computational Biology Department. His lab develops AI/ML methods to study the structure and function of the human genome and cellular organization and their implications for health and disease. During his Ph.D. and postdoc training, he developed algorithms to reconstruct the ancestral mammalian genome and evolutionary history. His research group has recently pioneered a series of new machine learning solutions for 3D genome organization, single-cell epigenomics, spatial omics, and complex molecular interactions. His lab also explores large language models to uncover gene regulatory mechanisms and the intricate connections among cellular components, with the aim of driving discovery and guiding experimentation. He received an NSF CAREER award in 2011. In 2020, he was awarded a Guggenheim Fellowship in Computer Science. He received the Allen Newell Award for Research Excellence (2025). He is an elected Fellow of the American Association for the Advancement of Science, the American Institute for Medical and Biological Engineering, the International Society for Computational Biology, and the Association for Computing Machinery. He leads an NIH 4D Nucleome Center to develop machine learning algorithms to better understand the cell nucleus. He served as the Program Chair for RECOMB 2024. He is also a member of the Scientific Advisory Board of the Chan Zuckerberg Biohub Chicago (CZ Biohub Chicago) and the RECOMB Steering Committee. In 2024, he launched the Center for AI-Driven Biomedical Research (AI4BIO) at CMU, which will be a catalyst for innovations at the intersection of AI and biomedicine across the School of Computer Science and campus. == Selected Recent Publications == Chen V#, Yang M#, Cui W, Kim JS, Talwalkar A, and Ma J. Applying interpretable machine learning in computational biology - pitfalls, recommendations and opportunities for new developments. Nature Methods, 21(8):1454-1461, 2024. Xiong K#, Zhang R#, and Ma J. scGHOST: Identifying single-cell 3D genome subcompartments. Nature Methods, 21(5):814-822, 2024. Zhou T, Zhang R, Jia D, Doty RT, Munday AD, Gao D, Xin L, Abkowitz JL, Duan Z, and Ma J. GAGE-seq concurrently profiles multiscale 3D genome organization and gene expression in single cells. Nature Genetics, 56(8):1701-1711, 2024. Zhang Y, Boninsegna L, Yang M, Misteli T, Alber F, and Ma J. Computational methods for analysing multiscale 3D genome organization. Nature Reviews Genetics, 5(2):123-141, 2024. Chidester B#, Zhou T#, Alam S, and Ma J. SPICEMIX enables integrative single-cell spatial modeling of cell identity. Nature Genetics, 55(1):78-88, 2023. [Cover Article] Zhang R#, Zhou T#, and Ma J. Ultrafast and interpretable single-cell 3D genome analysis with Fast-Higashi. Cell Systems, 13(10):P798-807.E6, 2022. [Cover Article] Zhu X#, Zhang Y#, Wang Y, Tian D, Belmont AS, Swedlow JR, and Ma J. Nucleome Browser: An integrative and multimodal data navigation platform for 4D Nucleome. Nature Methods, 19(8):911-913, 2022. Zhang R, Zhou T, and Ma J. Multiscale and integrative single-cell Hi-C analysis with Higashi. Nature Biotechnology, 40:254–261, 2022.

    Read more →
  • Yorick Wilks

    Yorick Wilks

    Yorick Alexander Wilks FBCS (27 October 1939 – 14 April 2023) was a British computer scientist. He was an emeritus professor of artificial intelligence at the University of Sheffield, visiting professor of artificial intelligence at Gresham College (a post created especially for him), senior research fellow at the Oxford Internet Institute, senior scientist at the Florida Institute for Human and Machine Cognition, and a member of the Epiphany Philosophers. In February 2023, Wilks joined WiredVibe as Director of AI and a Board Member, with the goal of commercializing his previous research and ideas. He remained in this role until his death, which occurred shortly before WiredVibe was acquired by AKY X, a company that continues to build on his legacy and contributions. == Biography == Wilks was born in Gerrards Cross, Buckinghamshire in England. He was educated at Torquay Boys' Grammar School, followed by Pembroke College, Cambridge, where he read Philosophy, joined the Epiphany Philosophers and obtained his Doctor of Philosophy degree (1968) under Professor R. B. Braithwaite for the thesis 'Argument and Proof'; he was an early pioneer in meaning-based approaches to the understanding of natural language content by computers. His main early contribution in the 1970s was called "Preference Semantics" (Wilks, 1973; Wilks and Fass, 1992), an algorithmic method for assigning the "most coherent" interpretation to a sentence in terms of having the maximum number of internal preferences of its parts (normally verbs or adjectives) satisfied. That early work was hand-coded with semantic entries (of the order of some hundreds) as was normal at the time, but since then has led to the empirical determinations of preferences (chiefly of English verbs) in the 1980s and 1990s. A key component of the notion of preference in semantics was that the interpretation of an utterance is not a well- or ill-formed notion, as was argued in Chomskyan approaches, such as those of Jerry Fodor and Jerrold Katz. It was rather that a semantic interpretation was the best available, even though some preferences might not be satisfied. So, in "The machine answered the question with a low whine" the agent of "answer" does not satisfy that verb's preference for a human answerer—which would cause it to be deemed ill-formed by Fodor and Katz—but is accepted as sub-optimal or metaphorical, and, now, conventional. The function of the algorithm is not to determine well-formedness at all but to make the optimal selection of word-senses to participate in the overall interpretation. Thus, in "The Pole answered..." the system will always select the human sense of the agent and not the inanimate one if it gives a more coherent interpretation overall. Preference Semantics is thus some of the earliest computational work—with programs run at Systems Development Corporation in Santa Monica in 1967 in LISP on an IBM360—in the now established field of word sense disambiguation. This approach was used in the first operational machine translation system based principally on meaning structures and built by Wilks at Stanford Artificial Intelligence Laboratory in the early 1970s (Wilks, 1973) at the same time and place as Roger Schank was applying his "Conceptual Dependency" approach to machine translation. The LISP code of Wilks' system was in The Computer Museum, Boston. Wilks was elected a fellow of the American and European Associations for Artificial Intelligence, of the British Computer Society, a member of the UK Computing Research Committee, and a permanent member of ICCL, the International Committee on Computational Linguistics. He was professor of artificial intelligence at the University of Sheffield and a senior research fellow at the Oxford Internet Institute. In 1991 he received a Defense Advanced Projects Agency grant on interlingual pragmatics-based machine translation and in 1994 he received a grant by the Engineering and Physical Sciences Research Council to investigate in the field of large-scale information extraction (LaSIE); in the following years he would obtain more grants to carry on exploring the field of information extraction (AVENTINUS, ECRAN, PASTA...). In the 1990s Wilks also became interested in modelling human-computer dialogue and the team led by David Levy and him as chief researcher won the Loebner Prize in 1997. He was the founding director of the EU funded Companions Project on creating long-term computer companions for people. At his Festschrift in 2007 at the British Computer Society in London a volume of his own papers was presented along with a volume of essays in his honour. He was awarded the Antonio Zampolli prize in honour of his lifetime work at the LREC 2008 conference on 28 May 2008, and the Lifetime Achievement Award at the ACL 2008 conference on 18 June 2008. In 2009, he was awarded the British Computer Society's Lovelace Medal, its annual award for research achievement, and was awarded the Fellowship of the Association for Computing Machinery. In 1998, Wilks became head of the Department of Computer Science of the University of Sheffield, where he had started working in the year 1993 as professor of artificial intelligence, a post he still held. In 1993 he became the founding director of the Institute of Language, Speech and Hearing (ILASH). Wilks also set up the Natural Language Processing Group of the University of Sheffield. In 1994 he (along with Rob Gaizauskas and Hamish Cunningham) designed GATE, an advanced NLP architecture that has been widely distributed. National Life Stories conducted an oral history interview (C1672/24) with Yorick Wilks in 2016 for its Science and Religion collection held by the British Library. Wilks died on 14 April 2023, at the age of 83. == Awards == Wilks received many awards: (2009) Elected Fellow of the Association for Computing Machinery (2009) Lovelace Medal by the British Computer Society (2008) Zampolli Prize (ELRA, awarded at LREC in Marrakech, Morocco) (2008) Lifetime Achievement Award (Association for Computational Linguistics, in Columbus) (2006) Visiting Professor, University of Oxford (2004) Elected to UK Computing Research Committee (2004) Elected Fellow, British Computer Society (2003) Visiting Fellow, Oxford Internet Institute (1998) Elected Fellow of European Association for Artificial Intelligence (1997) Elected Fellow, EPSRC College of Computing (1991) Visiting Fellow, Trinity Hall, Cambridge (1991) Elected Fellow of the American Association for Artificial Intelligence (1983) Royal Society Travel Fellowship (1983) Commonwealth of Australia Visiting Professor (1981) Visiting Sloan Fellow, University of California, Berkeley (1980) Invited Participant in the Nobel Symposium on Language, Stockholm (1979) NATO Senior Scientist Fellowship (1979) Visiting Sloan Fellow, Yale University (1975) SRC Senior Visiting Fellowship, University of Edinburgh == Membership == Wilks was an active member of the following associations: Association for Computational Linguistics Society for the Study of AI and Simulation of Behaviour Association for Computing Machinery Cognitive Science Society British Society for the Philosophy of Science American Association for Artificial Intelligence Aristotelian Society == Selected works == === Books === Wilks, Y. (2019) Artificial Intelligence: Modern Magic or Dangerous Future?.Icon Books. New illustrated edition, 2023, MIT Press. Wilks, Y. (2015) Machine Translation: its scope and limits. Springer Wilks, Y (ed.) (2010) Close Engagements with Artificial Companions: Key Social, Psychological and Design issues. John Benjamins; Amsterdam Wilks, Y., Brewster, C. (2009) Natural Language Processing as a Foundation of the Semantic Web. Now Press: London. Wilks, Y. (2007) Words and Intelligence I, Selected papers by Yorick Wilks. In K. Ahmad, C. Brewster & M. Stevenson (eds.), Springer: Dordrecht. Wilks, Y. (ed. and with introduction and commentaries). (2006) Language, cohesion and form: selected papers of Margaret Masterman. Cambridge: Cambridge University Press. Wilks, Y., Nirenburg, S., Somers, H. (eds.) (2003) Readings in Machine Translation. Cambridge, MA: MIT Press. Wilks, Y.(ed.). (1999) Machine Conversations. Kluwer: New York. Wilks, Y., Slator, B., Guthrie, L. (1996) Electric Words: dictionaries, computers and meanings. Cambridge, MA: MIT Press. Ballim, A., Wilks, Y. (1991) Artificial Believers. Norwood, NJ: Erlbaum. Wilks, Y.(ed.). (1990) Theoretical Issues in Natural Language Processing. Norwood, NJ: Erlbaum. Wilks, Y., Partridge, D. (eds. plus three YW chapters and an introduction). (1990) The Foundations of Artificial Intelligence: a sourcebook. Cambridge: Cambridge University Press. Wilks, Y., Sparck-Jones, K.(eds.). (1984) Automatic Natural Language Processing, paperback edition. New York: Wiley. Originally published by Ellis Horwood. Wilks, Y., Charniak, E. (eds and principal authors). (1976) Computational Semantics—an Introduction to Artificial Intelligence and

    Read more →
  • History of natural language processing

    History of natural language processing

    The history of natural language processing describes the advances of natural language processing. There is some overlap with the history of machine translation, the history of speech recognition, and the history of artificial intelligence. == Early history == The history of machine translation dates back to the seventeenth century, when philosophers such as Leibniz and Descartes put forward proposals for codes which would relate words between languages. All of these proposals remained theoretical, and none resulted in the development of an actual machine. The first patents for "translating machines" were applied for in the mid-1930s. One proposal, by Georges Artsrouni, was simply an automatic bilingual dictionary using paper tape. The other proposal, by Peter Troyanskii, a Russian, was more detailed. Troyanskii’s proposal included both the bilingual dictionary and a method for dealing with grammatical roles between languages, based on Esperanto. == Logical period == In 1950, Alan Turing published his famous article "Computing Machinery and Intelligence" which proposed what is now called the Turing test as a criterion of intelligence. This criterion depends on the ability of a computer program to impersonate a human in a real-time written conversation with a human judge, sufficiently well that the judge is unable to distinguish reliably — on the basis of the conversational content alone — between the program and a real human. In 1957, Noam Chomsky’s Syntactic Structures revolutionized Linguistics with 'universal grammar', a rule-based system of syntactic structures. The Georgetown experiment in 1954 involved fully automatic translation of more than sixty Russian sentences into English. The authors claimed that within three or five years, machine translation would be a solved problem. However, real progress was much slower, and after the ALPAC report in 1966, which found that ten years long research had failed to fulfill the expectations, funding for machine translation was dramatically reduced. Little further research in machine translation was conducted until the late 1980s, when the first statistical machine translation systems were developed. Some notably successful NLP systems developed in the 1960s were SHRDLU, a natural language system working in restricted "blocks worlds" with restricted vocabularies. In 1969 Roger Schank introduced the conceptual dependency theory for natural language understanding. This model, partially influenced by the work of Sydney Lamb, was extensively used by Schank's students at Yale University, such as Robert Wilensky, Wendy Lehnert, and Janet Kolodner. In 1970, William A. Woods introduced the augmented transition network (ATN) to represent natural language input. Instead of phrase structure rules ATNs used an equivalent set of finite-state automata that were called recursively. ATNs and their more general format called "generalized ATNs" continued to be used for a number of years. During the 1970s many programmers began to write 'conceptual ontologies', which structured real-world information into computer-understandable data. Examples are MARGIE (Schank, 1975), SAM (Cullingford, 1978), PAM (Wilensky, 1978), TaleSpin (Meehan, 1976), QUALM (Lehnert, 1977), Politics (Carbonell, 1979), and Plot Units (Lehnert 1981). During this time, many chatterbots were written including PARRY, Racter, and Jabberwacky. == Statistical period == Up to the 1980s, most NLP systems were based on complex sets of hand-written rules. Starting in the late 1980s, however, there was a revolution in NLP with the introduction of machine learning algorithms for language processing. This was due both to the steady increase in computational power resulting from Moore's law and the gradual lessening of the dominance of Chomskyan theories of linguistics (e.g. transformational grammar), whose theoretical underpinnings discouraged the sort of corpus linguistics that underlies the machine-learning approach to language processing. Some of the earliest-used machine learning algorithms, such as decision trees, produced systems of hard if-then rules similar to existing hand-written rules. Increasingly, however, research has focused on statistical models, which make soft, probabilistic decisions based on attaching real-valued weights to the features making up the input data. The cache language models upon which many speech recognition systems now rely are examples of such statistical models. Such models are generally more robust when given unfamiliar input, especially input that contains errors (as is very common for real-world data), and produce more reliable results when integrated into a larger system comprising multiple subtasks. === Datasets === The emergence of statistical approaches was aided by both increase in computing power and the availability of large datasets. At that time, large multilingual corpora were starting to emerge. Notably, some were produced by the Parliament of Canada and the European Union as a result of laws calling for the translation of all governmental proceedings into all official languages of the corresponding systems of government. Many of the notable early successes occurred in the field of machine translation. In 1993, the IBM alignment models were used for statistical machine translation. Compared to previous machine translation systems, which were symbolic systems manually coded by computational linguists, these systems were statistical, which allowed them to automatically learn from large textual corpora. Though these systems do not work well in situations where only small corpora is available, so data-efficient methods continue to be an area of research and development. In 2001, a one-billion-word large text corpus, scraped from the Internet, referred to as "very very large" at the time, was used for word disambiguation. To take advantage of large, unlabelled datasets, algorithms were developed for unsupervised and self-supervised learning. Generally, this task is much more difficult than supervised learning, and typically produces less accurate results for a given amount of input data. However, there is an enormous amount of non-annotated data available (including, among other things, the entire content of the World Wide Web), which can often make up for the inferior results. == Neural period == Neural language models were developed in 1990s. In 1990, the Elman network, using a recurrent neural network, encoded each word in a training set as a vector, called a word embedding, and the whole vocabulary as a vector database, allowing it to perform such tasks as sequence-predictions that are beyond the power of a simple multilayer perceptron. A shortcoming of the static embeddings was that they didn't differentiate between multiple meanings of homonyms. Yoshua Bengio developed the first neural probabilistic language model in 2000. Novel algorithms, availability of larger datasets and higher processing power made possible training of larger and larger language models. Attention mechanism was introduced by Bahdanau et al. in 2014. This work laid the foundations for the famous "Attention Is All You Need" paper that introduced the Transformer architecture in 2017. The concept of large language model (LLM) emerged in late 2010s. LLM is a language model trained with self-supervised learning on vast amount of text. Earliest public LLMs had hundreds of millions of parameters, but this number quickly rose to billion and even trillions. In recent years, advancements in deep learning and large language models have significantly enhanced the capabilities of natural language processing, leading to widespread applications in areas such as healthcare, customer service, and content generation. == Software ==

    Read more →
  • Trigram tagger

    Trigram tagger

    In computational linguistics, a trigram tagger is a statistical method for automatically identifying words as being nouns, verbs, adjectives, adverbs, etc. based on second order Markov models that consider triples of consecutive words. It is trained on a text corpus as a method to predict the next word, taking the product of the probabilities of unigram, bigram and trigram. In speech recognition, algorithms utilizing trigram-tagger score better than those algorithms utilizing IIMM tagger but less well than Net tagger. The description of the trigram tagger is provided by Brants (2000).

    Read more →
  • Is an AI Blog Writer Worth It in 2026?

    Is an AI Blog Writer Worth It in 2026?

    Trying to pick the best AI blog writer? An AI blog writer is software that uses machine learning to help you get more done — it scales effortlessly from a single task to thousands. The best picks balance beginner-friendly simplicity with the depth power users need, and they ship updates often. Whether you are a beginner or a pro, the right AI blog writer slots into your workflow and pays for itself fast. This guide breaks down the top picks, their pros and cons, and who each one is best for.

    Read more →