AI Assistant Jobs Remote

AI Assistant Jobs Remote — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Automated restaurant

    Automated restaurant

    An automated restaurant or robotic restaurant is a restaurant that uses robots to do tasks such as delivering food and drink to the tables or cooking the food. Restaurant automation means the use of a restaurant management system to automate some or occasionally all of the major operations of a restaurant establishment. More recently, restaurants are opening that have completely or partially automated their services. These may include: taking orders, preparing food, serving, and billing. A few fully automated restaurants operate without any human intervention whatsoever. Robots are designed to help and sometimes replace human labour (such as waiters and chefs). The automation of restaurants may also allow for the option for greater customization of an order. == History == === Vending machines === In the late 19th and early 20th century a number of restaurants served food solely through vending machines. These restaurants were called automats or, in Japan, shokkenki. Customers ordered their food directly through the machines. === Sushi conveyors === Yoshiaki Shiraishi is a Japanese innovator who is known for the creation of conveyor belt sushi. He had the idea following difficulty staffing his small sushi restaurant and managing the restaurant on his own. He was inspired seeing beer bottles on a conveyor belt in an Asahi brewery. Yoshiaki's restaurants are an early example of restaurant automation; they used a conveyor belt to distribute dishes around the restaurant, eliminating the need for waiters. This example of automation dates back to the Japanese economic miracle; the first of Yoshiaki's conveyor belt sushi restaurants was opened under the name Mawaru Genroku Sushi in 1958, in Osaka. === Partial automation === As of 2011, across Europe, McDonald's had already begun implementing 7,000 touch screen kiosks that could handle cashiering duties. From 2015 to 2020, Zume had an automated pizza parlor. Later companies would try to produce smaller, less ambitious devices, with one robotics company producing a machine that could automate the slowest and most repetitive parts of assembling a pizza, such as spreading pizza sauce or placing slices of pepperoni, while leaving other customizations to employees. In 2020, a restaurant in the Netherlands began trialling the use of a robot to serve guests. In September 2021, Karakuri's 'Semblr' food service robot served personalised lunches for the 4,000 employees of grocery technology solutions provider ocado Group's head offices in Hatfield, UK. 2,700 different combinations of dishes were on offer. Customers could specify in grams what hot and cold items, proteins, sauces and fresh toppings they wanted. In 2021, Columbia University School of Engineering and Applied Science engineers developed a method of cooking 3D printed chicken with software-controlled robotic lasers. The “Digital Food” team exposed raw 3D printed chicken structures to both blue and infrared light. They then assessed the cooking depth, colour development, moisture retention and flavour differences of the laser-cooked 3D printed samples in comparison to stove-cooked meat. In June 2022 a California nonprofit chain of residential communities, Front Porch, experimented with robots in dining rooms at two locations to supplement wait staff by carrying plated food and drink to tables, and removing dishes. 65% of residents found the robots helpful, with 51% saying they let the staff spend more quality time with diners. 51% of staff were "excited" and 58% said they enabled more quality time with diners. The chain has 19 senior living communities (and 35 affordable housing communities), so it has potential to expand robots to more dining rooms. It is shifting to memory care, which may affect plans. == Rationales == === Advantages === Efficiency: Automated restaurants can significantly enhance operational efficiency by minimizing human error and reducing service time. With automated ordering, payment, and food preparation systems, customers can enjoy faster service and reduced waiting times. Cost savings: By reducing the need for human staff, automated restaurants can potentially lower labor costs. This can be particularly beneficial in areas with high labor expenses, as it allows for better resource allocation and cost management. Consistency: Automation ensures consistency in food quality and presentation. With precise portion control and standardized cooking methods, customers can expect the same quality and taste in their meals every time they visit. Enhanced customer experience: Self-service kiosks and automated systems provide customers with control and convenience. They can customize their orders, browse through menu options, and pay seamlessly, creating a more interactive and satisfying dining experience. === Disadvantages === Lack of personal touch: Automated restaurants may lack the personal interaction and warmth that traditional restaurants provide. Some customers prefer the human touch, personalized recommendations, and the social aspect of dining out. Technical issues: Reliance on technology means that technical glitches and malfunctions can occur, resulting in service disruptions or delays. Maintenance and technical support become critical in ensuring smooth operations. Limited menu complexity: The automation process may be better suited for standardized menu items rather than complex or customized dishes. The ability to cater to unique dietary preferences or accommodate special requests may be limited. Employment implications: Automated restaurants may result in job losses for traditional restaurant staff, potentially impacting the local workforce. It is important to consider the social and economic implications of adopting such technology. == Locations == Automated restaurants have been opening in many countries. Examples include: Nala Restaurant in Naperville, Illinois Fritz's Railroad Restaurant in Kansas City, Kansas Výtopna, a Railway Restaurant using model trains: franchise of various restaurants and coffeehouses in the Czech Republic Bagger's Restaurant in Nuremberg, Germany FuA-Men Restaurant, a ramen restaurant located in Nagoya, Japan Fōster Nutrition in Buenos Aires, Argentina Dalu Robot Restaurant in Jinan, China Haohai Robot Restaurant in Harbin, China Robot Kitchen Restaurant in Hong Kong Robo-Chef restaurant in Tehran, Iran, started in 2017, is the first robotic and "waiterless" restaurant of the Middle East. MIT graduates opened Spyce Kitchens in downtown Boston, Massachusetts, in 2018 Foodom, under Country Garden Holdings, opened January 12, 2020, in Guangzhou, China Robot Chacha, the first robot restaurant of India, is planning to open in the capital city of New Delhi. Kura Revolving Sushi Bar, with a number of locations in the United States, uses a tablets at tables for ordering, a conveyor belt to deliver food, and robots to deliver drinks and condiments. Chipotle Mexican Grill is beginning to deploy the Hyphen Makeline, which assembles up to 350 bowls and salads automatically per hour, and Chippy, an automatic tortilla chip fryer made by Miso Robotics. Serious Dumplings in Boca Raton, Florida

    Read more →
  • Markov model

    Markov model

    In probability theory, a Markov model is a stochastic model used to model pseudo-randomly changing systems. It is assumed that future states depend only on the current state, not on the events that occurred before it (that is, it assumes the Markov property). Generally, this assumption enables reasoning and computation with the model that would otherwise be intractable. For this reason, in the fields of predictive modelling and probabilistic forecasting, it is desirable for a given model to exhibit the Markov property. == Introduction == Andrey Andreyevich Markov (14 June 1856 – 20 July 1922) was a Russian mathematician best known for his work on stochastic processes. A primary subject of his research later became known as the Markov chain. There are four common Markov models used in different situations, depending on whether every sequential state is observable or not, and whether the system is to be adjusted on the basis of observations made: == Markov chain == The simplest Markov model is the Markov chain. It models the state of a system with a random variable that changes through time. In this context, the Markov property indicates that the distribution for this variable depends only on the distribution of a previous state. An example use of a Markov chain is Markov chain Monte Carlo, which uses the Markov property to prove that a particular method for performing a random walk will sample from the joint distribution. == Hidden Markov model == A hidden Markov model is a Markov chain for which the state is only partially observable or noisily observable. In other words, observations are related to the state of the system, but they are typically insufficient to precisely determine the state. Several well-known algorithms for hidden Markov models exist. For example, given a sequence of observations, the Viterbi algorithm will compute the most-likely corresponding sequence of states, the forward algorithm will compute the probability of the sequence of observations, and the Baum–Welch algorithm will estimate the starting probabilities, the transition function, and the observation function of a hidden Markov model. One common use is for speech recognition, where the observed data is the speech audio waveform and the hidden state is the spoken text. In this example, the Viterbi algorithm finds the most likely sequence of spoken words given the speech audio. == Markov decision process == A Markov decision process is a Markov chain in which state transitions depend on the current state and an action vector that is applied to the system. Typically, a Markov decision process is used to compute a policy of actions that will maximize some utility with respect to expected rewards. == Partially observable Markov decision process == A partially observable Markov decision process (POMDP) is a Markov decision process in which the state of the system is only partially observed. POMDPs are known to be NP complete, but recent approximation techniques have made them useful for a variety of applications, such as controlling simple agents or robots. == Markov random field == A Markov random field, or Markov network, may be considered to be a generalization of a Markov chain in multiple dimensions. In a Markov chain, state depends only on the previous state in time, whereas in a Markov random field, each state depends on its neighbors in any of multiple directions. A Markov random field may be visualized as a field or graph of random variables, where the distribution of each random variable depends on the neighboring variables with which it is connected. More specifically, the joint distribution for any random variable in the graph can be computed as the product of the "clique potentials" of all the cliques in the graph that contain that random variable. Modeling a problem as a Markov random field is useful because it implies that the joint distributions at each vertex in the graph may be computed in this manner. == Hierarchical Markov models == Hierarchical Markov models can be applied to categorize human behavior at various levels of abstraction. For example, a series of simple observations, such as a person's location in a room, can be interpreted to determine more complex information, such as in what task or activity the person is performing. Two kinds of Hierarchical Markov Models are the Hierarchical hidden Markov model and the Abstract Hidden Markov Model. Both have been used for behavior recognition and certain conditional independence properties between different levels of abstraction in the model allow for faster learning and inference. == Tolerant Markov model == A Tolerant Markov model (TMM) is a probabilistic-algorithmic Markov chain model. It assigns the probabilities according to a conditioning context that considers the last symbol, from the sequence to occur, as the most probable instead of the true occurring symbol. A TMM can model three different natures: substitutions, additions or deletions. Successful applications have been efficiently implemented in DNA sequences compression. == Markov-chain forecasting models == Markov-chains have been used as a forecasting methods for several topics, for example price trends, wind power and solar irradiance. The Markov-chain forecasting models utilize a variety of different settings, from discretizing the time-series to hidden Markov-models combined with wavelets and the Markov-chain mixture distribution model (MCM).

    Read more →
  • Quickprop

    Quickprop

    Quickprop is an iterative method for determining the minimum of the loss function of an artificial neural network, following an algorithm inspired by the Newton's method. Sometimes, the algorithm is classified to the group of the second order learning methods. It follows a quadratic approximation of the previous gradient step and the current gradient, which is expected to be close to the minimum of the loss function, under the assumption that the loss function is locally approximately square, trying to describe it by means of an upwardly open parabola. The minimum is sought in the vertex of the parabola. The procedure requires only local information of the artificial neuron to which it is applied. The k {\displaystyle k} -th approximation step is given by: Δ ( k ) w i j = Δ ( k − 1 ) w i j ( ∇ i j E ( k ) ∇ i j E ( k − 1 ) − ∇ i j E ( k ) ) {\displaystyle \Delta ^{(k)}\,w_{ij}=\Delta ^{(k-1)}\,w_{ij}\left({\frac {\nabla _{ij}\,E^{(k)}}{\nabla _{ij}\,E^{(k-1)}-\nabla _{ij}\,E^{(k)}}}\right)} Where w i j {\displaystyle w_{ij}} is the weight of input i {\displaystyle i} of neuron j {\displaystyle j} , and E {\displaystyle E} is the loss function. The Quickprop algorithm is an implementation of the error backpropagation algorithm, but the network can behave chaotically during the learning phase due to large step sizes.

    Read more →
  • Gremlin (query language)

    Gremlin (query language)

    Gremlin is a graph traversal language and virtual machine developed by Apache TinkerPop of the Apache Software Foundation. Gremlin works for both OLTP-based graph databases as well as OLAP-based graph processors. Gremlin's automata and functional language foundation enable Gremlin to naturally support imperative and declarative querying, host language agnosticism, user-defined domain specific languages, an extensible compiler/optimizer, single- and multi-machine execution models, and hybrid depth- and breadth-first evaluation with Turing completeness. As an explanatory analogy, Apache TinkerPop and Gremlin are to graph databases what the JDBC and SQL are to relational databases. Likewise, the Gremlin traversal machine is to graph computing as what the Java virtual machine is to general purpose computing. == History == 2009-10-30 the project is born, and immediately named "TinkerPop" 2009-12-25 v0.1 is the first release 2011-05-21 v1.0 is released 2012-05-24 v2.0 is released 2015-01-16 TinkerPop becomes an Apache Incubator project 2015-07-09 v3.0.0-incubating is released 2016-05-23 Apache TinkerPop becomes a top-level project 2016-07-18 v3.1.3 and v3.2.1 are first releases as Apache TinkerPop 2017-12-17 v3.3.1 is released 2018-05-08 v3.3.3 is released 2019-08-05 v3.4.3 is released 2020-02-20 v3.4.6 is released 2021-05-01 v3.5.0 is released 2022-04-04 v3.6.0 is released 2023-07-31 v3.7.0 is released 2025-11-12 v3.8.0 is released == Vendor integration == Gremlin is an Apache2-licensed graph traversal language that can be used by graph system vendors. There are typically two types of graph system vendors: OLTP graph databases and OLAP graph processors. The table below outlines those graph vendors that support Gremlin. == Traversal examples == The following examples of Gremlin queries and responses in a Gremlin-Groovy environment are relative to a graph representation of the MovieLens dataset. The dataset includes users who rate movies. Users each have one occupation, and each movie has one or more categories associated with it. The MovieLens graph schema is detailed below. === Simple traversals === For each vertex in the graph, emit its label, then group and count each distinct label. What year was the oldest movie made? What is Die Hard's average rating? === Projection traversals === For each category, emit a map of its name and the number of movies it represents. For each movie with at least 11 ratings, emit a map of its name and average rating. Sort the maps in decreasing order by their average rating. Emit the first 10 maps (i.e. top 10). === Declarative pattern matching traversals === Gremlin supports declarative graph pattern matching similar to SPARQL. For instance, the following query below uses Gremlin's match()-step. What 80's action movies do 30-something programmers like? Group count the movies by their name and sort the group count map in decreasing order by value. Clip the map to the top 10 and emit the map entries. === OLAP traversal === Which movies are most central in the implicit 5-stars graph? == Gremlin graph traversal machine == Gremlin is a virtual machine composed of an instruction set as well as an execution engine. An analogy is drawn between Gremlin and Java. === Gremlin steps (instruction set) === The following traversal is a Gremlin traversal in the Gremlin-Java8 dialect. The Gremlin language (i.e. the fluent-style of expressing a graph traversal) can be represented in any host language that supports function composition and function nesting. Due to this simple requirement, there exists various Gremlin dialects including Gremlin-Groovy, Gremlin-Scala, Gremlin-Clojure, etc. The above Gremlin-Java8 traversal is ultimately compiled down to a step sequence called a traversal. A string representation of the traversal above provided below. The steps are the primitives of the Gremlin graph traversal machine. They are the parameterized instructions that the machine ultimately executes. The Gremlin instruction set is approximately 30 steps. These steps are sufficient to provide general purpose computing and what is typically required to express the common motifs of any graph traversal query. Given that Gremlin is a language, an instruction set, and a virtual machine, it is possible to design another traversal language that compiles to the Gremlin traversal machine (analogous to how Scala compiles to the JVM). For instance, the popular SPARQL graph pattern match language can be compiled to execute on the Gremlin machine. The following SPARQL query would compile to In Gremlin-Java8, the SPARQL query above would be represented as below and compile to the identical Gremlin step sequence (i.e. traversal). === Gremlin Machine (virtual machine) === The Gremlin graph traversal machine can execute on a single machine or across a multi-machine compute cluster. Execution agnosticism allows Gremlin to run over both graph databases (OLTP) and graph processors (OLAP).

    Read more →
  • Percept (artificial intelligence)

    Percept (artificial intelligence)

    A percept is the input that an intelligent agent is perceiving at any given moment. It is essentially the same concept as a percept in psychology, except that it is being perceived not by the brain but by the agent. A percept is detected by a sensor, often a camera, processed accordingly, and acted upon by an actuator. Each percept is added to a "percept sequence", which is a complete history of each percept ever detected. The agent's action at any instant point may depend on the entire percept sequence up to that particular instant point. An intelligent agent chooses how to act not only based on the current percept, but the percept sequence. The next action is chosen by the agent function, which maps every percept to an action. For example, if a camera were to record a gesture, the agent would process the percepts, calculate the corresponding spatial vectors, examine its percept history, and use the agent program (the application of the agent function) to act accordingly. == Examples == Examples of percepts include inputs from touch sensors, cameras, infrared sensors, sonar, microphones, mice, and keyboards. A percept can also be a higher-level feature of the data, such as lines, depth, objects, faces, or gestures.

    Read more →
  • Stochastic gradient descent

    Stochastic gradient descent

    Stochastic gradient descent (often abbreviated SGD) is an iterative method for optimizing an objective function with suitable smoothness properties (e.g. differentiable or subdifferentiable). It can be regarded as a stochastic approximation of gradient descent optimization, since it replaces the actual gradient (calculated from the entire data set) by an estimate thereof (calculated from a randomly selected subset of the data). Especially in high-dimensional optimization problems this reduces the very high computational burden, achieving faster iterations in exchange for a lower convergence rate. The basic idea behind stochastic approximation can be traced back to the Robbins–Monro algorithm of the 1950s. Today, stochastic gradient descent has become an important optimization method in machine learning. == Background == Both statistical estimation and machine learning consider the problem of minimizing an objective function that has the form of a sum: Q ( w ) = 1 n ∑ i = 1 n Q i ( w ) , {\displaystyle Q(w)={\frac {1}{n}}\sum _{i=1}^{n}Q_{i}(w),} where the parameter w {\displaystyle w} that minimizes Q ( w ) {\displaystyle Q(w)} is to be estimated. Each summand function Q i {\displaystyle Q_{i}} is typically associated with the i {\displaystyle i} -th observation in the data set (used for training). In classical statistics, sum-minimization problems arise in least squares and in maximum-likelihood estimation (for independent observations). The general class of estimators that arise as minimizers of sums are called M-estimators. However, in statistics, it has been long recognized that requiring even local minimization is too restrictive for some problems of maximum-likelihood estimation. Therefore, contemporary statistical theorists often consider stationary points of the likelihood function (or zeros of its derivative, the score function, and other estimating equations). The sum-minimization problem also arises for empirical risk minimization. There, Q i ( w ) {\displaystyle Q_{i}(w)} is the value of the loss function at i {\displaystyle i} -th example, and Q ( w ) {\displaystyle Q(w)} is the empirical risk. When used to minimize the above function, a standard (or "batch") gradient descent method would perform the following iterations: w := w − η ∇ Q ( w ) = w − η n ∑ i = 1 n ∇ Q i ( w ) . {\displaystyle w:=w-\eta \,\nabla Q(w)=w-{\frac {\eta }{n}}\sum _{i=1}^{n}\nabla Q_{i}(w).} The step size is denoted by η {\displaystyle \eta } (sometimes called the learning rate in machine learning) and here " := {\displaystyle :=} " denotes the update of a variable in the algorithm. In many cases, the summand functions have a simple form that enables inexpensive evaluations of the sum-function and the sum gradient. For example, in statistics, one-parameter exponential families allow economical function-evaluations and gradient-evaluations. However, in other cases, evaluating the sum-gradient may require expensive evaluations of the gradients from all summand functions. When the training set is enormous and no simple formulas exist, evaluating the sums of gradients becomes very expensive, because evaluating the gradient requires evaluating all the summand functions' gradients. To economize on the computational cost at every iteration, stochastic gradient descent samples a subset of summand functions at every step. This is very effective in the case of large-scale machine learning problems. == Iterative method == In stochastic (or "on-line") gradient descent, the true gradient of Q ( w ) {\displaystyle Q(w)} is approximated by a gradient at a single sample: w := w − η ∇ Q i ( w ) . {\displaystyle w:=w-\eta \,\nabla Q_{i}(w).} As the algorithm sweeps through the training set, it performs the above update for each training sample. Several passes can be made over the training set until the algorithm converges. If this is done, the data can be shuffled for each pass to prevent cycles. Typical implementations may use an adaptive learning rate so that the algorithm converges. In pseudocode, stochastic gradient descent can be presented as : A compromise between computing the true gradient and the gradient at a single sample is to compute the gradient against more than one training sample (called a "mini-batch") at each step. This can perform significantly better than "true" stochastic gradient descent described, because the code can make use of vectorization libraries rather than computing each step separately as was first shown in where it was called "the bunch-mode back-propagation algorithm". It may also result in smoother convergence, as the gradient computed at each step is averaged over more training samples. The convergence of stochastic gradient descent has been analyzed using the theories of convex minimization and of stochastic approximation. Briefly, when the learning rates η {\displaystyle \eta } decrease with an appropriate rate, and subject to relatively mild assumptions, stochastic gradient descent converges almost surely to a global minimum when the objective function is convex or pseudoconvex, and otherwise converges almost surely to a local minimum. This is in fact a consequence of the Robbins–Siegmund theorem. == Linear regression == Suppose we want to fit a straight line y ^ = w 1 + w 2 x {\displaystyle {\hat {y}}=w_{1}+w_{2}x} to a training set with observations ( ( x 1 , y 1 ) , ( x 2 , y 2 ) … , ( x n , y n ) ) {\displaystyle ((x_{1},y_{1}),(x_{2},y_{2})\ldots ,(x_{n},y_{n}))} and corresponding estimated responses ( y ^ 1 , y ^ 2 , … , y ^ n ) {\displaystyle ({\hat {y}}_{1},{\hat {y}}_{2},\ldots ,{\hat {y}}_{n})} using least squares. The objective function to be minimized is Q ( w ) = ∑ i = 1 n Q i ( w ) = ∑ i = 1 n ( y ^ i − y i ) 2 = ∑ i = 1 n ( w 1 + w 2 x i − y i ) 2 . {\displaystyle Q(w)=\sum _{i=1}^{n}Q_{i}(w)=\sum _{i=1}^{n}\left({\hat {y}}_{i}-y_{i}\right)^{2}=\sum _{i=1}^{n}\left(w_{1}+w_{2}x_{i}-y_{i}\right)^{2}.} The last line in the above pseudocode for this specific problem will become: [ w 1 w 2 ] ← [ w 1 w 2 ] − η [ ∂ ∂ w 1 ( w 1 + w 2 x i − y i ) 2 ∂ ∂ w 2 ( w 1 + w 2 x i − y i ) 2 ] = [ w 1 w 2 ] − η [ 2 ( w 1 + w 2 x i − y i ) 2 x i ( w 1 + w 2 x i − y i ) ] . {\displaystyle {\begin{bmatrix}w_{1}\\w_{2}\end{bmatrix}}\leftarrow {\begin{bmatrix}w_{1}\\w_{2}\end{bmatrix}}-\eta {\begin{bmatrix}{\frac {\partial }{\partial w_{1}}}(w_{1}+w_{2}x_{i}-y_{i})^{2}\\{\frac {\partial }{\partial w_{2}}}(w_{1}+w_{2}x_{i}-y_{i})^{2}\end{bmatrix}}={\begin{bmatrix}w_{1}\\w_{2}\end{bmatrix}}-\eta {\begin{bmatrix}2(w_{1}+w_{2}x_{i}-y_{i})\\2x_{i}(w_{1}+w_{2}x_{i}-y_{i})\end{bmatrix}}.} Note that in each iteration or update step, the gradient is only evaluated at a single x i {\displaystyle x_{i}} . This is the key difference between stochastic gradient descent and batched gradient descent. In general, given a linear regression y ^ = ∑ k ∈ 1 : m w k x k {\displaystyle {\hat {y}}=\sum _{k\in 1:m}w_{k}x_{k}} problem, stochastic gradient descent behaves differently when m < n {\displaystyle m

  • Transkribus

    Transkribus

    Transkribus is a platform for the text recognition, image analysis and structure recognition of historical documents. The platform was created in the context of the two EU projects "tranScriptorium" (2013–2015) and "READ" (Recognition and Enrichment of Archival Documents – 2016–2019). It was developed by the University of Innsbruck. Since July 1, 2019 the platform has been directed and further developed by the READ-COOP, a non-profit cooperative. The platform integrates tools developed by research groups throughout Europe, including the Pattern Recognition and Human Language Technology (PRHLT) group of the Technical University of Valencia and the Computational Intelligence Technology Lab (CITlab) group of University of Rostock. Comparable programs that offer similar functions are eScriptorium and OCR4All.

    Read more →
  • Plate notation

    Plate notation

    In Bayesian inference, plate notation is a method of representing variables that repeat in a graphical model. Instead of drawing each repeated variable individually, a plate or rectangle is used to group variables into a subgraph that repeat together, and a number is drawn on the plate to represent the number of repetitions of the subgraph in the plate. The assumptions are that the subgraph is duplicated that many times, the variables in the subgraph are indexed by the repetition number, and any links that cross a plate boundary are replicated once for each subgraph repetition. == Example == In this example, we consider Latent Dirichlet allocation, a Bayesian network that models how documents in a corpus are topically related. There are two variables not in any plate; α is the parameter of the uniform Dirichlet prior on the per-document topic distributions, and β is the parameter of the uniform Dirichlet prior on the per-topic word distribution. The outermost plate represents all the variables related to a specific document, including θ i {\displaystyle \theta _{i}} , the topic distribution for document i. The M in the corner of the plate indicates that the variables inside are repeated M times, once for each document. The inner plate represents the variables associated with each of the N i {\displaystyle N_{i}} words in document i: z i j {\displaystyle z_{ij}} is the topic distribution for the jth word in document i, and w i j {\displaystyle w_{ij}} is the actual word used. The N in the corner represents the repetition of the variables in the inner plate N j {\displaystyle N_{j}} times, once for each word in document i. The circle representing the individual words is shaded, indicating that each w i j {\displaystyle w_{ij}} is observable, and the other circles are empty, indicating that the other variables are latent variables. The directed edges between variables indicate dependencies between the variables: for example, each w i j {\displaystyle w_{ij}} depends on z i j {\displaystyle z_{ij}} and β. == Extensions == A number of extensions have been created by various authors to express more information than simply the conditional relationships. However, few of these have become standard. Perhaps the most commonly used extension is to use rectangles in place of circles to indicate non-random variables—either parameters to be computed, hyperparameters given a fixed value (or computed through empirical Bayes), or variables whose values are computed deterministically from a random variable. The diagram on the right shows a few more non-standard conventions used in some articles in Wikipedia (e.g. variational Bayes): Variables that are actually random vectors are indicated by putting the vector size in brackets in the middle of the node. Variables that are actually random matrices are similarly indicated by putting the matrix size in brackets in the middle of the node, with commas separating row size from column size. Categorical variables are indicated by placing their size (without a bracket) in the middle of the node. Categorical variables that act as "switches", and which pick one or more other random variables to condition on from a large set of such variables (e.g. mixture components), are indicated with a special type of arrow containing a squiggly line and ending in a T junction. Boldface is consistently used for vector or matrix nodes (but not categorical nodes). == Software implementation == Plate notation has been implemented in various TeX/LaTeX drawing packages, but also as part of graphical user interfaces to Bayesian statistics programs such as BUGS and BayesiaLab and PyMC.

    Read more →
  • Rule induction

    Rule induction

    Rule induction is an area of machine learning in which formal rules are extracted from a set of observations. The rules extracted may represent a full scientific model of the data, or merely represent local patterns in the data. Data mining in general and rule induction in detail are trying to create algorithms without human programming but with analyzing existing data structures. In the easiest case, a rule is expressed with “if-then statements” and was created with the ID3 algorithm for decision tree learning. Rule learning algorithm are taking training data as input and creating rules by partitioning the table with cluster analysis. A possible alternative over the ID3 algorithm is genetic programming which evolves a program until it fits to the data. Creating different algorithm and testing them with input data can be realized in the WEKA software. Additional tools are machine learning libraries for Python, like scikit-learn. == Paradigms == Some major rule induction paradigms are: Association rule learning algorithms (e.g., Agrawal) Decision rule algorithms (e.g., Quinlan 1987) Hypothesis testing algorithms (e.g., RULEX) Horn clause induction Version spaces Rough set rules Inductive Logic Programming Boolean decomposition (Feldman) == Algorithms == Some rule induction algorithms are: Charade Rulex Progol CN2

    Read more →
  • Independent component analysis

    Independent component analysis

    In signal processing, independent component analysis (ICA) is a computational method for separating a multivariate signal into additive subcomponents. This is done by assuming that at most one subcomponent is Gaussian and that the subcomponents are statistically independent from each other. ICA was invented by Jeanny Hérault and Christian Jutten in 1985. ICA is a special case of blind source separation. A common example application of ICA is the "cocktail party problem" of listening in on one person's speech in a noisy room. == Introduction == Independent component analysis attempts to decompose a multivariate signal into independent non-Gaussian signals. As an example, sound is usually a signal that is composed of the numerical addition, at each time t, of signals from several sources. The question then is whether it is possible to separate these contributing sources from the observed total signal. When the statistical independence assumption is correct, blind ICA separation of a mixed signal gives very good results. It is also used for signals that are not supposed to be generated by mixing for analysis purposes. A simple application of ICA is the "cocktail party problem", where the underlying speech signals are separated from a sample data consisting of people talking simultaneously in a room. Usually the problem is simplified by assuming no time delays or echoes. Note that a filtered and delayed signal is a copy of a dependent component, and thus the statistical independence assumption is not violated. Mixing weights for constructing the M {\textstyle M} observed signals from the N {\textstyle N} components can be placed in an M × N {\textstyle M\times N} matrix. An important thing to consider is that if N {\textstyle N} sources are present, at least N {\textstyle N} observations (e.g. microphones if the observed signal is audio) are needed to recover the original signals. When there are an equal number of observations and source signals, the mixing matrix is square ( M = N {\textstyle M=N} ). Other cases of underdetermined ( M < N {\textstyle M N {\textstyle M>N} ) have been investigated. The success of ICA separation of mixed signals relies on two assumptions and three effects of mixing source signals. Two assumptions: The source signals are independent of each other. The values in each source signal have non-Gaussian distributions. Three effects of mixing source signals: Independence: As per assumption 1, the source signals are independent; however, their signal mixtures are not. This is because the signal mixtures share the same source signals. Normality: According to the Central Limit Theorem, the distribution of a sum of independent random variables with finite variance tends towards a Gaussian distribution.Loosely speaking, a sum of two independent random variables usually has a distribution that is closer to Gaussian than any of the two original variables. Here we consider the value of each signal as the random variable. Complexity: The temporal complexity of any signal mixture is greater than that of its simplest constituent source signal. Those principles contribute to the basic establishment of ICA. If the signals extracted from a set of mixtures are independent and have non-Gaussian distributions or have low complexity, then they must be source signals. Another common example is image steganography, where ICA is used to embed one image within another. For instance, two grayscale images can be linearly combined to create mixed images in which the hidden content is visually imperceptible. ICA can then be used to recover the original source images from the mixtures. This technique underlies digital watermarking, which allows the embedding of ownership information into images, as well as more covert applications such as undetected information transmission. The method has even been linked to real-world cyberespionage cases. In such applications, ICA serves to unmix the data based on statistical independence, making it possible to extract hidden components that are not apparent in the observed data. Steganographic techniques, including those potentially involving ICA-based analysis, have been used in real-world cyberespionage cases. In 2010, the FBI uncovered a Russian spy network known as the "Illegals Program" (Operation Ghost Stories), where agents used custom-built steganography tools to conceal encrypted text messages within image files shared online. In another case, a former General Electric engineer, Xiaoqing Zheng, was convicted in 2022 for economic espionage. Zheng used steganography to exfiltrate sensitive turbine technology by embedding proprietary data within image files for transfer to entities in China. == Defining component independence == ICA finds the independent components (also called factors, latent variables or sources) by maximizing the statistical independence of the estimated components. We may choose one of many ways to define a proxy for independence, and this choice governs the form of the ICA algorithm. The two broadest definitions of independence for ICA are Minimization of mutual information Maximization of non-Gaussianity The Minimization-of-Mutual information (MMI) family of ICA algorithms uses measures like Kullback-Leibler Divergence and maximum entropy. The non-Gaussianity family of ICA algorithms, motivated by the central limit theorem, uses kurtosis and negentropy. Typical algorithms for ICA use centering (subtract the mean to create a zero mean signal), whitening (usually with the eigenvalue decomposition), and dimensionality reduction as preprocessing steps in order to simplify and reduce the complexity of the problem for the actual iterative algorithm. == Mathematical definitions == Linear independent component analysis can be divided into noiseless and noisy cases, where noiseless ICA is a special case of noisy ICA. Nonlinear ICA should be considered as a separate case. === General Derivation === In the classical ICA model, it is assumed that the observed data x i ∈ R m {\displaystyle \mathbf {x} _{i}\in \mathbb {R} ^{m}} at time t i {\displaystyle t_{i}} is generated from source signals s i ∈ R m {\displaystyle \mathbf {s} _{i}\in \mathbb {R} ^{m}} via a linear transformation x i = A s i {\displaystyle \mathbf {x} _{i}=A\mathbf {s} _{i}} , where A {\displaystyle A} is an unknown, invertible mixing matrix. To recover the source signals, the data is first centered (zero mean), and then whitened so that the transformed data has unit covariance. This whitening reduces the problem from estimating a general matrix A {\displaystyle A} to estimating an orthogonal matrix V {\displaystyle V} , significantly simplifying the search for independent components. If the covariance matrix of the centered data is Σ x = A A ⊤ {\displaystyle \Sigma _{x}=AA^{\top }} , then using the eigen-decomposition Σ x = Q D Q ⊤ {\displaystyle \Sigma _{x}=QDQ^{\top }} , the whitening transformation can be taken as D − 1 / 2 Q ⊤ {\displaystyle D^{-1/2}Q^{\top }} . This step ensures that the recovered sources are uncorrelated and of unit variance, leaving only the task of rotating the whitened data to maximize statistical independence. This general derivation underlies many ICA algorithms and is foundational in understanding the ICA model. ==== Reduced Mixing Problem ==== Independent component analysis (ICA) addresses the problem of recovering a set of unobserved source signals s i = ( s i 1 , s i 2 , … , s i m ) T {\displaystyle s_{i}=(s_{i1},s_{i2},\dots ,s_{im})^{T}} from observed mixed signals x i = ( x i 1 , x i 2 , … , x i m ) T {\displaystyle x_{i}=(x_{i1},x_{i2},\dots ,x_{im})^{T}} , based on the linear mixing model: x i = A s i , {\displaystyle x_{i}=A\,s_{i},} where the A {\displaystyle A} is an m × m {\displaystyle m\times m} invertible matrix called the mixing matrix, s i {\displaystyle s_{i}} represents the m‑dimensional vector containing the values of the sources at time t i {\displaystyle t_{i}} , and x i {\displaystyle x_{i}} is the corresponding vector of observed values at time t i {\displaystyle t_{i}} . The goal is to estimate both A {\displaystyle A} and the source signals { s i } {\displaystyle \{s_{i}\}} solely from the observed data { x i } {\displaystyle \{x_{i}\}} . After centering, the Gram matrix is computed as: ( X ∗ ) T X ∗ = Q D Q T , {\displaystyle (X^{})^{T}X^{}=Q\,D\,Q^{T},} where D is a diagonal matrix with positive entries (assuming X ∗ {\displaystyle X^{}} has maximum rank), and Q is an orthogonal matrix. Writing the SVD of the mixing matrix A = U Σ V T {\displaystyle A=U\Sigma V^{T}} and comparing with A A T = U Σ 2 U T {\displaystyle AA^{T}=U\Sigma ^{2}U^{T}} the mixing A has the form A = Q D 1 / 2 V T . {\displaystyle A=Q\,D^{1/2}\,V^{T}.} So, the normalized source values satisfy s i ∗ = V y i ∗ {\displaystyle s_{i}^{}=V\,y_{i}^{}} , where y i ∗ = D − 1 2 Q T x i ∗ . {\displaystyle y_{i}^{}=D^{-{\tfrac {1}{2}}}Q^{T}x_{i}^{}.} Thus, ICA reduces

    Read more →
  • Ensemble learning

    Ensemble learning

    In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. Unlike a statistical ensemble in statistical mechanics, which is usually infinite, a machine learning ensemble consists of only a concrete finite set of alternative models, but typically allows for much more flexible structure to exist among those alternatives. == Overview == Supervised learning algorithms search through a hypothesis space to find a suitable hypothesis that will make good predictions with a particular problem. Even if this space contains hypotheses that are very well-suited for a particular problem, it may be very difficult to find a good one. Ensembles combine multiple hypotheses to form one which should be theoretically better. Ensemble learning trains two or more machine learning algorithms on a specific classification or regression task. The algorithms within the ensemble model are generally referred as "base models", "base learners", or "weak learners" in literature. These base models can be constructed using a single modelling algorithm, or several different algorithms. The idea is to train a diverse set of weak models on the same modelling task, such that the outputs of each weak learner have poor predictive ability (i.e., high bias), and among all weak learners, the outcome and error values exhibit high variance. Fundamentally, an ensemble learning model trains at least two high-bias (weak) and high-variance (diverse) models to be combined into a better-performing model. The set of weak models — which would not produce satisfactory predictive results individually — are combined or averaged to produce a single, high performing, accurate, and low-variance model to fit the task as required. Ensemble learning typically refers to bagging (bootstrap aggregating), boosting or stacking/blending techniques to induce high variance among the base models. Bagging creates diversity by generating random samples from the training observations and fitting the same model to each different sample — also known as homogeneous parallel ensembles. Boosting follows an iterative process by sequentially training each base model on the up-weighted errors of the previous base model, producing an additive model to reduce the final model errors — also known as sequential ensemble learning. Stacking or blending consists of different base models, each trained independently (i.e. diverse/high variance) to be combined into the ensemble model — producing a heterogeneous parallel ensemble. Common applications of ensemble learning include random forests (an extension of bagging), Boosted Tree models, and Gradient Boosted Tree Models. Models in applications of stacking are generally more task-specific — such as combining clustering techniques with other parametric and/or non-parametric techniques. Evaluating the prediction of an ensemble typically requires more computation than evaluating the prediction of a single model. In one sense, ensemble learning may be thought of as a way to compensate for poor learning algorithms by performing a lot of extra computation. On the other hand, the alternative is to do a lot more learning with one non-ensemble model. An ensemble may be more efficient at improving overall accuracy for the same increase in compute, storage, or communication resources by using that increase on two or more methods, than would have been improved by increasing resource use for a single method. Fast algorithms such as decision trees are commonly used in ensemble methods (e.g., random forests), although slower algorithms can benefit from ensemble techniques as well. By analogy, ensemble techniques have been used also in unsupervised learning scenarios, for example in consensus clustering or in anomaly detection. == Ensemble theory == Empirically, ensembles tend to yield better results when there is a significant diversity among the models. Many ensemble methods, therefore, seek to promote diversity among the models they combine. Although perhaps non-intuitive, more random algorithms (like random decision trees) can be used to produce a stronger ensemble than very deliberate algorithms (like entropy-reducing decision trees). Using a variety of strong learning algorithms, however, has been shown to be more effective than using techniques that attempt to dumb-down the models in order to promote diversity. It is possible to increase diversity in the training stage of the model using correlation for regression tasks or using information measures such as cross entropy for classification tasks. Theoretically, one can justify the diversity concept because the lower bound of the error rate of an ensemble system can be decomposed into accuracy, diversity, and the other term. === The geometric framework === Ensemble learning, including both regression and classification tasks, can be explained using a geometric framework. Within this framework, the output of each individual classifier or regressor for the entire dataset can be viewed as a point in a multi-dimensional space. Additionally, the target result is also represented as a point in this space, referred to as the "ideal point." The Euclidean distance is used as the metric to measure both the performance of a single classifier or regressor (the distance between its point and the ideal point) and the dissimilarity between two classifiers or regressors (the distance between their respective points). This perspective transforms ensemble learning into a deterministic problem. For example, within this geometric framework, it can be proved that the averaging of the outputs (scores) of all base classifiers or regressors can lead to equal or better results than the average of all the individual models. It can also be proved that if the optimal weighting scheme is used, then a weighted averaging approach can outperform any of the individual classifiers or regressors that make up the ensemble or as good as the best performer at least. == Ensemble size == While the number of component classifiers of an ensemble has a great impact on the accuracy of prediction, there is a limited number of studies addressing this problem. A priori determining of ensemble size and the volume and velocity of big data streams make this even more crucial for online ensemble classifiers. Mostly statistical tests were used for determining the proper number of components. More recently, a theoretical framework suggested that there is an ideal number of component classifiers for an ensemble such that having more or less than this number of classifiers would deteriorate the accuracy. It is called "the law of diminishing returns in ensemble construction." Their theoretical framework shows that using the same number of independent component classifiers as class labels gives the highest accuracy. == Common types of ensembles == === Bayes optimal classifier === The Bayes optimal classifier is a classification technique. It is an ensemble of all the hypotheses in the hypothesis space. On average, no other ensemble can outperform it. The Naive Bayes classifier is a version of this that assumes that the data is conditionally independent on the class and makes the computation more feasible. Each hypothesis is given a vote proportional to the likelihood that the training dataset would be sampled from a system if that hypothesis were true. To facilitate training data of finite size, the vote of each hypothesis is also multiplied by the prior probability of that hypothesis. The Bayes optimal classifier can be expressed with the following equation: y = a r g m a x c j ∈ C ∑ h i ∈ H P ( c j | h i ) P ( T | h i ) P ( h i ) {\displaystyle y={\underset {c_{j}\in C}{\mathrm {argmax} }}\sum _{h_{i}\in H}{P(c_{j}|h_{i})P(T|h_{i})P(h_{i})}} where y {\displaystyle y} is the predicted class, C {\displaystyle C} is the set of all possible classes, H {\displaystyle H} is the hypothesis space, P {\displaystyle P} refers to a probability, and T {\displaystyle T} is the training data. As an ensemble, the Bayes optimal classifier represents a hypothesis that is not necessarily in H {\displaystyle H} . The hypothesis represented by the Bayes optimal classifier, however, is the optimal hypothesis in ensemble space (the space of all possible ensembles consisting only of hypotheses in H {\displaystyle H} ). This formula can be restated using Bayes' theorem, which says that the posterior is proportional to the likelihood times the prior: P ( h i | T ) ∝ P ( T | h i ) P ( h i ) {\displaystyle P(h_{i}|T)\propto P(T|h_{i})P(h_{i})} hence, y = a r g m a x c j ∈ C ∑ h i ∈ H P ( c j | h i ) P ( h i | T ) {\displaystyle y={\underset {c_{j}\in C}{\mathrm {argmax} }}\sum _{h_{i}\in H}{P(c_{j}|h_{i})P(h_{i}|T)}} === Bootstrap aggregating (bagging) === Bootstrap aggregation (bagging) involves training an ensemble on bootstrapped data sets. A bootstrapped set is cr

    Read more →
  • Premature convergence

    Premature convergence

    Premature convergence is an unwanted effect in evolutionary algorithms (EA), a metaheuristic that mimics the basic principles of biological evolution as a computer algorithm for solving an optimization problem. The effect means that the population of an EA has converged too early, resulting in being suboptimal. In this context, the parental solutions, through the aid of genetic operators, are not able to generate offspring that are superior to, or outperform, their parents. Premature convergence is a common problem found in evolutionary algorithms, as it leads to a loss, or convergence of, a large number of alleles, subsequently making it very difficult to search for a specific gene in which the alleles were present. An allele is considered lost if, in a population, a gene is present, where all individuals are sharing the same value for that particular gene. An allele is, as defined by De Jong, considered to be a converged allele, when 95% of a population share the same value for a certain gene. == Strategies for preventing premature convergence == Strategies to regain genetic variation can be: a mating strategy called incest prevention, uniform crossover, mimicking sexual selection, favored replacement of similar individuals (preselection or crowding), segmentation of individuals of similar fitness (fitness sharing), increasing population size niche and specie The genetic variation can also be regained by mutation though this process is highly random. A general strategy to reduce the risk of premature convergence is to use structured populations instead of the commonly used panmictic ones. == Identification of the occurrence of premature convergence == It is hard to determine when premature convergence has occurred, and it is equally hard to predict its presence in the future. One measure is to use the difference between the average and maximum fitness values, as used by Patnaik & Srinivas, to then vary the crossover and mutation probabilities. Population diversity is another measure which has been extensively used in studies to measure premature convergence. However, although it has been widely accepted that a decrease in the population diversity directly leads to premature convergence, there have been little studies done on the analysis of population diversity. In other words, by using the term population diversity, the argument for a study in preventing premature convergence lacks robustness, unless specified what their definition of population diversity is. There are models to counter the effect and risk of premature convergence that do not compromise core GA parameters like population size, mutation rate, and other core mechanisms. These models were inspired by biological ecology, where genetic interactions are limited by external mechanisms such as spatial topologies or speciation. These ecological models, such as the Eco-GA, adopt diffusion-based strategies to improve the robustness of GA runs and increase the likelihood of reaching near-global optima. == Causes for premature convergence == There are a number of presumed or hypothesized causes for the occurrence of premature convergence. === Self-adaptive mutations === Rechenberg introduced the idea of self-adaptation of mutation distributions in evolution strategies. According to Rechenberg, the control parameters for these mutation distributions evolved internally through self-adaptation, rather than predetermination. He called it the 1/5-success rule of evolution strategies (1 + 1)-ES: The step size control parameter would be increased by some factor if the relative frequency of positive mutations through a determined period of time is larger than 1/5, vice versa if it is smaller than 1/5. Self-adaptive mutations may very well be one of the causes for premature convergence. Accurately locating of optima can be enhanced by self-adaptive mutation, as well as accelerating the search for this optima. This has been widely recognized, though the mechanism's underpinnings of this have been poorly studied, as it is often unclear whether the optima is found locally or globally. Self-adaptive methods can cause global convergence to global optimum, provided that the selection methods used are using elitism, as well as that the rule of self-adaptation doesn't interfere with the mutation distribution, which has the property of ensuring a positive minimum probability when hitting a random subset. This is for non-convex objective functions with sets that include bounded lower levels of non-zero measurements. A study by Rudolph suggests that self-adaption mechanisms among elitist evolution strategies do resemble the 1/5-success rule, and could very well get caught by a local optimum that include a positive probability. === Panmictic populations === Most EAs use unstructured or panmictic populations where basically every individual in the population is eligible for mate selection based on fitness. Thus, The genetic information of an only slightly better individual can spread in a population within a few generations, provided that no better other offspring is produced during this time. Especially in comparatively small populations, this can quickly lead to a loss of genotypic diversity and thus to premature convergence. A well-known countermeasure is to switch to alternative population models which introduce substructures into the population that preserve genotypic diversity over a longer period of time and thus counteract the tendency towards premature convergence. This has been shown for various EAs such as genetic algorithms, the evolution strategy, other EAs or memetic algorithms.

    Read more →
  • Esdat

    Esdat

    ESdat is a data management, analysis and reporting software for environmental and groundwater data, developed by EarthScience Information Systems (EScIS). It is used to manage many types of environmental data including laboratory chemistry (analytical results, QA data, lab sample planning, and electronic Chain of Custody), field chemistry (water, gas, and soil), hydrogeological data (groundwater, borehole and well construction, lithological, geotechnical and stratigraphic, and LNAPL), meteorological data (rain, wind, and temperature), emission data (dust deposition, HiVol, air quality, and noise) and logger data. Data can be compared against environmental standards or site-specific trigger levels to generate exceedence tables, time series graphs, maps, statistics, and other outputs. ESdat integrates with Power BI and ArcGIS and data can also be exported in a range of other database formats, including USEPA Regions 2,4 & 5, and NYS DEC. ESdat is used by environmental consultants, government, mining and industry for validation, interrogation, and reporting of data derived from complex environmental programs, such as contaminated sites, groundwater investigations, and regulatory compliance for landfills or mining operations.

    Read more →
  • European Conference on Computer Vision

    European Conference on Computer Vision

    The European Conference on Computer Vision (ECCV) is a biennial research conference with the proceedings published by Springer Science+Business Media. Similar to ICCV in scope and quality, it is held those years which ICCV is not. It is considered to be one of the top conferences in computer vision, alongside CVPR and ICCV, with an 'A' rating from the Australian Ranking of ICT Conferences and an 'A1' rating from the Brazilian ministry of education. The acceptance rate for ECCV 2010 was 24.4% for posters and 3.3% for oral presentations. Like other top computer vision conferences, ECCV has tutorial talks, technical sessions, and poster sessions. The conference is usually spread over five to six days with the main technical program occupying three days in the middle, and tutorial and workshops, focused on specific topics, being held in the beginning and at the end. The ECCV presents the Koenderink Prize annually to recognize fundamental contributions in computer vision. == Location == The conference is usually held in autumn in Europe.

    Read more →
  • World Programming System

    World Programming System

    The World Programming System, also known as WPS Analytics or WPS, is a software product developed by a company called World Programming (acquired by Altair Engineering). WPS Analytics supports users of mixed ability to access and process data and to perform data science tasks. It has interactive visual programming tools using data workflows, and it has coding tools supporting the use of the SAS language mixed with Python, R and SQL. == About == WPS can use programs written in the language of SAS without the need for translating them into any other language. In this regard WPS is compatible with the SAS system. WPS has a built-in language interpreter able to process the language of SAS and produce similar results. WPS is available to run on z/OS, Windows, macOS, Linux (x86, Armv8 64-bit, IBM Power LE, IBM Z), and AIX. On all supported platforms, programs written in the language of SAS can be executed from a WPS command line interface, often referred to as running in batch mode. WPS can also be used from a graphical user interface known as the WPS Workbench for managing, editing and running programs written in the language of SAS. The WPS Workbench user interface is based on Eclipse. WPS version 4 (released in March 2018) introduced a drag-and-drop workflow canvas providing interactive blocks for data retrieval, blending and preparation, data discovery and profiling, predictive modelling powered by machine learning algorithms, model performance validation and scorecards. WPS version 3 (released in February 2012) provided a new client/server architecture that allows the WPS Workbench GUI to execute SAS programs on remote server installations of WPS in a network or cloud. The resulting output, data sets, logs, etc., can then all be viewed and manipulated from inside the Workbench as if the workloads had been executed locally. SAS programs do not require any special language statements to use this feature. == Summary of main features == Runs on Windows, macOS, z/OS, Linux (x86, Armv8 64-bit, IBM Power LE, IBM Z), and AIX An integrated development environment based on Eclipse for Linux, macOS and Windows. Support for language of SAS elements. Support for the language of SAS Macros. Matrix Programming support using PROC IML. Support for generating band plots, bar charts, box plots, bubble plots, contour plots, dendrogram plots, ellipse plots, fringe plots, heat maps, high-low plots, histograms, loess plots, needle plots, pie charts, penalised b-spline, radar charts, reference lines, scatter plots, series plots, step plots, regression plots and vector plots. Support for statistical procedures ACECLUS, ASSOCRULES, ANOVA, BIN, BOXPLOT, CANCORR, CANDISC, CLUSTER, CORRESP, DISCRIM, DISTANCE, FACTOR, FASTCLUS, FREQ, GAM, GANNO, GENMOD, GLIMMIX, GLM, GLMMOD, GLMSELECT, ICLIFETEST, KDE, LIFEREG, LIFETEST, LOESS, LOGISTIC, MDS, MEANS, MI, MIANALYSE, MIXED, MODECLUS, NESTED, NLIN, NPAR1WAY, PHREG, PLAN, PLS, POWER, PRINCOMP, PROBIT, QUANTREG, RBF, REG, ROBUSTREG, RSREG, SCORE, SEGMENT, SIMNORMAL, STANDARD, STDSIZE, STDRATE, STEPDISC, SUMMARY, SURVEYMEANS, SURVEYSELECT, TPSPLINE, TRANSREG, TREE, TTEST, UNIVARIATE, VARCLUS, VARCOMP Support for time series procedures ARIMA, AUTOREG, ESM, EXPAND, FORECAST, LOAN, SEVERITY, SPECTRA, TIMESERIES, X12 Support for machine learning procedures DECISIONFOREST, DECISIONTREE, GMM, MLP, OPTIMALBIN, SEGMENT, SVM Support for ODS. Reads and writes SAS datasets (compressed or uncompressed). Access: Actian Matrix (previously known as ParAccel), DASD, DB2, Excel, Greenplum, Hadoop, Informix, Kognitio Archived 2012-08-24 at the Wayback Machine, MariaDB, MySQL, Netezza, ODBC, OLEDB, Oracle, PostgreSQL, SAND, Snowflake, SPSS/PSPP, SQL Server, Sybase, Sybase IQ, Teradata, VSAM, Vertica and XML. Support for SAS Tape Format. Direct output of reports to CSV, PDF and HTML. Support to connect WPS systems programmatically, remote submit parts of a program to execute on connected remote servers, upload and download data between the connected systems. Support for Hadoop Support for R Support for Python == Industry recognition == Gartner recognized World Programming in their Cool Vendors in Data Science, 2014 Report. == Lawsuit == In 2010 World Programming defended its use of the language of SAS in the High Court of England and Wales in SAS Institute Inc. v World Programming Ltd. The software was the subject of a lawsuit by SAS Institute. The EU Court of Justice ruled in favor of World Programming, stating that the copyright protection does not extend to the software functionality, the programming language used and the format of the data files used by the program. It stated that there is no copyright infringement when a company which does not have access to the source code of a program studies, observes and tests that program to create another program with the same functionality.

    Read more →