AI Code Programming

AI Code Programming — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Twproject

    Twproject

    Twproject (say: T W Project) is a web-based project and groupware management tool created by Open Lab, an Italian software house founded in 2001. It won the 17th Jolt Productivity Award in 2007 in the project management category. In March 2019 it becomes property of Twproject company. It has widespread use in universities as a teaching tool in project management courses. It is used by Oracle Corporation, Prada, Calzedonia, General Electric and many other companies from corporations to small start-ups. == History == April 2001 - The idea of Teamwork came to Open-Lab founders from a need to overcome the PM tools used at that time. It was built in Microsoft ASP and Adobe Flash November 2002 - Open-Lab decide to move from Flash to HTML and from ASP to Java-JSP. Teamwork 2 development is started. June 2004 - Teamwork 2 released, using top open-source technologies like Hibernate, jBlooming, dynamic CSS, Ajax 7 January 2005 - Teamwork goes open source, under LGPL license; remains such until June 2006 (18 months): it is a hit application on SourceForge, with 38.000 downloads, covered by greeting but starving April 2005 - Open-Lab takes the decision to change commercial strategy to finance development of Teamwork version 3 6 June 2006 - Teamwork 3 is finally out (15 months development). New interface, many new features, agile support and much more 27 March 2007 - Teamwork wins the 2007 JOLT Productivity Awards for project management category July 2007 - Teamwork 4 development started: new interface, extended use of new HTML capabilities, JS-oriented interface, start using jQuery February 2009 - Teamwork 4.0 is out February 2010 - Teamwork 4.4: public project pages, Chinese interface. jQuery is getting more space in Teamwork December 2010 - Teamwork 4.6: released Mobile module available for iPhone, Android, BlackBerry. Intensive usage of jQuery June 2011 - Teamwork 4.7: released Issue Kanban / Organizer January 2012 - Teamwork 5.0 development started. Lighter interface, extensive usage of dynamic pages, easier installer and first time approach. Learning curve highly reduced. A jQuery Gantt editor included and released free for the community July 2012 - Teamwork 5 released and also the free online Gantt editor November 2012 - Teamwork 5.1 with new trees and improved model for staffing March 2013 - Teamwork 5.2 with stronger support for customizations and Japanese interface. April 2014 - Teamwork has changed its name in Twproject because the domain teamwork.com has been purchased by Teamwork. April 2013 - Twproject 5.4 with a redesigned more powerful Gantt chart. August 2015 - Twproject 5 finale release. September 2015 - Twproject 6 with a completely redesigned user interface. March 2019 - A new company Twproject srl has been spun off. September 2021 - Twproject 7 has been released introducing WBS based management and workload management. == Features == Project & task management (with Microsoft Project import/export), and JSON format Gantt editor. Uses jQuery Gantt components Time tracking. Several entry points: dashboard, weekly view, issues, start/stop buttons Resource planning with weekly/monthly view, work load overview, unavailability from agenda Issue tracking & planning(with Kanban), e-mail integration, task dedicated inboxes Dashboard configuration, with customizable portlets and layout Message boards Scrum module Meeting and minute management, attached documents Agenda (Integrates with iCal, Microsoft Outlook, Microsoft Entourage, and Google Calendar) Document management, remote file systems link with NTFS, FTP, SVN, S3 (Dropbox, Google drive) Mobile application for iPhone, iPad, Android, Blackberry, Windows phone == Integration == A complete JSON API is available for integrations. The applications runs in Java JDK 8+ on the Hibernate object/relational mapping. The standard distribution uses Apache Tomcat 9, but can run on any J2EE application server. Twproject is tested on these DB servers: MySQL, Oracle, SQL Server, PostgreSql, HSQLDB, but as uses Hibernate can run on many others. There is simple graphical step-by-step installer for Windows, Mac, Linux, .zip/.tar.gz/.rpm packages.

    Read more →
  • Online machine learning

    Online machine learning

    In computer science, online machine learning is a method of machine learning in which data becomes available in a sequential order and is used to update the best predictor for future data at each step, as opposed to batch learning techniques which generate the best predictor by learning on the entire training data set at once. Online learning is a common technique used in areas of machine learning where it is computationally infeasible to train over the entire dataset, requiring the need of out-of-core algorithms. It is also used in situations where it is necessary for the algorithm to dynamically adapt to new patterns in the data, or when the data itself is generated as a function of time, e.g., prediction of prices in the financial international markets. Online learning algorithms may be prone to catastrophic interference, a problem that can be addressed by incremental learning approaches. Online machine learning algorithms find applications in a wide variety of fields such as sponsored search to maximize ad revenue, portfolio optimization, shortest path prediction (with stochastic weights, e.g. traffic on roads for a maps application), spam filtering, real-time fraud detection, dynamic pricing for e-commerce, etc. There is also growing interest in usage of online learning paradigms for LLMs to enable continuous, real-time adaptation after the initial training. == Introduction == In the setting of supervised learning, a function of f : X → Y {\displaystyle f:X\to Y} is to be learned, where X {\displaystyle X} is thought of as a space of inputs and Y {\displaystyle Y} as a space of outputs, that predicts well on instances that are drawn from a joint probability distribution p ( x , y ) {\displaystyle p(x,y)} on X × Y {\displaystyle X\times Y} . In reality, the learner never knows the true distribution p ( x , y ) {\displaystyle p(x,y)} over instances. Instead, the learner usually has access to a training set of examples ( x 1 , y 1 ) , … , ( x n , y n ) {\displaystyle (x_{1},y_{1}),\ldots ,(x_{n},y_{n})} . In this setting, the loss function is given as V : Y × Y → R {\displaystyle V:Y\times Y\to \mathbb {R} } , such that V ( f ( x ) , y ) {\displaystyle V(f(x),y)} measures the difference between the predicted value f ( x ) {\displaystyle f(x)} and the true value y {\displaystyle y} . The ideal goal is to select a function f ∈ H {\displaystyle f\in {\mathcal {H}}} , where H {\displaystyle {\mathcal {H}}} is a space of functions called a hypothesis space, so that some notion of total loss is minimized. Depending on the type of model (statistical or adversarial), one can devise different notions of loss, which lead to different learning algorithms. == Statistical view of online learning == In statistical learning models, the training sample ( x i , y i ) {\displaystyle (x_{i},y_{i})} are assumed to have been drawn from the true distribution p ( x , y ) {\displaystyle p(x,y)} and the objective is to minimize the expected "risk" I [ f ] = E [ V ( f ( x ) , y ) ] = ∫ V ( f ( x ) , y ) d p ( x , y ) . {\displaystyle I[f]=\mathbb {E} [V(f(x),y)]=\int V(f(x),y)\,dp(x,y)\ .} A common paradigm in this situation is to estimate a function f ^ {\displaystyle {\hat {f}}} through empirical risk minimization or regularized empirical risk minimization (usually Tikhonov regularization). The choice of loss function here gives rise to several well-known learning algorithms such as regularized least squares and support vector machines. A purely online model in this category would learn based on just the new input ( x t + 1 , y t + 1 ) {\displaystyle (x_{t+1},y_{t+1})} , the current best predictor f t {\displaystyle f_{t}} and some extra stored information (which is usually expected to have storage requirements independent of training data size). For many formulations, for example nonlinear kernel methods, true online learning is not possible, though a form of hybrid online learning with recursive algorithms can be used where f t + 1 {\displaystyle f_{t+1}} is permitted to depend on f t {\displaystyle f_{t}} and all previous data points ( x 1 , y 1 ) , … , ( x t , y t ) {\displaystyle (x_{1},y_{1}),\ldots ,(x_{t},y_{t})} . In this case, the space requirements are no longer guaranteed to be constant since it requires storing all previous data points, but the solution may take less time to compute with the addition of a new data point, as compared to batch learning techniques. A common strategy to overcome the above issues is to learn using mini-batches, which process a small batch of b ≥ 1 {\displaystyle b\geq 1} data points at a time, this can be considered as pseudo-online learning for b {\displaystyle b} much smaller than the total number of training points. Mini-batch techniques are used with repeated passing over the training data to obtain optimized out-of-core versions of machine learning algorithms, for example, stochastic gradient descent. When combined with backpropagation, this is currently the de facto training method for training artificial neural networks. === Example: linear least squares === The simple example of linear least squares is used to explain a variety of ideas in online learning. The ideas are general enough to be applied to other settings, for example, with other convex loss functions. === Batch learning === Consider the setting of supervised learning with f {\displaystyle f} being a linear function to be learned: f ( x j ) = ⟨ w , x j ⟩ = w ⋅ x j {\displaystyle f(x_{j})=\langle w,x_{j}\rangle =w\cdot x_{j}} where x j ∈ R d {\displaystyle x_{j}\in \mathbb {R} ^{d}} is a vector of inputs (data points) and w ∈ R d {\displaystyle w\in \mathbb {R} ^{d}} is a linear filter vector. The goal is to compute the filter vector w {\displaystyle w} . To this end, a square loss function V ( f ( x j ) , y j ) = ( f ( x j ) − y j ) 2 = ( ⟨ w , x j ⟩ − y j ) 2 {\displaystyle V(f(x_{j}),y_{j})=(f(x_{j})-y_{j})^{2}=(\langle w,x_{j}\rangle -y_{j})^{2}} is used to compute the vector w {\displaystyle w} that minimizes the empirical loss I n [ w ] = ∑ j = 1 n V ( ⟨ w , x j ⟩ , y j ) = ∑ j = 1 n ( x j T w − y j ) 2 {\displaystyle I_{n}[w]=\sum _{j=1}^{n}V(\langle w,x_{j}\rangle ,y_{j})=\sum _{j=1}^{n}(x_{j}^{\mathsf {T}}w-y_{j})^{2}} where y j ∈ R . {\displaystyle y_{j}\in \mathbb {R} .} Let X {\displaystyle X} be the i × d {\displaystyle i\times d} data matrix and y ∈ R i {\displaystyle y\in \mathbb {R} ^{i}} is the column vector of target values after the arrival of the first i {\displaystyle i} data points. Assuming that the covariance matrix Σ i = X T X {\displaystyle \Sigma _{i}=X^{\mathsf {T}}X} is invertible (otherwise it is preferential to proceed in a similar fashion with Tikhonov regularization), the best solution f ∗ ( x ) = ⟨ w ∗ , x ⟩ {\displaystyle f^{}(x)=\langle w^{},x\rangle } to the linear least squares problem is given by w ∗ = ( X T X ) − 1 X T y = Σ i − 1 ∑ j = 1 i x j y j . {\displaystyle w^{}=(X^{\mathsf {T}}X)^{-1}X^{\mathsf {T}}y=\Sigma _{i}^{-1}\sum _{j=1}^{i}x_{j}y_{j}.} Now, calculating the covariance matrix Σ i = ∑ j = 1 i x j x j T {\displaystyle \Sigma _{i}=\sum _{j=1}^{i}x_{j}x_{j}^{\mathsf {T}}} takes time O ( i d 2 ) {\displaystyle O(id^{2})} , inverting the d × d {\displaystyle d\times d} matrix takes time O ( d 3 ) {\displaystyle O(d^{3})} , while the rest of the multiplication takes time O ( d 2 ) {\displaystyle O(d^{2})} , giving a total time of O ( i d 2 + d 3 ) {\displaystyle O(id^{2}+d^{3})} . When there are n {\displaystyle n} total points in the dataset, to recompute the solution after the arrival of every datapoint i = 1 , … , n {\displaystyle i=1,\ldots ,n} , the naive approach will have a total complexity O ( n 2 d 2 + n d 3 ) {\displaystyle O(n^{2}d^{2}+nd^{3})} . Note that when storing the matrix Σ i {\displaystyle \Sigma _{i}} , then updating it at each step needs only adding x i + 1 x i + 1 T {\displaystyle x_{i+1}x_{i+1}^{\mathsf {T}}} , which takes O ( d 2 ) {\displaystyle O(d^{2})} time, reducing the total time to O ( n d 2 + n d 3 ) = O ( n d 3 ) {\displaystyle O(nd^{2}+nd^{3})=O(nd^{3})} , but with an additional storage space of O ( d 2 ) {\displaystyle O(d^{2})} to store Σ i {\displaystyle \Sigma _{i}} . === Online learning: recursive least squares === The recursive least squares (RLS) algorithm considers an online approach to the least squares problem. It can be shown that by initialising w 0 = 0 ∈ R d {\displaystyle \textstyle w_{0}=0\in \mathbb {R} ^{d}} and Γ 0 = I ∈ R d × d {\displaystyle \textstyle \Gamma _{0}=I\in \mathbb {R} ^{d\times d}} , the solution of the linear least squares problem given in the previous section can be computed by the following iteration: Γ i = Γ i − 1 − Γ i − 1 x i x i T Γ i − 1 1 + x i T Γ i − 1 x i {\displaystyle \Gamma _{i}=\Gamma _{i-1}-{\frac {\Gamma _{i-1}x_{i}x_{i}^{\mathsf {T}}\Gamma _{i-1}}{1+x_{i}^{\mathsf {T}}\Gamma _{i-1}x_{i}}}} w i = w i − 1 − Γ i x i ( x i T w i − 1 − y i ) {\displaystyle w_{i}=w_{i-1}-\Gamma _{i}x_{i}\left(x_{i}^{\mathsf {T}}w_{

    Read more →
  • T-distributed stochastic neighbor embedding

    T-distributed stochastic neighbor embedding

    t-distributed stochastic neighbor embedding (t-SNE) is a statistical method for visualizing high-dimensional data by giving each datapoint a location in a two or three-dimensional map. It is based on Stochastic Neighbor Embedding originally developed by Geoffrey Hinton and Sam Roweis, where Laurens van der Maaten and Hinton proposed the t-distributed variant. It is a nonlinear dimensionality reduction technique for embedding high-dimensional data for visualization in a low-dimensional space of two or three dimensions. Specifically, it models each high-dimensional object by a two- or three-dimensional point in such a way that similar objects are modeled by nearby points and dissimilar objects are modeled by distant points with high probability. The t-SNE algorithm comprises two main stages. First, t-SNE constructs a probability distribution over pairs of high-dimensional objects in such a way that similar objects are assigned a higher probability while dissimilar points are assigned a lower probability. Second, t-SNE defines a similar probability distribution over the points in the low-dimensional map, and it minimizes the Kullback–Leibler divergence (KL divergence) between the two distributions with respect to the locations of the points in the map. While the original algorithm uses the Euclidean distance between objects as the base of its similarity metric, this can be changed as appropriate. A Riemannian variant is UMAP. t-SNE has been used for visualization in a wide range of applications, including genomics, computer security research, natural language processing, music analysis, cancer research, bioinformatics, geological domain interpretation, and biomedical signal processing. For a data set with n {\displaystyle n} elements, t-SNE runs in O ( n 2 ) {\displaystyle O(n^{2})} time and requires O ( n 2 ) {\displaystyle O(n^{2})} space. == Details == Given a set of N {\displaystyle N} high-dimensional objects x 1 , … , x N {\displaystyle \mathbf {x} _{1},\dots ,\mathbf {x} _{N}} , t-SNE first computes probabilities p i j {\displaystyle p_{ij}} that are proportional to the similarity of objects x i {\displaystyle \mathbf {x} _{i}} and x j {\displaystyle \mathbf {x} _{j}} , as follows. For i ≠ j {\displaystyle i\neq j} , define p j ∣ i = exp ⁡ ( − ‖ x i − x j ‖ 2 / 2 σ i 2 ) ∑ k ≠ i exp ⁡ ( − ‖ x i − x k ‖ 2 / 2 σ i 2 ) {\displaystyle p_{j\mid i}={\frac {\exp(-\lVert \mathbf {x} _{i}-\mathbf {x} _{j}\rVert ^{2}/2\sigma _{i}^{2})}{\sum _{k\neq i}\exp(-\lVert \mathbf {x} _{i}-\mathbf {x} _{k}\rVert ^{2}/2\sigma _{i}^{2})}}} and set p i ∣ i = 0 {\displaystyle p_{i\mid i}=0} . Note the above denominator ensures ∑ j p j ∣ i = 1 {\displaystyle \sum _{j}p_{j\mid i}=1} for all i {\displaystyle i} . As van der Maaten and Hinton explained: "The similarity of datapoint x j {\displaystyle x_{j}} to datapoint x i {\displaystyle x_{i}} is the conditional probability, p j | i {\displaystyle p_{j|i}} , that x i {\displaystyle x_{i}} would pick x j {\displaystyle x_{j}} as its neighbor if neighbors were picked in proportion to their probability density under a Gaussian centered at x i {\displaystyle x_{i}} ." Now define p i j = p j ∣ i + p i ∣ j 2 N {\displaystyle p_{ij}={\frac {p_{j\mid i}+p_{i\mid j}}{2N}}} This is motivated because p i {\displaystyle p_{i}} and p j {\displaystyle p_{j}} from the N samples are estimated as 1/N, so the conditional probability can be written as p i ∣ j = N p i j {\displaystyle p_{i\mid j}=Np_{ij}} and p j ∣ i = N p j i {\displaystyle p_{j\mid i}=Np_{ji}} . Since p i j = p j i {\displaystyle p_{ij}=p_{ji}} , you can obtain previous formula. Also note that p i i = 0 {\displaystyle p_{ii}=0} and ∑ i , j p i j = 1 {\displaystyle \sum _{i,j}p_{ij}=1} . The bandwidth of the Gaussian kernels σ i {\displaystyle \sigma _{i}} is set in such a way that the entropy of the conditional distribution equals a predefined entropy using the bisection method. As a result, the bandwidth is adapted to the density of the data: smaller values of σ i {\displaystyle \sigma _{i}} are used in denser parts of the data space. The entropy increases with the perplexity of this distribution P i {\displaystyle P_{i}} ; this relation is seen as P e r p ( P i ) = 2 H ( P i ) {\displaystyle Perp(P_{i})=2^{H(P_{i})}} where H ( P i ) {\displaystyle H(P_{i})} is the Shannon entropy H ( P i ) = − ∑ j p j | i log 2 ⁡ p j | i . {\displaystyle H(P_{i})=-\sum _{j}p_{j|i}\log _{2}p_{j|i}.} The perplexity is a hand-chosen parameter of t-SNE, and as the authors state, "perplexity can be interpreted as a smooth measure of the effective number of neighbors. The performance of SNE is fairly robust to changes in the perplexity, and typical values are between 5 and 50.". Since the Gaussian kernel uses the Euclidean distance ‖ x i − x j ‖ {\displaystyle \lVert x_{i}-x_{j}\rVert } , it is affected by the curse of dimensionality, and in high dimensional data when distances lose the ability to discriminate, the p i j {\displaystyle p_{ij}} become too similar (asymptotically, they would converge to a constant). It has been proposed to adjust the distances with a power transform, based on the intrinsic dimension of each point, to alleviate this. t-SNE aims to learn a d {\displaystyle d} -dimensional map y 1 , … , y N {\displaystyle \mathbf {y} _{1},\dots ,\mathbf {y} _{N}} (with y i ∈ R d {\displaystyle \mathbf {y} _{i}\in \mathbb {R} ^{d}} and d {\displaystyle d} typically chosen as 2 or 3) that reflects the similarities p i j {\displaystyle p_{ij}} as well as possible. To this end, it measures similarities q i j {\displaystyle q_{ij}} between two points in the map y i {\displaystyle \mathbf {y} _{i}} and y j {\displaystyle \mathbf {y} _{j}} , using a very similar approach. Specifically, for i ≠ j {\displaystyle i\neq j} , define q i j {\displaystyle q_{ij}} as q i j = ( 1 + ‖ y i − y j ‖ 2 ) − 1 ∑ k ∑ l ≠ k ( 1 + ‖ y k − y l ‖ 2 ) − 1 {\displaystyle q_{ij}={\frac {(1+\lVert \mathbf {y} _{i}-\mathbf {y} _{j}\rVert ^{2})^{-1}}{\sum _{k}\sum _{l\neq k}(1+\lVert \mathbf {y} _{k}-\mathbf {y} _{l}\rVert ^{2})^{-1}}}} and set q i i = 0 {\displaystyle q_{ii}=0} . Herein a heavy-tailed Student t-distribution (with one-degree of freedom, which is the same as a Cauchy distribution) is used to measure similarities between low-dimensional points in order to allow dissimilar objects to be modeled far apart in the map. The locations of the points y i {\displaystyle \mathbf {y} _{i}} in the map are determined by minimizing the (non-symmetric) Kullback–Leibler divergence of the distribution P {\displaystyle P} from the distribution Q {\displaystyle Q} , that is: K L ( P ∥ Q ) = ∑ i ≠ j p i j log ⁡ p i j q i j {\displaystyle \mathrm {KL} \left(P\parallel Q\right)=\sum _{i\neq j}p_{ij}\log {\frac {p_{ij}}{q_{ij}}}} The minimization of the Kullback–Leibler divergence with respect to the points y i {\displaystyle \mathbf {y} _{i}} is performed using gradient descent. The result of this optimization is a map that reflects the similarities between the high-dimensional inputs. == Output == While t-SNE plots often seem to display clusters, the visual clusters can be strongly influenced by the chosen parameterization (especially the perplexity) and so a good understanding of the parameters for t-SNE is needed. Such "clusters" can be shown to even appear in structured data with no clear clustering, and so may be false findings. Similarly, the size of clusters produced by t-SNE is not informative, and neither is the distance between clusters. Thus, interactive exploration may be needed to choose parameters and validate results. It has been shown that t-SNE can often recover well-separated clusters, and with special parameter choices, approximates a simple form of spectral clustering. == Software == A C++ implementation of Barnes-Hut is available on the github account of one of the original authors. The R package Rtsne implements t-SNE in R. ELKI contains tSNE, also with Barnes-Hut approximation scikit-learn, a popular machine learning library in Python implements t-SNE with both exact solutions and the Barnes-Hut approximation. Tensorboard, the visualization kit associated with TensorFlow, also implements t-SNE (online version) The Julia package TSne implements t-SNE

    Read more →
  • Rprop

    Rprop

    Rprop, short for resilient backpropagation, is a learning heuristic for supervised learning in feedforward artificial neural networks. This is a first-order optimization algorithm. This algorithm was created by Martin Riedmiller and Heinrich Braun in 1992. Similarly to the Manhattan update rule, Rprop takes into account only the sign of the partial derivative over all patterns (not the magnitude), and acts independently on each "weight". For each weight, if there was a sign change of the partial derivative of the total error function compared to the last iteration, the update value for that weight is multiplied by a factor η−, where η− < 1. If the last iteration produced the same sign, the update value is multiplied by a factor of η+, where η+ > 1. The update values are calculated for each weight in the above manner, and finally each weight is changed by its own update value, in the opposite direction of that weight's partial derivative, so as to minimise the total error function. η+ is empirically set to 1.2 and η− to 0.5. Rprop can result in very large weight increments or decrements if the gradients are large, which is a problem when using mini-batches as opposed to full batches. RMSprop addresses this problem by keeping the moving average of the squared gradients for each weight and dividing the gradient by the square root of the mean square. RPROP is a batch update algorithm. Next to the cascade correlation algorithm and the Levenberg–Marquardt algorithm, Rprop is one of the fastest weight update mechanisms. == Variations == Martin Riedmiller developed three algorithms, all named RPROP. Igel and Hüsken assigned names to them and added a new variant: RPROP+ is defined at A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm. RPROP− is defined at Advanced Supervised Learning in Multi-layer Perceptrons – From Backpropagation to Adaptive Learning Algorithms. Backtracking is removed from RPROP+. iRPROP− is defined in Rprop – Description and Implementation Details and was reinvented by Igel and Hüsken. This variant is very popular and most simple. iRPROP+ is defined at Improving the Rprop Learning Algorithm and is very robust and typically faster than the other three variants.

    Read more →
  • SAP StreamWork

    SAP StreamWork

    SAP StreamWork is an enterprise collaboration tool from SAP SE released in March 2010, and discontinued in December 2015. StreamWork allowed real-time collaboration like Google Wave, but focused on business activities such as analyzing data, planning meetings, and making decisions. It incorporated technology from Box.net and Evernote to allow users to connect to online files and documents, and document-reader technology from Scribd allowed users to view documents directly within its environment. StreamWork supported the OpenSocial set of application programming interfaces (APIs), allowing it to connect to tools built by third-party developers, such as Google Docs. A version of StreamWork intended for large enterprises used a virtual appliance based on Novell's SUSE Linux Enterprise to connect it to business systems, including those from SAP.

    Read more →
  • Kubeflow

    Kubeflow

    Kubeflow is an open-source platform for machine learning and MLOps on Kubernetes introduced by Google. The different stages in a typical machine learning lifecycle are represented with different software components in Kubeflow, including model development (Kubeflow Notebooks), model training (Kubeflow Pipelines, Kubeflow Training Operator), model serving (KServe), and automated machine learning (Katib). Each component of Kubeflow can be deployed separately, and it is not a requirement to deploy every component. == History == The Kubeflow project was first announced at KubeCon + CloudNativeCon North America 2017 by Google engineers David Aronchick, Jeremy Lewi, and Vishnu Kannan to address a perceived lack of flexible options for building production-ready machine learning systems. The project has also stated it began as a way for Google to open-source how they ran TensorFlow internally. The first release of Kubeflow (Kubeflow 0.1) was announced at KubeCon + CloudNativeCon Europe 2018. Kubeflow 1.0 was released in March 2020 via a public blog post announcing that many Kubeflow components were graduating to a "stable status", indicating they were now ready for production usage. In October 2022, Google announced that the Kubeflow project had applied to join the Cloud Native Computing Foundation. In July 2023, the foundation voted to accept Kubeflow as an incubating stage project. == Components == === Kubeflow Notebooks for model development === Machine learning models are developed in the notebooks component called Kubeflow Notebooks. The component runs web-based development environments inside a Kubernetes cluster, with native support for Jupyter Notebook, Visual Studio Code, and RStudio. === Kubeflow Pipelines for model training === Once developed, models are trained in the Kubeflow Pipelines component. The component acts as a platform for building and deploying portable, scalable machine learning workflows based on Docker containers. Google Cloud Platform has adopted the Kubeflow Pipelines DSL within its Vertex AI Pipelines product. === Kubeflow Training Operator for model training === For certain machine learning models and libraries, the Kubeflow Training Operator component provides Kubernetes custom resources support. The component runs distributed or non-distributed TensorFlow, PyTorch, Apache MXNet, XGBoost, and MPI training jobs on Kubernetes. === KServe for model serving === The KServe component (previously named KFServing) provides Kubernetes custom resources for serving machine learning models on arbitrary frameworks including TensorFlow, XGBoost, scikit-learn, PyTorch, and ONNX. KServe was developed collaboratively by Google, IBM, Bloomberg, NVIDIA, and Seldon. Publicly disclosed adopters of KServe include Bloomberg, Gojek, the Wikimedia Foundation, and others. === Katib for automated machine learning === Lastly, Kubeflow includes a component for automated training and development of machine learning models, the Katib component. It is described as a Kubernetes-native project and features hyperparameter tuning, early stopping, and neural architecture search. == Release timeline ==

    Read more →
  • Mating pool

    Mating pool

    Mating pool is a concept used in evolutionary algorithms and means a population of parents for the next population. The mating pool is formed by candidate solutions that the selection operators deem to have the highest fitness in the current population. Solutions that are included in the mating pool are referred to as parents. Individual solutions can be repeatedly included in the mating pool, with individuals of higher fitness values having a higher chance of being included multiple times. Crossover operators are then applied to the parents, resulting in recombination of genes recognized as superior. Lastly, random changes in the genes are introduced through mutation operators, increasing the genetic variation in the gene pool. Those two operators improve the chance of creating new, superior solutions. A new generation of solutions is thereby created, the children, who will constitute the next population. Depending on the selection method, the total number of parents in the mating pool can be different to the size of the initial population, resulting in a new population that’s smaller. To continue the algorithm with an equally sized population, random individuals from the old populations can be chosen and added to the new population. At this point, the fitness value of the new solutions is evaluated. If the termination conditions are fulfilled, processes come to an end. Otherwise, they are repeated. The repetition of the steps result in candidate solutions that evolve towards the most optimal solution over time. The genes will become increasingly uniform towards the most optimal gene, a process called convergence. If 95% of the population share the same version of a gene, the gene has converged. When all the individual fitness values have reached the value of the best individual, i.e. all the genes have converged, population convergence is achieved. == Mating pool creation == Several methods can be applied to create a mating pool. All of these processes involve the selective breeding of a particular number of individuals within a population. There are multiple criteria that can be employed to determine which individuals make it into the mating pool and which are left behind. The selection methods can be split into three general types: fitness proportionate selection, ordinal based selection and threshold based selection. === Fitness proportionate selection === In the case of fitness proportionate selection, random individuals are selected to enter the pool. However, the ones with a higher level of fitness are more likely to be picked and therefore have a greater chance of passing on their features to the next generation. One of the techniques used in this type of parental selection is the roulette wheel selection. This approach divides a hypothetical circular wheel into different slots, the size of which is equal to the fitness values of each potential candidate. Afterwards, the wheel is rotated and a fixed point determines which individual gets picked. The greater the fitness value of an individual, the higher the probability of being chosen as a parent by the random spin of the wheel. Alternatively, stochastic universal sampling can be implemented. This selection method is also based on the rotation of a spinning wheel. However, in this case there is more than one fixed point and as a result all of the mating pool members will be selected simultaneously. === Ordinal based selection === The ordinal based selection methods include the tournament and ranking selection. Tournament selection involves the random selection of individuals of a population and the subsequent comparison of their fitness levels. The winners of these “tournaments” are the ones with the highest values and will be put into the mating pool as parents. In ranking selection all the individuals are sorted based on their fitness values. Then, the selection of the parents is made according to the rank of the candidates. Every individual has a chance of being chosen, but higher ranked ones are favored === Threshold based selection === The last type of selection method is referred to as the threshold based method. This includes the truncation selection method, which sorts individuals based on their phenotypic values on a specific trait and later selects the proportion of them that are within a certain threshold as parents.

    Read more →
  • Latent and observable variables

    Latent and observable variables

    In statistics, latent variables (from Latin: present participle of lateo 'lie hidden') are variables that can only be inferred indirectly through a mathematical model from other observable variables that can be directly observed or measured. Such latent variable models are used in many disciplines, including engineering, medicine, ecology, physics, machine learning/artificial intelligence, natural language processing, bioinformatics, chemometrics, demography, economics, management, political science, psychology and the social sciences. Latent variables may correspond to aspects of physical reality. These could in principle be measured, but may not be for practical reasons. Among the earliest expressions of this idea is Francis Bacon's polemic the Novum Organum, itself a challenge to the more traditional logic expressed in Aristotle's Organon: But the latent process of which we speak, is far from being obvious to men’s minds, beset as they now are. For we mean not the measures, symptoms, or degrees of any process which can be exhibited in the bodies themselves, but simply a continued process, which, for the most part, escapes the observation of the senses. In this situation, the term hidden variables is commonly used, reflecting the fact that the variables are meaningful, but not observable. Other latent variables correspond to abstract concepts, like categories, behavioral or mental states, or data structures. The terms hypothetical variables or hypothetical constructs may be used in these situations. The use of latent variables can serve to reduce the dimensionality of data. Many observable variables can be aggregated in a model to represent an underlying concept, making it easier to understand the data. In this sense, they serve a function similar to that of scientific theories. At the same time, latent variables link observable "sub-symbolic" data in the real world to symbolic data in the modeled world. == Examples == === Psychology === Latent variables, as created by factor analytic methods, generally represent "shared" variance, or the degree to which variables "move" together. Variables that have no correlation cannot result in a latent construct based on the common factor model. The "Big Five personality traits" have been inferred using factor analysis. extraversion spatial ability wisdom: “Two of the more predominant means of assessing wisdom include wisdom-related performance and latent variable measures.” Spearman's g, or the general intelligence factor in psychometrics === Economics === Examples of latent variables from the field of economics include quality of life, business confidence, morale, happiness and conservatism: these are all variables which cannot be measured directly. However, by linking these latent variables to other, observable variables, the values of the latent variables can be inferred from measurements of the observable variables. Quality of life is a latent variable which cannot be measured directly, so observable variables are used to infer quality of life. Observable variables to measure quality of life include wealth, employment, environment, physical and mental health, education, recreation and leisure time, and social belonging. === Medicine === Latent-variable methodology is used in many branches of medicine. A class of problems that naturally lend themselves to latent variables approaches are longitudinal studies where the time scale (e.g. age of participant or time since study baseline) is not synchronized with the trait being studied. For such studies, an unobserved time scale that is synchronized with the trait being studied can be modeled as a transformation of the observed time scale using latent variables. Examples of this include disease progression modeling and modeling of growth (see box). == Inferring latent variables == There exists a range of different model classes and methodology that make use of latent variables and allow inference in the presence of latent variables. Models include: linear mixed-effects models and nonlinear mixed-effects models Hidden Markov models Factor analysis Item response theory Analysis and inference methods include: Principal component analysis Instrumented principal component analysis Partial least squares regression Latent semantic analysis and probabilistic latent semantic analysis EM algorithms Metropolis–Hastings algorithm === Bayesian algorithms and methods === Bayesian statistics is often used for inferring latent variables. Latent Dirichlet allocation The Chinese restaurant process is often used to provide a prior distribution over assignments of objects to latent categories. The Indian buffet process is often used to provide a prior distribution over assignments of latent binary features to objects.

    Read more →
  • Irwin Sobel

    Irwin Sobel

    Irwin Sobel (born September 12, 1940) is a scientist and researcher in digital image processing. == Biography == Irwin Sobel was born in New York City. He graduated from MIT in 1961 and completed his Ph.D. research at the Stanford Artificial Intelligence Project (SAIL) with thesis Camera Models and Machine Perception. His Ph.D. advisor was Jerome A. Feldman. Starting in 1973, he spent nine years doing postdoctoral research at Columbia University. After 1982, he worked as a Senior Researcher at HP Labs. == Sobel operator == In 1968, Sobel gave a talk entitled "An Isotropic 3x3 Image Gradient Operator" at SAIL; this method became known as the Sobel operator. It was developed jointly with a colleague, Gary Feldman, also at SAIL.

    Read more →
  • Crossover (evolutionary algorithm)

    Crossover (evolutionary algorithm)

    Crossover in evolutionary algorithms and evolutionary computation, also called recombination, is a genetic operator used to combine the genetic information of two parents to generate new offspring. It is one way to stochastically generate new solutions from an existing population, and is analogous to the crossover that happens during sexual reproduction in biology. New solutions can also be generated by cloning an existing solution, which is analogous to asexual reproduction. Newly generated solutions may be mutated before being added to the population. The aim of recombination is to transfer good characteristics from two different parents to one child. Different algorithms in evolutionary computation may use different data structures to store genetic information, and each genetic representation can be recombined with different crossover operators. Typical data structures that can be recombined with crossover are bit arrays, vectors of real numbers, or trees. The list of operators presented below is by no means complete and serves mainly as an exemplary illustration of this dyadic genetic operator type. More operators and more details can be found in the literature. == Crossover for binary arrays == Traditional genetic algorithms store genetic information in a chromosome represented by a bit array. Crossover methods for bit arrays are popular and an illustrative example of genetic recombination. === One-point crossover === A point on both parents' chromosomes is picked randomly, and designated a 'crossover point'. Bits to the right of that point are swapped between the two parent chromosomes. This results in two offspring, each carrying some genetic information from both parents. === Two-point and k-point crossover === In two-point crossover, two crossover points are picked randomly from the parent chromosomes. The bits in between the two points are swapped between the parent organisms. Two-point crossover is equivalent to performing two single-point crossovers with different crossover points. This strategy can be generalized to k-point crossover for any positive integer k, picking k crossover points. === Uniform crossover === In uniform crossover, typically, each bit is chosen from either parent with equal probability. Other mixing ratios are sometimes used, resulting in offspring which inherit more genetic information from one parent than the other. In a uniform crossover, we don’t divide the chromosome into segments, rather we treat each gene separately. In this, we essentially flip a coin for each chromosome to decide whether or not it will be included in the off-spring. == Crossover for integer or real-valued genomes == For the crossover operators presented above and for most other crossover operators for bit strings, it holds that they can also be applied accordingly to integer or real-valued genomes whose genes each consist of an integer or real-valued number. Instead of individual bits, integer or real-valued numbers are then simply copied into the child genome. The offspring lie on the remaining corners of the hyperbody spanned by the two parents P 1 = ( 1.5 , 6 , 8 ) {\displaystyle P_{1}=(1.5,6,8)} and P 2 = ( 7 , 2 , 1 ) {\displaystyle P_{2}=(7,2,1)} , as exemplified in the accompanying image for the three-dimensional case. === Discrete recombination === If the rules of the uniform crossover for bit strings are applied during the generation of the offspring, this is also called discrete recombination. === Intermediate recombination === In this recombination operator, the allele values of the child genome a i {\displaystyle a_{i}} are generated by mixing the alleles of the two parent genomes a i , P 1 {\displaystyle a_{i,P_{1}}} and a i , P 2 {\displaystyle a_{i,P_{2}}} : α i = α i , P 1 ⋅ β i + α i , P 2 ⋅ ( 1 − β i ) w i t h β i ∈ [ − d , 1 + d ] {\displaystyle \alpha _{i}=\alpha _{i,P_{1}}\cdot \beta _{i}+\alpha _{i,P_{2}}\cdot \left(1-\beta _{i}\right)\quad {\mathsf {with}}\quad \beta _{i}\in \left[-d,1+d\right]} randomly equally distributed per gene i {\displaystyle i} The choice of the interval [ − d , 1 + d ] {\displaystyle [-d,1+d]} causes that besides the interior of the hyperbody spanned by the allele values of the parent genes additionally a certain environment for the range of values of the offspring is in question. A value of 0.25 {\displaystyle 0.25} is recommended for d {\displaystyle d} to counteract the tendency to reduce the allele values that otherwise exists at d = 0 {\displaystyle d=0} . The adjacent figure shows for the two-dimensional case the range of possible new alleles of the two exemplary parents P 1 = ( 3 , 6 ) {\displaystyle P_{1}=(3,6)} and P 2 = ( 9 , 2 ) {\displaystyle P_{2}=(9,2)} in intermediate recombination. The offspring of discrete recombination C 1 {\displaystyle C_{1}} and C 2 {\displaystyle C_{2}} are also plotted. Intermediate recombination satisfies the arithmetic calculation of the allele values of the child genome required by virtual alphabet theory. Discrete and intermediate recombination are used as a standard in the evolution strategy. == Crossover for permutations == For combinatorial tasks, permutations are usually used that are specifically designed for genomes that are themselves permutations of a set. The underlying set is usually a subset of N {\displaystyle \mathbb {N} } or N 0 {\displaystyle \mathbb {N} _{0}} . If 1- or n-point or uniform crossover for integer genomes is used for such genomes, a child genome may contain some values twice and others may be missing. This can be remedied by genetic repair, e.g. by replacing the redundant genes in positional fidelity for missing ones from the other child genome. In order to avoid the generation of invalid offspring, special crossover operators for permutations have been developed which fulfill the basic requirements of such operators for permutations, namely that all elements of the initial permutation are also present in the new one and only the order is changed. It can be distinguished between combinatorial tasks, where all sequences are admissible, and those where there are constraints in the form of inadmissible partial sequences. A well-known representative of the first task type is the traveling salesman problem (TSP), where the goal is to visit a set of cities exactly once on the shortest tour. An example of the constrained task type is the scheduling of multiple workflows. Workflows involve sequence constraints on some of the individual work steps. For example, a thread cannot be cut until the corresponding hole has been drilled in a workpiece. Such problems are also called order-based permutations. In the following, two crossover operators are presented as examples, the partially mapped crossover (PMX) motivated by the TSP and the order crossover (OX1) designed for order-based permutations. A second offspring can be produced in each case by exchanging the parent chromosomes. === Partially mapped crossover (PMX) === The PMX operator was designed as a recombination operator for TSP like Problems. The explanation of the procedure is illustrated by an example: === Order crossover (OX1) === The order crossover goes back to Davis in its original form and is presented here in a slightly generalized version with more than two crossover points. It transfers information about the relative order from the second parent to the offspring. First, the number and position of the crossover points are determined randomly. The resulting gene sequences are then processed as described below: Among other things, order crossover is well suited for scheduling multiple workflows, when used in conjunction with 1- and n-point crossover. === Further crossover operators for permutations === Over time, a large number of crossover operators for permutations have been proposed, so the following list is only a small selection. For more information, the reader is referred to the literature. cycle crossover (CX) order-based crossover (OX2) position-based crossover (POS) edge recombination voting recombination (VR) alternating-positions crossover (AP) maximal preservative crossover (MPX) merge crossover (MX) sequential constructive crossover operator (SCX) The usual approach to solving TSP-like problems by genetic or, more generally, evolutionary algorithms, presented earlier, is either to repair illegal descendants or to adjust the operators appropriately so that illegal offspring do not arise in the first place. Alternatively, Riazi suggests the use of a double chromosome representation, which avoids illegal offspring.

    Read more →
  • Wolfram Mathematica

    Wolfram Mathematica

    Wolfram Mathematica (also known as Mathematica) is a software system with built-in libraries for several areas of technical computing that allows machine learning, statistics, symbolic computation, data manipulation, network analysis, time series analysis, NLP, optimization, plotting functions and various types of data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other programming languages. It was conceived by Stephen Wolfram, and is developed by Wolfram Research of Champaign, Illinois. The Wolfram Language is the programming language used in Mathematica. Mathematica 1.0 was released on June 23, 1988 in Champaign, Illinois and Santa Clara, California. Mathematica's Wolfram Language is fundamentally based on Lisp; for example, the Mathematica command Most is identically equal to the Lisp command butlast. == Notebook interface == Mathematica is split into two parts: the kernel and the front end. The kernel interprets expressions (Wolfram Language code) and returns result expressions, which can then be displayed by the front end. The original front end, designed by Theodore Gray in 1988, consists of a notebook interface and allows the creation and editing of notebook documents that can contain code, plaintext, images, and graphics. Code development is also supported through support in a range of standard integrated development environment (IDE) including Eclipse, IntelliJ IDEA, Atom, Vim, Visual Studio Code and Git. The Mathematica Kernel also includes a command line front end. Other interfaces include JMath, based on GNU Readline and WolframScript which runs self-contained Mathematica programs (with arguments) from the UNIX command line. == High-performance computing == Capabilities for high-performance computing were extended with the introduction of packed arrays in version 4 (1999) and sparse matrices (version 5, 2003), and by adopting the GNU Multiple Precision Arithmetic Library to evaluate high-precision arithmetic. Version 5.2 (2005) added automatic multi-threading when computations are performed on multi-core computers. This release included CPU-specific optimized libraries. In addition Mathematica is supported by third party specialist acceleration hardware such as ClearSpeed. In 2002, gridMathematica was introduced to allow user level parallel programming on heterogeneous clusters and multiprocessor systems and in 2008 parallel computing technology was included in all Mathematica licenses including support for grid technology such as Windows HPC Server 2008, Microsoft Compute Cluster Server and Sun Grid. Support for CUDA and OpenCL GPU hardware was added in 2010. == Extensions == As of Version 14, there are 6,602 built-in functions and symbols in the Wolfram Language. Stephen Wolfram announced the launch of the Wolfram Function Repository in June 2019 as a way for the public Wolfram community to contribute functionality to the Wolfram Language. There are currently more than 3000 functions contributed as Resource Functions. In addition to the Wolfram Function Repository, there is a Wolfram Data Repository with computable data and the Wolfram Neural Net Repository for machine learning. Wolfram Mathematica is the basis of the Combinatorica package, which adds discrete mathematics functionality in combinatorics and graph theory to the program. == Connections to other applications, programming languages, and services == Communication with other applications can be done using a protocol called Wolfram Symbolic Transfer Protocol (WSTP). It allows communication between the Wolfram Mathematica kernel and the front end and provides a general interface between the kernel and other applications. Wolfram Research freely distributes a developer kit for linking applications written in the programming language C to the Mathematica kernel through WSTP using J/Link., a Java program that can ask Mathematica to perform computations. Similar functionality is achieved with .NET /Link, but with .NET programs instead of Java programs. Other languages that connect to Mathematica include Haskell, AppleScript, Racket, Visual Basic, Python, and Clojure. Mathematica supports the generation and execution of Modelica models for systems modeling and connects with Wolfram System Modeler. Links are also available to many third-party software packages and APIs. Mathematica can also capture real-time data from a variety of sources and can read and write to public blockchains (Bitcoin, Ethereum, and ARK). It supports import and export of over 220 data, image, video, sound, computer-aided design (CAD), geographic information systems (GIS), document, and biomedical formats. In 2019, support was added for compiling Wolfram Language code to LLVM. Version 12.3 of the Wolfram Language added support for Arduino. == Computable data == Mathematica is also integrated with Wolfram Alpha, an online answer engine that provides additional data, some of which is kept updated in real time, for users who use Mathematica with an internet connection. Some of the data sets include astronomical, chemical, geopolitical, language, biomedical, airplane, and weather data, in addition to mathematical data (such as knots and polyhedra). == Reception == BYTE in 1989 listed Mathematica as among the "Distinction" winners of the BYTE Awards, stating that it "is another breakthrough Macintosh application ... it could enable you to absorb the algebra and calculus that seemed impossible to comprehend from a textbook". Mathematica has been criticized for being closed source. Wolfram Research claims keeping Mathematica closed source is central to its business model and the continuity of the software.

    Read more →
  • Evolutionary multimodal optimization

    Evolutionary multimodal optimization

    In applied mathematics, multimodal optimization deals with optimization tasks that involve finding all or most of the multiple (at least locally optimal) solutions of a problem, as opposed to a single best solution. Evolutionary multimodal optimization is a branch of evolutionary computation, which is closely related to machine learning. Wong provides a short survey, wherein the chapter of Shir and the book of Preuss cover the topic in more detail. == Motivation == Knowledge of multiple solutions to an optimization task is especially helpful in engineering, when due to physical (and/or cost) constraints, the best results may not always be realizable. In such a scenario, if multiple solutions (locally and/or globally optimal) are known, the implementation can be quickly switched to another solution and still obtain the best possible system performance. Multiple solutions could also be analyzed to discover hidden properties (or relationships) of the underlying optimization problem, which makes them important for obtaining domain knowledge. In addition, the algorithms for multimodal optimization usually not only locate multiple optima in a single run, but also preserve their population diversity, resulting in their global optimization ability on multimodal functions. Moreover, the techniques for multimodal optimization are usually borrowed as diversity maintenance techniques to other problems. == Background == Classical techniques of optimization would need multiple restart points and multiple runs in the hope that a different solution may be discovered every run, with no guarantee however. Evolutionary algorithms (EAs) due to their population based approach, provide a natural advantage over classical optimization techniques. They maintain a population of possible solutions, which are processed every generation, and if the multiple solutions can be preserved over all these generations, then at termination of the algorithm we will have multiple good solutions, rather than only the best solution. Note that this is against the natural tendency of classical optimization techniques, which will always converge to the best solution, or a sub-optimal solution (in a rugged, “badly behaving” function). Finding and maintenance of multiple solutions is wherein lies the challenge of using EAs for multi-modal optimization. Niching is a generic term referred to as the technique of finding and preserving multiple stable niches, or favorable parts of the solution space possibly around multiple solutions, so as to prevent convergence to a single solution. The field of Evolutionary algorithms encompasses genetic algorithms (GAs), evolution strategy (ES), differential evolution (DE), particle swarm optimization (PSO), and other methods. Attempts have been made to solve multi-modal optimization in all these realms and most, if not all the various methods implement niching in some form or the other. == Multimodal optimization using genetic algorithms/evolution strategies == De Jong's crowding method, Goldberg's sharing function approach, Petrowski's clearing method, restricted mating, maintaining multiple subpopulations are some of the popular approaches that have been proposed by the community. The first two methods are especially well studied, however, they do not perform explicit separation into solutions belonging to different basins of attraction. The application of multimodal optimization within ES was not explicit for many years, and has been explored only recently. A niching framework utilizing derandomized ES was introduced by Shir, proposing the CMA-ES as a niching optimizer for the first time. The underpinning of that framework was the selection of a peak individual per subpopulation in each generation, followed by its sampling to produce the consecutive dispersion of search-points. The biological analogy of this machinery is an alpha-male winning all the imposed competitions and dominating thereafter its ecological niche, which then obtains all the sexual resources therein to generate its offspring. Recently, an evolutionary multiobjective optimization (EMO) approach was proposed, in which a suitable second objective is added to the originally single objective multimodal optimization problem, so that the multiple solutions form a weak pareto-optimal front. Hence, the multimodal optimization problem can be solved for its multiple solutions using an EMO algorithm. Improving upon their work, the same authors have made their algorithm self-adaptive, thus eliminating the need for pre-specifying the parameters. An approach that does not use any radius for separating the population into subpopulations (or species) but employs the space topology instead is proposed in.

    Read more →
  • List of C software and tools

    List of C software and tools

    This is a list of software and programming tools for the C programming language, including libraries, debuggers, compilers, integrated development environments (IDEs), and other related development tools and utilities. == Libraries and tools == Adns — asynchronous DNS resolver library Advanced Linux Sound Architecture — API for sound card device drivers Allegro — cross-platform software library for video game development Apache Portable Runtime — Apache web server tool set of APIs that map to the underlying operating system Argon2 — memory-hard password hashing library Berkeley DB — embedded database software library for key/value data Binary File Descriptor library — binary file manipulation library in the GNU toolchain Boehm garbage collector – conservative garbage collector Borland Graphics Interface — graphics library for Borland compilers BSAFE — FIPS 140-2 validated cryptography library Chipmunk — 2D real-time rigid body physics engine C POSIX library — specification of a C standard library for POSIX systems C standard library – standard library for the C programming language Cairo – vector graphics library API for software developers CFD General Notation System (CGNS) — data format and library for computational fluid dynamics cJSON — lightweight JSON parser CLIPS — public-domain software tool for building expert systems Core Audio — low-level API for dealing with sound in Apple's macOS and iOS operating systems Core Foundation — API for macOS and iOS and other Apple operating systems Core Image — GPU accelerated image processing technology for Apple operating systems with Quartz graphics rendering layer. Core Text — text layout and font rendering API for macOS and iOS. Cryptlib — portable cryptography library cURL / libcurl — CLI app for uploading and downloading individual files, such as a URL from a web server over HTTP. DevIL — cross-platform image library for loading and converting file formats DirectFB — graphics acceleration and input device handling library Dld — dynamic loading library Expat — stream-oriented XML 1.0 parser library, written in C99. FFmpeg — multimedia framework for audio/video processing Fontconfig — font customization and configuration library FreeTDS — database library for Sybase and Microsoft SQL Server FreeType — render text onto bitmaps with a font rasterization engine GD Graphics Library — image creation and manipulation library GDK — graphics abstraction layer for GTK GEGL — graph-based image processing framework GIO — I/O and virtual file system library in GLib GLib — utility library providing data structures, event loops, and portability functions. glibc — GNU implementation of the C standard library GLFW — library for OpenGL contexts, windows, and input device handling GNet — networking library for GLib GNU Libtool — Library management tool GNU portability library — collection of portability routines for GNU software GNU Portable Threads — POSIX/ANSI-C based user space thread library for UNIX for scheduling multithreading GNU Readline — command-line editing library GnuTLS — secure communications (TLS/SSL) library GObject — object system library for GNOME GTK — widget toolkit for creating graphical user interfaces GTK Scene Graph Kit (GSK) — scene graph and rendering toolkit for GTK HDF — file format and library for managing large datasets Integrated Performance Primitives — Intel library of optimized multimedia and data processing routines IUP — portable GUI toolkit J2K-Codec — JPEG 2000 image codec JasPer — reference implementation of the codec specified in the JPEG-2000 Part-1 standard LDAP API — API for interacting with Lightweight Directory Access Protocol LZO — lossless compression library Liba52 — decoder for A/52 (AC-3) audio streams libarchive — reading and writing various archive and compression formats Libart — 2D graphics library Libavcodec — codec library from FFmpeg Libavdevice — library for handling multimedia devices Libavfilter — audio and video filter library Libavformat — library for muxing and demuxing multimedia Libpcap — packet capture library Libdca — decoder for DTS audio Libdvdcss — access to encrypted DVD-Video discs libevent — asynchronous event notification callbacks libffi — foreign function interface libfuse — userspace filesystem Libgegl — programming interface to GEGL image processing libgcrypt — cryptography Libgimp — plug-in development library for GIMP Libhybris — compatibility layer for running Android libraries on Linux Libinput — input device library for Wayland and X.Org libjpeg — JPEG image library libLAS — reading and writing geospatial data encoded in the ASPRS laser (LAS) file format libmicrohttpd — small C library for embedding HTTP server functionality Libmpcodecs — media player codec library from MPlayer Libmpdemux — demultiplexing library from MPlayer libpng — PNG image format Libpostproc — video post-processing library from FFmpeg libpq — PostgreSQL client LibreSSL — fork of OpenSSL for TLS Librsb — parallel library for sparse matrix computations Librsvg — SVG rendering library libsndfile — reading and writing audio files libsodium — easy-to-use cryptography library Libswscale — image scaling and colorspace conversion library LibTIFF — TIFF image handling library Libusb — USB device access library Libuv — asynchronous I/O and event loop library LibVLC — media player engine from VLC LibVNCServer — implementation of the VNC server protocol Libvpx — VP8 and VP9 video codec library Libwww — early World Wide Web protocol library from W3C libxml2 — XML parsing Libxslt — XSLT library for the GNOME Project libzip — ZIP archives Lightning Memory-Mapped Database — fast key–value database engine LittleCMS — open-source color management system LZ4 — fast lossless compression algorithm LZFSE — compression library developed by Apple MatrixSSL — lightweight TLS implementation Mbed TLS — portable cryptography and TLS library MediaLib — Sun Microsystems library for multimedia processing Mesa — OpenGL and Vulkan graphics library Microwindows — small windowing system for embedded devices Ming — library for generating SWF (Flash) files Mongoose — embedded web server and networking library Mpg123 — MP3 audio decoding library MPIR — multiple-precision arithmetic library MsQuic — Microsoft implementation of the QUIC transport protocol MuJoCo — physics engine for robotics and control Mustache — logic-less templating library Ncurses — terminal control library Nettle — low-level cryptography library Newt — text-based user interface library Netpbm — graphics conversion and processing library Nghttp2 — implementation of the HTTP/2 protocol Oniguruma — regular expression library Open Asset Import Library — library to import/export 3D model formats OpenCL — parallel computing API/library OpenCV — computer vision OpenGL — API for rendering 2D and 3D vector graphics OpenGL Utility Library — OpenGL utility functions OpenJPEG — JPEG 2000 image codec OpenSSL — SSL and TLS protocols and cryptography library Pango — layout engine library which works with the HarfBuzz shaping engine for displaying multi-language text perf (Linux) — performance analyzing tool PCRE — regular expression library PROJ — library for map projections and coordinate transforms Quartz 2D — 2D graphics rendering API for macOS and iOS platforms, part of the Core Graphics framework. Raylib — simple library for games and multimedia Redland RDF Application Framework — RDF data storage library S2n-tls — TLS implementation from AWS Setcontext — context switching library functions SDL — Simple DirectMedia Layer systemd — system and service manager libraries for Linux Tk — GUI widgets for building graphical user interfaces VDPAU — video decoding acceleration API Vorbis — audio compression codec library VTD-XML — high-performance XML parser Wimlib — library for handling Windows Imaging Format disk images Windows.h — base Windows API header file WolfSSH — lightweight SSH library WolfSSL — lightweight SSL/TLS library X Toolkit Intrinsics — toolkit library for the X Window System x264 — H.264 video codec library XCB — C binding for the X Window System protocol Xft — font rendering library using FreeType Xlib — low-level X Window System API XMDF — eXtensible Model Data Format for scientific data XMLStarlet — XML command-line toolkit zlib — data compression Zopfli — data compression library that performs deflate, gzip and zlib data encoding. Zstd — fast data compression library == Integrated development environments == Anjuta — GNOME IDE CLion — cross-platform commercial IDE from JetBrains Code::Blocks — cross-platform open-source IDE CodeLite — open-source IDE Dev-C++ Eclipse CDT Geany — text editor with IDE features KDevelop — KDE IDE NetBeans Qt Creator SlickEdit Visual Studio Xcode === Online IDEs === CodeSandbox — online IDE primarily for web development with some C support via containers GitHub Codespaces — cloud-based online IDE developed by GitHub Google Cloud Shell — browser-based shell and editor that can comp

    Read more →
  • Unique negative dimension

    Unique negative dimension

    Unique negative dimension (UND) is a complexity measure for the model of learning from positive examples. The unique negative dimension of a class C {\displaystyle C} of concepts is the size of the maximum subclass D ⊆ C {\displaystyle D\subseteq C} such that for every concept c ∈ D {\displaystyle c\in D} , we have ∩ ( D ∖ { c } ) ∖ c {\displaystyle \cap (D\setminus \{c\})\setminus c} is nonempty. This concept was originally proposed by M. Gereb-Graus in "Complexity of learning from one-side examples", Technical Report TR-20-89, Harvard University Division of Engineering and Applied Science, 1989.

    Read more →
  • Low-rank approximation

    Low-rank approximation

    In mathematics, low-rank approximation refers to the process of approximating a given matrix by a matrix of lower rank. More precisely, it is a minimization problem, in which the cost function measures the fit between a given matrix (the data) and an approximating matrix (the optimization variable), subject to a constraint that the approximating matrix has reduced rank. The problem is used for mathematical modeling and data compression. The rank constraint is related to a constraint on the complexity of a model that fits the data. In applications, often there are other constraints on the approximating matrix apart from the rank constraint, e.g., non-negativity and Hankel structure. Low-rank approximation is closely related to numerous other techniques, including principal component analysis, factor analysis, total least squares, latent semantic analysis, orthogonal regression, and dynamic mode decomposition. == Definition == Given structure specification S : R n p → R m × n {\displaystyle {\mathcal {S}}:\mathbb {R} ^{n_{p}}\to \mathbb {R} ^{m\times n}} , vector of structure parameters p ∈ R n p {\displaystyle p\in \mathbb {R} ^{n_{p}}} , norm ‖ ⋅ ‖ {\displaystyle \|\cdot \|} , and desired rank r {\displaystyle r} , minimize over p ^ ‖ p − p ^ ‖ subject to rank ⁡ ( S ( p ^ ) ) ≤ r . {\displaystyle {\text{minimize}}\quad {\text{over }}{\widehat {p}}\quad \|p-{\widehat {p}}\|\quad {\text{subject to}}\quad \operatorname {rank} {\big (}{\mathcal {S}}({\widehat {p}}){\big )}\leq r.} == Applications == Linear system identification, in which case the approximating matrix is Hankel structured. Machine learning, in which case the approximating matrix is nonlinearly structured. Recommender systems, in which cases the data matrix has missing values and the approximation is categorical. Distance matrix completion, in which case there is a positive definiteness constraint. Natural language processing, in which case the approximation is nonnegative. Computer algebra, in which case the approximation is Sylvester structured. Matrix product states, in which case the approximation is usually rescaled to have fixed Frobenius norm. == Basic low-rank approximation problem == The unstructured problem with fit measured by the Frobenius norm, i.e., minimize over D ^ ‖ D − D ^ ‖ F subject to rank ⁡ ( D ^ ) ≤ r {\displaystyle {\text{minimize}}\quad {\text{over }}{\widehat {D}}\quad \|D-{\widehat {D}}\|_{\text{F}}\quad {\text{subject to}}\quad \operatorname {rank} {\big (}{\widehat {D}}{\big )}\leq r} has an analytic solution in terms of the singular value decomposition of the data matrix. The result is referred to as the matrix approximation lemma or Eckart–Young–Mirsky theorem. This problem was originally solved by Erhard Schmidt in the infinite dimensional context of integral operators (although his methods easily generalize to arbitrary compact operators on Hilbert spaces) and later rediscovered by C. Eckart and G. Young. L. Mirsky generalized the result to arbitrary unitarily invariant norms. Let D = U Σ V ⊤ ∈ R m × n , m ≥ n {\displaystyle D=U\Sigma V^{\top }\in \mathbb {R} ^{m\times n},\quad m\geq n} be the singular value decomposition of D {\displaystyle D} , where Σ =: diag ⁡ ( σ 1 , … , σ r ) {\displaystyle \Sigma =:\operatorname {diag} (\sigma _{1},\ldots ,\sigma _{r})} , where r ≤ min { m , n } = n {\displaystyle r\leq \min\{m,n\}=n} , is the m × n {\displaystyle m\times n} rectangular diagonal matrix with r {\displaystyle r} non-zero singular values σ 1 ≥ … ≥ σ r > σ r + 1 = … = σ n = 0 {\displaystyle \sigma _{1}\geq \ldots \geq \sigma _{r}>\sigma _{r+1}=\ldots =\sigma _{n}=0} . For a given k ∈ { 1 , … , r } {\displaystyle k\in \{1,\dots ,r\}} , partition U {\displaystyle U} , Σ {\displaystyle \Sigma } , and V {\displaystyle V} as follows: U =: [ U 1 U 2 ] , Σ =: [ Σ 1 0 0 Σ 2 ] , and V =: [ V 1 V 2 ] , {\displaystyle U=:{\begin{bmatrix}U_{1}&U_{2}\end{bmatrix}},\quad \Sigma =:{\begin{bmatrix}\Sigma _{1}&0\\0&\Sigma _{2}\end{bmatrix}},\quad {\text{and}}\quad V=:{\begin{bmatrix}V_{1}&V_{2}\end{bmatrix}},} where U 1 {\displaystyle U_{1}} is m × k {\displaystyle m\times k} , Σ 1 {\displaystyle \Sigma _{1}} is k × k {\displaystyle k\times k} , and V 1 {\displaystyle V_{1}} is n × k {\displaystyle n\times k} . Then the rank k {\displaystyle k} matrix D ^ ∗ := U 1 Σ 1 V 1 ⊤ , {\displaystyle {\widehat {D}}^{}:=U_{1}\Sigma _{1}V_{1}^{\top },} obtained from the truncated singular value decomposition is such that ‖ D − D ^ ∗ ‖ F = min rank ⁡ ( D ^ ) ≤ k ‖ D − D ^ ‖ F = σ k + 1 2 + ⋯ + σ r 2 . {\displaystyle \|D-{\widehat {D}}^{}\|_{\text{F}}=\min _{\operatorname {rank} ({\widehat {D}})\leq k}\|D-{\widehat {D}}\|_{\text{F}}={\sqrt {\sigma _{k+1}^{2}+\cdots +\sigma _{r}^{2}}}.} The minimizer D ^ ∗ {\displaystyle {\widehat {D}}^{}} is unique if and only if σ k > σ k + 1 {\displaystyle \sigma _{k}>\sigma _{k+1}} . == Proof of Eckart–Young–Mirsky theorem (for spectral norm) == Let A ∈ R m × n {\displaystyle A\in \mathbb {R} ^{m\times n}} be a real (possibly rectangular) matrix with m ≤ n {\displaystyle m\leq n} . Suppose that A = U Σ V ⊤ {\displaystyle A=U\Sigma V^{\top }} is the singular value decomposition of A {\displaystyle A} . Recall that U {\displaystyle U} and V {\displaystyle V} are orthogonal matrices, and Σ {\displaystyle \Sigma } is an m × n {\displaystyle m\times n} diagonal matrix with entries ( σ 1 , σ 2 , ⋯ , σ m ) {\displaystyle (\sigma _{1},\sigma _{2},\cdots ,\sigma _{m})} such that σ 1 ≥ σ 2 ≥ ⋯ ≥ σ m ≥ 0 {\displaystyle \sigma _{1}\geq \sigma _{2}\geq \cdots \geq \sigma _{m}\geq 0} . We claim that the best rank- k {\displaystyle k} approximation to A {\displaystyle A} in the spectral norm, denoted by ‖ ⋅ ‖ 2 {\displaystyle \|\cdot \|_{2}} , is given by A k := ∑ i = 1 k σ i u i v i ⊤ {\displaystyle A_{k}:=\sum _{i=1}^{k}\sigma _{i}u_{i}v_{i}^{\top }} where u i {\displaystyle u_{i}} and v i {\displaystyle v_{i}} denote the i {\displaystyle i} th column of U {\displaystyle U} and V {\displaystyle V} , respectively. First, note that we have ‖ A − A k ‖ 2 = ‖ ∑ i = 1 n σ i u i v i ⊤ − ∑ i = 1 k σ i u i v i ⊤ ‖ 2 = ‖ ∑ i = k + 1 n σ i u i v i ⊤ ‖ 2 = σ k + 1 {\displaystyle \|A-A_{k}\|_{2}=\left\|\sum _{i=1}^{\color {red}{n}}\sigma _{i}u_{i}v_{i}^{\top }-\sum _{i=1}^{\color {red}{k}}\sigma _{i}u_{i}v_{i}^{\top }\right\|_{2}=\left\|\sum _{i=\color {red}{k+1}}^{n}\sigma _{i}u_{i}v_{i}^{\top }\right\|_{2}=\sigma _{k+1}} Therefore, we need to show that if B k = X Y ⊤ {\displaystyle B_{k}=XY^{\top }} where X {\displaystyle X} and Y {\displaystyle Y} have k {\displaystyle k} columns then ‖ A − A k ‖ 2 = σ k + 1 ≤ ‖ A − B k ‖ 2 {\displaystyle \|A-A_{k}\|_{2}=\sigma _{k+1}\leq \|A-B_{k}\|_{2}} . Since Y {\displaystyle Y} has k {\displaystyle k} columns, then there must be a nontrivial linear combination of the first k + 1 {\displaystyle k+1} columns of V {\displaystyle V} , i.e., w = γ 1 v 1 + ⋯ + γ k + 1 v k + 1 , {\displaystyle w=\gamma _{1}v_{1}+\cdots +\gamma _{k+1}v_{k+1},} such that Y ⊤ w = 0 {\displaystyle Y^{\top }w=0} . Without loss of generality, we can scale w {\displaystyle w} so that ‖ w ‖ 2 = 1 {\displaystyle \|w\|_{2}=1} or (equivalently) γ 1 2 + ⋯ + γ k + 1 2 = 1 {\displaystyle \gamma _{1}^{2}+\cdots +\gamma _{k+1}^{2}=1} . Therefore, ‖ A − B k ‖ 2 2 ≥ ‖ ( A − B k ) w ‖ 2 2 = ‖ A w ‖ 2 2 = γ 1 2 σ 1 2 + ⋯ + γ k + 1 2 σ k + 1 2 ≥ σ k + 1 2 . {\displaystyle \|A-B_{k}\|_{2}^{2}\geq \|(A-B_{k})w\|_{2}^{2}=\|Aw\|_{2}^{2}=\gamma _{1}^{2}\sigma _{1}^{2}+\cdots +\gamma _{k+1}^{2}\sigma _{k+1}^{2}\geq \sigma _{k+1}^{2}.} The result follows by taking the square root of both sides of the above inequality. == Proof of Eckart–Young–Mirsky theorem (for Frobenius norm) == Let A ∈ R m × n {\displaystyle A\in \mathbb {R} ^{m\times n}} be a real (possibly rectangular) matrix with m ≤ n {\displaystyle m\leq n} . Suppose that A = U Σ V ⊤ {\displaystyle A=U\Sigma V^{\top }} is the singular value decomposition of A {\displaystyle A} . We claim that the best rank k {\displaystyle k} approximation to A {\displaystyle A} in the Frobenius norm, denoted by ‖ ⋅ ‖ F {\displaystyle \|\cdot \|_{F}} , is given by A k = ∑ i = 1 k σ i u i v i ⊤ {\displaystyle A_{k}=\sum _{i=1}^{k}\sigma _{i}u_{i}v_{i}^{\top }} where u i {\displaystyle u_{i}} and v i {\displaystyle v_{i}} denote the i {\displaystyle i} th column of U {\displaystyle U} and V {\displaystyle V} , respectively. First, note that we have ‖ A − A k ‖ F 2 = ‖ ∑ i = k + 1 n σ i u i v i ⊤ ‖ F 2 = ∑ i = k + 1 n σ i 2 {\displaystyle \|A-A_{k}\|_{F}^{2}=\left\|\sum _{i=k+1}^{n}\sigma _{i}u_{i}v_{i}^{\top }\right\|_{F}^{2}=\sum _{i=k+1}^{n}\sigma _{i}^{2}} Therefore, we need to show that if B k = X Y ⊤ {\displaystyle B_{k}=XY^{\top }} where X {\displaystyle X} and Y {\displaystyle Y} have k {\displaystyle k} columns then ‖ A − A k ‖ F 2 = ∑ i = k + 1 n σ i 2 ≤ ‖ A − B k ‖ F 2 . {\displaystyle \|A-A_{k}\|_{F}^{2}=\sum _{i=k+1}^{n}\sigma _{i}^{2}\leq \|A-B_{k}\|_{F}^{2}.} By the triangle inequality with the spectral norm

    Read more →