AI Tools

AI Tools — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Audio inpainting

    Audio inpainting

    Audio inpainting (also known as audio interpolation) is an audio restoration task which deals with the reconstruction of missing or corrupted portions of a digital audio signal. Inpainting techniques are employed when parts of the audio have been lost due to various factors such as transmission errors, data corruption or errors during recording. The goal of audio inpainting is to fill in the gaps (i.e., the missing portions) in the audio signal seamlessly, making the reconstructed portions indistinguishable from the original content and avoiding the introduction of audible distortions or alterations. Many techniques have been proposed to solve the audio inpainting problem and this is usually achieved by analyzing the temporal and spectral information surrounding each missing portion of the considered audio signal. Classic methods employ statistical models or digital signal processing algorithms to predict and synthesize the missing or damaged sections. Recent solutions, instead, take advantage of deep learning models, thanks to the growing trend of exploiting data-driven methods in the context of audio restoration. Depending on the extent of the lost information, the inpainting task can be divided in three categories. Short inpainting refers to the reconstruction of few milliseconds (approximately less than 10) of missing signal, that occurs in the case of short distortions such as clicks or clipping. In this case, the goal of the reconstruction is to recover the lost information exactly. In long inpainting instead, with gaps in the order of hundreds of milliseconds or even seconds, this goal becomes unrealistic, since restoration techniques cannot rely on local information. Therefore, besides providing a coherent reconstruction, the algorithms need to generate new information that has to be semantically compatible with the surrounding context (i.e., the audio signal surrounding the gaps). The case of medium duration gaps lays between short and long inpainting. It refers to the reconstruction of tens of millisecond of missing data, a scale where the non-stationary characteristic of audio already becomes important. == Definition == Consider a digital audio signal x {\displaystyle \mathbf {x} } . A corrupted version of x {\displaystyle \mathbf {x} } , which is the audio signal presenting missing gaps to be reconstructed, can be defined as x ~ = m ∘ x {\displaystyle \mathbf {\tilde {x}} =\mathbf {m} \circ \mathbf {x} } , where m {\displaystyle \mathbf {m} } is a binary mask encoding the reliable or missing samples of x {\displaystyle \mathbf {x} } , and ∘ {\displaystyle \circ } represents the element-wise product. Audio inpainting aims at finding x ^ {\displaystyle \mathbf {\hat {x}} } (i.e., the reconstruction), which is an estimation of x {\displaystyle \mathbf {x} } . This is an ill-posed inverse problem, which is characterized by a non-unique set of solutions. For this reason, similarly to the formulation used for the inpainting problem in other domains, the reconstructed audio signal can be found through an optimization problem that is formally expressed as x ^ ∗ = argmin X ^ L ( m ∘ x ^ , x ~ ) + R ( x ^ ) {\displaystyle \mathbf {\hat {x}} ^{}={\underset {\hat {\mathbf {X} }}{\text{argmin}}}~L(\mathbf {m} \circ \mathbf {\hat {x}} ,\mathbf {\tilde {x}} )+R(\mathbf {\hat {x}} )} . In particular, x ^ ∗ {\displaystyle \mathbf {\hat {x}} ^{}} is the optimal reconstructed audio signal and L {\displaystyle L} is a distance measure term that computes the reconstruction accuracy between the corrupted audio signal and the estimated one. For example, this term can be expressed with a mean squared error or similar metrics. Since L {\displaystyle L} is computed only on the reliable frames, there are many solutions that can minimize L ( m ∘ x ^ , x ~ ) {\displaystyle L(\mathbf {m} \circ \mathbf {\hat {x}} ,\mathbf {\tilde {x}} )} . It is thus necessary to add a constraint to the minimization, in order to restrict the results only to the valid solutions. This is expressed through the regularization term R {\displaystyle R} that is computed on the reconstructed audio signal x ^ {\displaystyle \mathbf {\hat {x}} } . This term encodes some kind of a-priori information on the audio data. For example, R {\displaystyle R} can express assumptions on the stationarity of the signal, on the sparsity of its representation or can be learned from data. == Techniques == There exist various techniques to perform audio inpainting. These can vary significantly, influenced by factors such as the specific application requirements, the length of the gaps and the available data. In the literature, these techniques are broadly divided in model-based techniques (sometimes also referred as signal processing techniques) and data-driven techniques. === Model-based techniques === Model-based techniques involve the exploitation of mathematical models or assumptions about the underlying structure of the audio signal. These models can be based on prior knowledge of the audio content or statistical properties observed in the data. By leveraging these models, missing or corrupted portions of the audio signal can be inferred or estimated. An example of a model-based techniques are autoregressive models. These methods interpolate or extrapolate the missing samples based on the neighboring values, by using mathematical functions to approximate the missing data. In particular, in autoregressive models the missing samples are completed through linear prediction. The autoregressive coefficients necessary for this prediction are learned from the surrounding audio data, specifically from the data adjacent to each gap. Some more recent techniques approach audio inpainting by representing audio signals as sparse linear combinations of a limited number of basis functions (as for example in the Short Time Fourier Transform). In this context, the aim is to find the sparse representation of the missing section of the signal that most accurately matches the surrounding, unaffected signal. The aforementioned methods exhibit optimal performance when applied to filling in relatively short gaps, lasting only a few tens of milliseconds, and thus they can be included in the context of short inpainting. However, these signal-processing techniques tend to struggle when dealing with longer gaps. The reason behind this limitation lies in the violation of the stationarity condition, as the signal often undergoes significant changes after the gap, making it substantially different from the signal preceding the gap. As a way to overcome these limitations, some approaches add strong assumptions also about the fundamental structure of the gap itself, exploiting sinusoidal modeling or similarity graphs to perform inpainting of longer missing portions of audio signals. === Data-driven techniques === Data-driven techniques rely on the analysis and exploitation of the available audio data. These techniques often employ deep learning algorithms that learn patterns and relationships directly from the provided data. They involve training models on large datasets of audio examples, allowing them to capture the statistical regularities present in the audio signals. Once trained, these models can be used to generate missing portions of the audio signal based on the learned representations, without being restricted by stationarity assumptions. Data-driven techniques also offer the advantage of adaptability and flexibility, as they can learn from diverse audio datasets and potentially handle complex inpainting scenarios. As of today, such techniques constitute the state-of-the-art of audio inpainting, being able to reconstruct gaps of hundreds of milliseconds or even seconds. These performances are made possible by the use of generative models that have the capability to generate novel content to fill in the missing portions. For example, generative adversarial networks, which are the state-of-the-art of generative models in many areas, rely on two competing neural networks trained simultaneously in a two-player minmax game: the generator produces new data from samples of a random variable, the discriminator attempts to distinguish between generated and real data. During the training, the generator's objective is to fool the discriminator, while the discriminator attempts to learn to better classify real and fake data. In GAN-based inpainting methods the generator acts as a context encoder and produces a plausible completion for the gap only given the available information surrounding it. The discriminator is used to train the generator and tests the consistency of the produced inpainted audio. Recently, also diffusion models have established themselves as the state-of-the-art of generative models in many fields, often beating even GAN-based solutions. For this reason they have also been used to solve the audio inpainting problem, obtaining valid results. These models generate new data instances by inverting the

    Read more →
  • Neural operators

    Neural operators

    Neural operators are a class of deep learning architectures designed to learn maps between infinite-dimensional function spaces. Neural operators represent an extension of traditional artificial neural networks, marking a departure from the typical focus on learning mappings between finite-dimensional Euclidean spaces or finite sets. Neural operators directly learn operators between function spaces; they can receive input functions, and the output function can be evaluated at any discretization. The primary application of neural operators is in learning surrogate maps for the solution operators of partial differential equations (PDEs), which are critical tools in modeling the natural environment. Standard PDE solvers can be time-consuming and computationally intensive, especially for complex systems. Neural operators have demonstrated improved performance in solving PDEs compared to existing machine learning methodologies while being significantly faster than numerical solvers. Neural operators have also been applied to various scientific and engineering disciplines such as turbulent flow modeling, computational mechanics, graph-structured data, and the geosciences. In particular, they have been applied to learning stress-strain fields in materials, classifying complex data like spatial transcriptomics, predicting multiphase flow in porous media, and carbon dioxide migration simulations. Finally, the operator learning paradigm allows learning maps between function spaces, and is different from parallel ideas of learning maps from finite-dimensional spaces to function spaces, and subsumes these settings as special cases when limited to a fixed input resolution. == Operator learning == Understanding and mapping relationships between function spaces has many applications in engineering and the sciences. In particular, one can cast the problem of solving partial differential equations as identifying a map between function spaces, such as from an initial condition to a time-evolved state. In other PDEs this map takes an input coefficient function and outputs a solution function. Operator learning is a machine learning paradigm to learn solution operators mapping the input function to the output function . Using traditional machine learning methods, addressing this problem would involve discretizing the infinite-dimensional input and output function spaces into finite-dimensional grids and applying standard learning models, such as neural networks. This approach reduces the operator learning to finite-dimensional function learning and has some limitations, such as generalizing to discretizations beyond the grid used in training. The primary properties of neural operators that differentiate them from traditional neural networks is discretization invariance and discretization convergence. Unlike conventional neural networks, which are fixed on the discretization of training data, neural operators can adapt to various discretizations without re-training. This property improves the robustness and applicability of neural operators in different scenarios, providing consistent performance across different resolutions and grids. == Definition and formulation == Architecturally, neural operators are similar to feed-forward neural networks in the sense that they are composed of alternating linear maps and non-linearities. Since neural operators act on and output functions, neural operators have been instead formulated as a sequence of alternating linear integral operators on function spaces and point-wise non-linearities. Using an analogous architecture to finite-dimensional neural networks, similar universal approximation theorems have been proven for neural operators. In particular, it has been shown that neural operators can approximate any continuous operator on a compact set. Neural operators seek to approximate some operator G : A → U {\displaystyle {\mathcal {G}}:{\mathcal {A}}\to {\mathcal {U}}} between function spaces A {\displaystyle {\mathcal {A}}} and U {\displaystyle {\mathcal {U}}} by building a parametric map G ϕ : A → U {\displaystyle {\mathcal {G}}_{\phi }:{\mathcal {A}}\to {\mathcal {U}}} . Such parametric maps G ϕ {\displaystyle {\mathcal {G}}_{\phi }} can generally be defined in the form G ϕ := Q ∘ σ ( W T + K T + b T ) ∘ ⋯ ∘ σ ( W 1 + K 1 + b 1 ) ∘ P , {\displaystyle {\mathcal {G}}_{\phi }:={\mathcal {Q}}\circ \sigma (W_{T}+{\mathcal {K}}_{T}+b_{T})\circ \cdots \circ \sigma (W_{1}+{\mathcal {K}}_{1}+b_{1})\circ {\mathcal {P}},} where P , Q {\displaystyle {\mathcal {P}},{\mathcal {Q}}} are the lifting (lifting the codomain of the input function to a higher dimensional space) and projection (projecting the codomain of the intermediate function to the output dimension) operators, respectively. These operators act pointwise on functions and are typically parametrized as multilayer perceptrons. σ {\displaystyle \sigma } is a pointwise nonlinearity, such as a rectified linear unit (ReLU), or a Gaussian error linear unit (GeLU). Each layer t = 1 , … , T {\displaystyle t=1,\dots ,T} has a respective local operator W t {\displaystyle W_{t}} (usually parameterized by a pointwise neural network), a kernel integral operator K t {\displaystyle {\mathcal {K}}_{t}} , and a bias function b t {\displaystyle b_{t}} . Given some intermediate functional representation v t {\displaystyle v_{t}} with domain D {\displaystyle D} in the t {\displaystyle t} -th hidden layer, a kernel integral operator K ϕ {\displaystyle {\mathcal {K}}_{\phi }} is defined as ( K ϕ v t ) ( x ) := ∫ D κ ϕ ( x , y , v t ( x ) , v t ( y ) ) v t ( y ) d y , {\displaystyle ({\mathcal {K}}_{\phi }v_{t})(x):=\int _{D}\kappa _{\phi }(x,y,v_{t}(x),v_{t}(y))v_{t}(y)dy,} where the kernel κ ϕ {\displaystyle \kappa _{\phi }} is a learnable implicit neural network, parametrized by ϕ {\displaystyle \phi } . In practice, one is often given the input function to the neural operator at a specific resolution. For instance, consider the setting where one is given the evaluation of v t {\displaystyle v_{t}} at n {\displaystyle n} points { y j } j n {\displaystyle \{y_{j}\}_{j}^{n}} . Borrowing from Nyström integral approximation methods such as Riemann sum integration and Gaussian quadrature, the above integral operation can be computed as follows: ∫ D κ ϕ ( x , y , v t ( x ) , v t ( y ) ) v t ( y ) d y ≈ ∑ j n κ ϕ ( x , y j , v t ( x ) , v t ( y j ) ) v t ( y j ) Δ y j , {\displaystyle \int _{D}\kappa _{\phi }(x,y,v_{t}(x),v_{t}(y))v_{t}(y)dy\approx \sum _{j}^{n}\kappa _{\phi }(x,y_{j},v_{t}(x),v_{t}(y_{j}))v_{t}(y_{j})\Delta _{y_{j}},} where Δ y j {\displaystyle \Delta _{y_{j}}} is the sub-area volume or quadrature weight associated to the point y j {\displaystyle y_{j}} . Thus, a simplified layer can be computed as v t + 1 ( x ) ≈ σ ( ∑ j n κ ϕ ( x , y j , v t ( x ) , v t ( y j ) ) v t ( y j ) Δ y j + W t ( v t ( y j ) ) + b t ( x ) ) . {\displaystyle v_{t+1}(x)\approx \sigma \left(\sum _{j}^{n}\kappa _{\phi }(x,y_{j},v_{t}(x),v_{t}(y_{j}))v_{t}(y_{j})\Delta _{y_{j}}+W_{t}(v_{t}(y_{j}))+b_{t}(x)\right).} The above approximation, along with parametrizing κ ϕ {\displaystyle \kappa _{\phi }} as an implicit neural network, results in the graph neural operator (GNO). There have been various parameterizations of neural operators for different applications. These typically differ in their parameterization of κ {\displaystyle \kappa } . The most popular instantiation is the Fourier neural operator (FNO). FNO takes κ ϕ ( x , y , v t ( x ) , v t ( y ) ) := κ ϕ ( x − y ) {\displaystyle \kappa _{\phi }(x,y,v_{t}(x),v_{t}(y)):=\kappa _{\phi }(x-y)} and by applying the convolution theorem, arrives at the following parameterization of the kernel integral operator: ( K ϕ v t ) ( x ) = F − 1 ( R ϕ ⋅ ( F v t ) ) ( x ) , {\displaystyle ({\mathcal {K}}_{\phi }v_{t})(x)={\mathcal {F}}^{-1}(R_{\phi }\cdot ({\mathcal {F}}v_{t}))(x),} where F {\displaystyle {\mathcal {F}}} represents the Fourier transform and R ϕ {\displaystyle R_{\phi }} represents the Fourier transform of some periodic function κ ϕ {\displaystyle \kappa _{\phi }} . That is, FNO parameterizes the kernel integration directly in Fourier space, using a prescribed number of Fourier modes. When the grid at which the input function is presented is uniform, the Fourier transform can be approximated using the discrete Fourier transform (DFT) with frequencies below some specified threshold. The discrete Fourier transform can be computed using a fast Fourier transform (FFT) implementation. == Training == Training neural operators is similar to the training process for a traditional neural network. Neural operators are typically trained in some Lp norm or Sobolev norm. In particular, for a dataset { ( a i , u i ) } i = 1 N {\displaystyle \{(a_{i},u_{i})\}_{i=1}^{N}} of size N {\displaystyle N} , neural operators minimize (a discretization of) L U ( { ( a i , u i ) } i = 1 N ) := ∑ i = 1 N ‖ u i − G θ ( a i ) ‖ U 2 {\displaystyle {\mathcal {L}}_{\mathca

    Read more →
  • Graph cuts in computer vision and artificial intelligence

    Graph cuts in computer vision and artificial intelligence

    As applied in the field of computer vision, graph cut optimization can be employed to efficiently solve a wide variety of low-level computer vision problems (early vision), such as image smoothing, the stereo correspondence problem, image segmentation, object co-segmentation, numerous military applications (eg Automatic target recognition) and many other problems that can be formulated in terms of energy minimization (eg Climate Science and Environmental modelling). Graph cut techniques are now increasingly being used in combination with more general spatial Artificial intelligence techniques (eg to enforce structure in Large language model output to sharpen tumour boundaries and similarly for various Augmented reality, Self-driving car, Robotics, Google Maps applications etc). Many of these energy minimization problems can be approximated by solving a maximum flow problem in a graph (and thus, by the max-flow min-cut theorem, define a minimal cut of the graph). Under most formulations of such problems in computer vision, the minimum energy solution corresponds to the maximum a posteriori estimate of a solution. Although many computer vision algorithms involve cutting a graph (e.g. normalized cuts), the term "graph cuts" is applied specifically to those models which employ a max-flow/min-cut optimization (other graph cutting algorithms may be considered as graph partitioning algorithms). "Binary" problems (such as denoising a binary image) can be solved exactly using this approach; problems where pixels can be labeled with more than two different labels (such as stereo correspondence, or denoising of a grayscale image) cannot be solved exactly, but solutions produced are usually near the global optimum. == History == The foundational theory of graph cuts in computer vision was first developed by Margaret Greig, Bruce Porteous and Allan Seheult (GPS) of Durham University in a now legendary discussion contribution to Julian Besag's 1986 paper and a more detailed follow on paper in 1989. In the Bayesian statistical context of smoothing noisy images, using a Markov random field as the image prior distribution, they showed with a mathematically beautiful proof how the maximum a posteriori estimate of a binary image can be obtained exactly by maximizing the flow through an associated image network, or graph, involving the introduction of a source and sink and Log-likelihood ratios. The problem was shown to be efficiently solvable exactly, an unexpected result as the problem was believed to be computationally intractable (NP hard). GPS also addressed the computational cost of the max-flow algorithm on large graphs, a significant concern at the time. They proposed a partitioning algorithm (see Section 4 of GPS) involving the recursive amalgamation of non-overlapping blocks, or tiles, which gave a 12X increase in speed. This approach recursively solved and amalgamated independent sub-graphs until the whole graph was solved. While contemporaries like Geman and Geman had advocated Parallel computing in the context of Simulated annealing, the GPS blocking strategy offered a deterministic structure amenable to parallelisation and anticipated modern artificial intelligence design across multiple GPUs. However, until recently, this aspect of the paper was largely ignored and subsequent research focused on Serial computer global search trees, such as the Boykov-Kolmogorov algorithm. Although the general k {\displaystyle k} -colour problem is NP hard for k > 2 , {\displaystyle k>2,} the GPS approach has turned out to have very wide applicability in general computer vision problems. This was first demonstrated by Boykov, Veksler and Zabih who, in a seminal paper published more than 10 years after the original GPS paper, and in other important works, lit the blue touch paper for the general adoption of graph cut techniques in computer vision. They showed that, for general problems, the GPS approach can be applied iteratively to sequences of binary problems, using their now ubiquitous alpha-expansion algorithm, yielding near optimal solutions. Prior to these results, approximate local optimisation techniques such as simulated annealing (as proposed by the Geman brothers) or iterated conditional modes (a type of greedy algorithm suggested by Julian Besag) were used to solve such image smoothing problems. Building on these advancements, GPS graph cut optimization was subsequently adapted for interactive image segmentation, most notably through the "GrabCut" algorithm introduced by Carsten Rother, Vladimir Kolmogorov, and Andrew Blake of Microsoft Research, Cambridge. GrabCut extended earlier interactive graph cut methods by replacing monochrome image histograms with Gaussian mixture models to estimate colour distributions, and by employing an iterative GPS energy minimisation scheme. This approach significantly simplified user interaction, requiring only a rough bounding box around the target object rather than detailed user-drawn strokes, and it quickly became a standard tool in both academic research and commercial image editing software. The GPS paper connected and bridged profound ideas from Mathematical statistics (Bayes' theorem, Markov random field), Physics (Ising model), Optimisation (Energy function) and Computer science (Network flow problem) and led the move away from approximate local and slow optimisation approaches (eg simulated annealing) to more powerful exact, or near exact, faster global optimisation techniques. It is now recognised as seminal as it was well ahead of its time and, in particular, was published years before the computing power revolution of Moore's law and GPUs. Significantly, GPS was published in a mathematical statistics (rather than a computer vision) journal, and this led to it being overlooked by the computer vision community for many years. It is unofficially known as "The Velvet Underground" paper of computer vision (ie although very few computer vision people read the paper [bought the record], those that did, most importantly Boykov, Veksler and Zabih, started new and important research [formed a band]). This is confirmed by GPS' very large amplification ratio (2nd order citations/first order citations), estimated at well in excess of 100. Despite the foundational nature of the GPS work, formal recognition from the computer vision community has predominantly gone to the researchers who followed to extend and popularise the graph cut method. For example, Boykov, Veksler and Zabih deservedly received a Helmholtz Prize from the ICCV in 2011. This prize recognises ICCV papers from 10 or more years earlier that have had a significant impact on computer vision research. In 2011, Couprie et al. proposed a general image segmentation framework, called the "Power Watershed", that minimized a real-valued indicator function from [0,1] over a graph, constrained by user seeds (or unary terms) set to 0 or 1, in which the minimization of the indicator function over the graph is optimized with respect to an exponent p {\displaystyle p} . When p = 1 {\displaystyle p=1} , the Power Watershed is optimized by graph cuts, when p = 0 {\displaystyle p=0} the Power Watershed is optimized by shortest paths, p = 2 {\displaystyle p=2} is optimized by the random walker algorithm and p = ∞ {\displaystyle p=\infty } is optimized by the watershed algorithm. In this way, the Power Watershed may be viewed as a generalization of graph cuts that provides a straightforward connection with other energy optimization segmentation/clustering algorithms. == Binary segmentation of images == === Notation === Image: x ∈ { R , G , B } N {\displaystyle x\in \{R,G,B\}^{N}} Output: Segmentation (also called opacity) S ∈ R N {\displaystyle S\in R^{N}} (soft segmentation). For hard segmentation S ∈ { 0 for background , 1 for foreground/object to be detected } N {\displaystyle S\in \{0{\text{ for background}},1{\text{ for foreground/object to be detected}}\}^{N}} Energy function: E ( x , S , C , λ ) {\displaystyle E(x,S,C,\lambda )} where C is the color parameter and λ is the coherence parameter. E ( x , S , C , λ ) = E c o l o r + E c o h e r e n c e {\displaystyle E(x,S,C,\lambda )=E_{\rm {color}}+E_{\rm {coherence}}} Optimization: The segmentation can be estimated as a global minimum over S: arg ⁡ min S E ( x , S , C , λ ) {\displaystyle {\arg \min }_{S}E(x,S,C,\lambda )} === Existing methods === Standard Graph cuts: optimize energy function over the segmentation (unknown S value). Iterated Graph cuts: First step optimizes over the color parameters using K-means. Second step performs the usual graph cuts algorithm. These 2 steps are repeated recursively until convergence Dynamic graph cuts:Allows to re-run the algorithm much faster after modifying the problem (e.g. after new seeds have been added by a user). === Energy function === Pr ( x ∣ S ) = K − E {\displaystyle \Pr(x\mid S)=K^{-E}} where the energy E {\displaystyle E} is composed of two different mod

    Read more →
  • Open Syllabus Project

    Open Syllabus Project

    The Open Syllabus Project (OSP) is an online open-source platform that catalogs and analyzes millions of college syllabi. Founded by researchers from the American Assembly at Columbia University, the OSP has amassed the most extensive collection of searchable syllabi. Since its beta launch in 2016, the OSP has collected over 7 million course syllabi from over 80 countries, primarily by scraping publicly accessible university websites. The project is directed by Joe Karaganis. == History == The OSP was formed by a group of data scientists, sociologists, and digital-humanities researchers at the American Assembly, a public-policy institute based at Columbia University. The OSP was partly funded by the Sloan Foundation and the Arcadia Fund. Joe Karaganis, former vice-president of the American Assembly, serves as the project director of the OSP. The project builds on prior attempts to archive syllabi, such as H-Net, MIT OpenCourseWare, and historian Dan Cohen's defunct Syllabus Finder website (Cohen now sits on the OSP's advisory board). The OSP became a non-profit and independent of the American Assembly in November 2019. In January 2016, the OSP launched a beta version of their "Syllabus Explorer," which they had collected data for since 2013. The Syllabus Explorer allows users to browse and search texts from over one million college course syllabi. The OSP launched a more comprehensive version 2.0 of the Syllabus Explorer in July 2019. The newer version includes an interactive visualization that displays texts as dots on a knowledge map. As of 2022, the OSP has collected over 7 million course syllabi. The Syllabus Explorer represents the "largest collection of searchable syllabi ever amassed." == Methodology == The OSP has collected syllabi data from over 80 countries dating to 2000. The syllabi stem from over 4,000 worldwide institutions. Most of the OSP's data originates from the United States. Canada, Australia, and the U.K also have large datasets. The OSP primarily collects syllabi by scraping publicly accessible university websites. The OSP also allows syllabi submissions from faculty, students, and administrators. The OSP developers use machine learning and natural language processing to extract metadata from such syllabi. Since only metadata is collected, no individual syllabus or personal identifying information is found in the OSP database. The OSP classifies the syllabi into 62 subject fields – corresponding to the U.S. Department of Education's Classification of Instructional Programs (CIP). Additionally, the OSP assigns each text a "teaching score" from 0–100. This score represents the text's percentile rank among citations in the total citation count and is a numerical indicator of the relative frequency of which a particular work is taught. The OSP also has data on which texts are most likely to be assigned together. The developers behind the OSP admit that the database is incomplete and likely contains "a fair number of errors." Karaganis estimates that 80–100 million syllabi exist in the United States alone. The OSP is unable to access syllabi behind private course-management software like Blackboard. == Notable findings == === Anthropology === Using data from the OSP, anthropologist Laurence Ralph uncovered that black anthropologists are "woefully under-represented in (if not erased from) most anthropology syllabi." Black authors wrote less than 1 percent of the top 1,000 assigned works. === Economics === The database indicates Greg Mankiw is the most frequently cited author for college economics courses. === English literature === The OSP found that Mary Shelley's Frankenstein was the most widely taught novel in college courses. Additionally, the majority of novels published after 1945 taught in English classes were historical fiction. === Female writers === The most read female writer on college campuses is Kate L. Turabian for her A Manual for Writers of Research Papers, Theses, and Dissertations . Turabian is followed by Diana Hacker, Toni Morrison, Jane Austen, and Virginia Woolf. === Film === The most assigned film according to the OSP is the 1929 Soviet documentary film, Man with a Movie Camera. English filmmaker Alfred Hitchcock is the most assigned director in college courses. === History === Historians George Brown Tindall and David Emory Shi's America: A Narrative History is the number one assigned textbook for history, followed by Anne Moody's memoir, Coming of Age in Mississippi. === Philosophy === The most assigned texts in the field of philosophy include Aristotle's Nicomachean Ethics, John Stuart Mill's Utilitarianism, and Plato's Republic. Plato's Republic was also the second most assigned text in universities in the English-speaking world (only behind Strunk and White's Elements of Style). === Physics === David Halliday's et al. Fundamentals of Physics is the number one ranked physics textbook in the OSP's database. === Political science === Data from the OSP indicates that the dominant political science texts are written almost exclusively by white men and scholars based in the West. In the top 200 most-frequently assigned works, 15 are authored by at least one woman. === Public administration === American president Woodrow Wilson's article "The Study of Administration" was the most frequently assigned text in public affairs and administration syllabi. == Reception == According to William Germano et al., the OSP is a "fascinating resource but is also prone to misrepresenting or at least distracting us from the most important business of a syllabus: communicating with students." Historian William Caferro remarks that the OSP is a "tacit experience of sharing, but a useful one." English professor Bart Beaty writes that, "Despite the many reservations about the completeness of its data, the OSP provides a rare opportunity for scholars to move beyond the anecdotal in discussions of canon-formation in teaching." Media theorist Elizabeth Losh opines that "big data approaches", like the OSP, may "raise troubling questions for instructors about informed consent, pedagogical privacy, and quantified metrics."

    Read more →
  • Virtual assistant

    Virtual assistant

    A virtual assistant (VA) is a software agent that can perform a range of tasks or services for a user based on user input, such as commands or questions, including verbal ones. Such technologies often incorporate chatbot capabilities to streamline task execution. The interaction may be via text, graphical interface, or voice, as some virtual assistants are able to interpret human speech and respond via synthesized voices. In many cases, users can ask their virtual assistants questions, control home automation devices and media playback, and manage other basic tasks such as email, to-do lists, and calendars – all with verbal commands. In recent years, prominent virtual assistants for direct consumer use have included Apple Siri, Amazon Alexa, Google Assistant (Gemini), Microsoft Copilot and Samsung Bixby. Also, companies in various industries often incorporate some kind of virtual assistant technology into their customer service or support. Into the 2020s, the emergence of artificial intelligence based chatbots, such as ChatGPT, has brought increased capability and interest to the field of virtual assistant products and services. == History == === Experimental decades: 1910s–1980s === Radio Rex was the first voice-activated toy, patented in 1916 and released in 1922. It was a wooden toy in the shape of a dog that would come out of its house when its name is called. In 1952, Bell Labs presented "Audrey", the Automatic Digit Recognition machine. It occupied a six-foot-high relay rack, consumed substantial power, had streams of cables and exhibited the myriad maintenance problems associated with complex vacuum-tube circuitry. It could recognize the fundamental units of speech, phonemes. It was limited to the accurate recognition of digits spoken by designated talkers. It could therefore be used for voice dialing, but in most cases, push-button dialing was cheaper and faster, rather than speaking the consecutive digits. Another early tool which was enabled to perform digital speech recognition was the IBM Shoebox voice-activated calculator, presented to the general public during the 1962 Seattle World's Fair after its initial market launch in 1961. This early computer, developed almost 20 years before the introduction of the first IBM Personal Computer in 1981, was able to recognize 16 spoken words and the digits 0 to 9. The first natural language processing computer program or the chatbot ELIZA was developed by MIT professor Joseph Weizenbaum in the 1960s. It was created to "demonstrate that the communication between man and machine was superficial". ELIZA used pattern matching and substitution methodology into scripted responses to simulate conversation, which gave an illusion of understanding on the part of the program. Weizenbaum's own secretary reportedly asked Weizenbaum to leave the room so that she and ELIZA could have a real conversation. Weizenbaum was surprised by this, later writing: "I had not realized ... that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people. This gave name to the ELIZA effect, the tendency to unconsciously assume computer behaviors are analogous to human behaviors; that is, anthropomorphisation, a phenomenon present in human interactions with virtual assistants. The next milestone in the development of voice recognition technology was achieved in the 1970s at the Carnegie Mellon University in Pittsburgh, Pennsylvania with substantial support of the United States Department of Defense and its DARPA agency, funded five years of a Speech Understanding Research program, aiming to reach a minimum vocabulary of 1,000 words. Companies and academia including IBM, Carnegie Mellon University (CMU) and Stanford Research Institute took part in the program. The result was "Harpy", it mastered about 1000 words, the vocabulary of a three-year-old and it could understand sentences. It could process speech that followed pre-programmed vocabulary, pronunciation, and grammar structures to determine which sequences of words made sense together, and thus reducing speech recognition errors. In 1986, Tangora was an upgrade of the Shoebox, it was a voice recognizing typewriter. Named after the world's fastest typist at the time, it had a vocabulary of 20,000 words and used prediction to decide the most likely result based on what was said in the past. IBM's approach was based on a hidden Markov model, which adds statistics to digital signal processing techniques. The method makes it possible to predict the most likely phonemes to follow a given phoneme. Still each speaker had to individually train the typewriter to recognize their voice, and pause between each word. In 1983, Gus Searcy invented the "Butler in a Box", an electronic voice home controller system. === Birth of smart virtual assistants: 1990s–2010s === In the 1990s, digital speech recognition technology became a feature of the personal computer with IBM, Philips and Lernout & Hauspie fighting for customers. Much later the market launch of the first smartphone IBM Simon in 1994 laid the foundation for smart virtual assistants as we know them today. In 1997, Dragon's NaturallySpeaking software could recognize and transcribe natural human speech without pauses between each word into a document at a rate of 100 words per minute. A version of Naturally Speaking is still available for download and it is still used today, for instance, by many doctors in the US and the UK to document their medical records. In 2001 Colloquis publicly launched SmarterChild, on platforms like AIM and MSN Messenger. While entirely text-based SmarterChild was able to play games, check the weather, look up facts, and converse with users to an extent. The first modern digital virtual assistant installed on a smartphone was Siri, which was introduced as a feature of the iPhone 4S on 4 October 2011. Apple Inc. developed Siri following the 2010 acquisition of Siri Inc., a spin-off of SRI International, which is a research institute financed by DARPA and the United States Department of Defense. Its aim was to aid in tasks such as sending a text message, making phone calls, checking the weather or setting up an alarm. Over time, it has developed to provide restaurant recommendations, search the internet, and provide driving directions. In November 2014, Amazon announced Alexa alongside the Echo. In 2016, Salesforce debuted Einstein, developed from a set of technologies underlying the Salesforce platform. Einstein was replaced by Agentforce, an agentic AI, in September 2024. In April 2017 Amazon released a service for building conversational interfaces for any type of virtual assistant or interface. === Large Language Models: 2020s-present === In the 2020s, artificial intelligence (AI) systems like ChatGPT have gained popularity for their ability to generate human-like responses to text-based conversations. In February 2020, Microsoft introduced its Turing Natural Language Generation (T-NLG), which was then the "largest language model ever published at 17 billion parameters." On November 30, 2022, ChatGPT was launched as a prototype and quickly garnered attention for its detailed responses and articulate answers across many domains of knowledge. The advent of ChatGPT and its introduction to the wider public increased interest and competition in the space. In February 2023, Google began introducing an experimental service called "Bard" which is based on its LaMDA program to generate text responses to questions asked based on information gathered from the web. While ChatGPT and other generalized chatbots based on the latest generative AI are capable of performing various tasks associated with virtual assistants, there are also more specialized forms of such technology that are designed to target more specific situations or needs. == Method of interaction == Virtual assistants work via: Text, including: online chat (especially in an instant messaging application or other application ), SMS text, e-mail or other text-based communication channel, for example Conversica's intelligent virtual assistants for business. Voice: for example with Amazon Alexa on Amazon Echo devices, Siri on an iPhone, Google Assistant on Google-enabled Android devices, or Bixby on Samsung devices. Images: some assistants, such as Google Assistant (which includes Google Lens) and Bixby on the Samsung Galaxy series, have the added capability of performing image processing to recognize objects in images. Many virtual assistants are accessible via multiple methods, offering versatility in how users can interact with them, whether through chat, voice commands, or other integrated technologies. Virtual assistants use natural language processing (NLP) to match user text or voice input to executable commands. Some continually learn using artificial intelligence techniques including machine learning and ambient intelligence. To activate a virtual assistant u

    Read more →
  • AppValley

    AppValley

    AppValley is an independent American digital distribution service operated and trademarked by AppValley LLC. It serves as an alternative app store for the iOS mobile operating system, which allows users to download applications that are not available on the App Store, most commonly tweaked "++" apps, jailbreak apps, and apps including paid apps on the app store. == Legality == AppValley is among several services that violate enterprise developer certificates from Apple. The terms under which these are granted make clear that they are for companies who wish to distribute apps to their employees. AppValley uses these certificates to distribute software directly to non-employees, thereby bypassing the AppStore. AppValley's conduct had implications in U.S. sanctioned markets like Iran, Iraq, North Korea, Cuba, and Venezuela, which have all been subject to commercial sanctions. Among the software offered by AppValley and other services is pirated software, including paid apps on the app store and premium versions of Instagram, Spotify, Pokémon Go, and others. For instance, AppValley distributes an ad-free version of the music streaming app Spotify even on the free tier. == History == The website was founded in May 2017, releasing late that month with a very basic version of the app. There were less than 100 apps available for download at this time. On Jan 19, 2018, a new version dubbed AppValley 2.0 was released bringing dark mode, more categories, a search, and a much faster interface. On February 14, 2019, a Chinese partner "Jason Wu" allegedly took control of the main Twitter account and domain, causing the original AppValley developers to migrate to the domain app-valley.vip and the Twitter account handle @App_Valley_vip. As of September 2024, the app-valley.vip domain now redirects to appvalley.signulous.com. Today, AppValley continues to offer an alternative to Apple's App Store where app developers can publish their applications. == Features == AppValley is a mobile app installer which can also support iOS version that can be installed and downloaded on the mobile or the devices of the people who wish to get access to many different applications available. AppValley also contains apps that have been modified or tweaked for user preferences, and allows the user to by pass national restrictions on the use of apps, without having to resort to jailbreaking. As of June 2, 2020, there are over 1300 apps available for download.

    Read more →
  • Artificial intelligence content detection

    Artificial intelligence content detection

    Artificial intelligence detection software aims to determine whether some content (text, image, video, or audio) was generated using artificial intelligence (AI). This software is often unreliable. == Accuracy issues == Many AI detection tools have been shown to be unreliable in detecting AI-generated text. In a 2023 study conducted by Weber-Wulff et al., researchers evaluated 14 detection tools including Turnitin and GPTZero and found that "all scored below 80% of accuracy and only 5 over 70%." They also found that these tools tend to have a bias for classifying texts more as human than as AI, and that accuracy of these tools worsens upon paraphrasing. === False positives === In AI content detection, a false positive is when human-written work is incorrectly flagged as AI-written. Many AI detection platforms claim to have a minimal level of false positives, with Turnitin claiming a less than 1% false positive rate. However, later research by The Washington Post produced much higher rates of 50%, though they used a smaller sample size. False positives in an academic setting frequently lead to accusations of academic misconduct, which can have serious consequences for a student's academic record. Additionally, studies have shown evidence that many AI detection models are prone to give false positives to work written by people whose first language is not English, and also to neurodivergent people. In June 2023, Janelle Shane wrote that portions of her book You Look Like a Thing and I Love You were flagged as AI-generated. === False negatives === A false negative is a failure to identify documents with AI-written text. False negatives often happen as a result of a detection software's sensitivity level or because evasive techniques were used when generating the work to make it sound more human. False negatives are less of a concern academically, since they aren't likely to lead to accusations and ramifications. Notably, Turnitin stated they have a 15% false negative rate. == Text detection == For text, this is usually done to prevent alleged plagiarism, often by detecting repetition of words as telltale signs that a text was AI-generated (including hallucinations). Detection systems may also rely on stylistic and structural regularities associated with LLM output, such as unusually consistent grammar, formulaic transitions, repeated discourse markers, and recurring rhetorical templates. Some tools are designed less to establish authorship provenance than to flag prose that resembles common LLM-generated style patterns. They are often used by teachers marking their students, usually on an ad hoc basis. Following the release of ChatGPT and similar AI text generative software, many educational establishments have issued policies against the use of AI by students. AI text detection software is also used by those assessing job applicants, as well as online search engines, hiring, online moderation and publishing. Current detectors may sometimes be unreliable and have incorrectly marked work by humans as originating from AI while failing to detect AI-generated work in other instances. MIT Technology Review said that the technology "struggled to pick up ChatGPT-generated text that had been slightly rearranged by humans and obfuscated by a paraphrasing tool". AI text detection software has also been shown to discriminate against non-native speakers of English. Two students from the University of California, Davis, were referred to the university's Office of Student Success and Judicial Affairs (OSSJA) after their professors scanned their essays with positive results; the first with an AI detector called GPTZero, and the second with an AI detector integration in Turnitin. However, following media coverage, and a thorough investigation, the students were cleared of any wrongdoing. In April 2023, Cambridge University and other members of the Russell Group of universities in the United Kingdom opted out of Turnitin's AI text detection tool, after expressing concerns it was unreliable. The University of Texas at Austin opted out of the system six months later. In May 2023, a professor at Texas A&M University–Commerce used ChatGPT to detect whether his students' content was written by it, which ChatGPT said was the case. As such, he threatened to fail the class despite ChatGPT not being able to detect AI-generated writing. No students were prevented from graduating because of the issue, and all but one student (who admitted to using the software) were exonerated from accusations of having used ChatGPT in their content. In July 2023, a paper titled "GPT detectors are biased against non-native English writers" was released, reporting that GPTs discriminate against non-native English authors. The paper compared seven GPT detectors against essays from both non-native English speakers and essays from United States students. The essays from non-native English speakers had an average false positive rate of 61.3%. An article by Thomas Germain, published on Gizmodo in June 2024, reported job losses among freelance writers and journalists due to AI text detection software mistakenly classifying their work as AI-generated. In September 2024, Common Sense Media reported that generative AI detectors had a 20% false positive rate for Black students, compared to 10% of Latino students and 7% of White students. To improve the reliability of AI text detection, researchers have explored digital watermarking techniques. A 2023 paper titled "A Watermark for Large Language Models" presents a method to embed imperceptible watermarks into text generated by large language models (LLMs). This watermarking approach allows content to be flagged as AI-generated with a high level of accuracy, even when text is slightly paraphrased or modified. The technique is designed to be subtle and hard to detect for casual readers, thereby preserving readability, while providing a detectable signal for those employing specialized tools. However, while promising, watermarking faces challenges in remaining robust under adversarial transformations and ensuring compatibility across different LLMs. == Anti text detection == There is software available designed to bypass AI text detection. In practice, evasion may not require specialized bypass tools. Paraphrasing, style editing, and removal of repeated discourse markers can substantially reduce the effectiveness of detectors that rely on recognizable surface patterns. A study published in August 2023 analyzed 20 abstracts from papers published in the Eye Journal, which were then paraphrased using GPT-4.0. The AI-paraphrased abstracts were examined for plagiarism using QueText and for AI-generated content using Originality.AI. The texts were then re-processed through an adversarial software called Undetectable.ai in order to reduce the AI-detection scores. The study found that the AI detection tool, Originality.AI, identified text generated by GPT-4 with a mean accuracy of 91.3%. However, after reprocessing by Undetectable.ai, the detection accuracy of Originality.ai dropped to a mean accuracy of 27.8%. Some experts also believe that techniques like digital watermarking are ineffective because they can be removed or added to trigger false positives. "A Watermark for Large Language Models" paper by Kirchenbauer et al. (2023) also addresses potential vulnerabilities of watermarking techniques. The authors outline a range of adversarial tactics, including text insertion, deletion, and substitution attacks, that could be used to bypass watermark detection. These attacks vary in complexity, from simple paraphrasing to more sophisticated approaches involving tokenization and homoglyph alterations. The study highlights the challenge of maintaining watermark robustness against attackers who may employ automated paraphrasing tools or even specific language model replacements to alter text spans iteratively while retaining semantic similarity. Experimental results show that although such attacks can degrade watermark strength, they also come at the cost of text quality and increased computational resources. == Image, video, and audio detection == Several purported AI image detection software exist, to detect AI-generated images (for example, those originating from Midjourney or DALL-E). They are not completely reliable. Industry analyses have also noted that AI-driven image recognition systems often struggle in real-world environments, where inconsistent lighting, noise and variable visual inputs reduce detection reliability, a challenge highlighted in modern agricultural quality-control research. Others claim to identify video and audio deepfakes, but this technology is also not fully reliable yet either. Despite debate around the efficacy of watermarking, Google DeepMind is actively developing a detection software called SynthID, which works by inserting a digital watermark that is invisible to the human eye into the pixels of an image.

    Read more →
  • MyPertamina

    MyPertamina

    MyPertamina is a digital financial service platform from Pertamina that integrated with the apps LinkAja. This application is used for non-cash fuel oil payments at Pertamina's public fueling stations. == History == Originally, MyPertamina were merchandise outlets of Pertamina products. It was launched on December 21, 2016, with 3 outlets in Jakarta. MyPertamina sells clothes, hats, and other products with Pertamina products brands. One month later (January 2017), Pertamina and Bank Mandiri entered into a partnership to launch the Mandiri Credit Card Pertamina Mastercard product, so that consumers can make payments when users fill up fuel at Pertamina gas stations. In August 2017, MyPertamina app and electronic card were launched through MyPertamina Loyalty program at Gaikindo Indonesia International Auto Show 2017. The card can be used on EDC machines for non-cash payments. Initial balances are in its own app, that can be top up by ATMs and online banking.

    Read more →
  • Cloud-to-cloud integration

    Cloud-to-cloud integration

    Cloud-to-Cloud Integration ( C2I ) allows users to connect disparate cloud computing platforms. While Paas (Platform as a service) and Saas (Software as a service) continue to gain momentum, different vendors have different implementations for cloud computing, e.g. Database, REST, SOAP API. Another name for Cloud-to-Cloud Integration is Cloud-Surfing. See also Cloud-based integration

    Read more →
  • Stochastic parrot

    Stochastic parrot

    In machine learning, the term stochastic parrot is a metaphor that frames large language models as systems that statistically mimic text without real understanding. The word "stochastic" – from the ancient Greek "στοχαστικός" (stokhastikos, 'based on guesswork') – is a term from probability theory meaning "randomly determined". The word "parrot" refers to parrots' ability to mimic human speech. The term was introduced in a 2021 paper on AI ethics titled "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜" and authored by Timnit Gebru, Emily M. Bender, Angelina McMillan-Major, and Margaret Mitchell. The paper outlined possible risks associated with large language models (LLMs). In December 2020, it was the subject of a workplace dispute between Gebru (then co-leader of Google's Ethical Artificial Intelligence Team) and Google, which had requested the retraction of the paper. The incident culminated in Gebru's controversial departure from the company. The paper was later presented at the 2021 ACM Conference, and the term "stochastic parrot" has seen widespread use in academic research concerning generative AI and LLMs. The term has been interpreted negatively as an insult towards AI. == Background == Timnit Gebru is an AI ethics researcher, Emily M. Bender is a linguist specializing in computational linguistics, and Margaret Mitchell is a computer scientist specializing in algorithmic bias. Gebru had joined Google in 2018, where she co-led a team on the ethics of artificial intelligence with Mitchell. In late 2020, the paper "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜" was co-written by Gebru and five other researchers, four of whom were Google employees. The paper argues that large language models (LLMs) present significant risks such as environmental and financial costs, inscrutability leading to unknown dangerous biases, and potential for deception as LLMs do not understand the concepts underlying what they learn. The paper states that LLMs are "stitching together sequences of linguistic forms ... observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning." Therefore, they are labeled "stochastic parrots". === Dismissal of Gebru by Google === After the paper was submitted for consideration to the 2021 ACM Conference, Google requested that Gebru either retract the paper from the conference or remove the names of Google employees from it. Gebru refused to do so without further discussion, and emailed Google Research vice president Megan Kacholia that if the company could not explain the request for retraction and address other concerns regarding similar projects, she would plan to resign after a transition period, stating that they could "work on a last date". The following day, on December 2, 2020, Gebru received an email saying that Google was "accepting her resignation". Her abrupt firing sparked protests by Google employees and negative publicity for the company. == Usage == The phrase has been used by AI skeptics to signify that LLMs lack understanding of the meaning of their outputs. Sam Altman, CEO of OpenAI, used the term shortly after the release of ChatGPT in December 2022, tweeting "i am a stochastic parrot, and so r u". The term was nominated as the 2023 AI-related Word of the Year by the American Dialect Society. == Debate == Some LLMs, such as ChatGPT, have become capable of interacting with users in convincingly human-like conversations. The development of these new systems has deepened the discussion of the extent to which LLMs understand or are simply "parroting". According to machine learning researchers Lindholm, Wahlström, Lindsten, and Schön, the term "stochastic parrot" highlights two vital limitations of LLMs: LLMs are limited by the data they are trained on and are simply stochastically repeating contents of datasets. Because they are just making up outputs based on training data, LLMs do not understand if they are saying something incorrect or inappropriate. Lindholm et al. noted that, with poor quality datasets and other limitations, a learning machine might produce results that are "dangerously wrong". === Subjective experience === In the mind of a human being, words and language correspond to things one has experienced. For LLMs, according to proponents of the theory, words correspond only to other words and patterns of usage fed into their training data. Proponents of the idea of stochastic parrots thus conclude that statements about LLMs are due to "the human tendency to attribute meaning to text", and claim this occurs despite the LLMs not actually understanding language. === Fine-tuning === Kelsey Piper argued that the claim that LLMs are stochastic parrots or mere "next-token predictors" focuses on pre-training, ignoring that modern LLMs are also fine-tuned to follow instructions and to prefer accurate answers. === Hallucinations and mistakes === The tendency of LLMs to pass off false information as fact is held as support. Called hallucinations or confabulations, LLMs will occasionally synthesize information that matches some pattern. LLMs may fail to distinguish fact and fiction, which leads to the claim that they can't connect words to a comprehension of the world, as humans do. Furthermore, LLMs may fail to decipher complex or ambiguous grammar cases that rely on understanding the meaning of language. For example: The wet newspaper that fell down off the table is my favorite newspaper. But now that my favorite newspaper fired the editor I might not like reading it anymore. Can I replace 'my favorite newspaper' by 'the wet newspaper that fell down off the table' in the second sentence? GPT-4, an LLM released in March 2023, responded yes, not understanding that the meaning of "newspaper" is different in these two contexts; it is first an object and second an institution. === Benchmarks and experiments === One argument against the hypothesis that LLMs are stochastic parrot is their results on benchmarks for reasoning, common sense and language understanding. In 2023, some LLMs have shown good results on many language understanding tests, such as the Super General Language Understanding Evaluation (SuperGLUE). GPT-4 scored in the >90th-percentile on the Uniform Bar Examination and achieved 93% accuracy on the MATH benchmark of high-school Olympiad problems, results that exceed rote pattern-matching expectations. Such tests, and the smoothness of many LLM responses, help as many as 51% of AI professionals believe they can truly understand language with enough data, according to a 2022 survey. === Expert rebuttals === Some AI researchers dispute the notion that LLMs merely "parrot" their training data. Geoffrey Hinton, a pioneering figure in neural networks, counters that the metaphor misunderstands the prerequisite for accurate language prediction. He argues that "to predict the next word accurately, you have to understand the sentence", a view he presented on 60 Minutes in 2023. From this perspective, understanding is not an alternative to statistical prediction, but rather an emergent property required to perform it effectively at scale. Hinton also uses logical puzzles to demonstrate that LLMs actually understand language. A 2024 Scientific American investigation described a closed Berkeley workshop where state-of-the-art models solved novel tier-4 mathematics problems and produced coherent proofs, indicating reasoning abilities beyond memorization. The GPT-4 Technical Report showed human-level results on professional and academic exams (e.g., the Uniform Bar Exam and USMLE), challenging the "parrot" characterization. Anthropic conducted mechanistic interpretability research on Claude, using attribution graphs to identify circuits. The research showed how the LLM processes information via chains of fuzzy logical inference, and indicated an ability to plan ahead. They found that Claude 3.5 Haiku "employs remarkably general abstractions", forms "internally generated plans for its future outputs" and "works backwards from its longer-term goals". They noted that "The mechanisms of the model can apparently only be faithfully described using an overwhelmingly large causal graph." They also found that the model includes "mechanisms that could underlie a simple form of metacognition", in that it "thinks about" the level of its own knowledge before reaching its answer. === Interpretability === Another line of evidence against the 'stochastic parrot' claim comes from mechanistic interpretability, a research field dedicated to reverse-engineering LLMs to understand their internal workings. Rather than only observing the model's input-output behavior, these techniques probe the model's internal activations, which can be used to determine if they contain structured representations of the world. The goal is to investigate whether LLMs are merely manipulating surface statistics or if t

    Read more →
  • Ayoba

    Ayoba

    Ayoba is an African communication platform developed in South Africa. It is owned by Progressive Tech Holdings in Mauritius and managed by SIMFY Africa. Launched on May 4, 2019, as of April 2024, it has over 35 million active users. == History == Ayoba was first published on Google Play in February 2019. Its first marketing campaign and brand launch took place in Cameroon on May 4, 2019. In June 2019, the platform introduced its first eight channels. In November 2019, the platform reached one million active users, which increased to two million by June 2020. Subsequently, ayoba expanded its services, including the launch of games for Android in February 2020, Momo (Mobile Money) in Cameroon in May 2020, and MicroApps in May 2020. It also launched music and voice and video calling features in 12 territories in August 2020. The first version of ayoba for iOS was released in September 2020. In December of the same year, games and Messaging 2.0 were launched on the platform. In November 2020, it won Best Mobile Application at the African Digital Awards. In 2021, it won OTT Brand of the Year at the Marketing World Awards in Ghana. In December 2022, it received Top Innovative Technology and Telecom Product of the Year at the National Communications Awards in December 2022. In June 2023 ayoba partnered with BoomPlay and as of April 2024, it had 35 million monthly active users. Ayoba has partnered with Jumia Ghana to offer exclusive deals to users. Ayoba users can get a 10% discount on selected Jumia purchases through the app, with no data charges for MTN users. This partnership aims to make online shopping more affordable and accessible by integrating Jumia's offers into the ayoba app. Ayoba supports over 35 million users across Africa and provides services in 22 languages. To access the deals, users can download the ayoba app from the Google Play Store, iOS Store, or the official website. == Platform features == Chat, Call and Share: ayoba enables instant messaging, voice notes, picture sharing, and file sharing with contacts, even if they do not have the app installed. The app supports voice and video calls on both Android and iOS, as well as group chats, help channel and SMS continuity (non ayoba users receive messages as SMS, their responses appear in the ayoba app). Music: ayoba offers a free music player with daily updates on international and African music. Users can find playlists for different genres. Games: ayoba provides a selection of interactive games, including action, adventure, and children's games available on both Android and iOS. Mobile Money Transfers: In certain territories, ayoba supports mobile money transfers using MTN Mobile Money (MoMo) for transactions within the app. MicroApps: ayoba features individual MicroApps within the platform that offer content and services, including streaming channels, podcasts, and specialized apps. The availability of these apps may vary by country. == Operations == ayoba primarily focuses on the following territories: Nigeria, Cameroon, South Africa, Ghana, Côte d'Ivoire, Uganda, Republic of Congo, Benin, Zambia, Tanzania, Kenya, Senegal, Togo, Guinea Bissau, Guinea Conakry, Sudan, South Sudan, and Liberia. The company operates from its offices in Cape Town and Johannesburg, South Africa. David Gillaranz served as the CEO from 2019 to 2021, and Burak Akinci has been the CEO since 2021.

    Read more →
  • Kleene star

    Kleene star

    In formal language theory, the Kleene star (or Kleene operator or Kleene closure) refers to two related unary operations, that can be applied either to an alphabet of symbols or to a formal language, a set of strings (finite sequences of symbols). The Kleene star operator on an alphabet V generates the set V of all finite-length strings over V, that is, finite sequences whose elements belong to V; in mathematics, it is more commonly known as the free monoid construction. The Kleene star operator on a language L generates another language L, the set of all strings that can be obtained as a concatenation of zero or more members of L. In both cases, repetitions are allowed. The Kleene star operators are named after American mathematician Stephen Cole Kleene, who first introduced and widely used it to characterize automata for regular expressions. == Of an alphabet == Given an alphabet V {\displaystyle V} , define V 0 = { ε } {\displaystyle V^{0}=\{\varepsilon \}} (the set consists only of the empty string), V 1 = V , {\displaystyle V^{1}=V,} and define recursively the set V i + 1 = { w v : w ∈ V i and v ∈ V } {\displaystyle V^{i+1}=\{wv:w\in V^{i}{\text{ and }}v\in V\}} for each i > 0 , {\displaystyle i>0,} where w v {\displaystyle wv} denotes the string obtained by appending the single character v {\displaystyle v} to the end of w {\displaystyle w} . Here, V i {\displaystyle V^{i}} can be understood to be the set of all strings of length exactly i {\displaystyle i} , with characters from V {\displaystyle V} . The definition of Kleene star on V {\displaystyle V} is V ∗ = ⋃ i ≥ 0 V i = V 0 ∪ V 1 ∪ V 2 ∪ V 3 ∪ V 4 ∪ ⋯ . {\displaystyle V^{}=\bigcup _{i\geq 0}V^{i}=V^{0}\cup V^{1}\cup V^{2}\cup V^{3}\cup V^{4}\cup \cdots .} == Of a language == Given a language L {\displaystyle L} (any finite or infinite set of strings), define L 0 = { ε } {\displaystyle L^{0}=\{\varepsilon \}} (the language consisting only of the empty string), L 1 = L , {\displaystyle L^{1}=L,} and define recursively the set L i + 1 = { w v : w ∈ L i and v ∈ L } {\displaystyle L^{i+1}=\{wv:w\in L^{i}{\text{ and }}v\in L\}} for each i > 0 , {\displaystyle i>0,} where w v {\displaystyle wv} denotes the string obtained by concatenating w {\displaystyle w} and v {\displaystyle v} . Here, L i {\displaystyle L^{i}} can be understood to be the set of all strings that can be obtained by concatenating exactly i {\displaystyle i} strings from L {\displaystyle L} , allowing repetitions. The definition of Kleene star on L {\displaystyle L} is L ∗ = ⋃ i ≥ 0 L i = L 0 ∪ L 1 ∪ L 2 ∪ L 3 ∪ L 4 ∪ ⋯ . {\displaystyle L^{}=\bigcup _{i\geq 0}L^{i}=L^{0}\cup L^{1}\cup L^{2}\cup L^{3}\cup L^{4}\cup \cdots .} == Kleene plus == In some formal language studies, (e.g. AFL theory) a variation on the Kleene star operation called the Kleene plus is used. The Kleene plus omits the V 0 {\displaystyle V^{0}} or L 0 {\displaystyle L^{0}} term in the above unions. In other words, the Kleene plus on V {\displaystyle V} is V + = ⋃ i ≥ 1 V i = V 1 ∪ V 2 ∪ V 3 ∪ ⋯ , {\displaystyle V^{+}=\bigcup _{i\geq 1}V^{i}=V^{1}\cup V^{2}\cup V^{3}\cup \cdots ,} or V + = V ∗ V . {\displaystyle V^{+}=V^{}V.} == Examples == Example of Kleene star applied to a set of strings: {"ab","c"} = { ε, "ab", "c", "abab", "abc", "cab", "cc", "ababab", "ababc", "abcab", "abcc", "cabab", "cabc", "ccab", "ccc", ...}. Example of Kleene star applied to a set of strings without the prefix property: {"a","ab","b"} = { ε, "a", "ab", "b", "aa", "aab", "aba", "abab", "abb", "ba", "bab", "bb", ...};In this example, the string "aab" can be obtained in two different ways. The Sardinas-Patterson algorithm can be used to check for a given V whether any member of V can be obtained in more than one way. Example of Kleene and Kleene plus applied to a set of characters (following the C programming language convention where a character is denoted by single quotes and a string is denoted by double quotes): {'a', 'b', 'c'} = { ε, "a", "b", "c", "aa", "ab", "ac", "ba", "bb", "bc", "ca", "cb", "cc", "aaa", "aab", ...}. {'a', 'b', 'c'}+ = { "a", "b", "c", "aa", "ab", "ac", "ba", "bb", "bc", "ca", "cb", "cc", "aaa", "aab", ...}. == Properties == If V {\displaystyle V} is any finite or countably infinite set of characters, then V ∗ {\displaystyle V^{}} is a countably infinite set. As a result, each formal language over a finite or countably infinite alphabet Σ {\displaystyle \Sigma } is countable, since it is a subset of the countably infinite set Σ ∗ {\displaystyle \Sigma ^{}} . ( L ∗ ) ∗ = L ∗ {\displaystyle (L^{})^{}=L^{}} , which means that the Kleene star operator is an idempotent unary operator, as ( L ∗ ) i = L ∗ {\displaystyle (L^{})^{i}=L^{}} for every i ≥ 1 {\displaystyle i\geq 1} . V ∗ = { ε } {\displaystyle V^{}=\{\varepsilon \}} , if V {\displaystyle V} is the empty set ∅. For the version of the Kleene star operator on languages, L ∗ = { ε } {\displaystyle L^{}=\{\varepsilon \}} when L {\displaystyle L} is either the empty set ∅ or the singleton set { ε } {\displaystyle \{\varepsilon \}} . == Generalization == Strings form a monoid with concatenation as the binary operation and ε the identity element. In addition to strings, the Kleene star is defined for any monoid. More precisely, let (M, ⋅) be a monoid, and S ⊆ M. Then S is the smallest submonoid of M containing S; that is, S contains the neutral element of M, the set S, and is such that if x,y ∈ S, then x⋅y ∈ S. Furthermore, the Kleene star is generalized by including the -operation (and the union) in the algebraic structure itself by the notion of complete star semiring.

    Read more →
  • SGT STAR

    SGT STAR

    SGT STAR, also known as Sgt. Star or Sergeant Star, was a chatbot operated by the United States Army to answer questions about recruitment. == Background == After the September 11 attacks, traffic increased significantly to chatrooms on the U.S. Army's website, goarmy.com, increasing costs of staffing the live chatrooms. As a cost-cutting measure, the SGT STAR project was initiated as a partnership between the United States Army Accessions Command and Spectre AI, a wholly owned subsidiary of Next IT. Next IT, a Spokane, Washington-based company deploys "intelligent virtual assistants," using its software dubbed "ActiveAgent" which is a framework for functional presence engines. Testing began in 2003, and SGT STAR launched to the public in 2006. "STAR" is an acronym for "strong, trained and ready." SGT STAR was launched as a chat interface on goarmy.com, but has since been developed as a mobile application, as well as a life-size animated projection that has appeared live at public events. SGT STAR can also interact with users on Facebook. == FOIA request == In 2013, the Electronic Frontier Foundation filed a Freedom of Information Act request to learn more about SGT STAR, including input and output patterns (questions and answers), usage statistics, contracts, and privacy policies. They received these records in April 2014, after coverage from various media outlets and a tongue-in-cheek campaign to "Free Sgt. Star."

    Read more →
  • Cloem

    Cloem

    Cloem is a company based in Cannes, France, which applies natural language processing (NLP) technologies to assist patent applicants in creating variants of patent claims, called "cloems". According to the company, these "computer-generated claims can be published to keep potential competitors from attempting to file adjacent patent claims." == Technology == According to Cloem, dictionaries, ontologies and proprietary claim-drafting algorithms are used to draft alternative claims based on a client's original set of claims. In particular, the original set of claims is subject to various permutations and linguistic manipulations "by considering alternative definitions for terms as well as “synonyms, hyponyms, hyperonyms, meronyms, holonyms, and antonyms.”" == Possible uses == Cloem can optionally publish one or more created texts, as electronic publications or as paper-printed publications. These can potentially serve – through a defensive publication – as prior art to prevent another party for obtaining a patent on the subject-matter at stake. In other words, after an initial patent filing, an "improvement" patent (adjacent invention) can be applied for by another party, such as a competitor. By publishing variants of a patent claim, the risk of adverse patenting may potentially be decreased (improvement inventions may no longer be patentable). Cloems may also be potentially patentable. One of the issues of patentability, however, is that only a natural person can be a listed as an inventor on a patent. Since cloems are produced by a computer based on a person's input, it is not clear if the computer or the person is the inventor. The inventorship of Cloem texts is an open question.

    Read more →
  • LaMDA

    LaMDA

    LaMDA (Language Model for Dialogue Applications) is a family of conversational large language models developed by Google. Originally developed and introduced as Meena in 2020, the first-generation LaMDA was announced during the 2021 Google I/O keynote, while the second generation was announced the following year. In June 2022, LaMDA gained widespread attention when Google engineer Blake Lemoine made claims that the chatbot had become sentient. The scientific community has largely rejected Lemoine's claims, though it has led to conversations about the efficacy of the Turing test, which measures whether a computer can pass for a human. In February 2023, Google announced Gemini (then Bard), a conversational artificial intelligence chatbot powered by LaMDA, to counter the rise of OpenAI's ChatGPT. == History == === Background === On January 28, 2020, Google unveiled Meena, a neural network-powered chatbot with 2.6 billion parameters, which Google claimed to be superior to all other existing chatbots. The company previously hired computer scientist Ray Kurzweil in 2012 to develop multiple chatbots for the company, including one named Danielle. The Google Brain research team, who developed Meena, hoped to release the chatbot to the public in a limited capacity, but corporate executives refused on the grounds that Meena violated Google's "AI principles around safety and fairness". Meena was later renamed LaMDA as its data and computing power increased, and the Google Brain team again sought to deploy the software to the Google Assistant, the company's virtual assistant software, in addition to opening it up to a public demo. Both requests were once again denied by company leadership. LaMDA's two lead researchers, Daniel de Freitas and Noam Shazeer, eventually left the company in frustration. === First generation === Google announced the LaMDA conversational large language model during the Google I/O keynote on May 18, 2021, powered by artificial intelligence. The acronym stands for "Language Model for Dialogue Applications". Built on the seq2seq architecture, transformer-based neural networks developed by Google Research in 2017, LaMDA was trained on human dialogue and stories, allowing it to engage in open-ended conversations. Google states that responses generated by LaMDA have been ensured to be "sensible, interesting, and specific to the context". LaMDA has access to multiple symbolic text processing systems, including a database, a real-time clock and calendar, a mathematical calculator, and a natural language translation system, giving it superior accuracy in tasks supported by those systems, and making it among the first dual process chatbots. LaMDA is also not stateless because its "sensibleness" metric is fine-tuned by "pre-conditioning" each dialog turn by prepending many of the most recent dialog interactions, on a user-by-user basis. LaMDA is tuned on nine unique performance metrics: sensibleness, specificity, interestingness, safety, groundedness, informativeness, citation accuracy, helpfulness, and role consistency. Tests by Google indicated that LaMDA surpassed human responses in the area of interestingness. The pre-training dataset consists of 2.97B documents, 1.12B dialogs, and 13.39B utterances, for a total of 1.56T words. The largest LaMDA model has 137B non-embedding parameters. === Second generation === On May 11, 2022, Google unveiled LaMDA 2, the successor to LaMDA, during the 2022 Google I/O keynote. The new incarnation of the model draws examples of text from numerous sources, using it to formulate unique "natural conversations" on topics that it may not have been trained to respond to. === Sentience claims === On June 11, 2022, The Washington Post reported that Google engineer Blake Lemoine had been placed on paid administrative leave after Lemoine told company executives Blaise Agüera y Arcas and Jen Gennai that LaMDA had become sentient. Lemoine came to this conclusion after the chatbot made questionable responses to questions regarding self-identity, moral values, religion, and Isaac Asimov's Three Laws of Robotics. Google refuted these claims, insisting that there was substantial evidence to indicate that LaMDA was not sentient. In an interview with Wired, Lemoine reiterated his claims that LaMDA was "a person" as dictated by the Thirteenth Amendment to the U.S. Constitution, comparing it to an "alien intelligence of terrestrial origin". He further revealed that he had been dismissed by Google after he hired an attorney on LaMDA's behalf after the chatbot requested that Lemoine do so. On July 22, Google fired Lemoine, asserting that Blake had violated their policies "to safeguard product information" and rejected his claims as "wholly unfounded". Internal controversy instigated by the incident prompted Google executives to decide against releasing LaMDA to the public, which it had previously been considering. Lemoine's claims were widely pushed back by the scientific community. Many experts rejected the idea that LaMDA was sentient, including former New York University psychology professor Gary Marcus, David Pfau of Google sister company DeepMind, Erik Brynjolfsson of the Institute for Human-Centered Artificial Intelligence at Stanford University, and University of Surrey professor Adrian Hilton. Yann LeCun, who leads Meta Platforms' AI research team, stated that neural networks such as LaMDA were "not powerful enough to attain true intelligence". University of California, Santa Cruz professor Max Kreminski noted that LaMDA's architecture did not "support some key capabilities of human-like consciousness" and that its neural network weights were "frozen", assuming it was a typical large language model. Philosopher Nick Bostrom noted, however, that the lack of precise and consensual criteria for determining whether a system is conscious warrants some uncertainty. IBM Watson lead developer David Ferrucci compared how LaMDA appeared to be human in the same way Watson did when it was first introduced. Former Google AI ethicist Timnit Gebru called Lemoine a victim of a "hype cycle" initiated by researchers and the media. Lemoine's claims have also generated discussion on whether the Turing test remained useful to determine researchers' progress toward achieving artificial general intelligence, with Will Omerus of the Post opining that the test actually measured whether machine intelligence systems were capable of deceiving humans, while Brian Christian of The Atlantic said that the controversy was an instance of the ELIZA effect. == Products == === AI Test Kitchen === With the unveiling of LaMDA 2 in May 2022, Google also launched the AI Test Kitchen, a mobile application for the Android operating system powered by LaMDA capable of providing lists of suggestions on-demand based on a complex goal. Originally open only to Google employees, the app was set to be made available to "select academics, researchers, and policymakers" by invitation sometime in the year. In August, the company began allowing users in the U.S. to sign up for early access. In November, Google released a "season 2" update to the app, integrating a limited form of Google Brain's Imagen text-to-image model. A third iteration of the AI Test Kitchen was in development by January 2023, expected to launch at I/O later that year. Following the 2023 I/O keynote in May, Google added MusicLM, an AI-powered music generator first previewed in January, to the AI Test Kitchen app. In August, the app was delisted from Google Play and the Apple App Store, instead moving completely online. === Bard === On February 6, 2023, Google announced Bard, a conversational AI chatbot powered by LaMDA, in response to the unexpected popularity of OpenAI's ChatGPT chatbot. Google positions the chatbot as a "collaborative AI service" rather than a search engine. Bard became available for early access on March 21. === Other products === In addition to Bard, Pichai also unveiled the company's Generative Language API, an application programming interface also based on LaMDA, which he announced would be opened up to third-party developers in March 2023. == Architecture == LaMDA is a decoder-only Transformer language model. It is pre-trained on a text corpus that includes both documents and dialogs consisting of 1.56 trillion words, and is then trained with fine-tuning data generated by manually annotated responses for "sensibleness, interestingness, and safety". LaMDA was retrieval-augmented to improve the accuracy of facts provided to the user. Three different models were tested, with the largest having 137 billion non-embedding parameters:

    Read more →