AI Assistant Editor

AI Assistant Editor — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

Empirical dynamic modeling

Empirical dynamic modeling (EDM) is a framework for analysis and prediction of nonlinear dynamical systems. Applications include population dynamics, ecosystem service, medicine, neuroscience, dynamical systems, geophysics, and human-computer interaction. EDM was originally developed by Robert May and George Sugihara. It can be considered a methodology for data modeling, predictive analytics, dynamical system analysis, machine learning and time series analysis. == Description == Mathematical models have tremendous power to describe observations of real-world systems. They are routinely used to test hypothesis, explain mechanisms and predict future outcomes. However, real-world systems are often nonlinear and multidimensional, in some instances rendering explicit equation-based modeling problematic. Empirical models, which infer patterns and associations from the data instead of using hypothesized equations, represent a natural and flexible framework for modeling complex dynamics. Donald DeAngelis and Simeon Yurek illustrated that canonical statistical models are ill-posed when applied to nonlinear dynamical systems. A hallmark of nonlinear dynamics is state-dependence: system states are related to previous states governing transition from one state to another. EDM operates in this space, the multidimensional state-space of system dynamics rather than on one-dimensional observational time series. EDM does not presume relationships among states, for example, a functional dependence, but projects future states from localised, neighboring states. EDM is thus a state-space, nearest-neighbors paradigm where system dynamics are inferred from states derived from observational time series. This provides a model-free representation of the system naturally encompassing nonlinear dynamics. A cornerstone of EDM is recognition that time series observed from a dynamical system can be transformed into higher-dimensional state-spaces by time-delay embedding with Takens's theorem. The state-space models are evaluated based on in-sample fidelity to observations, conventionally with Pearson correlation between predictions and observations. == Methods == Primary EDM algorithms include Simplex projection, Sequential locally weighted global linear maps (S-Map) projection, Multivariate embedding in Simplex or S-Map, Convergent cross mapping (CCM), and Multiview Embeding, described below. Nearest neighbors are found according to: NN ( y , X , k ) = ‖ X N i E − y ‖ ≤ ‖ X N j E − y ‖ if 1 ≤ i ≤ j ≤ k {\displaystyle {\text{NN}}(y,X,k)=\|X_{N_{i}}^{E}-y\|\leq \|X_{N_{j}}^{E}-y\|{\text{ if }}1\leq i\leq j\leq k} === Simplex === Simplex projection is a nearest neighbor projection. It locates the k {\displaystyle k} nearest neighbors to the location in the state-space from which a prediction is desired. To minimize the number of free parameters k {\displaystyle k} is typically set to E + 1 {\displaystyle E+1} defining an E + 1 {\displaystyle E+1} dimensional simplex in the state-space. The prediction is computed as the average of the weighted phase-space simplex projected T p {\displaystyle Tp} points ahead. Each neighbor is weighted proportional to their distance to the projection origin vector in the state-space. Find k {\displaystyle k} nearest neighbor: N k ← NN ( y , X , k ) {\displaystyle N_{k}\gets {\text{NN}}(y,X,k)} Define the distance scale: d ← ‖ X N 1 E − y ‖ {\displaystyle d\gets \|X_{N_{1}}^{E}-y\|} Compute weights: For{ i = 1 , … , k {\displaystyle i=1,\dots ,k} } : w i ← exp ⁡ ( − ‖ X N i E − y ‖ / d ) {\displaystyle w_{i}\gets \exp(-\|X_{N_{i}}^{E}-y\|/d)} Average of state-space simplex: y ^ ← ∑ i = 1 k ( w i X N i + T p ) / ∑ i = 1 k w i {\displaystyle {\hat {y}}\gets \sum _{i=1}^{k}\left(w_{i}X_{N_{i}+T_{p}}\right)/\sum _{i=1}^{k}w_{i}} === S-Map === S-Map extends the state-space prediction in Simplex from an average of the E + 1 {\displaystyle E+1} nearest neighbors to a linear regression fit to all neighbors, but localised with an exponential decay kernel. The exponential localisation function is F ( θ ) = exp ( − θ d / D ) {\displaystyle F(\theta )={\text{exp}}(-\theta d/D)} , where d {\displaystyle d} is the neighbor distance and D {\displaystyle D} the mean distance. In this way, depending on the value of θ {\displaystyle \theta } , neighbors close to the prediction origin point have a higher weight than those further from it, such that a local linear approximation to the nonlinear system is reasonable. This localisation ability allows one to identify an optimal local scale, in-effect quantifying the degree of state dependence, and hence nonlinearity of the system. Another feature of S-Map is that for a properly fit model, the regression coefficients between variables have been shown to approximate the gradient (directional derivative) of variables along the manifold. These Jacobians represent the time-varying interaction strengths between system variables. Find k {\displaystyle k} nearest neighbor: N ← NN ( y , X , k ) {\displaystyle N\gets {\text{NN}}(y,X,k)} Sum of distances: D ← 1 k ∑ i = 1 k ‖ X N i E − y ‖ {\displaystyle D\gets {\frac {1}{k}}\sum _{i=1}^{k}\|X_{N_{i}}^{E}-y\|} Compute weights: For{ i = 1 , … , k {\displaystyle i=1,\dots ,k} } : w i ← exp ⁡ ( − θ ‖ X N i E − y ‖ / D ) {\displaystyle w_{i}\gets \exp(-\theta \|X_{N_{i}}^{E}-y\|/D)} Reweighting matrix: W ← diag ( w i ) {\displaystyle W\gets {\text{diag}}(w_{i})} Design matrix: A ← [ 1 X N 1 X N 1 − 1 … X N 1 − E + 1 1 X N 2 X N 2 − 1 … X N 2 − E + 1 ⋮ ⋮ ⋮ ⋱ ⋮ 1 X N k X N k − 1 … X N k − E + 1 ] {\displaystyle A\gets {\begin{bmatrix}1&X_{N_{1}}&X_{N_{1}-1}&\dots &X_{N_{1}-E+1}\\1&X_{N_{2}}&X_{N_{2}-1}&\dots &X_{N_{2}-E+1}\\\vdots &\vdots &\vdots &\ddots &\vdots \\1&X_{N_{k}}&X_{N_{k}-1}&\dots &X_{N_{k}-E+1}\end{bmatrix}}} Weighted design matrix: A ← W A {\displaystyle A\gets WA} Response vector at T p {\displaystyle Tp} : b ← [ X N 1 + T p X N 2 + T p ⋮ X N k + T p ] {\displaystyle b\gets {\begin{bmatrix}X_{N_{1}+T_{p}}\\X_{N_{2}+T_{p}}\\\vdots \\X_{N_{k}+T_{p}}\end{bmatrix}}} Weighted response vector: b ← W b {\displaystyle b\gets Wb} Least squares solution (SVD): c ^ ← argmin c ‖ A c − b ‖ 2 2 {\displaystyle {\hat {c}}\gets {\text{argmin}}_{c}\|Ac-b\|_{2}^{2}} Local linear model c ^ {\displaystyle {\hat {c}}} is prediction: y ^ ← c ^ 0 + ∑ i = 1 E c ^ i y i {\displaystyle {\hat {y}}\gets {\hat {c}}_{0}+\sum _{i=1}^{E}{\hat {c}}_{i}y_{i}} === Multivariate Embedding === Multivariate Embedding recognizes that time-delay embeddings are not the only valid state-space construction. In Simplex and S-Map one can generate a state-space from observational vectors, or time-delay embeddings of a single observational time series, or both. === Convergent Cross Mapping === Convergent cross mapping (CCM) leverages a corollary to the Generalized Takens Theorem that it should be possible to cross predict or cross map between variables observed from the same system. Suppose that in some dynamical system involving variables X {\displaystyle X} and Y {\displaystyle Y} , X {\displaystyle X} causes Y {\displaystyle Y} . Since X {\displaystyle X} and Y {\displaystyle Y} belong to the same dynamical system, their reconstructions (via embeddings) M x {\displaystyle M_{x}} , and M y {\displaystyle M_{y}} , also map to the same system. The causal variable X {\displaystyle X} leaves a signature on the affected variable Y {\displaystyle Y} , and consequently, the reconstructed states based on Y {\displaystyle Y} can be used to cross predict values of X {\displaystyle X} . CCM leverages this property to infer causality by predicting X {\displaystyle X} using the M y {\displaystyle M_{y}} library of points (or vice versa for the other direction of causality), while assessing improvements in cross map predictability as larger and larger random samplings of M y {\displaystyle M_{y}} are used. If the prediction skill of X {\displaystyle X} increases and saturates as the entire M y {\displaystyle M_{y}} is used, this provides evidence that X {\displaystyle X} is casually influencing Y {\displaystyle Y} . === Multiview Embedding === Multiview Embedding is a Dimensionality reduction technique where a large number of state-space time series vectors are combitorially assessed towards maximal model predictability. == Extensions == Extensions to EDM techniques include: Generalized Theorems for Nonlinear State Space Reconstruction Extended Convergent Cross Mapping Dynamic stability S-Map regularization Visual analytics with EDM Convergent Cross Sorting Expert system with EDM hybrid Sliding windows based on the extended convergent cross-mapping Empirical Mode Modeling Accounting for missing data and variable step sizes Accounting for observation noise Hierarchical Bayesian EDM via Gaussian processes Intelligent and Adaptive Control Optimal control via Empirical dynamic programming Multiview distance regularised S-map
Read more →
ACM SIGEVO

The ACM SIGEVO is a Special Interest Group of the Association of Computing Machinery for members of that organization who are practitioners, academics, students or others with interests in evolutionary computation and related algorithms. == History == ACM SIGEVO was founded in 2005 when the International Society for Genetic and Evolutionary Computation (ISGEC) became an ACM Special Interest Group under its present title. The ISGEC had been formed in 1999 by the merger of the Genetic Programming conference organization with the International Conference on Genetic Algorithms (ICGA) leading to the first Genetic and Evolutionary Computation Conference (GECCO). == Membership == Members of this SIG pay a small fee in addition to the ACM membership fee. In return they have access to a quarterly online newsletter, but more importantly can obtain reduced registration rates at the two conferences organised by ACM SIGEVO: GECCO and the Foundations of Genetic Algorithms conference (FOGA). They can also access material on evolutionary computation and related topics in the ACM Digital Library. In addition they can subscribe to email mailing lists in order to keep informed about news over time. For students, ACM SIGEVO sponsors Travel Awards for attendance at the GECCO Conference and FOGA (the Foundations of Genetic Algorithms conference). ACM SIGEVO also sponsors a Graduate Student Workshop. ACM also sponsors Awards to be competed for by attendees at the conferences it organises. == Conferences == ACM SIGEVO organises two major conferences in the field of evolutionary computation. The Genetic and Evolutionary Conference (GECCO) is held annually, while the Foundations of Genetic Algorithms conference (FOGA) is held biennially. === GECCO === The first GECCO conference was held prior to the formation of ACM SIGEVO but since 2005 (see History above) it has been organised annually by ACM SIGEVO. The latest (2025) was held in Málaga, Spain. The next (2026) will be held in San José, Costa Rica. === FOGA === Foundations of Genetic Algorithms (FOGA) is a biennial peer-reviewed research conference focusing on the theoretical principles underlying genetic algorithms, other evolutionary algorithms and related heuristics. It is organized by ACM SIGEVO. Its relevance to the computer science research community has been reflected in an A-rating in the CORE computer science conference assessment system. The Foundations of Genetic Algorithms (FOGA) conference originated as a workshop in 1990 in order to create an opportunity for researchers on genetic algorithms and related areas of evolutionary computation to focus on the theoretical principles underlying their field. From the start its multi-day duration made it comparable to conferences in the field, and since 2015 its proceedings have used conference rather than workshop in their titles. In 2005 ACM SIGEVO the Association for Computing Machinery Special Interest Group on Genetic and Evolutionary Computation was formed and every FOGA conference since then has been supported by SIGEVO. The table below shows FOGA conferences by year, location, websites (where available) and publisher of proceedings. A citation follows the reference to the publisher giving the full details of each FOGA proceedings. Papers accepted at recent conferences have been presented as digital or print posters in poster sessions at the conference, before being published in written form in the conference proceedings. FOGA is comparable in its multi-day duration to other conferences on evolutionary computation such as CEC, GECCO and PPSN. The main difference is that FOGA focuses on the theoretical basis of evolutionary computation and related subjects. While the above conferences devote some time to theory they also cover a wide range of other topics including competitions and applications. This focus on theoretical computer science was reflected in the CORE computer science conference assessment exercise, where FOGA was given an A-ranking in the 2023 assessment. GECCO and PPSN also obtained A-rankings, but many other conferences in the field of evolutionary computation obtained lower rankings. This suggests that FOGA is a relevant conference in its field, comparable with others including the much larger CEC or GECCO. Keynote speakers at past conferences have been: == Awards == ACM SIGEVO sponsors a number of awards. === SIGEVO Outstanding Contribution Award === The SIGEVO Outstanding Contribution Award commenced in 2023, and these awards are designed to recognise distinctive contributions to the field of evolutionary computation when evaluated over a period of at least 15 years. As a result many recipients to date are notable academics or industrial practitioners, and include Anne Auger, Kalyanmoy Deb, Stephanie Forrest, Emma Hart and Hans-Paul Schwefel. === SIGEVO Dissertation Award === The SIGEVO Dissertation Award recognises thesis research in the field of evolutionary computation completed at least by the year prior to a GECCO conference. Theses are submitted and reviewed by a panel that selects one winner and a maximum of two honourable mentions. Awards will be made to the winner and any others at the next GECCO conference. === SIGEVO Chair Award === The SIGEVO Chair Award, established in 2016 is a lecture sponsored by ACM SIGEVO, to take place on the last day of the GECCO conference. It recognizes through the lectures that the lecturers are influential researchers in the field of evolutionary computation. The more recent lectures are available online. The 2024 Award winner was Una-May O'Reilly. === SIGEVO Impact Award === The SIGEVO Impact Award looks back to the GECCO conference ten years earlier and recognizes up to three papers a year which are considered by the current ACM SIGEVO Executive Committee to have had significant impact over the period since their first publication at the GECCO conference. An example (originally published in GECCO 2010) received this award in 2020. === GECCO Best Paper Award === The ACM SIGEVO sponsors awards for the best papers presented at the GECCO conference. Because GECCO conferences have very many parallel tracks there are multiple awards recognising presentations in the different tracks. At GECCO 2025 Best Paper Awards were presented across 12 tracks. === FOGA Best Paper Award === The ACM SIGEVO sponsors awards for the best papers presented at the FOGA conference. Because FOGA operates on a single track, it is easier to compare papers. Since 2019 this Award has been made (suggesting only four awards up to the latest conference in 2025). ACM SIGEVO records the 2019 award. === Humie Award === The Humies Awards are rewards for the best form of human-competitive results using evolutionary computation or related algorithms and published in the wider literature (they do not need to be published at a conference or in a journal sponsored by ACM SIGEVO or even the ACM.) They were established through a gift from John Koza and have been in operation from 2004 to the present. The link with ACM SIGEVO is that the winners of the competition (submissions are evaluated in advance) are presented with Humie Awards at GECCO conferences. The Humie Awards website provides full details for the rules and how to submit entries to the competition. == Journals == ACM SIGEVO sponsors the main journal covering evolutionary computation published by the ACM: ACM Transactions on Evolutionary Learning and Optimization. ACM SIGEVO refers to the preceding ISGEC organisation (see History above) as sponsoring two other important journals in the field: The Evolutionary Computation journal. Genetic Programming and Evolvable Machines. While these journals continue to be important in the field, the wording on the website of ACM SIGEVO suggests that ACM SIGEVO is not involved in their publication. == References and notes ==
Read more →
Deep learning speech synthesis

Deep learning speech synthesis refers to the application of deep learning models to generate natural-sounding human speech from written text (text-to-speech) or spectrum (vocoder). Deep neural networks are trained using large amounts of recorded speech and, in the case of a text-to-speech system, the associated labels and/or input text. == Formulation == Given an input text or some sequence of linguistic units Y {\displaystyle Y} , the target speech X {\displaystyle X} can be derived by X = arg ⁡ max P ( X | Y , θ ) {\displaystyle X=\arg \max P(X|Y,\theta )} where θ {\displaystyle \theta } is the set of model parameters. Typically, the input text will first be passed to an acoustic feature generator, then the acoustic features are passed to the neural vocoder. For the acoustic feature generator, the loss function is typically L1 loss (Mean Absolute Error, MAE) or L2 loss (Mean Square Error, MSE). These loss functions impose a constraint that the output acoustic feature distributions must be Gaussian or Laplacian. In practice, since the human voice band ranges from approximately 300 to 4000 Hz, the loss function will be designed to have more penalty on this range: l o s s = α loss human + ( 1 − α ) loss other {\displaystyle loss=\alpha {\text{loss}}_{\text{human}}+(1-\alpha ){\text{loss}}_{\text{other}}} where loss human {\displaystyle {\text{loss}}_{\text{human}}} is the loss from human voice band and α {\displaystyle \alpha } is a scalar, typically around 0.5. The acoustic feature is typically a spectrogram or Mel scale. These features capture the time-frequency relation of the speech signal, and thus are sufficient to generate intelligent outputs. The Mel-frequency cepstrum feature used in the speech recognition task is not suitable for speech synthesis, as it reduces too much information. == History == In September 2016, DeepMind released WaveNet, which demonstrated that deep learning-based models are capable of modeling raw waveforms and generating speech from acoustic features like spectrograms or mel-spectrograms. Although WaveNet was initially considered to be computationally expensive and slow to be used in consumer products at the time, a year after its release, DeepMind unveiled a modified version of WaveNet known as "Parallel WaveNet," a production model 1,000 faster than the original. This was followed by Google AI's Tacotron 2 in 2018, which demonstrated that neural networks could produce highly natural speech synthesis but required substantial training data—typically tens of hours of audio—to achieve acceptable quality. Tacotron 2 used an autoencoder architecture with attention mechanisms to convert input text into mel-spectrograms, which were then converted to waveforms using a separate neural vocoder. When trained on smaller datasets, such as 2 hours of speech, the output quality degraded while still being able to maintain intelligible speech, and with just 24 minutes of training data, Tacotron 2 failed to produce intelligible speech. In 2019, Microsoft Research introduced FastSpeech, which addressed speed limitations in autoregressive models like Tacotron 2. FastSpeech utilized a non-autoregressive architecture that enabled parallel sequence generation, significantly reducing inference time while maintaining audio quality. Its feedforward transformer network with length regulation allowed for one-shot prediction of the full mel-spectrogram sequence, avoiding the sequential dependencies that bottlenecked previous approaches. The same year saw the release of HiFi-GAN, a generative adversarial network (GAN)-based vocoder that improved the efficiency of waveform generation while producing high-fidelity speech. In 2020, the release of Glow-TTS introduced a flow-based approach that allowed for fast inference and voice style transfer capabilities. In March 2020, the free text-to-speech website 15.ai was launched. 15.ai gained widespread international attention in early 2021 for its ability to synthesize emotionally expressive speech of fictional characters from popular media with minimal amount of data. The creator of 15.ai (known pseudonymously as 15) stated that 15 seconds of training data is sufficient to perfectly clone a person's voice (hence its name, "15.ai"), a significant reduction from the previously known data requirement of tens of hours. 15.ai is credited as the first platform to popularize AI voice cloning in memes and content creation. 15.ai used a multi-speaker model that enabled simultaneous training of multiple voices and emotions, implemented sentiment analysis using DeepMoji, and supported precise pronunciation control via ARPABET. The 15-second data efficiency benchmark was later corroborated by OpenAI in 2024. == Semi-supervised learning == Currently, self-supervised learning has gained much attention through better use of unlabelled data. Research has shown that, with the aid of self-supervised loss, the need for paired data decreases. == Zero-shot speaker adaptation == Zero-shot speaker adaptation is promising because a single model can generate speech with various speaker styles and characteristic. In June 2018, Google proposed to use pre-trained speaker verification models as speaker encoders to extract speaker embeddings. The speaker encoders then become part of the neural text-to-speech models, so that it can determine the style and characteristics of the output speech. This procedure has shown the community that it is possible to use only a single model to generate speech with multiple styles. == Neural vocoder == In deep learning-based speech synthesis, neural vocoders play an important role in generating high-quality speech from acoustic features. The WaveNet model proposed in 2016 achieves excellent performance on speech quality. Wavenet factorised the joint probability of a waveform x = { x 1 , . . . , x T } {\displaystyle \mathbf {x} =\{x_{1},...,x_{T}\}} as a product of conditional probabilities as follows p θ ( x ) = ∏ t = 1 T p ( x t | x 1 , . . . , x t − 1 ) {\displaystyle p_{\theta }(\mathbf {x} )=\prod _{t=1}^{T}p(x_{t}|x_{1},...,x_{t-1})} where θ {\displaystyle \theta } is the model parameter including many dilated convolution layers. Thus, each audio sample x t {\displaystyle x_{t}} is conditioned on the samples at all previous timesteps. However, the auto-regressive nature of WaveNet makes the inference process dramatically slow. To solve this problem, Parallel WaveNet was proposed. Parallel WaveNet is an inverse autoregressive flow-based model which is trained by knowledge distillation with a pre-trained teacher WaveNet model. Since such inverse autoregressive flow-based models are non-auto-regressive when performing inference, the inference speed is faster than real-time. Meanwhile, Nvidia proposed a flow-based WaveGlow model, which can also generate speech faster than real-time. However, despite the high inference speed, parallel WaveNet has the limitation of needing a pre-trained WaveNet model, so that WaveGlow takes many weeks to converge with limited computing devices. This issue has been solved by Parallel WaveGAN, which learns to produce speech through multi-resolution spectral loss and GAN learning strategies.
Read more →
Computer-automated design

Design Automation usually refers to electronic design automation, or Design Automation which is a Product Configurator. Extending Computer-Aided Design (CAD), automated design and Computer-Automated Design (CAutoD) are more concerned with a broader range of applications, such as automotive engineering, civil engineering, composite material design, control engineering, dynamic system identification and optimization, financial systems, industrial equipment, mechatronic systems, steel construction, structural optimisation, and the invention of novel systems. The concept of CAutoD perhaps first appeared in 1963, in the IBM Journal of Research and Development, where a computer program was written. to search for logic circuits having certain constraints on hardware design to evaluate these logics in terms of their discriminating ability over samples of the character set they are expected to recognize. More recently, traditional CAD simulation is seen to be transformed to CAutoD by biologically-inspired machine learning, including heuristic search techniques such as evolutionary computation, and swarm intelligence algorithms. == Guiding designs by performance improvements == To meet the ever-growing demand of quality and competitiveness, iterative physical prototyping is now often replaced by 'digital prototyping' of a 'good design', which aims to meet multiple objectives such as maximised output, energy efficiency, highest speed and cost-effectiveness. The design problem concerns both finding the best design within a known range (i.e., through 'learning' or 'optimisation') and finding a new and better design beyond the existing ones (i.e., through creation and invention). This is equivalent to a search problem in an almost certainly, multidimensional (multivariate), multi-modal space with a single (or weighted) objective or multiple objectives. == Normalized objective function: cost vs. fitness == Using single-objective CAutoD as an example, if the objective function, either as a cost function J ∈ [ 0 , ∞ ) {\displaystyle J\in [0,\infty )} , or inversely, as a fitness function f ∈ ( 0 , 1 ] {\displaystyle f\in (0,1]} , where f = J 1 + J {\displaystyle f={\tfrac {J}{1+J}}} , is differentiable under practical constraints in the multidimensional space, the design problem may be solved analytically. Finding the parameter sets that result in a zero first-order derivative and that satisfy the second-order derivative conditions would reveal all local optima. Then comparing the values of the performance index of all the local optima, together with those of all boundary parameter sets, would lead to the global optimum, whose corresponding 'parameter' set will thus represent the best design. However, in practice, the optimization usually involves multiple objectives and the matters involving derivatives are a lot more complex. == Dealing with practical objectives == In practice, the objective value may be noisy or even non-numerical, and hence its gradient information may be unreliable or unavailable. This is particularly true when the problem is multi-objective. At present, many designs and refinements are mainly made through a manual trial-and-error process with the help of a CAD simulation package. Usually, such a posteriori learning or adjustments need to be repeated many times until a ‘satisfactory’ or ‘optimal’ design emerges. == Exhaustive search == In theory, this adjustment process can be automated by computerised search, such as exhaustive search. As this is an exponential algorithm, it may not deliver solutions in practice within a limited period of time. == Search in polynomial time == One approach to virtual engineering and automated design is evolutionary computation such as evolutionary algorithms. === Evolutionary algorithms === To reduce the search time, the biologically-inspired evolutionary algorithm (EA) can be used instead, which is a (non-deterministic) polynomial algorithm. The EA based multi-objective "search team" can be interfaced with an existing CAD simulation package in a batch mode. The EA encodes the design parameters (encoding being necessary if some parameters are non-numerical) to refine multiple candidates through parallel and interactive search. In the search process, 'selection' is performed using 'survival of the fittest' a posteriori learning. To obtain the next 'generation' of possible solutions, some parameter values are exchanged between two candidates (by an operation called 'crossover') and new values introduced (by an operation called 'mutation'). This way, the evolutionary technique makes use of past trial information in a similarly intelligent manner to the human designer. The EA based optimal designs can start from the designer's existing design database, or from an initial generation of candidate designs obtained randomly. A number of finely evolved top-performing candidates will represent several automatically optimized digital prototypes. There are websites that demonstrate interactive evolutionary algorithms for design. allows you to evolve 3D objects online and have them 3D printed. allows you to do the same for 2D images.
Read more →
DAVI

The Dutch Automated Vehicle Initiative (DAVI) is a research and demonstration initiative developing automated vehicles for use on public roads. The project is unique in that, besides simply making driverless cars, it also focuses on having automated vehicles share information among each other. The aim is to have the cars help to avoid traffic congestion by reducing the safety distance between the cars (from 2 seconds to 0.5 seconds) and avoiding sudden traffic slow-downs due to maneuvers undertaken by drivers.
Read more →
European Conference on Artificial Intelligence

The European Conference on Artificial Intelligence (ECAI) is the leading conference in the field of Artificial Intelligence in Europe, and is commonly listed together with IJCAI and AAAI as one of the three major general AI conferences worldwide. The conference series has been held without interruption since 1974, originally under the name AISB. The conference was originally held biennially, but has been organized annually since ECAI 2022. The conferences are held under the auspices of the European Coordinating Committee for Artificial Intelligence (ECCAI) and organized by one of the member societies. The journal AI Communications, sponsored by the same society, regularly publishes special issues in which conference attendees report on the conference. Publication of a paper in ECAI is considered by some journals to be archival: the paper should be considered equivalent to a journal publication and that the contents of ECAI papers cannot be reformulated as separate journal submissions unless a significant amount of new material is added. == List of ECAI conferences == ECAI-1992 took place in Vienna, Austria. ECAI-1996 took place in Budapest, Hungary. ECAI-1998 tool place in Brighton, United Kingdom. ECAI-2000 took place in Berlin, Germany. ECAI-2004 took place in Valencia, Spain. ECAI-2006 took place in Riva del Garda, Italy. ECAI-2008 took place in Patras, Greece. ECAI-2010 took place in Lisbon, Portugal. ECAI-2012 took place in Montpellier, France. ECAI-2014 took place in Prague, Czech Republic. ECAI-2016 took place in The Hague, Netherlands. ECAI-2018 took place in Stockholm, Sweden. ECAI-2020 took place in Santiago de Compostela, Spain. ECAI-2022 took place in Vienna, Austria. ECAI-2023 took place in Kraków, Poland. ECAI-2024 took place in Santiago de Compostela, Spain. ECAI-2025 took place in Bologna, Italy.
Read more →
Deep learning speech synthesis

Deep learning speech synthesis refers to the application of deep learning models to generate natural-sounding human speech from written text (text-to-speech) or spectrum (vocoder). Deep neural networks are trained using large amounts of recorded speech and, in the case of a text-to-speech system, the associated labels and/or input text. == Formulation == Given an input text or some sequence of linguistic units Y {\displaystyle Y} , the target speech X {\displaystyle X} can be derived by X = arg ⁡ max P ( X | Y , θ ) {\displaystyle X=\arg \max P(X|Y,\theta )} where θ {\displaystyle \theta } is the set of model parameters. Typically, the input text will first be passed to an acoustic feature generator, then the acoustic features are passed to the neural vocoder. For the acoustic feature generator, the loss function is typically L1 loss (Mean Absolute Error, MAE) or L2 loss (Mean Square Error, MSE). These loss functions impose a constraint that the output acoustic feature distributions must be Gaussian or Laplacian. In practice, since the human voice band ranges from approximately 300 to 4000 Hz, the loss function will be designed to have more penalty on this range: l o s s = α loss human + ( 1 − α ) loss other {\displaystyle loss=\alpha {\text{loss}}_{\text{human}}+(1-\alpha ){\text{loss}}_{\text{other}}} where loss human {\displaystyle {\text{loss}}_{\text{human}}} is the loss from human voice band and α {\displaystyle \alpha } is a scalar, typically around 0.5. The acoustic feature is typically a spectrogram or Mel scale. These features capture the time-frequency relation of the speech signal, and thus are sufficient to generate intelligent outputs. The Mel-frequency cepstrum feature used in the speech recognition task is not suitable for speech synthesis, as it reduces too much information. == History == In September 2016, DeepMind released WaveNet, which demonstrated that deep learning-based models are capable of modeling raw waveforms and generating speech from acoustic features like spectrograms or mel-spectrograms. Although WaveNet was initially considered to be computationally expensive and slow to be used in consumer products at the time, a year after its release, DeepMind unveiled a modified version of WaveNet known as "Parallel WaveNet," a production model 1,000 faster than the original. This was followed by Google AI's Tacotron 2 in 2018, which demonstrated that neural networks could produce highly natural speech synthesis but required substantial training data—typically tens of hours of audio—to achieve acceptable quality. Tacotron 2 used an autoencoder architecture with attention mechanisms to convert input text into mel-spectrograms, which were then converted to waveforms using a separate neural vocoder. When trained on smaller datasets, such as 2 hours of speech, the output quality degraded while still being able to maintain intelligible speech, and with just 24 minutes of training data, Tacotron 2 failed to produce intelligible speech. In 2019, Microsoft Research introduced FastSpeech, which addressed speed limitations in autoregressive models like Tacotron 2. FastSpeech utilized a non-autoregressive architecture that enabled parallel sequence generation, significantly reducing inference time while maintaining audio quality. Its feedforward transformer network with length regulation allowed for one-shot prediction of the full mel-spectrogram sequence, avoiding the sequential dependencies that bottlenecked previous approaches. The same year saw the release of HiFi-GAN, a generative adversarial network (GAN)-based vocoder that improved the efficiency of waveform generation while producing high-fidelity speech. In 2020, the release of Glow-TTS introduced a flow-based approach that allowed for fast inference and voice style transfer capabilities. In March 2020, the free text-to-speech website 15.ai was launched. 15.ai gained widespread international attention in early 2021 for its ability to synthesize emotionally expressive speech of fictional characters from popular media with minimal amount of data. The creator of 15.ai (known pseudonymously as 15) stated that 15 seconds of training data is sufficient to perfectly clone a person's voice (hence its name, "15.ai"), a significant reduction from the previously known data requirement of tens of hours. 15.ai is credited as the first platform to popularize AI voice cloning in memes and content creation. 15.ai used a multi-speaker model that enabled simultaneous training of multiple voices and emotions, implemented sentiment analysis using DeepMoji, and supported precise pronunciation control via ARPABET. The 15-second data efficiency benchmark was later corroborated by OpenAI in 2024. == Semi-supervised learning == Currently, self-supervised learning has gained much attention through better use of unlabelled data. Research has shown that, with the aid of self-supervised loss, the need for paired data decreases. == Zero-shot speaker adaptation == Zero-shot speaker adaptation is promising because a single model can generate speech with various speaker styles and characteristic. In June 2018, Google proposed to use pre-trained speaker verification models as speaker encoders to extract speaker embeddings. The speaker encoders then become part of the neural text-to-speech models, so that it can determine the style and characteristics of the output speech. This procedure has shown the community that it is possible to use only a single model to generate speech with multiple styles. == Neural vocoder == In deep learning-based speech synthesis, neural vocoders play an important role in generating high-quality speech from acoustic features. The WaveNet model proposed in 2016 achieves excellent performance on speech quality. Wavenet factorised the joint probability of a waveform x = { x 1 , . . . , x T } {\displaystyle \mathbf {x} =\{x_{1},...,x_{T}\}} as a product of conditional probabilities as follows p θ ( x ) = ∏ t = 1 T p ( x t | x 1 , . . . , x t − 1 ) {\displaystyle p_{\theta }(\mathbf {x} )=\prod _{t=1}^{T}p(x_{t}|x_{1},...,x_{t-1})} where θ {\displaystyle \theta } is the model parameter including many dilated convolution layers. Thus, each audio sample x t {\displaystyle x_{t}} is conditioned on the samples at all previous timesteps. However, the auto-regressive nature of WaveNet makes the inference process dramatically slow. To solve this problem, Parallel WaveNet was proposed. Parallel WaveNet is an inverse autoregressive flow-based model which is trained by knowledge distillation with a pre-trained teacher WaveNet model. Since such inverse autoregressive flow-based models are non-auto-regressive when performing inference, the inference speed is faster than real-time. Meanwhile, Nvidia proposed a flow-based WaveGlow model, which can also generate speech faster than real-time. However, despite the high inference speed, parallel WaveNet has the limitation of needing a pre-trained WaveNet model, so that WaveGlow takes many weeks to converge with limited computing devices. This issue has been solved by Parallel WaveGAN, which learns to produce speech through multi-resolution spectral loss and GAN learning strategies.
Read more →
Artificial intelligence in customer experience

Artificial intelligence in customer experience is the use and development of artificial intelligence (AI) to aid and improve customer experience (sometimes abbreviated to CX AI). Chatbots are often seen as the first step in the development of AI within the industry, but more tailored offerings are slowly becoming available. The use of artificial intelligence in the space has since become more diverse than simply chatbots, with AI underpinning entire CX cloud platforms now used at major corporations. Contact center as a service (CCaaS) has become a core solution of the CX (customer experience) industry, with the CCaaS market size expected to reach $17.19 Billion by 2030 in the United States alone. == History == As with many AI applications, CX AI early implementation case studies have demonstrated that AI can increase the quality of customer interactions and therefore the overall experience that organizations can provide. This in turn has suggested a higher return on investment and/or revenue as a result. The beginning of the revolution of customer experience and the use of machine learning was with chatbots. The use of this type of AI can be traced back to Alan Turing in 1950, when the Church–Turing thesis suggested that computers could use "formal reasoning" to reach conclusions. In 2017, Meta produced one of the first breakthroughs for everyday use of AI for customer experience when it allowed Facebook users to create their own messaging bots for free on its Facebook messenger platform. The main focus of this was to both automate and improve customer experience and interaction. In 2023, CCaaS vendors began announcing the integration of ChatGPT’s generative AI into their CX solutions. Generative AI adds a layer of semantics into AI outputs. This was a major breakthrough for conversational AI. Using natural language processing and conversational AI, chatbots could enhance the level of service they could provide, speaking to customers in an easy-to-understand and conversational tone. == Applications == Currently the main location for the application of CX AI in the sector is in contact centers. Historically, contact centers were simply known as call centers, but in recent years differentiation developed between the two terms. Call centers provide phone support, while contact centers also provide support via digital channels in addition to analogue phone systems. Contact centers are therefore seen as a complete customer service solution, where as call centers simply cover one aspect of customer interactions. As a part of improving CX, AI is also improving the employee experience. AI is able to automate tasks to free up time for contact center agents to focus on higher priority tasks. For example, AI can be used for auto summarization. This means that instead of human agents having to summarize customer interactions now AI can do it, saving organizations time and money.
Read more →
Exploration–exploitation dilemma

The exploration–exploitation dilemma, also known as the explore–exploit tradeoff, is a fundamental concept in decision-making that arises in many domains. It is depicted as the balancing act between two opposing strategies. Exploitation involves choosing the best option based on current knowledge of the system (which may be incomplete or misleading), while exploration involves trying out new options that may lead to better outcomes in the future at the expense of an exploitation opportunity. Finding the optimal balance between these two strategies is a crucial challenge in many decision-making problems whose goal is to maximize long-term benefits. == Application in machine learning == In the context of machine learning, the exploration–exploitation tradeoff is fundamental in reinforcement learning (RL), a type of machine learning that involves training agents to make decisions based on feedback from the environment. Crucially, this feedback may be incomplete or delayed. The agent must decide whether to exploit the current best-known policy or explore new policies to improve its performance. === Multi-armed bandit methods === The multi-armed bandit (MAB) problem was a classic example of the tradeoff, and many methods were developed for it, such as epsilon-greedy, Thompson sampling, and the upper confidence bound (UCB). See the page on MAB for details. In more complex RL situations than the MAB problem, the agent can treat each choice as a MAB, where the payoff is the expected future reward. For example, if the agent performs an epsilon-greedy method, then the agent will often "pull the best lever" by picking the action that had the best predicted expected reward (exploit). However, it would pick a random action with probability epsilon (explore). Monte Carlo tree search, for example, uses a variant of the UCB method. === Exploration problems === There are some problems that make exploration difficult. Sparse reward. If rewards occur only once a long while, then the agent might not persist in exploring. Furthermore, if the space of actions is large, then the sparse reward would mean the agent would not be guided by the reward to find a good direction for deeper exploration. A standard example is Montezuma's Revenge. Deceptive reward. If some early actions give immediate small reward, but other actions give later large reward, then the agent might be lured away from exploring the other actions. Noisy TV problem. If certain observations are irreducibly noisy (such as a television showing random images), then the agent might be trapped exploring those observations (watching the television). === Exploration reward === This section based on. The exploration reward (also called exploration bonus) methods convert the exploration-exploitation dilemma into a balance of exploitations. That is, instead of trying to get the agent to balance exploration and exploitation, exploration is simply treated as another form of exploitation, and the agent simply attempts to maximize the sum of rewards from exploration and exploitation. The exploration reward can be treated as a form of intrinsic reward. We write these as r t i , r t e {\displaystyle r_{t}^{i},r_{t}^{e}} , meaning the intrinsic and extrinsic rewards at time step t {\displaystyle t} . However, exploration reward is different from exploitation in two regards: The reward of exploitation is not freely chosen, but given by the environment, but the reward of exploration may be picked freely. Indeed, there are many different ways to design r t i {\displaystyle r_{t}^{i}} described below. The reward of exploitation is usually stationary (i.e. the same action in the same state gives the same reward), but the reward of exploration is non-stationary (i.e. the same action in the same state should give less and less reward). Count-based exploration uses N n ( s ) {\displaystyle N_{n}(s)} , the number of visits to a state s {\displaystyle s} during the time-steps 1 : n {\displaystyle 1:n} , to calculate the exploration reward. This is only possible in small and discrete state space. Density-based exploration extends count-based exploration by using a density model ρ n ( s ) {\displaystyle \rho _{n}(s)} . The idea is that, if a state has been visited, then nearby states are also partly-visited. In maximum entropy exploration, the entropy of the agent's policy π {\displaystyle \pi } is included as a term in the intrinsic reward. That is, r t i = − ∑ a π ( a | s t ) ln ⁡ π ( a | s t ) + ⋯ {\displaystyle r_{t}^{i}=-\sum _{a}\pi (a|s_{t})\ln \pi (a|s_{t})+\cdots } . === Prediction-based === This section based on. The forward dynamics model is a function for predicting the next state based on the current state and the current action: f : ( s t , a t ) ↦ s t + 1 {\displaystyle f:(s_{t},a_{t})\mapsto s_{t+1}} . The forward dynamics model is trained as the agent plays. The model becomes better at predicting state transition for state-action pairs that had been done many times. A forward dynamics model can define an exploration reward by r t i = ‖ f ( s t , a t ) − s t + 1 ‖ 2 2 {\displaystyle r_{t}^{i}=\|f(s_{t},a_{t})-s_{t+1}\|_{2}^{2}} . That is, the reward is the squared-error of the prediction compared to reality. This rewards the agent to perform state-action pairs that had not been done many times. This is however susceptible to the noisy TV problem. Dynamics model can be run in latent space. That is, r t i = ‖ f ( s t , a t ) − ϕ ( s t + 1 ) ‖ 2 2 {\displaystyle r_{t}^{i}=\|f(s_{t},a_{t})-\phi (s_{t+1})\|_{2}^{2}} for some featurizer ϕ {\displaystyle \phi } . The featurizer can be the identity function (i.e. ϕ ( x ) = x {\displaystyle \phi (x)=x} ), randomly generated, the encoder-half of a variational autoencoder, etc. A good featurizer improves forward dynamics exploration. The Intrinsic Curiosity Module (ICM) method trains simultaneously a forward dynamics model and a featurizer. The featurizer is trained by an inverse dynamics model, which is a function for predicting the current action based on the features of the current and the next state: g : ( ϕ ( s t ) , ϕ ( s t + 1 ) ) ↦ a t {\displaystyle g:(\phi (s_{t}),\phi (s_{t+1}))\mapsto a_{t}} . By optimizing the inverse dynamics, both the inverse dynamics model and the featurizer are improved. Then, the improved featurizer improves the forward dynamics model, which improves the exploration of the agent. Random Network Distillation (RND) method attempts to solve this problem by teacher–student distillation. Instead of a forward dynamics model, it has two models f , f ′ {\displaystyle f,f'} . The f ′ {\displaystyle f'} teacher model is fixed, and the f {\displaystyle f} student model is trained to minimize ‖ f ( s ) − f ′ ( s ) ‖ 2 2 {\displaystyle \|f(s)-f'(s)\|_{2}^{2}} on states s {\displaystyle s} . As a state is visited more and more, the student network becomes better at predicting the teacher. Meanwhile, the prediction error is also an exploration reward for the agent, and so the agent learns to perform actions that result in higher prediction error. Thus, we have a student network attempting to minimize the prediction error, while the agent attempting to maximize it, resulting in exploration. The states are normalized by subtracting a running average and dividing a running variance, which is necessary since the teacher model is frozen. The rewards are normalized by dividing with a running variance. Exploration by disagreement trains an ensemble of forward dynamics models, each on a random subset of all ( s t , a t , s t + 1 ) {\displaystyle (s_{t},a_{t},s_{t+1})} tuples. The exploration reward is the variance of the models' predictions. === Noise === For neural network–based agents, the NoisyNet method changes some of its neural network modules by noisy versions. That is, some network parameters are random variables from a probability distribution. The parameters of the distribution are themselves learnable. For example, in a linear layer y = W x + b {\displaystyle y=Wx+b} , both W , b {\displaystyle W,b} are sampled from Gaussian distributions N ( μ W , Σ W ) , N ( μ b , Σ b ) {\displaystyle {\mathcal {N}}(\mu _{W},\Sigma _{W}),{\mathcal {N}}(\mu _{b},\Sigma _{b})} at every step, and the parameters μ W , Σ W , μ b , Σ b {\displaystyle \mu _{W},\Sigma _{W},\mu _{b},\Sigma _{b}} are learned via the reparameterization trick.
Read more →
The Stories of Ibis

The Stories of Ibis (アイの物語, Ai no Monogatari) is a Japanese science-fiction light novel by Hiroshi Yamamoto (山本弘) and translated by Takami Nieda. Yamamoto considered this to be an easier read than his earlier science fiction novel 'God Never Keeps Silent' because of its "light novel touch". The light novel was published in Japanese by Kadokawa Shoten and in English by Viz Media under their 'Haikasoru' imprint. The Stories of Ibis is told through a collection of short stories. All but two had been previously published. The two that Yamamoto wrote for the novel were 'The Day Shion Came' and 'AI's Story'. This is similar to The Illustrated Man by Ray Bradbury. Yamamoto drew from Bradbury's idea of short stories that were loosely connected. He represented this influence in the novel by giving Ibis a facial tattoo. == Plot == The Stories of Ibis begins with a wandering storyteller who encounters Ibis. He has the mindset that all robots are a threat to humanity and must be fought against for survival. He attacks the robot Ibis, not aware of who she is, as a result of his mindset. Ibis tells the storyteller that she is far more proficient in battle. During the battle the storyteller becomes injured and Ibis takes him to an android hospital to care for him. While he is recovering Ibis offers to tell him stories. While originally skeptical he agrees after Ibis makes it clear that the stories are not taboo. The space after each story is referred to as intermission and is a time for Ibis to comment on the story she just told. === The Universe on my Hands === The story is about a group of friends who are writing a science fiction story over the internet. One of the group members kills someone in real life. The rest of the short story is about how the group fights to convince this man to not commit suicide, but to turn himself in. He resolves to turn himself in, being hopeful to the future because he knows he has friends who care about him. The ending words of the story are a commentary. While the story they were writing was not real, the emotions they were feeling were real. === A Romance in Virtual Space === This is another story about human interactions over the internet. The device that allows people to enter virtual reality (VR) is MUGEN Net. Such devices are extremely expensive and most people need to go to a public server to use one. However the girl's parents in this story are wealthy enough to own one. This girl is shopping in VR when a boy meets her and asks her out for ice cream. All goes well and they plan for another. After some time of VR dating and awesome adventures with a female heroine, they agree to meet up in real life. He discovers that in reality, she is blind, yet he thinks she is brave and they continue dating. It's a wonderful short story of a secret utopia inside a dystopian culture of technology. === Mirror Girl === A short story about an artificial intelligence that grows over time with human interaction. The inspiration for this story was Ray Bradbury's I Sing the Body Electric. The mirror girl Shalice starts off with basic knowledge and by interacting with her owner develops. The owner grows up and marries a technician who incubates Shalice by teaching her in the virtual world at many thousand times faster than average life. When he is done, Strong Eye is created. Strong Eye is the fully developed and completely intelligent AI. === Black Hole Diver === A futuristic story about an artificial space station and people who go diving into a black hole. The space station cannot stop people but is sorry that they go to their deaths because none of them get past the event horizon. Then one girl comes who has the space ship, the training, and the research necessary to attempt to dive into the black hole. As she goes into the black hole the space station can no longer observe. She may have made it, she could have been destroyed. === A World Where Justice is Just === An anime flavored story about the intelligence of people being scanned onto a computer network. The AIs in the network fight crime and live repeating lives. At the end of each year they start anew, but different story lines. Thousands of 'extras' populate the network and are the ones subject to harm and deletion. The protagonist has a pen pal in real life who explains to her that the real world is under attack and that there are no respawns and no extras. The AI finds this so cruel that people would willingly kill each other when they can't come back. === The Day Shion Came === The stories leading up to this were all relatively short. This and the next took up over 100 pages each. This is a story about an android named Shion who works in a Japanese nursing facility. Shion comes with only extensive nursing training but lacks the knowledge of how to communicate with the residents. After months of training she informs her adviser that she believes all humans have dementia, which explains their irrational behavior. Near the end of the story one of the residents threatens suicide but Shion convinces him to step down and be rational. === AI's Story === The culminating story of the entire novel. It is about Ibis herself. She starts off as a virtual reality fighting program and over time develops intelligence. Her master gains enough funds to create her a body in the real world or level 0. There is significant hate against TAIs (True Artificial Intelligence) in the real world. Ibis and her friend Raven rebel against their masters to make a point. Human hatred was destroying them. After many years robots took prevalence and most humans realized they were not worthy to be the guardians of Earth and died in peace. The remaining population was stubborn and fought against the robots for centuries. The storyteller is a child of this generation, being raised in hatred and ignorance. The robots sought to take him captive, and teach him the truth so that he could go to the villages where people lived and teach them the truth. The whole point was they cared for the humans and wanted them to live in peace, rather than fighting for their survival. == Reception == It was reviewed by the Denver Post to be an "excellent novel". Being a Japanese novel translated to English, it has a small audience. The novel was given a 3.85 of 5 by the reviewers at Librarything.com. The reviewers of Google Books gave it a 4.33 of 5.
Read more →
Cruel World of Dreams and Fears

Cruel World of Dreams and Fears is the debut album from Ukrainian-born Czech black metal artist Draugveil, released independently on 13 June 2025. The album became notable among metal fans due to its cover, featuring Draugveil in a suit of armour and corpse paint, and lying in a field of red roses. The cover was the subject of parodying internet memes, as well as accusations of using artificial intelligence (AI) to make it. These claims were later expanded to suggest that AI was used to make the album's music. == Memes and AI accusations == Upon the album being released on YouTube on the channel Black Metal Promotion, the album attracted attention due to its cover, depicting Draugveil lying in a field of roses, dressed in armour, wearing corpse paint and having a sword stuck in the ground. Some compared it to covers where other artists are lying on the ground, such as Michael Jackson's Thriller, Luther Vandross's Give Me the Reason, and the UK cover of Lionel Richie's You Are. Critics of the album, however, suggested that AI was used to make the cover. This was partly due to suggestions that the rose stems in the picture come out from the ground in an unrealistic way. This later resulted in claims from some fans that AI was also used to produce the music, and later the lyrics and vocals. These claims began on a Facebook page entitled "AI Generated Nonsense", which was later deleted. No definitive evidence, however, was produced to back these claims. Derek McArthur, a journalist for Glasgow-based newspaper The Herald, wrote: "The music is in line with what one would expect from a one-man black metal project in the vein of Judas Iscariot and Burzum, but then if AI was asked to create music in a black metal style, that is probably what it would decide to generically produce and spit out." Draugveil's reaction to the claims was: "Let people decide." The result of the claims of AI has led to some writers to claim that artists in the future will have to prove they are human to be taken seriously, and that members of the public will be increasing doubt as to whether creative works are produced by either humans or AI. == Track listing ==
Read more →
Argument Web

The Argument Web is a large-scale Web of interconnected arguments created by individuals as they express their opinions and interact with the opinions of others. The Argument Web aims to make online debate intuitive for participants such as mediators, students, academics, broadcasters and bloggers, to create a Web infrastructure that allows for the storage, automatic retrieval and analysis of linked argument data, and to improve the quality of online argument and debate. The Argument Web can be described as a portion of a larger Semantic Web. == AIFdb == AIFdb is a database implementation or ‘reification’ of the Argument Interchange Format (AIF), which allows for the storage and retrieval of AIF compliant argument structures. This database solution was provided as a foundation for an open, integrated Argument Web. It offers an extensive range of web services for interacting with stored argument data, while also offering search and argument visualisation features that are all consistent with the formal ontology of AIF. At a basic level, the AIFdb web services allow for the insertion and querying of basic components of an AIF argument, such as nodes, edges and schemes. Building upon this basis, it also facilitates more complex interactions with these AIF argument structures. Such complex queries could make it possible, for example, to determine all the statements made by a particular person in support a given I-Node. While, at its highest level of interaction, AIFdb can handle the import and export of many standard file formats, including SVG, DOT, RDF/XML and other formats of argument theory tools, like Carneades, Rationale and Araucaria. == Argument blogging == ArguBlogging is software which allows its users to select portions of hypertext on webpages in their Web browsers and to agree or disagree with the selected content, posting their arguments to their blogs with linked argument data. It is implemented as a bookmarklet, adding functionality to Web browsers and interoperating with blogging platforms such as Blogger and Tumblr.
Read more →
Private cloud computing infrastructure

Private cloud computing infrastructure is a category of cloud computing that provides comparable benefits to public cloud systems, such as self-service and scalability, but it does so via a proprietary framework. In contrast to public clouds, which cater to multiple entities, a private cloud is specifically designed for the requirements and objectives of one organization. == Definition == A private cloud computing infrastructure constitutes a distinctive model of cloud computing that facilitates a secure and distinct cloud environment where only the intended client can function. It can either be physically housed in the organization's in-house data center or be managed by a third-party provider. In a private cloud, the infrastructure and services are always sustained on a private network, and both the hardware and software are devoted exclusively to a single organization. == History == The concept of private cloud infrastructure started to take shape around the mid-2000s, coinciding with the rise of other cloud computing forms. It came into existence as a solution to the shortcomings of public clouds, particularly concerns over data control, security, and network performance. IT departments began to mirror the automation and self-service features of the public cloud in their data centers. Over time, these services became more advanced, and private cloud technology has been refined to address businesses and organizations' diverse needs. == Architecture == Private cloud computing infrastructure generally involves a mix of hardware, network infrastructure, and virtualization software. The hardware, often referred to as a cloud server or cloud array, consists of a server rack or a collection of server racks containing the storage and processors that constitute the cloud. The virtualization software, such as Hyper-V, OpenStack, or VMWare, establishes and oversees virtual machines with which users interact. The network infrastructure connects the private cloud to users and may facilitate connectivity with other on-premises data centers or clouds. == Applications == Private cloud infrastructures are usually utilized by medium to large businesses and organizations that need robust control over their data, have extensive computing needs, or have specific regulatory or compliance obligations. This includes healthcare organizations, government agencies, financial institutions, and any business that needs to process and store large data volumes.
Read more →
Document AI

Document AI, also known as Document Intelligence, refers to a field of technology that employs machine learning (ML) techniques, such as natural language processing (NLP). These techniques are used to develop computer models capable of analyzing documents in a manner akin to human review. Through NLP, computer systems are able to understand relationships and contextual nuances in document contents, which facilitates the extraction of information and insights. Additionally, this technology enables the categorization and organization of the documents themselves. The applications of Document AI extend to processing and parsing a variety of semi-structured documents, such as forms, tables, receipts, invoices, tax forms, contracts, loan agreements, and financial reports. == Key features == Machine learning is utilized in Document AI to extract information from both printed and digital documents. This technology recognizes images, text, and characters in various languages, aiding in the extraction of insights from unstructured documents. The use of this technology can improve the speed and quality of decision-making in document analysis. Additionally, the automation of data extraction and validation can contribute to increased efficiency in document analysis processes. Since the early 2020s, the integration of large language models has extended Document AI beyond extraction toward generative tasks, including the automated drafting of forms, contracts, and document summaries. == Example == A business letter contains information in the form of text, as well as other types of information, such as the position of the text. For instance, a typical letter contains two addresses before the body of the text. The address at the very top (sometimes aligned to the right) is the sender address. This is normally followed by the date of the letter, with the place of writing. After this, the receiver address is listed. The distinction between the sender address and the receiver address is conveyed solely by the position of the address on the page, i.e. there is no textual indication like Sender: in front of the addresses. == Data dimensions and ML architecture == Data is typically distinguished into spatial data and time-series data, the former includes things like images, maps and graphs, while the latter includes signals such as stock prices or voice recordings. Document AI combines text data, which has a time dimension, with other types of data, such as the position of an address in a business letter, which is spatial. Historically in machine learning spatial data was analyzed using a convolutional neural network, and temporal data using a recurrent neural network. With the advent of dimension-type agnostic transformer architecture, these two different types of dimension can be more easily combined, Document AI is an example of this. == Benchmarks == Several public datasets are used to evaluate Document AI systems. FUNSD (Form Understanding in Noisy Scanned Documents) contains 199 annotated forms with token- and block-level labels for form understanding tasks. CORD (Consolidated Receipt Dataset) supports key information extraction from receipts. DocVQA contains approximately 50,000 questions over 12,000 document images for layout-aware visual question answering. == Common uses == Document AI systems are used to automate document processing and information extraction in business and financial workflows, including invoice and receipt processing, data entry automation, anomaly detection, mortgage processing, loan portfolio monitoring, credit risk management, and fraud detection such as counterfeit currency and fraudulent checks. They are also applied in regulatory compliance and contract analysis, including assessing changes in legal and regulatory documents. In real estate, Document AI supports document classification and structured information extraction for standardized processing and analytics. With the adoption of generative AI, Document AI systems can also generate and pre-fill structured documents such as contracts or business forms from natural language prompts.
Read more →
Fragile Dreams: Farewell Ruins of the Moon

Fragile Dreams: Farewell Ruins of the Moon (フラジール～さよなら月の廃墟～, Furajīru: Sayonara Tsuki no Haikyo; known in Japan as Fragile) is an action role-playing game for the Wii developed by Namco Bandai Games in co-operation with Tri-Crescendo. The game was released by Namco Bandai Games in Japan on January 22, 2009. It was later published by Xseed Games in North America on March 16, 2010, and in Europe by Rising Star Games on March 19, 2010, followed by its release in Australia on April 1, 2010. == Gameplay == In Fragile Dreams, the player character, Seto, must traverse the ruins of Tokyo and the surrounding areas, fighting off ghosts that lurk within these ruins. The game's heads-up display includes a mini-map and HP gauge for Seto's location and health, respectively. Seto will fall unconscious if his HP reaches zero, resulting in a game over. The player controls Seto from a third-person perspective with the Wii Remote and Nunchuk. Seto can use his flashlight (controlled by the Wii Remote pointer) to illuminate his surroundings or solve puzzles and interact with the environment. When searching for certain objectives or hidden enemies, pointing Seto's light in their direction picks up and plays their sounds through the Wii Remote's mini speaker. The Wii Nunchuk, meanwhile, directly controls Seto's movement: aside of basic movement, he can crouch to hide and crawl through small spaces. Seto will often come across damaged floors, which require slow movement (and for heavily damaged floors, crouching) to cross without falling through. As Seto, the player can use weapons found throughout the world to fight off ghosts, ranging from slingshots and golf clubs to crossbows and katanas. Each weapon can only take a certain amount of use: once a weapon reaches its limit, it will break after battle. The player can also find other usable and collectable items in the field, marked with fireflies. The player can only save their game by resting at small fire pits scattered throughout the world: used fire pits are marked with a bonfire. The player can also examine and identify Mystery Items, organize their inventory, as well as after encountering the Merchant, buy and sell items. As stated by the producer of the game, Kentarō Kawashima, Fragile Dreams is not strictly a survival horror: rather, its story focuses on human drama. In Fragile Dreams, aside of the main story, the player can find and examine objects and graffiti throughout the world. Objects called memory items (ranging from origami and stones to cell phones and books) hold the memories of their former owners (only accessible at bonfires), while the graffiti contains messages only seen by pointing at them in first-person. By examining these messages, the player can piece together hints to the game's backstory. == Story == === Setting and characters === Fragile Dreams is set in a post-apocalyptic version of Earth in the near-future. Almost all the world's population has vanished, leaving the surviving buildings and structures abandoned. The game is set in and near the ruins of Tokyo, Japan, where the event that nearly wiped out humanity may have originated. The protagonist, Seto, is a 15-year-old boy who searches the world for other living humans. He encounters Ren, a silver-haired girl who often leaves behind large, cryptic drawings. Other characters include: Sai, the ghost of a young woman; Crow, a mischievous and straightforward amnesiac boy; Personal Frame (P.F.), a portable computer who loves having conversations more than anything else; Chiyo, the ghost of a little girl; and the Merchant, a mysterious yet merry man who trades various goods. The game's host of enemies mainly consist of ghosts, but also include humanoid robots and security proxies. The main antagonist, Shin, is the AI of a scientist who considers speech to be an inferior means of communication. Various memory items include a greater set of characters, each giving hints to the game's backstory. === Plot === At the end of Seto's fifteenth summer, his grandfather dies. Seto buries him in front of their home, an old observatory, and that from then on he became "truly alone". At night, he searches for anything the old man had left for him and discovers a letter, along with a strange blue stone in a locket. Suddenly, a mask-like ghost appears and attacks Seto. After driving the creature off, Seto reads the old man's letter, who tells him to "reach a tall red tower" east of the observatory, where he might find other survivors. After departing for the tower, Seto reaches an old subway entrance in the Azabudai district and finds Ren sitting on a collapsed pillar, singing to the stars. He accidentally startles her and the frightened Ren flees into the subway station: getting over the shock of meeting another person, Seto follows her. While searching the station, he discovers a Personal Frame, who guides him towards Ren. Unfortunately, just as they reach the exit, P.F.'s battery dies out: Seto buries the device, keeping a screw from it in his locket. From the underground, Seto finds himself at an abandoned amusement park and encounters Crow, who steals Seto's locket. After a long chase across the park and another encounter with the masked ghost, Crow returns Seto's locket and directs him to a hotel nearby, where he saw a girl who might know something about Ren. Crow also gives Seto his skull ring to keep in his locket and kisses him. At the hotel, Seto encounters Sai and fights the masked ghost again. After laying to rest the spirit of an old woman named Chiyo, the two discover Ren's drawings by a sewer. Returning to the underground, Seto and Sai find themselves at a hydropower dam. While searching for Ren, Seto discovers that Crow is actually a robot, but his battery begins to fail and Seto mourns for him as he "die[s]". Finally, they encounter Ren in a cell: although glad to see him again, Ren runs off after Shin calls. Sai explains to Seto that most of humanity died because of a "human empathy expansion project" called Glass Cage. The project was meant to make human thoughts transparent, meaning that no one would need words to communicate. However, after Glass Cage activated, people who went to sleep never woke up again. Sai reveals that she was Glass Cage's first catalyst: this time, Shin intends to use Ren as the catalyst. After exiting the dam, a demolition crane attempts to destroy it. Hearing both Shin's and the masked ghost's voices from the crane — saying, "Any threat to the project must be eliminated." — the player realizes both are manifestations of Glass Cage. After Seto destroys the crane, Sai leads him to the facility where Ren was taken to. Entering the laboratory, Seto and Sai are confronted by Shin, who coldly dismisses Sai's attempts at reasoning with him and is adamant about proceeding with his plans. As they traverse the laboratory, they overhear a voice announcing "Glass Cage Launch Preparations Complete", strengthening their resolve to save Ren. Making it into the room where Ren is being held, Shin tells them of his intention to use Glass Cage to "obliterate corporeal beings". After Seto defeats him, Shin disappears and Seto releases Ren from the device holding her. Their reunion is cut short as Sai tells them that the backup system has "finished copying her psyche to the AI", allowing Glass Cage to proceed. Ren reveals Shin has escaped to the top of the Tokyo Tower and Seto asks Ren to wait at the base of the tower and for Sai to accompany her. On his way up the tower, Seto hears the voices of P.F., Chiyo and Crow wishing him luck. He confronts and defeats Shin a second time, who reveals his motivations: he had secretly used himself as the first test subject of the human empathy expansion project and gained the ability to hear the thoughts of those around him. Despite his initial belief in the project as a way for humans to empathize with one another, all he heard around him was "jealousy and contempt" and he soon grew disillusioned with the world as even his parents turned against him. Believing no person loved him, Shin wants to put an end to humanity. His words meet with a vehement response from Sai, as she tells him that she loves him, having developed those feelings while she was the catalyst and all she ever wanted was to be part of his life. Hearing this, Shin finds peace, tossing the AI mainframe away so Glass Cage can never be reactivated and vanishes together with Sai, hand-in-hand, after thanking Seto. Descending from the tower, Seto finally learns Ren's name and they resolve to look for other survivors together. == Development == Fragile Dreams was developed by the team at Namco Bandai Games. Director and producer Kentarō Kawashima came up with the concept for the game in 2003, before the Wii console was revealed. When the Wii was unveiled, it became the obvious choice as the game's platform as the Wii remote could be used to control the flashlight. Kawashima wrote the main scenario for the title, w
Read more →