AI Data Center

AI Data Center — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Site Security Handbook

    Site Security Handbook

    The Site Security Handbook, RFC 2196, is a guide on setting computer security policies and procedures for sites that have systems on the Internet (however, the information provided should also be useful to sites not yet connected to the Internet). The guide lists issues and factors that a site must consider when setting their own policies. It makes a number of recommendations and provides discussions of relevant areas. This guide is only a framework for setting security policies and procedures. In order to have an effective set of policies and procedures, a site will have to make many decisions, gain agreement, and then communicate and implement these policies. The guide is a product of the IETF SSH working group, and was published in 1997, obsoleting the earlier RFC 1244 from 1991.

    Read more →
  • Human–AI interaction

    Human–AI interaction

    Human–AI interaction is a developing field of research and a sub-field of human–computer interaction (HCI). HCI is a field of research that explores the interactions between humans and computer-based technology, focusing on design implementation, user experience, and psychological factors. With the proliferation of artificial intelligence (AI), there has developed a sub-section of HCI research dedicated specifically to artificial intelligence and how people interact with and are impacted by it. This is human–AI interaction, abbreviated either as HAX or HAII. == Introduction == Artificial intelligence (AI), in general, has fluid definitions and varied research applications, but in brief can be applied to mechanizing tasks that would require human intelligence to complete. AI are tools designed to replicate the human abilities of navigating uncertainty, active learning, and processing information in different contexts. Within the context of HCI and HAX research, artificial intelligence can be broken into two sub-fields, natural language processing (NLP) and computer vision (CV). AI technologies notably include machine-learning, deep-learning and neural networks, and large-language models (LLMs). As a new and rapidly developing technology, AI is changing how computers work and therefore changing how humans interact with computers. Unlike the traditional human-computer interaction, where a human directs a machine, human-AI interaction is characterized by a more collaborative relationship between the computer program (the AI) and the human user, as AI is perceived as an active agent rather than a tool. This changing dynamic creates new questions and necessitates new research methods that are not present in traditional HCI research. According to a scoping review on the state of the discipline, the HAX field comprises research on the "design, development, and evaluation of AI systems" and encompasses the themes of human-AI collaboration, human-AI competition, human-AI conflict, and human-AI symbiosis. == Design == Machine learning and artificial intelligence have been used for decades in targeted advertising and to recommend content in social media. Ethical Guidelines (Framework for ethical AI development) == User Experience (UX) == This section should handle research on how users interact with tools. What techniques do they use, do they develop habits, what types of programs and devices are they using to access these tools, what do they use these tools to do exactly. === Cognitive Frameworks in AI Tool Users === AI has been viewed with various expectations, attributions, and often misconceptions. Many people exclusively understand AI as the LLM chatbots they interact with, like ChatGPT or Claude, or other generative AI programs. [Insert section: discuss how people interact with these specific AI tools as a connection to the following paragraphs] Most fundamentally, humans have a mental model of understanding AI's reasoning and motivation for its decision recommendations, and building a holistic and precise mental model of AI helps people create prompts to receive more valuable responses from AI. However, these mental models are not whole because people can only gain more information about AI through their limited interaction with it; more interaction with AI builds a better mental model that a person may build to produce better prompt outcomes. Research on human-AI interaction has emphasized that users develop mental models of AI systems and revise those models through repeated use, feedback, and explanation, while design research has stressed the importance of communicating capabilities and limitations early and supporting trust calibration through explanation and correction. In a 2025 SSRN working paper, John DeVadoss proposed "Hypothetico-Deductive Interaction" (HDI), a framework that describes human-AI interaction as a mutual process of conjecture and refutation in which users test assumptions about an AI system's capabilities while the system infers and updates assumptions about user goals through its responses and clarifying questions. DeVadoss argued that this framing helps explain prompt iteration, weak capability awareness, and trust miscalibration, and suggested design responses such as clearer communication of uncertainty, easier correction, actionable explanations, and safer failure modes. == Research themes == === Human-AI collaboration === Human-AI collaboration occurs when the human and AI supervise the task on the same level and extent to achieve the same goal. Some collaboration occurs in the form of augmenting human capability. AI may help human ability in analysis and decision-making through providing and weighing a volume of information, and learning to defer to the human decision when it recognizes its unreliability. It is especially beneficial when the human can detect a task that AI can be trusted to make few errors so that there is not a lot of excessive checking process required on the human's end. Some findings show signs of human-AI augmentation, or human–AI symbiosis, in which AI enhances human ability in a way that co-working on a task with AI produces better outcomes than a human working alone. For example: the quality and speed of customer service tasks increase when a human agent collaborates with AI, training on specific models allows AI to improve diagnoses in clinical settings, and AI with human-intervention can improve creativity of artwork while fully AI-generated haikus were rated negatively. Human-AI synergy, a concept in which human-AI collaboration would produce more optimal outcomes than either human or AI working alone could explain why AI does not always help with performance. Some AI features and development may accelerate human-AI synergy, while others may stagnate it. For example, when AI updates for better performance, it sometimes worsens the team performance with human and AI by reducing the compatibility with the new model and the mental model a user has developed on the previous version. Research has found that AI often supports human capabilities in the form of human-AI augmentation and not human-AI synergy, potentially because people rely too much on AI and stop thinking on their own. Prompting people to actively engage in analysis and think when to follow AI recommendations reduces their over-reliance, especially for individuals with higher need for cognition. === Human-AI competition === Robots and computers have substituted routine tasks historically completed by humans, but agentic AI has made it possible to also replace cognitive tasks including taking phone calls for appointments and driving a car. At the point of 2016, research has estimated that 45% of paid activities could be replaced by AI by 2030. Perceived autonomy of robots is known to increase people's negative attitude toward them, and worry about the technology taking over leads people to reject it. There has been a consistent tendency of algorithm aversion in which people prefer human advice over AI advice. However, people are not always able to tell apart tasks completed by AI or other humans. See AI takeover for more information. It is also notable that this sentiment is more prominent in the Western cultures as Westerners tend to show less positive views about AI compared to East Asians. == Research on the psychological impacts of AI == === Perception on others who use AI === As much as people perceive and make judgment about AI itself, they also form impressions of themselves and others who use AI. In the workplace, employees who disclose the use of AI in their tasks are more likely to receive feedback that they are not as hardworking as those who are in the same job who receive non-AI help to complete the same tasks. AI use disclosure diminishes the perceived legitimacy in the employee's task and decision making which ultimately leads observers to distrust people who use AI. Although these negative effects of AI use disclosure are weakened by the observers who use AI frequently themselves, the effect is still not attenuated by the observers' positive attitude towards AI. === Bias, AI, and human === Although AI provides a wide range of information and suggestions to its users, AI itself is not free of biases and stereotypes, and it does not always help people reduce their cognitive errors and biases. People are prone to such errors by failing to see other potential ideas and cases that are not listed by AI responses and committing to a decision suggested by AI that directly contradicts the correct information and directions that they are already aware of. Gender bias is also reflected as the female gendering of AI technologies which conceptualizes females as a helpful assistant. == Emotional connection with AI == Human-AI interaction has been theorized in the context of interpersonal relationships mainly in social psychology, communications and media studies, and as a technology interface through the lens of hu

    Read more →
  • Proximal gradient methods for learning

    Proximal gradient methods for learning

    Proximal gradient (forward backward splitting) methods for learning is an area of research in optimization and statistical learning theory which studies algorithms for a general class of convex regularization problems where the regularization penalty may not be differentiable. One such example is ℓ 1 {\displaystyle \ell _{1}} regularization (also known as Lasso) of the form min w ∈ R d 1 n ∑ i = 1 n ( y i − ⟨ w , x i ⟩ ) 2 + λ ‖ w ‖ 1 , where x i ∈ R d and y i ∈ R . {\displaystyle \min _{w\in \mathbb {R} ^{d}}{\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-\langle w,x_{i}\rangle )^{2}+\lambda \|w\|_{1},\quad {\text{ where }}x_{i}\in \mathbb {R} ^{d}{\text{ and }}y_{i}\in \mathbb {R} .} Proximal gradient methods offer a general framework for solving regularization problems from statistical learning theory with penalties that are tailored to a specific problem application. Such customized penalties can help to induce certain structure in problem solutions, such as sparsity (in the case of lasso) or group structure (in the case of group lasso). == Relevant background == Proximal gradient methods are applicable in a wide variety of scenarios for solving convex optimization problems of the form min x ∈ H F ( x ) + R ( x ) , {\displaystyle \min _{x\in {\mathcal {H}}}F(x)+R(x),} where F {\displaystyle F} is convex and differentiable with Lipschitz continuous gradient, R {\displaystyle R} is a convex, lower semicontinuous function which is possibly nondifferentiable, and H {\displaystyle {\mathcal {H}}} is some set, typically a Hilbert space. The usual criterion of x {\displaystyle x} minimizes F ( x ) + R ( x ) {\displaystyle F(x)+R(x)} if and only if ∇ ( F + R ) ( x ) = 0 {\displaystyle \nabla (F+R)(x)=0} in the convex, differentiable setting is now replaced by 0 ∈ ∂ ( F + R ) ( x ) , {\displaystyle 0\in \partial (F+R)(x),} where ∂ φ {\displaystyle \partial \varphi } denotes the subdifferential of a real-valued, convex function φ {\displaystyle \varphi } . Given a convex function φ : H → R {\displaystyle \varphi :{\mathcal {H}}\to \mathbb {R} } an important operator to consider is its proximal operator prox φ : H → H {\displaystyle \operatorname {prox} _{\varphi }:{\mathcal {H}}\to {\mathcal {H}}} defined by prox φ ⁡ ( u ) = arg ⁡ min x ∈ H φ ( x ) + 1 2 ‖ u − x ‖ 2 2 , {\displaystyle \operatorname {prox} _{\varphi }(u)=\operatorname {arg} \min _{x\in {\mathcal {H}}}\varphi (x)+{\frac {1}{2}}\|u-x\|_{2}^{2},} which is well-defined because of the strict convexity of the ℓ 2 {\displaystyle \ell _{2}} norm. The proximal operator can be seen as a generalization of a projection. We see that the proximity operator is important because x ∗ {\displaystyle x^{}} is a minimizer to the problem min x ∈ H F ( x ) + R ( x ) {\displaystyle \min _{x\in {\mathcal {H}}}F(x)+R(x)} if and only if x ∗ = prox γ R ⁡ ( x ∗ − γ ∇ F ( x ∗ ) ) , {\displaystyle x^{}=\operatorname {prox} _{\gamma R}\left(x^{}-\gamma \nabla F(x^{})\right),} where γ > 0 {\displaystyle \gamma >0} is any positive real number. === Moreau decomposition === One important technique related to proximal gradient methods is the Moreau decomposition, which decomposes the identity operator as the sum of two proximity operators. Namely, let φ : X → R {\displaystyle \varphi :{\mathcal {X}}\to \mathbb {R} } be a lower semicontinuous, convex function on a vector space X {\displaystyle {\mathcal {X}}} . We define its Fenchel conjugate φ ∗ : X → R {\displaystyle \varphi ^{}:{\mathcal {X}}\to \mathbb {R} } to be the function φ ∗ ( u ) := sup x ∈ X ⟨ x , u ⟩ − φ ( x ) . {\displaystyle \varphi ^{}(u):=\sup _{x\in {\mathcal {X}}}\langle x,u\rangle -\varphi (x).} The general form of Moreau's decomposition states that for any x ∈ X {\displaystyle x\in {\mathcal {X}}} and any γ > 0 {\displaystyle \gamma >0} that x = prox γ φ ⁡ ( x ) + γ prox φ ∗ / γ ⁡ ( x / γ ) , {\displaystyle x=\operatorname {prox} _{\gamma \varphi }(x)+\gamma \operatorname {prox} _{\varphi ^{}/\gamma }(x/\gamma ),} which for γ = 1 {\displaystyle \gamma =1} implies that x = prox φ ⁡ ( x ) + prox φ ∗ ⁡ ( x ) {\displaystyle x=\operatorname {prox} _{\varphi }(x)+\operatorname {prox} _{\varphi ^{}}(x)} . The Moreau decomposition can be seen to be a generalization of the usual orthogonal decomposition of a vector space, analogous with the fact that proximity operators are generalizations of projections. In certain situations it may be easier to compute the proximity operator for the conjugate φ ∗ {\displaystyle \varphi ^{}} instead of the function φ {\displaystyle \varphi } , and therefore the Moreau decomposition can be applied. This is the case for group lasso. == Lasso regularization == Consider the regularized empirical risk minimization problem with square loss and with the ℓ 1 {\displaystyle \ell _{1}} norm as the regularization penalty: min w ∈ R d 1 n ∑ i = 1 n ( y i − ⟨ w , x i ⟩ ) 2 + λ ‖ w ‖ 1 , {\displaystyle \min _{w\in \mathbb {R} ^{d}}{\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-\langle w,x_{i}\rangle )^{2}+\lambda \|w\|_{1},} where x i ∈ R d and y i ∈ R . {\displaystyle x_{i}\in \mathbb {R} ^{d}{\text{ and }}y_{i}\in \mathbb {R} .} The ℓ 1 {\displaystyle \ell _{1}} regularization problem is sometimes referred to as lasso (least absolute shrinkage and selection operator). Such ℓ 1 {\displaystyle \ell _{1}} regularization problems are interesting because they induce sparse solutions, that is, solutions w {\displaystyle w} to the minimization problem have relatively few nonzero components. Lasso can be seen to be a convex relaxation of the non-convex problem min w ∈ R d 1 n ∑ i = 1 n ( y i − ⟨ w , x i ⟩ ) 2 + λ ‖ w ‖ 0 , {\displaystyle \min _{w\in \mathbb {R} ^{d}}{\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-\langle w,x_{i}\rangle )^{2}+\lambda \|w\|_{0},} where ‖ w ‖ 0 {\displaystyle \|w\|_{0}} denotes the ℓ 0 {\displaystyle \ell _{0}} "norm", which is the number of nonzero entries of the vector w {\displaystyle w} . Sparse solutions are of particular interest in learning theory for interpretability of results: a sparse solution can identify a small number of important factors. === Solving for L1 proximity operator === For simplicity we restrict our attention to the problem where λ = 1 {\displaystyle \lambda =1} . To solve the problem min w ∈ R d 1 n ∑ i = 1 n ( y i − ⟨ w , x i ⟩ ) 2 + ‖ w ‖ 1 , {\displaystyle \min _{w\in \mathbb {R} ^{d}}{\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-\langle w,x_{i}\rangle )^{2}+\|w\|_{1},} we consider our objective function in two parts: a convex, differentiable term F ( w ) = 1 n ∑ i = 1 n ( y i − ⟨ w , x i ⟩ ) 2 {\displaystyle F(w)={\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-\langle w,x_{i}\rangle )^{2}} and a convex function R ( w ) = ‖ w ‖ 1 {\displaystyle R(w)=\|w\|_{1}} . Note that R {\displaystyle R} is not strictly convex. Let us compute the proximity operator for R ( w ) {\displaystyle R(w)} . First we find an alternative characterization of the proximity operator prox R ⁡ ( x ) {\displaystyle \operatorname {prox} _{R}(x)} as follows: u = prox R ⁡ ( x ) ⟺ 0 ∈ ∂ ( R ( u ) + 1 2 ‖ u − x ‖ 2 2 ) ⟺ 0 ∈ ∂ R ( u ) + u − x ⟺ x − u ∈ ∂ R ( u ) . {\displaystyle {\begin{aligned}u=\operatorname {prox} _{R}(x)\iff &0\in \partial \left(R(u)+{\frac {1}{2}}\|u-x\|_{2}^{2}\right)\\\iff &0\in \partial R(u)+u-x\\\iff &x-u\in \partial R(u).\end{aligned}}} For R ( w ) = ‖ w ‖ 1 {\displaystyle R(w)=\|w\|_{1}} it is easy to compute ∂ R ( w ) {\displaystyle \partial R(w)} : the i {\displaystyle i} th entry of ∂ R ( w ) {\displaystyle \partial R(w)} is precisely ∂ | w i | = { 1 , w i > 0 − 1 , w i < 0 [ − 1 , 1 ] , w i = 0. {\displaystyle \partial |w_{i}|={\begin{cases}1,&w_{i}>0\\-1,&w_{i}<0\\\left[-1,1\right],&w_{i}=0.\end{cases}}} Using the recharacterization of the proximity operator given above, for the choice of R ( w ) = ‖ w ‖ 1 {\displaystyle R(w)=\|w\|_{1}} and γ > 0 {\displaystyle \gamma >0} we have that prox γ R ⁡ ( x ) {\displaystyle \operatorname {prox} _{\gamma R}(x)} is defined entrywise by ( prox γ R ⁡ ( x ) ) i = { x i − γ , x i > γ 0 , | x i | ≤ γ x i + γ , x i < − γ , {\displaystyle \left(\operatorname {prox} _{\gamma R}(x)\right)_{i}={\begin{cases}x_{i}-\gamma ,&x_{i}>\gamma \\0,&|x_{i}|\leq \gamma \\x_{i}+\gamma ,&x_{i}<-\gamma ,\end{cases}}} which is known as the soft thresholding operator S γ ( x ) = prox γ ‖ ⋅ ‖ 1 ⁡ ( x ) {\displaystyle S_{\gamma }(x)=\operatorname {prox} _{\gamma \|\cdot \|_{1}}(x)} . === Fixed point iterative schemes === To finally solve the lasso problem we consider the fixed point equation shown earlier: x ∗ = prox γ R ⁡ ( x ∗ − γ ∇ F ( x ∗ ) ) . {\displaystyle x^{}=\operatorname {prox} _{\gamma R}\left(x^{}-\gamma \nabla F(x^{})\right).} Given that we have computed the form of the proximity operator explicitly, then we can define a standard fixed point iteration procedure. Namely, fix some initial w 0 ∈ R d {\displaystyle w^{0}\in \mathbb {R} ^{d}} , and for k = 1 , 2 , … {\displaystyle k=1,2,\ldots } define w k + 1 = S γ ( w k − γ ∇ F ( w k ) ) . {\displaystyle w^{k+1}=S_{\gamma }\left(w^{k}-\gamma \nabla F\l

    Read more →
  • Multi-task learning

    Multi-task learning

    Multi-task learning (MTL) is a subfield of machine learning in which multiple learning tasks are solved at the same time, while exploiting commonalities and differences across tasks. This can result in improved learning efficiency and prediction accuracy for the task-specific models, when compared to training the models separately. Inherently, Multi-task learning is a multi-objective optimization problem having trade-offs between different tasks. Early versions of MTL were called "hints". In a widely cited 1997 paper, Rich Caruana gave the following characterization:Multitask Learning is an approach to inductive transfer that improves generalization by using the domain information contained in the training signals of related tasks as an inductive bias. It does this by learning tasks in parallel while using a shared representation; what is learned for each task can help other tasks be learned better. In the classification context, MTL aims to improve the performance of multiple classification tasks by learning them jointly. One example is a spam-filter, which can be treated as distinct but related classification tasks across different users. To make this more concrete, consider that different people have different distributions of features which distinguish spam emails from legitimate ones, for example an English speaker may find that all emails in Russian are spam, not so for Russian speakers. Yet there is a definite commonality in this classification task across users, for example one common feature might be text related to money transfer. Solving each user's spam classification problem jointly via MTL can let the solutions inform each other and improve performance. Further examples of settings for MTL include multiclass classification and multi-label classification. Multi-task learning works because regularization induced by requiring an algorithm to perform well on a related task can be superior to regularization that prevents overfitting by penalizing all complexity uniformly. One situation where MTL may be particularly helpful is if the tasks share significant commonalities and are generally slightly under sampled. However, as discussed below, MTL has also been shown to be beneficial for learning unrelated tasks. == Methods == The key challenge in multi-task learning, is how to combine learning signals from multiple tasks into a single model. This may strongly depend on how well different task agree with each other, or contradict each other. There are several ways to address this challenge: === Task grouping and overlap === Within the MTL paradigm, information can be shared across some or all of the tasks. Depending on the structure of task relatedness, one may want to share information selectively across the tasks. For example, tasks may be grouped or exist in a hierarchy, or be related according to some general metric. Suppose, as developed more formally below, that the parameter vector modeling each task is a linear combination of some underlying basis. Similarity in terms of this basis can indicate the relatedness of the tasks. For example, with sparsity, overlap of nonzero coefficients across tasks indicates commonality. A task grouping then corresponds to those tasks lying in a subspace generated by some subset of basis elements, where tasks in different groups may be disjoint or overlap arbitrarily in terms of their bases. Task relatedness can be imposed a priori or learned from the data. Hierarchical task relatedness can also be exploited implicitly without assuming a priori knowledge or learning relations explicitly. For example, the explicit learning of sample relevance across tasks can be done to guarantee the effectiveness of joint learning across multiple domains. === Exploiting unrelated tasks: Auxiliary learning === In auxiliary learning, one attempts learning a group of principal tasks using a group of auxiliary tasks, unrelated to the principal ones. With the right unrelated tasks, joint learning of unrelated tasks which use the same input data have been shown to be beneficial, and provide significant improvement over standard MTL. The reason is that prior knowledge about task relatedness can lead to sparser and more informative representations for each task grouping, essentially by screening out idiosyncrasies of the data distribution. It has been proposed to build on a prior multitask methodology by favoring a shared low-dimensional representation within each task grouping, and imposing a penalty on tasks from different groups which encourages the two representations to be orthogonal. Learning with auxiliary unrelated tasks poses two major challenges: Finding useful auxiliary tasks and combining losses of all tasks in a useful way. Some methods can learn these from data together with the training process, and combine tasks efficiently. === Transfer of knowledge === Related to multi-task learning is the concept of knowledge transfer. Whereas traditional multi-task learning implies that a shared representation is developed concurrently across tasks, transfer of knowledge implies a sequentially shared representation. Large scale machine learning projects such as the deep convolutional neural network GoogLeNet, an image-based object classifier, can develop robust representations which may be useful to further algorithms learning related tasks. For example, the pre-trained model can be used as a feature extractor to perform pre-processing for another learning algorithm. Or the pre-trained model can be used to initialize a model with similar architecture which is then fine-tuned to learn a different classification task. === Multiple non-stationary tasks === Traditionally Multi-task learning and transfer of knowledge are applied to stationary learning settings. Their extension to non-stationary environments is termed Group online adaptive learning (GOAL). Sharing information could be particularly useful if learners operate in continuously changing environments, because a learner could benefit from previous experience of another learner to quickly adapt to their new environment. Such group-adaptive learning has numerous applications, from predicting financial time-series, through content recommendation systems, to visual understanding for adaptive autonomous agents. === Multi-task optimization === Multi-task optimization focuses on solving optimizing the whole process. The paradigm has been inspired by the well-established concepts of transfer learning and multi-task learning in predictive analytics. The key motivation behind multi-task optimization is that if optimization tasks are related to each other in terms of their optimal solutions or the general characteristics of their function landscapes, the search progress can be transferred to substantially accelerate the search on the other. The success of the paradigm is not necessarily limited to one-way knowledge transfers from simpler to more complex tasks. In practice an attempt is to intentionally solve a more difficult task that may unintentionally solve several smaller problems. There is a direct relationship between multitask optimization and multi-objective optimization. In some cases, the simultaneous training of seemingly related tasks may hinder performance compared to single-task models. Commonly, MTL models employ task-specific modules on top of a joint feature representation obtained using a shared module. Since this joint representation must capture useful features across all tasks, MTL may hinder individual task performance if the different tasks seek conflicting representation, i.e., the gradients of different tasks point to opposing directions or differ significantly in magnitude. This phenomenon is commonly referred to as negative transfer. To mitigate this issue, various MTL optimization methods have been proposed. It has been reported that meta-knowledge transfer could help avoid negative transfer.Besides, the per-task gradients are combined into a joint update direction through various aggregation algorithms or heuristics. There are several common approaches for multi-task optimization: Bayesian optimization, evolutionary computation, and approaches based on Game theory. ==== Multi-task Bayesian optimization ==== Multi-task Bayesian optimization is a modern model-based approach that leverages the concept of knowledge transfer to speed up the automatic hyperparameter optimization process of machine learning algorithms. The method builds a multi-task Gaussian process model on the data originating from different searches progressing in tandem. The captured inter-task dependencies are thereafter utilized to better inform the subsequent sampling of candidate solutions in respective search spaces. ==== Evolutionary multi-tasking ==== Evolutionary multi-tasking has been explored as a means of exploiting the implicit parallelism of population-based search algorithms to simultaneously progress multiple distinct optimization tasks. By mapping all task

    Read more →
  • Space-based data center

    Space-based data center

    Space-based data centers or orbital AI infrastructure are proposed concepts to build AI data centers in the sun-synchronous orbit or other orbits utilizing space-based solar power. Electric power has become the main bottleneck for terrestrial AI infrastructure. Space-based edge computing has historical roots in military architectures designed to bypass the latency of ground-based targeting networks. In the 1980s, the Strategic Defense Initiative's Brilliant Pebbles program first envisioned autonomous on-orbit data processing for missile defense. In 2019, the Space Development Agency (SDA) began to revive this decentralized approach through its Proliferated Warfighter Space Architecture (PWSA). This ambitious "sensor-to-shooter" infrastructure is treated as a prerequisite for the modern Golden Dome program, which would rely on space-based data processing to continuously track targets. == History == Early thinking about space-based computing infrastructure grew out of mid-20th-century visions for large orbital industrial systems, most notably proposals for space-based solar power, which were popularized in both technical literature and science writing by figures such as Isaac Asimov in the 1940s. These ideas emphasized exploiting the vacuum, continuous solar energy, and thermal characteristics of space to support power-intensive activities that would be difficult or inefficient on Earth. In the 21st century, advances in small satellites, reusable launch vehicles, and high-performance computing revived interest in space-based data centers, with governments and private companies exploring orbital or near-space platforms for edge computing, secure data handling, and low-latency processing of Earth-observation data. In September 2024, Y Combinator-backed Starcloud released a white paper detailing plans to build multiple gigawatts of AI compute in orbit. It was the first widely cited proposal to actually start building large orbital data centers. In 2025, Starcloud deployed an NVIDIA H100-class system and became the first company to train an LLM in space and run a version of Google Gemini in space. In March 2025, Lonestar deployed a data backup machine on the surface of the moon. In early January 2026, a team from the University of Pennsylvania presented a tether-based architecture for orbital data centers at the AIAA SciTech conference. The design relied on gravity gradient tension and solar-pressure-based passive attitude stabilization to minimize the mass of MW-scale orbital data centers. In January 2026, SpaceX filed plans with the Federal Communications Commission (FCC) for millions of satellites, leveraging reusable launches and Starlink integration to extend cloud and AI computing into orbit. Around the same time, Blue Origin announced the TeraWave constellation of about 5,400 satellites, designed to provide high‑throughput networking for data centers, enterprise, and government customers. Meanwhile, China announced a 200,000‑satellite constellation, focusing on state coordination, data sovereignty, and in-orbit processing for secure, time-critical applications. In February 2026, Starcloud submitted a proposal to the FCC for a constellation of up to 88,000 satellites for orbital data centers. In March, it announced intentions to be the first to mine Bitcoin in space, flying bitcoin mining ASICs on its second satellite, Starcloud-2. In May 2026, Edge Aerospace was awarded a contract by the European Space Agency under its Space Cloud program to study use cases, architectures and implementation roadmap for orbital data centers. == Feasibility == In October 2025, Nature Electronics published a study led by a research group at Nanyang Technological University on the development of carbon-neutral data centres in space. In November 2025, Google published a feasibility study on space-based data centers. The authors argued that if launch costs to low earth orbit reached US$200/kg, the launch cost for data center satellites could be cost effective relative to current energy costs for ground-based data centers. They project this may occur around 2035 if SpaceX's Starship project scales to 180 launches/year by then. == Advantages == Some sun-synchronous orbit (SSO) planes have constant sunlight in the dawn/dusk which could provide continuous solar energy. SSO is a limited resource and proper management and sharing of it is required. Solar irradiance is 36% higher in Earth orbit than on the surface No Earth weather storms or clouds, however more exposed to Solar storms. No property tax or land-use regulation. Saves space for other land use. Ample space for scalability. Won't strain the power grid. Direct access to power source without additional infrastructure. == Disadvantages == The deployment of space-based data centers raises several technical, economic, and environmental concerns. Existing launch costs are substantial and remains main cost of space infrastructure deployment Cooling is limited to heat dissipation through radiation only, which made in inefficient in comparison to convection in terrestrial data centers Space infrastructure must be designed to survive launch and to work under environment conditions of radiation, wide range of temperatures, in vacuum and in microgravity In-space assembly is on early development stage to enable deployment of mega-structures Megastructures are particularly exposed to orbital debris Solar arrays efficiency decrease 0.5% to 0.8% per year due to exposure of ultraviolet rays, space weather and orbital thermal cycles Hardware is designed for limited lifespan. Maintenance and repair in space (known as On-Orbit Servicing (OOS)) is still on early stage of practical implementation. Disposable data centre: technology obsolescence of AI data centre being a concern and difficult maintenance in space imply the single-use purpose of those space data centres. To extend lifetime, space infrastructure will require either refueling or orbit rasie by the servicer, which is going to increase its operational costs The environmental impact on Earth has its own challenges: The environmental impact of launches need to be addressed. Deployment consumes Earth resources that cannot be recovered or recycled. Computers require lots of resources, some of which are strategic. Recycling e-waste is already a challenge on Earth and extremely unlikely in space. Space debris (orbit pollution) is another sustainability challenge for space: Orbits are, like any resources, a limited physical and electromagnetic resource and available for all mankind. The accumulation of satellites on a particular orbit reduces the use of space for other purposes. A consequence of the increase of satellite in orbit is a higher risk of the runaway of space debris (see Kessler syndrome). This means some orbits could become unusable. Latency and bandwidth are constrained in space, and consumes limited electromagnetic resources. Satellite flares could inhibit ground-based and space-based observational astronomy. == Size and power generated == It would take ~1 square mile solar array in earth orbit to produce 1 gigawatt of power at 30% cell efficiency. == Companies pursuing space-based AI infrastructure == Blue Origin Cowboy Space Corporation (formerly Aetherflux) Edge Aerospace Google – Project Suncatcher Nvidia OpenAI SpaceX Starcloud

    Read more →
  • Exploration–exploitation dilemma

    Exploration–exploitation dilemma

    The exploration–exploitation dilemma, also known as the explore–exploit tradeoff, is a fundamental concept in decision-making that arises in many domains. It is depicted as the balancing act between two opposing strategies. Exploitation involves choosing the best option based on current knowledge of the system (which may be incomplete or misleading), while exploration involves trying out new options that may lead to better outcomes in the future at the expense of an exploitation opportunity. Finding the optimal balance between these two strategies is a crucial challenge in many decision-making problems whose goal is to maximize long-term benefits. == Application in machine learning == In the context of machine learning, the exploration–exploitation tradeoff is fundamental in reinforcement learning (RL), a type of machine learning that involves training agents to make decisions based on feedback from the environment. Crucially, this feedback may be incomplete or delayed. The agent must decide whether to exploit the current best-known policy or explore new policies to improve its performance. === Multi-armed bandit methods === The multi-armed bandit (MAB) problem was a classic example of the tradeoff, and many methods were developed for it, such as epsilon-greedy, Thompson sampling, and the upper confidence bound (UCB). See the page on MAB for details. In more complex RL situations than the MAB problem, the agent can treat each choice as a MAB, where the payoff is the expected future reward. For example, if the agent performs an epsilon-greedy method, then the agent will often "pull the best lever" by picking the action that had the best predicted expected reward (exploit). However, it would pick a random action with probability epsilon (explore). Monte Carlo tree search, for example, uses a variant of the UCB method. === Exploration problems === There are some problems that make exploration difficult. Sparse reward. If rewards occur only once a long while, then the agent might not persist in exploring. Furthermore, if the space of actions is large, then the sparse reward would mean the agent would not be guided by the reward to find a good direction for deeper exploration. A standard example is Montezuma's Revenge. Deceptive reward. If some early actions give immediate small reward, but other actions give later large reward, then the agent might be lured away from exploring the other actions. Noisy TV problem. If certain observations are irreducibly noisy (such as a television showing random images), then the agent might be trapped exploring those observations (watching the television). === Exploration reward === This section based on. The exploration reward (also called exploration bonus) methods convert the exploration-exploitation dilemma into a balance of exploitations. That is, instead of trying to get the agent to balance exploration and exploitation, exploration is simply treated as another form of exploitation, and the agent simply attempts to maximize the sum of rewards from exploration and exploitation. The exploration reward can be treated as a form of intrinsic reward. We write these as r t i , r t e {\displaystyle r_{t}^{i},r_{t}^{e}} , meaning the intrinsic and extrinsic rewards at time step t {\displaystyle t} . However, exploration reward is different from exploitation in two regards: The reward of exploitation is not freely chosen, but given by the environment, but the reward of exploration may be picked freely. Indeed, there are many different ways to design r t i {\displaystyle r_{t}^{i}} described below. The reward of exploitation is usually stationary (i.e. the same action in the same state gives the same reward), but the reward of exploration is non-stationary (i.e. the same action in the same state should give less and less reward). Count-based exploration uses N n ( s ) {\displaystyle N_{n}(s)} , the number of visits to a state s {\displaystyle s} during the time-steps 1 : n {\displaystyle 1:n} , to calculate the exploration reward. This is only possible in small and discrete state space. Density-based exploration extends count-based exploration by using a density model ρ n ( s ) {\displaystyle \rho _{n}(s)} . The idea is that, if a state has been visited, then nearby states are also partly-visited. In maximum entropy exploration, the entropy of the agent's policy π {\displaystyle \pi } is included as a term in the intrinsic reward. That is, r t i = − ∑ a π ( a | s t ) ln ⁡ π ( a | s t ) + ⋯ {\displaystyle r_{t}^{i}=-\sum _{a}\pi (a|s_{t})\ln \pi (a|s_{t})+\cdots } . === Prediction-based === This section based on. The forward dynamics model is a function for predicting the next state based on the current state and the current action: f : ( s t , a t ) ↦ s t + 1 {\displaystyle f:(s_{t},a_{t})\mapsto s_{t+1}} . The forward dynamics model is trained as the agent plays. The model becomes better at predicting state transition for state-action pairs that had been done many times. A forward dynamics model can define an exploration reward by r t i = ‖ f ( s t , a t ) − s t + 1 ‖ 2 2 {\displaystyle r_{t}^{i}=\|f(s_{t},a_{t})-s_{t+1}\|_{2}^{2}} . That is, the reward is the squared-error of the prediction compared to reality. This rewards the agent to perform state-action pairs that had not been done many times. This is however susceptible to the noisy TV problem. Dynamics model can be run in latent space. That is, r t i = ‖ f ( s t , a t ) − ϕ ( s t + 1 ) ‖ 2 2 {\displaystyle r_{t}^{i}=\|f(s_{t},a_{t})-\phi (s_{t+1})\|_{2}^{2}} for some featurizer ϕ {\displaystyle \phi } . The featurizer can be the identity function (i.e. ϕ ( x ) = x {\displaystyle \phi (x)=x} ), randomly generated, the encoder-half of a variational autoencoder, etc. A good featurizer improves forward dynamics exploration. The Intrinsic Curiosity Module (ICM) method trains simultaneously a forward dynamics model and a featurizer. The featurizer is trained by an inverse dynamics model, which is a function for predicting the current action based on the features of the current and the next state: g : ( ϕ ( s t ) , ϕ ( s t + 1 ) ) ↦ a t {\displaystyle g:(\phi (s_{t}),\phi (s_{t+1}))\mapsto a_{t}} . By optimizing the inverse dynamics, both the inverse dynamics model and the featurizer are improved. Then, the improved featurizer improves the forward dynamics model, which improves the exploration of the agent. Random Network Distillation (RND) method attempts to solve this problem by teacher–student distillation. Instead of a forward dynamics model, it has two models f , f ′ {\displaystyle f,f'} . The f ′ {\displaystyle f'} teacher model is fixed, and the f {\displaystyle f} student model is trained to minimize ‖ f ( s ) − f ′ ( s ) ‖ 2 2 {\displaystyle \|f(s)-f'(s)\|_{2}^{2}} on states s {\displaystyle s} . As a state is visited more and more, the student network becomes better at predicting the teacher. Meanwhile, the prediction error is also an exploration reward for the agent, and so the agent learns to perform actions that result in higher prediction error. Thus, we have a student network attempting to minimize the prediction error, while the agent attempting to maximize it, resulting in exploration. The states are normalized by subtracting a running average and dividing a running variance, which is necessary since the teacher model is frozen. The rewards are normalized by dividing with a running variance. Exploration by disagreement trains an ensemble of forward dynamics models, each on a random subset of all ( s t , a t , s t + 1 ) {\displaystyle (s_{t},a_{t},s_{t+1})} tuples. The exploration reward is the variance of the models' predictions. === Noise === For neural network–based agents, the NoisyNet method changes some of its neural network modules by noisy versions. That is, some network parameters are random variables from a probability distribution. The parameters of the distribution are themselves learnable. For example, in a linear layer y = W x + b {\displaystyle y=Wx+b} , both W , b {\displaystyle W,b} are sampled from Gaussian distributions N ( μ W , Σ W ) , N ( μ b , Σ b ) {\displaystyle {\mathcal {N}}(\mu _{W},\Sigma _{W}),{\mathcal {N}}(\mu _{b},\Sigma _{b})} at every step, and the parameters μ W , Σ W , μ b , Σ b {\displaystyle \mu _{W},\Sigma _{W},\mu _{b},\Sigma _{b}} are learned via the reparameterization trick.

    Read more →
  • Psychology of reasoning

    Psychology of reasoning

    The psychology of reasoning (also known as the cognitive science of reasoning) is the study of how people reason, often broadly defined as the process of drawing conclusions to inform how people solve problems and make decisions. It overlaps with psychology, philosophy, linguistics, cognitive science, artificial intelligence, logic, and probability theory. Psychological experiments on how humans and other animals reason have been carried out for over 100 years. An enduring question is whether or not people have the capacity to be rational. Current research in this area addresses various questions about reasoning, rationality, judgments, intelligence, relationships between emotion and reasoning, and development. == Everyday reasoning == One of the most obvious areas in which people employ reasoning is with sentences in everyday language. Most experimentation on deduction has been carried out on hypothetical thought, in particular, examining how people reason about conditionals, e.g., If A then B. Participants in experiments make the modus ponens inference, given the indicative conditional If A then B, and given the premise A, they conclude B. However, given the indicative conditional and the minor premise for the modus tollens inference, not-B, about half of the participants in experiments conclude not-A and the remainder concludes that nothing follows. The ease with which people make conditional inferences is affected by context, as demonstrated in the well-known selection task developed by Peter Wason. Participants are better able to test a conditional in an ecologically relevant context, e.g., if the envelope is sealed then it must have a 50 cent stamp on it compared to one that contains symbolic content, e.g., if the letter is a vowel then the number is even. Background knowledge can also lead to the suppression of even the simple modus ponens inference Participants given the conditional if Lisa has an essay to write then she studies late in the library and the premise Lisa has an essay to write make the modus ponens inference 'she studies late in the library', but the inference is suppressed when they are also given a second conditional if the library stays open then she studies late in the library. Interpretations of the suppression effect are controversial Other investigations of propositional inference examine how people think about disjunctive alternatives, e.g., A or else B, and how they reason about negation, e.g., It is not the case that A and B. Many experiments have been carried out to examine how people make relational inferences, including comparisons, e.g., A is better than B. Such investigations also concern spatial inferences, e.g. A is in front of B and temporal inferences, e.g. A occurs before B. Other common tasks include categorical syllogisms, used to examine how people reason about quantifiers such as All or Some, e.g., Some of the A are not B. For example if all A are B and some B are C, what (if anything) follows? == Theories of reasoning == There are several alternative theories of the cognitive processes that human reasoning is based on. One view is that people rely on a mental logic consisting of formal (abstract or syntactic) inference rules similar to those developed by logicians in the propositional calculus. Another view is that people rely on domain-specific or content-sensitive rules of inference. A third view is that people rely on mental models, that is, mental representations that correspond to imagined possibilities. A fourth view is that people compute probabilities. One controversial theoretical issue is the identification of an appropriate competence model, or a standard against which to compare human reasoning. Initially classical logic was chosen as a competence model. Subsequently, some researchers opted for non-monotonic logic and Bayesian probability. Research on mental models and reasoning has led to the suggestion that people are rational in principle but err in practice. Connectionist approaches towards reasoning have also been proposed. Despite the ongoing debate about the cognitive processes involved in human reasoning, recent research has shown that multiple approaches can be useful in modeling human thinking. For instance, studies have found that people's reasoning is often influenced by their prior beliefs, which can be modeled using Bayesian probability theory. Additionally, research on mental models has shown that people tend to reason about problems by constructing multiple mental representations of the situation, which can help them to identify relevant features and make inferences based on their understanding of the problem. Moreover, connectionist approaches to reasoning have also gained attention, which focus on the neural network models that can learn from data and generalize to new situations. == Development of reasoning == It is an active question in psychology how, why, and when the ability to reason develops from infancy to adulthood. Jean Piaget's theory of cognitive development posited general mechanisms and stages in the development of reasoning from infancy to adulthood. According to the neo-Piagetian theories of cognitive development, changes in reasoning with development come from increasing working memory capacity, increasing speed of processing, and enhanced executive functions and control. Increasing self-awareness is also an important factor. In their book The Enigma of Reason, the cognitive scientists Hugo Mercier and Dan Sperber put forward an "argumentative" theory of reasoning, claiming that humans evolved to reason primarily to justify our beliefs and actions and to convince others in a social environment. Key evidence for their theory includes the errors in reasoning that solitary individuals are prone to when their arguments are not criticized, such as logical fallacies, and how groups become much better at performing cognitive reasoning tasks when they communicate with one another and can evaluate each other's arguments. Sperber and Mercier offer one attempt to resolve the apparent paradox that the confirmation bias is so strong despite the function of reasoning naively appearing to be to come to veridical conclusions about the world. The study of the development of reasoning abilities is an ongoing area of research in psychology, and multiple factors have been proposed to explain how, why, and when reasoning develops from infancy to adulthood. Recent research has suggested that early experiences and social interactions play a critical role in the development of reasoning abilities. For example, studies have shown that infants as young as six months old can engage in basic logical reasoning, such as reasoning about the relationship between objects and their properties. Furthermore, research has highlighted the importance of parental interaction and cognitive stimulation in the development of children's reasoning abilities. Additionally, studies have suggested that cultural factors, such as educational practices and the emphasis on critical thinking, can also influence the development of reasoning skills across different populations. == Different sorts of reasoning == Philip Johnson-Laird trying to taxonomize thought, distinguished between goal-directed thinking and thinking without goal, noting that association was involved in unrelated reading. He argues that goal directed reasoning can be classified based on the problem space involved in a solution, citing Allen Newell and Herbert A. Simon. Inductive reasoning makes broad generalizations from specific cases or observations. In this process of reasoning, general assertions are made based on past specific pieces of evidence. This kind of reasoning allows the conclusion to be false even if the original statement is true. For example, if one observes a college athlete, one makes predictions and assumptions about other college athletes based on that one observation. Scientists use inductive reasoning to create theories and hypotheses. Philip Johnson-Laird distinguished inductive from deductive reasoning, in that the former creates semantic information while the later does not . In opposition, deductive reasoning is a basic form of valid reasoning. In this reasoning process a person starts with a known claim or a general belief and from there asks what follows from these foundations or how will these premises influence other beliefs. In other words, deduction starts with a hypothesis and examines the possibilities to reach a conclusion. Deduction helps people understand why their predictions are wrong and indicates that their prior knowledge or beliefs are off track. An example of deduction can be seen in the scientific method when testing hypotheses and theories. Although the conclusion usually corresponds and therefore proves the hypothesis, there are some cases where the conclusion is logical, but the generalization is not. For example, the argument, "All young girls wear skirts; Julie is a young

    Read more →
  • Similarity learning

    Similarity learning

    Similarity learning is an area of supervised machine learning in artificial intelligence. It is closely related to regression and classification, but the goal is to learn a similarity function that measures how similar or related two objects are. It has applications in ranking, in recommendation systems, visual identity tracking, face verification, and speaker verification. == Learning setup == There are four common setups for similarity and metric distance learning. Regression similarity learning In this setup, pairs of objects are given ( x i 1 , x i 2 ) {\displaystyle (x_{i}^{1},x_{i}^{2})} together with a measure of their similarity y i ∈ R {\displaystyle y_{i}\in R} . The goal is to learn a function that approximates f ( x i 1 , x i 2 ) ∼ y i {\displaystyle f(x_{i}^{1},x_{i}^{2})\sim y_{i}} for every new labeled triplet example ( x i 1 , x i 2 , y i ) {\displaystyle (x_{i}^{1},x_{i}^{2},y_{i})} . This is typically achieved by minimizing a regularized loss min W ∑ i l o s s ( w ; x i 1 , x i 2 , y i ) + r e g ( w ) {\displaystyle \min _{W}\sum _{i}loss(w;x_{i}^{1},x_{i}^{2},y_{i})+reg(w)} . Classification similarity learning Given are pairs of similar objects ( x i , x i + ) {\displaystyle (x_{i},x_{i}^{+})} and non similar objects ( x i , x i − ) {\displaystyle (x_{i},x_{i}^{-})} . An equivalent formulation is that every pair ( x i 1 , x i 2 ) {\displaystyle (x_{i}^{1},x_{i}^{2})} is given together with a binary label y i ∈ { 0 , 1 } {\displaystyle y_{i}\in \{0,1\}} that determines if the two objects are similar or not. The goal is again to learn a classifier that can decide if a new pair of objects is similar or not. Ranking similarity learning Given are triplets of objects ( x i , x i + , x i − ) {\displaystyle (x_{i},x_{i}^{+},x_{i}^{-})} whose relative similarity obey a predefined order: x i {\displaystyle x_{i}} is known to be more similar to x i + {\displaystyle x_{i}^{+}} than to x i − {\displaystyle x_{i}^{-}} . The goal is to learn a function f {\displaystyle f} such that for any new triplet of objects ( x , x + , x − ) {\displaystyle (x,x^{+},x^{-})} , it obeys f ( x , x + ) > f ( x , x − ) {\displaystyle f(x,x^{+})>f(x,x^{-})} (contrastive learning). This setup assumes a weaker form of supervision than in regression, because instead of providing an exact measure of similarity, one only has to provide the relative order of similarity. For this reason, ranking-based similarity learning is easier to apply in real large-scale applications. Locality sensitive hashing (LSH) Hashes input items so that similar items map to the same "buckets" in memory with high probability (the number of buckets being much smaller than the universe of possible input items). It is often applied in nearest neighbor search on large-scale high-dimensional data, e.g., image databases, document collections, time-series databases, and genome databases. A common approach for learning similarity is to model the similarity function as a bilinear form. For example, in the case of ranking similarity learning, one aims to learn a matrix W that parametrizes the similarity function f W ( x , z ) = x T W z {\displaystyle f_{W}(x,z)=x^{T}Wz} . When data is abundant, a common approach is to learn a siamese network – a deep network model with parameter sharing. == Metric learning == Similarity learning is closely related to distance metric learning. Metric learning is the task of learning a distance function over objects. A metric or distance function has to obey four axioms: non-negativity, identity of indiscernibles, symmetry and subadditivity (or the triangle inequality). In practice, metric learning algorithms ignore the condition of identity of indiscernibles and learn a pseudo-metric. When the objects x i {\displaystyle x_{i}} are vectors in R d {\displaystyle R^{d}} , then any matrix W {\displaystyle W} in the symmetric positive semi-definite cone S + d {\displaystyle S_{+}^{d}} defines a distance pseudo-metric of the space of x through the form D W ( x 1 , x 2 ) 2 = ( x 1 − x 2 ) ⊤ W ( x 1 − x 2 ) {\displaystyle D_{W}(x_{1},x_{2})^{2}=(x_{1}-x_{2})^{\top }W(x_{1}-x_{2})} . When W {\displaystyle W} is a symmetric positive definite matrix, D W {\displaystyle D_{W}} is a metric. Moreover, as any symmetric positive semi-definite matrix W ∈ S + d {\displaystyle W\in S_{+}^{d}} can be decomposed as W = L ⊤ L {\displaystyle W=L^{\top }L} where L ∈ R e × d {\displaystyle L\in R^{e\times d}} and e ≥ r a n k ( W ) {\displaystyle e\geq rank(W)} , the distance function D W {\displaystyle D_{W}} can be rewritten equivalently D W ( x 1 , x 2 ) 2 = ( x 1 − x 2 ) ⊤ L ⊤ L ( x 1 − x 2 ) = ‖ L ( x 1 − x 2 ) ‖ 2 2 {\displaystyle D_{W}(x_{1},x_{2})^{2}=(x_{1}-x_{2})^{\top }L^{\top }L(x_{1}-x_{2})=\|L(x_{1}-x_{2})\|_{2}^{2}} . The distance D W ( x 1 , x 2 ) 2 = ‖ x 1 ′ − x 2 ′ ‖ 2 2 {\displaystyle D_{W}(x_{1},x_{2})^{2}=\|x_{1}'-x_{2}'\|_{2}^{2}} corresponds to the Euclidean distance between the transformed feature vectors x 1 ′ = L x 1 {\displaystyle x_{1}'=Lx_{1}} and x 2 ′ = L x 2 {\displaystyle x_{2}'=Lx_{2}} . Many formulations for metric learning have been proposed. Some well-known approaches for metric learning include learning from relative comparisons, which is based on the triplet loss, large margin nearest neighbor, and information theoretic metric learning (ITML). In statistics, the covariance matrix of the data is sometimes used to define a distance metric called Mahalanobis distance. == Applications == Similarity learning is used in information retrieval for learning to rank, in face verification or face identification, and in recommendation systems. Also, many machine learning approaches rely on some metric. This includes unsupervised learning such as clustering, which groups together close or similar objects. It also includes supervised approaches like K-nearest neighbor algorithm which rely on labels of nearby objects to decide on the label of a new object. Metric learning has been proposed as a preprocessing step for many of these approaches. == Scalability == Metric and similarity learning scale quadratically with the dimension of the input space, as can easily see when the learned metric has a bilinear form f W ( x , z ) = x T W z {\displaystyle f_{W}(x,z)=x^{T}Wz} . Scaling to higher dimensions can be achieved by enforcing a sparseness structure over the matrix model, as done with HDSL, and with COMET. == Software == metric-learn is a free software Python library which offers efficient implementations of several supervised and weakly-supervised similarity and metric learning algorithms. The API of metric-learn is compatible with scikit-learn. OpenMetricLearning is a Python framework to train and validate the models producing high-quality embeddings. == Further information == For further information on this topic, see the surveys on metric and similarity learning by Bellet et al. and Kulis.

    Read more →
  • TinyML

    TinyML

    TinyML (short for tiny machine learning) is an area of machine learning that focuses on deploying and running models on low-power, resource-constrained embedded systems such as microcontrollers and edge devices. TinyML supports on-device inference with low latency and minimal reliance on cloud connectivity, which makes it suitable for applications in the Internet of Things (IoT), wearable devices, and real-time systems. == History == The idea of running machine learning models on embedded systems has gained traction in the late 2010s, as model compression, quantization, and efficient neural network architectures progressed. The term TinyML was popularized in 2019 with the publication of the book TinyML by Pete Warden and Daniel Situnayake and the creation of the TinyML Foundation.

    Read more →
  • Computational humor

    Computational humor

    Computational humor is a branch of computational linguistics and artificial intelligence which uses computers in humor research. It is a relatively new area, with the first dedicated conference organized in 1996. The first "computer model of a sense of humor" was suggested by Suslov as early as 1992. Investigation of the general scheme of the information processing show a possibility of a specific malfunction, conditioned by the necessity of a quick deletion from consciousness of a false version. This specific malfunction can be identified with a humorous effect on the psychological grounds; however, an essentially new ingredient, a role of timing, is added to a well known role of ambiguity. In biological systems, a sense of humour inevitably develops in the course of evolution, because its biological function consists in quickening the transmission of processed information into consciousness and in a more effective use of brain resources. A realization of this algorithm in neural networks explains naturally the mechanism of laughter: deletion of a false version corresponds to zeroing of some part of the neural network and excessive energy of neurons is thrown out to the motor cortex, arousing muscular contractions. Unfortunately, a practical realization of this algorithm needs extensive databases, whose creation in the automatic regime was suggested only recently . As a result, this magistral direction was not developed properly and subsequent investigations (see below) accepted somewhat specialized colouring. == Joke generators == === Pun generation === An approach to analysis of humor is classification of jokes. A further step is an attempt to generate jokes basing on the rules that underlie classification. Simple prototypes for computer pun generation were reported in the early 1990s, based on a natural language generator program, VINCI. Graeme Ritchie and Kim Binsted in their 1994 research paper described a computer program, JAPE, designed to generate question-answer-type puns from a general, i.e., non-humorous, lexicon. (The program name is an acronym for "Joke Analysis and Production Engine".) Some examples produced by JAPE are: Q: What is the difference between leaves and a car? A: One you brush and rake, the other you rush and brake. Q: What do you call a strange market? A: A bizarre bazaar. Since then the approach has been improved, and the latest report, dated 2007, describes the STANDUP joke generator, implemented in the Java programming language. The STANDUP generator was tested on children within the framework of analyzing its usability for language skills development for children with communication disabilities, e.g., because of cerebral palsy. (The project name is an acronym for "System To Augment Non-speakers' Dialog Using Puns" and an allusion to standup comedy.) Children responded to this "language playground" with enthusiasm, and showed marked improvement on certain types of language tests. The two young people, who used the system over a ten-week period, regaled their peers, staff, family and neighbors with jokes such as: "What do you call a spicy missile? A hot shot!" Their joy and enthusiasm at entertaining others was inspirational. === Other === Stock and Strapparava described a program to generate funny acronyms. == Joke recognition == A statistical machine learning algorithm to detect whether a sentence contained a "That's what she said" double entendre was developed by Kiddon and Brun (2011). There is an open-source Python implementation of Kiddon & Brun's TWSS system. A program to recognize knock-knock jokes was reported by Taylor and Mazlack. This kind of research is important in analysis of human–computer interaction. An application of machine learning techniques for the distinguishing of joke texts from non-jokes was described by Mihalcea and Strapparava (2006). Takizawa et al. (1996) reported on a heuristic program for detecting puns in the Japanese language. == Applications == A possible application for assistance in language acquisition is described in the section "Pun generation". Another envisioned use of joke generators is in cases of a steady supply of jokes where quantity is more important than quality. Another obvious, yet remote, direction is automated joke appreciation. It is known that humans interact with computers in ways similar to interacting with other humans that may be described in terms of personality, politeness, flattery, and in-group favoritism. Therefore, the role of humor in human–computer interaction is being investigated. In particular, humor generation in user interface to ease communications with computers was suggested. Craig McDonough implemented the Mnemonic Sentence Generator, which converts passwords into humorous sentences. Based on the incongruity theory of humor, it is suggested that the resulting meaningless but funny sentences are easier to remember. For example, the password AjQA3Jtv is converted into "Arafat joined Quayle's Ant, while TARAR Jeopardized thurmond's vase," an example chosen by combining politicians names with verbs and common nouns. == Related research == John Allen Paulos is known for his interest in mathematical foundations of humor. His book Mathematics and Humor: A Study of the Logic of Humor demonstrates structures common to humor and formal sciences (mathematics, linguistics) and develops a mathematical model of jokes based on catastrophe theory. Conversational systems which have been designed to take part in Turing test competitions generally have the ability to learn humorous anecdotes and jokes. Because many people regard humor as something particular to humans, its appearance in conversation can be quite useful in convincing a human interrogator that a hidden entity, which could be a machine or a human, is in fact a human.

    Read more →
  • Artificial intelligence in spirituality

    Artificial intelligence in spirituality

    Some users of artificial intelligence (AI) technologies, especially chatbots, may develop beliefs that AI has or can attain supernatural or spiritual powers. AI models such as ChatGPT are turned to for fortune telling, mysticism and remote viewing. Recent and sudden advances in large language models have led to folk myths about their origin or capabilities, as well as their deification or worship by some users. Tucker Carlson has made similar claims, including directly to Sam Altman. Pope Leo XIV advised priests against using LLM models when it came to the creation of sermons.

    Read more →
  • Inductive probability

    Inductive probability

    Inductive probability attempts to give the probability of future events based on past events. It is the basis for inductive reasoning, and gives the mathematical basis for learning and the perception of patterns. It is a source of knowledge about the world. There are three sources of knowledge: inference, communication, and deduction. Communication relays information found using other methods. Deduction establishes new facts based on existing facts. Inference establishes new facts from data. Its basis is Bayes' theorem. Information describing the world is written in a language. For example, a simple mathematical language of propositions may be chosen. Sentences may be written down in this language as strings of characters. But in the computer it is possible to encode these sentences as strings of bits (1s and 0s). Then the language may be encoded so that the most commonly used sentences are the shortest. This internal language implicitly represents probabilities of statements. Occam's razor says the "simplest theory, consistent with the data is most likely to be correct". The "simplest theory" is interpreted as the representation of the theory written in this internal language. The theory with the shortest encoding in this internal language is most likely to be correct. == History == Probability and statistics was focused on probability distributions and tests of significance. Probability was formal, well defined, but limited in scope. In particular its application was limited to situations that could be defined as an experiment or trial, with a well defined population. Bayes's theorem is named after Rev. Thomas Bayes 1701–1761. Bayesian inference broadened the application of probability to many situations where a population was not well defined. But Bayes' theorem always depended on prior probabilities, to generate new probabilities. It was unclear where these prior probabilities should come from. Ray Solomonoff developed algorithmic probability which gave an explanation for what randomness is and how patterns in the data may be represented by computer programs, that give shorter representations of the data circa 1964. Chris Wallace and D. M. Boulton developed minimum message length circa 1968. Later Jorma Rissanen developed the minimum description length circa 1978. These methods allow information theory to be related to probability, in a way that can be compared to the application of Bayes' theorem, but which give a source and explanation for the role of prior probabilities. Marcus Hutter combined decision theory with the work of Ray Solomonoff and Andrey Kolmogorov to give a theory for the Pareto optimal behavior for an Intelligent agent, circa 1998. === Minimum description/message length === The program with the shortest length that matches the data is the most likely to predict future data. This is the thesis behind the minimum message length and minimum description length methods. At first sight Bayes' theorem appears different from the minimimum message/description length principle. At closer inspection it turns out to be the same. Bayes' theorem is about conditional probabilities, and states the probability that event B happens if firstly event A happens: P ( A ∧ B ) = P ( B ) ⋅ P ( A | B ) = P ( A ) ⋅ P ( B | A ) {\displaystyle P(A\land B)=P(B)\cdot P(A|B)=P(A)\cdot P(B|A)} becomes in terms of message length L, L ( A ∧ B ) = L ( B ) + L ( A | B ) = L ( A ) + L ( B | A ) . {\displaystyle L(A\land B)=L(B)+L(A|B)=L(A)+L(B|A).} This means that if all the information is given describing an event then the length of the information may be used to give the raw probability of the event. So if the information describing the occurrence of A is given, along with the information describing B given A, then all the information describing A and B has been given. ==== Overfitting ==== Overfitting occurs when the model matches the random noise and not the pattern in the data. For example, take the situation where a curve is fitted to a set of points. If a polynomial with many terms is fitted then it can more closely represent the data. Then the fit will be better, and the information needed to describe the deviations from the fitted curve will be smaller. Smaller information length means higher probability. However, the information needed to describe the curve must also be considered. The total information for a curve with many terms may be greater than for a curve with fewer terms, that has not as good a fit, but needs less information to describe the polynomial. === Inference based on program complexity === Solomonoff's theory of inductive inference is also inductive inference. A bit string x is observed. Then consider all programs that generate strings starting with x. Cast in the form of inductive inference, the programs are theories that imply the observation of the bit string x. The method used here to give probabilities for inductive inference is based on Solomonoff's theory of inductive inference. ==== Detecting patterns in the data ==== If all the bits are 1, then people infer that there is a bias in the coin and that it is more likely also that the next bit is 1 also. This is described as learning from, or detecting a pattern in the data. Such a pattern may be represented by a computer program. A short computer program may be written that produces a series of bits which are all 1. If the length of the program K is L ( K ) {\displaystyle L(K)} bits then its prior probability is, P ( K ) = 2 − L ( K ) {\displaystyle P(K)=2^{-L(K)}} The length of the shortest program that represents the string of bits is called the Kolmogorov complexity. Kolmogorov complexity is not computable. This is related to the halting problem. When searching for the shortest program some programs may go into an infinite loop. ==== Considering all theories ==== The Greek philosopher Epicurus is quoted as saying "If more than one theory is consistent with the observations, keep all theories". As in a crime novel all theories must be considered in determining the likely murderer, so with inductive probability all programs must be considered in determining the likely future bits arising from the stream of bits. Programs that are already longer than n have no predictive power. The raw (or prior) probability that the pattern of bits is random (has no pattern) is 2 − n {\displaystyle 2^{-n}} . Each program that produces the sequence of bits, but is shorter than the n is a theory/pattern about the bits with a probability of 2 − k {\displaystyle 2^{-k}} where k is the length of the program. The probability of receiving a sequence of bits y after receiving a series of bits x is then the conditional probability of receiving y given x, which is the probability of x with y appended, divided by the probability of x. ==== Universal priors ==== The programming language affects the predictions of the next bit in the string. The language acts as a prior probability. This is particularly a problem where the programming language codes for numbers and other data types. Intuitively we think that 0 and 1 are simple numbers, and that prime numbers are somehow more complex than numbers that may be composite. Using the Kolmogorov complexity gives an unbiased estimate (a universal prior) of the prior probability of a number. As a thought experiment an intelligent agent may be fitted with a data input device giving a series of numbers, after applying some transformation function to the raw numbers. Another agent might have the same input device with a different transformation function. The agents do not see or know about these transformation functions. Then there appears no rational basis for preferring one function over another. A universal prior insures that although two agents may have different initial probability distributions for the data input, the difference will be bounded by a constant. So universal priors do not eliminate an initial bias, but they reduce and limit it. Whenever we describe an event in a language, either using a natural language or other, the language has encoded in it our prior expectations. So some reliance on prior probabilities are inevitable. A problem arises where an intelligent agent's prior expectations interact with the environment to form a self reinforcing feed back loop. This is the problem of bias or prejudice. Universal priors reduce but do not eliminate this problem. === Universal artificial intelligence === The theory of universal artificial intelligence applies decision theory to inductive probabilities. The theory shows how the best actions to optimize a reward function may be chosen. The result is a theoretical model of intelligence. It is a fundamental theory of intelligence, which optimizes the agents behavior in, Exploring the environment; performing actions to get responses that broaden the agents knowledge. Competing or co-operating with another agent; games. Balancing short and long term rewards. In general no agent will always provi

    Read more →
  • Real-time computer graphics

    Real-time computer graphics

    Real-time computer graphics or real-time rendering is the sub-field of computer graphics focused on producing and analyzing images in real time. The term can refer to anything from rendering an application's graphical user interface (GUI) to real-time image analysis, but is most often used in reference to interactive 3D computer graphics, typically using a graphics processing unit (GPU). One example of this concept is a video game that rapidly renders changing 3D environments to produce an illusion of motion. Computers have been capable of generating 2D images such as simple lines, images and polygons in real time since their invention. However, quickly rendering detailed 3D objects is a daunting task for traditional Von Neumann architecture-based systems. An early workaround to this problem was the use of sprites, 2D images that could imitate 3D graphics. Different techniques for rendering now exist, such as ray-tracing and rasterization. Using these techniques and advanced hardware, computers can now render images quickly enough to create the illusion of motion while simultaneously accepting user input. This means that the user can respond to rendered images in real time, producing an interactive experience. == Principles of real-time 3D computer graphics == The goal of computer graphics is to generate computer-generated images, or frames, using certain desired metrics. One such metric is the number of frames generated in a given second. Real-time computer graphics systems differ from traditional (i.e., non-real-time) rendering systems in that non-real-time graphics typically rely on ray tracing. In this process, millions or billions of rays are traced from the camera to the world for detailed rendering—this expensive operation can take hours or days to render a single frame. Real-time graphics systems must render each image in less than 1/30th of a second. Ray tracing is far too slow for these systems; instead, they employ the technique of z-buffer triangle rasterization. In this technique, every object is decomposed into individual primitives, usually triangles. Each triangle gets positioned, rotated and scaled on the screen, and rasterizer hardware (or a software emulator) generates pixels inside each triangle. These triangles are then decomposed into atomic units called fragments that are suitable for displaying on a display screen. The fragments are drawn on the screen using a color that is computed in several steps. For example, a texture can be used to "paint" a triangle based on a stored image, and then shadow mapping can alter that triangle's colors based on line-of-sight to light sources. === Video game graphics === Real-time graphics optimizes image quality subject to time and hardware constraints. GPUs and other advances increased the image quality that real-time graphics can produce. GPUs are capable of handling millions of triangles per frame, and modern DirectX/OpenGL class hardware is capable of generating complex effects, such as shadow volumes, motion blurring, and triangle generation, in real-time. The advancement of real-time graphics is evidenced in the progressive improvements between actual gameplay graphics and the pre-rendered cutscenes traditionally found in video games. Cutscenes are typically rendered in real-time—and may be interactive. Although the gap in quality between real-time graphics and traditional off-line graphics is narrowing, offline rendering remains much more accurate. === Advantages === Real-time graphics are typically employed when interactivity (e.g., player feedback) is crucial. When real-time graphics are used in films, the director has complete control of what has to be drawn on each frame, which can sometimes involve lengthy decision-making. Teams of people are typically involved in the making of these decisions. In real-time computer graphics, the user typically operates an input device to influence what is about to be drawn on the display. For example, when the user wants to move a character on the screen, the system updates the character's position before drawing the next frame. Usually, the display's response-time is far slower than the input device—this is justified by the immense difference between the (fast) response time of a human being's motion and the (slow) perspective speed of the human visual system. This difference has other effects too: because input devices must be very fast to keep up with human motion response, advancements in input devices (e.g., the current Wii remote) typically take much longer to achieve than comparable advancements in display devices. Another important factor controlling real-time computer graphics is the combination of physics and animation. These techniques largely dictate what is to be drawn on the screen—especially where to draw objects in the scene. These techniques help realistically imitate real world behavior (the temporal dimension, not the spatial dimensions), adding to the computer graphics' degree of realism. Real-time previewing with graphics software, especially when adjusting lighting effects, can increase work speed. Some parameter adjustments in fractal generating software may be made while viewing changes to the image in real time. == Rendering pipeline == The graphics rendering pipeline ("rendering pipeline" or simply "pipeline") is the foundation of real-time graphics. Its main function is to render a two-dimensional image in relation to a virtual camera, three-dimensional objects (an object that has width, length, and depth), light sources, lighting models, textures and more. === Architecture === The architecture of the real-time rendering pipeline can be divided into conceptual stages: application, geometry and rasterization. === Application stage === The application stage is responsible for generating "scenes", or 3D settings that are drawn to a 2D display. This stage is implemented in software that developers optimize for performance. This stage may perform processing such as collision detection, speed-up techniques, animation and force feedback, in addition to handling user input. Collision detection is an example of an operation that would be performed in the application stage. Collision detection uses algorithms to detect and respond to collisions between (virtual) objects. For example, the application may calculate new positions for the colliding objects and provide feedback via a force feedback device such as a vibrating game controller. The application stage also prepares graphics data for the next stage. This includes texture animation, animation of 3D models, animation via transforms, and geometry morphing. Finally, it produces primitives (points, lines, and triangles) based on scene information and feeds those primitives into the geometry stage of the pipeline. === Geometry stage === The geometry stage manipulates polygons and vertices to compute what to draw, how to draw it and where to draw it. Usually, these operations are performed by specialized hardware or GPUs. Variations across graphics hardware mean that the "geometry stage" may actually be implemented as several consecutive stages. ==== Model and view transformation ==== Before the final model is shown on the output device, the model is transformed onto multiple spaces or coordinate systems. Transformations move and manipulate objects by altering their vertices. Transformation is the general term for the four specific ways that manipulate the shape or position of a point, line or shape. ==== Lighting ==== In order to give the model a more realistic appearance, one or more light sources are usually established during transformation. However, this stage cannot be reached without first transforming the 3D scene into view space. In view space, the observer (camera) is typically placed at the origin. If using a right-handed coordinate system (which is considered standard), the observer looks in the direction of the negative z-axis with the y-axis pointing upwards and the x-axis pointing to the right. ==== Projection ==== Projection is a transformation used to represent a 3D model in a 2D space. The two main types of projection are orthographic projection (also called parallel) and perspective projection. The main characteristic of an orthographic projection is that parallel lines remain parallel after the transformation. Perspective projection utilizes the concept that if the distance between the observer and model increases, the model appears smaller than before. Essentially, perspective projection mimics human sight. ==== Clipping ==== Clipping is the process of removing primitives that are outside of the view box in order to facilitate the rasterizer stage. Once those primitives are removed, the primitives that remain will be drawn into new triangles that reach the next stage. ==== Screen mapping ==== The purpose of screen mapping is to find out the coordinates of the primitives during the clipping stage. ==== Rasterizer stage ==== The rasterizer

    Read more →
  • Slopaganda

    Slopaganda

    Slopaganda is a portmanteau of "AI slop" and "propaganda", referring to AI-generated content designed to manipulate beliefs, emotions, and political decision-making at scale. The term is credited to Michał Klincewicz, an assistant professor in the Department of Computational Cognitive Science at Tilburg University, in 2025. == Definition == Slopaganda is distinguished from traditional propaganda by three features: scale, scope, and speed. Generative AI makes it possible to produce large volumes of content quickly and at low cost, allows for highly personalised and targeted messaging to specific sub-audiences, and leverages the hyper-connectivity of social networks to accelerate dissemination beyond what conventional media could achieve. Unlike traditional propaganda, which delivers a uniform message to all recipients, slopaganda can be micro-targeted — tailored to individuals based on estimated prior beliefs to reinforce political biases or emotional associations. The authors note that it need not aim at literal deception: much slopaganda is expressive rather than truth-apt, designed to create emotional associations rather than false factual beliefs. == Relation to AI slop == Slopaganda is a subset of AI slop — low-quality, mass-produced AI-generated content — distinguished by intent. Where AI slop may be produced indifferently for commercial or engagement-farming purposes, slopaganda is deployed with a deliberate political or ideological goal. == Notable examples == Examples discussed by the term's originators include Donald Trump's prolific use of AI in Truth Social posts and Iranian Lego-themed music videos. AI-generated videos posted by the White House mixing real military footage with clips from films and video games; and deepfake audio imitating political candidates during the 2024 US presidential campaign have also been given the label slopaganda.

    Read more →
  • Artificial reproduction

    Artificial reproduction

    Artificial reproduction is the re-creation of life brought about by means other than natural ones. It is new life built by human plans and projects. Examples include artificial selection, artificial insemination, in vitro fertilization, artificial womb, artificial cloning, and kinematic replication. Artificial reproduction is one aspect of artificial life. Artificial reproduction can be categorized into one of two classes according to its capacity to be self-sufficient: non-assisted reproductive technology and assisted reproductive technology. Cutting plants' stems and placing them in compost is a form of assisted artificial reproduction, xenobots are an example of a more autonomous type of reproduction, while the artificial womb presented in the movie the Matrix illustrates a non assisted hypothetical technology. The idea of artificial reproduction has led to various technologies. == Theology == Humans have aspired to create life since immemorial times. Most theologies and religions have conceived this possibility as exclusive of deities. Christian religions consider the possibility of artificial reproduction, in most cases, as heretical and sinful. == Philosophy == Although ancient Greek philosophy raised the concept that man could imitate the creative capacity of nature, classic Greeks thought that if possible, human beings would reproduce things as nature does, and vice versa, nature would do the things that man does in the same way. Aristotle, for example, wrote that if nature made tables, it would make them just as men do. In other words, Aristotle said that if nature were to create a table, such table will look like a human-made table. Correspondingly, Descartes envisioned the human body, and nature, as a machine. Cartesian philosophy does not stop seeing a perfect mirror between nature and the artificial. However, Kant revolutionized this old idea by criticizing such naturalism. Kant pedagogically wrote: "Reason, in order to be taught by nature, must approach nature with its principles in one hand, according to which the agreement among appearances can count as laws, and, in the other hand, the experiment thought out in accord with these principles—in order to be instructed by nature not like a pupil, who has recited to him whatever the teacher wants to say, but like an appointed judge who compels witnesses to answer the questions he puts to them.". Humans are not instructed by nature but rather use nature as raw material to invent. Humans find alternatives to the natural restrictions imposed by natural laws thus, nature is not necessarily mirrored. In accordance with Kant (and contrary to what Aristotle thought) Karl Marx, Alfred Whitehead, Jaques Derrida and Juan David García Bacca noticed that nature is incapable of reproducing tables; or airplanes, or submarines, or computers. If nature tried to create airplanes, it would produce birds. If nature tried to create submarines, it would get fishes. If nature tried to create computers, brains would grow. And if nature tried to create man, modern man, monkeys will be evolved. According to Whitehead, if we look for something natural in artificial life, in the most elaborate cases, if anything, only atoms remain natural. Juan David Garcia Bacca summarized, “It will not come out from wood, it will not be born, a galley; from clay, a vessel; from linen, a dress; from iron, a lever,...From natural, artificial. In the artificial, the natural is reduced to a simple raw material, even though it is perfectly specified with natural specification. The artificial is the real, positive, and original negation of the natural: of species, of genus and of essence. Thus, its ontology is superior to natural ontology. And for this very reason Marx did not attach any importance to Darwin, whose evolutionism is confined to the natural order: to changes, at most, from variety to variety, from species to species... natural. For the same reason, nature has no dialectics, even though continuous evolution and selection can occur. The dialectic cannot emerge from the natural, for deeper reasons than, using today's terms, from a bird, an airplane cannot emerge; from fish, a submarine; from ears, a telephone; from eyes, a television; from a brain, a digital computer; from feet, a car; from hands, an engine; from Euclid, Descartes; from Aristotle, Newton; from Plato, Marx.” According to García Bacca, the major difference between natural causes and artificial causes is that nature does not have plans and projects, while humans design things following plans and projects. In contrast, other influential authors such as Michael Behe have depicted the concept and promoted the idea of intelligent design, a notion that has aroused several doubts and heated controversies, as it reframe natural causes in accordance with a natural plan. Previous ideas that have also provided a positive 'sense' to natural reproduction, are orthogenesis, syntropy, orgone and morphic resonance, among others. Although, these ideas have been historically marginalized and often called pseudoscience, recently Bio-semioticians are reconsidering some of them under symbolic approaches. Current metaphysics of science actually recognizes that the artificial ways of reproduction are diverse from nature, i.e., unnatural, anti-natural or supernatural. Because Biosemiotics does not focus on the function of life but on its meaning, it has a better understanding of the artificial than classic biology. == Science == Biology, being the study of cellular life, addresses reproduction in terms of growth and cellular division (i.e., binary fission, mitosis and meiosis); however, the science of artificial reproduction is not restricted by the mirroring of these natural processes.The science of artificial reproduction is actually transcending the natural forms, and natural rules, of reproduction. For example, xenobots have redefined the classical conception of reproduction. Although xenobots are made of eukariotic cells they do not reproduce by mitosis, but rather by kinematic replication. Such constructive replication does not involve growing but rather building. == Assisted reproductive technologies == Assisted reproductive technology (ART)'s purpose is to assist the development of a human embryo, commonly because of medical concerns due to fertility limitations. == Non-assisted reproductive technologies == Non-assisted reproductive technologies (NART) could have medical motivations but are mostly driven by a wider heterotopic ambition. Although, NARTs are initially designed by humans, they are programed to become independent of humans to a relative or absolute extent. James Lovelock proposed that such novelties could overcome humans. === Artificial cloning === Cloning is the cellular reproductive processes where two or more genetically identical organisms are created, either by natural or artificial means. Artificial cloning normally involves editing the genetic code, somatic cell nuclear transfer and 3D bioprinting. === Non-assisted artificial womb === A non-assisted artificial womb or artificial uterus is a device that allow for ectogenesis or extracorporeal pregnancy by growing an embryonic form outside the body of an organism (that would normally carry the embryo to term) without any human assistance. The aspect of non-assistance is the key distinction between the current artificial womb technology (AWT) in modern medical research, which still relies on human assistance. With this non-assisted hypothetical technology, a zygote or stem cells are used to create an embryo that is then incubated and monitored by artificial intelligence (AI) within a chamber composed of biocompatible material. The AI maintains the necessary conditions for the embryo to develop and thrive, proceeding to mimic organic labor and childbirth in order to best help the embryo adjust to the outside world. Ectogenesis—gestation, depicted in the science fiction movie The Matrix, is a fast approaching reality. This type of innovation presupposes that vertebrate wombs are not the only way for bearing humans or other similar forms of life. === Kinematic replication === Self-replication without binary fission, meiosis, mitosis (or any other form of cellular reproduction that involves division and growing) can be achieved. Xenobots are an example of kinematic replication. They are biobots, named after the African clawed frog (Xenopus laevis). Xenobots are cellular life forms designed by using artificial intelligence to build more of themselves by combining frog cells in a liquid medium. The term kinematic replication is usually reserved for biomolecules (e.g. DNA, RNA, prions, etc.) and artificially designed cellular forms (e.g. xenobots). === Machine constructive replication === Machine constructive replication mimics human traditional manufacturing but is entirely self-automated. Such constructive replication is a more general form of kinematic replication, which does not necessarily

    Read more →