AI Coding Interview Questions

AI Coding Interview Questions — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • LTX (text-to-video model)

    LTX (text-to-video model)

    LTX is a family of open source artificial intelligence video foundation models developed by Lightricks, and first released in November 2024. The latest models, LTX-2, create videos based on user prompts. They were preceded by LTX Video, which was released in 2024 as the company's first text-to-video model. LTX-2 is part of the LTX family of video generation models, which form the core technology, alongside LTX Studio, of the LTX ecosystem. == History == === Origins: LTX Video (2024–2025) === In November 2024 Lightricks publicly released its first text-to-video model, LTX Video. It was a 2-billion parameter model, available as open source. In May 2025 Lightricks launched LTXV-13b, a version with 13-billion parameters. Two months later, the model broke the 60 second barrier for generated video. === Release of LTX-2 (2025) === In October 2025 Lightricks announced its latest model, and renamed it LTX-2. The model was described as capable of generating synchronized audio and video at native 4K resolution and up to 50 frames per second (fps), using a variety of conditions and prompts, including text-to-video and image-to-video. Google highlighted the fact that LTX-2 was trained on its infrastructure, and saying it was "The first open source AI video generation model, powered by Google Cloud". Upon its release it was ranked in the top-3 models for image-to-video creation by Artificial Analysis, behind Kling 3.5 by Kling AI and Veo 3.1 by Google. Its text-to-image option was ranked 7th. In addition to its open-source release, Lightricks offers API access to LTX-2, allowing developers to generate videos from text and image prompts through a hosted service without running the model locally. === Open Source Release (2026) === In January 2026, Lightricks officially released the full open-source version of LTX-2, making the model’s complete codebase, weights, and associated tooling publicly available. In March 2026 the company released LTX-2.3, which was accompanied by a desktop video editor enabling the entire model to run locally on consumer hardware. == Technical features == === Advancements over LTX Video === LTX-2 builds upon the LTX Video architecture with several major improvements: Unified audio-video generation producing synchronized dialogue, ambience, and motion Native 4K rendering 50-fps output for cinematic motion Three operational modes (Fast, Pro, Ultra) More efficient diffusion pipelines enabling high fidelity on consumer GPUs === Core capabilities === Text-to-video generation Image-to-video generation Multimodal audiovisual synthesis High-resolution spatial and temporal coherence Configurable quality/performance settings Open-source distribution of weights and datasets == Reception == Initial reception to LTX-2 was broadly positive, with several technology and media outlets highlighting its open-source approach and multimodal capabilities. Open Source For You described LTX-2 as “one of the first AI video systems to combine 4K output, synchronized audio, and an open model release,” noting that it positioned Lightricks as a significant competitor to proprietary systems such as OpenAI's Sora and Google's Veo. IEA Green said that the model “could rewrite the AI filmmaking game,” emphasizing that its 50-fps rendering and unified audio-video generation made it suitable for professional studios and independent creators alike. AI News characterized LTX-2 as a “major step forward in the democratization of cinematic-quality video generation,” praising its consumer-grade hardware efficiency and multi-tier generation modes, while also noting ongoing challenges in long-form temporal stability. FinancialContent reported strong interest among creative agencies, attributing the attention to Lightricks’ decision to release model weights and datasets, which reviewers said enabled “a level of transparency not typically seen in commercial AI video models.” === Benchmarks and rankings === Upon release, LTX-2 ranked third for image-to-video creation in the Artificial Analysis benchmark, behind Kling 3.5 and Veo 3.1, while its text-to-video option ranked seventh. As of early 2026, it was the highest-ranked open-source model in the benchmark. === Limitations === Some early reviewers also pointed out quality limitations. The Ray3 technical review noted occasional inconsistencies in lip-sync and motion tracking during long scenes, though it stated these were “in line with the challenges faced by all current AI video diffusion models” and expected to improve with continued iteration. Like other diffusion-based video generators, LTX-2 can produce artifacts in complex multi-person scenes and may struggle with precise text rendering within generated video.

    Read more →
  • Automated machine learning

    Automated machine learning

    Automated machine learning (AutoML) is the process of automating the tasks of applying machine learning to real-world problems. It is the combination of automation and ML. AutoML potentially includes every stage from beginning with a raw dataset to building a machine learning model ready for deployment. AutoML was proposed as an artificial intelligence-based solution to the growing challenge of applying machine learning. The high degree of automation in AutoML aims to allow non-experts to make use of machine learning models and techniques without requiring them to become experts in machine learning. Automating the process of applying machine learning end-to-end additionally offers the advantages of producing simpler solutions, faster creation of those solutions, and models that often outperform hand-designed models. Common techniques used in AutoML include hyperparameter optimization, meta-learning and neural architecture search. == Comparison to the standard approach == In a typical machine learning application, practitioners have a set of input data points to be used for training. The raw data may not be in a form that all algorithms can be applied to. To make the data amenable for machine learning, an expert may have to apply appropriate data pre-processing, feature engineering, feature extraction, and feature selection methods. After these steps, practitioners must then perform algorithm selection and hyperparameter optimization to maximize the predictive performance of their model. If deep learning is used, the architecture of the neural network must also be chosen manually by the machine learning expert. Each of these steps may be challenging, resulting in significant hurdles to using machine learning. AutoML aims to simplify these steps for non-experts, and to make it easier for them to use machine learning techniques correctly and effectively. AutoML plays an important role within the broader approach of automating data science, which also includes challenging tasks such as data engineering, data exploration and model interpretation and prediction. == Targets of automation == Automated machine learning can target various stages of the machine learning process. Steps to automate are: Data preparation and ingestion (from raw data and miscellaneous formats) Column type detection; e.g., Boolean, discrete numerical, continuous numerical, or text Column intent detection; e.g., target/label, stratification field, numerical feature, categorical text feature, or free text feature Task detection; e.g., binary classification, regression, clustering, or ranking Feature engineering Feature selection Feature extraction Meta-learning and transfer learning Detection and handling of skewed data and/or missing values Model selection - choosing which machine learning algorithm to use, often including multiple competing software implementations Ensembling - a form of consensus where using multiple models often gives better results than any single model Hyperparameter optimization of the learning algorithm and featurization Neural architecture search Pipeline selection under time, memory, and complexity constraints Selection of evaluation metrics and validation procedures Problem checking Leakage detection Misconfiguration detection Analysis of obtained results Creating user interfaces and visualizations == Challenges and Limitations == There are a number of key challenges being tackled around automated machine learning. A big issue surrounding the field is referred to as "development as a cottage industry". This phrase refers to the issue in machine learning where development relies on manual decisions and biases of experts. This is contrasted to the goal of machine learning which is to create systems that can learn and improve from their own usage and analysis of the data. Basically, it's the struggle between how much experts should get involved in the learning of the systems versus how much freedom they should be giving the machines. However, experts and developers must help create and guide these machines to prepare them for their own learning. To create this system, it requires labor intensive work with knowledge of machine learning algorithms and system design. Additionally, other challenges include meta-learning and computational resource allocation.

    Read more →
  • Knowledge integration

    Knowledge integration

    Knowledge integration is the process of synthesizing multiple knowledge models (or representations) into a common model (representation). Compared to information integration, which involves merging information having different schemas and representation models, knowledge integration focuses more on synthesizing the understanding of a given subject from different perspectives. For example, multiple interpretations are possible of a set of student grades, typically each from a certain perspective. An overall, integrated view and understanding of this information can be achieved if these interpretations can be put under a common model, say, a student performance index. The Web-based Inquiry Science Environment (WISE), from the University of California at Berkeley has been developed along the lines of knowledge integration theory. Knowledge integration has also been studied as the process of incorporating new information into a body of existing knowledge with an interdisciplinary approach. This process involves determining how the new information and the existing knowledge interact, how existing knowledge should be modified to accommodate the new information, and how the new information should be modified in light of the existing knowledge. A learning agent that actively investigates the consequences of new information can detect and exploit a variety of learning opportunities; e.g., to resolve knowledge conflicts and to fill knowledge gaps. By exploiting these learning opportunities the learning agent is able to learn beyond the explicit content of the new information. The machine learning program KI, developed by Murray and Porter at the University of Texas at Austin, was created to study the use of automated and semi-automated knowledge integration to assist knowledge engineers constructing a large knowledge base. A possible technique which can be used is semantic matching. More recently, a technique useful to minimize the effort in mapping validation and visualization has been presented which is based on Minimal Mappings. Minimal mappings are high quality mappings such that i) all the other mappings can be computed from them in time linear in the size of the input graphs, and ii) none of them can be dropped without losing property i). The University of Waterloo operates a Bachelor of Knowledge Integration undergraduate degree program as an academic major or minor. The program started in 2008.

    Read more →
  • Embodied cognitive science

    Embodied cognitive science

    Embodied cognitive science is an interdisciplinary field of research, the aim of which is to explain the mechanisms underlying intelligent behavior. It comprises three main methodologies: the modeling of psychological and biological systems in a holistic manner that considers the mind and body as a single entity; the formation of a common set of general principles of intelligent behavior; and the experimental use of robotic agents in controlled environments. == Contributors == Embodied cognitive science borrows heavily from embodied philosophy and the related research fields of cognitive science, psychology, neuroscience and artificial intelligence. Contributors to the field include: From the perspective of neuroscience, Gerald Edelman of the Neurosciences Institute at La Jolla, Francisco Varela of CNRS in France, and J. A. Scott Kelso of Florida Atlantic University From the perspective of psychology, Lawrence Barsalou, Michael Turvey, Vittorio Guidano and Eleanor Rosch From the perspective of linguistics, Gilles Fauconnier, George Lakoff, Mark Johnson, Leonard Talmy and Mark Turner From the perspective of language acquisition, Eric Lenneberg and Philip Rubin at Haskins Laboratories From the perspective of anthropology, Edwin Hutchins, Bradd Shore, James Wertsch and Merlin Donald. From the perspective of autonomous agent design, early work is sometimes attributed to Rodney Brooks or Valentino Braitenberg From the perspective of artificial intelligence, Understanding Intelligence by Rolf Pfeifer and Christian Scheier or How the Body Shapes the Way We Think, by Rolf Pfeifer and Josh C. Bongard From the perspective of philosophy, Andy Clark, Dan Zahavi, Shaun Gallagher, and Evan Thompson In 1950, Alan Turing proposed that a machine may need a human-like body to think and speak: It can also be maintained that it is best to provide the machine with the best sense organs that money can buy, and then teach it to understand and speak English. That process could follow the normal teaching of a child. Things would be pointed out and named, etc. Again, I do not know what the right answer is, but I think both approaches should be tried. == Traditional cognitive theory == Embodied cognitive science is an alternative theory to cognition in which it minimizes appeals to computational theory of mind in favor of greater emphasis on how an organism's body determines how and what it thinks. Traditional cognitive theory is based mainly around symbol manipulation, in which certain inputs are fed into a processing unit that produces an output. These inputs follow certain rules of syntax, from which the processing unit finds semantic meaning. Thus, an appropriate output is produced. For example, a human's sensory organs are its input devices, and the stimuli obtained from the external environment are fed into the nervous system which serves as the processing unit. From here, the nervous system is able to read the sensory information because it follows a syntactic structure, thus an output is created. This output then creates bodily motions and brings forth behavior and cognition. Of particular note is that cognition is sealed away in the brain, meaning that mental cognition is cut off from the external world and is only possible by the input of sensory information. == The embodied cognitive approach == Embodied cognitive science differs from the traditionalist approach in that it denies the input-output system. This is chiefly due to the problems presented by the Homunculus argument, which concluded that semantic meaning could not be derived from symbols without some kind of inner interpretation. If some little man in a person's head interpreted incoming symbols, then who would interpret the little man's inputs? Because of the specter of an infinite regress, the traditionalist model began to seem less plausible. Thus, embodied cognitive science aims to avoid this problem by defining cognition in three ways. === Physical attributes of the body === The first aspect of embodied cognition examines the role of the physical body, particularly how its properties affect its ability to think. This part attempts to overcome the symbol manipulation component that is a feature of the traditionalist model. Depth perception, for instance, can be better explained under the embodied approach due to the sheer complexity of the action. Depth perception requires that the brain detect the disparate retinal images obtained by the distance of the two eyes. In addition, body and head cues complicate this further. When the head is turned in a given direction, objects in the foreground will appear to move against objects in the background. From this, it is said that some kind of visual processing is occurring without the need of any kind of symbol manipulation. This is because the objects appearing to move the foreground are simply appearing to move. This observation concludes then that depth can be perceived with no intermediate symbol manipulation necessary. A more poignant example exists through examining auditory perception. Generally speaking the greater the distance between the ears, the greater the possible auditory acuity. Also relevant is the amount of density in between the ears, for the strength of the frequency wave alters as it passes through a given medium. The brain's auditory system takes these factors into account as it process information, but again without any need for a symbolic manipulation system. This is because the distance between the ears for example does not need symbols to represent it. The distance itself creates the necessary opportunity for greater auditory acuity. The amount of density between the ears is similar, in that it is the actual amount itself that simply forms the opportunity for frequency alteration. Thus under consideration of the physical properties of the body, a symbolic system is unnecessary and an unhelpful metaphor. === The body's role in the cognitive process === The second aspect draws heavily from George Lakoff's and Mark Johnson's work on concepts. They argued that humans use metaphors whenever possible to better explain their external world. Humans also have a basic stock of concepts in which other concepts can be derived from. These basic concepts include spatial orientations such as up, down, front, and back. Humans can understand what these concepts mean because they can directly experience them from their own bodies. For example, because human movement revolves around standing erect and moving the body in an up-down motion, humans innately have these concepts of up and down. Lakoff and Johnson contend this is similar with other spatial orientations such as front and back too. As mentioned earlier, these basic stocks of spatial concepts are the basis in which other concepts are constructed. Happy and sad for instance are seen now as being up or down respectively. When someone says they are feeling down, what they are really saying is that they feel sad for example. Thus the point here is that true understanding of these concepts is contingent on whether one can have an understanding of the human body. So the argument goes that if one lacked a human body, they could not possibly know what up or down could mean, or how it could relate to emotional states. [I]magine a spherical being living outside of any gravitational field, with no knowledge or imagination of any other kind of experience. What could UP possibly mean to such a being? While this does not mean that such beings would be incapable of expressing emotions in other words, it does mean that they would express emotions differently from humans. Human concepts of happiness and sadness would be different because human would have different bodies. So then an organism's body directly affects how it can think, because it uses metaphors related to its body as the basis of concepts. === Interaction of local environment === A third component of the embodied approach looks at how agents use their immediate environment in cognitive processing. Meaning, the local environment is seen as an actual extension of the body's cognitive process. The example of a personal digital assistant (PDA) is used to better imagine this. Echoing functionalism (philosophy of mind), this point claims that mental states are individuated by their role in a much larger system. So under this premise, the information on a PDA is similar to the information stored in the brain. So then if one thinks information in the brain constitutes mental states, then it must follow that information in the PDA is a cognitive state too. Consider also the role of pen and paper in a complex multiplication problem. The pen and paper are so involved in the cognitive process of solving the problem that it seems ridiculous to say they are somehow different from the process, in very much the same way the PDA is used for information like the brain. Another example examines how humans control and manipulate their environment

    Read more →
  • Similarity learning

    Similarity learning

    Similarity learning is an area of supervised machine learning in artificial intelligence. It is closely related to regression and classification, but the goal is to learn a similarity function that measures how similar or related two objects are. It has applications in ranking, in recommendation systems, visual identity tracking, face verification, and speaker verification. == Learning setup == There are four common setups for similarity and metric distance learning. Regression similarity learning In this setup, pairs of objects are given ( x i 1 , x i 2 ) {\displaystyle (x_{i}^{1},x_{i}^{2})} together with a measure of their similarity y i ∈ R {\displaystyle y_{i}\in R} . The goal is to learn a function that approximates f ( x i 1 , x i 2 ) ∼ y i {\displaystyle f(x_{i}^{1},x_{i}^{2})\sim y_{i}} for every new labeled triplet example ( x i 1 , x i 2 , y i ) {\displaystyle (x_{i}^{1},x_{i}^{2},y_{i})} . This is typically achieved by minimizing a regularized loss min W ∑ i l o s s ( w ; x i 1 , x i 2 , y i ) + r e g ( w ) {\displaystyle \min _{W}\sum _{i}loss(w;x_{i}^{1},x_{i}^{2},y_{i})+reg(w)} . Classification similarity learning Given are pairs of similar objects ( x i , x i + ) {\displaystyle (x_{i},x_{i}^{+})} and non similar objects ( x i , x i − ) {\displaystyle (x_{i},x_{i}^{-})} . An equivalent formulation is that every pair ( x i 1 , x i 2 ) {\displaystyle (x_{i}^{1},x_{i}^{2})} is given together with a binary label y i ∈ { 0 , 1 } {\displaystyle y_{i}\in \{0,1\}} that determines if the two objects are similar or not. The goal is again to learn a classifier that can decide if a new pair of objects is similar or not. Ranking similarity learning Given are triplets of objects ( x i , x i + , x i − ) {\displaystyle (x_{i},x_{i}^{+},x_{i}^{-})} whose relative similarity obey a predefined order: x i {\displaystyle x_{i}} is known to be more similar to x i + {\displaystyle x_{i}^{+}} than to x i − {\displaystyle x_{i}^{-}} . The goal is to learn a function f {\displaystyle f} such that for any new triplet of objects ( x , x + , x − ) {\displaystyle (x,x^{+},x^{-})} , it obeys f ( x , x + ) > f ( x , x − ) {\displaystyle f(x,x^{+})>f(x,x^{-})} (contrastive learning). This setup assumes a weaker form of supervision than in regression, because instead of providing an exact measure of similarity, one only has to provide the relative order of similarity. For this reason, ranking-based similarity learning is easier to apply in real large-scale applications. Locality sensitive hashing (LSH) Hashes input items so that similar items map to the same "buckets" in memory with high probability (the number of buckets being much smaller than the universe of possible input items). It is often applied in nearest neighbor search on large-scale high-dimensional data, e.g., image databases, document collections, time-series databases, and genome databases. A common approach for learning similarity is to model the similarity function as a bilinear form. For example, in the case of ranking similarity learning, one aims to learn a matrix W that parametrizes the similarity function f W ( x , z ) = x T W z {\displaystyle f_{W}(x,z)=x^{T}Wz} . When data is abundant, a common approach is to learn a siamese network – a deep network model with parameter sharing. == Metric learning == Similarity learning is closely related to distance metric learning. Metric learning is the task of learning a distance function over objects. A metric or distance function has to obey four axioms: non-negativity, identity of indiscernibles, symmetry and subadditivity (or the triangle inequality). In practice, metric learning algorithms ignore the condition of identity of indiscernibles and learn a pseudo-metric. When the objects x i {\displaystyle x_{i}} are vectors in R d {\displaystyle R^{d}} , then any matrix W {\displaystyle W} in the symmetric positive semi-definite cone S + d {\displaystyle S_{+}^{d}} defines a distance pseudo-metric of the space of x through the form D W ( x 1 , x 2 ) 2 = ( x 1 − x 2 ) ⊤ W ( x 1 − x 2 ) {\displaystyle D_{W}(x_{1},x_{2})^{2}=(x_{1}-x_{2})^{\top }W(x_{1}-x_{2})} . When W {\displaystyle W} is a symmetric positive definite matrix, D W {\displaystyle D_{W}} is a metric. Moreover, as any symmetric positive semi-definite matrix W ∈ S + d {\displaystyle W\in S_{+}^{d}} can be decomposed as W = L ⊤ L {\displaystyle W=L^{\top }L} where L ∈ R e × d {\displaystyle L\in R^{e\times d}} and e ≥ r a n k ( W ) {\displaystyle e\geq rank(W)} , the distance function D W {\displaystyle D_{W}} can be rewritten equivalently D W ( x 1 , x 2 ) 2 = ( x 1 − x 2 ) ⊤ L ⊤ L ( x 1 − x 2 ) = ‖ L ( x 1 − x 2 ) ‖ 2 2 {\displaystyle D_{W}(x_{1},x_{2})^{2}=(x_{1}-x_{2})^{\top }L^{\top }L(x_{1}-x_{2})=\|L(x_{1}-x_{2})\|_{2}^{2}} . The distance D W ( x 1 , x 2 ) 2 = ‖ x 1 ′ − x 2 ′ ‖ 2 2 {\displaystyle D_{W}(x_{1},x_{2})^{2}=\|x_{1}'-x_{2}'\|_{2}^{2}} corresponds to the Euclidean distance between the transformed feature vectors x 1 ′ = L x 1 {\displaystyle x_{1}'=Lx_{1}} and x 2 ′ = L x 2 {\displaystyle x_{2}'=Lx_{2}} . Many formulations for metric learning have been proposed. Some well-known approaches for metric learning include learning from relative comparisons, which is based on the triplet loss, large margin nearest neighbor, and information theoretic metric learning (ITML). In statistics, the covariance matrix of the data is sometimes used to define a distance metric called Mahalanobis distance. == Applications == Similarity learning is used in information retrieval for learning to rank, in face verification or face identification, and in recommendation systems. Also, many machine learning approaches rely on some metric. This includes unsupervised learning such as clustering, which groups together close or similar objects. It also includes supervised approaches like K-nearest neighbor algorithm which rely on labels of nearby objects to decide on the label of a new object. Metric learning has been proposed as a preprocessing step for many of these approaches. == Scalability == Metric and similarity learning scale quadratically with the dimension of the input space, as can easily see when the learned metric has a bilinear form f W ( x , z ) = x T W z {\displaystyle f_{W}(x,z)=x^{T}Wz} . Scaling to higher dimensions can be achieved by enforcing a sparseness structure over the matrix model, as done with HDSL, and with COMET. == Software == metric-learn is a free software Python library which offers efficient implementations of several supervised and weakly-supervised similarity and metric learning algorithms. The API of metric-learn is compatible with scikit-learn. OpenMetricLearning is a Python framework to train and validate the models producing high-quality embeddings. == Further information == For further information on this topic, see the surveys on metric and similarity learning by Bellet et al. and Kulis.

    Read more →
  • Syman

    Syman

    SYMAN is an artificial intelligence technology that uses data from social media profiles to identify trends in the job market. SYMAN is designed to organize actionable data for products and services including recruiting, human capital management, CRM, and marketing. SYMAN was developed with a $21 million series B financing round secured by Identified, which was led by VantagePoint Capital Partners and Capricorn Investment Group.

    Read more →
  • Hyperparameter (machine learning)

    Hyperparameter (machine learning)

    In machine learning, a hyperparameter is a parameter that can be set in order to define any configurable part of a model's learning process. Hyperparameters can be classified as either model hyperparameters (such as the topology and size of a neural network) or algorithm hyperparameters (such as the learning rate and the batch size of an optimizer). These are named hyperparameters in contrast to parameters, which are characteristics that the model learns from the data. Hyperparameters are not required by every model or algorithm. Some simple algorithms such as ordinary least squares regression require none. However, the LASSO algorithm, for example, adds a regularization hyperparameter to ordinary least squares which must be set before training. Even models and algorithms without a strict requirement to define hyperparameters may not produce meaningful results if these are not carefully chosen. However, optimal values for hyperparameters are not always easy to predict. Some hyperparameters may have no meaningful effect, or one important variable may be conditional upon the value of another. Often a separate process of hyperparameter tuning is needed to find a suitable combination for the data and task. As well as improving model performance, hyperparameters can be used by researchers to introduce robustness and reproducibility into their work, especially if it uses models that incorporate random number generation. == Considerations == The time required to train and test a model can depend upon the choice of its hyperparameters. A hyperparameter is usually of continuous or integer type, leading to mixed-type optimization problems. The existence of some hyperparameters is conditional upon the value of others, e.g. the size of each hidden layer in a neural network can be conditional upon the number of layers. === Difficulty-learnable parameters === The objective function is typically non-differentiable with respect to hyperparameters. As a result, in most instances, hyperparameters cannot be learned using gradient-based optimization methods (such as gradient descent), which are commonly employed to learn model parameters. These hyperparameters are those parameters describing a model representation that cannot be learned by common optimization methods, but nonetheless affect the loss function. An example would be the tolerance hyperparameter for errors in support vector machines. === Untrainable parameters === Sometimes, hyperparameters cannot be learned from the training data because they aggressively increase the capacity of a model and can push the loss function to an undesired minimum (overfitting to the data), as opposed to correctly mapping the richness of the structure in the data. For example, if we treat the degree of a polynomial equation fitting a regression model as a trainable parameter, the degree would increase until the model perfectly fit the data, yielding low training error, but poor generalization performance. === Tunability === Most performance variation can be attributed to just a few hyperparameters. The tunability of an algorithm, hyperparameter, or interacting hyperparameters is a measure of how much performance can be gained by tuning it. For an LSTM, while the learning rate followed by the network size are its most crucial hyperparameters, batching and momentum have no significant effect on its performance. Although some research has advocated the use of mini-batch sizes in the thousands, other work has found the best performance with mini-batch sizes between 2 and 32. === Robustness === An inherent stochasticity in learning directly implies that the empirical hyperparameter performance is not necessarily its true performance. Methods that are not robust to simple changes in hyperparameters, random seeds, or even different implementations of the same algorithm cannot be integrated into mission critical control systems without significant simplification and robustification. Reinforcement learning algorithms, in particular, require measuring their performance over a large number of random seeds, and also measuring their sensitivity to choices of hyperparameters. Their evaluation with a small number of random seeds does not capture performance adequately due to high variance. Some reinforcement learning methods, e.g. DDPG (Deep Deterministic Policy Gradient), are more sensitive to hyperparameter choices than others. == Optimization == Hyperparameter optimization finds a tuple of hyperparameters that yields an optimal model which minimizes a predefined loss function on given test data. The objective function takes a tuple of hyperparameters and returns the associated loss. Typically these methods are not gradient based, and instead apply concepts from derivative-free optimization or black box optimization. == Reproducibility == Apart from tuning hyperparameters, machine learning involves storing and organizing the parameters and results, and making sure they are reproducible. In the absence of a robust infrastructure for this purpose, research code often evolves quickly and compromises essential aspects like bookkeeping and reproducibility. Online collaboration platforms for machine learning go further by allowing scientists to automatically share, organize and discuss experiments, data, and algorithms. Reproducibility can be particularly difficult for deep learning models. For example, research has shown that deep learning models depend very heavily even on the random seed selection of the random number generator.

    Read more →
  • Artificial intelligence in education

    Artificial intelligence in education

    Artificial intelligence in education (often abbreviated as AIEd) is a subfield of educational technology that studies how to use artificial intelligence to create learning environments. Considerations in the field include data-driven decision-making, AI ethics, data privacy and AI literacy. Concerns include the potential for cheating, over-reliance, equity of access, reduced critical thinking, and the perpetuation of misinformation and bias. == History == Efforts to integrate AI into educational contexts have often followed technological advancement in the history of artificial intelligence. In the 1960s, educators and researchers began developing computer-based instruction systems, such as PLATO, developed by the University of Illinois. In the 1970s and 1980s, intelligent tutoring systems (ITS) were being adapted for classroom instruction. The International Artificial Intelligence in Education Society was founded in 1993. Coinciding with the AI boom of the 2020s, the use of large language models in the global north has been promoted and funded by venture capital and big tech. Companies creating AI services have targeted students and educational institutions as customers. Similarly, pre-AI boom educational companies have expanded their use of AI technologies. These commercial incentives for AIEd use may be related to a potential AI bubble. In the U.S., bipartisan support of AI development in K-12 education has been expressed, but specific implementations and best practices remain contentious. == Theory == AIEd applies theory from education studies, machine learning, and related fields. A 2019 review of the previous decade of studies found that most research prioritized technological design over pedagogical integration. Ouyang and Jiao (2021) propose three paradigms for AI in education, which follow roughly from least to most learner-centered and from requiring least to most technical complexity from the AI systems: AI-directed, learner-as-recipient: AIEd systems present a pre-set curriculum based on statistical patterns that do not adjust to learner's feedback. AI-supported, learner-as-collaborator: Systems that incorporate responsiveness to learner's feedback through, for example, natural language processing, wherein AI can support knowledge construction. AI-empowered, learner-as-leader: This model seeks to position AI as a supplement to human intelligence wherein learners take agency and AI provides consistent and actionable feedback. Some scholars place AI in education within a socio-technical framework. This positions AI alongside other emerging educational technologies, such as computing, the internet, and social media. The framework of Tsao, Heinrichs and Camit (2025) draws on new materialism and posthumanism, specifically Donna Haraway's concept of sympoiesis (making-with). This perspective views learning as an entanglement of human and non-human actors (students, teachers, and AI algorithms), where knowledge is co-composed in contact zones between human context and algorithmic prediction. AI agents have been trained on biased datasets, and thus continue to perpetuate societal biases. Since LLMs were created to produce human-like text, algorithmic bias can be introduced and reproduced. AI's data processing and monitoring reinforce neoliberal approaches to education rather than addressing inequalities. == Applications == Uses of generative AI chatbots in education have included assessment and feedback, machine translations, proof-reading exam question generation and copy editing, or as virtual assistants. Emotional AI in education is the study and development of systems that can detect learners' emotions or provide emotional support in learning. == Usage == === Schools and educators === Following the release of ChatGPT in November 2022, some schools and large school districts blocked access to the site and issued warnings that the use of such tools would be seen as cheating. Governmental and non-governmental organizations such as UNESCO, Article 4 of the European Union's AI Act, and the U.S. Department of Education have published reports advocating for specific AIEd approaches. National higher-education bodies have also published guidance on generative AI, including Ireland's Higher Education Authority, which issued a policy framework for higher education teaching and learning in December 2025. In 2024, UNESCO released updated global guidance for generative AI in education, emphasizing ethical use, teacher training, and data protection to ensure responsible integration of AI tools in learning environments. According to Taso (2025), policy implementation in higher education is interpreted and enacted differently by various organizations. These decentralized policies can lead to inconsistent enforcement and confusion among students regarding what constitutes acceptable use, with the burden of ethical navigation falling on individual teachers and students. AI integration in classrooms has created new forms of invisible labour for educators, who must navigate ambiguous policies, redesign assessments to be AI-resilient, and adjudicate potential academic integrity violations. The use of AI detection tools has also been criticised for creating an adversarial relationship between students and institutions, where students may be falsely accused of misconduct based on probabilistic software. AIEd advocates say that efforts should be made towards increasing global accessibility and training educators to serve underprivileged areas. === Students === Reliance on generative AI has been linked with reduced academic self-esteem and performance, and heightened learned helplessness. Algorithm errors and hallucinations are common flaws in AI agents, making them less trustworthy and reliable. According to a 2025 survey from Inside Higher Ed, 85% of higher education students use generative AI technology in some way, with 25% using AI to complete assignments for them. The most common reason cited for using AI to cheat was pressure to get high grades. 97% of students wanted some form of action from schools on the threat to academic integrity caused by AI, with the most popular options being clearer policies and more education about ethical uses of AI. In September 2025, The Atlantic published an op-ed from a high school senior arguing that the normalization of AI cheating was eroding critical thinking, academic integrity, creativity, and the shared student experience.

    Read more →
  • Radar geo-warping

    Radar geo-warping

    Radar geo-warping is the adjustment of geo-referenced radar images and video data to be consistent with a geographical projection. This image warping avoids any restrictions when displaying it together with video from multiple radar sources or with other geographical data including scanned maps and satellite images which may be provided in a particular projection. There are many areas where geo warping has unique benefits: Single radar video signal displayed together with maps of different geographical projections. E.g. Mercator UTM stereographic Multiple radar video signals displayed simultaneously: Having the computing power to do so on one computer. Adapting the projection of all radar signals allowing the geographically correct display and accurate superimposition of those videos. Slant range correction: a modern 3D radar system can measure the height of a target and hence it is possible to correct the radar video by the real corrected range of the target. Slant Range Correction also allows to compensate the radar tower height e.g. for maritime surveillance radars. == Introduction == Radar video presents the echoes of electromagnetic waves a radar system has emitted and received as reflections afterwards. These echoes are typically presented on a computer screen with a color-coding scheme depicting the reflection strength. Two problems have to be solved during such a visualization process. The first problem arises from the fact that typically the radar antenna turns around its position and measures the reflection echo distances from its position in one direction. This effectively means that the radar video data are present in polar coordinates. In older systems the polar oriented picture has been displayed in so called plan position indicators (PPI). The PPI-scope uses a radial sweep pivoting about the center of the presentation. This results in a map-like picture of the area covered by the radar beam. A long-persistence screen is used so that the display remains visible until the sweep passes again. Bearing to the target is indicated by the target's angular position in relation to an imaginary line extending vertically from the sweep origin to the top of the scope. The top of the scope is either true north (when the indicator is operated in the true bearing mode) or ship's heading (when the indicator is operated in the relative bearing mode). For visualization on a modern computer screen the polar coordinates have to be converted into Cartesian coordinates. This process called radar scan conversion is presented with more detail in the next section. The second problem to solve arises from the fact that a radar system is placed in the real world and measures real world echo positions. These echoes have to be displayed together with other real world data like object positions, vector maps and satellite images in a consistent way. All this information refers to the curved earth surface but is displayed on a flat computer display. Building a link from real world earth positions to display pixels is commonly called geographical referencing or in short geo-referencing. Part of the geo-referencing process is to map the 3D earth surface onto a 2D display. This process of a geographical projection can be performed in many ways, but different data sources have their own 'natural' projection. E.g. Cartesian radar video data from a radar source on the earth surface are geo-referenced by a so-called radar projection. When using this radar projection the Cartesian radar video pixels can directly displayed on a computer screen (only being linearly transformed according to the current position on the screen and e.g. the current zoom level). A problem now arises if e.g. also a satellite map shall be shown together with the radar video data. The 'natural' geographical projection of a satellite image would be a satellite projection which depends on the satellite orbit, position and further parameters. Now either the satellite image has to be reprojected to a radar projection or the radar video has to use the satellite projection. This geographical re-projection is also called geographical warping or Geo Warping where each image pixel has to be transformed from one projection into another. This article describes in further detail the Geo Warping of radar video images in real time. It will also show that radar video Geo Warping is done most efficiently when it is integrated with the radar scan conversion process. == Radar-scan conversion == This section describes the principles of the radar-scan conversion (RSC) process. The radar supplies its measured data in polar coordinates (ρ,θ) directly from the rotating antenna. ρ defines the target/echo distance and θ the target angle in polar world coordinates. These data are measured, digitized and stored in a polar coordinate polar store or polar pixmap. The main RSC task is to convert these data to Cartesian (x, y) display coordinates, creating the necessary display pixels. The RSC process is influenced by the current zoom, shift and rotation settings defining which part of the 'world' shall be visible in the display image. As detailed later the RSC process also takes the currently used geographical projection into account when the radar video images are Geo Warped. The OpenGL RSC is implemented using a reverse scan conversion approach which calculates for every image pixel the most appropriate radar amplitude value in the polar store. This approach generates an optimal image without any artifacts known from forward spoke fill algorithms. By applying bi-linear filtering between adjacent pixels in the polar store during the conversion process the OpenGL RSC finally achieves a very high visual quality radar display image for every zoom level, creating smooth images of the radar echoes. == Radar projection == This section illustrates how radar video data are geo referenced and displayed on a computer screen. The radar sensor is positioned on the earth surface with a height h above the ground. It measures the direct distance d to the target (and not e.g. the distance the target is away from the radar if one would move on the earth surface). This distance is then used in the display plane after adjustment to the current display zoom level by the radar scan converter (RSC). Now it has to be clarified how the radar video data is geo referenced. This basically means, that if we want to display a geographical real world object (like e.g. a light house) which is at the same real world position as the radar target, that it also shall appear at the same position in the display plane. This is realized by calculating the distance from the radar sensor to the respective real world object and use that distance in the display plane. The position of the real world object is typically given in geographical coordinates (latitude, longitude and height above the earth surface). In other words, using a radar projection with geographical data is done by simulating a radar measurement process with the real world objects and use the resulting range and azimuth in the display plane. The second picture to the right shows an example radar projection with the center of projection (COP) at latitude 50.0° and longitude 0.0° which is also the radar position. The dashed lines are the equal-latitude and equal-longitude lines on top of the background map. The solid lines show equal-range and equal-azimuth with the respect to the radar position. It is a feature of the radar projection that equal-range lines are circles and equal-azimuth lines are straight lines. This is necessary to display radar video consistently with other map data when using a radar projection where the projection center has to be the radar position. == Geo Warping process == This section explains the actual geo warping or re-projection process when applied to radar video in real time. Assume we want to display radar video on top of a satellite image. As an example we use the CIB projection which is used to display satellite data in CIB (Controlled Image Base) format. The Figure Geo Warping Radar to CIB Projection shows dashed the maximal range circle for a range of 111 km or 60 miles using the radar projection. Such a range is typical for long range coastal surveillance radars. As stated in the last section this is a perfect circle also on the computer screen. The solid line ellipse shows the same range circle for the CIB projection. Typically the errors occurring without Geo Warping are smallest near the radar position if at least the projection center (COP) coincides with the radar position, as realized in our example. Otherwise the error distribution depends both on the used projection and also on the projection parameters. Thus, in our case the errors are most significant near the maximum radar range. The CIB projection error corrected in east–west direction at half the radar range is 2.6 km and is 5.3 km at the full radar range of 111 km. An error of 5.3 km is

    Read more →
  • Operation Serenata de Amor

    Operation Serenata de Amor

    Operation Serenata de Amor is an artificial intelligence project designed to analyze public spending in Brazil. The project has been funded by a recurrent financing campaign since September 7, 2016, and came in the wake of major scandals of misappropriation of public funds in Brazil, such as the Mensalão scandal and what was revealed in the Operation Car Wash investigations. The analysis began with data from the National Congress then expanded to other types of budget and instances of government, such as the Federal Senate. The project is built through collaboration on GitHub and using a public group with more than 600 participants on Telegram. The name "Serenata de Amor," which means "serenade of love," was taken from a popular cashew cream bonbon produced by Chocolates Garoto in Brazil. == Modules == Throughout development of the project, new modules have been newly introduced in addition to the main repository: The main repository, serenata-de-amor, serves as the starting point for investigative work. Rosie is the robot programmed to identify public funds expenses with discrepancies, starting with CEAP (Quota for Exercise of Parliamentary Activity); it analyzes each of the reimbursements requested by the deputies and senators, indicating the reasons that lead it to believe they are suspicious. From Rosie was born whistleblower, which tweets under the name of @RosieDaSerenata, distributing the results found on social media. Jarbas (Github repository) is a data visualization tool which shows a complete list of reimbursements made available by the Chamber of Deputies and mined by Rosie. Toolbox is a Python installable package that supports the development of Serenata de Amor and Rosie. == History == Operation Serenata de Amor is an Artificial intelligence project for analysis of public expenditures. It was conceived in March 2016 by data scientist Irio Musskopf, sociologist Eduardo Cuducos and entrepreneur Felipe Cabral. The project was financed collectively in the Catarse platform, where it reached 131% of the collection goal paying 3 months of project development. Ana Schwendler, also a data scientist, Pedro Vilanova "Tonny", data journalist, Bruno Pazzim, software engineer, Filipe Linhares, a frontend engineer, Leandro Devegili, an entrepreneur and André Pinho took the first steps towards constructing the platform, such as collecting and structuring the first datasets. Jessica Temporal, data scientist and Yasodara Córdova "Yaso", researcher, Tatiana Balachova "Russa", UX designer, joined the project after the financing took place. The members created a recurring financing campaign, expanding the analysis of public spending to the Federal Senate. Donors make monthly payments ranging from 5 BRL to 200 BRL to maintain group activities. The monthly amount collected is around 10,000 BRL. == Results == In January 2017, concluding the period financed by the initial campaign, the group carried out an investigation into the suspicious activities found by the data analysis system. 629 complaints were made to the Ombudsman's Office of the Chamber of Deputies, questioning expenses of 216 federal deputies. In addition, the Facebook project page has more than 25,000 followers, and users frequently cite the operation as a benchmark in transparency in the Brazilian government. One of the examples of results obtained by the operation is the case of the Deputy who had to return about 700 BRL to the House after his expenses were analyzed by the platform. The platform was able to analyze more than 3 million notes, raising about 8,000 suspected cases in public spending. The community that supports the work of the team benefits from open source repositories, with licenses open for the collaboration. So much so that the two main data scientists of the project presented it at the CivicTechFest in Taipei, obtaining several mentions even in the international press. The technical leader presented the project in Poland during DevConf2017 in Kraków. It was also presented in the Google News Lab in 2017. It was presented by Yaso, when she was the Director of the initiative, at the MIT Media Lab/Berkman Klein Center Initiative for Artificial Intelligence ethics, and at the Artificial Intelligence and Inclusion Symposium, an initiative of the Global Network of Internet & Society Centers (NoC). It was also presented both by Irio and Yaso at the Digital Harvard Kennedy School, over a lunch seminar, where the transparency of the platform and the main solutions found were discussed, so that the code and data are always available to verify its suitability. This infographic provides information about the first results of Operation Serenata de Amor, a project that analyzes open data on public spending to find discrepancies. The project was presented by Yaso to the House Audit and Control Committee of the Chamber of Deputies in August 2017, and raised the interest of House officials who work with open data. The operation has been a source of inspiration for other civic projects that aim to work with similar goals, demonstrating the broader impact of artificial intelligence also in industry in Brazil. Participation of several team members in events throughout Brazil and abroad can be found on the Internet, such as presentation at OpenDataDay, held at Calango Hackerspace in the Federal District, Campus Party Bahia, Campus Party Brasilia, Friends of Tomorrow, XIII National Meeting of Internal Control, in the event USP Talks Hackfest against corruption in João Pessoa, the latter being also highlighted in the National Press.

    Read more →
  • Intelligent decision support system

    Intelligent decision support system

    An intelligent decision support system (IDSS) is a decision support system that makes extensive use of artificial intelligence (AI) techniques. Use of AI techniques in management information systems has a long history – indeed terms such as "Knowledge-based systems" (KBS) and "intelligent systems" have been used since the early 1980s to describe components of management systems, but the term "Intelligent decision support system" is thought to originate with Clyde Holsapple and Andrew Whinston in the late 1970s. Examples of specialized intelligent decision support systems include Flexible manufacturing systems (FMS), intelligent marketing decision support systems and medical diagnosis systems. Ideally, an intelligent decision support system should behave like a human consultant: supporting decision makers by gathering and analysing evidence, identifying and diagnosing problems, proposing possible courses of action and evaluating such proposed actions. The aim of the AI techniques embedded in an intelligent decision support system is to enable these tasks to be performed by a computer, while emulating human capabilities as closely as possible. Many IDSS implementations are based on expert systems, a well established type of KBS that encode knowledge and emulate the cognitive behaviours of human experts using predicate logic rules, and have been shown to perform better than the original human experts in some circumstances. Expert systems emerged as practical applications in the 1980s based on research in artificial intelligence performed during the late 1960s and early 1970s. They typically combine knowledge of a particular application domain with an inference capability to enable the system to propose decisions or diagnoses. Accuracy and consistency can be comparable to (or even exceed) that of human experts when the decision parameters are well known (e.g. if a common disease is being diagnosed), but performance can be poor when novel or uncertain circumstances arise. Research in AI focused on enabling systems to respond to novelty and uncertainty in more flexible ways is starting to be used in IDSS. For example, intelligent agents that perform complex cognitive tasks without any need for human intervention have been used in a range of decision support applications. Capabilities of these intelligent agents include knowledge sharing, machine learning, data mining, and automated inference. A range of AI techniques such as case based reasoning, rough sets and fuzzy logic have also been used to enable decision support systems to perform better in uncertain conditions. A 2009 research about a multi-artificial system intelligence system named IILS is proposed to automate problem-solving processes within the logistics industry. The system involves integrating intelligence modules based on case-based reasoning, multi-agent systems, fuzzy logic, and artificial neural networks aiming to offer advanced logistics solutions and support in making well-informed, high-quality decisions to address a wide range of customer needs and challenges.

    Read more →
  • Hardware for artificial intelligence

    Hardware for artificial intelligence

    Specialized computer hardware is often used to execute artificial intelligence (AI) programs faster, and with less energy, such as Lisp machines, neuromorphic engineering, event cameras, and physical neural networks. Since 2017, several consumer grade CPUs and SoCs have on-die NPUs. As of 2023, the market for AI hardware is dominated by GPUs. As of the 2020s, AI computation is dominated by graphics processing units (GPUs) and newer domain-specific accelerators such as Google's Tensor Processing Units (TPUs), AMD's Instinct MI300 series, and various on-device neural-processing units (NPUs) found in consumer hardware. == Scope == For the purposes of this article, AI hardware refers to computing components and systems specifically designed or optimized to accelerate artificial-intelligence workloads such as machine-learning training or inference. This includes general-purpose accelerators used for AI (for example, GPUs) and domain-specific accelerators (for example, TPUs, NPUs, and other AI ASICs). Event-based cameras are sometimes discussed in the context of neuromorphic computing, but they are input sensors rather than AI compute devices. Conversely, components such as memristors are basic circuit elements rather than specialized AI hardware when considered alone. == Lisp machines == Lisp machines were developed in the late 1970s and early 1980s to make artificial intelligence programs written in the programming language Lisp run faster. == Dataflow architecture == Dataflow architecture processors used for AI serve various purposes with varied implementations like the polymorphic dataflow Convolution Engine by Kinara (formerly Deep Vision), structure-driven dataflow by Hailo, and dataflow scheduling by Cerebras. == Component hardware == === AI accelerators === Since the 2010s, advances in computer hardware have led to more efficient methods for training deep neural networks that contain many layers of non-linear hidden units and a very large output layer. By 2019, graphics processing units (GPUs), often with AI-specific enhancements, had displaced central processing units (CPUs) as the dominant means to train large-scale commercial cloud AI. OpenAI estimated the hardware compute used in the largest deep learning projects from Alex Net (2012) to Alpha Zero (2017), and found a 300,000-fold increase in the amount of compute needed, with a doubling-time trend of 3.4 months. === General-purpose GPUs for AI === Since the 2010s, graphics processing units (GPUs) have been widely used to train and deploy deep learning models because of their highly parallel architecture and high memory bandwidth. Modern data-center GPUs include dedicated tensor or matrix-math units that accelerate neural-network operations. In 2022, NVIDIA introduced the Hopper-generation H100 GPU, adding FP8 precision support and faster interconnects for large-scale model training. AMD and other vendors have also developed GPUs and accelerators aimed at AI and high-performance computing workloads. === Domain-specific accelerators (ASICs / NPUs) === Beyond general-purpose GPUs, several companies have developed application-specific integrated circuits (ASICs) and neural processing units (NPUs) tailored for AI workloads. Google introduced the Tensor Processing Unit (TPU) in 2016 for deep-learning inference, with later generations supporting large-scale training through dense systolic-array designs and optical interconnects. Other vendors have released similar devices—such as Apple's Neural Engine and various on-device NPUs—that emphasize energy-efficient inference in mobile or edge computing environments. === Memory and interconnects === AI accelerators rely on fast memory and inter-chip links to manage the large data volumes of training and inference. High-bandwidth memory (HBM) stacks, standardized as HBM3 in 2022, provide terabytes-per-second throughput on modern GPUs and ASICs. These accelerators are often connected through dedicated fabrics such as NVIDIA's NVLink and NVSwitch or optical interconnects used in TPU systems to scale performance across thousands of chips.

    Read more →
  • Security.txt

    Security.txt

    security.txt is an accepted standard for website security information that allows security researchers to report security vulnerabilities easily. The standard prescribes a text file named security.txt in the well known location, similar in syntax to robots.txt but intended to be machine and human readable, for those wishing to contact a website's owner about security issues. security.txt files have been adopted by Google, GitHub, LinkedIn, and Facebook. == History == The Internet Draft was first submitted by Edwin Foudil in September 2017. At that time it covered four directives, "Contact", "Encryption", "Disclosure" and "Acknowledgement". Foudil expected to add further directives based on feedback. In addition, web security expert Scott Helme said he had seen positive feedback from the security community while use among the top 1 million websites was "as low as expected right now". In 2019, the Cybersecurity and Infrastructure Security Agency (CISA) published a draft binding operational directive that requires all US federal agencies to publish a security.txt file within 180 days. The Internet Engineering Steering Group (IESG) issued a Last Call for security.txt in December 2019 which ended on January 6, 2020. A study in 2021 found that over ten percent of top-100 websites published a security.txt file, with the percentage of sites publishing the file decreasing as more websites were considered. The study also noted a number of discrepancies between the standard and the content of the file. In April 2022 the security.txt file has been accepted by Internet Engineering Task Force (IETF) as RFC 9116. == File format == security.txt files can be served under the /.well-known/ directory (i.e. /.well-known/security.txt) or the top-level directory (i.e. /security.txt) of a website. The file must be served over HTTPS and in plaintext format.

    Read more →
  • Deep Learning Super Sampling

    Deep Learning Super Sampling

    Deep Learning Super Sampling (DLSS) is a suite of real-time deep learning image enhancement and upscaling technologies developed by Nvidia that are available in a number of video games. The goal of these technologies is to allow the majority of the graphics pipeline to run at a lower resolution for increased performance, and then infer a higher resolution image from this that approximates the same level of detail as if the image had been rendered at this higher resolution. This allows for higher graphical settings or frame rates for a given output resolution, depending on user preference. All generations of DLSS are available on all RTX-branded cards from Nvidia in supported titles. However, the Frame Generation feature is only supported on RTX 40 series GPUs or newer and Multi Frame Generation is only available on 50 series GPUs. == History == Nvidia advertised DLSS as a key feature of GeForce RTX 20 series GPUs when they launched in September 2018. At that time, the results were limited to a few video games, namely Battlefield V, or Metro Exodus, because the algorithm had to be trained specifically on each game on which it was applied and the results were usually not as good as simple resolution upscaling. In 2019, Control shipped with ray tracing and an image processing algorithm that approximated DLSS, which did not use the Tensor Cores. In April 2020, Nvidia advertised and shipped an improved version of DLSS named DLSS 2 with driver version 445.75. DLSS 2.0 was available for a few existing games including Control and Wolfenstein: Youngblood, and would later be added to many newly released games and game engines such as Unreal Engine and Unity. This time Nvidia said that it used the Tensor Cores again, and that the AI did not need to be trained specifically on each game. Despite sharing the DLSS branding, the two iterations of DLSS differ significantly and are not backwards-compatible. In January 2025, Nvidia stated that there are over 540 games and apps supporting DLSS, and that over 80% of Nvidia RTX users activate DLSS. In March 2025, there were more than 100 games that support DLSS 4, according to Nvidia. By May 2025, over 125 games supported DLSS 4. The first video game console to use DLSS, the Nintendo Switch 2, was released on June 5, 2025. Nvidia announced DLSS 4.5 at CES 2026. In January 2026, Nvidia stated that over 250 games and applications support Multi Frame Generation. On March 16, 2026, at GTC 2026, Nvidia CEO Jensen Huang presented DLSS 5, a real-time AI model based on neural rendering that realistically enhances lighting and material surfaces at up to 4K resolution while retaining the developer's intended art style. It is planned to release in fall of 2026. In a blog post on its website, Nvidia has announced that DLSS 5 will be available in such games as Assassin's Creed Shadows, Delta Force, Hogwarts Legacy, Naraka: Bladepoint, Phantom Blade Zero, Resident Evil Requiem, Starfield, The Elder Scrolls IV: Oblivion Remastered, and more. On May 31, 2026, Nvidia announced an updated version of Ray Reconstruction for DLSS 4.5 in a blog post, scheduled for release on all RTX GPUs in August of the same year. They said it is designed to better embed spatial awareness into scenes and analyze engine data on movements and lighting conditions, resulting in a sharper, more stable, and less noisy image. === Release timeline === == Technology == === DLSS 1 === The first iteration of DLSS is a predominantly spatial image upscaler with two stages, both relying on convolutional auto-encoder neural networks. The first step is an image enhancement network which uses the current frame and motion vectors to perform edge enhancement, and spatial anti-aliasing. The second stage is an image upscaling step which uses the single raw, low-resolution frame to upscale the image to the desired output resolution. Using just a single frame for upscaling means the neural network itself must generate a large amount of new information to produce the high-resolution output, which can result in slight hallucinations such as leaves that differ in style to the source content. The neural networks are trained on a per-game basis by generating a "perfect frame" using traditional supersampling to 64 samples per pixel, as well as the motion vectors for each frame. The data collected must be as comprehensive as possible, including as many levels, times of day, graphical settings, resolutions, etc. as possible. This data is also augmented using common augmentations such as rotations, colour changes, and random noise to help generalize the test data. Training is performed on Nvidia's Saturn V supercomputer. This first iteration received a mixed response, with many criticizing the often soft appearance and artifacts along with glitches in certain situations; likely a side effect of the limited data from only using a single frame input to the neural networks which could not be trained to perform optimally in all scenarios and edge-cases. Nvidia also demonstrated the ability for the auto-encoder networks to learn the ability to recreate depth-of-field and motion blur, although this functionality has never been included in a publicly released product. === DLSS 2 === DLSS 2 is a temporal anti-aliasing upsampling (TAAU) implementation, using data from previous frames extensively through sub-pixel jittering to resolve fine detail and reduce aliasing. The data DLSS 2 collects includes: the raw low-resolution input, motion vectors, depth buffers, and exposure / brightness information. It can also be used as a simpler TAA implementation where the image is rendered at 100% resolution, rather than being upsampled by DLSS, Nvidia brands this as DLAA (Deep Learning Anti-Aliasing). TAA(U) is used in many modern video games and game engines; however, all previous implementations have used some form of manually written heuristics to prevent temporal artifacts such as ghosting and flickering. One example of this is neighborhood clamping which forcefully prevents samples collected in previous frames from deviating too much compared to nearby pixels in newer frames. This helps to identify and fix many temporal artifacts, but deliberately removing fine details in this way is analogous to applying a blur filter, and thus the final image can appear blurry when using this method. DLSS 2 uses a convolutional auto-encoder neural network trained to identify and fix temporal artifacts, instead of manually programmed heuristics as mentioned above. Because of this, DLSS 2 can generally resolve detail better than other TAA and TAAU implementations, while also removing most temporal artifacts. This is why DLSS 2 can sometimes produce a sharper image than rendering at higher, or even native resolutions using traditional TAA. However, no temporal solution is perfect, and artifacts (ghosting in particular) are still visible in some scenarios when using DLSS 2. Because temporal artifacts occur in most art styles and environments in broadly the same way, the neural network that powers DLSS 2 does not need to be retrained when being used in different games. Despite this, Nvidia does frequently ship new minor revisions of DLSS 2 with new titles, so this could suggest some minor training optimizations may be performed as games are released, although Nvidia does not provide changelogs for these minor revisions to confirm this. The main advancements compared to DLSS 1 include: Significantly improved detail retention, a generalized neural network that does not need to be re-trained per-game, and ~2x less overhead (~1–2 ms vs ~2–4 ms). It should also be noted that forms of TAAU such as DLSS 2 are not upscalers in the same sense as techniques such as ESRGAN or DLSS 1, which attempt to create new information from a low-resolution source; instead, TAAU works to recover data from previous frames, rather than creating new data. In practice, this means low resolution textures in games will still appear low-resolution when using current TAAU techniques. This is why Nvidia recommends game developers use higher resolution textures than they would normally for a given rendering resolution by applying a mip-map bias when DLSS 2 is enabled. === DLSS 3 === Augments DLSS 2 with improved image quality and the introduction of a new motion interpolation feature, called Frame Generation. The DLSS Frame Generation algorithm takes two rendered frames from the rendering pipeline and generates a new frame that smoothly transitions between them. For every frame rendered, one additional frame is generated. DLSS 3.0 makes use of a new generation Optical Flow Accelerator (OFA) included in the Ada Lovelace architecture of GeForce RTX 40 series GPUs and with that is exclusive to them. The new OFA is said to be faster and more accurate than the one already available in previous Turing and Ampere RTX GPUs. === DLSS 3.5 === DLSS 3.5 adds Ray Reconstruction, replacing multiple denoising algorithms with a single AI model trained o

    Read more →
  • Personality computing

    Personality computing

    Personality computing is a research field related to artificial intelligence and personality psychology that studies personality by means of computational techniques from different sources, including text, multimedia, and social networks. == Overview == Personality computing addresses three main problems involving personality: automatic personality recognition, perception, and synthesis. Automatic personality recognition is the inference of the personality type of target individuals from their digital footprint. Automatic personality perception is the inference of the personality attributed by an observer to a target individual based on some observable behavior. Automatic personality synthesis is the generation of the style or behaviour of artificial personalities in Avatars and virtual agents. Self-assessed personality tests or observer ratings are always exploited as the ground truth for testing and validating the performance of artificial intelligence algorithms for the automatic prediction of personality types. There is a wide variety of personality tests, such as the Myers Briggs Type Indicator (MBTI) or the MMPI, but the most used are tests based on the Five Factor Model such as the Revised NEO Personality Inventory. Personality computing can be considered as an extension or complement of Affective computing, where the former focuses on personality traits and the latter on affective states. A further extension of the two fields is Character Computing which combines various character states and traits including but not limited to personality and affect. == History == Personality computing began around 2005 with the pioneering research in personality recognition by Shlomo Argamon and later by François Mairesse. These works showed that personality traits could be inferred with reasonable accuracy from text, such as blogs, self-presentations, and email addresses. In 2008, the concept of "portable personality" for the distributed management of personality profiles has been developed. A few years later, research began in personality recognition and perception from multimodal and social signals, such as recorded meetings and voice calls. In the 2010s, the research focused mainly on personality recognition and perception from social media, helped by the first workshops organized by Fabio Celli. In particular personality was extracted from Facebook, Twitter and Instagram. In the same years, automatic personality synthesis helped improve the coherence of simulated behavior in virtual agents. Scientific works by Michal Kosinski demonstrated the validity of Personality Computing from different digital footprints, in particular from user preferences such as Facebook page likes, showed that machines can recognize personality better than humans and raised a warning against Cambridge Analytica and misuse of this kind of technology. == Applications == Personality computing techniques, in particular personality recognition and perception, have applications in Social media marketing, where they can help reducing the cost of advertising campaigns through psychological targeting.

    Read more →