AI For Business Escp

AI For Business Escp — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Viewport

    Viewport

    A viewport is a polygon viewing region in computer graphics. In computer graphics theory, there are two region-like notions of relevance when rendering some objects to an image. In textbook terminology, the world coordinate window is the area of interest (meaning what the user wants to visualize) in some application-specific coordinates, e.g. miles, centimeters etc. The word window as used here should not be confused with the GUI window, i.e. the notion used in window managers. Rather it is an analogy with how a window limits what one can see outside a room. In contrast, the viewport is an area (typically rectangular) expressed in rendering-device-specific coordinates, e.g. pixels for screen coordinates, in which the objects of interest are going to be rendered. Clipping to the world-coordinates window is usually applied to the objects before they are passed through the window-to-viewport transformation. For a 2D object, the latter transformation is simply a combination of translation and scaling, the latter not necessarily uniform. An analogy of this transformation process based on traditional photography notions is to equate the world-clipping window with the camera settings and the variously sized prints that can be obtained from the resulting film image as possible viewports. Because the physical-device-based coordinates may not be portable from one device to another, a software abstraction layer known as normalized device coordinates is typically introduced for expressing viewports; it appears for example in the Graphical Kernel System (GKS) and later systems inspired from it. In 3D computer graphics, the viewport refers to the 2D rectangle used to project the 3D scene to the position of a virtual camera. A viewport is a region of the screen used to display a portion of the total image to be shown. In virtual desktops, the viewport is the visible portion of a 2D area which is larger than the visualization device. When viewing a document in a web browser, the viewport is the region of the browser window which contains the visible portion of the document. If the size of the viewport changes, for example as a result of the user resizing the browser window, then the browser may reflow the document (recalculate the locations and sizes of elements of the document). If the document is larger than the viewport, the user can control the portion of the document which is visible by scrolling in the viewport.

    Read more →
  • Curse of dimensionality

    Curse of dimensionality

    The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces that do not occur in low-dimensional settings such as the three-dimensional physical space of everyday experience. The expression was coined by Richard E. Bellman when considering problems in dynamic programming. The curse generally refers to issues that arise when the number of datapoints is small (in a suitably defined sense) relative to the intrinsic dimension of the data. Dimensionally cursed phenomena occur in domains such as numerical analysis, sampling, combinatorics, machine learning, data mining and databases. The common theme of these problems is that when the dimensionality increases, the volume of the space increases so fast that the available data becomes sparse. In order to obtain a reliable result, the amount of data needed often grows exponentially with the dimensionality. Also, organizing and searching data often relies on detecting areas where objects form groups with similar properties; in high dimensional data, however, all objects appear to be sparse and dissimilar in many ways, which prevents common data organization strategies from being efficient. == Domains == === Combinatorics === In some problems, each variable can take one of several discrete values, or the range of possible values is divided to give a finite number of possibilities. Taking the variables together, a huge number of combinations of values must be considered. This effect is also known as the combinatorial explosion. Even in the simplest case of d {\displaystyle d} binary variables, the number of possible combinations already is 2 d {\displaystyle 2^{d}} , exponential in the dimensionality. Naively, each additional dimension doubles the effort needed to try all combinations. === Sampling === There is an exponential increase in volume associated with adding extra dimensions to a mathematical space. For example, 102 = 100 evenly spaced sample points suffice to sample a unit interval (try to visualize a "1-dimensional" cube, i.e. a line) with no more than 10−2 = 0.01 distance between points; an equivalent sampling of a 10-dimensional unit hypercube with a lattice that has a spacing of 10−2 = 0.01 between adjacent points would require 1020 = [(102)10] sample points. In general, with a spacing distance of 10−n the 10-dimensional hypercube appears to be a factor of 10n(10−1) = [(10n)10/(10n)] "larger" than the 1-dimensional hypercube, which is the unit interval. In the above example n = 2: when using a sampling distance of 0.01 the 10-dimensional hypercube appears to be 1018 "larger" than the unit interval. This effect is a combination of the combinatorics problems above and the distance function problems explained below. === Optimization === When solving dynamic optimization problems by numerical backward induction, the objective function must be computed for each combination of values. This is a significant obstacle when the dimension of the "state variable" is large. === Machine learning === In machine learning problems that involve learning a "state-of-nature" from a finite number of data samples in a high-dimensional feature space with each feature having a range of possible values, typically an enormous amount of training data is required to ensure that there are several samples with each combination of values. In an abstract sense, as the number of features or dimensions grows, the amount of data we need to generalize accurately grows exponentially. A typical rule of thumb is that there should be at least 5 training examples for each dimension in the representation. In machine learning and insofar as predictive performance is concerned, the curse of dimensionality is used interchangeably with the peaking phenomenon, which is also known as Hughes phenomenon. This phenomenon states that with a fixed number of training samples, the average (expected) predictive power of a classifier or regressor first increases as the number of dimensions or features used is increased but beyond a certain dimensionality it starts deteriorating instead of improving steadily. Nevertheless, in the context of a simple classifier (e.g., linear discriminant analysis in the multivariate Gaussian model under the assumption of a common known covariance matrix), Zollanvari et al. showed both analytically and empirically that as long as the relative cumulative efficacy of an additional feature set (with respect to features that are already part of the classifier) is greater (or less) than the size of this additional feature set, the expected error of the classifier constructed using these additional features will be less (or greater) than the expected error of the classifier constructed without them. In other words, both the size of additional features and their (relative) cumulative discriminatory effect are important in observing a decrease or increase in the average predictive power. In metric learning, higher dimensions can sometimes allow a model to achieve better performance. After normalizing embeddings to the surface of a hypersphere, FaceNet achieves the best performance using 128 dimensions as opposed to 64, 256, or 512 dimensions in one ablation study. A loss function for unitary-invariant dissimilarity between word embeddings was found to be minimized in high dimensions. === Data mining === In data mining, the curse of dimensionality refers to a data set with too many features. Consider the first table, which depicts 200 individuals and 2000 genes (features) with a 1 or 0 denoting whether or not they have a genetic mutation in that gene. A data mining application to this data set may be finding the correlation between specific genetic mutations and creating a classification algorithm such as a decision tree to determine whether an individual has cancer or not. A common practice of data mining in this domain would be to create association rules between genetic mutations that lead to the development of cancers. To do this, one would have to loop through each genetic mutation of each individual and find other genetic mutations that occur over a desired threshold and create pairs. They would start with pairs of two, then three, then four until they result in an empty set of pairs. The complexity of this algorithm can lead to calculating all permutations of gene pairs for each individual or row. Given the formula for calculating the permutations of n items with a group size of r is: n ! ( n − r ) ! {\displaystyle {\frac {n!}{(n-r)!}}} , calculating the number of three pair permutations of any given individual would be 7988004000 different pairs of genes to evaluate for each individual. The number of pairs created will grow by an order of factorial as the size of the pairs increase. The growth is depicted in the permutation table (see right). As we can see from the permutation table above, one of the major problems data miners face regarding the curse of dimensionality is that the space of possible parameter values grows exponentially or factorially as the number of features in the data set grows. This problem critically affects both computational time and space when searching for associations or optimal features to consider. Another problem data miners may face when dealing with too many features is that the number of false predictions or classifications tends to increase as the number of features grows in the data set. In terms of the classification problem discussed above, keeping every data point could lead to a higher number of false positives and false negatives in the model. This may seem counterintuitive, but consider the genetic mutation table from above, depicting all genetic mutations for each individual. Each genetic mutation, whether they correlate with cancer or not, will have some input or weight in the model that guides the decision-making process of the algorithm. There may be mutations that are outliers or ones that dominate the overall distribution of genetic mutations when in fact they do not correlate with cancer. These features may be working against one's model, making it more difficult to obtain optimal results. This problem is up to the data miner to solve, and there is no universal solution. The first step any data miner should take is to explore the data, in an attempt to gain an understanding of how it can be used to solve the problem. One must first understand what the data means, and what they are trying to discover before they can decide if anything must be removed from the data set. Then they can create or use a feature selection or dimensionality reduction algorithm to remove samples or features from the data set if they deem it necessary. One example of such methods is the interquartile range method, used to remove outliers in a data set by calculating the standard deviation of a feature or occurrence. === Distance function === When a measure such as a Euclidean distance is defined using many coordinat

    Read more →
  • Maximum inner-product search

    Maximum inner-product search

    Maximum inner-product search (MIPS) is a search problem, with a corresponding class of search algorithms which attempt to maximise the inner product between a query and the data items to be retrieved. MIPS algorithms are used in a wide variety of big data applications, including recommendation algorithms and machine learning. Formally, for a database of vectors x i {\displaystyle x_{i}} defined over a set of labels S {\displaystyle S} in an inner product space with an inner product ⟨ ⋅ , ⋅ ⟩ {\displaystyle \langle \cdot ,\cdot \rangle } defined on it, MIPS search can be defined as the problem of determining a r g m a x i ∈ S ⟨ x i , q ⟩ {\displaystyle {\underset {i\in S}{\operatorname {arg\,max} }}\ \langle x_{i},q\rangle } for a given query q {\displaystyle q} . Although there is an obvious linear-time implementation, it is generally too slow to be used on practical problems. However, efficient algorithms exist to speed up MIPS search. Under the assumption of all vectors in the set having constant norm, MIPS can be viewed as equivalent to a nearest neighbor search (NNS) problem in which maximizing the inner product is equivalent to minimizing the corresponding distance metric in the NNS problem. Like other forms of NNS, MIPS algorithms may be approximate or exact. MIPS search is used as part of DeepMind's RETRO algorithm.

    Read more →
  • Token maxxing

    Token maxxing

    Token Maxxing or Token Maxing is a metric used in an attempt to track productivity in the workplace especially for those using Artificial Intelligence (AI) based services. AI services charge for each token which represent units of effort expended by an AI service to solve a problem. Some believe that token consumption equates to productivity and thus can be used as a metric to monitor an employee's work. Supporters believe that higher token usage indicates higher productivity and higher utilization of powerful AI services. This also suggests that those not consuming enough tokens may be less productive and underutilizing powerful AI services. This belief might lead to an environment that incentivizes higher token usage to predict increased productivity. Critics of token maxxing as a metric claim that prudent workers will maximize any metric that management wants increased to gain a workplace advantage. For example: Engineers in the tech industries pressed to consume as many tokens as possible might run several AI agents in tandem, enter longer input prompts, or automate their tasks to maximize their token consumption. To management, this higher token usage may indicate potential productivity, but in reality may cause additional token costs, worker burnout, or actually create more bloated code of lower quality. Another claim is AI service companies potentially benefit from such an emphasis on token consumption and actively encourage the trend. Some developers have publicly advocated the practice. Developer Sigrid Jin, who said he used 50 billion tokens in a single year, has argued that maximizing token consumption is the best way to understand the value of AI, advising others to spend as much on AI usage as they pay in rent to obtain a return on investment. == See Also == Goodhart's law Perverse incentive Jevons Paradox

    Read more →
  • Gold (linker)

    Gold (linker)

    In software engineering, gold is a linker for ELF files. It became an official GNU package and was added to binutils in March 2008 and first released in binutils version 2.19. gold was developed by Ian Lance Taylor and a small team at Google. The motivation for writing gold was to make a linker that is faster than the GNU linker, especially for large applications coded in C++. Unlike the GNU linker, gold does not use the BFD library to process object files. While this limits the object file formats it can process to ELF only, it is also claimed to result in a cleaner and faster implementation without an additional abstraction layer. The author cited complete removal of BFD as a reason to create a new linker from scratch rather than incrementally improve the GNU linker. This rewrite also fixes some bugs in old ld that break ELF files in various minor ways. To specify gold in a makefile, one sets the LD or LD environment variable to ld.gold. To specify gold through a compiler option, one can use the gcc option -fuse-ld=gold. Fedora has moved gold from binutils into its own package due to concerns it is suffering from bitrot after Google's interest has moved to LLVM. In particular, gold does not read LDFLAGS variable, so cannot see libraries in folders like /usr/local/lib. On 2025-02-02 the 2.44 version of GNU Binutils removed gold from the default source distribution and into a separate package, stating that "the gold linker is now deprecated and will eventually be removed unless volunteers step forward and offer to continue development and maintenance".

    Read more →
  • Instance selection

    Instance selection

    Instance selection (or dataset reduction, or dataset condensation) is an important data pre-processing step that can be applied in many machine learning (or data mining) tasks. Approaches for instance selection can be applied for reducing the original dataset to a manageable volume, leading to a reduction of the computational resources that are necessary for performing the learning process. Algorithms of instance selection can also be applied for removing noisy instances, before applying learning algorithms. This step can improve the accuracy in classification problems. Algorithm for instance selection should identify a subset of the total available data to achieve the original purpose of the data mining (or machine learning) application as if the whole data had been used. Considering this, the optimal outcome of IS would be the minimum data subset that can accomplish the same task with no performance loss, in comparison with the performance achieved when the task is performed using the whole available data. Therefore, every instance selection strategy should deal with a trade-off between the reduction rate of the dataset and the classification quality. == Instance selection algorithms == The literature provides several different algorithms for instance selection. They can be distinguished from each other according to several different criteria. Considering this, instance selection algorithms can be grouped in two main classes, according to what instances they select: algorithms that preserve the instances at the boundaries of classes and algorithms that preserve the internal instances of the classes. Within the category of algorithms that select instances at the boundaries it is possible to cite DROP3, ICF and LSBo. On the other hand, within the category of algorithms that select internal instances, it is possible to mention ENN and LSSm. In general, algorithm such as ENN and LSSm are used for removing harmful (noisy) instances from the dataset. They do not reduce the data as the algorithms that select border instances, but they remove instances at the boundaries that have a negative impact on the data mining task. They can be used by other instance selection algorithms, as a filtering step. For example, the ENN algorithm is used by DROP3 as the first step, and the LSSm algorithm is used by LSBo. There is also another group of algorithms that adopt different selection criteria. For example, the algorithms LDIS, CDIS and XLDIS select the densest instances in a given arbitrary neighborhood. The selected instances can include both, border and internal instances. The LDIS and CDIS algorithms are very simple and select subsets that are very representative of the original dataset. Besides that, since they search by the representative instances in each class separately, they are faster (in terms of time complexity and effective running time) than other algorithms, such as DROP3 and ICF. Besides that, there is a third category of algorithms that, instead of selecting actual instances of the dataset, select prototypes (that can be synthetic instances). In this category it is possible to include PSSA, PSDSP and PSSP. The three algorithms adopt the notion of spatial partition (a hyperrectangle) for identifying similar instances and extract prototypes for each set of similar instances. In general, these approaches can also be modified for selecting actual instances of the datasets. The algorithm ISDSP adopts a similar approach for selecting actual instances (instead of prototypes).

    Read more →
  • Astrostatistics

    Astrostatistics

    Astrostatistics is a discipline which spans astrophysics, statistical analysis and data mining. It is used to process the vast amount of data produced by automated scanning of the cosmos, to characterize complex datasets, and to link astronomical data to astrophysical theory. Many branches of statistics are involved in astronomical analysis including nonparametrics, multivariate regression and multivariate classification, time series analysis, and especially Bayesian inference. The field is closely related to astroinformatics.

    Read more →
  • AI-complete

    AI-complete

    In the field of artificial intelligence (AI), tasks that are hypothesized to require artificial general intelligence to solve are informally known as AI-complete or AI-hard. Calling a problem AI-complete reflects the belief that it cannot be solved by a simple specific algorithm. Prior to 2013, problems supposed to be AI-complete included computer vision, natural language understanding, and dealing with unexpected circumstances while solving any real-world problem. AI-complete tasks were notably considered useful for distinguishing humans from automated agents, as CAPTCHAs aim to do. == History == The term was coined by Fanya Montalvo by analogy with NP-complete and NP-hard in complexity theory, which formally describes the most famous class of difficult problems. Early uses of the term are in Erik Mueller's 1987 PhD dissertation and in Eric Raymond's 1991 Jargon File. Expert systems, that were popular in the 1980s, were able to solve very simple and/or restricted versions of AI-complete problems, but never in their full generality. When AI researchers attempted to "scale up" their systems to handle more complicated, real-world situations, the programs tended to become excessively brittle without commonsense knowledge or a rudimentary understanding of the situation: they would fail as unexpected circumstances outside of its original problem context would begin to appear. When human beings are dealing with new situations in the world, they are helped by their awareness of the general context: they know what the things around them are, why they are there, what they are likely to do and so on. They can recognize unusual situations and adjust accordingly. Expert systems lacked this adaptability and were brittle when facing new situations. DeepMind published a work in May 2022 in which they trained a single model to do several things at the same time. The model, named Gato, can "play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens." Similarly, some tasks once considered to be AI-complete, like machine translation, are among the capabilities of large language models. == AI-complete problems == AI-complete problems have been hypothesized to include: AI peer review (composite natural language understanding, automated reasoning, automated theorem proving, formalized logic expert system) Bongard problems Computer vision (and subproblems such as object recognition) Natural language understanding (and subproblems such as text mining, machine translation, and word-sense disambiguation) Autonomous driving Dealing with unexpected circumstances while solving any real world problem, whether navigation, planning, or even the kind of reasoning done by expert systems. == Formalization == Computational complexity theory deals with the relative computational difficulty of computable functions. By definition, it does not cover problems whose solution is unknown or has not been characterized formally. Since many AI problems have no formalization yet, conventional complexity theory does not enable a formal definition of AI-completeness. == Research == Roman Yampolskiy suggests that a problem C {\displaystyle C} is AI-Complete if it has two properties: It is in the set of AI problems (Human Oracle-solvable). Any AI problem can be converted into C {\displaystyle C} by some polynomial time algorithm. On the other hand, a problem H {\displaystyle H} is AI-Hard if and only if there is an AI-Complete problem C {\displaystyle C} that is polynomial time Turing-reducible to H {\displaystyle H} . This also gives as a consequence the existence of AI-Easy problems, that are solvable in polynomial time by a deterministic Turing machine with an oracle for some problem. Yampolskiy has also hypothesized that the Turing Test is a defining feature of AI-completeness. Groppe and Jain classify problems which require artificial general intelligence to reach human-level machine performance as AI-complete, while only restricted versions of AI-complete problems can be solved by the current AI systems. For Šekrst, getting a polynomial solution to AI-complete problems would not necessarily be equal to solving the issue of artificial general intelligence, while emphasizing the lack of computational complexity research being the limiting factor towards achieving artificial general intelligence. For Kwee-Bintoro and Velez, solving AI-complete problems would have strong repercussions on society.

    Read more →
  • Umoove

    Umoove

    Umoove is a high tech startup company that has developed and patented a software-only face and eye tracking technology. The idea was first conceived as an attempt to aid people with disabilities but has since evolved. The only compatibility qualification for tablet computers and smartphones to run Umoove software is a front-facing camera. Umoove headquarters are in Israel on Jerusalem’s Har Hotzvim. Umoove has 15 employees and received two million dollars in financing in 2012. The company's original founders invested around $800,000 to start the business in 2010. In 2013 Umoove was named one of the top three most promising Israeli start ups by Newsgeeks magazine. The company also participated in the 2013 LeWeb conference in Paris, France, where innovative technology startups are showcased. == Technology == The technology uses information extracted from previous frames, such as the angle of the user's head to predict where to look for facial targets in the next frame. This anticipation minimizes the amount of computation needed to scan each image. Umoove accounts for variances in environment, lighting conditions and user hand shake/movement. The technology is designed to provide a consistent experience, whether you're in a brightly lit area or a darkened basement, and to work fluidly between them by adapting its processing when it detects color and brightness shifts. It uses an active stabilization technique to filter out natural body movements from an unstable camera in order to minimize false-positive motion detection. Running the Umoove software on a Samsung Galaxy S3 is said to take up only 2% CPU. Umoove works exclusively with software and there is no hardware add-on necessary. It can be run on any smartphone or tablet computer that has a front-facing camera. Umoove claims that even a low-quality camera on an old device will run their software flawlessly. == Umoove Experience == In January 2014 Umoove released its first game onto the app store. The Umoove Experience game lets users control where they are 'flying' in the game through simple gestures and motions with their head. The avatar will basically go toward wherever the user looks. The game was created to showcase the technology for game developers but that did not stop some from criticizing its simplicity. Umoove also announced that they raised another one million dollars and that they are opening offices in Silicon Valley, California. In February 2014, Umoove announced that their face-tracking software development kit is available for Android developers as well as iOS. == Reviews == The Umoove Experience garnered mostly positive reviews from bloggers and mainstream media with some predicting that it could be the future of mobile gaming. Mashable wrote that Umoove's technology could be the emergence of gesture recognition technology in the mobile space, similar to Kinect with console gaming and what Leap Motion has done with desktop computers. Some, however, remain skeptical. CNET, for example, did not give the game a positive review and called the eye tracking technology 'freaky but cool'. They also noted that pioneering technologies have been known to fall short of expectations, citing Apple Inc’s Siri as an example. The technology blog GigaOM said that the Umoove Experience is ’awesome’ and technology evangelist Robert Scoble has called Umoove "brilliant". == uHealth == In January 2015, Umoove released uHealth, a mobile application that uses eye tracking game-like exercise to challenge the user's ability to be attentive, continuously focus, follow commands and avoid distractions. The app is designed in the form of two games, one to improve attention and another that hones focus. uHealth is a training tool, not a diagnostic. Umoove has stated that they want to use their technology for diagnosing neurological disorders but this will depend on clinical tests and FDA approval. The company cites the direct relationship between eye movements and brain activity as well as various vision-based therapies have been backed by many scientific studies conducted over the past decades. uHealth is the first time this type of therapy is delivered right to the end user through a simple download. == Collaboration rumors == In March 2013 there were rumors on the internet that Umoove would be the functioning software embedded into the Samsung Galaxy S4, which was due to launch that month. This rumor was perpetrated by, among others, New York Times, Techcrunch and Yahoo. Once Samsung launched without the Umoove technology rumors about a potential collaboration with Apple Inc hit the web. It has been said that due to the fact that Apple Inc is losing market share and stock value to Samsung they will be more aggressive and eye tracking is a logical place to make that move.

    Read more →
  • Hybrid intelligent system

    Hybrid intelligent system

    Hybrid intelligent system denotes a software system which employs, in parallel, a combination of methods and techniques from artificial intelligence subfields, such as: Neuro-symbolic systems Neuro-fuzzy systems Hybrid connectionist-symbolic models Fuzzy expert systems Connectionist expert systems Evolutionary neural networks Genetic fuzzy systems Rough fuzzy hybridization Reinforcement learning with fuzzy, neural, or evolutionary methods as well as symbolic reasoning methods. From the cognitive science perspective, every natural intelligent system is hybrid because it performs mental operations on both the symbolic and subsymbolic levels. For the past few years, there has been an increasing discussion of the importance of A.I. Systems Integration. Based on notions that there have already been created simple and specific AI systems (such as systems for computer vision, speech synthesis, etc., or software that employs some of the models mentioned above) and now is the time for integration to create broad AI systems. Proponents of this approach are researchers such as Marvin Minsky, Ron Sun, Aaron Sloman, Angelo Dalli and Michael A. Arbib. An example hybrid is a hierarchical control system in which the lowest, reactive layers are sub-symbolic. The higher layers, having relaxed time constraints, are capable of reasoning from an abstract world model and performing planning (even by hybrid wisdom). Intelligent systems usually rely on hybrid reasoning processes, which include induction, deduction, abduction and reasoning by analogy.

    Read more →
  • Environmental impact of AI

    Environmental impact of AI

    The environmental impact of the design, training, deployment and use of artificial intelligence includes the greenhouse gas emissions from generating electricity for data centres and computing hardware, operational and upstream water use, and material impacts from hardware manufacturing, mining and electronic waste. Estimating AI's environmental effects can be difficult because results depend on how impacts are measured, including whether accounting includes only model computation or also data-centre overhead, idle capacity, hardware manufacture, and local electricity supply. As these issues have received greater attention, governments and regulators have increasingly considered data-centre reporting requirements, energy-efficiency standards, and broader transparency measures for AI-related resource use. == Carbon footprint and energy use == AI-related energy use arises at multiple stages, including model training, fine-tuning, inference, storage, networking, and supporting infrastructure such as cooling and power conversion. === Individual level === Published estimates of energy use per AI request vary widely across models, tasks and measurement methods. A benchmark study presented at the 2024 ACM Conference on Fairness, Accountability, and Transparency found substantial differences between task types, with lower energy use for some text tasks and much higher energy use for image generation in the study's test conditions. In that benchmark, simple classification tasks consumed about 0.002–0.007 Wh per prompt on average (about 9% of a smartphone charge for 1,000 prompts), while text generation and text summarisation each used about 0.05 Wh per prompt; image generation averaged 2.91 Wh per prompt, and the least efficient image model in the study used 11.49 Wh per image (roughly equivalent to half a smartphone charge). First-party measurements in production environments have also been published. A 2025 Google study on Gemini assistant serving reported median per-prompt energy, emissions, and water-use estimates under the authors' accounting framework, while noting that different system boundaries can produce substantially different results. The study reported a median text-prompt estimate of about 0.24 Wh, which is roughly as much energy as watching nine seconds of television. The study also stated that software and infrastructure improvements reduced energy use by a factor of 33 and carbon emissions by a factor of 44 for a typical prompt over one year within the authors' framework. Researchers at the University of Michigan measured the energy consumption of various Meta Llama 3.1 models released in 2024 and found that smaller language models (8 billion parameters) use about 114 joules (0.03167 Wh) per response, while larger models (405 billion parameters) require up to 6,700 joules (1.861 Wh) per response. This corresponds to the energy needed to run a microwave oven for roughly one-tenth of a second and eight seconds, respectively. Comparisons between AI systems and human labour for specific tasks have produced mixed results and remain sensitive to assumptions about output quality, workload and system boundaries. A 2024 study in Scientific Reports reported 130 to 2900 times lower estimated carbon emissions for selected AI systems than for human writers and illustrators under its assumptions. A later Scientific Reports paper reported a counterexample for programming tasks under its assumptions, finding 5 to 19 times higher estimated emissions for the evaluated AI system than for human programmers on the benchmark used in that study. === System level === ==== Energy use and efficiency ==== AI electricity intensity depends not only on model architecture but also on hardware and facility efficiency. Data-centre operators commonly report Power usage effectiveness (PUE), which measures the ratio of total facility energy to IT equipment energy; a lower PUE indicates less overhead energy for cooling and other supporting infrastructure. Operators may also publish metrics and case studies on hardware efficiency, cooling systems and power sourcing. In its 2024 environmental report, Google stated that its 2023 total greenhouse gas emissions increased 13% year over year, primarily because of increased data-centre energy consumption and supply-chain emissions, while also reporting lower PUE than industry averages for its own facilities. The International Energy Agency has also reported that data centres remain a relatively small share of global electricity use overall, but that their local effects can be much more pronounced because demand is geographically concentrated. ==== Carbon footprint ==== At system level, AI contributes to rising electricity demand in data centres and related infrastructure. The International Energy Agency estimated that data centres used about 415 TWh of electricity in 2024, or around 1.5% of global electricity consumption, and projected that data-centre electricity use could rise to about 945 TWh by 2030, with AI identified as the main driver of that growth alongside other digital services. The carbon footprint of AI systems depends strongly on electricity sources, hardware efficiency, utilisation rates, and what stages are included in the accounting. Training large models can require substantial electricity, while total lifecycle impacts also depend on deployment scale and the amount of inference performed after training. Early analyses of frontier-model development reported rapid historical growth in training compute for selected systems, although later trends have depended on changes in model design, hardware and efficiency gains. Accounting methods that include upstream or embodied impacts, such as hardware manufacture and facilities construction, can materially affect estimates of AI-related emissions. === Decisions and strategies by individual companies === Large technology companies have reported that the expansion of AI and cloud infrastructure affects their sustainability targets, electricity demand, and resource use. Google, for example, attributed part of its emissions growth in 2023 to increased data-centre energy consumption and supply-chain emissions in its 2024 environmental report. Cloud and AI companies have also announced measures intended to reduce environmental impacts, including investment in more efficient hardware, low-carbon electricity procurement, alternative cooling systems, and water stewardship programmes. The extent, comparability, and third-party verification of such disclosures vary between firms and jurisdictions. == Water usage == Data centres can use water directly for cooling and indirectly through the water used in electricity generation, depending on the local energy mix. Public reporting on data-centre water use has often been inconsistent, making comparisons between operators and regions difficult. To standardise operational reporting, The Green Grid proposed the metric water usage effectiveness (WUE), defined as annual site water use divided by IT equipment energy use. WUE does not by itself measure local water stress, source sustainability, or all upstream water impacts. Studies of AI water use also distinguish between water withdrawal and water consumption. Research on AI-specific water use has argued that the water footprint of AI systems can be difficult to observe and may vary substantially by location, cooling design, and electricity source. A 2025 Communications of the ACM article summarised methods for estimating AI water footprints and emphasised the distinction between water withdrawal and water consumption. Li and colleagues estimated that global AI water withdrawal could reach 4.2–6.6 billion cubic metres in 2027 under the scenarios examined in their article. Using GPT-3, released by OpenAI in 2020, as an example, they estimated that training the model in Microsoft's U.S. data centres could consume about 700,000 litres of onsite water and about 5.4 million litres in total when offsite electricity-related water use was included; they also estimated that 10–50 medium-length GPT-3 responses could consume about 500 mL of water, depending on when and where the model was deployed. Published prompt-level estimates have also varied by system and accounting framework: the 2025 Google study on Gemini assistant serving reported a median text-prompt estimate of about 0.26 mL under its framework. Location can materially affect the significance of data-centre water use. Research on U.S. data centres found that one-fifth of servers' direct water footprint came from moderately to highly water-stressed watersheds, while nearly half of servers were fully or partially powered by plants located in water-stressed regions. A 2025 Reuters report, citing data from Verisk Maplecroft and NatureFinance, said that an average mid-sized data centre uses about 1.4 million litres of water per day for cooling and that Phoenix would experience a 32% increase in annual water stress if currently pl

    Read more →
  • Rademacher complexity

    Rademacher complexity

    In computational learning theory (machine learning and theory of computation), Rademacher complexity, named after Hans Rademacher, measures richness of a class of sets with respect to a probability distribution. The concept can also be extended to real valued functions. == Definitions == === Rademacher complexity of a set === Given a set A ⊆ R m {\displaystyle A\subseteq \mathbb {R} ^{m}} , the Rademacher complexity of A is defined as follows: Rad ⁡ ( A ) := 1 m E σ [ sup a ∈ A ∑ i = 1 m σ i a i ] {\displaystyle \operatorname {Rad} (A):={\frac {1}{m}}\mathbb {E} _{\sigma }\left[\sup _{a\in A}\sum _{i=1}^{m}\sigma _{i}a_{i}\right]} where σ 1 , σ 2 , … , σ m {\displaystyle \sigma _{1},\sigma _{2},\dots ,\sigma _{m}} are independent random variables drawn from the Rademacher distribution i.e. Pr ( σ i = + 1 ) = Pr ( σ i = − 1 ) = 1 / 2 {\displaystyle \Pr(\sigma _{i}=+1)=\Pr(\sigma _{i}=-1)=1/2} for i ∈ { 1 , 2 , … , m } {\displaystyle i\in \{1,2,\dots ,m\}} , and a = ( a 1 , … , a m ) ∈ A {\displaystyle a=(a_{1},\ldots ,a_{m})\in A} . Some authors take the absolute value of the sum before taking the supremum, but if A {\displaystyle A} is symmetric this makes no difference. === Rademacher complexity of a function class === Let S = { z 1 , z 2 , … , z m } ⊆ Z {\displaystyle S=\{z_{1},z_{2},\dots ,z_{m}\}\subseteq Z} be a sample of points and consider a function class F {\displaystyle {\mathcal {F}}} of real-valued functions over Z {\displaystyle Z} . Then, the empirical Rademacher complexity of F {\displaystyle {\mathcal {F}}} given S {\displaystyle S} is defined as: Rad S ⁡ ( F ) = 1 m E σ [ sup f ∈ F | ∑ i = 1 m σ i f ( z i ) | ] {\displaystyle \operatorname {Rad} _{S}({\mathcal {F}})={\frac {1}{m}}\mathbb {E} _{\sigma }\left[\sup _{f\in {\mathcal {F}}}\left|\sum _{i=1}^{m}\sigma _{i}f(z_{i})\right|\right]} This can also be written using the previous definition: Rad S ⁡ ( F ) = Rad ⁡ ( F ∘ S ) {\displaystyle \operatorname {Rad} _{S}({\mathcal {F}})=\operatorname {Rad} ({\mathcal {F}}\circ S)} where F ∘ S {\displaystyle {\mathcal {F}}\circ S} denotes function composition, i.e.: F ∘ S := { ( f ( z 1 ) , … , f ( z m ) ) ∣ f ∈ F } {\displaystyle {\mathcal {F}}\circ S:=\{(f(z_{1}),\ldots ,f(z_{m}))\mid f\in {\mathcal {F}}\}} The worst case empirical Rademacher complexity is Rad ¯ m ( F ) = sup S = { z 1 , … , z m } Rad S ⁡ ( F ) {\displaystyle {\overline {\operatorname {Rad} }}_{m}({\mathcal {F}})=\sup _{S=\{z_{1},\dots ,z_{m}\}}\operatorname {Rad} _{S}({\mathcal {F}})} Let P {\displaystyle P} be a probability distribution over Z {\displaystyle Z} . The Rademacher complexity of the function class F {\displaystyle {\mathcal {F}}} with respect to P {\displaystyle P} for sample size m {\displaystyle m} is: Rad P , m ⁡ ( F ) := E S ∼ P m [ Rad S ⁡ ( F ) ] {\displaystyle \operatorname {Rad} _{P,m}({\mathcal {F}}):=\mathbb {E} _{S\sim P^{m}}\left[\operatorname {Rad} _{S}({\mathcal {F}})\right]} where the above expectation is taken over an identically independently distributed (i.i.d.) sample S = ( z 1 , z 2 , … , z m ) {\displaystyle S=(z_{1},z_{2},\dots ,z_{m})} generated according to P {\displaystyle P} . == Intuition == The Rademacher complexity is typically applied on a function class of models that are used for classification, with the goal of measuring their ability to classify points drawn from a probability space under arbitrary labellings. When the function class is rich enough, it contains functions that can appropriately adapt for each arrangement of labels, simulated by the random draw of σ i {\displaystyle \sigma _{i}} under the expectation, so that this quantity in the sum is maximized. The Rademacher complexity of a set A {\displaystyle A} can be rewritten as Rad ⁡ ( A ) := 1 m E σ [ sup a ∈ A ∑ i = 1 m σ i a i ] = 1 m 2 m ∑ σ ∈ { − 1 / m , + 1 / m } m [ sup a ∈ A ⟨ σ , a ⟩ ] . {\displaystyle \operatorname {Rad} (A):={\frac {1}{m}}\mathbb {E} _{\sigma }\left[\sup _{a\in A}\sum _{i=1}^{m}\sigma _{i}a_{i}\right]={\frac {1}{{\sqrt {m}}2^{m}}}\sum _{\sigma \in \{-1/{\sqrt {m}},+1/{\sqrt {m}}\}^{m}}\left[\sup _{a\in A}\langle \sigma ,a\rangle \right].} Each term in the summation is the farthest distance of the set A {\displaystyle A} from the origin, along a unit-length direction σ {\displaystyle \sigma } . The directions are along the vertices of a hypercube. Thus, we can also write it as Rad ⁡ ( A ) = 1 2 m 1 2 m − 1 ∑ σ ∈ { − 1 / m , + 1 / m } m / { − 1 , + 1 } [ sup a ∈ A ⟨ σ , a ⟩ − inf a ∈ A ⟨ σ , a ⟩ ] {\displaystyle \operatorname {Rad} (A)={\frac {1}{2{\sqrt {m}}}}{\frac {1}{2^{m-1}}}\sum _{\sigma \in \{-1/{\sqrt {m}},+1/{\sqrt {m}}\}^{m}/\{-1,+1\}}\left[\sup _{a\in A}\langle \sigma ,a\rangle -\inf _{a\in A}\langle \sigma ,a\rangle \right]} Here, the set { − 1 / m , + 1 / m } m / { − 1 , + 1 } {\displaystyle \{-1/{\sqrt {m}},+1/{\sqrt {m}}\}^{m}/\{-1,+1\}} denotes half of the vertices of a hypercube, selected so that each diagonal has exactly one vertex selected. In words, this states that 2 m Rad ⁡ ( A ) {\displaystyle 2{\sqrt {m}}\operatorname {Rad} (A)} is precisely the average width of the set A {\displaystyle A} along all diagonal directions of a hypercube. == Examples == A singleton set has 0 width in any direction, so it has Rademacher complexity 0. The set A = { ( 1 , 1 ) , ( 1 , 2 ) } ⊆ R 2 {\displaystyle A=\{(1,1),(1,2)\}\subseteq \mathbb {R} ^{2}} has average width 1 / 2 {\displaystyle 1/{\sqrt {2}}} along the two diagonal directions of the square, so it has Rademacher complexity 1 / 4 {\displaystyle 1/4} . The unit cube [ 0 , 1 ] m {\displaystyle [0,1]^{m}} has constant width m {\displaystyle {\sqrt {m}}} along the diagonal directions, so it has Rademacher complexity 1 / 2 {\displaystyle 1/2} . Similarly, the unit cross-polytope { x ∈ R m : ‖ x ‖ 1 ≤ 1 } {\displaystyle \{x\in \mathbb {R} ^{m}:\|x\|_{1}\leq 1\}} has constant width 2 / m {\displaystyle 2/{\sqrt {m}}} along the diagonal directions, so it has Rademacher complexity 1 / m {\displaystyle 1/m} . == Using the Rademacher complexity == The Rademacher complexity can be used to derive data-dependent upper-bounds on the learnability of function classes. Intuitively, a function-class with smaller Rademacher complexity is easier to learn. === Bounding the representativeness === In machine learning, it is desired to have a training set that represents the true distribution of some sample data S {\displaystyle S} . This can be quantified using the notion of representativeness. Denote by P {\displaystyle P} the probability distribution from which the samples are drawn. Denote by H {\displaystyle H} the set of hypotheses (potential classifiers) and denote by F {\displaystyle {\mathcal {F}}} the corresponding set of error functions, i.e., for every hypothesis h ∈ H {\displaystyle h\in H} , there is a function f h ∈ F {\displaystyle f_{h}\in F} , that maps each training sample (features,label) to the error of the classifier h {\displaystyle h} (note in this case hypothesis and classifier are used interchangeably). For example, in the case that h {\displaystyle h} represents a binary classifier, the error function is a 0–1 loss function, i.e. the error function f h {\displaystyle f_{h}} returns 0 if h {\displaystyle h} correctly classifies a sample and 1 else. We omit the index and write f {\displaystyle f} instead of f h {\displaystyle f_{h}} when the underlying hypothesis is irrelevant. Define: L P ( f ) := E z ∼ P [ f ( z ) ] {\displaystyle L_{P}(f):=\mathbb {E} _{z\sim P}[f(z)]} – the expected error of some error function f ∈ F {\displaystyle f\in {\mathcal {F}}} on the real distribution P {\displaystyle P} ; L S ( f ) := 1 m ∑ i = 1 m f ( z i ) {\displaystyle L_{S}(f):={1 \over m}\sum _{i=1}^{m}f(z_{i})} – the estimated error of some error function f ∈ F {\displaystyle f\in {\mathcal {F}}} on the sample S {\displaystyle S} . The representativeness of the sample S {\displaystyle S} , with respect to P {\displaystyle P} and F {\displaystyle {\mathcal {F}}} , is defined as: Rep P ⁡ ( F , S ) := sup f ∈ F ( L P ( f ) − L S ( f ) ) {\displaystyle \operatorname {Rep} _{P}({\mathcal {F}},S):=\sup _{f\in F}(L_{P}(f)-L_{S}(f))} Smaller representativeness is better, since it provides a way to avoid overfitting: it means that the true error of a classifier is not much higher than its estimated error, and so selecting a classifier that has low estimated error will ensure that the true error is also low. Note however that the concept of representativeness is relative and hence can not be compared across distinct samples. The expected representativeness of a sample can be bounded above by the Rademacher complexity of the function class: If F {\displaystyle {\mathcal {F}}} is a set of functions with range within [ 0 , 1 ] {\displaystyle [0,1]} , then Rad P , m ⁡ ( F ) − ln ⁡ 2 2 m ≤ E S ∼ P m [ Rep P ⁡ ( F , S ) ] ≤ 2 Rad P , m ⁡ ( F ) {\displaystyle \operatorname {Rad} _{P,m}({\mathcal {F}})-{\sqrt {\frac {\ln 2}{2m}}}\leq \mathbb {E} _{S\sim P^{m}}[\operatorname {Rep} _{P}({\

    Read more →
  • Apache Kudu

    Apache Kudu

    Apache Kudu is a free and open source column-oriented data store of the Apache Hadoop ecosystem. It is compatible with most of the data processing frameworks in the Hadoop environment. It provides completeness to Hadoop's storage layer to enable fast analytics on fast data. The open source project to build Apache Kudu began as internal project at Cloudera. The first version Apache Kudu 1.0 was released 19 September 2016. == Comparison with other storage engines == Kudu was designed and optimized for OLAP workloads. Like HBase, it is a real-time store that supports key-indexed record lookup and mutation. Kudu differs from HBase since Kudu's datamodel is a more traditional relational model, while HBase is schemaless. Kudu's "on-disk representation is truly columnar and follows an entirely different storage design than HBase/Bigtable".

    Read more →
  • Video Super Resolution

    Video Super Resolution

    RTX Video Super Resolution (RTX VSR) is a video scaling feature by Nvidia. It was released on February 28, 2023. == History == The feature was first unveiled during CES 2023 as RTX Video Super Resolution. It uses the on-board Tensor Cores to upscale browser video content in real time. Video Super Resolution was initially only available on RTX 30 and 40 series GPUs, while support for 20 series GPUs was added afterwards; it is now available on all Nvidia RTX-branded GPUs. The feature supports input resolutions from 360p to 1440p and a max output of 4K and comes without support for HDR content although that could be likely added in the future. Nvidia released RTX Video Super Resolution 1.5 with improved video quality and RTX 20 series support on October 17, 2023. == Reception == According to ComputerBase, although "the algorithm is not yet working flawlessly", the feature is "overall recommendable".

    Read more →
  • Personality computing

    Personality computing

    Personality computing is a research field related to artificial intelligence and personality psychology that studies personality by means of computational techniques from different sources, including text, multimedia, and social networks. == Overview == Personality computing addresses three main problems involving personality: automatic personality recognition, perception, and synthesis. Automatic personality recognition is the inference of the personality type of target individuals from their digital footprint. Automatic personality perception is the inference of the personality attributed by an observer to a target individual based on some observable behavior. Automatic personality synthesis is the generation of the style or behaviour of artificial personalities in Avatars and virtual agents. Self-assessed personality tests or observer ratings are always exploited as the ground truth for testing and validating the performance of artificial intelligence algorithms for the automatic prediction of personality types. There is a wide variety of personality tests, such as the Myers Briggs Type Indicator (MBTI) or the MMPI, but the most used are tests based on the Five Factor Model such as the Revised NEO Personality Inventory. Personality computing can be considered as an extension or complement of Affective computing, where the former focuses on personality traits and the latter on affective states. A further extension of the two fields is Character Computing which combines various character states and traits including but not limited to personality and affect. == History == Personality computing began around 2005 with the pioneering research in personality recognition by Shlomo Argamon and later by François Mairesse. These works showed that personality traits could be inferred with reasonable accuracy from text, such as blogs, self-presentations, and email addresses. In 2008, the concept of "portable personality" for the distributed management of personality profiles has been developed. A few years later, research began in personality recognition and perception from multimodal and social signals, such as recorded meetings and voice calls. In the 2010s, the research focused mainly on personality recognition and perception from social media, helped by the first workshops organized by Fabio Celli. In particular personality was extracted from Facebook, Twitter and Instagram. In the same years, automatic personality synthesis helped improve the coherence of simulated behavior in virtual agents. Scientific works by Michal Kosinski demonstrated the validity of Personality Computing from different digital footprints, in particular from user preferences such as Facebook page likes, showed that machines can recognize personality better than humans and raised a warning against Cambridge Analytica and misuse of this kind of technology. == Applications == Personality computing techniques, in particular personality recognition and perception, have applications in Social media marketing, where they can help reducing the cost of advertising campaigns through psychological targeting.

    Read more →