AI Data Analyst Zalando

AI Data Analyst Zalando — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Desktop Window Manager

    Desktop Window Manager

    Desktop Window Manager (DWM, previously Desktop Compositing Engine or DCE in builds of pre-reset Windows Longhorn) is the compositing window manager in Microsoft Windows since Windows Vista that enables the use of hardware acceleration to render the graphical user interface of Windows. It was originally created to enable portions of the new "Windows Aero" user experience, which allowed for effects such as transparency, 3D window switching and more. It is also included with Windows Server 2008, but requires the "Desktop Experience" feature and compatible graphics drivers to be installed. == Architecture == The Desktop Window Manager is a compositing window manager, meaning that each program has a buffer that it writes data to; DWM then composites each program's buffer into a final image. By comparison, the stacking window manager in Windows XP and earlier (and also Windows Vista and Windows 7 with Windows Aero disabled) comprises a single display buffer to which all programs write. DWM works in different ways depending on the operating system (Windows 7 or Windows Vista) and on the version of the graphics drivers it uses (WDDM 1.0 or 1.1). Under Windows 7 and with WDDM 1.1 drivers, DWM only writes the program's buffer to the video RAM, even if it is a graphics device interface (GDI) program. This is because Windows 7 supports (limited) hardware acceleration for GDI and in doing so does not need to keep a copy of the buffer in system RAM so that the CPU can write to it. Because the compositor has access to the graphics of all applications, it easily allows visual effects that string together visuals from multiple applications, such as transparency. DWM uses DirectX to perform the function of compositing and rendering in the GPU, freeing the CPU of the task of managing the rendering from the off-screen buffers to the display. However, it does not affect applications painting to the off-screen buffers – depending on the technologies used for that, this might still be CPU-bound. DWM-agnostic rendering techniques like GDI are redirected to the buffers by rendering the user interface (UI) as bitmaps. DWM-aware rendering technologies like WPF directly make the internal data structures available in a DWM-compatible format. The window contents in the buffers are then converted to DirectX textures. The desktop itself is a full-screen Direct3D surface, with windows being represented as a mesh consisting of two adjacent (and mutually inverted) triangles, which are transformed to represent a 2D rectangle. The texture, representing the UI chrome, is then mapped onto these rectangles. Window transitions are implemented as transformations of the meshes, using shader programs. With Windows Vista, the transitions are limited to the set of built-in shaders that implement the transformations. Greg Schechter, a developer at Microsoft has suggested that this might be opened up for developers and users to plug in their own effects in a future release. DWM only maps the primary desktop object as a 3D surface; other desktop objects, including virtual desktops as well as the secure desktop used by User Account Control are not. Because all applications render to an off-screen buffer, they can be read off the buffer embedded in other applications as well. Since the off-screen buffer is constantly updated by the application, the embedded rendering will be a dynamic representation of the application window and not a static rendering. This is how the live thumbnail previews and Windows Flip work in Windows Vista and Windows 7. DWM exposes a public API that allows applications to access these thumbnail representations. The size of the thumbnail is not fixed; applications can request the thumbnails at any size - smaller than the original window, at the same size or even larger - and DWM will scale them properly before returning. Aero Flip does not use the public thumbnail APIs as they do not allow for directly accessing the Direct3D textures. Instead, Aero Flip is implemented directly in the DWM engine. The Desktop Window Manager uses Media Integration Layer (MIL), the unmanaged compositor which it shares with Windows Presentation Foundation, to represent the windows as composition nodes in a composition tree. The composition tree represents the desktop and all the windows hosted in it, which are then rendered by MIL from the back of the scene to the front. Since all the windows contribute to the final image, the color of a resultant pixel can be decided by more than one window. This is used to implement effects such as per-pixel transparency. DWM allows custom shaders to be invoked to control how pixels from multiple applications are used to create the displayed pixel. The DWM includes built-in Pixel Shader 2.0 programs which compute the color of a pixel in a window by averaging the color of the pixel as determined by the window behind it and its neighboring pixels. These shaders are used by DWM to achieve the blur effect in the window borders of windows managed by DWM, and optionally for the areas where it is requested by the application. Since MIL provides a retained mode graphics system by caching the composition trees, the job of repainting and refreshing the screen when windows are moved is handled by DWM and MIL, freeing the application of the responsibility. The background data is already in the composition tree and the off-screen buffers and is directly used to render the background. In pre-Vista Windows OSs, background applications had to be requested to re-render themselves by sending them the WM_PAINT message. DWM uses double-buffered graphics to prevent flickering and tearing when moving windows. The compositing engine uses optimizations such as culling to improve performance, as well as not redrawing areas that have not changed. Because the compositor is multi-monitor aware, DWM natively supports this too. During full-screen applications, such as games, DWM does not perform window compositing and therefore performance will not appreciably decrease. On Windows 8 and Windows Server 2012, DWM is used at all times and cannot be disabled, due to the new "start screen experience" implemented. Since the DWM process is usually required to run at all times on Windows 8, users experiencing an issue with the process are seeing memory usage decrease after a system reboot. This is often the first step in a long list of troubleshooting tasks that can help. It is possible to prevent DWM from restarting temporarily in Windows 8, which causes the desktop to turn black, the taskbar grey, and break the start screen/modern apps, but desktop apps will continue to function and appear just like Windows 7 and Vista's Basic theme, based on the single-buffer renderer used by XP. They also use Windows 8's centered title bar, visible within Windows PreInstallation Environment. Starting up Windows without DWM will not work because the default lock screen requires DWM unlike the fallback lockscreen that appears as a command line interface program when Windows.UI.Logon.dll isn't present on Windows versions such as 1507 and later, so it can only be done on the fly, and does not have any practical purposes. Starting with Windows 10, disabling DWM in such a way will cause the entire compositing engine to break, even traditional desktop apps, due to Universal App implementations in the taskbar and new start menu. Windows can still be partially usable without the presence of DWM but requires Sihost.exe to not be present due to it relying on DWM. Most of the applications in Windows 11 require DWM to render UI elements and transparency, Windows 11's new task manager requires dwm to render menus unlike the fallback -d version. Unlike its predecessors, Windows 8 supports basic display adapters through Windows Advanced Rasterization Platform (WARP), which uses software rendering and the CPU to render the interface rather than the graphics card. This allows DWM to function without compatible drivers, but not at the same level of performance as with a normal graphics card. DWM on Windows 8 also adds support for stereoscopic 3D. == Redirection == For rendering techniques that are not DWM-aware, output must be redirected to the DWM buffers. With Windows, either GDI or DirectX can be used for rendering. To make these two work with DWM, redirection techniques for both are provided. With GDI, which is the most used UI rendering technique in Microsoft Windows, each application window is notified when it or a part of it comes in view and it is the job of the application to render itself. Without DWM, the rendering rasterizes the UI in a buffer in video memory, from where it is rendered to the screen. Under DWM, GDI calls are redirected to use the Canonical Display Driver (cdd.dll), a software renderer. A buffer equal to the size of the window is allocated in system memory and CDD.DLL outputs to this buffer rather than the video memory. Another buffer is allocated in the video memory to represent t

    Read more →
  • Artificial consciousness

    Artificial consciousness

    Artificial consciousness, also known as machine consciousness, synthetic consciousness, or digital consciousness, is consciousness hypothesized to be possible for artificial intelligence. It is also the corresponding field of study, which draws insights from philosophy of mind, philosophy of artificial intelligence, cognitive science and neuroscience. The term "sentience" can be used when specifically designating ethical considerations stemming from a form of phenomenal consciousness (P-consciousness, or the ability to feel qualia). Since sentience involves the ability to experience ethically positive or negative (i.e., valenced) mental states, it may justify welfare concerns and legal protection, as with non-human animals. Some scholars believe that consciousness is generated by the interoperation of various parts of the brain; these mechanisms are labeled the neural correlates of consciousness (NCC). Some further believe that constructing a system (e.g., a computer system) that can emulate this NCC interoperation would result in a system that is conscious. Some scholars reject the possibility of non-biological conscious beings. == Philosophical views == As there are many hypothesized types of consciousness, there are many potential implementations of artificial consciousness. In the philosophical literature, perhaps the most common taxonomy of consciousness is into "access" and "phenomenal" variants. Access consciousness concerns those aspects of experience that can be apprehended, while phenomenal consciousness concerns those aspects of experience that seemingly cannot be apprehended, instead being characterized qualitatively in terms of "raw feels", "what it is like" or qualia. === Plausibility debate === Type-identity theorists and other skeptics hold the view that consciousness can be realized only in particular physical systems because consciousness has properties that necessarily depend on physical constitution. In his 2001 article "Artificial Consciousness: Utopia or Real Possibility," Giorgio Buttazzo says that a common objection to artificial consciousness is that, "Working in a fully automated mode, they [the computers] cannot exhibit creativity, unreprogrammation (which means can 'no longer be reprogrammed', from rethinking), emotions, or free will. A computer, like a washing machine, is a slave operated by its components." For other theorists (e.g., functionalists), who define mental states in terms of causal roles, any system that can instantiate the same pattern of causal roles, regardless of physical constitution, will instantiate the same mental states, including consciousness. ==== Thought experiments ==== David Chalmers proposed two thought experiments intending to demonstrate that "functionally isomorphic" systems (those with the same "fine-grained functional organization", i.e., the same information processing) will have qualitatively identical conscious experiences, regardless of whether they are based on biological neurons or digital hardware. The "fading qualia" is a reductio ad absurdum thought experiment. It involves replacing, one by one, the neurons of a brain with a functionally identical component, for example based on a silicon chip. Chalmers makes the hypothesis, knowing it in advance to be absurd, that "the qualia fade or disappear" when neurons are replaced one-by-one with identical silicon equivalents. Since the original neurons and their silicon counterparts are functionally identical, the brain's information processing should remain unchanged, and the subject's behaviour and introspective reports would stay exactly the same. Chalmers argues that this leads to an absurd conclusion: the subject would continue to report normal conscious experiences even as their actual qualia fade away. He concludes that the subject's qualia actually don't fade, and that the resulting robotic brain, once every neuron is replaced, would remain just as sentient as the original biological brain. Similarly, the "dancing qualia" thought experiment is another reductio ad absurdum argument. It supposes that two functionally isomorphic systems could have different perceptions (for instance, seeing the same object in different colors, like red and blue). It involves a switch that alternates between a chunk of brain that causes the perception of red, and a functionally isomorphic silicon chip, that causes the perception of blue. Since both perform the same function within the brain, the subject would not notice any change during the switch. Chalmers argues that this would be highly implausible if the qualia were truly switching between red and blue, hence the contradiction. Therefore, he concludes that the equivalent digital system would not only experience qualia, but it would perceive the same qualia as the biological system (e.g., seeing the same color). Greg Egan's short story Learning To Be Me (mentioned in §In fiction), illustrates how undetectable duplication of the brain and its functionality could be from a first-person perspective. Critics object that Chalmers' proposal begs the question in assuming that all mental properties and external connections are already sufficiently captured by abstract causal organization. Van Heuveln et al. argue that the dancing qualia argument contains an equivocation fallacy, conflating a "change in experience" between two systems with an "experience of change" within a single system. Mogensen argues that the fading qualia argument can be resisted by appealing to vagueness at the boundaries of consciousness and the holistic structure of conscious neural activity, which suggests consciousness may require specific biological substrates rather than being substrate-independent. Anil Seth argues that the complexity of brain neurons intrinsically matters in addition to their function and that it is not possible to replace any part of the brain with a perfect silicon equivalent. He points out that some of biological neurons exhibit activity aimed at cleaning up metabolic waste products, and writes that a perfect silicon replacement would require a silicon-based metabolism, but silicon is not suitable for creating such artificial metabolism. ==== In large language models ==== In 2022, Google engineer Blake Lemoine made a viral claim that Google's LaMDA chatbot was sentient. Lemoine supplied as evidence the chatbot's humanlike answers to many of his questions; however, the chatbot's behavior was judged by the scientific community as likely a consequence of mimicry, rather than machine sentience. Lemoine's claim was widely derided for being ridiculous. Moreover, attributing consciousness based solely on the basis of LLM outputs or the immersive experience created by an algorithm is considered a fallacy. However, while philosopher Nick Bostrom states that LaMDA is unlikely to be conscious, he additionally poses the question of "what grounds would a person have for being sure about it?" One would have to have access to unpublished information about LaMDA's architecture, and also would have to understand how consciousness works, and then figure out how to map the philosophy onto the machine: "(In the absence of these steps), it seems like one should be maybe a little bit uncertain. [...] there could well be other systems now, or in the relatively near future, that would start to satisfy the criteria." David Chalmers argued in 2023 that LLMs today display impressive conversational and general intelligence abilities, but are likely not conscious yet, as they lack some features that may be necessary, such as recurrent processing, a global workspace, and unified agency. Nonetheless, he considers that non-biological systems can be conscious, and suggested that future, extended models (LLM+s) incorporating these elements might eventually meet the criteria for consciousness, raising both profound scientific questions and significant ethical challenges. However, the view that consciousness can exist without biological phenomena is controversial and some reject it. Kristina Šekrst cautions that anthropomorphic terms such as "hallucination" can obscure important ontological differences between artificial and human cognition. While LLMs may produce human-like outputs, she argues that it does not justify ascribing mental states or consciousness to them. Instead, she advocates for an epistemological framework (such as reliabilism) that recognizes the distinct nature of AI knowledge production. She suggests that apparent understanding in LLMs may be a sophisticated form of AI hallucination. She also questions what would happen if an LLM were trained without any mention of consciousness. === Testing === Sentience is an inherently first-person phenomenon. Because of that, and due to the lack of an empirical definition of sentience, directly measuring it may be impossible. Although systems may display numerous behaviors correlated with sentience, determining whether a system is sentient is known as the hard pr

    Read more →
  • Graphics processing unit

    Graphics processing unit

    A graphics processing unit (GPU) is a specialized electronic circuit designed for digital image processing and to accelerate computer graphics, being present either as a component on a discrete graphics card or embedded on motherboards, mobile phones, personal computers, workstations, and game consoles. GPUs are increasingly being used for artificial intelligence (AI) processing due to linear algebra acceleration, which is also used extensively in graphics processing. Although there is no single definition of the term, and it may be used to describe any video display system, in modern use a GPU includes the ability to internally perform the calculations needed for various graphics tasks, like rotating and scaling 3D images, and often the additional ability to run custom programs known as shaders. This contrasts with earlier graphics controllers known as video display controllers which had no internal calculation capabilities, or blitters, which performed only basic memory movement operations. The modern GPU emerged during the 1990s, adding the ability to perform operations like drawing lines and text without CPU help, and later adding 3D functionality. Graphics functions are generally independent and this lends these tasks to being implemented on separate calculation engines. Modern GPUs include hundreds, or thousands, of calculation units. This made them useful for non-graphic calculations involving embarrassingly parallel problems due to their parallel structure. The ability of GPUs to rapidly perform vast numbers of calculations has led to their adoption in diverse fields including artificial intelligence (AI) where they excel at handling data-intensive and computationally demanding tasks. Other non-graphical uses include the training of neural networks and cryptocurrency mining. == History == === 1960s === Dedicated 3D graphics hardware dates back to graphic terminals such as the Adage AGT-30 from 1967 with analog matrix processors. In 1969 Evans & Sutherland (E&S) introduced the Line Drawing System-1 (LDS-1), which was the first all-digital system to provide matrix multiplication. Also in 1969, the low-cost graphics terminal IMLAC PDS-1 was introduced. It later saw use as an early 3D gaming machine with the likes of Maze War. === 1970s === In professional hardware, in 1972 PLATO IV system becomes operational at the University of Illinois Urbana-Champaign. Between around 1973 and 1978, several networked multiplayer wireframe 3D games are implemented and popularized by users of the system. Also in 1972, the E&S Continuous Tone 1 (CT1) "Watkins box" system (consisting of an E&S LDS-2 and Shaded Picture System) is delivered to Case Western Reserve University. It offered the first real-time Gouraud shading. In 1975, a joint effort between Evans & Sutherland Computer Corporation and the University of Utah's computer graphics department results in the first ever MOSFET video framebuffer, capable of color and smooth shading. E&S Continuous Tone 3 (CT3) system was delivered in 1977 to Lufthansa for pilot training using computer simulation. It was the first graphics system capable of real-time texture mapping. Ikonas made graphics systems with 8- and 24-bit graphics and 3D acceleration in the late 70s. Arcade system boards have used specialized 2D graphics circuits since the 1970s. In early video game hardware, RAM for frame buffers was expensive, so video chips composited data together as the display was being scanned out on the monitor. A specialized barrel shifter circuit helped the CPU animate the framebuffer graphics for various 1970s arcade video games from Midway and Taito, such as Gun Fight (1975), Sea Wolf (1976), and Space Invaders (1978). The Namco Galaxian arcade system in 1979 used specialized graphics hardware that supported RGB color, multi-colored sprites, and tilemap backgrounds. The Galaxian hardware was widely used during the golden age of arcade video games, by game companies such as Namco, Centuri, Gremlin, Irem, Konami, Midway, Nichibutsu, Sega, and Taito. The Atari 2600 in 1977 used a video shifter called the Television Interface Adaptor. Atari 8-bit computers (1979) had ANTIC, a video processor which interpreted instructions describing a "display list"—the way the scan lines map to specific bitmapped or character modes and where the memory is stored (so there did not need to be a contiguous frame buffer). 6502 machine code subroutines could be triggered on scan lines by setting a bit on a display list instruction. ANTIC also supported smooth vertical and horizontal scrolling independent of the CPU. === 1980s === In the 1980s significant advancements were made in professional 3D graphics hardware. Perhaps most impactful was the 1981 development of the Geometry Engine, a VLSI vector processor ASIC designed by Jim Clark and Marc Hannah at Stanford University. This processor is the forerunner of modern tensor cores and other similar processors marketed for graphics and AI. The Geometry Engine went on to be used in Silicon Graphics workstations for many years. Silicon Graphics's first product, shipped in November 1983, was the IRIS 1000, a terminal with hardware-accelerated 3D graphics based on the Geometry Engine. The Geometry Engine was capable of approximately 6 million operations per second. The 1981 NEC μPD7220 was the first implementation of a personal computer graphics display processor as a single large-scale integration (LSI) integrated circuit chip. This enabled the design of low-cost, high-performance video graphics cards such as those from Number Nine Visual Technology. It became the best-known GPU until the mid-1980s. It was the first fully integrated VLSI (very large-scale integration) metal–oxide–semiconductor (NMOS) graphics display processor for PCs, supported up to 1024×1024 resolution, and laid the foundations for the PC graphics market. It was used in a number of graphics cards and was licensed for clones such as the Intel 82720, the first of Intel's graphics processing units. The Williams Electronics arcade games Robotron: 2084, Joust, Sinistar, and Bubbles, all released in 1982, contain custom blitter chips for operating on 16-color bitmaps. In 1984, Hitachi released the ARTC HD63484, the first major CMOS graphics processor for personal computers. The ARTC could display up to 4K resolution when in monochrome mode. It was used in a number of graphics cards and terminals during the late 1980s. In 1985, the Amiga was released with a custom graphics chip called Agnus including a blitter for bitmap manipulation, line drawing, and area fill. It also included a coprocessor with its own simple instruction set, that was capable of manipulating graphics hardware registers in sync with the video beam (e.g. for per-scanline palette switches, sprite multiplexing, and hardware windowing), or driving the blitter. Also in 1985, IBM released the Professional Graphics Controller, designed by later to be Nvidia co-founder Curtis Priem, which was a rudimentary 3D card with 640 × 480 256-color graphics which used a dedicated CPU to draw graphics independently of the main system. It was used as the basis of cards by a number of makers (including Matrox) and its analog RGB signaling led directly to the VGA video standard. Priem later in the 80s worked on the influential Sun Microsystems GX (also known as cgsix) accelerated 2D graphics card. In 1986, Texas Instruments released the TMS34010, the first fully programmable graphics processor. It could run general-purpose code but also had a graphics-oriented instruction set. During 1990–1992, this chip became the basis of the Texas Instruments Graphics Architecture ("TIGA") Windows accelerator cards. Following in 1987, the IBM 8514 graphics system was released. It was one of the first video cards for IBM PC compatibles that implemented fixed-function 2D primitives in electronic hardware. Sharp's X68000, released in 1987, used a custom graphics chipset with a 65,536 color palette and hardware support for sprites, scrolling, and multiple playfields. It served as a development machine for Capcom's CP System arcade board. Fujitsu's FM Towns computer, released in 1989, had support for a 16,777,216 color palette. For context, IBM also introduced its Video Graphics Array (VGA) display system in 1987, with a maximum resolution of 640 × 480 pixels. Unlike 8514/A, VGA had no hardware acceleration features. In November 1988, NEC Home Electronics announced its creation of the Video Electronics Standards Association (VESA) to develop and promote a Super VGA (SVGA) computer display standard as a successor to VGA. Super VGA enabled graphics display resolutions up to 800 × 600 pixels, a 56% increase. In 1988 SGI sold IRIS workstation graphics with 10-12 Geometry Engines and introduced the IrisVision add-in board for IBM MicroChannel bus (RS/6000) based on the Geometry Engine as well. In 1988 as well, the first dedicated polygonal 3D graphics boards in arcade machines were introduced wit

    Read more →
  • Gödel machine

    Gödel machine

    A Gödel machine is a hypothetical self-improving computer program that solves problems in an optimal way. It uses a recursive self-improvement protocol in which it rewrites its own code when it can prove the new code provides a better strategy. The machine was invented by Jürgen Schmidhuber (first proposed in 2003), but is named after Kurt Gödel who inspired the mathematical theories. The Gödel machine is often discussed when dealing with issues of meta-learning, also known as "learning to learn." Applications include automating human design decisions and transfer of knowledge between multiple related tasks, and may lead to design of more robust and general learning architectures. Though theoretically possible, no full implementation has been created. The Gödel machine is often compared with Marcus Hutter's AIXI, another formal specification for an artificial general intelligence. Schmidhuber points out that the Gödel machine could start out by implementing AIXItl as its initial sub-program, and self-modify after it finds proof that another algorithm for its search code will be better. == Limitations == Traditional problems solved by a computer only require one input and provide some output. Computers of this sort had their initial algorithm hardwired. This does not take into account the dynamic natural environment, and thus was a goal for the Gödel machine to overcome. The Gödel machine has limitations of its own, however. According to Gödel's First Incompleteness Theorem, any formal system that encompasses arithmetic is either flawed or allows for statements that cannot be proved in the system. Hence even a Gödel machine with unlimited computational resources must ignore those self-improvements whose effectiveness it cannot prove. == Variables of interest == There are three variables that are particularly useful in the run time of the Gödel machine. At some time t {\displaystyle t} , the variable time {\displaystyle {\text{time}}} will have the binary equivalent of t {\displaystyle t} . This is incremented steadily throughout the run time of the machine. Any input meant for the Gödel machine from the natural environment is stored in variable x {\displaystyle x} . It is likely the case that x {\displaystyle x} will hold different values for different values of variable time {\displaystyle {\text{time}}} . The outputs of the Gödel machine are stored in variable y {\displaystyle y} , where y ( t ) {\displaystyle y(t)} would be the output bit-string at some time t {\displaystyle t} . At any given time t {\displaystyle t} , where ( 1 ≤ t ≤ T ) {\displaystyle (1\leq t\leq T)} , the goal is to maximize future success or utility. A typical utility function follows the pattern u ( s , E n v ) : S × E → R {\displaystyle u(s,\mathrm {Env} ):S\times E\rightarrow \mathbb {R} } : u ( s , E n v ) = E μ [ ∑ τ = time T r ( τ ) ∣ s , E n v ] {\displaystyle u(s,\mathrm {Env} )=E_{\mu }{\Bigg [}\sum _{\tau ={\text{time}}}^{T}r(\tau )\mid s,\mathrm {Env} {\Bigg ]}} where r ( t ) {\displaystyle r(t)} is a real-valued reward input (encoded within s ( t ) {\displaystyle s(t)} ) at time t {\displaystyle t} , E μ [ ⋅ ∣ ⋅ ] {\displaystyle E_{\mu }[\cdot \mid \cdot ]} denotes the conditional expectation operator with respect to some possibly unknown distribution μ {\displaystyle \mu } from a set M {\displaystyle M} of possible distributions ( M {\displaystyle M} reflects whatever is known about the possibly probabilistic reactions of the environment), and the above-mentioned time = time ⁡ ( s ) {\displaystyle {\text{time}}=\operatorname {time} (s)} is a function of state s {\displaystyle s} which uniquely identifies the current cycle. Note that we take into account the possibility of extending the expected lifespan through appropriate actions. == Instructions used by proof techniques == The nature of the six proof-modifying instructions below makes it impossible to insert an incorrect theorem into proof, thus trivializing proof verification. === get-axiom(n) === Appends the n-th axiom as a theorem to the current theorem sequence. Below is the initial axiom scheme: Hardware Axioms formally specify how components of the machine could change from one cycle to the next. Reward Axioms define the computational cost of hardware instruction and the physical cost of output actions. Related Axioms also define the lifetime of the Gödel machine as scalar quantities representing all rewards/costs. Environment Axioms restrict the way new inputs x are produced from the environment, based on previous sequences of inputs y. Uncertainty Axioms/String Manipulation Axioms are standard axioms for arithmetic, calculus, probability theory, and string manipulation that allow for the construction of proofs related to future variable values within the Gödel machine. Initial State Axioms contain information about how to reconstruct parts or all of the initial state. Utility Axioms describe the overall goal in the form of utility function u. === apply-rule(k, m, n) === Takes in the index k of an inference rule (such as Modus tollens, Modus ponens), and attempts to apply it to the two previously proved theorems m and n. The resulting theorem is then added to the proof. === delete-theorem(m) === Deletes the theorem stored at index m in the current proof. This helps to mitigate storage constraints caused by redundant and unnecessary theorems. Deleted theorems can no longer be referenced by the above apply-rule function. === set-switchprog(m, n) === Replaces switchprog S pm:n, provided it is a non-empty substring of S p. === check() === Verifies whether the goal of the proof search has been reached. A target theorem states that given the current axiomatized utility function u (Item 1f), the utility of a switch from p to the current switchprog would be higher than the utility of continuing the execution of p (which would keep searching for alternative switchprogs). === state2theorem(m, n) === Takes in two arguments, m and n, and attempts to convert the contents of Sm:n into a theorem. == Example applications == === Time-limited NP-hard optimization === The initial input to the Gödel machine is the representation of a connected graph with a large number of nodes linked by edges of various lengths. Within given time T it should find a cyclic path connecting all nodes. The only real-valued reward will occur at time T. It equals 1 divided by the length of the best path found so far (0 if none was found). There are no other inputs. The by-product of maximizing expected reward is to find the shortest path findable within the limited time, given the initial bias. === Fast theorem proving === Prove or disprove as quickly as possible that all even integers > 2 are the sum of two primes (Goldbach’s conjecture). The reward is 1/t, where t is the time required to produce and verify the first such proof. === Maximizing expected reward with bounded resources === A cognitive robot that needs at least 1 liter of gasoline per hour interacts with a partially unknown environment, trying to find hidden, limited gasoline depots to occasionally refuel its tank. It is rewarded in proportion to its lifetime, and dies after at most 100 years or as soon as its tank is empty or it falls off a cliff, and so on. The probabilistic environmental reactions are initially unknown but assumed to be sampled from the axiomatized Speed Prior, according to which hard-to-compute environmental reactions are unlikely. This permits a computable strategy for making near-optimal predictions. One by-product of maximizing expected reward is to maximize expected lifetime.

    Read more →
  • Outline of natural language processing

    Outline of natural language processing

    Natural language processing is computer activity in which computers are entailed to analyze, understand, alter, or generate natural language. This includes the automation of any or all linguistic forms, activities, or methods of communication, such as conversation, correspondence, reading, written composition, dictation, publishing, translation, lip reading, and so on. Natural-language processing is also the name of the branch of computer science, artificial intelligence, and linguistics concerned with enabling computers to engage in communication using natural language(s) in all forms, including but not limited to speech, print, writing, and signing. The following outline is provided as an overview of and topical guide to natural-language processing: == Natural-language processing == Natural-language processing can be described as all of the following: A field of science – systematic enterprise that builds and organizes knowledge in the form of testable explanations and predictions about the universe. An applied science – field that applies human knowledge to build or design useful things. A field of computer science – scientific and practical approach to computation and its applications. A branch of artificial intelligence – intelligence of machines and robots and the branch of computer science that aims to create it. A subfield of computational linguistics – interdisciplinary field dealing with the statistical or rule-based modeling of natural language from a computational perspective. An application of engineering – science, skill, and profession of acquiring and applying scientific, economic, social, and practical knowledge, in order to design and also build structures, machines, devices, systems, materials and processes. An application of software engineering – application of a systematic, disciplined, quantifiable approach to the design, development, operation, and maintenance of software, and the study of these approaches; that is, the application of engineering to software. A subfield of computer programming – process of designing, writing, testing, debugging, and maintaining the source code of computer programs. This source code is written in one or more programming languages (such as Java, C++, C#, Python, etc.). The purpose of programming is to create a set of instructions that computers use to perform specific operations or to exhibit desired behaviors. A subfield of artificial intelligence programming – A type of system – set of interacting or interdependent components forming an integrated whole or a set of elements (often called 'components' ) and relationships which are different from relationships of the set or its elements to other elements or sets. A system that includes software – software is a collection of computer programs and related data that provides the instructions for telling a computer what to do and how to do it. Software refers to one or more computer programs and data held in the storage of the computer. In other words, software is a set of programs, procedures, algorithms and its documentation concerned with the operation of a data processing system. A type of technology – making, modification, usage, and knowledge of tools, machines, techniques, crafts, systems, methods of organization, in order to solve a problem, improve a preexisting solution to a problem, achieve a goal, handle an applied input/output relation or perform a specific function. It can also refer to the collection of such tools, machinery, modifications, arrangements and procedures. Technologies significantly affect human as well as other animal species' ability to control and adapt to their natural environments. A form of computer technology – computers and their application. NLP makes use of computers, image scanners, microphones, and many types of software programs. Language technology – consists of natural-language processing (NLP) and computational linguistics (CL) on the one hand, and speech technology on the other. It also includes many application oriented aspects of these. It is often called human language technology (HLT). == Prerequisite technologies == The following technologies make natural-language processing possible: Communication – the activity of a source sending a message to a receiver Language – Speech – Writing – Computing – Computers – Computer programming – Information extraction – User interface – Software – Text editing – program used to edit plain text files Word processing – piece of software used for composing, editing, formatting, printing documents Input devices – pieces of hardware for sending data to a computer to be processed Computer keyboard – typewriter style input device whose input is converted into various data depending on the circumstances Image scanners – == Subfields of natural-language processing == Information extraction (IE) – field concerned in general with the extraction of semantic information from text. This covers tasks such as named-entity recognition, coreference resolution, relationship extraction, etc. Ontology engineering – field that studies the methods and methodologies for building ontologies, which are formal representations of a set of concepts within a domain and the relationships between those concepts. Speech processing – field that covers speech recognition, text-to-speech and related tasks. Statistical natural-language processing – Statistical semantics – a subfield of computational semantics that establishes semantic relations between words to examine their contexts. Distributional semantics – a subfield of statistical semantics that examines the semantic relationship of words across a corpora or in large samples of data. == Related fields == Natural-language processing contributes to, and makes use of (the theories, tools, and methodologies from), the following fields: Automated reasoning – area of computer science and mathematical logic dedicated to understanding various aspects of reasoning, and producing software which allows computers to reason completely, or nearly completely, automatically. A sub-field of artificial intelligence, automatic reasoning is also grounded in theoretical computer science and philosophy of mind. Linguistics – scientific study of human language. Natural-language processing requires understanding of the structure and application of language, and therefore it draws heavily from linguistics. Applied linguistics – interdisciplinary field of study that identifies, investigates, and offers solutions to language-related real-life problems. Some of the academic fields related to applied linguistics are education, linguistics, psychology, computer science, anthropology, and sociology. Some of the subfields of applied linguistics relevant to natural-language processing are: Bilingualism / Multilingualism – Computer-mediated communication (CMC) – any communicative transaction that occurs through the use of two or more networked computers. Research on CMC focuses largely on the social effects of different computer-supported communication technologies. Many recent studies involve Internet-based social networking supported by social software. Contrastive linguistics – practice-oriented linguistic approach that seeks to describe the differences and similarities between a pair of languages. Conversation analysis (CA) – approach to the study of social interaction, embracing both verbal and non-verbal conduct, in situations of everyday life. Turn-taking is one aspect of language use that is studied by CA. Discourse analysis – various approaches to analyzing written, vocal, or sign language use or any significant semiotic event. Forensic linguistics – application of linguistic knowledge, methods and insights to the forensic context of law, language, crime investigation, trial, and judicial procedure. Interlinguistics – study of improving communications between people of different first languages with the use of ethnic and auxiliary languages (lingua franca). For instance by use of intentional international auxiliary languages, such as Esperanto or Interlingua, or spontaneous interlanguages known as pidgin languages. Language assessment – assessment of first, second or other language in the school, college, or university context; assessment of language use in the workplace; and assessment of language in the immigration, citizenship, and asylum contexts. The assessment may include analyses of listening, speaking, reading, writing or cultural understanding, with respect to understanding how the language works theoretically and the ability to use the language practically. Language pedagogy – science and art of language education, including approaches and methods of language teaching and study. Natural-language processing is used in programs designed to teach language, including first- and second-language training. Language planning – Language policy – Lexicography – Literacies – Pragmatics – Second-language acquisition – Stylistics – Translation – Comp

    Read more →
  • Double descent

    Double descent

    Double descent in statistics and machine learning is the phenomenon where a model's error rate on the test set initially decreases with the number of parameters, then peaks, then decreases again. This phenomenon has been considered surprising, as it contradicts assumptions about overfitting in classical machine learning. The increase usually occurs near the interpolation threshold, where the number of parameters is the same as the number of training data points (the model is just large enough to fit the training data). Or, more precisely, it is the maximum number of samples on which the model/training procedure achieves approximately on average 0 training error. == History == Early observations of what would later be called double descent in specific models date back to 1989. The term "double descent" was coined by Belkin et. al. in 2019, when the phenomenon gained popularity as a broader concept exhibited by many models. The latter development was prompted by a perceived contradiction between the conventional wisdom that too many parameters in the model result in a significant overfitting error (an extrapolation of the bias–variance tradeoff), and the empirical observations in the 2010s that some modern machine learning techniques tend to perform better with larger models. == Theoretical models == Double descent occurs in linear regression with isotropic Gaussian covariates and isotropic Gaussian noise. A model of double descent at the thermodynamic limit has been analyzed using the replica trick, and the result has been confirmed numerically. A number of works have suggested that double descent can be explained using the concept of effective dimension: While a network may have a large number of parameters, in practice only a subset of those parameters are relevant for generalization performance, as measured by the local Hessian curvature. This explanation is formalized through PAC-Bayes compression-based generalization bounds, which show that less complex models are expected to generalize better under a Solomonoff prior.

    Read more →
  • Meta-Labeling

    Meta-Labeling

    Meta-labeling, also known as corrective AI, is a machine learning (ML) technique utilized in quantitative finance to enhance the performance of investment and trading strategies, developed in 2017 by Marcos López de Prado at Guggenheim Partners and Cornell University. The core idea is to separate the decision of trade direction (side) from the decision of trade sizing, addressing the inefficiencies of simultaneously learning both side and size predictions. The side decision involves forecasting market movements (long, short, neutral), while the size decision focuses on risk management and profitability. It serves as a secondary decision-making layer that evaluates the signals generated by a primary predictive model. By assessing the confidence and likely profitability of those signals, meta-labeling allows investors and algorithms to dynamically size positions and suppress false positives. == Motivation == Meta-labeling is designed to improve precision without sacrificing recall. As noted by López de Prado, attempting to model both the direction and the magnitude of a trade using a single algorithm can result in poor generalization. By separating these tasks, meta-labeling enables greater flexibility and robustness: Enhances control over capital allocation. Reduces overfitting by limiting model complexity. Allows the use of interpretability tools and tailored thresholds to manage risk. Enables dynamic trade suppression in unfavorable regimes. == Applications == Meta-labeling has been applied in a variety of financial ML contexts, including: Algorithmic trading: Filtering and sizing trades to reduce false positives. Portfolio optimization: Scaling exposure across multiple signals with differing confidence levels. Risk management: Dynamically disabling strategies in adverse market conditions. Model validation: Interpreting when and why a model may be underperforming due to regime shifts. == General architecture == Meta-labeling decouples two core components of systematic trading strategies: directional prediction and position sizing. The process involves training a primary model to generate trade signals (e.g., buy, sell, or hold) and then training a secondary model to determine whether each signal is likely to lead to a profitable trade. The second model outputs a probability that is interpreted as the confidence in the forecast, which can be used to adjust the position size or to filter out unreliable trades. Meta-labeling is typically implemented as a three-stage process: Primary model (M1): Predicts the direction or label of a financial outcome using features such as market prices, returns, or volatility indicators. A typical output is directional, e.g., Y ∈ {−1,0,1}, representing short, neutral, or long positions. Secondary model (M2): A binary classifier trained to predict whether the primary model's prediction will be profitable. The target variable is a binary meta-label F ∈ { 0 , 1 } {\displaystyle F\in \{0,1\}} . Inputs can include features used in the primary model, performance diagnostics, or market regime data. Position sizing algorithm (M3): Translates the output probability of the secondary model into a position size. Higher confidence scores result in larger allocations, while lower confidence leads to reduced or zero exposure. === Stage 1: Forecasting side === Primary model architecture Figure 1 Figure 1 presents the architecture of a primary model. It focuses on forecasting the side of the trade. Following the example, this model (M1) takes in input data – such as open-high-low-close data and determines the side of the position to take: a negative number is a short position, and positive number is a long position, the range is set between −1 and 1 (the closer it is to −1 or 1, the stronger the models conviction is). When training the model, the labels are −1 and 1, based on the direction of forward returns for some predefined investment horizon. The researcher may decide to apply a recall check (τ: "Tau") by setting a minimum threshold that the initial output needs to be to qualify of a short or long position (if the threshold is not met, no side forecast is predicted, leading to closing of any open positions), this leads to the primary model output which is one of three possible side forecasts: −1, 0, or 1. The primary model also generates evaluation data which can be used by the secondary model, to improve performance of size forecasts. Some examples of evaluation data include rolling accuracy, F1, recall, precision, and AUC scores. === Stage 2: Filtering out false positives === General meta-labeling architecture Figure 2 Next comes the phase of filtering out false positives, by applying a secondary machine learning model (M2), which is a binary classifier trained to determine if the trade will be profitable or not. The model takes as input four general groupings of data: General input data which is predictive of a false positive. For example the last 30 days rolling volatility of the underlying asset. Evaluation data. Market state and regime data, one may find that macro economic data or clustering the market into regimes may help as specific trading strategies are known to perform better in particular regimes. Example: momentum based strategies perform best in periods with low volatility and strong directional moves. Primary models initial input which is a value between −1 and 1. This highlights the strength of the primary models conviction. The output of the model is a value between −1 and 1 (if using a Tanh function) which will indicate the strength of the conviction that a short or long position is profitable, or it could simply be between 0 and 1 (using a sigmoid function) if one only wanted to know if it made money or not. This output allows filtering out trades that are likely to lead to losses. One could stop at this point or use the outputs of the secondary model as inputs to a position sizing algorithm (M3) which could further enhance strategy performance metrics by translating the output probability of the secondary model into a position size. Higher confidence scores result in larger allocations, while lower confidence leads to reduced or zero exposure. === Stage 3: Optimizing position sizes === ==== Position sizing methods (M3) ==== Various algorithms have been proposed for transforming predicted probabilities into trade sizes: All-or-nothing: Allocate 100% of capital if the probability exceeds a predefined threshold (e.g., 0.5); otherwise, do not trade. Model confidence: Use the probability score directly as the fraction of capital allocated. Linear scaling: Rescale the model's probabilities using min-max normalization based on the training data. Normal CDF (NCDF): Use a normal cumulative distribution function applied to a z-statistic derived from the predicted probability. Empirical CDF (ECDF): Rank probabilities based on their percentile in the training data to ensure relative allocation. Sigmoid Optimal Position Sizing (SOPS): Applies a smooth non-linear sigmoid transformation optimized to maximize risk-adjusted returns (Sharpe ratio). ==== Model calibration ==== Each machine learning algorithm used in meta-labeling tends to produce outputs with different characteristic distributions; for example, some are approximately normally distributed, whereas others exhibit a pronounced U-shape, concentrating probabilities near the extremes. Due to these varying distributions, simply summing the outputs of different models can inadvertently lead to uneven weighting of signals, biasing trade decisions. To address this, model calibration techniques are essential to adjust the predicted probabilities towards frequentist probabilities, ensuring that model outputs reflect true likelihoods more accurately. Two common calibration techniques are: Platt scaling (Sigmoid scaling): Suitable for correcting S-shaped calibration plots typically produced by models such as support vector machines (SVMs). Isotonic regression: Fits a non-decreasing step function to probabilities and is effective particularly with larger datasets, though it can sometimes lead to overfitting. Transforming predictions to frequentist probabilities is crucial as it provides probabilistic outputs that are directly interpretable as the actual likelihood of an event occurring. Such calibration significantly enhances the effectiveness of fixed position sizing methods, reducing maximum drawdowns and increasing risk-adjusted returns. However, calibration has less impact on position sizing methods that directly estimate parameters from the training data, such as ECDF and SOPS, suggesting that calibration is a critical step mainly for fixed methods that rely heavily on raw model outputs. =

    Read more →
  • Intelligent decision support system

    Intelligent decision support system

    An intelligent decision support system (IDSS) is a decision support system that makes extensive use of artificial intelligence (AI) techniques. Use of AI techniques in management information systems has a long history – indeed terms such as "Knowledge-based systems" (KBS) and "intelligent systems" have been used since the early 1980s to describe components of management systems, but the term "Intelligent decision support system" is thought to originate with Clyde Holsapple and Andrew Whinston in the late 1970s. Examples of specialized intelligent decision support systems include Flexible manufacturing systems (FMS), intelligent marketing decision support systems and medical diagnosis systems. Ideally, an intelligent decision support system should behave like a human consultant: supporting decision makers by gathering and analysing evidence, identifying and diagnosing problems, proposing possible courses of action and evaluating such proposed actions. The aim of the AI techniques embedded in an intelligent decision support system is to enable these tasks to be performed by a computer, while emulating human capabilities as closely as possible. Many IDSS implementations are based on expert systems, a well established type of KBS that encode knowledge and emulate the cognitive behaviours of human experts using predicate logic rules, and have been shown to perform better than the original human experts in some circumstances. Expert systems emerged as practical applications in the 1980s based on research in artificial intelligence performed during the late 1960s and early 1970s. They typically combine knowledge of a particular application domain with an inference capability to enable the system to propose decisions or diagnoses. Accuracy and consistency can be comparable to (or even exceed) that of human experts when the decision parameters are well known (e.g. if a common disease is being diagnosed), but performance can be poor when novel or uncertain circumstances arise. Research in AI focused on enabling systems to respond to novelty and uncertainty in more flexible ways is starting to be used in IDSS. For example, intelligent agents that perform complex cognitive tasks without any need for human intervention have been used in a range of decision support applications. Capabilities of these intelligent agents include knowledge sharing, machine learning, data mining, and automated inference. A range of AI techniques such as case based reasoning, rough sets and fuzzy logic have also been used to enable decision support systems to perform better in uncertain conditions. A 2009 research about a multi-artificial system intelligence system named IILS is proposed to automate problem-solving processes within the logistics industry. The system involves integrating intelligence modules based on case-based reasoning, multi-agent systems, fuzzy logic, and artificial neural networks aiming to offer advanced logistics solutions and support in making well-informed, high-quality decisions to address a wide range of customer needs and challenges.

    Read more →
  • CineAsset

    CineAsset

    CineAsset was a complete mastering software suite by Doremi Labs that could create and playback encrypted (Pro version) and unencrypted DCI compliant packages from virtually any source. CineAsset included a separate "Editor" application for generating Digital Cinema Packages (DCPs). CineAsset Pro added the ability to generate encrypted DCPs and Key Delivery Messages (KDMs) for any encrypted content in the database. It has since been discontinued, along with CineAsset Player. == Features == == Supported formats == === Input === Source: ==== Containers ==== AVI MOV MXF MPG TS WMV M2TS MTS MP4 MKV ==== Video Codecs ==== JPEG2000 ProRes 422 DNxHD® YUV Uncompressed 8-10 bits DIVX® XVID® MPEG4 AVC / H-264 VC-1 MPEG2 ==== Image Sequences ==== BMP TIFF TGA DPX JPG J2C ==== Audio Files ==== WAV MP3 WMA MP2 === Output === Source: ==== JPEG2000 ==== 2D and 3D at up to 4K resolution Bit Rate: 50–250 Mbit/s (500 Mbit/s for frame rates above 30 fps) Speed: Faster than real-time processing when using optional render nodes ==== MPEG2 ==== I-Only or Long GOP 1080p up to 80 Mbit/s ==== H264 ==== 1080p up to 50 Mbit/s ==== VC1 ==== DCP wrapping only (no transcode)

    Read more →
  • Outline of deep learning

    Outline of deep learning

    The following outline is provided as an overview of, and topical guide to, deep learning: Deep learning is a subfield of machine learning and artificial intelligence based on artificial neural networks with multiple processing layers. It emphasizes representation learning and is widely used in areas such as computer vision, natural language processing, speech recognition, recommender systems, robotics, and generative artificial intelligence. == Ways to categorize deep learning == A field of study A branch of artificial intelligence A subfield of machine learning A subfield of computer science A form of representation learning A class of methods based on artificial neural networks An approach used in computational statistics == History == === Precursors === Cybernetics Perceptron Connectionism Neocognitron Backpropagation === Milestones === LeNet Long short-term memory Deep belief network AlexNet Sequence to sequence learning Generative adversarial network Residual neural network Transformer BERT Generative pre-trained transformer Diffusion model === Related histories === History of artificial intelligence History of machine learning Timeline of machine learning == Core concepts == == Learning settings == Supervised learning Unsupervised learning Self-supervised learning Semi-supervised learning Reinforcement learning Transfer learning Multitask learning Multimodal learning Online machine learning Continual learning == Common tasks == Image classification Object detection Image segmentation Automatic speech recognition Neural machine translation Question answering Automatic summarization Text-to-image model Protein structure prediction == Architectures == === Feedforward and convolutional architectures === Feedforward neural network Multilayer perceptron Convolutional neural network Radial basis function network Residual neural network U-Net === Recurrent and sequence architectures === Recurrent neural network Long short-term memory Gated recurrent unit Sequence to sequence learning Recursive neural network === Representation-learning architectures === Autoencoder Denoising autoencoder Sparse autoencoder Variational autoencoder Restricted Boltzmann machine Deep belief network === Attention and transformer architectures === Attention (machine learning) Transformer BERT Generative pre-trained transformer Vision transformer === Generative and probabilistic architectures === Autoregressive model Diffusion model Energy-based model Generative adversarial network Mixture of experts === Graph and memory architectures === Graph neural network Graph convolutional network Siamese network Neural Turing machine Memory network Echo state network Capsule neural network == Neural network components and techniques == Artificial neuron Activation function Rectified linear unit Sigmoid function Softmax function Embedding Convolution Pooling layer Attention Batch normalization Layer normalization Residual connections == Training and optimization == Backpropagation Gradient descent Stochastic gradient descent Adam optimization Learning rate Loss function Cross-entropy Mean squared error Regularization Dropout Early stopping Batch normalization Data augmentation Transfer learning Knowledge distillation Ensemble learning Curriculum learning == Datasets and benchmarks == CIFAR-10 ImageNet MNIST database Common Objects in Context (COCO) General Language Understanding Evaluation (GLUE) benchmark LibriSpeech SQuAD == Applications == === Computer vision === Computer vision Facial recognition system Image classification Image segmentation Medical imaging Object detection Optical character recognition === Natural language processing === Automatic summarization Chatbot Information retrieval Large language model Natural language processing Neural machine translation Question answering Sentiment analysis === Speech and audio === Automatic speech recognition Music information retrieval Speaker recognition Speech synthesis === Science and medicine === Bioinformatics Computational biology Drug discovery Medical diagnosis Protein structure prediction === Robotics and control === Autonomous car Computer game bot Control theory Robotics === Recommendation, search, and forecasting === Anomaly detection Forecasting Fraud detection Recommender system Search engine === Generative artificial intelligence === Deepfake Generative artificial intelligence Large language model Speech synthesis Text-to-image model === Computer graphics and video games === Deep Learning Anti-Aliasing (DLAA) Deep Learning Super Sampling (DLSS) == Hardware == AMD Instinct AMD XDNA Application-specific integrated circuit Deep learning processor, Neural processing unit (NPU), or Neural Engine Field-programmable gate array General-purpose computing on graphics processing units (GPGPU) Graphics processing unit NVIDIA Deep Learning Accelerator (NVDLA) Tensor processing unit Vision processing unit Wafer-scale integration === Supporting software platforms === CUDA Metal ROCm == Software == === Open-source frameworks and libraries === === Neural network software === EDLUT Emergent Encog JOONE Neuroph NeuroSolutions OpenNN Peltarion Synapse SNNS === Platforms, tools, and deployment === Amazon SageMaker Google Colab Hugging Face Kaggle Kubeflow MLflow ONNX OpenVINO TensorFlow Hub == Algorithms for deep learning and neural networks == Backpropagation Conjugate gradient method Generalized Hebbian algorithm Gradient descent Levenberg–Marquardt algorithm Perceptron Quasi-Newton method Wake-sleep algorithm == Methods and related topics == === Representation and metric learning === Contrastive learning Embedding Feature learning Manifold learning Metric learning === Generative modeling === Autoregressive model Diffusion model Generative adversarial network Generative model Variational inference === Efficient and scalable deep learning === Knowledge distillation Low-rank approximation Mixture of experts Quantization Sparsity === Reliability, safety, and interpretability === Adversarial machine learning AI alignment Algorithmic bias Catastrophic forgetting Differential privacy Explainable artificial intelligence Federated learning Hallucination (artificial intelligence) == Conferences and workshops == Annual Meeting of the Association for Computational Linguistics Conference on Computer Vision and Pattern Recognition Conference on Neural Information Processing Systems International Conference on Computer Vision International Conference on Learning Representations International Conference on Machine Learning == Organizations == === Research laboratories and institutions === Allen Institute for AI Alberta Machine Intelligence Institute European Laboratory for Learning and Intelligent Systems Google DeepMind Meta AI Mila Microsoft Research Vector Institute === Companies === Anthropic Cerebras Cohere DeepSeek Mistral AI OpenAI Stability AI xAI == Publications == === Books === Deep Learning – Ian Goodfellow and Yoshua Bengio Neural Networks and Deep Learning – Michael Nielsen Perceptrons – Marvin Minsky and Seymour Papert === Journals === IEEE Transactions on Neural Networks and Learning Systems Neural Networks Neural Computation == Influential persons ==

    Read more →
  • Web intelligence

    Web intelligence

    Web intelligence is the area of scientific research and development that explores the roles and makes use of artificial intelligence and information technology for new products, services and frameworks that are empowered by the World Wide Web. The term was coined in a paper written by Ning Zhong, Jiming Liu Yao and Y.Y. Ohsuga in the Computer Software and Applications Conference in 2000. == Research == The research about the web intelligence covers many fields – including data mining (in particular web mining), information retrieval, pattern recognition, predictive analytics, the semantic web, web data warehousing – typically with a focus on web personalization and adaptive websites.

    Read more →
  • Something Big Is Happening

    Something Big Is Happening

    "Something Big Is Happening" is an essay by Matt Shumer, an AI entrepreneur, about the impact of artificial intelligence, published in February 2026, that has since been reportedly viewed more than 80 million times and widely discussed. Shumer noted that the technology has crossed an important threshold, where AI has become capable of creating self-improving systems. Referring to one the most recent AI models, he wrote: "It was making intelligent decisions. It had something that felt, for the first time, like judgment. Like taste." Speaking to CNBC's Power Lunch, Shumer said that his "core message" is "people in the workforce should start to use and experiment with AI tools so they can understand what’s coming". Even as the essay was widely shared and discussed, the essay also elicited criticism. Paulo Carvao, in an essay published by the Forbes Magazine stated that some of his advice is sound, but added: "It reads at times like a sales pitch. He urges readers to subscribe to the most advanced AI tools. He implies that those with access to premium models will outpace those without. He frames paid AI subscriptions as a form of insurance against obsolescence." Writing in The Guardian, Dan Milmo and Aisha Down mentioned Shumer as having a history of AI hype and stated, "He previously excited the internet by announcing the release of the world's "top open-source model", which it was not". Many workers in the technology sector criticized the article in blog posts shared on Hacker News; Edward Zitron commented that "while coding LLMs can test products, or scan/fix some bugs, this suggests they A) do this autonomously without human input, B) they do this correctly every time (or ever!)." In an article alluding to Shumer's original post, Ari Colaprete wrote "the LLM is fundamentally a writing machine, it does everything via text, and if you make it produce writing that exists purely to serve some sort of mechanical function, and you train it to succeed in that task, then it will tend to do so, even with vast intricacy."

    Read more →
  • Curve (tonality)

    Curve (tonality)

    In image editing, a curve is a remapping of image tonality, specified as a function from input level to output level, used as a way to emphasize colours or other elements in a picture. Curves can usually be applied to all channels together in an image, or to each channel individually. Applying a curve to all channels typically changes the brightness in part of the spectrum. Light parts of a picture can be easily made lighter and dark parts darker to increase contrast. Applying a curve to individual channels can be used to stress a colour. This is particularly efficient in the Lab colour space due to the separation of luminance and chromaticity, but it can also be used in RGB, CMYK or whatever other colour models the software supports.

    Read more →
  • Anomaly detection

    Anomaly detection

    In data analysis, anomaly detection (also referred to as outlier detection and sometimes as novelty detection) is generally understood to be the identification of rare items, events or observations which deviate significantly from the majority of the data and do not conform to a well defined notion of normal behavior. Such examples may arouse suspicions of being generated by a different mechanism, or appear inconsistent with the remainder of that set of data. Anomaly detection finds application in many domains including cybersecurity, medicine, machine vision, statistics, neuroscience, law enforcement and financial fraud to name only a few. Anomalies were initially searched for clear rejection or omission from the data to aid statistical analysis, for example to compute the mean or standard deviation. They were also removed to better predictions from models such as linear regression, and more recently their removal aids the performance of machine learning algorithms. However, in many applications anomalies themselves are of interest and are the observations most desirous in the entire data set, which need to be identified and separated from noise or irrelevant outliers. Three broad categories of anomaly detection techniques exist. Supervised anomaly detection techniques require a data set that has been labeled as "normal" and "abnormal" and involves training a classifier. However, this approach is rarely used in anomaly detection due to the general unavailability of labelled data and the inherent unbalanced nature of the classes. Semi-supervised anomaly detection techniques assume that some portion of the data is labelled. This may be any combination of the normal or anomalous data, but more often than not, the techniques construct a model representing normal behavior from a given normal training data set, and then test the likelihood of a test instance to be generated by the model. Unsupervised anomaly detection techniques assume the data is unlabelled and are by far the most commonly used due to their wider and relevant application. == Definition == Many attempts have been made in the statistical and computer science communities to define an anomaly. The most prevalent ones include the following, and can be categorised into three groups: those that are ambiguous, those that are specific to a method with pre-defined thresholds usually chosen empirically, and those that are formally defined: === Ill defined === An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism. Anomalies are instances or collections of data that occur very rarely in the data set and whose features differ significantly from most of the data. An outlier is an observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data. An anomaly is a point or collection of points that is relatively distant from other points in multi-dimensional space of features. Anomalies are patterns in data that do not conform to a well-defined notion of normal behaviour. === Specific === Let T be observations from a univariate Gaussian distribution and O a point from T. Then the z-score for O is greater than a pre-selected threshold if and only if O is an outlier. == History == === Intrusion detection === The concept of intrusion detection, a critical component of anomaly detection, has evolved significantly over time. Initially, it was a manual process where system administrators would monitor for unusual activities, such as a vacationing user's account being accessed or unexpected printer activity. This approach was not scalable and was soon superseded by the analysis of audit logs and system logs for signs of malicious behavior. By the late 1970s and early 1980s, the analysis of these logs was primarily used retrospectively to investigate incidents, as the volume of data made it impractical for real-time monitoring. The affordability of digital storage eventually led to audit logs being analyzed online, with specialized programs being developed to sift through the data. These programs, however, were typically run during off-peak hours due to their computational intensity. The 1990s brought the advent of real-time intrusion detection systems capable of analyzing audit data as it was generated, allowing for immediate detection of and response to attacks. This marked a significant shift towards proactive intrusion detection. As the field has continued to develop, the focus has shifted to creating solutions that can be efficiently implemented across large and complex network environments, adapting to the ever-growing variety of security threats and the dynamic nature of modern computing infrastructures. == Applications == Anomaly detection is applicable in a very large number and variety of domains, and is an important subarea of unsupervised machine learning. As such it has applications in cyber-security, intrusion detection, fraud detection, fault detection, system health monitoring, event detection in sensor networks, detecting ecosystem disturbances, defect detection in images using machine vision, medical diagnosis and law enforcement. === Intrusion detection === Anomaly detection was proposed for intrusion detection systems (IDS) by Dorothy Denning in 1986. Anomaly detection for IDS is normally accomplished with thresholds and statistics, but can also be done with soft computing, and inductive learning. Types of features proposed by 1999 included profiles of users, workstations, networks, remote hosts, groups of users, and programs based on frequencies, means, variances, covariances, and standard deviations. The counterpart of anomaly detection in intrusion detection is misuse detection. === Fintech fraud detection === Anomaly detection is vital in fintech for fraud prevention. === Preprocessing === Preprocessing data to remove anomalies can be an important step in data analysis, and is done for a number of reasons. Statistics such as the mean and standard deviation are more accurate after the removal of anomalies, and the visualisation of data can also be improved. In supervised learning, removing the anomalous data from the dataset often results in a statistically significant increase in accuracy. === Video surveillance === Anomaly detection has become increasingly vital in video surveillance to enhance security and safety. With the advent of deep learning technologies, methods using Convolutional Neural Networks (CNNs) and Simple Recurrent Units (SRUs) have shown significant promise in identifying unusual activities or behaviors in video data. These models can process and analyze extensive video feeds in real-time, recognizing patterns that deviate from the norm, which may indicate potential security threats or safety violations. An important aspect for video surveillance is the development of scalable real-time frameworks. Such pipelines are required for processing multiple video streams with low computational resources. === IT infrastructure === In IT infrastructure management, anomaly detection is crucial for ensuring the smooth operation and reliability of services. These are complex systems, composed of many interactive elements and large data quantities, requiring methods to process and reduce this data into a human and machine interpretable format. Techniques like the IT Infrastructure Library (ITIL) and monitoring frameworks are employed to track and manage system performance and user experience. Detected anomalies can help identify and pre-empt potential performance degradations or system failures, thus maintaining productivity and business process effectiveness. === IoT systems === Anomaly detection is critical for the security and efficiency of Internet of Things (IoT) systems. It helps in identifying system failures and security breaches in complex networks of IoT devices. The methods must manage real-time data, diverse device types, and scale effectively. Garg et al. have introduced a multi-stage anomaly detection framework that improves upon traditional methods by incorporating spatial clustering, density-based clustering, and locality-sensitive hashing. This tailored approach is designed to better handle the vast and varied nature of IoT data, thereby enhancing security and operational reliability in smart infrastructure and industrial IoT systems. === Petroleum industry === Anomaly detection is crucial in the petroleum industry for monitoring critical machinery. A 2015 paper proposed a novel segmentation algorithm using support vector machines to analyze sensor data for real-time anomaly detection. === Oil and gas pipeline monitoring === In the oil and gas sector, anomaly detection is not just crucial for maintenance and safety, but also for environmental protection. Aljameel et al. propose an advanced machine learning-based model for detecting minor leaks in oil and gas pipelines, a task traditional methods may miss.

    Read more →
  • Cross-validation (statistics)

    Cross-validation (statistics)

    Cross-validation, sometimes called rotation estimation or out-of-sample testing, is any of various similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set. Cross-validation includes resampling and sample splitting methods that use different portions of the data to test and train a model on different iterations. It is often used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice. It can also be used to assess the quality of a fitted model and the stability of its parameters. In a prediction problem, a model is usually given a dataset of known data on which training is run (training dataset), and a dataset of unknown data (or first seen data) against which the model is tested (called the validation dataset or testing set). The goal of cross-validation is to test the model's ability to predict new data that was not used in estimating it, in order to flag problems like overfitting or selection bias and to give an insight on how the model will generalize to an independent dataset (i.e., an unknown dataset, for instance from a real problem). One round of cross-validation involves partitioning a sample of data into complementary subsets, performing the analysis on one subset (called the training set), and validating the analysis on the other subset (called the validation set or testing set). To reduce variability, in most methods multiple rounds of cross-validation are performed using different partitions, and the validation results are combined (e.g. averaged) over the rounds to give an estimate of the model's predictive performance. In summary, cross-validation combines (averages) measures of fitness in prediction to derive a more accurate estimate of model prediction performance. == Motivation == Assume a model with one or more unknown parameters, and a data set to which the model can be fit (the training data set). The fitting process optimizes the model parameters to make the model fit the training data as well as possible. If an independent sample of validation data is taken from the same population as the training data, it will generally turn out that the model does not fit the validation data as well as it fits the training data. The size of this difference is likely to be large especially when the size of the training data set is small, or when the number of parameters in the model is large. Cross-validation is a way to estimate the size of this effect. === Example: linear regression === In linear regression, there exist real response values y 1 , … , y n {\textstyle y_{1},\ldots ,y_{n}} , and n p-dimensional vector covariates x1, ..., xn. The components of the vector xi are denoted xi1, ..., xip. If least squares is used to fit a function in the form of a hyperplane ŷ = a + βTx to the data (xi, yi) 1 ≤ i ≤ n, then the fit can be assessed using the mean squared error (MSE). The MSE for given estimated parameter values a and β on the training set (xi, yi) 1 ≤ i ≤ n is defined as: MSE = 1 n ∑ i = 1 n ( y i − y ^ i ) 2 = 1 n ∑ i = 1 n ( y i − a − β T x i ) 2 = 1 n ∑ i = 1 n ( y i − a − β 1 x i 1 − ⋯ − β p x i p ) 2 {\displaystyle {\begin{aligned}{\text{MSE}}&={\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-{\hat {y}}_{i})^{2}={\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-a-{\boldsymbol {\beta }}^{T}\mathbf {x} _{i})^{2}\\&={\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-a-\beta _{1}x_{i1}-\dots -\beta _{p}x_{ip})^{2}\end{aligned}}} If the model is correctly specified, it can be shown under mild assumptions that the expected value of the MSE for the training set is (n − p − 1)/(n + p + 1) < 1 times the expected value of the MSE for the validation set (the expected value is taken over the distribution of training sets). Thus, a fitted model and computed MSE on the training set will result in an optimistically biased assessment of how well the model will fit an independent data set. This biased estimate is called the in-sample estimate of the fit, whereas the cross-validation estimate is an out-of-sample estimate. Since in linear regression it is possible to directly compute the factor (n − p − 1)/(n + p + 1) by which the training MSE underestimates the validation MSE under the assumption that the model specification is valid, cross-validation can be used for checking whether the model has been overfitted, in which case the MSE in the validation set will substantially exceed its anticipated value. (Cross-validation in the context of linear regression is also useful in that it can be used to select an optimally regularized cost function.) === General case === In most other regression procedures (e.g. logistic regression), there is no simple formula to compute the expected out-of-sample fit. Cross-validation is, thus, a generally applicable way to predict the performance of a model on unavailable data using numerical computation in place of theoretical analysis. == Types == Two types of cross-validation can be distinguished: exhaustive and non-exhaustive cross-validation. === Exhaustive cross-validation === Exhaustive cross-validation methods are cross-validation methods which learn and test on all possible ways to divide the original sample into a training and a validation set. ==== Leave-p-out cross-validation ==== Leave-p-out cross-validation (LpO CV) involves using p observations as the validation set and the remaining observations as the training set. This is repeated on all ways to cut the original sample on a validation set of p observations and a training set. LpO cross-validation require training and validating the model C p n {\displaystyle C_{p}^{n}} times, where n is the number of observations in the original sample, and where C p n {\displaystyle C_{p}^{n}} is the binomial coefficient. For p > 1 and for even moderately large n, LpO CV can become computationally infeasible. For example, with n = 100 and p = 30, C 30 100 ≈ 3 × 10 25 . {\displaystyle C_{30}^{100}\approx 3\times 10^{25}.} A variant of LpO cross-validation with p=2 known as leave-pair-out cross-validation has been recommended as a nearly unbiased method for estimating the area under ROC curve of binary classifiers. ==== Leave-one-out cross-validation ==== Leave-one-out cross-validation (LOOCV) is a particular case of leave-p-out cross-validation with p = 1. The process looks similar to jackknife; however, with cross-validation one computes a statistic on the left-out sample(s), while with jackknifing one computes a statistic from the kept samples only. LOO cross-validation requires less computation time than LpO cross-validation because there are only C 1 n = n {\displaystyle C_{1}^{n}=n} passes rather than C p n {\displaystyle C_{p}^{n}} . However, n {\displaystyle n} passes may still require quite a large computation time, in which case other approaches such as k-fold cross validation may be more appropriate. Pseudo-code algorithm: Input: x, {vector of length N with x-values of incoming points} y, {vector of length N with y-values of the expected result} interpolate( x_in, y_in, x_out ), { returns the estimation for point x_out after the model is trained with x_in-y_in pairs} Output: err, {estimate for the prediction error} Steps: err ← 0 for i ← 1, ..., N do // define the cross-validation subsets x_in ← (x[1], ..., x[i − 1], x[i + 1], ..., x[N]) y_in ← (y[1], ..., y[i − 1], y[i + 1], ..., y[N]) x_out ← x[i] y_out ← interpolate(x_in, y_in, x_out) err ← err + (y[i] − y_out)^2 end for err ← err/N === Non-exhaustive cross-validation === Non-exhaustive cross validation methods do not compute all ways of splitting the original sample. These methods are approximations of leave-p-out cross-validation. ==== k-fold cross-validation ==== In k-fold cross-validation, the original sample is randomly partitioned into k equal sized subsamples, often referred to as "folds". Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k − 1 subsamples are used as training data. The cross-validation process is then repeated k times, with each of the k subsamples used exactly once as the validation data. The k results can then be averaged to produce a single estimation. The advantage of this method over repeated random sub-sampling (see below) is that all observations are used for both training and validation, and each observation is used for validation exactly once. 10-fold cross-validation is commonly used, but in general k remains an unfixed parameter. For example, setting k = 2 results in 2-fold cross-validation. In 2-fold cross-validation, the dataset is randomly shuffled into two sets d0 and d1, so that both sets are equal size (this is usually implemented by shuffling the data array and then splitting it in two). We then train on d0 and validate on d1, followed by training on d1 and validating on d0. When k = n (the number of observations), k-fold cross-validation is equivalent to leave-one-out cr

    Read more →