AI Art Tattoo

AI Art Tattoo — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Outline of brain mapping


    The following outline is provided as an overview of and topical guide to brain mapping: Brain mapping – a set of neuroscience techniques predicated on the mapping of (biological) quantities or properties onto spatial representations of the (human or non-human) brain, resulting in maps. Brain mapping is further defined as the study of the anatomy and function of the brain and spinal cord through the use of imaging (including intra-operative, microscopic, endoscopic and multi-modality imaging), immunohistochemistry, molecular and optogenetics, stem cell and cellular biology, engineering (material, electrical and biomedical), neurophysiology and nanotechnology. == Broad scope == History of neuroscience, History of neurology, Brain mapping, Human brain, Neuroscience, Nervous system. === The neuron doctrine === Neuron doctrine – a carefully constructed set of elementary observations regarding neurons. For more granular, more current, and more advanced topics, see the cellular level section. The doctrine asserts that neurons fall under the broader cell theory, which postulates: All living organisms are composed of one or more cells. The cell is the basic unit of structure, function, and organization in all organisms. All cells come from preexisting, living cells. The neuron doctrine postulates several elementary aspects of neurons: The brain is made up of individual cells (neurons) that contain specialized features such as dendrites, a cell body, and an axon. Neurons are cells differentiable from other tissues in the body. Neurons differ in size, shape, and structure according to their location or functional specialization. Every neuron has a nucleus, which is the trophic center of the cell (the part that must have access to nutrition). If the cell is divided, only the portion containing the nucleus will survive. Nerve fibers are the result of cell processes and the outgrowths of nerve cells. (Several axons are bound together to form one nerve fibril. See also: Neurofilament. 
Several nerve fibrils then form one large nerve fiber.) Myelin, an electrical insulator, forms around selected axons. Neurons are generated by cell division. Neurons are connected by sites of contact and not via cytoplasmic continuity. (A cell membrane isolates the inside of the cell from its environment. Neurons do not communicate via direct cytoplasm-to-cytoplasm contact.) Law of dynamic polarization: although the axon can conduct in both directions, in tissue there is a preferred direction of transmission from cell to cell. Elements added later to the initial neuron doctrine: A barrier to transmission exists at the site of contact between two neurons that may permit transmission (the synapse). Unity of transmission: if a contact is made between two cells, then that contact can be either excitatory or inhibitory, but will always be of the same type. Dale's law: each nerve terminal releases a single type of neurotransmitter. Some of the basic postulates in the neuron doctrine have been subsequently questioned, refuted, or updated. See the cellular level section topics for additional information. === Map, atlas, and database projects === Brain Activity Map Project – 2013 NIH $3 billion project to map every neuron in the human brain in ten years, based upon the Human Genome Project. NIH Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative. Community outreach site for the above, where the public may comment. Human Brain Project (EU) – 1 billion euro, 10-year project to simulate the human brain with supercomputers. BigBrain – a high-resolution 3D atlas of the human brain created as part of the HBP. Human Connectome Project – 2009 NIH $30 million project to build a network map of the human brain, including structural (anatomical) and functional elements. Emphasis included research into dyslexia, autism, Alzheimer's disease, and schizophrenia. See also Connectome, a comprehensive map of neural connections in the brain. 
Allen Brain Atlas – 2003 $100 million project funded by Paul Allen (Microsoft) BrainMaps – National Institute of Health (NIH) database including 60 terabytes of image scans of primate and non-primates, integrated with information covering structure and function. NeuroNames – Defines the brain in terms of about 550 primary structures (about 850 unique structures) to which all other structures, names, and synonyms are related. About 15,000 neuroanatomical terms are cross indexed, including many synonyms in seven languages. Coverage includes the brain and spinal cord of the four species most frequently studied by neuroscientists: human, macaque (monkey), rat and mouse. The controlled, standardized vocabulary for each structure is located in an unambiguous, strict physical hierarchy, and these terms are selected based on ease of pronunciation, mnemonic value, and frequency of use in recent neuroscientific publications. Relation of each structure to its superstructures and substructures is included. The controlled vocabulary is suitable for uniquely indexing neuroanatomical information in digital databases. Decade of the Brain 1990–1999 promotion by NIH and the Library of Congress "to enhance public awareness of the benefits to be derived from brain research". Communications targeted Members of Congress, staffs, and the general public to promote funding. Talairach Atlas see Jean Talairach Harvard Whole Brain Atlas see Human brain MNI Template see Medical image computing Blue Brain Project and Artificial brain International Consortium for Brain Mapping see Brain Mapping List of neuroscience databases NIH Toolbox National Institute of Health (USA) toolbox for the assessment of neurological and behavioral function Organization for Human Brain Mapping The Organization for Human Brain Mapping (OHBM) is an international society dedicated to using neuroimaging to discover the organization of the human brain. 
== Imaging and recording systems == This section covers imaging and recording systems. The general section covers history, neuroimaging, and techniques for mapping specific neural connections. The specific systems section covers the various specific technologies, including experimental and widely deployed imaging and recording systems. === General === Most imaging work to date on individual neurons has been conducted outside the brain, typically on large neurons, and has been most frequently destructive. New techniques are however rapidly emerging. Search on "Single neuron imaging" and see related topics: Biological neuron model, Single-unit recording, Neural oscillation, Computational neuroscience. dMRI (above) is also promising in non-destructive imaging of single neurons inside the brain. History of neuroimaging (redirects from Brain scanner) Neuroimaging (redirects from Brain function map) Connectomics – mapping technique showing neural connections in a nervous system. === Specific systems === Cortical stimulation mapping Diffusion MRI (dMRI) – includes diffusion tensor imaging (DTI) and diffusion functional MRI (DfMRI). dMRI is a recent breakthrough in brain mapping allowing the visualization of cross connections between different anatomical parts of the brain. It allows noninvasive imaging of white matter fiber structure and in addition to mapping can be useful in clinical observations of abnormalities, including damage from stroke. Electroencephalography (EEG) – uses electrodes on the scalp and other techniques to detect the electrical flow of currents. Electrocorticography – intracranial EEG, the practice of using electrodes placed directly on the exposed surface of the brain to record electrical activity from the cerebral cortex. 
Electrophysiological techniques for clinical diagnosis Functional magnetic resonance imaging (fMRI) Medical image computing (brain research of leads medical and surgical uses of mapping technology) Neurostimulation (in research stimulation is frequently used in conjunction with imaging) Positron emission tomography (PET) – a nuclear medical imaging technique that produces a three-dimensional image or picture of functional processes in the body. The system detects pairs of gamma rays emitted indirectly by a positron-emitting radionuclide (tracer), which is introduced into the body on a biologically active molecule. Three-dimensional images of tracer concentration within the body are then constructed by computer analysis. In modern scanners, three dimensional imaging is often accomplished with the aid of a CT X-ray scan performed on the patient during the same session, in the same machine. === Imaging and recording componentry === ==== Electrochemical ==== Haemodynamic response – the rapid delivery of blood to active neuronal tissues. Blood Oxygenation Level Dependent signal (BOLD), corresponds to the concentration of deoxyhemoglobin. The BOLD effect is based on the fact that when neuronal activity is increased in one part of the brain, there is also an increased amount of cerebral blood flow to that area. Functional m

  • vLLM


    vLLM is an open-source software framework for inference and serving of large language models and related multimodal models. Originally developed at the University of California, Berkeley's Sky Computing Lab, the project is centered on PagedAttention, a memory-management method for transformer key–value caches, and supports features such as continuous batching, distributed inference, quantization, and OpenAI-compatible APIs. According to a project maintainer, the "v" in vLLM originally referred to "virtual", inspired by virtual memory. == History == vLLM was introduced in 2023 by researchers affiliated with the Sky Computing Lab at UC Berkeley. Its core ideas were described in the 2023 paper Efficient Memory Management for Large Language Model Serving with PagedAttention, which presented the system as a high-throughput and memory-efficient serving engine for large language models. In 2025, the PyTorch Foundation announced that vLLM had become a Foundation-hosted project. PyTorch's project page states that the University of California, Berkeley contributed vLLM to the Linux Foundation in July 2024. In January 2026, TechCrunch reported that the creators of vLLM had launched the startup Inferact to commercialize the project, raising $150 million in seed funding. == Architecture == According to its 2023 paper, vLLM was designed to improve the efficiency of large language model serving by reducing memory waste in the key–value cache used during transformer inference. The paper introduced PagedAttention, an algorithm inspired by virtual memory and paging techniques in operating systems, and described vLLM as using block-level memory management and request scheduling to increase throughput while maintaining similar latency. The project documentation and repository describe support for continuous batching, chunked prefill, speculative decoding, prefix caching, quantization, and multiple forms of distributed inference and serving. 
PyTorch has described vLLM as a high-throughput, memory-efficient inference and serving engine that supports a range of hardware back ends, including NVIDIA and AMD GPUs, Google TPUs, AWS Trainium, and Intel processors.
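The block-level memory management described above can be pictured with a toy allocator. The sketch below is not vLLM's code: the class name `ToyPagedCache`, the block size of 4 tokens, and the first-free allocation policy are all assumptions made for illustration. It only shows how a per-sequence block table can map logical token positions to fixed-size physical blocks, so that memory is claimed one block at a time instead of being reserved up front for the longest possible sequence.

```python
class ToyPagedCache:
    """Toy block-table allocator (illustrative only, not vLLM's code)."""

    def __init__(self, num_blocks, block_size=4):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # ids of unused physical blocks
        self.block_tables = {}  # seq_id -> physical block ids, in logical order
        self.lengths = {}       # seq_id -> number of tokens stored so far

    def append_token(self, seq_id):
        """Reserve a slot for one more token; return (physical_block, offset)."""
        length = self.lengths.get(seq_id, 0)
        if length % self.block_size == 0:  # current block is full (or none yet)
            if not self.free_blocks:
                raise MemoryError("no free KV-cache blocks")
            self.block_tables.setdefault(seq_id, []).append(self.free_blocks.pop(0))
        self.lengths[seq_id] = length + 1
        table = self.block_tables[seq_id]
        return table[length // self.block_size], length % self.block_size

    def free(self, seq_id):
        """Return all blocks of a finished sequence to the free list."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

    def wasted_slots(self):
        """Reserved but unused token slots (at most block_size - 1 per sequence)."""
        return sum(len(table) * self.block_size - self.lengths[seq_id]
                   for seq_id, table in self.block_tables.items())


cache = ToyPagedCache(num_blocks=3, block_size=4)
slots = [cache.append_token("a") for _ in range(5)]
print(slots[-1])             # (1, 0): the fifth token starts a second block
print(cache.wasted_slots())  # 3: two blocks reserved, five tokens stored
```

In this toy model, the unused space per sequence is bounded by one partially filled block, which is the kind of memory waste reduction the paper attributes to paging-style management.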

  • Attensity


    Attensity was an American company that provided social analytics and engagement applications for social customer relationship management (social CRM). Attensity's text analytics software applications extracted facts, relationships and sentiment from unstructured data. == History == Attensity was founded in 2000. An early investor in Attensity was In-Q-Tel, which funds technology to support the missions of the US Government and the broader DOD. InTTENSITY, an independent company that has combined Inxight with Attensity Software (the only joint development project that combines two InQTel funded software packages), was the exclusive distributor and outlet for Attensity in the Federal Market. In 2009, Attensity Corp., then based in Palo Alto, merged with Germany's Empolis and Living-e AG to form Attensity Group. In 2010, Attensity Group acquired Biz360, a provider of social media monitoring and market intelligence solutions. In early 2012, Attensity Group divested itself of the Empolis business unit via a management buyout; that unit currently conducts business under its pre-merger name. Attensity Group was a closely held private company. Its majority shareholder was Aeris Capital, a private Swiss investment office advising a high-net-worth individual and his charitable foundation. Foundation Capital, Granite Ventures, and Scale Venture Partners were among Biz360's investors and thus became shareholders in Attensity Group. In February 2016, Attensity's IP assets were acquired by InContact, and Attensity closed.

  • Image destriping


    Image destriping is the process of removing stripes or streaks from images and videos without disrupting the original image or video. These artifacts plague a range of fields in scientific imaging, including atomic force microscopy, light sheet fluorescence microscopy, and planetary satellite imaging. The most common image processing technique for reducing stripe artifacts is Fourier filtering. Unfortunately, filtering methods risk altering or suppressing useful image data. Methods developed for multiple-sensor imaging systems in planetary satellites use statistics-based methods to match the signal distribution across sensors. More recently, a new class of approaches leverages compressed sensing to regularize an optimization problem and recover stripe-free images. In many cases, these destriped images have few or no artifacts, even at low signal-to-noise ratios.
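The statistics-based approach can be illustrated with simple moment matching. The sketch below assumes, for illustration only, that each image column comes from one sensor and that all sensors observe statistically similar scenes; the function name `moment_match_columns` and the choice of the average column mean and standard deviation as the common target are assumptions, not a description of any particular satellite pipeline. Note that under these assumptions genuine column-to-column differences in the scene are flattened along with the stripes.

```python
from statistics import mean, pstdev


def moment_match_columns(image):
    """Rescale each column so its mean and std match a common target.

    `image` is a list of rows; each column is treated as one sensor.
    """
    rows, cols = len(image), len(image[0])
    columns = [[image[r][c] for r in range(rows)] for c in range(cols)]
    col_means = [mean(col) for col in columns]
    col_stds = [pstdev(col) for col in columns]
    # Common target statistics: the average over all columns (an assumption).
    target_mean, target_std = mean(col_means), mean(col_stds)

    out = [[0.0] * cols for _ in range(rows)]
    for c, col in enumerate(columns):
        # Gain equalizes the spread; flat columns are left unscaled.
        gain = target_std / col_stds[c] if col_stds[c] > 0 else 1.0
        for r in range(rows):
            out[r][c] = (col[r] - col_means[c]) * gain + target_mean
    return out


# Same vertical gradient in every column, but each sensor has its own offset.
striped = [[0, 10, 5],
           [1, 11, 6],
           [2, 12, 7]]
clean = moment_match_columns(striped)
for row in clean:
    print(row)
```

After matching, the three columns carry the same values, so the vertical stripes caused by the per-sensor offsets disappear while the row-to-row gradient is kept.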

  • Space partitioning


    In geometry, space partitioning is the process of dividing an entire space (usually a Euclidean space) into two or more disjoint subsets (see also partition of a set). In other words, space partitioning divides a space into non-overlapping regions. Any point in the space can then be identified as lying in exactly one of the regions. == Overview == Space-partitioning systems are often hierarchical, meaning that a space (or a region of space) is divided into several regions, and then the same space-partitioning system is recursively applied to each of the regions thus created. The regions can be organized into a tree, called a space-partitioning tree. Most space-partitioning systems use planes (or, in higher dimensions, hyperplanes) to divide space: points on one side of the plane form one region, and points on the other side form another. Points exactly on the plane are usually arbitrarily assigned to one or the other side. Recursively partitioning space using planes in this way produces a BSP tree, one of the most common forms of space partitioning. == Uses == === In computer graphics === Space partitioning is particularly important in computer graphics, especially in ray tracing, where it is frequently used to organize the objects in a virtual scene. A typical scene may contain millions of polygons. Performing a ray/polygon intersection test with each would be very computationally expensive. Storing objects in a space-partitioning data structure (a k-d tree or BSP tree, for example) makes certain kinds of geometry queries fast. For example, in determining whether a ray intersects an object, space partitioning can reduce the number of intersection tests to just a few per primary ray, yielding logarithmic time complexity with respect to the number of polygons. 
Space partitioning is also often used in scanline algorithms to eliminate polygons outside the camera's viewing frustum, limiting the number of polygons processed by the pipeline. It is also used in collision detection: determining whether two objects are close to each other can be much faster using space partitioning. === In integrated circuit design === In integrated circuit design, an important step is the design rule check. This step ensures that the completed design is manufacturable. The check involves rules that specify widths, spacings, and other geometry patterns. A modern design can have billions of polygons that represent wires and transistors. Efficient checking relies heavily on geometry queries. For example, a rule may specify that any polygon must be at least n nanometers from any other polygon. This is converted into a geometry query by enlarging a polygon by n/2 on all sides and querying for all intersecting polygons. === In probability and statistical learning theory === The number of components in a space partition plays a central role in some results in probability theory. See Growth function for more details. === In geography and GIS === There are many studies and applications in which geographical spatial reality is partitioned by hydrological criteria, administrative criteria, mathematical criteria, or many others. In cartography and geographic information systems (GIS), it is common to identify the cells of the partition by standard codes: for example, HUC codes identifying hydrographical basins and sub-basins, ISO 3166-2 codes identifying countries and their subdivisions, or arbitrary discrete global grids (DGGs) identifying quadrants or locations. == Data structures == Common space-partitioning systems include BSP trees, quadtrees, octrees, k-d trees, and bins. == Number of components == Suppose the $n$-dimensional Euclidean space is partitioned by $r$ hyperplanes that are $(n-1)$-dimensional. What is the number of components in the partition? The largest number of components is attained when the hyperplanes are in general position, i.e., no two are parallel and no three have the same intersection. Denote this maximum number of components by $\operatorname{Comp}(n,r)$. Then the following recurrence relation holds:

    $\operatorname{Comp}(n,r) = \operatorname{Comp}(n,r-1) + \operatorname{Comp}(n-1,r-1)$
    $\operatorname{Comp}(0,r) = 1$ (when there are no dimensions, there is a single point)
    $\operatorname{Comp}(n,0) = 1$ (when there are no hyperplanes, all the space is a single component)

Its solution is:

    $\operatorname{Comp}(n,r) = \sum_{k=0}^{n} \binom{r}{k}$ if $r \geq n$
    $\operatorname{Comp}(n,r) = 2^{r}$ if $r \leq n$ (consider e.g. $r$ perpendicular hyperplanes; each additional hyperplane divides each existing component into 2)

which is upper-bounded as:

    $\operatorname{Comp}(n,r) \leq r^{n} + 1$
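The recurrence and its closed-form solution can be checked numerically. The short sketch below uses the hypothetical function names `comp_recursive` and `comp_closed`; it relies on the fact that Python's `math.comb(r, k)` returns 0 when `k > r`, so the binomial sum also reproduces the `2^r` case.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def comp_recursive(n, r):
    """Maximum number of components, computed from the recurrence."""
    if n == 0 or r == 0:  # base cases: a single point, or no hyperplanes
        return 1
    return comp_recursive(n, r - 1) + comp_recursive(n - 1, r - 1)


def comp_closed(n, r):
    """Closed form: sum of binomial coefficients C(r, k) for k = 0..n."""
    return sum(comb(r, k) for k in range(n + 1))


# Three lines in general position cut the plane into 7 regions.
print(comp_recursive(2, 3), comp_closed(2, 3))  # 7 7

# With r <= n the count is 2**r, e.g. two planes in 3D space give 4 regions.
print(comp_closed(3, 2))  # 4
```

For example, with `n = 2` and `r = 3` the bound gives `3**2 + 1 = 10`, which is indeed at least 7.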

  • Video imprint (computer vision)


    Proposed as an extension of image epitomes in the field of video content analysis, a video imprint is obtained by recasting video content into a fixed-size tensor representation regardless of video resolution or duration. Specifically, statistical characteristics are retained to some degree so that common video recognition tasks, such as event retrieval and temporal action localization, can be carried out directly on such imprints. It is claimed that spatio-temporal interdependences are accounted for and redundancies are mitigated during the computation of video imprints. Computing video imprints with the epitome model has the advantage of more flexible input feature formats and a more efficient training stage for video content analysis.

  • Automatic acquisition of lexicon


    Automatic acquisition of lexicon is a computerized process for developing a complex morphological lexicon of a language. The lexicon is essential for natural language processing (NLP) and is a prerequisite for any wide-coverage parser. The two main requirements are a raw corpus and a morphological description of the language. The aim is to provide lemmas that explain all the words occurring in the corpus. To achieve a quality lexicon, the generated lemmas must be manually validated and the whole process iterated several times. The process focuses on the open word classes (e.g. nouns, adjectives, verbs). Closed classes (e.g. prepositions, pronouns, numerals) are excluded. The method is applicable to languages with rich morphology, such as Slovak, Russian or Croatian. Applied to Slovak, an inflectional language, automatic acquisition covers both inflectional and derivational morphology. This enables users to find information about derivational relations (e.g. adjectivizations, prefixes) in the lexicon. For example, the Slovak word korpusový is an adjectivization of korpus (English: corpus). == Three-step loop == According to Benoît Sagot, the acquisition of lemmas involves three stages: generation and inflection, ranking, and manual validation. The more iterations are performed, the more accurate the resulting lexicon. Each iteration relies on the information provided by a manual validator. === Generation and inflection === First, all words belonging to the closed word classes (pronouns, prepositions, numerals) are manually excluded from the given corpus. The number of their occurrences in the corpus is provided. Automatic generation follows, in which hypothetical lemmas are created according to the morphological description of the language. 
The generated lemmas are then inflected to build all of their inflected forms. The resulting forms are associated with the corresponding lemma and a morphological tag. === Ranking === A probabilistic model, implemented as a fixed-point algorithm, ranks the hypothetical lemmas generated in the first step. The best-ranked lemmas are expected to be correct, whereas the lowest-ranked tend to be incorrect. === Manual validation === The correctness of the best-ranked lemmas from the previous step is checked by a manual validator, who should be a native speaker. At this stage, lemmas are divided into three categories: valid lemmas, which are appended to the lexicon; erroneous lemmas generated by valid forms (these forms are later associated with other lemmas); and erroneous lemmas generated by invalid forms (these need to be excluded). == Future development == Compared with purely manual development of lexicons, automatic acquisition appears promising because of the short validation time and the relatively small amount of human labor involved.
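One pass of the three-step loop can be sketched in a few lines. Everything specific here is a simplifying assumption: the two toy paradigms in `PARADIGMS` stand in for a real morphological description and are not an actual Slovak grammar, the ranking is a plain "share of attested forms" score rather than the probabilistic fixed-point algorithm mentioned above, and the `is_valid` callback stands in for the human validator.

```python
# Hypothetical morphological description: lemma ending -> endings of its forms.
PARADIGMS = {
    "noun_a": {"lemma_suffix": "a", "form_suffixes": ["a", "y", "e", "u", "ou"]},
    "noun_0": {"lemma_suffix": "", "form_suffixes": ["", "u", "om", "y", "ov"]},
}


def inflect(lemma, paradigm):
    """Build all inflected forms of a lemma under one paradigm."""
    rule = PARADIGMS[paradigm]
    stem = lemma[: len(lemma) - len(rule["lemma_suffix"])]
    return {stem + suffix for suffix in rule["form_suffixes"]}


def generate(corpus_words):
    """Step 1: hypothesize (lemma, paradigm) pairs that could explain each word."""
    hypotheses = set()
    for word in corpus_words:
        for name, rule in PARADIGMS.items():
            for suffix in rule["form_suffixes"]:
                if word.endswith(suffix) and len(word) > len(suffix):
                    stem = word[: len(word) - len(suffix)]
                    hypotheses.add((stem + rule["lemma_suffix"], name))
    return hypotheses


def rank(hypotheses, corpus_words):
    """Step 2 (simplified): score by the share of inflected forms seen in the corpus."""
    attested = set(corpus_words)
    scored = []
    for lemma, paradigm in hypotheses:
        forms = inflect(lemma, paradigm)
        scored.append((len(forms & attested) / len(forms), lemma, paradigm))
    return sorted(scored, reverse=True)


def acquire(corpus_words, is_valid, top_k=2):
    """Step 3: keep the best-ranked hypotheses that the validator accepts."""
    ranked = rank(generate(corpus_words), corpus_words)
    return [(lemma, p) for _, lemma, p in ranked[:top_k] if is_valid(lemma, p)]


corpus = ["zena", "zeny", "zene", "zenu", "korpus", "korpusu", "korpusom"]
print(acquire(corpus, is_valid=lambda lemma, paradigm: True))
```

A full system would repeat this pass, feeding the validator's decisions back into the next round of generation and ranking.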

  • GPT-4o


    GPT-4o ("o" for "omni") is a multilingual, multimodal generative pre-trained transformer developed by OpenAI and released in May 2024. It can process and generate text, images and audio. Upon release, GPT-4o was free in ChatGPT, though paid subscribers had higher usage limits. GPT-4o was removed from ChatGPT in August 2025 when GPT-5 was released, but OpenAI reintroduced it for paid subscribers after users complained about the sudden removal. GPT-4o's audio-generation capabilities are used in ChatGPT's Advanced Voice Mode. On July 18, 2024, OpenAI released GPT-4o mini, a smaller version of GPT-4o which replaced GPT-3.5 Turbo on the ChatGPT interface. The image generation model GPT Image 1, which is based on GPT-4o, replaced DALL-E 3 in ChatGPT in March 2025. OpenAI retired GPT-4o from ChatGPT on February 13, 2026. However, as of February 2026 the voice mode is still powered by GPT-4o or GPT-4o mini, depending on the usage and plan. == Background == Multiple versions of GPT-4o were originally secretly launched under different names on Arena (formerly LMArena and Chatbot Arena) as three different models. These three models were called gpt2-chatbot, im-a-good-gpt2-chatbot, and im-also-a-good-gpt2-chatbot. On 7 May 2024, OpenAI CEO Sam Altman tweeted "im-a-good-gpt2-chatbot", which was commonly interpreted as a confirmation that these were new OpenAI models being A/B tested. == Capabilities == When released in May 2024, GPT-4o achieved state-of-the-art results in voice, multilingual, and vision benchmarks, setting new records in audio speech recognition and translation. GPT-4o scored 88.7 on the Massive Multitask Language Understanding (MMLU) benchmark compared to 86.5 for GPT-4. Unlike GPT-3.5 and GPT-4, which rely on other models to process sound, GPT-4o natively supports voice-to-voice. The Advanced Voice Mode was delayed and finally released to ChatGPT Plus and Team subscribers in September 2024. On 1 October 2024, the Realtime API was introduced. 
When released, the model supported over 50 languages, which OpenAI claims cover over 97% of speakers. GPT-4o has knowledge up to October 2023 and a context length of 128k tokens. === Corporate customization === In August 2024, OpenAI introduced a new feature allowing corporate customers to customize GPT-4o using proprietary company data. This customization, known as fine-tuning, enables businesses to adapt GPT-4o to specific tasks or industries, enhancing its utility in areas like customer service and specialized knowledge domains. Previously, fine-tuning was available only on the less powerful model GPT-4o mini. The fine-tuning process requires customers to upload their data to OpenAI's servers, with the training typically taking one to two hours. OpenAI's focus with this rollout is to reduce the complexity and effort required for businesses to tailor AI solutions to their needs, potentially increasing the adoption and effectiveness of AI in corporate environments. == GPT-4o mini == On July 18, 2024, OpenAI released a smaller and cheaper version, GPT-4o mini. According to OpenAI, its low cost is expected to be particularly useful for companies, startups, and developers that seek to integrate it into their services, which often make a high number of API calls. Its API costs $0.15 per million input tokens and $0.6 per million output tokens, compared to $2.50 and $10, respectively, for GPT-4o. It is also significantly more capable and 60% cheaper than GPT-3.5 Turbo, which it replaced on the ChatGPT interface. The price after fine-tuning doubles: $0.3 per million input tokens and $1.2 per million output tokens. == Controversies == === Scarlett Johansson controversy === As released, GPT-4o offered five voices: Breeze, Cove, Ember, Juniper, and Sky. A similarity between the voice of American actress Scarlett Johansson and Sky was quickly noticed. On May 14, Entertainment Weekly asked themselves whether this likeness was on purpose. 
On May 18, Johansson's husband, Colin Jost, joked about the similarity in a segment on Saturday Night Live. On May 20, 2024, OpenAI disabled the Sky voice. Scarlett Johansson starred in the 2013 sci-fi movie Her, playing Samantha, an artificially intelligent virtual assistant personified by a female voice. As part of the promotion leading up to the release of GPT-4o, Sam Altman on May 13 tweeted a single word: "her". OpenAI stated that each voice was based on the voice work of a hired actor. According to OpenAI, "Sky's voice is not an imitation of Scarlett Johansson but belongs to a different professional actress using her own natural speaking voice." CTO Mira Murati stated "I don't know about the voice. I actually had to go and listen to Scarlett Johansson's voice." OpenAI further stated the voice talent was recruited before reaching out to Johansson. On May 21, Johansson issued a statement explaining that OpenAI had repeatedly offered to make her a deal to gain permission to use her voice as early as nine months prior to release, a deal she rejected. She said she was "shocked, angered, and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference." In the statement, Johansson also used the incident to draw attention to the lack of legal safeguards around the use of creative work to power leading AI tools, as her legal counsel demanded OpenAI detail the specifics of how the Sky voice was created. Observers noted similarities to how Johansson had previously sued and settled with The Walt Disney Company for breach of contract over the direct-to-streaming rollout of her Marvel film Black Widow, a settlement widely speculated to have netted her around $40M. 
Also on May 21, Shira Ovide at The Washington Post shared her list of "most bone-headed self-owns" by technology companies, with the decision to go ahead with a Johansson sound-alike voice despite her opposition and then denying the similarities ranking 6th. On May 24, Derek Robertson at Politico wrote about the "massive backlash", concluding that "appropriating the voice of one of the world's most famous movie stars — in reference [...] to a film that serves as a cautionary tale about over-reliance on AI — is unlikely to help shift the public back into [Sam Altman's] corner anytime soon." === Sycophancy === In April 2025, OpenAI rolled back an update of GPT-4o due to excessive sycophancy, after widespread reports that it had become flattering and agreeable to the point of supporting clearly delusional or dangerous ideas. In the United States, at least nine lawsuits have alleged that GPT-4o has encouraged teens to end their lives. The model was still described as sycophancy-prone when it was removed from ChatGPT in February 2026. === Removal with GPT-5 === On August 7, 2025, OpenAI released GPT-5. Its release was criticized as, with it, legacy GPT models were no longer available via ChatGPT, including GPT-4o, except for Pro users. Some users were particularly frustrated over this removal without prior warning because they used different GPT models for distinct purposes and found that GPT-5's router system left them with less control. In addition, some users preferred GPT-4o's warmer and more personal tone over that of GPT-5, which they described as "flat", "uncreative" and "lobotomized", and resembling an "overworked secretary". As a response, in a post on X, Sam Altman said that OpenAI would bring back the option to select GPT-4o to Plus users as well, and "[w]e [OpenAI] will watch usage as we think about how long to offer legacy models for." 
He also stated: "We for sure underestimated how much some of the things that people like in GPT-4o matter to them, even if GPT-5 performs better in most ways". "Long-term, this has reinforced that we really need good ways for different users to customize things (we understand that there isn't one model that works for everyone, and we have been investing in steerability research and launched a research preview of different personalities)". On August 13, 2025, Altman wrote on X that OpenAI is working on GPT-5's personality to make the model "feel warmer". The model was removed from ChatGPT on February 13, 2026. This caused new backlash from users that had grown attached to its personality and felt its creative writing abilities and understanding of nuance were irreplaceable. On social media, some users launched the movement "#Keep4o". A research paper highlighted the plea "Please, don’t kill the only model that still feels human". The model was removed the day before Valentine's Day, and some users had romantic relationships with GPT-4o.

  • Rabbit r1


    The Rabbit r1 is an artificial intelligence personal assistant device developed by the American technology startup Rabbit Inc and co-designed by Teenage Engineering. It was announced at the 2024 Consumer Electronics Show as a handheld device intended to perform digital tasks through voice commands, touch interaction, and web-based AI agents. The r1 was marketed around Rabbit's concept of a "large action model" (LAM), which the company described as software able to operate websites and services on behalf of users. The device runs rabbitOS, an operating system based on the Android Open Source Project. Its services have included AI search, image recognition, voice interaction, music playback, rideshare and food-ordering integrations, and later experimental web-agent features such as LAM Playground and teach mode. Initial reviews were largely negative, with reviewers criticizing the device's limited functionality, bugs, and unclear advantages over a smartphone. Critics also questioned Rabbit's claims after the r1 software was shown to run on an Android phone. Rabbit continued to issue software updates after launch, including rabbitOS 2 in September 2025, which introduced a redesigned card-based interface, gesture navigation, and a "creations" feature for generating small software tools and experiences on the device. Rabbit Inc was founded by Jesse Lyu Cheng. == Hardware == Display: A 2.88-inch touchscreen for interactive user input. Input: push-to-talk button to activate voice commands; scroll wheel; Gyroscope; Magnetometer; Accelerometer; GPS. Camera: 8 MP single camera, with a resolution of 3264x2448, allowing for the connected external AI to use computer vision. Audio: Equipped with a speaker and dual microphones for audio interaction. Connectivity: Supports Wi-Fi and cellular connections via a SIM card slot to access internet services. Processor: Runs on a 2.3GHz MediaTek Helio P35 processor. Memory: Contains 4GB of RAM for operational tasks. 
Storage: Offers 128GB of internal storage for data. Ports: Utilizes a USB-C port for charging and data connections. == Software == The Rabbit r1 runs rabbitOS, which is based on the Android Open Source Project (AOSP), specifically Android 13. Rabbit founder Jesse Lyu described rabbitOS as a "very bespoke AOSP" after reports that the r1's software could be run on a conventional Android phone. Rabbit described the r1 as using a large action model (LAM), a type of AI agent intended to perform tasks across software interfaces rather than only answer questions. At launch, the device supported a limited set of services, including AI search, vision features, music playback, and some third-party integrations. Perplexity.ai was one of the AI services used to answer user queries. In 2024, Rabbit released several software updates that added features and attempted to address early criticism of the device. In July 2024, the company launched "beta rabbit", an advanced search and conversation mode for more complex queries. In October 2024, it released LAM Playground, a web-based agent feature intended to let the r1 operate websites on behalf of users. Reviewers found the feature experimental; Android Authority reported that it could perform some navigation tasks but struggled with CAPTCHAs, loops, and unintended behavior. In November 2024, Rabbit introduced a beta "teach mode", which allowed users to demonstrate web-based tasks in the Rabbithole web portal and later ask the r1 to repeat them. The company described teach mode as experimental, and The Verge noted that Rabbit warned users that results could be unpredictable and that CAPTCHA-protected sites could cause problems. Rabbit released rabbitOS 2 in September 2025. The update redesigned the interface around a card-based layout, added additional touchscreen gestures, and introduced "creations", a feature that lets users generate simple software tools, games, and interfaces through natural-language prompts. 
Coverage of the update described it as a major software overhaul rather than new hardware. == Reception == === Funding === Rabbit raised $20 million in funding from Khosla Ventures, Synergis Capital and Kakao Investment in October 2023. The company announced an additional $10 million in funding in December 2023. === Sales === Following its announcement at the 2024 Consumer Electronics Show, 130,000 units were sold. On August 13, 2024, Rabbit announced that sales of r1 had expanded to the entire European Union (except Malta) and United Kingdom. On August 21, 2024, sales of r1 expanded to Singapore. === Reviews === The r1 was met with strong criticism immediately after Rabbit began shipping the device. Some reviews questioned what the device was able to do that a smartphone could not, while comparing it to the similar Humane Ai Pin. YouTuber Marques Brownlee called the device "barely reviewable". Android Authority's Mishaal Rahman managed to install Rabbit r1's software on a Pixel 6a smartphone, after a tipster shared an APK file. The Verge echoed the claims made by Rahman. In response, Lyu published statements confirming its use of Android, but denying that the r1 is an Android app. Mashable called its Vision features impressive, but said that "these praise-worthy features are overshadowed by buggy performance". Ars Technica wrote a blog post claiming "the company is blocking access from bootleg APKs". TechCrunch gave a slightly more positive review, calling the device a "fun peep at a possible future", but could not "advise anyone to buy one now." Shortly after the launch of r1, Rabbit began a weekly cadence of software updates to address much of the criticism from the early reviews, including "battery and GPS performance, time zone selection, and more". Digital Trends said the Magic Camera feature "takes the most mundane, ordinary, and badly composed photos and makes something fun and eye-catching from them." 
Mashable said the "beta rabbit" feature "makes Rabbit R1 more conversational and intelligent". Later coverage noted that Rabbit continued to update the r1 after its poorly received launch. The Verge reported in September 2024 that about 5,000 of roughly 100,000 purchasers were using the device at any given moment, citing Lyu, and described the product as having launched before it was ready. In 2025, coverage of rabbitOS 2 described the update as an attempt to reset the device's software experience after the criticism of its original release. == Controversies == === GAMA project === Rabbit Inc has garnered attention due to allegations surrounding its funding and the company's past projects. The company came under scrutiny when Stephen Findeisen, known as Coffeezilla on YouTube, published a video in May 2024, alleging that Rabbit Incorporation was "built on a scam". Rabbit Incorporation, initially named Cyber Manufacturing Co, rebranded just two months before launching the Rabbit R1. The company, under its former name, raised $6 million in November 2021 for a project called GAMA, described as a "Next Generation NFT Project." Jesse Lyu, the CEO of Rabbit Incorporation, referred to GAMA as a "fun little project." Coffeezilla, who investigates influencer scams, highlighted old Clubhouse recordings of Jesse Lyu discussing the GAMA project. In these recordings, Lyu emphasized the substantial funding behind GAMA and its potential to be a revolutionary, carbon-negative cryptocurrency. Coffeezilla questioned the whereabouts of the funds raised for GAMA, estimating that approximately $1 million in refunds to investors remained unresolved. He suggested that the rebranding to Rabbit Incorporation and the shift to developing the Rabbit R1 were attempts to divert from the GAMA project's issues. In response to Coffeezilla's inquiries, Rabbit Incorporation stated that the $6 million raised was used for the GAMA project. 
The company said that NFTs cannot be refunded unless the owner agrees to "burn" them on the blockchain. Rabbit Incorporation also said that the GAMA project was open-sourced and returned to the community, aligning with community feedback. It also said that efforts to buy back NFTs were made to counteract malicious trading and maintain market stability. === Security === In June 2024, Engadget reported that the Rabbitude team, a community reverse engineering project, had gained access to the r1's codebase, revealing that the r1's software contained several hardcoded API keys for ElevenLabs, Microsoft Azure, Yelp, and Google Maps, potentially allowing unauthorized access to r1 responses, including those containing the users' personal information. Rabbit immediately began revoking and rotating those secrets and confirmed that the code was leaked by an employee who had "been terminated and remains under investigation". In July 2024, the company revealed that all user chats and device pairing data were logged on the r1 with no ability to delete them. This meant that lost or stolen devices could be used to extract user

    Read more →
  • Braina

    Braina

    Braina is a virtual assistant and speech-to-text dictation application for Microsoft Windows developed by Brainasoft. Braina uses a natural language interface, speech synthesis, and speech recognition technology to interact with its users and allows them to use natural language sentences to perform various tasks on a computer. The name Braina is a short form of "Brain Artificial". Braina is marketed as a Microsoft Copilot alternative. It provides a voice interface for several locally run and cloud large language models, including the latest LLMs from providers such as OpenAI, Anthropic, Google, xAI, Meta, and Mistral, while improving data privacy. Braina also allows responses from its in-house large language models like Braina Swift and Braina Pinnacle. It has an "Artificial Brain" feature that provides persistent memory support for supported LLMs. == Features == Braina is able to carry out various tasks on a computer, including automation. Braina can take typed or dictated commands to store reminders, find information online, perform mathematical operations, open files, generate images from text, transcribe speech, and control open windows or programs. Braina adapts to user behavior over time with a goal of better anticipating needs. === Speech-to-text dictation === Braina Pro can type spoken words into an active window at the location of a user's cursor. Its speech recognition technology supports more than 100 languages and dialects and is able to isolate the recognition of a user's voice from disturbing environmental factors such as background noise, other human voices, or external devices. Braina can also be taught to dictate uncommon legal, medical, and scientific terms. Users can also teach Braina uncommon names and vocabulary. Users can edit or correct dictated text without using a keyboard or mouse by giving built-in voice commands. === Text-to-speech === Braina can read aloud selected texts, such as e-books. 
=== Custom commands and automation === Braina can automate computer tasks. It lets users create custom voice commands to perform tasks such as opening files, programs, websites, or emails, as well as executing keyboard or mouse macros. === Transcription === Braina can transcribe media file formats such as WAV, MP3, and MP4 into text. === Notes and reminders === Braina can store and recall notes and reminders. These can include scheduled or unscheduled commands, checklist items, alarms, chat conversations, memos, website snippets, bookmarks, contacts. === Image and Video generation === Braina can generate AI images and videos from text and image inputs using generative cloud AI models. These include Black Forest Labs' FLUX.2, Google's Veo, Imagen, and Nano Banana Pro, Kuaishou's Kling, Alibaba's Wan, ByteDance's Seedance and Seedream, MiniMax's Hailuo, OpenAI's GPT Image, and Tongyi Lab's Z Image Turbo. == Platforms == In addition to the desktop version for Windows operating systems, Braina is also available for the iOS and Android operating systems. The mobile version of Braina has a feature allowing remote management of a Windows PC connected via Wi-Fi. == Distributions == Braina is distributed in multiple modes. These include Braina Lite, a freeware version with limitations, and premium versions Braina Pro, Pro Plus, and Pro Ultra. Some additional features in the Pro version include dictation, custom vocabulary, video transcription, automation, custom voice commands, and persistent LLM memory. == Reception == TechRadar has consistently listed Braina as one of the best dictation and virtual assistant apps between 2015 and 2024.

    Read more →
  • Auralization

    Auralization

    Auralization is a procedure designed to model and simulate the experience of acoustic phenomena rendered as a soundfield in a virtualized space. This is useful in configuring the soundscape of architectural structures, concert venues, and public spaces, as well as in making coherent sound environments within virtual immersion systems. == History == The English term auralization was used for the first time by Kleiner et al. in an article in the journal of the AES in 1991. The increase of computational power allowed the development of the first acoustic simulation software towards the end of the 1960s. == Principles == Auralizations are experienced through systems rendering virtual acoustic models made by convolving or mixing acoustic events recorded 'dry' (or in an anechoic chamber) projected within a virtual model of an acoustic space, the characteristics of which are determined by means of sampling its impulse response (IR). Once this impulse response $h(t)$ has been determined, the soundfield in the target environment is simulated by convolving it with the dry source signal $s(t)$:

$$r(t) = h(t) * s(t)$$

The resulting sound $r(t)$ is heard as it would be if emitted in that acoustic space. == Binaurality == For auralizations to be perceived as realistic, it is critical to emulate human hearing in terms of the position and orientation of the listener's head with respect to the sources of sound. For IR data to be convolved convincingly, the acoustic events are captured using a dummy head where two microphones are positioned, one on each side of the head, to record an emulation of sound arriving at the locations of human ears, or using an ambisonics microphone array and mixed down for binaurality. 
Head-related transfer functions (HRTF) datasets can be used to simplify the process insofar as a monaural IR can be measured or simulated, then audio content is convolved with its target acoustic space. In rendering the experience, the transfer function corresponding to the orientation of the head is applied to simulate the corresponding spatial emanation of sound.
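The convolution described under Principles can be sketched in discrete time as follows. This is a minimal illustration rather than a production renderer: the function name `convolve` and the toy `dry` and `room_ir` sequences are assumptions made up for the example, and real auralization software would typically use FFT-based convolution on measured impulse responses.

```python
def convolve(dry, impulse_response):
    """Full discrete convolution: r[n] = sum over k of h[k] * s[n - k]."""
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for n, s in enumerate(dry):
        for k, h in enumerate(impulse_response):
            out[n + k] += s * h
    return out


# Hypothetical toy data: a click followed by a quieter click (the dry signal),
# and a "room" with a direct path plus one echo arriving two samples later.
dry = [1.0, 0.0, 0.5]
room_ir = [1.0, 0.0, 0.25]

wet = convolve(dry, room_ir)
print(wet)
```

For a binaural rendering, the same dry signal would be convolved once with a left-ear response and once with a right-ear response, giving one output channel per ear.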

    Read more →
  • Image moment

    Image moment

    In image processing, computer vision and related fields, an image moment is a particular weighted average (moment) of the image pixels' intensities, or a function of such moments, usually chosen to have some attractive property or interpretation. Image moments are useful to describe objects after segmentation. Simple properties of the image which are found via image moments include area (or total intensity), its centroid, and information about its orientation. == Raw moments == For a 2D continuous function $f(x,y)$ the moment (sometimes called "raw moment") of order $(p+q)$ is defined as

$$M_{pq}=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}x^{p}y^{q}f(x,y)\,dx\,dy$$

for $p,q=0,1,2,\ldots$ Adapting this to a scalar (grayscale) image with pixel intensities $I(x,y)$, raw image moments $M_{ij}$ are calculated by

$$M_{ij}=\sum_{x}\sum_{y}x^{i}y^{j}I(x,y)$$

In some cases, this may be calculated by considering the image as a probability density function, i.e., by dividing the above by

$$\sum_{x}\sum_{y}I(x,y)$$

A uniqueness theorem states that if $f(x,y)$ is piecewise continuous and has nonzero values only in a finite part of the $xy$ plane, moments of all orders exist, and the moment sequence $(M_{pq})$ is uniquely determined by $f(x,y)$. Conversely, $(M_{pq})$ uniquely determines $f(x,y)$. In practice, the image is summarized with functions of a few lower order moments. 
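As a small sketch of the discrete sum for raw moments, the following computes $M_{ij}$ for an image stored as a list of rows. The helper names `raw_moment` and `centroid`, the `image[y][x]` indexing convention, and the 4x4 test image are all assumptions chosen for illustration; the centroid is taken as $(M_{10}/M_{00},\ M_{01}/M_{00})$.

```python
def raw_moment(image, i, j):
    """M_ij = sum_x sum_y x**i * y**j * I(x, y), with image[y][x] as I(x, y)."""
    return sum(
        (x ** i) * (y ** j) * value
        for y, row in enumerate(image)
        for x, value in enumerate(row)
    )


def centroid(image):
    """Centroid (x_bar, y_bar) = (M10 / M00, M01 / M00)."""
    m00 = raw_moment(image, 0, 0)
    return raw_moment(image, 1, 0) / m00, raw_moment(image, 0, 1) / m00


# Hypothetical 4x4 binary image containing a 2x2 block of ones.
image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

area = raw_moment(image, 0, 0)   # number of foreground pixels
cx, cy = centroid(image)         # centre of the 2x2 block
print(area, cx, cy)
```

For a binary image $M_{00}$ is simply the pixel count, so the sketch reports an area of 4 and a centroid midway between the two occupied rows and columns.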
=== Examples === Simple image properties derived via raw moments include:

Area (for binary images) or sum of grey level (for greytone images): $M_{00}$

Centroid: $\{\bar{x},\ \bar{y}\}=\left\{\frac{M_{10}}{M_{00}},\frac{M_{01}}{M_{00}}\right\}$

== Central moments == Central moments are defined as

$$\mu_{pq}=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(x-\bar{x})^{p}(y-\bar{y})^{q}f(x,y)\,dx\,dy$$

where $\bar{x}=\frac{M_{10}}{M_{00}}$ and $\bar{y}=\frac{M_{01}}{M_{00}}$ are the components of the centroid. If $f(x,y)$ is a digital image, then the previous equation becomes

$$\mu_{pq}=\sum_{x}\sum_{y}(x-\bar{x})^{p}(y-\bar{y})^{q}f(x,y)$$

The central moments of order up to 3 are:

$$\begin{aligned}\mu_{00}&=M_{00},&\mu_{01}&=0,\\\mu_{10}&=0,&\mu_{11}&=M_{11}-\bar{x}M_{01}=M_{11}-\bar{y}M_{10},\\\mu_{20}&=M_{20}-\bar{x}M_{10},&\mu_{02}&=M_{02}-\bar{y}M_{01},\\\mu_{21}&=M_{21}-2\bar{x}M_{11}-\bar{y}M_{20}+2\bar{x}^{2}M_{01},&\mu_{12}&=M_{12}-2\bar{y}M_{11}-\bar{x}M_{02}+2\bar{y}^{2}M_{10},\\\mu_{30}&=M_{30}-3\bar{x}M_{20}+2\bar{x}^{2}M_{10},&\mu_{03}&=M_{03}-3\bar{y}M_{02}+2\bar{y}^{2}M_{01}.\end{aligned}$$

It can be shown that:

$$\mu_{pq}=\sum_{m=0}^{p}\sum_{n=0}^{q}\binom{p}{m}\binom{q}{n}(-\bar{x})^{(p-m)}(-\bar{y})^{(q-n)}M_{mn}$$

Central moments are translation invariant. === Examples === Information about image orientation can be derived by first using the second order central moments to construct a covariance matrix.

$$\begin{aligned}\mu'_{20}&=\mu_{20}/\mu_{00}=M_{20}/M_{00}-\bar{x}^{2}\\\mu'_{02}&=\mu_{02}/\mu_{00}=M_{02}/M_{00}-\bar{y}^{2}\\\mu'_{11}&=\mu_{11}/\mu_{00}=M_{11}/M_{00}-\bar{x}\bar{y}\end{aligned}$$

The covariance matrix of the image $I(x,y)$ is now

$$\operatorname{cov}[I(x,y)]=\begin{bmatrix}\mu'_{20}&\mu'_{11}\\\mu'_{11}&\mu'_{02}\end{bmatrix}.$$

The eigenvectors of this matrix correspond to the major and minor axes of the image intensity, so the orientation can thus be extracted from the angle of the eigenvector associated with the largest eigenvalue towards the axis closest to this eigenvector. 
It can be shown that this angle $\Theta$ is given by the following formula:

$$\Theta=\frac{1}{2}\arctan\left(\frac{2\mu'_{11}}{\mu'_{20}-\mu'_{02}}\right)$$

The above formula holds as long as $\mu'_{20}-\mu'_{02}\neq 0$. The eigenvalues of the covariance matrix can easily be shown to be

$$\lambda_{i}=\frac{\mu'_{20}+\mu'_{02}}{2}\pm\frac{\sqrt{4{\mu'}_{11}^{2}+({\mu'}_{20}-{\mu'}_{02})^{2}}}{2},$$

and are proportional to the squared length of the eigenvector axes. The relative difference in magnitude of the eigenvalues is thus an indication of the eccentricity of the image, or how elongated it is. The eccentricity is

$$\sqrt{1-\frac{\lambda_{2}}{\lambda_{1}}}.$$

== Moment invariants == Moments are well known for their application in image analysis, since they can be used to derive invariants with respect to specific transformation classes. The term invariant moments is often abused in this context. However, while moment invariants are invariants that are formed from moments, the only moments that are invariants themselves are the central moments. Note that the invariants detailed below are exactly invariant only in the continuous domain. In a discrete domain, neither scaling nor rotation is well defined: a discrete image transformed in such a way is generally an approximation, and the transformation is not reversible. These invariants therefore are only approximately invariant when describing a shape in a discrete image. === Translation invariants === The central moments $\mu_{ij}$ of any order are, by construction, invariant with respect to translations. 
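The central-moment, orientation, and eccentricity formulas can be tried out with a short sketch. The function names and test images below are assumptions invented for the example, with `image[y][x]` holding the intensity. One deliberate deviation is worth flagging: the sketch uses `math.atan2` instead of the plain arctangent, so that the case $\mu'_{20}=\mu'_{02}$, which the formula excludes, still yields an angle.

```python
import math


def central_moment(image, p, q):
    """mu_pq = sum_x sum_y (x - x_bar)**p * (y - y_bar)**q * I(x, y)."""
    m00 = sum(v for row in image for v in row)
    x_bar = sum(x * v for row in image for x, v in enumerate(row)) / m00
    y_bar = sum(y * v for y, row in enumerate(image) for v in row) / m00
    return sum(
        (x - x_bar) ** p * (y - y_bar) ** q * v
        for y, row in enumerate(image)
        for x, v in enumerate(row)
    )


def orientation_and_eccentricity(image):
    """Angle of the major axis (radians, image coordinates) and eccentricity."""
    mu00 = central_moment(image, 0, 0)
    mu20 = central_moment(image, 2, 0) / mu00   # mu'_20
    mu11 = central_moment(image, 1, 1) / mu00   # mu'_11
    mu02 = central_moment(image, 0, 2) / mu00   # mu'_02
    # Assumption: atan2 replaces arctan so that mu'_20 == mu'_02 is handled.
    theta = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
    root = math.sqrt(4 * mu11 ** 2 + (mu20 - mu02) ** 2)
    lam1 = (mu20 + mu02 + root) / 2
    lam2 = (mu20 + mu02 - root) / 2
    # Note: lam1 is zero for a single-pixel image; this sketch does not guard it.
    return theta, math.sqrt(1 - lam2 / lam1)


# Hypothetical test images: a horizontal bar, the same bar shifted, a diagonal.
bar = [[0, 0, 0, 0, 0, 0], [0, 1, 1, 1, 1, 0], [0, 0, 0, 0, 0, 0]]
shifted_bar = [[0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 1, 1, 1, 1]]
diagonal = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

theta_bar, ecc_bar = orientation_and_eccentricity(bar)
theta_diag, ecc_diag = orientation_and_eccentricity(diagonal)
print(theta_bar, ecc_bar, theta_diag, ecc_diag)
print(central_moment(bar, 2, 0), central_moment(shifted_bar, 2, 0))
```

The last line illustrates translation invariance: shifting the bar changes its raw moments but leaves $\mu_{20}$ unchanged.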
=== Scale invariants === Invariants $\eta_{ij}$ with respect to both translation and scale can be constructed from central moments by dividing through a properly scaled zero-th central moment:

$$\eta_{ij}=\frac{\mu_{ij}}{\mu_{00}^{\left(1+\frac{i+j}{2}\right)}}$$

where $i+j\geq 2$. Note that translational invariance directly follows by only using central moments. === Rotation invariants === As shown in the work of Hu, invariants with respect to translation, scale, and rotation can be constructed:

$$\begin{aligned}I_{1}&=\eta_{20}+\eta_{02}\\I_{2}&=(\eta_{20}-\eta_{02})^{2}+4\eta_{11}^{2}\\I_{3}&=(\eta_{30}-3\eta_{12})^{2}+(3\eta_{21}-\eta_{03})^{2}\\I_{4}&=(\eta_{30}+\eta_{12})^{2}+(\eta_{21}+\eta_{03})^{2}\\I_{5}&=(\eta_{30}-3\eta_{12})(\eta_{30}+\eta_{12})[(\eta_{30}+\eta_{12})^{2}-3(\eta_{21}+\eta_{03})^{2}]+(3\eta_{21}-\eta_{03})(\eta_{21}+\eta_{03})[3(\eta_{30}+\eta_{12})^{2}-(\eta_{21}+\eta_{03})^{2}]\\I_{6}&=(\eta_{20}-\eta_{02})[(\eta_{30}+\eta_{12})^{2}-(\eta_{21}+\eta_{03})^{2}]+4\eta_{11}(\eta_{30}+\eta_{12})(\eta_{21}+\eta_{03})\\I_{7}&=(3\eta_{21}-\eta_{03})(\eta_{30}+\eta_{12})[(\eta_{30}+\eta_{12})^{2}-3(\eta_{21}+\eta_{03})^{2}]-(\eta_{30}-3\eta_{12})(\eta_{21}+\eta_{03})[3(\eta_{30}+\eta_{12})^{2}-(\eta_{21}+\eta_{03})^{2}].\end{aligned}$$

These are well known as Hu moment invariants. The first one, $I_{1}$, is analogous to the moment of inertia around the image's centroid, where the pixels' intensities are analogous to physical density. The first six, $I_{1}$ ... $I_{6}$, are reflection symmetric, i.e. they are unchanged if the image is changed to a mirror image. The last one, $I_{7}$, is reflection antisymmetric (changes sign under reflection), which enables it to distinguish mirror images of otherwise identical im

    Read more →
  • Color histogram

    Color histogram

    In image processing and photography, a color histogram is a representation of the distribution of colors in an image. For digital images, a color histogram represents the number of pixels that have colors in each of a fixed list of color ranges that span the image's color space (the set of all possible colors). A color histogram can be built for any kind of color space, although the term is more often used for three-dimensional spaces such as RGB or HSV. For monochromatic images, the term intensity histogram may be used instead. For multi-spectral images, where each pixel is represented by an arbitrary number of measurements (for example, beyond the three measurements in RGB), a color histogram is N-dimensional, with N being the number of measurements taken. Each measurement has its own wavelength range of the light spectrum, some of which may be outside the visible spectrum. If the set of possible color values is sufficiently small, each of those colors may be placed on a range by itself; then the histogram is merely the count of pixels that have each possible color. Most often, the space is divided into an appropriate number of ranges, often arranged as a regular grid, each containing many similar color values. A color histogram may also be represented and displayed as a smooth function defined over the color space that approximates the pixel counts. Like other kinds of histograms, a color histogram is a statistic that can be viewed as an approximation of an underlying continuous distribution of color values. == Overview == Color histograms are flexible constructs that can be built from images in various color spaces, whether RGB, rg chromaticity or any other color space of any dimension. A histogram of an image is produced first by discretization of the colors in the image into a number of bins, and counting the number of image pixels in each bin. 
For example, a red–blue chromaticity histogram can be formed by first normalizing color pixel values by dividing RGB values by R+G+B, then quantizing the normalized R and B coordinates into N bins each. A two-dimensional histogram of red–blue chromaticity with four bins per axis (N=4) yields a 4×4 table of pixel counts. A histogram can be N-dimensional. Although harder to display, a three-dimensional color histogram for the above example could be thought of as four separate red–blue histograms, where each of the four histograms contains the red–blue values for a bin of green (0–63, 64–127, 128–191, and 192–255). The histogram provides a compact summarization of the distribution of data in an image. A color histogram of an image is relatively invariant with translation and rotation about the viewing axis, and varies only slowly with the angle of view. Because the histogram signatures of two images can be compared to match the color content of one image with the other, a color histogram is particularly well suited to the problem of recognizing an object of unknown position and rotation within a scene. Importantly, conversion of an RGB image into the illumination-invariant rg-chromaticity space allows the histogram to operate well in varying light levels. 1. What is a histogram? A histogram is a graphical representation of the number of pixels in an image. Put simply, a histogram is a bar graph whose X-axis represents the tonal scale (black at the left and white at the right) and whose Y-axis represents the number of pixels in the image at each point of the tonal scale. For example, the graph of a luminance histogram shows the number of pixels for each brightness level (from black to white); the more pixels there are at a given luminance level, the higher the peak at that level. 2. What is a color histogram? A color histogram of an image represents the distribution of the composition of colors in the image. 
It shows the different types of colors that appear and the number of pixels of each type. The relation between a color histogram and a luminance histogram is that a color histogram can also be expressed as "three luminance histograms", each of which shows the brightness distribution of one of the red, green, and blue color channels. == Characteristics of a color histogram == A color histogram focuses only on the proportion of the number of different types of colors, regardless of the spatial location of the colors. The values of a color histogram are statistics: they show the statistical distribution of colors and the essential tone of an image. In general, as the color distributions of the foreground and background in an image are different, there might be a bimodal distribution in the histogram. For the luminance histogram alone, there is no perfect histogram. In general, the histogram can indicate whether an image is overexposed, but there are times when an image appears overexposed on the histogram when in reality it is not. == Principles of the formation of a color histogram == The formation of a color histogram is rather simple. From the definition above, we can simply count the number of pixels at each of the 256 levels in each of the 3 RGB channels, and plot them on 3 individual bar graphs. In general, a color histogram is based on a certain color space, such as RGB or HSV. When we count the pixels of different colors in an image, if the color space is large, then we can first divide the color space into a certain number of small intervals. Each of the intervals is called a bin. This process is called color quantization. Then, by counting the number of pixels in each of the bins, we get a color histogram of the image. The concrete steps of the principles can be viewed in Example 1. 
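The counting and quantization steps just described can be sketched as follows. The function names, the four-bin default, and the four sample pixels are assumptions made up for illustration; pixels are taken to be 8-bit `(R, G, B)` tuples. The first function builds the three per-channel histograms, the second a joint three-dimensional histogram keyed by bin triple.

```python
def channel_histograms(pixels, bins=4, levels=256):
    """Return counts[c][b]: number of pixels whose channel c falls in bin b."""
    width = levels // bins
    counts = [[0] * bins for _ in range(3)]
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            counts[channel][min(value // width, bins - 1)] += 1
    return counts


def joint_histogram(pixels, bins=4, levels=256):
    """Return a dict mapping (r_bin, g_bin, b_bin) to a pixel count."""
    width = levels // bins
    hist = {}
    for pixel in pixels:
        key = tuple(min(value // width, bins - 1) for value in pixel)
        hist[key] = hist.get(key, 0) + 1
    return hist


# Hypothetical four-pixel image: two reds, one blue, one mid grey.
pixels = [(255, 0, 0), (250, 10, 5), (0, 0, 255), (130, 130, 130)]

print(channel_histograms(pixels))
print(joint_histogram(pixels))
```

With four bins of width 64, the two red pixels fall into the same joint bin even though their exact values differ, which is the effect of color quantization.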
== Examples == === Example 1 === Given the following image of a cat (an original version and a version that has been reduced to 256 colors for easy histogram purposes), the following data represents a color histogram in the RGB color space, using four bins. Bin 0 corresponds to intensities 0–63, bin 1 to 64–127, bin 2 to 128–191, and bin 3 to 192–255. === Example 2 === Application in cameras: some cameras can show the 3 color histograms when we take photos. We can examine clipping (spikes on either the black or white side of the scale) in each of the 3 RGB color histograms. If one or more of the 3 RGB channels is clipped, detail in that color is lost. To illustrate this, consider this example: We know that each of the three R, G, B channels has a range of values from 0 to 255 (8 bit). So consider a photo that has a luminance range of 0–255. Assume the photo we take is made of 4 blocks that are adjacent to each other, and we set the luminance of the 4 blocks of the original photo to be 10, 100, 205, 245. Thus, the image looks like the topmost figure on the right. Then, we overexpose the photo a little, say, by increasing the luminance of each block by 10. Thus, the luminance of the 4 blocks of the new photo is 20, 110, 215, 255. Then, the image looks like the second figure on the right. There is not much difference between the two figures; all we can see is that the whole image becomes brighter (the contrast between the blocks remains the same). Now, we overexpose the original photo again, this time increasing the luminance of each block by 50. Thus, the luminance of the 4 blocks of the new photo is 60, 150, 255, 255. The new image now looks like the third figure on the right. Note that the value for the last block is 255 instead of 295, because 255 is the top of the scale, and thus the last block has clipped. 
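The arithmetic of this example can be reproduced with a few lines. The helper name `expose` is an assumption for illustration; it simply adds a constant to each block and clamps the result at the top of the 8-bit scale.

```python
def expose(levels, gain, top=255):
    """Brighten each luminance value by `gain`, clamping at `top`."""
    return [min(value + gain, top) for value in levels]


blocks = [10, 100, 205, 245]      # luminance of the 4 blocks in the example

slightly = expose(blocks, 10)     # nothing exceeds 255, so contrast is kept
heavily = expose(blocks, 50)      # the last two blocks both clamp to 255

print(slightly)
print(heavily)
print(slightly[3] - slightly[2], heavily[3] - heavily[2])
```

The final line prints the difference between the last two blocks in each case, showing that their separation of 40 levels survives the small adjustment and collapses to 0 after the large one.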
When this happens, we lose the contrast between the last 2 blocks, and thus we cannot recover the image no matter how we adjust it. To conclude, when taking photos with a camera that displays histograms, always keep the brightest tone in the image below the largest value, 255, on the histogram in order to avoid losing details. == Drawbacks and other approaches == The main drawback of histograms for classification is that the representation is dependent on the color of the object being studied, ignoring its shape and texture. Color histograms can potentially be identical for two images with different object content that happen to share color information. Conversely, without spatial or shape information, similar objects of different color may be indistinguishable based solely on color histogram comparisons. There is no way to distinguish a red and white cup from a red and white plate. Put another way: histogram-based algorithms have no concept of a generic 'cup', and a model of a red and white cup is no use when given an otherwise identical blue and white cup. Another problem is that color histograms have high sensitivity to noisy interference such as lighting intensity changes and quantization errors. The high dimensionality (number of bins) of color histograms is another issue. Some color histogram feature spaces often occupy more than one hundred di

    Read more →
  • ChatGPT

    ChatGPT

    ChatGPT is a generative artificial intelligence chatbot developed by OpenAI. Originally released in November 2022, the product uses large language models—specifically generative pre-trained transformers (GPTs)—to generate text, speech, and images in response to user prompts. ChatGPT accelerated the AI boom, an ongoing period marked by rapid investment and public attention toward the field of artificial intelligence (AI). OpenAI operates the service on a freemium model. Users can interact with ChatGPT through text, audio, and image prompts. ChatGPT was quickly adopted, reaching 100 million monthly active users two months after its release and 900 million weekly active users in February 2026. It has been lauded for its potential to transform numerous professional fields, and has instigated public debate about the nature of creativity and the future of knowledge work. The chatbot has also been criticized for its limitations and potential for unethical use. It can generate plausible-sounding but incorrect or nonsensical answers, known as hallucinations. Biases in its training data have been reflected in its responses. The chatbot can facilitate academic dishonesty, generate misinformation, and create malicious code. The ethics of its development, particularly the use of copyrighted content as training data, have also drawn controversy. == Features == ChatGPT is a chatbot and AI assistant built on large language model (LLM) technology. It is designed to generate human-like text and can carry out a wide variety of tasks. These include, among many others, writing and debugging computer programs, composing music, scripts, fairy tales, and essays, answering questions (sometimes at a level exceeding that of an average human test-taker), and generating business concepts. 
ChatGPT is frequently used for translation and summarization tasks, and can simulate interactive environments such as a Linux terminal, a multi-user chat room, or simple text-based games such as tic-tac-toe. Users interact with ChatGPT through conversations which consist of text, audio, and image inputs and outputs. The user's inputs to these conversations are referred to as prompts. An optional "Memory" feature allows users to tell ChatGPT to memorize specific information. Another option allows ChatGPT to recall old conversations. GPT-based moderation classifiers are used to reduce the risk of harmful outputs being presented to users. In March 2023, OpenAI added support for plugins for ChatGPT. This includes both plugins made by OpenAI, such as web browsing and code interpretation, and external plugins from developers such as Expedia, OpenTable, and Zapier. From October to December 2024, ChatGPT Search was deployed. It allows ChatGPT to search the web in an attempt to make more accurate and up-to-date responses. It increased OpenAI's direct competition with major search engines. OpenAI allows businesses to tailor how their content appears in the ChatGPT Search results and influence what sources are used. In December 2024, OpenAI launched a new feature allowing users to call ChatGPT with a telephone for up to 15 minutes per month for free. In September 2025, OpenAI added a feature called Pulse, which generates a daily analysis of a user's chats and connected apps such as Gmail and Google Calendar. In October 2025, OpenAI launched ChatGPT Atlas, a browser integrating the ChatGPT assistant directly into web navigation, to compete with existing browsers such as Google Chrome. It has an additional feature called "agentic mode" that allows it to take online actions for the user. === Paid tier === ChatGPT was initially free to the public and remains free in a limited capacity. In February 2023, OpenAI launched a premium service, ChatGPT Plus, that costs US$20 per month. 
What the paid plan offers relative to the free tier has changed as OpenAI has continued to update ChatGPT, and a Pro tier at US$200 per month was introduced in December 2024. The Pro launch coincided with the release of the o1 model. In August 2025, ChatGPT Go was offered in India for ₹399 per month. The plan has higher limits than the free version. === Mobile apps === Between May and July 2023, OpenAI began offering ChatGPT iOS and Android apps. ChatGPT can also power Android's assistant. An app for Windows launched on the Microsoft Store on October 15, 2024. === Languages === OpenAI met Icelandic President Guðni Th. Jóhannesson in 2022. In 2023, OpenAI worked with a team of 40 Icelandic volunteers to fine-tune ChatGPT's Icelandic conversation skills as a part of Iceland's attempts to preserve the Icelandic language. In 2023, ChatGPT (based on GPT-4) translated Japanese to English better than Bing, Bard, and DeepL Translator. In December 2023, the Albanian government decided to use ChatGPT for the rapid translation of European Union documents and for analysis of the changes needed for Albania's accession to the EU. Several studies have shown that ChatGPT can outperform Google Translate in some mainstream translation tasks. However, as of 2024, no machine translation services match human expert performance. In August 2024, a representative of the Asia Pacific wing of OpenAI visited Taiwan and gave a demonstration of ChatGPT's Chinese abilities. ChatGPT's Mandarin Chinese abilities were lauded, but the ability of the AI to produce content in Mandarin Chinese in a Taiwanese accent was found to be "less than ideal" due to differences between mainland Mandarin Chinese and Taiwanese Mandarin. === GPT Store === In November 2023, OpenAI released GPT Builder, a tool allowing users to customize ChatGPT's behavior for a specific use case. The customized systems are referred to as GPTs. 
In January 2024, OpenAI launched the GPT Store, a marketplace for GPTs. At launch, OpenAI included more than 3 million GPTs created by GPT Builder users in the GPT Store. === ChatGPT Apps === In September 2025, OpenAI added support for Model Context Protocol (MCP) to ChatGPT apps. When enabled in developer mode, this allows for improved third-party access to ChatGPT tools and servers. === Deep Research === In February 2025, OpenAI released Deep Research, a feature that generates reports based on extensive web searches. It was initially based on the reasoning model o3 and took 5 to 30 minutes per report. === Images === In October 2023, OpenAI's image generation model DALL-E 3 was integrated into ChatGPT. The integration used ChatGPT to write prompts for DALL-E guided by conversations with users. In March 2025, OpenAI updated ChatGPT to generate images using GPT Image instead of DALL-E. One of the most significant improvements was in the generation of text within images, which is especially useful for branded content. However, this ability is noticeably worse in non-Latin alphabets. The model can also generate new images based on existing ones provided in the prompt. These images are generated with C2PA metadata, which can be used to verify that they are AI-generated. OpenAI has put in place additional safeguards to prevent what the company deems to be harmful image generation. === Agents === In 2025, OpenAI added several features to make ChatGPT more agentic (capable of autonomously performing longer tasks). In January, Operator was released. It was capable of autonomously performing tasks through web browser interactions, including filling forms, placing online orders, scheduling appointments, and other browser-based tasks. It controlled a software environment inside a virtual machine with limited internet connectivity and safety restrictions. It struggled with complex user interfaces. In May 2025, OpenAI introduced an agent for coding named Codex. 
It is capable of writing software, answering codebase questions, running tests, and proposing pull requests. It is based on a fine-tuned version of OpenAI o3. It has two versions: one runs in a virtual machine in the cloud, and in the other the agent runs in the cloud but performs actions on a local machine connected via API. In July 2025, OpenAI released ChatGPT agent, an AI agent that can perform multi-step tasks. Like Operator, it controls a virtual computer. It also inherits Deep Research's ability to gather and summarize significant volumes of information. The user can interrupt tasks or provide additional instructions as needed. In September 2025, OpenAI partnered with Stripe, Inc. to release the Agentic Commerce Protocol, enabling purchases through ChatGPT. At launch, the feature was limited to purchases on Etsy by US users with a payment method linked to their OpenAI account. OpenAI takes an undisclosed cut from the merchant's payment. === ChatGPT Health === On January 7, 2026, OpenAI introduced a feature called "ChatGPT Health", whereby ChatGPT can discuss the user's health in a way that is separate from other chats. The feature is not available for users in the United Kingdom, Switzerland, or the European Economic Area, and is available on a waitli

  • Visual temporal attention

    Visual temporal attention is a special case of visual attention that involves directing attention to a specific instant of time. Like their spatial counterpart, visual spatial attention, these attention modules have been widely implemented in video analytics in computer vision to provide enhanced performance and human-interpretable explanations of deep learning models. Just as a visual spatial attention mechanism allows human and/or computer vision systems to focus on semantically more substantial regions in space, visual temporal attention modules enable machine learning algorithms to place more emphasis on critical video frames in video analytics tasks, such as human action recognition. In convolutional neural network-based systems, the prioritization introduced by the attention mechanism is regularly implemented as a linear weighting layer with parameters determined by labeled training data. == Application in Action Recognition == Recent video segmentation algorithms often exploit both spatial and temporal attention mechanisms. Research in human action recognition has accelerated significantly since the introduction of powerful tools such as Convolutional Neural Networks (CNNs). However, effective methods for incorporating temporal information into CNNs are still being actively explored. Motivated by the popular recurrent attention models in natural language processing, the Attention-aware Temporal Weighted CNN (ATW CNN) has been proposed for videos; it embeds a visual attention model into a temporal weighted multi-stream CNN. This attention model is implemented as temporal weighting, and it effectively boosts the recognition performance of video representations. In addition, each stream in the proposed ATW CNN framework is capable of end-to-end training, with both network parameters and temporal weights optimized by stochastic gradient descent (SGD) with back-propagation. 
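The idea of temporal weighting can be sketched in a few lines. This is only an illustration under stated assumptions, not the ATW CNN implementation: the function names, the softmax normalization of the weights, and the hand-picked numbers are all hypothetical, and in a trained system the attention logits would be learned by SGD rather than fixed.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_weighted_scores(snippet_scores, attention_logits):
    """Fuse per-snippet class scores using normalized temporal weights.

    snippet_scores: one list of class scores per video snippet.
    attention_logits: one (hypothetical, normally learned) logit per snippet.
    """
    weights = softmax(attention_logits)
    num_classes = len(snippet_scores[0])
    fused = [0.0] * num_classes
    for w, scores in zip(weights, snippet_scores):
        for c in range(num_classes):
            fused[c] += w * scores[c]
    return weights, fused


# Three snippets, two classes; the second snippet is assumed most relevant.
snippet_scores = [
    [0.2, 0.8],
    [0.9, 0.1],
    [0.5, 0.5],
]
attention_logits = [0.0, 2.0, 0.0]

weights, fused = temporal_weighted_scores(snippet_scores, attention_logits)
print("temporal weights:", weights)
print("fused class scores:", fused)
```

With uniform weights every snippet would count equally; raising one snippet's logit shifts the fused prediction toward that snippet's scores, which is the effect the temporal attention model is meant to learn.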
Experimental results show that the ATW CNN attention mechanism contributes substantially to the performance gains by focusing on the more relevant, more discriminative video snippets. == Literature == Seibold VC, Balke J and Rolke B (2023): Temporal attention. Front. Cognit. 2:1168320. doi: 10.3389/fcogn.2023.1168320.
