AI Chatbot Soulmate

AI Chatbot Soulmate — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Vector database

    Vector database

    A vector database, vector store or vector search engine is a database that stores and retrieves embeddings of data in vector space. Vector databases typically implement approximate nearest neighbor algorithms so users can search for records semantically similar to a given input, unlike traditional databases, which primarily look up records by exact match. Use cases for vector databases include similarity search, semantic search, multi-modal search, recommendation engines, object detection, and retrieval-augmented generation (RAG).

    Vector embeddings are mathematical representations of data in a high-dimensional space. In this space, each dimension corresponds to a feature of the data, with the number of dimensions ranging from a few hundred to tens of thousands, depending on the complexity of the data being represented. Each data item is represented by one vector in this space. Words, phrases, or entire documents, as well as images, audio, and other types of data, can all be vectorized. These feature vectors may be computed from the raw data using machine learning methods such as feature extraction algorithms, word embeddings or deep learning networks. The goal is that semantically similar data items receive feature vectors close to each other. Vector retrieval can be combined with metadata filtering or lexical search to support filtered and hybrid retrieval workflows.

    == Techniques ==

    Common techniques for similarity search on high-dimensional vectors include:

      • Hierarchical Navigable Small World (HNSW) graphs
      • Locality-sensitive hashing (LSH) and sketching
      • Product quantization (PQ)
      • Inverted files

    These techniques may also be combined in vector search systems. In recent benchmarks, HNSW-based implementations have been among the best performers. Conferences such as the International Conference on Similarity Search and Applications (SISAP) and the Conference on Neural Information Processing Systems (NeurIPS) have hosted competitions on vector search in large databases.

    == Applications ==

    Vector databases are used in a wide range of machine learning applications including similarity search, semantic search, multi-modal search, recommendation engines, object detection, and retrieval-augmented generation.

    === Retrieval-augmented generation ===

    An especially common use case for vector databases is retrieval-augmented generation (RAG), a method to improve domain-specific responses of large language models. The retrieval component of a RAG system can be any search system, but is most often implemented as a vector database. Text documents describing the domain of interest are collected, and for each document or document section, a feature vector (known as an "embedding") is computed, typically using a deep learning network, and stored in a vector database along with a link to the document. Given a user prompt, the feature vector of the prompt is computed, and the database is queried to retrieve the most relevant documents. These are then automatically added to the context window of the large language model, which proceeds to create a response to the prompt given this context.

    == Implementations ==
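The store-then-query workflow described for retrieval-augmented generation can be illustrated with a minimal sketch. It is not taken from any particular vector database: the class `ToyVectorStore`, the three-dimensional toy embeddings and the document links are all hypothetical, and the search is exact brute-force cosine similarity rather than an approximate nearest neighbor index such as HNSW.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory store using exact (brute-force) search."""

    def __init__(self):
        self._items = []  # list of (embedding, document link) pairs

    def add(self, vector, link):
        self._items.append((list(vector), link))

    def query(self, vector, top_k=2):
        # Score every stored embedding against the query, best first.
        scored = [(cosine_similarity(vector, v), link) for v, link in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [link for _, link in scored[:top_k]]


store = ToyVectorStore()
store.add([0.9, 0.1, 0.0], "doc/cats")      # made-up embeddings
store.add([0.8, 0.2, 0.1], "doc/dogs")
store.add([0.0, 0.1, 0.9], "doc/tax-law")

prompt_embedding = [1.0, 0.0, 0.0]           # stands in for an embedded user prompt
retrieved = store.query(prompt_embedding, top_k=2)
```

In a RAG pipeline the documents behind the retrieved links would then be placed in the language model's context window. A production system would replace the linear scan with an index so that queries do not touch every stored vector.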

  • Berlekamp–Rabin algorithm

    Berlekamp–Rabin algorithm

    In number theory, Berlekamp's root finding algorithm, also called the Berlekamp–Rabin algorithm, is a probabilistic method of finding roots of polynomials over the field $\mathbb{F}_p$ with $p$ elements. The method was discovered by Elwyn Berlekamp in 1970 as an auxiliary to the algorithm for polynomial factorization over finite fields. The algorithm was later modified by Rabin for arbitrary finite fields in 1979. The method was also independently discovered before Berlekamp by other researchers.

    == History ==

    The method was proposed by Elwyn Berlekamp in his 1970 work on polynomial factorization over finite fields. His original work lacked a formal correctness proof and was later refined and modified for arbitrary finite fields by Michael Rabin. In 1986 René Peralta proposed a similar algorithm for finding square roots in $\mathbb{F}_p$. In 2000 Peralta's method was generalized for cubic equations.

    == Statement of problem ==

    Let $p$ be an odd prime number. Consider the polynomial $f(x) = a_0 + a_1 x + \cdots + a_n x^n$ over the field $\mathbb{F}_p \simeq \mathbb{Z}/p\mathbb{Z}$ of remainders modulo $p$. The algorithm should find all $\lambda$ in $\mathbb{F}_p$ such that $f(\lambda) = 0$ in $\mathbb{F}_p$.

    == Algorithm ==

    === Randomization ===

    Let $f(x) = (x-\lambda_1)(x-\lambda_2)\cdots(x-\lambda_n)$. Finding all roots of this polynomial is equivalent to finding its factorization into linear factors. To find such a factorization it is sufficient to split the polynomial into any two non-trivial divisors and factorize them recursively.
    To do this, consider the polynomial $f_z(x) = f(x-z) = (x-\lambda_1-z)(x-\lambda_2-z)\cdots(x-\lambda_n-z)$, where $z$ is some element of $\mathbb{F}_p$. If one can represent this polynomial as the product $f_z(x) = p_0(x)\,p_1(x)$, then in terms of the initial polynomial it means that $f(x) = p_0(x+z)\,p_1(x+z)$, which provides the needed factorization of $f(x)$.

    === Classification of $\mathbb{F}_p$ elements ===

    Due to Euler's criterion, for every monomial $(x-\lambda)$ exactly one of the following properties holds:

      • The monomial is equal to $x$ if $\lambda = 0$,
      • The monomial divides $g_0(x) = x^{(p-1)/2} - 1$ if $\lambda$ is a quadratic residue modulo $p$,
      • The monomial divides $g_1(x) = x^{(p-1)/2} + 1$ if $\lambda$ is a quadratic non-residue modulo $p$.

    Thus if $f_z(x)$ is not divisible by $x$, which may be checked separately, then $f_z(x)$ is equal to the product of the greatest common divisors $\gcd(f_z(x); g_0(x))$ and $\gcd(f_z(x); g_1(x))$.
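The three-way classification can be checked numerically. The following is a small sketch, not part of the algorithm as published: the helper name `classify` and the choice $p = 11$ are illustrative assumptions. It evaluates $\lambda^{(p-1)/2} \bmod p$, which by Euler's criterion is $1$ for quadratic residues and $p-1$ (that is, $-1$) for non-residues.

```python
def classify(lam, p):
    """Classify lam in F_p (p an odd prime) using Euler's criterion."""
    lam %= p
    if lam == 0:
        return "zero"  # (x - lam) is the monomial x itself
    # lam^((p-1)/2) is 1 exactly when (x - lam) divides g_0(x);
    # otherwise it is p - 1 and (x - lam) divides g_1(x).
    return "residue" if pow(lam, (p - 1) // 2, p) == 1 else "non-residue"


p = 11  # illustrative prime
residues = [a for a in range(p) if classify(a, p) == "residue"]
non_residues = [a for a in range(p) if classify(a, p) == "non-residue"]
```

For $p = 11$ this yields the residues 1, 3, 4, 5, 9 and the non-residues 2, 6, 7, 8, 10, so the non-zero elements split evenly between the roots of $g_0$ and the roots of $g_1$.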
    === Berlekamp's method ===

    The property above leads to the following algorithm:

      1. Explicitly calculate the coefficients of $f_z(x) = f(x-z)$,
      2. Calculate the remainders of $x, x^2, x^{2^2}, x^{2^3}, x^{2^4}, \ldots, x^{2^{\lfloor \log_2 p \rfloor}}$ modulo $f_z(x)$ by squaring the current polynomial and taking the remainder modulo $f_z(x)$,
      3. Using exponentiation by squaring and the polynomials calculated in the previous step, calculate the remainder of $x^{(p-1)/2}$ modulo $f_z(x)$,
      4. If $x^{(p-1)/2} \not\equiv \pm 1 \pmod{f_z(x)}$, then the greatest common divisors mentioned above provide a non-trivial factorization of $f_z(x)$,
      5. Otherwise all roots of $f_z(x)$ are either residues or non-residues simultaneously and one has to choose another $z$.

    If $f(x)$ is divisible by some non-linear primitive polynomial $g(x)$ over $\mathbb{F}_p$, then when calculating the $\gcd$ with $g_0(x)$ and $g_1(x)$ one will obtain a non-trivial factorization of $f_z(x)/g_z(x)$; thus the algorithm allows one to find all roots of arbitrary polynomials over $\mathbb{F}_p$.

    === Modular square root ===

    Consider the equation $x^2 \equiv a \pmod{p}$ having elements $\beta$ and $-\beta$ as its roots. Solving this equation is equivalent to factorizing the polynomial $f(x) = x^2 - a = (x-\beta)(x+\beta)$ over $\mathbb{F}_p$.
    In this particular case it is sufficient to calculate only $\gcd(f_z(x); g_0(x))$. For this polynomial exactly one of the following properties will hold:

      • The GCD is equal to $1$, which means that $z+\beta$ and $z-\beta$ are both quadratic non-residues,
      • The GCD is equal to $f_z(x)$, which means that both numbers are quadratic residues,
      • The GCD is equal to $(x-t)$, which means that exactly one of these numbers is a quadratic residue.

    In the third case the GCD is equal to either $(x-z-\beta)$ or $(x-z+\beta)$. This allows the solution to be written as $\beta = (t-z) \pmod{p}$.

    === Example ===

    Assume we need to solve the equation $x^2 \equiv 5 \pmod{11}$. For this we need to factorize $f(x) = x^2 - 5 = (x-\beta)(x+\beta)$. Consider some possible values of $z$:

      • Let $z = 3$. Then $f_z(x) = (x-3)^2 - 5 = x^2 - 6x + 4$, thus $\gcd(x^2 - 6x + 4; x^5 - 1) = 1$. Both numbers $3 \pm \beta$ are quadratic non-residues, so we need to take some other $z$.
      • Let $z = 2$. Then $f_z(x) = (x-2)^2 - 5 = x^2 - 4x - 1$, thus $\gcd(x^2 - 4x - 1; x^5 - 1) \equiv x - 9 \pmod{11}$. From this follows $x - 9 = x - 2 - \beta$, so $\beta \equiv 7 \pmod{11}$ and $-\beta \equiv -7 \equiv 4 \pmod{11}$.
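The modular square root case is small enough to sketch in full. The code below is an illustrative reading of the steps above, not a reference implementation: the function names `try_shift` and `sqrt_mod_prime` and the representation of a polynomial $c_1 x + c_0$ as the pair `(c1, c0)` are assumptions made for this sketch. Because $f_z$ is quadratic, the remainder of $g_0(x)$ modulo $f_z(x)$ is at most linear, so the GCD can be read off directly; the final check $\beta^2 \equiv a$ guards the case where the linear remainder does not actually divide $f_z$.

```python
import random


def _mul_mod(u, v, b, c, p):
    """Multiply u1*x+u0 by v1*x+v0 modulo x^2 + b*x + c over F_p."""
    hi = u[0] * v[0] % p                     # coefficient of x^2
    mid = (u[0] * v[1] + u[1] * v[0]) % p    # coefficient of x
    lo = u[1] * v[1] % p                     # constant term
    # Reduce using x^2 = -b*x - c.
    return ((mid - hi * b) % p, (lo - hi * c) % p)


def try_shift(a, p, z):
    """Try one shift z; return beta with beta^2 = a (mod p), or None."""
    b = (-2 * z) % p                         # f_z(x) = (x - z)^2 - a = x^2 + b*x + c
    c = (z * z - a) % p
    result, base, e = (0, 1), (1, 0), (p - 1) // 2
    while e:                                 # x^((p-1)/2) mod f_z by repeated squaring
        if e & 1:
            result = _mul_mod(result, base, b, c, p)
        base = _mul_mod(base, base, b, c, p)
        e >>= 1
    c1, c0 = result[0], (result[1] - 1) % p  # remainder of g_0(x) modulo f_z(x)
    if c1 == 0:
        return None                          # GCD is 1 or f_z itself: choose another z
    t = (-c0 * pow(c1, p - 2, p)) % p        # root of the linear remainder c1*x + c0
    beta = (t - z) % p
    return beta if beta * beta % p == a % p else None


def sqrt_mod_prime(a, p, rng=random):
    """Square root of a modulo an odd prime p, or None if a is a non-residue."""
    a %= p
    if a == 0:
        return 0
    if pow(a, (p - 1) // 2, p) != 1:
        return None
    while True:                              # retry with random shifts until one splits f_z
        beta = try_shift(a, p, rng.randrange(p))
        if beta is not None:
            return beta
```

With the values from the example, `try_shift(5, 11, 3)` returns `None` (both shifted roots are non-residues) and `try_shift(5, 11, 2)` returns `7`.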
    A manual check shows that, indeed, $7^2 \equiv 49 \equiv 5 \pmod{11}$ and $4^2 \equiv 16 \equiv 5 \pmod{11}$.

    == Correctness proof ==

    The algorithm finds a factorization of $f_z(x)$ in all cases except for ones when all numbers $z+\lambda_1, z+\lambda_2, \ldots, z+\lambda_n$ are quadratic residues or non-residues simultaneously. According to the theory of cyclotomy, the probability of such an event for the case when $\lambda_1, \ldots, \lambda_n$ are all residues or non-residues simultaneously (that is, when $z = 0$ would fail) may be estimated as $2^{-k}$, where $k$ is the number of distinct values in $\lambda_1, \ldots, \lambda_n$. In this way even for the worst case of $k = 1$ and $f(x) = (x-\lambda)^n$, the probability of error may be estimated as $1/2$, and for the modular square root case the error probability is at most $1/4$.

    == Complexity ==

    Let a polynomial have degree $n$. We derive the algorithm's complexity as follows:

      • Due to the binomial theorem $(x-z)^k = \sum_{i=0}^{k} \binom{k}{i} (-z)^{k-i} x^i$, we may transition from $f(x)$ to $f(x-z)$ in $O(n^2)$ time.
      • Polynomial multiplication a

  • Reservoir sampling

    Reservoir sampling

    Reservoir sampling is a family of randomized algorithms for choosing a simple random sample, without replacement, of k items from a population of unknown size n in a single pass over the items. The size of the population n is not known to the algorithm and is typically too large for all n items to fit into main memory. The population is revealed to the algorithm over time, and the algorithm cannot look back at previous items. At any point, the current state of the algorithm must permit extraction of a simple random sample without replacement of size k over the part of the population seen so far.

    == Motivation ==

    Suppose we see a sequence of items, one at a time. We want to keep 10 items in memory, and we want them to be selected at random from the sequence. If we know the total number of items n and can access the items arbitrarily, then the solution is easy: select 10 distinct indices i between 1 and n with equal probability, and keep the i-th elements. The problem is that we do not always know the exact n in advance.

    == Simple: Algorithm R ==

    A simple and popular but slow algorithm, Algorithm R, was created by Jeffrey Vitter.

    Initialize an array $R$ indexed from $1$ to $k$, containing the first k items of the input $x_1, \ldots, x_k$. This is the reservoir.

    For each new input $x_i$, generate a random number j uniformly in $\{1, \ldots, i\}$. If $j \in \{1, \ldots, k\}$, then set $R[j] := x_i$. Otherwise, discard $x_i$.

    Return $R$ after all inputs are processed.

    This algorithm works by induction on $i \geq k$.

    While conceptually simple and easy to understand, this algorithm needs to generate a random number for each item of the input, including the items that are discarded.
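The steps of Algorithm R translate almost line for line into code. The following is a minimal sketch under a few assumptions: the function name `reservoir_sample_r` is made up for illustration, the reservoir is a zero-indexed Python list (so $R[j]$ becomes `reservoir[j - 1]`), and `rng` is any object with a `randint` method, such as the `random` module or a seeded `random.Random`.

```python
import random


def reservoir_sample_r(stream, k, rng=random):
    """Return a uniform random sample of k items from an iterable of unknown length."""
    reservoir = []
    for i, item in enumerate(stream, start=1):
        if i <= k:
            reservoir.append(item)       # the first k items fill the reservoir
        else:
            j = rng.randint(1, i)        # uniform in {1, ..., i}, both ends inclusive
            if j <= k:
                reservoir[j - 1] = item  # replace slot j; otherwise discard the item
    return reservoir


# Usage: sample 10 items from a stream whose length is not known up front.
sample = reservoir_sample_r(iter(range(1000)), 10, rng=random.Random(0))
```

If the stream holds fewer than k items, the sketch simply returns all of them.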
    The algorithm's asymptotic running time is thus $O(n)$. Generating this amount of randomness and the linear run time cause the algorithm to be unnecessarily slow if the input population is large. The procedure described above is Algorithm R.

    == Optimal: Algorithm L ==

    If we generate $n$ random numbers $u_1, \ldots, u_n \sim U[0,1]$ independently, then the indices of the smallest $k$ of them form a uniform sample of the $k$-subsets of $\{1, \ldots, n\}$.

    The process can be done without knowing $n$: keep the smallest $k$ of $u_1, \ldots, u_i$ that have been seen so far, as well as $w_i$, the index of the largest among them. For each new $u_{i+1}$, compare it with $u_{w_i}$. If $u_{i+1} < u_{w_i}$ …
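The idea described so far, keeping the k items with the smallest random keys, can be sketched with a heap. This is only an illustration of that starting point, under the assumption that a newly drawn key smaller than the largest kept key replaces it; the function name `reservoir_sample_by_keys` is hypothetical. It still draws one random number per item, so it does not include the skipping that makes Algorithm L faster than Algorithm R.

```python
import heapq
import random


def reservoir_sample_by_keys(stream, k, rng=random):
    """Keep the k items whose independent U[0,1) keys are smallest."""
    heap = []  # entries (-key, index, item); the root holds the largest kept key
    for index, item in enumerate(stream):
        u = rng.random()
        if len(heap) < k:
            heapq.heappush(heap, (-u, index, item))
        elif u < -heap[0][0]:
            # The new key beats the largest kept key: swap the items.
            heapq.heapreplace(heap, (-u, index, item))
    return [item for _, _, item in heap]


picked = reservoir_sample_by_keys(range(100), 5, rng=random.Random(1))
```

Storing the negated key turns Python's min-heap into a max-heap, and the index in each tuple breaks ties so that the items themselves are never compared.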

  • Voiceverse NFT plagiarism scandal

    Voiceverse NFT plagiarism scandal

    In January 2022, 15 (the pseudonymous Massachusetts Institute of Technology (MIT) artificial intelligence researcher and creator of the non-commercial generative artificial intelligence voice synthesis research project 15.ai) discovered that the blockchain-based technology company Voiceverse had plagiarized from their platform. Voiceverse marketed itself as a service that offered AI voice cloning technology that could be purchased and traded as non-fungible tokens (NFTs). Amid heightened controversy over NFTs in the gaming industry, voice actor Troy Baker (who has been described as one of the most famous voice actors in video games) announced his partnership with Voiceverse on January 14, 2022, triggering immediate backlash over concerns about the environmental impact of NFTs, potential for fraud, predatory monetization in video games, and the potential of AI displacing jobs for human voice actors.

    Later that same day, 15 revealed through server logs that Voiceverse had generated voice lines using 15's free text-to-speech platform, pitch-shifted the audio to make them unrecognizable, and falsely marketed the samples as their own technology before selling them as NFTs. Within an hour of being confronted with evidence, Voiceverse confessed and stated that their marketing team had used 15.ai without proper attribution while rushing to create a technology demo to coincide with Baker's partnership announcement, further exacerbating the already negative reception to the original announcement. In response, 15 replied "Go fuck yourself"; the interaction went viral and garnered a large amount of support for the developer. News publications universally characterized this incident as Voiceverse having "stolen" from 15.ai.

    The next day, Baker appeared on a podcast and stated that his motivation had been to help independent creators who were unable to afford professional voice actors. Following continued backlash and the plagiarism revelation, Baker ended his partnership with Voiceverse on January 31, 2022. Subsequently, the incident was documented in multiple AI ethics databases, criticisms of predatory monetization in video games, and retrospectives as one of the earliest instances of plagiarism and theft stemming from artificial intelligence during the AI boom.

    == Background ==

    === Troy Baker ===

    Troy Baker is a prominent voice actor in the video game industry best known for his performances as Joel Miller in The Last of Us franchise. Baker has been described as "ubiquitous" by Polygon, "one of the most high-profile and prolific voice actors in video games" by Eurogamer, and "arguably the most famous voice actor in the gaming industry" by GameGuru. His other prominent roles include voicing Agent John "Jonesy" Jones in Fortnite, Booker DeWitt in BioShock Infinite, and both Batman and Joker in multiple Batman video games. As of October 2025, Baker holds the record for the most acting nominations at the BAFTA Games Awards, with five between 2013 and 2021.

    === Voiceverse ===

    Voiceverse is a blockchain-based startup founded by the Bored Ape Yacht Club that marketed itself as offering AI voice cloning technology in the form of NFTs. Prior to the announcement of their partnership with Baker, Voiceverse had partnered with LOVO, Inc., an AI voice platform that, according to LOVO, could generate human-like voices. Voiceverse stated that any user who purchases a voice NFT would have unlimited and perpetual access to the voice model, which could be used to create content such as audiobooks, YouTube videos, podcasts, e-learning materials, in-game voice chat, and Zoom calls. Voiceverse promised that buyers would "OWN [sic] all of the IP" of content they created using these voices. Voiceverse's roadmap included plans to release 8,888 initial voice NFTs, a feature to add emotions to existing voices, and the ability for users to mint their own voices as NFTs.
    Prior to Baker's partnership, Voiceverse had also partnered with voice actors Charlet Chung, who voices D.Va in Overwatch, and Andy Milonakis of The Andy Milonakis Show.

    === 15.ai ===

    15.ai is a free web application launched in 2020 that uses artificial intelligence to generate text-to-speech voices of fictional characters from popular media. Created by a pseudonymous artificial intelligence researcher known as 15, who began developing the technology as a freshman during their undergraduate research at MIT, it was an early example of an application of generative artificial intelligence during the initial stages of the AI boom. The platform showed that deep neural networks could generate emotionally expressive speech with only 15 seconds of speech; the name "15.ai" references the creator's statement that a voice can be convincingly cloned with just 15 seconds of audio, as opposed to the tens of hours of data previously required. 15.ai became an Internet phenomenon in early 2021 when content utilizing it went viral on social media and quickly gained widespread use among various Internet fandoms. 15 has emphasized that it should remain free and non-commercial; it only requires users to give proper credit when using the service for content creation.

    === NFTs in the video game industry ===

    By early 2022, NFTs had become highly controversial within the gaming industry. Critics raised concerns about their environmental impact due to the significant energy consumption of blockchain technology. In addition, the prevalence of scams, fraud, and potential money laundering associated with NFT sales, as well as fears that NFTs were a new form of predatory monetization following the increasing frequency of loot boxes, caused vocal pushback from the gaming community. Several major gaming companies had begun exploring NFT integration into their products, though fan backlash had already forced some projects to be cancelled. On December 16, 2021, the developers of S.T.A.L.K.E.R. 2: Heart of Chernobyl announced that they would be including NFTs in the game, but cancelled within an hour of the announcement due to immediate universal backlash. Simultaneously, the rise of AI voice technology raised concerns among voice actors about potential job displacement and the devaluation of their work amidst the voice acting industry's ongoing struggles for better compensation and working conditions.

    == Partnership announcement and backlash ==

    On January 14, 2022, 1:02 a.m. EST, Baker announced on Twitter that he was partnering with Voiceverse "to explore ways where together we might bring new tools to new creators to make new things, and allow everyone a chance to own & invest in the IP's they create." The announcement concluded with the statement "You can hate. Or you can create." Baker's specific role with Voiceverse remained unclear at the time of the announcement.

    Along with Baker's announcement, Voiceverse promoted their supposed voice AI technology on Twitter by posting animated videos that featured a cat character created by NFT firm Chubbiverse. The videos concluded with text that read "The Voice Powered By Voiceverse"; Voiceverse stated on Twitter that the voices in the animations had been generated using their own AI voice synthesis technology and presented the videos as a technology demonstration of their voice NFT capabilities.

    The announcement provoked immediate and widespread backlash from the gaming community. Baker's tweet received thousands of replies and quote retweets (the vast majority of which were negative), far more than the number of likes; Michael McWhertor of Polygon described it as a "textbook example of being ratioed" and commented that reactions had been amplified by the final part of Baker's announcement. Michael Beckwith of Metro called Baker's approach "bizarrely aggressive".

    Later that day, Baker responded to the backlash by apologizing for his choice of words.
    He said he appreciated people's thoughts and acknowledged that the "hate/create part might have been a bit antagonistic," calling it a "bad attempt to bring levity". Despite the apology, Baker and his fellow voice actors did not distance themselves from Voiceverse at this point. At the same time, Voiceverse attempted to address the criticisms, stating that they were working to move to more environmentally friendly blockchain technology and that voice actors would receive royalties from NFT sales, with actors benefiting from any increase in NFT value.

    == Plagiarism revelation ==

    On December 13, 2021, amidst the increasingly negative reactions toward NFTs among the general public, the creator of 15.ai (known pseudonymously as 15) announced that they had "no interest in incorporating NFTs into any aspect of [their] work." On January 14, 2022, 11:17 a.m. EST (10 hours after Baker's initial announcement), 15 commented on the Voiceverse venture, stating that it "sounds like a scam". Two hours later, at 1:20 p.m., 15 explicitly accused Voiceverse of "actively attempting to appropriate [15's] work for [Voiceverse's] own benefit." 15 provided evidence through

  • Human–robot interaction

    Human–robot interaction

    Human–robot interaction (HRI) is the study of interactions between humans and robots. Human–robot interaction is a multidisciplinary field with contributions from human–computer interaction, artificial intelligence, robotics, natural language processing, design, psychology and philosophy. A subfield known as physical human–robot interaction (pHRI) has tended to focus on device design to enable people to safely interact with robotic systems.

    == Origins ==

    Human–robot interaction was a topic of both science fiction and academic speculation even before any robots existed. Because much of active HRI development depends on natural language processing, many aspects of HRI are continuations of human communications, a field of research which is much older than robotics.

    The origin of HRI as a discrete problem was stated by 20th-century author Isaac Asimov in his short story "Runaround" (written in 1941 and published in 1942), later collected in I, Robot. Asimov coined the Three Laws of Robotics, namely:

      1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
      2. A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
      3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Laws.

    These three laws provide an overview of the goals engineers and researchers hold for safety in the HRI field, although the fields of robot ethics and machine ethics are more complex than these three principles. In general, however, human–robot interaction prioritizes the safety of humans who interact with potentially dangerous robotics equipment. Solutions to this problem range from the philosophical approach of treating robots as ethical agents (individuals with moral agency) to the practical approach of creating safety zones. These safety zones use technologies such as lidar to detect human presence, or physical barriers that protect humans by preventing any contact between machine and operator.
    Although robots in the human–robot interaction field initially required some human intervention to function, research has expanded to the extent that fully autonomous systems are now far more common than in the early 2000s. Autonomous systems range from simultaneous localization and mapping systems, which provide intelligent robot movement, to natural-language processing and natural-language generation systems, which allow for natural, human-like interaction that meets well-defined psychological benchmarks.

    Anthropomorphic robots (machines which imitate human body structure) are better described by the biomimetics field, but overlap with HRI in many research applications. Examples of robots which demonstrate this trend include Willow Garage's PR2 robot, the NASA Robonaut, and Honda ASIMO. However, robots in the human–robot interaction field are not limited to human-like robots: Paro and Kismet are both robots designed to elicit emotional response from humans, and so fall into the category of human–robot interaction.

    Goals in HRI range from industrial manufacturing (through cobots) and medical technology (through rehabilitation, autism intervention, and elder care devices) to entertainment, human augmentation, and human convenience. Future research therefore covers a wide range of fields, much of which focuses on assistive robotics, robot-assisted search-and-rescue, and space exploration.

    == The goal of friendly human–robot interactions ==

    Robots are artificial agents with capacities for perception and action in the physical world, often referred to by researchers as the workspace. Their use has become widespread in factories, but nowadays they also tend to be found in the most technologically advanced societies in such critical domains as search and rescue, military battle, mine and bomb detection, scientific exploration, law enforcement, entertainment and hospital care.
    These new domains of application imply a closer interaction with the user, who shares with the robot not only the workspace but also goals in terms of task achievement. The subfield of physical human–robot interaction (pHRI) has largely focused on device design to enable people to safely interact with robotic systems, but is increasingly developing algorithmic approaches in an attempt to support fluent and expressive interactions between humans and robotic systems.

    With advances in AI, research is focusing on the one hand on the safest possible physical interaction and on the other on socially correct interaction, which depends on cultural criteria. The goal is to build intuitive and easy communication with the robot through speech, gestures, and facial expressions.

    Kerstin Dautenhahn refers to friendly human–robot interaction as "robotiquette", defining it as the "social rules for robot behaviour (a 'robotiquette') that is comfortable and acceptable to humans". The robot has to adapt itself to our way of expressing desires and orders, and not the contrary. But everyday environments such as homes have much more complex social rules than those implied by factories or even military environments. Thus, the robot needs perceiving and understanding capacities to build dynamic models of its surroundings. It needs to categorize objects, recognize and locate humans, and further recognize their emotions. The need for dynamic capacities pushes forward every subfield of robotics.

    Furthermore, by understanding and perceiving social cues, robots can enable collaborative scenarios with humans. For example, with the rapid rise of personal fabrication machines such as desktop 3D printers and laser cutters entering our homes, scenarios may arise where robots and humans collaboratively share control, coordinate and achieve tasks together. Industrial robots have already been integrated into industrial assembly lines and are working collaboratively with humans.
    The social impact of such robots has been studied, and the results indicate that workers still treat robots as social entities and rely on social cues to understand and work with them.

    At the other end of HRI research, cognitive modelling of the "relationship" between humans and robots benefits both psychologists and robotics researchers, and user studies are often of interest to both sides. This research endeavours part of human society. For effective human–humanoid robot interaction, numerous communication skills and related features should be implemented in the design of such artificial agents and systems.

    == General HRI research ==

    HRI research spans a wide range of fields, some general to the nature of HRI.

    === Methods for perceiving humans ===

    Methods for perceiving humans in the environment are based on sensor information. Research on sensing components and software led by Microsoft provides useful results for extracting human kinematics (see Kinect). An example of an older technique is to use colour information, for example the fact that for light-skinned people the hands are lighter than the clothes worn. In any case, a human model defined a priori can then be fitted to the sensor data. The robot builds or has (depending on its level of autonomy) a 3D map of its surroundings, to which the humans' locations are assigned.

    Most methods aim to build a 3D model of the environment through vision. Proprioceptive sensors give the robot information about its own state. This information is relative to a reference. Theories of proxemics may be used to perceive and plan around a person's personal space.

    A speech recognition system is used to interpret human desires or commands. By combining the information inferred from proprioception, sensors and speech, the human's position and state (standing, seated) can be estimated.
In this matter, natural-language processing is concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural-language data. For instance, neural-network architectures and learning algorithms can be applied to various natural-language processing tasks, including part-of-speech tagging, chunking, named-entity recognition, and semantic role labeling. === Methods for motion planning === Motion planning in dynamic environments is a challenge that can at the moment only be met for robots with 3 to 10 degrees of freedom. Humanoid robots, or even two-armed robots, which can have up to 40 degrees of freedom, are unsuited for dynamic environments with today's technology. However, lower-dimensional robots can use the potential field method to compute trajectories that avoid collisions with humans. === Cognitive models and theory of mind === Humans exhibit negative social and emotional responses as well as decreased trust toward some robots that closely, but imperfectly, resemble humans; this phenomenon has been termed the "Uncanny Valley". However, recent research on telepresence robots has established that mimicking human body postures and expressive gestures makes the robots likeable and engaging in a remote setting. Further, the presence o

    Read more →
  • Ecoinformatics

    Ecoinformatics

    Ecoinformatics, or ecological informatics, is the science of information in ecology and environmental science. It integrates environmental and information sciences to define entities and natural processes with language common to both humans and computers. However, this is a rapidly developing area in ecology and there are alternative perspectives on what constitutes ecoinformatics. A few definitions have been circulating, mostly centered on the creation of tools to access and analyze natural system data. However, the scope and aims of ecoinformatics are broader than the development of metadata standards to be used in documenting datasets. Ecoinformatics aims to facilitate environmental research and management by developing ways to access and integrate databases of environmental information, and by developing new algorithms that enable different environmental datasets to be combined to test ecological hypotheses. Ecoinformatics is related to the concept of ecosystem services. Ecoinformatics characterizes the semantics of natural system knowledge. For this reason, much of today's ecoinformatics research relates to the branch of computer science known as knowledge representation, and active ecoinformatics projects are developing links to activities such as the Semantic Web. Current initiatives to effectively manage, share, and reuse ecological data are indicative of the increasing importance of fields like ecoinformatics in developing the foundations for effectively managing ecological information. Examples of these initiatives are National Science Foundation Datanet projects, DataONE, Data Conservancy, and Artificial Intelligence for Environment & Sustainability. == Software Development Lifecycle == Central to the concept of ecoinformatics is the Software Development Lifecycle (SDLC), a systematic framework for writing, implementing, and maintaining software products. 
Typically in ecoinformatics projects, the development pipeline includes data collection, usually from several different environmental data sources, followed by integration of these data sources and then analysis of the data. Here, each step of the SDLC is described in the context of ecoinformatics, per Michener et al. It is important to note that the plan, collect, assure, describe, and preserve steps refer to the data collection entity, which can be individual researchers or large data-collection networks, while the discover, integrate, and analyze steps typically refer to the individual researcher. Plan: Ecoinformatics projects require data from several databases. Each database holds different data, and therefore researchers should identify what types of environmental or ecological data they will need to answer their research question. Collect: Data is collected in several different ways. In ecoinformatics, this is usually restricted to manually entering data into a spreadsheet and parsing data from an existing database. The growth of relational databases has made it easier for ecologists to download relevant data and integrate datasets. Assure: Data entries should be checked thoroughly to validate their accuracy and usability, for example by checking for outliers and erroneous points. The same principle applies to data downloaded from datasets. This responsibility falls on both the ecologist downloading the data and the entity that sets up the data collection system. Describe: An accurate description of the metadata of a dataset that is used in a study should include enough information to deduce the data collection and processing methodology, when the data were collected, why the data were collected, and how the data were stored. This is important for reproducibility, especially for projects that build on each other and may recycle data. Preserve: After data is collected by an institutional entity, it should be archived such that it is easily accessible. 
Ideally, this is in databases that are maintained and not at risk of deprecation. Discover: While there are good practices for discovering data to start a research project, this process is often marred by a lack of usable, published data, as researchers may collect data specific to their study but may not publish this data for wider use. On the data collection end, this can be addressed by better data-sharing practices, such as linking datasets when publishing papers or studies. On the data procurement end, this can be addressed by more precise data searching, such as using key words to find relevant datasets. Integrate: Synthesizing datasets can be difficult and labor-intensive, largely due to methodological differences in data collection. There are several approaches to this, but the best practices typically involve computational approaches, namely using R or Python, to automate the processes and prevent errors. Analyze: Data analysis can take several forms, and should be tailored to the specific ecological project. However, all data analysis methods should be well-documented, including the procedure for analysis, justification for analysis methods, and any shortcomings in a specific approach. == Applications of Ecoinformatics Across Ecology == === Ecosystem Ecology === Ecosystem studies, by definition, encompass interactions across the entire life sciences spectrum, from microscopic biochemical reactions to large-scale geological phenomena. As a result, big databases may not be designed specifically for any particular research question, but should be inclusive enough to support most studies. Since ecosystem-level questions require a broad perspective, data-related ecosystem projects would likely incorporate data from several databases. 
A common framework for incorporating data into ecosystem-level studies is the network science model, in which data collection mechanisms and resources are treated as a large, interconnected network instead of individual entities. The network may include several data collection stations within one database, or may span multiple databases. Currently there are several large-scale networks, but they do not generate data on a scale that would make ecology a big data science. A current challenge for ecoinformatics in ecosystem ecology is that most funding is prioritized for generating new data rather than maintaining existing data infrastructures. Integrating data across the different spatial scales can also be difficult, since each dataset may hold different types of data. === Urban Ecology === The current push for smart cities, and the integration of sensor networks into infrastructure, has positioned cities as a major source of data for ecological studies. Typical urban ecology questions address the effects of urbanization on the local ecosystem, and how to drive future development to promote urban biodiversity. While sensor networks in cities typically collect environmental data to optimize city processes, they may also be used for ecological initiatives, especially with respect to understanding the complex, multi-layered relationship between cities and their local ecosystem. These data can also be used to better understand the current landscape of cities and to identify avenues for the rewilding of cities. For example, analyzing mobility patterns can identify areas that may lend themselves well to building parks and green spaces. Bird watching data can also be used to identify the types of bird species in a local area. === Infectious Disease === Like other disciplines of ecology, emerging infectious disease and epidemiology span multiple scales, from understanding the genetics that drive disease trends to large-scale spatiotemporal analyses. 
As a result, infectious disease studies can incorporate everything from bioinformatics data, genetic sequences, and amino acid sequences to environmental observation data. On the micro-scale, these data can then be used to predict infectivity/transmissibility, drug resistance, drug candidates, and mutation sites. On the macro-scale, they can be used to identify societal trends or environmental factors that lend themselves to spillover, locations of infection, and practices that cause disease transmission. == Databases == USGS National Streamflow sensor network GBIF Neotoma Paleobiology database European Vegetation Archive USDA Forest Inventory Analysis TRY BIEN AmeriFlux TEAM iNaturalist NEON GLEON LTER CZO TERN SAEON
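The Assure and Integrate steps of the lifecycle described earlier lend themselves to a small illustration. The following is only a sketch under stated assumptions: the function names, the site identifiers, the sample values, and the choice of a median-absolute-deviation rule for outliers are all hypothetical and are not taken from Michener et al. or from any of the databases listed above.

```python
from statistics import median


def flag_outliers(values, k=5.0):
    """Assure: return indices of values more than k median absolute
    deviations (MAD) away from the median. The cutoff k is an assumption."""
    centre = median(values)
    mad = median(abs(v - centre) for v in values)
    return [i for i, v in enumerate(values) if abs(v - centre) > k * mad]


def integrate_by_site(temperature, species_count):
    """Integrate: inner-join two {site_id: value} tables on the shared site ids."""
    shared = sorted(temperature.keys() & species_count.keys())
    return [
        {"site": s, "temp_c": temperature[s], "species": species_count[s]}
        for s in shared
    ]


# Hypothetical readings; the fourth value looks like a data-entry error.
readings = [12.1, 11.8, 12.4, 98.0, 12.0]
suspect = flag_outliers(readings)

# Hypothetical tables from two different sources, keyed by site id.
temperature = {"A": 12.1, "B": 11.8, "C": 12.4}
species_count = {"B": 7, "C": 4, "D": 9}
merged = integrate_by_site(temperature, species_count)

print(suspect)
print(merged)
```

Scripting both steps, rather than editing spreadsheets by hand, keeps the checks repeatable when a source dataset is downloaded again.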

    Read more →
  • Web data integration

    Web data integration

    Web data integration (WDI) is the process of aggregating and managing data from different websites into a single, homogeneous workflow. This process includes data access, transformation, mapping, quality assurance and fusion of data. Data that is sourced and structured from websites is referred to as "web data". WDI is an extension and specialization of data integration that views the web as a collection of heterogeneous databases. Data integration techniques applied to the web form the foundation for businesses taking advantage of the data available on the ever-increasing number of publicly accessible websites. Corporate spending in this area amounted to about USD 2.5bn in 2017, and the market was expected to reach almost USD 7bn by 2020.

    Read more →
  • AlphaTensor

    AlphaTensor

    AlphaTensor is an artificial intelligence system developed by DeepMind for discovering efficient matrix multiplication algorithms using reinforcement learning. Introduced in 2022, the system was based on AlphaZero and formulated the search for matrix multiplication algorithms as a single-player game called TensorGame. AlphaTensor was designed to search for new ways to multiply matrices with fewer scalar multiplication operations. Matrix multiplication is a fundamental operation in linear algebra, numerical analysis, scientific computing, computer graphics, and machine learning. The system discovered thousands of matrix multiplication algorithms, including algorithms that rediscovered known human-designed methods and others that improved on previously known results for particular matrix sizes and mathematical settings. == Background == Matrix multiplication is one of the basic operations in numerical computing. The standard algorithm for multiplying two square matrices has cubic time complexity, while faster algorithms such as the Strassen algorithm reduce the number of multiplication operations by using more complex algebraic decompositions. Finding optimal matrix multiplication algorithms can be difficult because it involves searching through a large space of possible tensor decompositions. AlphaTensor approached this problem by representing algorithm discovery as TensorGame, in which each move corresponds to an operation that reduces a tensor representing matrix multiplication. The goal of the game is to find a low-rank decomposition of the matrix multiplication tensor, corresponding to an efficient multiplication algorithm. == Development == AlphaTensor was developed by DeepMind and described in a paper published in Nature in October 2022. The system built on the reinforcement-learning approach used in AlphaZero, which had previously been applied to games such as Go, chess, and shogi. 
Unlike those games, TensorGame involved a very large search space, requiring changes to the AlphaZero-style search method and neural network architecture. DeepMind released source code and discovered algorithms associated with the publication through a public GitHub repository. == Results == AlphaTensor discovered matrix multiplication algorithms over both standard arithmetic and finite fields. One widely reported result was a method for multiplying 4 × 4 matrices over the field with two elements using 47 multiplication operations, improving on the 49 operations required by applying Strassen's algorithm recursively in that setting. The system also found algorithms optimized for particular computer hardware, including algorithms designed for graphics processing units and Tensor Processing Units. DeepMind stated that some of the hardware-specific algorithms improved practical execution time compared with commonly used algorithms on the tested hardware. == Significance == AlphaTensor was described as an example of using machine learning not only to apply existing algorithms, but to assist in discovering new ones. The work was connected to broader research in algorithm discovery, automated machine learning, program synthesis, and computational complexity theory, especially the open problem of determining the optimal complexity of matrix multiplication. AlphaTensor later became part of a broader group of Google DeepMind systems for algorithm and mathematical discovery, alongside systems such as AlphaDev and AlphaEvolve.
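The 49-multiplication baseline mentioned in the Results section comes from applying Strassen's seven-multiplication scheme for 2 × 2 blocks recursively to a 4 × 4 matrix (7 × 7 = 49). The sketch below illustrates that baseline only; it is not AlphaTensor's code or its 47-multiplication algorithm, it works over ordinary integers rather than the field with two elements, and the helper names and test matrices are chosen for illustration.

```python
def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def quadrants(m):
    """Split a square matrix of even size into its four blocks."""
    h = len(m) // 2
    return ([row[:h] for row in m[:h]], [row[h:] for row in m[:h]],
            [row[:h] for row in m[h:]], [row[h:] for row in m[h:]])


def strassen(a, b, counter):
    """Multiply square matrices whose size is a power of two.
    counter[0] is incremented once per scalar multiplication."""
    if len(a) == 1:
        counter[0] += 1
        return [[a[0][0] * b[0][0]]]
    a11, a12, a21, a22 = quadrants(a)
    b11, b12, b21, b22 = quadrants(b)
    # Strassen's seven block products.
    m1 = strassen(add(a11, a22), add(b11, b22), counter)
    m2 = strassen(add(a21, a22), b11, counter)
    m3 = strassen(a11, sub(b12, b22), counter)
    m4 = strassen(a22, sub(b21, b11), counter)
    m5 = strassen(add(a11, a12), b22, counter)
    m6 = strassen(sub(a21, a11), add(b11, b12), counter)
    m7 = strassen(sub(a12, a22), add(b21, b22), counter)
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(add(sub(m1, m2), m3), m6)
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom


def naive(a, b):
    """Schoolbook product, used here only as a reference."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
B = [[2, 0, 1, 3], [1, 1, 0, 2], [0, 3, 1, 1], [4, 1, 2, 0]]
count = [0]
C = strassen(A, B, count)
print(C == naive(A, B), count[0])
```

The schoolbook method would use 4³ = 64 scalar multiplications on the same input, which is why counting multiplications is the natural score for a search such as TensorGame.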

    Read more →
  • .ai

    .ai

    .ai is the Internet country code top-level domain (ccTLD) for Anguilla, a British Overseas Territory in the Caribbean. It is administered by the government of Anguilla. It is a popular domain hack with companies and projects related to the artificial intelligence industry (AI). Google's ad targeting treats .ai as a generic top-level domain (gTLD) because "users and website owners frequently see [the domain] as being more generic than country-targeted." In 2021, Google Search analyst Gary Illyes announced that ".ai" had been added to Google’s list of generic country-code top-level domains, meaning that Google would no longer infer Anguilla-specific targeting from the ccTLD. Identity Digital began managing the domain as of January 2025. == Second and third level registrations == Registrations within off.ai, com.ai, net.ai, and org.ai are available worldwide without restriction. Since 15 September 2009, second-level registrations within .ai have been available to everyone worldwide. == Registration == Registration terms for .ai domains range from 2 to 10 years for registration and renewal, with a 2-year renewal on domain transfer. Identity Digital is the authority in charge of managing this extension. Registrations began on 16 February 1995. The minimum length of a domain name is 1 to 3 characters, depending on the registrar, and the maximum is always 63 characters. The character set supported for .ai domain names includes A–Z, a–z, 0–9, and hyphen. As of November 2022, .ai domains cannot accommodate IDN characters. There are no eligibility requirements for registering a domain; both local and foreign residents may register. A .ai domain can be suspended or revoked if the domain is involved in illegal activity such as violating trademarks or copyrights. Usage must not violate the laws of Anguilla. Anguilla uses the UDRP. Filing a UDRP challenge requires using one of the ICANN Approved Dispute Resolution Service Providers. 
If the domain is with an ICANN-accredited registrar, the registrar should work with the arbitrator. Usually this means either doing nothing or transferring a domain. .ai domains can be transferred to any desired registrar, since registration is handled through EPP. There used to be a whois.ai-based platform for expired domains, on which they could be procured and auctioned every ten days through a standard online process. The last such auctions closed in December 2024; the platform had been scheduled for shutdown on 30 June 2025, but remained online in the months following that date. == Valuation == Domain cost depends on the registrar, with yearly fees ranging from US$140 (the base fee, as established by Anguilla) to $200. As of July 2025, the highest-valued .ai domain is an undisclosed one sold on 8 November 2023, on Escrow.com, for US$1,500,000, months after an initial $300,000 sale to the same buyer. Among the publicly disclosed sales, the most valuable, fin.ai, was sold for $1,000,000 in March 2025. On 16 December 2017, the .ai registry started supporting the Extensible Provisioning Protocol (EPP) and migrated all of its domains onto an EPP system. Consequently, many registrars are allowed to sell .ai domains. Since that date, the .ai ccTLD has also been popular with artificial intelligence companies and organisations. Though this trend is seen primarily among new AI-based companies and startups, many established AI and tech companies have preferred not to opt for .ai domains. For example, DeepMind has retained its domain at .com; Meta has redirected its facebook.ai domain to ai.meta.com. == Impact on Anguilla's economy == The registration fees earned from the .ai domains go to the treasury of the Government of Anguilla. As per a 2018 New York Times report, the total revenue generated from selling .ai domains was $2.9 million. 
In 2023, Anguilla's government made about US$32 million from fees collected for registering .ai domains; that amounted to over 10% of gross domestic product for the territory. "In the years before the real breakthrough of AI, revenue from .ai domains made up less than 1% of our state income, by 2025 it will be around 47%," explained Jose Vanterpool, Minister of Infrastructure and Communications (MICUHITES), in an interview with BBC. The high 90% renewal rate of .ai domains and the 2025 renewal wave of domains registered in 2023 are driving another surge in state revenues, according to Domaintechnik.

    Read more →
  • Conceptualization (information science)

    Conceptualization (information science)

    In information science, a conceptualization is an abstract simplified view of some selected parts of the world, containing the objects, concepts, and other entities that are presumed of interest for some particular purpose and the relationships between them. An explicit specification of a conceptualization is an ontology, and it may occur that a conceptualization can be realized by several distinct ontologies. An ontological commitment in describing ontological comparisons is taken to refer to that subset of elements of an ontology shared with all the others. "An ontology is language-dependent", its objects and interrelations described within the language it uses, while a conceptualization is always the same, more general, its concepts existing "independently of the language used to describe it". The relation between these terms is shown in the figure to the right. Not all workers in knowledge engineering use the term "conceptualization", but instead refer to the conceptualization itself, or to the ontological commitment of all its realizations, as an overarching ontology. == Purpose and implementation == As a higher level abstraction, a conceptualization facilitates the discussion and comparison of its various ontologies, facilitating knowledge sharing and reuse. Each ontology based upon the same overarching conceptualization maps the conceptualization into specific elements and their relationships. The question then arises as to how to describe the "conceptualization" in terms that can encompass multiple ontologies. This issue has been called the Tower of Babel problem, that is, how can persons used to one ontology talk with others using a different ontology? This problem is easily grasped, but a general resolution is not at hand. It can be a "bottom-up" or a "top-down" approach, or something in between. 
However, in more artificial situations, such as information systems, the idea of a "conceptualization" and the "ontological commitment" of various ontologies that realize the "conceptualization" is possible. The formation of a conceptualization and its ontologies involves these steps: specification of the conceptualization; ontology concepts: every definition involves the definitions of other terms; relationships between the concepts: this step maps conceptual relationships onto the ontology structure; groups of concepts: this step may lead to the creation of sub-ontologies; formal description of ontology commitments, for example, to make them computer readable. An example of moving conception into a language leading to a variety of ontologies is the expression of a process in pseudocode (a strictly structured form of ordinary language) leading to implementation in several different formal computer languages like Lisp or Fortran. The pseudocode makes it easier to understand the instructions and compare implementations, but the formal languages make possible the compilation of the ideas as computer instructions. Another example is mathematics, where a very general formulation (the analog of a conceptualization) is illustrated with "applications" that are more specialized examples. For instance, aspects of a function space can be illustrated using a vector space or a topological space that introduce interpretations of the "elements" of the conceptualization and additional relationships between them but preserve the connections required in the function space.

    Read more →
  • Sparse identification of non-linear dynamics

    Sparse identification of non-linear dynamics

    Sparse identification of nonlinear dynamics (SINDy) is a data-driven algorithm for obtaining dynamical systems from data. Given a series of snapshots of a dynamical system and its corresponding time derivatives, SINDy performs a sparsity-promoting regression (such as LASSO and sparse Bayesian inference) on a library of nonlinear candidate functions of the snapshots against the derivatives to find the governing equations. This procedure relies on the assumption that most physical systems only have a few dominant terms which dictate the dynamics, given an appropriately selected coordinate system and quality training data. It has been applied to identify the dynamics of fluids, based on proper orthogonal decomposition, as well as other complex dynamical systems, such as biological networks. == Mathematical Overview == First, consider a dynamical system of the form

$$\dot{\mathbf{x}} = \frac{d}{dt}\mathbf{x}(t) = \mathbf{f}(\mathbf{x}(t)),$$

where $\mathbf{x}(t) \in \mathbb{R}^{n}$ is a state vector (snapshot) of the system at time $t$ and the function $\mathbf{f}(\mathbf{x}(t))$ defines the equations of motion and constraints of the system. The time derivative may be either prescribed or numerically approximated from the snapshots. With $\mathbf{x}$ and $\dot{\mathbf{x}}$ sampled at $m$ equidistant points in time ($t_{1}, t_{2}, \cdots, t_{m}$), these can be arranged into matrices of the form

$$\mathbf{X} = \begin{bmatrix} \mathbf{x}^{\mathsf{T}}(t_{1}) \\ \mathbf{x}^{\mathsf{T}}(t_{2}) \\ \vdots \\ \mathbf{x}^{\mathsf{T}}(t_{m}) \end{bmatrix} = \begin{bmatrix} x_{1}(t_{1}) & x_{2}(t_{1}) & \cdots & x_{n}(t_{1}) \\ x_{1}(t_{2}) & x_{2}(t_{2}) & \cdots & x_{n}(t_{2}) \\ \vdots & \vdots & \ddots & \vdots \\ x_{1}(t_{m}) & x_{2}(t_{m}) & \cdots & x_{n}(t_{m}) \end{bmatrix},$$

and similarly for $\dot{\mathbf{X}}$. Next, a library $\boldsymbol{\Theta}(\mathbf{X})$ of nonlinear candidate functions of the columns of $\mathbf{X}$ is constructed, which may be constant, polynomial, or more exotic functions (like trigonometric and rational terms, and so on):

$$\boldsymbol{\Theta}(\mathbf{X}) = \begin{bmatrix} \vert & \vert & \vert & \vert & & \vert & \vert & \\ 1 & \mathbf{X} & \mathbf{X}^{2} & \mathbf{X}^{3} & \cdots & \sin(\mathbf{X}) & \cos(\mathbf{X}) & \cdots \\ \vert & \vert & \vert & \vert & & \vert & \vert & \end{bmatrix}$$

The number of possible model structures from this library is combinatorially high. $\mathbf{f}(\mathbf{x}(t))$ is then substituted by $\boldsymbol{\Theta}(\mathbf{X})$ and a vector of coefficients $\boldsymbol{\Xi} = \left[ \boldsymbol{\xi}_{1} \; \boldsymbol{\xi}_{2} \; \cdots \; \boldsymbol{\xi}_{n} \right]$ determining the active terms in $\mathbf{f}(\mathbf{x}(t))$:

$$\dot{\mathbf{X}} = \boldsymbol{\Theta}(\mathbf{X}) \boldsymbol{\Xi}$$

Because only a few terms are expected to be active at each point in time, an assumption is made that $\mathbf{f}(\mathbf{x}(t))$ admits a sparse representation in $\boldsymbol{\Theta}(\mathbf{X})$. This then becomes an optimization problem in finding a sparse $\boldsymbol{\Xi}$ which optimally embeds $\dot{\mathbf{X}}$. In other words, a parsimonious model is obtained by performing least squares regression on the system (4) with sparsity-promoting ($L_{1}$) regularization

$$\boldsymbol{\xi}_{k} = \underset{\boldsymbol{\xi}'_{k}}{\arg\min} \left\| \dot{\mathbf{X}}_{k} - \boldsymbol{\Theta}(\mathbf{X}) \boldsymbol{\xi}'_{k} \right\|_{2} + \lambda \left\| \boldsymbol{\xi}'_{k} \right\|_{1},$$

where $\lambda$ is a regularization parameter. Finally, the sparse set of $\boldsymbol{\xi}_{k}$ can be used to reconstruct the dynamical system:

$$\dot{x}_{k} = \boldsymbol{\Theta}(\mathbf{x}) \boldsymbol{\xi}_{k}$$
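To make the regression step concrete, here is a minimal sketch for a one-dimensional system. Several things in it are assumptions made for illustration rather than part of the method as described above: the example system dx/dt = -2x, the three-term library [1, x, x²], the use of exact (prescribed) derivatives, the value of the regularization parameter, and the solver, which is plain coordinate descent on the common LASSO objective ½‖·‖₂² + λ‖·‖₁ (a squared-loss variant of the objective written above). Snapshots from two trajectories with opposite initial conditions are used so that the library columns are not all strongly correlated.

```python
import math


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; the proximal step for the L1 penalty."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(theta, target, lam, sweeps=100):
    """Minimise 0.5 * ||target - theta @ xi||^2 + lam * ||xi||_1 over xi."""
    n_rows, n_cols = len(theta), len(theta[0])
    xi = [0.0] * n_cols
    col_sq = [sum(theta[i][j] ** 2 for i in range(n_rows)) for j in range(n_cols)]
    for _ in range(sweeps):
        for j in range(n_cols):
            rho = 0.0
            for i in range(n_rows):
                # Prediction from every column except column j.
                partial = sum(theta[i][k] * xi[k] for k in range(n_cols) if k != j)
                rho += theta[i][j] * (target[i] - partial)
            xi[j] = soft_threshold(rho, lam) / col_sq[j]
    return xi


def build_library(xs):
    """Candidate functions evaluated on each snapshot: 1, x, x^2."""
    return [[1.0, x, x * x] for x in xs]


# Snapshots of dx/dt = -2x from x(0) = 1 and x(0) = -1.
times = [0.05 * i for i in range(40)]
half = [math.exp(-2.0 * t) for t in times]
xs = half + [-x for x in half]
dxs = [-2.0 * x for x in xs]  # prescribed derivatives

coefficients = lasso_coordinate_descent(build_library(xs), dxs, lam=0.01)
print(coefficients)  # only the x term should stay active, close to -2
```

The recovered coefficient on x is slightly smaller in magnitude than 2 because the L1 penalty shrinks every active term; a common refinement is to refit the active terms by ordinary least squares once the sparsity pattern is known.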

    Read more →
  • Data drilling

    Data drilling

    Data drilling (also drilldown) refers to any of various operations and transformations on tabular, relational, and multidimensional data. The term has widespread use in various contexts, but is primarily associated with specialized software designed specifically for data analysis. == Common data drilling operations == There are certain operations that are common to applications that allow data drilling. Among them are: Query operations: tabular query; pivot query. === Tabular query === Tabular query operations consist of standard operations on data tables. Among these operations are: search; sort; filter (by value); filter (by extended function or condition); transform (e.g., by adding or removing columns). Consider the following example: Fred and Wilma table (Fig 001):

    gender, fname, lname, home
    male, fred, chopin, Poland
    male, fred, flintstone, bedrock
    male, fred, durst, usa
    female, wilma, flintstone, bedrock
    female, wilma, rudolph, usa
    female, wilma, webb, usa
    male, fred, johnson, usa

The preceding is an example of a simple flat file table formatted as comma-separated values. The table includes first name, last name, gender and home country for various people named fred or wilma. Although the example is formatted this way, it is important to emphasize that tabular query operations (as well as all data drilling operations) can be applied to any conceivable data type, regardless of the underlying formatting. The only requirement is that the data be readable by the software application in use. === Pivot query === A pivot query allows multiple representations of data according to different dimensions. This query type is similar to tabular query, except it also allows data to be represented in summary format, according to a flexible user-selected hierarchy. This class of data drilling operation is formally, (and loosely) known by different names, including crosstab query, pivot table, data pilot, selective hierarchy, intertwingularity and others. 
To illustrate the basics of pivot query operations, consider the Fred and Wilma table (Fig 001). A quick scan of the data reveals that the table has redundant information. This redundancy could be consolidated using an outline or a tree structure or in some other way. Moreover, once consolidated, the data could have many different alternate layouts. Using a simple text outline as output, the following alternate layouts are all possible with a pivot query:

Summarize by gender (Fig 001):

    female
        flintstone, wilma
        rudolph, wilma
        webb, wilma
    male
        chopin, fred
        flintstone, fred
        durst, fred
        johnson, fred

(Dimensions = gender; Tabular fields = lname, fname;)

Summarize by home, lname (Fig 001):

    bedrock
        flintstone
            fred
            wilma
    Poland
        chopin
            fred
    usa
        ...

(Dimensions = home, lname; Tabular fields = fname;)

==== Uses ==== Pivot query operations are useful for summarizing a corpus of data in multiple ways, thereby illustrating different representations of the same basic information. Although this type of operation appears prominently in spreadsheets and desktop database software, its flexibility is arguably under-utilized. There are many applications that allow only a 'fixed' hierarchy for representing data, and this represents a substantial limitation. == Drillup == Drillup is the opposite of drilldown. For example, if you drilldown to see the revenue of one product, then you might want to drillup to see the revenue of all products.
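The tabular and pivot queries described above can be reproduced on the Fred and Wilma table with a few lines of code. This is a minimal sketch, not how any particular data drilling application implements them: the function names `tabular_filter` and `pivot`, and the nested-dictionary output format, are assumptions chosen for illustration.

```python
FIELDS = ("gender", "fname", "lname", "home")

ROWS = [
    ("male", "fred", "chopin", "Poland"),
    ("male", "fred", "flintstone", "bedrock"),
    ("male", "fred", "durst", "usa"),
    ("female", "wilma", "flintstone", "bedrock"),
    ("female", "wilma", "rudolph", "usa"),
    ("female", "wilma", "webb", "usa"),
    ("male", "fred", "johnson", "usa"),
]


def tabular_filter(rows, field, value):
    """Tabular query: keep the rows whose field equals value."""
    idx = FIELDS.index(field)
    return [row for row in rows if row[idx] == value]


def pivot(rows, dimensions, tabular_fields):
    """Pivot query: nest rows by the chosen dimensions, in order.
    Leaves are lists of tuples holding the tabular fields."""
    dim_idx = [FIELDS.index(d) for d in dimensions]
    leaf_idx = [FIELDS.index(f) for f in tabular_fields]
    tree = {}
    for row in rows:
        node = tree
        for i in dim_idx[:-1]:
            node = node.setdefault(row[i], {})
        leaf = node.setdefault(row[dim_idx[-1]], [])
        leaf.append(tuple(row[i] for i in leaf_idx))
    return tree


bedrock_rows = tabular_filter(ROWS, "home", "bedrock")
by_gender = pivot(ROWS, ["gender"], ["lname", "fname"])
by_home_lname = pivot(ROWS, ["home", "lname"], ["fname"])

print(by_gender["female"])
print(by_home_lname["bedrock"])
```

Changing the `dimensions` argument is all it takes to produce a different hierarchy from the same rows, which is the flexibility the Uses paragraph refers to.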

    Read more →
  • Qlone

    Qlone

    Qlone is a 3D scanning app based on photogrammetry for the creation of 3D models on mobile devices. The resultant 3D models can be exported for external use. Qlone was featured at the Apple Worldwide Developers Conference in 2021. It was also featured on BBC Click. == Qlone features == === 3D scanning === 3D scanning with Qlone requires the use of an included mat design. The user prints the mat onto a sheet of paper, then places the object to be scanned in the centre of the mat. An augmented reality dome within the Qlone app guides the user through the subsequent scanning process. The iOS version of Qlone allows scanning without the mat. === 3D editing === Qlone's editing features allow users to adjust 3D scanned models using texture mapping, polygon mesh size simplification, digital sculpting, cleaning and smoothing, and artistic effects. === File export === Qlone exports directly to multiple 3D platforms including SketchFab, i.materialise, Lens Studio for Snapchat, Shapeways and CGTrader. Models can also be exported in different 3D formats for use in other 3D tools: OBJ, STL, FBX, USDZ, GLB (Binary glTF), PLY, and X3D. == Use in Science, Education and Academia == Because it is an inexpensive, simple and accessible way to create 3D models, Qlone has been used in many educational and scientific research projects. The European Space Agency used Qlone to scan rocks in a tele-robotic rock collection experiment. Neurosurgeons from the University of Southern California and surgeons from Tulane University School of Medicine used Qlone to create 3D models of cadaveric specimens and anatomical models, with the aim of widening access to such material for anatomy training and of allowing realistic surgical simulations for neurosurgeons and practitioners worldwide. 
Archaeologists from Texas A&M University used Qlone to create 3D replicas of artifacts and models and students from Vancouver iTech Preparatory Middle School used Qlone to create 3D scans of more than 100 artifacts from Fort Vancouver National Historic Site.

  • NewSQL

    NewSQL

    NewSQL is a class of relational database management systems that seek to provide the scalability of NoSQL systems for online transaction processing (OLTP) workloads while maintaining the ACID guarantees of a traditional database system. Many enterprise systems that handle high-profile data (e.g., financial and order processing systems) are too large for conventional relational databases, but have transactional and consistency requirements that are not practical for NoSQL systems. The only options previously available to these organizations were to purchase more powerful computers or to develop custom middleware that distributes requests over conventional DBMSs. Both approaches incur high infrastructure costs, high development costs, or both. NewSQL systems attempt to reconcile these conflicting requirements.

    == History ==

    The term was first used by 451 Group analyst Matthew Aslett in a 2011 research paper discussing the rise of a new generation of database management systems. One of the first NewSQL systems was the H-Store parallel database system.

    == Applications ==

    Typical applications are characterized by heavy OLTP transaction volumes. OLTP transactions:

    • are short-lived (i.e., no user stalls)
    • touch small amounts of data per transaction
    • use indexed lookups (no table scans)
    • have a small number of forms (a small number of queries with different arguments).

    However, some NewSQL systems also support hybrid transactional/analytical processing (HTAP) applications. Such systems improve performance and scalability by omitting heavyweight recovery or concurrency control.

    == List of NewSQL-databases ==

    • Apache Trafodion
    • Clustrix
    • CockroachDB
    • Couchbase
    • CrateDB
    • Google Spanner
    • MySQL Cluster
    • NuoDB
    • OceanBase
    • Pivotal GemFire XD
    • SequoiaDB
    • SingleStore (formerly known as MemSQL)
    • TIBCO Active Spaces
    • TiDB
    • TokuDB
    • TransLattice Elastic Database
    • VoltDB
    • YDB
    • YugabyteDB

    == Features ==

    The two common distinguishing features of NewSQL database solutions are that they offer the online scalability of NoSQL databases and that they support the relational data model (including ACID consistency), using SQL as their primary interface. NewSQL systems can be loosely grouped into three categories:

    === New architectures ===

    NewSQL systems adopt various internal architectures. Some systems employ a cluster of shared-nothing nodes, in which each node manages a subset of the data. They include components such as distributed concurrency control, flow control, and distributed query processing.

    === SQL engines ===

    The second category consists of optimized storage engines for SQL. These systems provide the same programming interface as SQL, but scale better than built-in engines.

    === Transparent sharding ===

    These systems automatically split databases across multiple nodes using the Raft or Paxos consensus algorithm.
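The idea that each node manages a subset of the data, with the split hidden from the caller, can be illustrated with a toy sketch. It assumes simple hash partitioning, which is only one possible scheme and is not taken from any of the systems listed above. It also leaves out replication, consensus (Raft or Paxos), and transactions entirely. The class name `ShardedStore` and its methods are hypothetical.

```python
import hashlib


class ShardedStore:
    """Toy in-memory store that routes each key to exactly one node."""

    def __init__(self, node_count):
        if node_count < 1:
            raise ValueError("node_count must be at least 1")
        # Each "node" is just a dict here; a real node would be a separate server.
        self.nodes = [dict() for _ in range(node_count)]

    def _node_for(self, key):
        # Assumed scheme: a stable hash of the key picks the owning node.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def put(self, key, value):
        self.nodes[self._node_for(key)][key] = value

    def get(self, key):
        return self.nodes[self._node_for(key)].get(key)


store = ShardedStore(node_count=3)
store.put("order-1001", {"total": 42})
print(store.get("order-1001"))
```

The caller only sees `put` and `get`; which node holds a given key is decided inside the store, which is the sense in which the sharding is transparent in this sketch.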

  • Microsoft SQL Server Master Data Services

    Microsoft SQL Server Master Data Services

    Microsoft SQL Server Master Data Services (MDS) is a Master Data Management (MDM) product from Microsoft that ships as a part of the Microsoft SQL Server relational database management system. Master data management (MDM) allows an organization to discover and define non-transactional lists of data, and to compile maintainable, reliable master lists. Master Data Services first shipped with Microsoft SQL Server 2008 R2. Microsoft SQL Server 2016 introduced enhancements to Master Data Services, such as improved performance and security, support for many-to-many relationships, and the ability to clear transaction logs, create custom indexes, and share entity data between different models.

    == Overview ==

    In Master Data Services, the model is the highest-level container in the structure of your master data. You create a model to manage groups of similar data. A model contains one or more entities, and entities contain members that are the data records. An entity is similar to a table.

    Like other MDM products, Master Data Services aims to create a centralized data source and keep it synchronized, and thus reduce redundancies, across the applications that process the data. Sharing its architectural core with Stratature +EDM, Master Data Services uses a Microsoft SQL Server database as the physical data store. It is a part of the Master Data Hub, which uses the database to store and manage data entities. The hub is a database with the software to validate and manage the data, and to keep it synchronized with the systems that use the data. The master data hub has to extract the data from the source system, validate, sanitize and shape the data, remove duplicates, and update the hub repositories, as well as synchronize the external sources. The entity schemas, attributes, data hierarchies, validation rules and access control information are specified as metadata to the Master Data Services runtime. Master Data Services does not impose any limitation on the data model.
Master Data Services also allows custom business rules to be defined for validating and sanitizing the data entering the data hub; these rules are then run against the data matching the specified criteria. All changes made to the data are validated against the rules, and a log of the transaction is stored persistently. Violations are logged separately, and the owner can optionally be notified automatically. All the data entities can be versioned.

Master Data Services allows the master data to be categorized by hierarchical relationships, for example treating employee data as a subtype of organization data. Hierarchies are generated by relating data attributes. Data can be automatically categorized using rules, and the categories are introspected programmatically.

Master Data Services can also expose the data as Microsoft SQL Server views, which can be pulled by any SQL-compatible client. It uses a role-based access control system to restrict access to the data. The views are generated dynamically, so they contain the latest data entities in the master hub. It can also push out the data by writing to some external journals.

Master Data Services also includes a web-based UI for viewing and managing the data. It uses ASP.NET in the back-end. The Silverlight front-end was replaced with HTML5 in SQL Server 2019. Master Data Services provides a Web service interface to expose the data, as well as an API that internally uses the exposed web services and exposes the feature set programmatically for accessing and manipulating the data. It also integrates with Active Directory for authentication purposes. Unlike +EDM, Master Data Services supports Unicode characters and multilingual user interfaces. SQL Server 2016 introduced a significant performance increase in Master Data Services over previous versions.

== Terminology ==

• Model is the highest level of an MDS instance. It is the primary container for specific groupings of master data. In many ways it is very similar to the idea of a database.
• Entities are containers created within a model. Entities provide a home for members, and are in many ways analogous to database tables (e.g. Customer).
• Members are analogous to the records in a database table (Entity), e.g. Will Smith. Members are contained within entities. Each member is made up of two or more attributes.
• Attributes are analogous to the columns within a database table (Entity), e.g. Surname. Attributes exist within entities and help describe members (the records within the table). Name and Code attributes are created by default for each entity and serve to describe and uniquely identify leaf members. Attributes can be related to attributes from other entities; these are called 'domain-based' attributes, which is similar to the concept of a foreign key. Other attributes, however, are of type 'free-form' (the most common) or 'file'.
• Attribute Groups are explicitly defined collections of particular attributes. Say you have an entity "customer" that has 50 attributes, which is too much information for many of your users. Attribute groups enable the creation of custom sets of hand-picked attributes that are relevant for specific audiences (e.g. "customer - delivery details", which would include just the customer's name and last known delivery address). This is very similar to a database view.
• Hierarchies organize members into either Derived or Explicit hierarchical structures. Derived hierarchies, as the name suggests, are derived by the MDS engine based on the relationships that exist between attributes. Explicit hierarchies are created by hand using both leaf and consolidated members.
• Business Rules can be created and applied against model data to ensure that custom business logic is adhered to. To be committed into the system, data must pass all business rule validations applied to it. For example, within the Customer entity you may want to create a business rule that ensures all members of the 'Country' attribute contain either the text "USA" or "Canada". Once created and run, the business rule verifies that all the data is correct before accepting it into the approved model.
• Versions provide system owners and administrators with the ability to Open, Lock or Commit a particular version of a model, and the data contained within it, at a particular point in time. As the content within a model varies, grows or shrinks over time, versions provide a way of managing metadata so that subscribing systems can access the correct content.
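The relationship between models, entities, members, attributes and business rules can be made concrete with a small Python sketch. This is not the Master Data Services API; the classes `Model` and `Entity`, the method `add_member`, and the model name "Sales" are all hypothetical, and members are represented as plain dictionaries for brevity. The rule shown is the 'Country' example from the terminology above.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Analogous to a table: holds members (records) keyed by their Code attribute."""
    name: str
    rules: list = field(default_factory=list)    # callables: member dict -> bool
    members: dict = field(default_factory=dict)

    def add_member(self, member):
        # A member is committed only if every business rule passes.
        failed = [rule.__name__ for rule in self.rules if not rule(member)]
        if failed:
            raise ValueError(f"business rule violations: {failed}")
        self.members[member["Code"]] = member


@dataclass
class Model:
    """Highest-level container: groups related entities."""
    name: str
    entities: dict = field(default_factory=dict)


def country_is_usa_or_canada(member):
    """Business rule from the example: Country must be "USA" or "Canada"."""
    return member.get("Country") in {"USA", "Canada"}


# Hypothetical model with a single Customer entity.
model = Model("Sales")
model.entities["Customer"] = Entity("Customer", rules=[country_is_usa_or_canada])
model.entities["Customer"].add_member(
    {"Code": "C001", "Name": "Will Smith", "Country": "USA"}
)
```

Adding a member whose Country is anything other than "USA" or "Canada" raises `ValueError` in this sketch and leaves the entity unchanged, mirroring the idea that data must pass all business rules before it is committed.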
