AI Coding Neovim

AI Coding Neovim — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • IMazing

    IMazing

    iMazing is mobile device management software that allows users to transfer files and data between iOS devices (iPhone, iPad and iPod Touch) and macOS or Windows computers, in addition to many other features beyond the scope of what Apple's own tools enable. == History == Developed by DigiDNA, iMazing was initially released in 2008 as DiskAid, enabling users to transfer data and files from the iPhone or iPod Touch to Mac or Windows computers. DiskAid was renamed iMazing in 2014. Version 2.0 was released on September 13, 2016. In August 2021, version 2.14 of iMazing added a spyware detection feature. The feature is based on Amnesty International’s Mobile Verification Toolkit to detect Pegasus Spyware following the publication of Pegasus Project. == Description == With iMazing, an iPhone or iPad can be used similarly to an external hard drive. It performs tasks that iTunes doesn’t offer, including incremental backups of iOS devices, browsing and exporting text and voicemail messages, managing apps, encryption, and migrating data from an old phone to a new one. The menu bar app iMazing Mini enables automatic, wireless and encrypted backups of iPhones. The iMazing HEIC Converter is a free desktop app for Mac and PC that lets users convert photos from HEIC format to JPG or PNG.

    Read more →
  • Sieve of Eratosthenes

    Sieve of Eratosthenes

    In mathematics, the sieve of Eratosthenes is an ancient algorithm for finding all prime numbers up to any given limit. It does so by iteratively marking as composite (i.e., not prime) the multiples of each prime, starting with the first prime number, 2. The multiples of a given prime are generated as a sequence of numbers starting from that prime, with constant difference between them that is equal to that prime. This is the sieve's key distinction from using trial division to sequentially test each candidate number for divisibility by each prime. Once all the multiples of each discovered prime have been marked as composites, the remaining unmarked numbers are primes. The earliest known reference to the sieve (Ancient Greek: κόσκινον Ἐρατοσθένους, kóskinon Eratosthénous) is in Nicomachus of Gerasa's Introduction to Arithmetic, an early 2nd-century CE book which attributes it to Eratosthenes of Cyrene, a 3rd-century BCE Greek mathematician, though describing the sieving by odd numbers instead of by primes. One of a number of prime number sieves, it is one of the most efficient ways to find all of the smaller primes. It may be used to find primes in arithmetic progressions. == Overview == A prime number is a natural number that has exactly two distinct natural number divisors: the number 1 and itself. To find all the prime numbers less than or equal to a given integer n by Eratosthenes's method: Create a list of consecutive integers from 2 through n: (2, 3, 4, ..., n). Initially, let p equal 2, the smallest prime number. Enumerate the multiples of p by counting in increments of p from 2p to n, and mark them in the list (these will be 2p, 3p, 4p, ...; the p itself should not be marked). Find the smallest number in the list greater than p that is not marked. If there was no such number, stop. Otherwise, let p now equal this new number (which is the next prime), and repeat from step 3. When the algorithm terminates, the numbers remaining not marked in the list are all the primes below n. The main idea here is that every value given to p will be prime, because if it were composite it would be marked as a multiple of some other, smaller prime. Note that some of the numbers may be marked more than once (e.g., 15 will be marked both for 3 and 5). The key property of the sieve is that only additions are needed, no multiplications or divisions are used. As a refinement, it is sufficient to mark the numbers in step 3 starting from p2, as all the smaller multiples of p will have already been marked at that point. This means that the algorithm is allowed to terminate in step 4 when p2 is greater than n. Another refinement is to initially list odd numbers only, (3, 5, ..., n), and count in increments of 2p in step 3, thus marking only odd multiples of p. This actually appears in the original algorithm. This can be generalized with wheel factorization, forming the initial list only from numbers coprime with the first few primes and not just from odds (i.e., numbers coprime with 2), and counting in the correspondingly adjusted increments so that only such multiples of p are generated that are coprime with those small primes, in the first place. === Example === To find all the prime numbers less than or equal to 30, proceed as follows. First, generate a list of natural numbers from 2 to 30: 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 The first number in the list is 2; cross out every 2nd number in the list after 2 by counting up from 2 in increments of 2 (these will be all the multiples of 2 in the list): 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 The next number in the list after 2 is 3; cross out every 3rd number in the list after 3 by counting up from 3 in increments of 3 (these will be all the multiples of 3 in the list): 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 The next number not yet crossed out in the list after 3 is 5; cross out every 5th number in the list after 5 by counting up from 5 in increments of 5 (i.e. all the multiples of 5): 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 The next number not yet crossed out in the list after 5 is 7; the next step would be to cross out every 7th number in the list after 7, but they are all already crossed out at this point, as these numbers (14, 21, 28) are also multiples of smaller primes because 7 × 7 is greater than 30. The numbers not crossed out at this point in the list are all the prime numbers below 30: 2 3 5 7 11 13 17 19 23 29 == Algorithm and variants == === Pseudocode === The sieve of Eratosthenes can be expressed in pseudocode, as follows: algorithm Sieve of Eratosthenes is input: an integer n > 1. output: all prime numbers from 2 through n. let A be an array of Boolean values, indexed by integers 2 to n, initially all set to true. for i = 2, 3, 4, ..., not exceeding √n do if A[i] is true for j = i2, i2+i, i2+2i, i2+3i, ..., not exceeding n do set A[j] := false return all i such that A[i] is true. This algorithm produces all primes not greater than n. It includes a common optimization, which is to start enumerating the multiples of each prime i from i2. The time complexity of this algorithm is O(n log log n), provided the array update is an O(1) operation, as is usually the case. === Segmented sieve === As Sorenson notes, the problem with the sieve of Eratosthenes is not the number of operations it performs but rather its memory requirements. For large n, the range of primes may not fit in memory; worse, even for moderate n, its cache use is highly suboptimal. The algorithm walks through the entire array A, exhibiting almost no locality of reference. A solution to these problems is offered by segmented sieves, where only portions of the range are sieved at a time. These have been known since the 1970s, and work as follows: Divide the range 2 through n into segments of some size Δ ≥ √n. Find the primes in the first (i.e. the lowest) segment, using the regular sieve. For each of the following segments, in increasing order, with m being the segment's topmost value, find the primes in it as follows: Set up a Boolean array of size Δ. Mark as non-prime the positions in the array corresponding to the multiples of each prime p ≤ √m found so far, by enumerating its multiples in steps of p starting from the lowest multiple of p between m - Δ and m. The remaining non-marked positions in the array correspond to the primes in the segment. It is not necessary to mark any multiples of these primes, because all of these primes are larger than √m, as for k ≥ 1, one has ( k Δ + 1 ) 2 > ( k + 1 ) Δ {\displaystyle (k\Delta +1)^{2}>(k+1)\Delta } . If Δ is chosen to be √n, the space complexity of the algorithm is O(√n), while the time complexity is the same as that of the regular sieve. For ranges with upper limit n so large that the sieving primes below √n as required by the page segmented sieve of Eratosthenes cannot fit in memory, a slower but much more space-efficient sieve like the pseudosquares prime sieve, developed by Jonathan P. Sorenson, can be used instead. === Incremental sieve === An incremental formulation of the sieve generates primes indefinitely (i.e., without an upper bound) by interleaving the generation of primes with the generation of their multiples (so that primes can be found in gaps between the multiples), where the multiples of each prime p are generated directly by counting up from the square of the prime in increments of p (or 2p for odd primes). The generation must be initiated only when the prime's square is reached, to avoid adverse effects on efficiency. It can be expressed symbolically under the dataflow paradigm as primes = [2, 3, ...] \ [[p², p²+p, ...] for p in primes], using list comprehension notation with \ denoting set subtraction of arithmetic progressions of numbers. Primes can also be produced by iteratively sieving out the composites through divisibility testing by sequential primes, one prime at a time. It is not the sieve of Eratosthenes but is often confused with it, even though the sieve of Eratosthenes directly generates the composites instead of testing for them. Trial division has worse theoretical complexity than that of the sieve of Eratosthenes in generating ranges of primes. When testing each prime, the optimal trial division algorithm uses all prime numbers not exceeding its square root, whereas the sieve of Eratosthenes produces each composite from its prime factors only, and gets the primes "for free", between the composites. The widely known 1975 functional sieve code by David Turner is often presented as an example of the sieve of Eratosthenes but is actually a sub-optimal trial division sieve. == Algorithmic complexity == The sieve of Eratosthenes is a popular way to benchmark computer performance. The time complexity of calculating all primes below n in the random access machine model is O(n log log n) ope

    Read more →
  • Predictor–corrector method

    Predictor–corrector method

    In numerical analysis, predictor–corrector methods belong to a class of algorithms designed to integrate ordinary differential equations – to find an unknown function that satisfies a given differential equation. All such algorithms proceed in two steps: The initial, "prediction" step, starts from a function fitted to the function-values and derivative-values at a preceding set of points to extrapolate ("anticipate") this function's value at a subsequent, new point. The next, "corrector" step refines the initial approximation by using the predicted value of the function and another method to interpolate that unknown function's value at the same subsequent point. == Predictor–corrector methods for solving ODEs == When considering the numerical solution of ordinary differential equations (ODEs), a predictor–corrector method typically uses an explicit method for the predictor step and an implicit method for the corrector step. === Example: Euler method with the trapezoidal rule === A simple predictor–corrector method (known as Heun's method) can be constructed from the Euler method (an explicit method) and the trapezoidal rule (an implicit method). Consider the differential equation y ′ = f ( t , y ) , y ( t 0 ) = y 0 , {\displaystyle y'=f(t,y),\quad y(t_{0})=y_{0},} and denote the step size by h {\displaystyle h} . First, the predictor step: starting from the current value y i {\displaystyle y_{i}} , calculate an initial guess value y ~ i + 1 {\displaystyle {\tilde {y}}_{i+1}} via the Euler method, y ~ i + 1 = y i + h f ( t i , y i ) . {\displaystyle {\tilde {y}}_{i+1}=y_{i}+hf(t_{i},y_{i}).} Next, the corrector step: improve the initial guess using trapezoidal rule, y i + 1 = y i + 1 2 h ( f ( t i , y i ) + f ( t i + 1 , y ~ i + 1 ) ) . {\displaystyle y_{i+1}=y_{i}+{\tfrac {1}{2}}h{\bigl (}f(t_{i},y_{i})+f(t_{i+1},{\tilde {y}}_{i+1}){\bigr )}.} That value is used as the next step. === PEC mode and PECE mode === There are different variants of a predictor–corrector method, depending on how often the corrector method is applied. The Predict–Evaluate–Correct–Evaluate (PECE) mode refers to the variant in the above example: y ~ i + 1 = y i + h f ( t i , y i ) , y i + 1 = y i + 1 2 h ( f ( t i , y i ) + f ( t i + 1 , y ~ i + 1 ) ) . {\displaystyle {\begin{aligned}{\tilde {y}}_{i+1}&=y_{i}+hf(t_{i},y_{i}),\\y_{i+1}&=y_{i}+{\tfrac {1}{2}}h{\bigl (}f(t_{i},y_{i})+f(t_{i+1},{\tilde {y}}_{i+1}){\bigr )}.\end{aligned}}} It is also possible to evaluate the function f only once per step by using the method in Predict–Evaluate–Correct (PEC) mode: y ~ i + 1 = y i + h f ( t i , y ~ i ) , y i + 1 = y i + 1 2 h ( f ( t i , y ~ i ) + f ( t i + 1 , y ~ i + 1 ) ) . {\displaystyle {\begin{aligned}{\tilde {y}}_{i+1}&=y_{i}+hf(t_{i},{\tilde {y}}_{i}),\\y_{i+1}&=y_{i}+{\tfrac {1}{2}}h{\bigl (}f(t_{i},{\tilde {y}}_{i})+f(t_{i+1},{\tilde {y}}_{i+1}){\bigr )}.\end{aligned}}} Additionally, the corrector step can be repeated in the hope that this achieves an even better approximation to the true solution. If the corrector method is run twice, this yields the PECECE mode: y ~ i + 1 = y i + h f ( t i , y i ) , y ^ i + 1 = y i + 1 2 h ( f ( t i , y i ) + f ( t i + 1 , y ~ i + 1 ) ) , y i + 1 = y i + 1 2 h ( f ( t i , y i ) + f ( t i + 1 , y ^ i + 1 ) ) . {\displaystyle {\begin{aligned}{\tilde {y}}_{i+1}&=y_{i}+hf(t_{i},y_{i}),\\{\hat {y}}_{i+1}&=y_{i}+{\tfrac {1}{2}}h{\bigl (}f(t_{i},y_{i})+f(t_{i+1},{\tilde {y}}_{i+1}){\bigr )},\\y_{i+1}&=y_{i}+{\tfrac {1}{2}}h{\bigl (}f(t_{i},y_{i})+f(t_{i+1},{\hat {y}}_{i+1}){\bigr )}.\end{aligned}}} The PECEC mode has one fewer function evaluation than PECECE mode. More generally, if the corrector is run k times, the method is in P(EC)k or P(EC)kE mode. If the corrector method is iterated until it converges, this could be called PE(CE)∞.

    Read more →
  • Algorism

    Algorism

    Algorism is the technique of performing basic arithmetic by writing numbers in place value form and applying a set of memorized rules and facts to the digits. One who practices algorism is known as an algorist. This positional notation system has largely superseded earlier calculation systems that used a different set of symbols for each numerical magnitude, such as Roman numerals, and in some cases required a device such as an abacus. == Etymology == The word algorism comes from the name Al-Khwārizmī (c. 780–850), a Persian mathematician, astronomer, geographer and scholar in the House of Wisdom in Baghdad, whose name means "the native of Khwarezm", which is now in modern-day Uzbekistan. He wrote a treatise in Arabic language in the 9th century, which was translated into Latin in the 12th century under the title Algoritmi de numero Indorum. This title means "Algoritmi on the numbers of the Indians", where "Algoritmi" was the translator's Latinization of Al-Khwarizmi's name. Al-Khwarizmi was the most widely read mathematician in Europe in the late Middle Ages, primarily through his other book, the Algebra. In late medieval Latin, algorismus, the corruption of his name, simply meant the "decimal number system" that is still the meaning of modern English algorism. During the 17th century, the French form for the word – but not its meaning – was changed to algorithm, following the model of the word logarithm, this form alluding to the ancient Greek arithmos = number. English adopted the French very soon afterwards, but it wasn't until the late 19th century that "algorithm" took on the meaning that it has in modern English. In English, it was first used about 1230 and then by Chaucer in 1391. Another early use of the word is from 1240, in a manual titled Carmen de Algorismo composed by Alexandre de Villedieu. It begins thus: Haec algorismus ars praesens dicitur, in qua / Talibus Indorum fruimur bis quinque figuris. which translates as: This present art, in which we use those twice five Indian figures, is called algorismus. The word algorithm also derives from algorism, a generalization of the meaning to any set of rules specifying a computational procedure. Occasionally algorism is also used in this generalized meaning, especially in older texts. == History == Starting with the integer arithmetic developed in India using base 10 notation, Al-Khwārizmī along with other mathematicians in medieval Islam, documented new arithmetic methods and made many other contributions to decimal arithmetic (see the articles linked below). These included the concept of the decimal fractions as an extension of the notation, which in turn led to the notion of the decimal point. This system was popularized in Europe by Leonardo of Pisa, now known as Fibonacci.

    Read more →
  • AI data center

    AI data center

    An AI data center is a specialized data center facility designed for the computationally intensive tasks of training and running inference for artificial intelligence (AI) and machine learning models. Unlike general-purpose data centers, they are optimized for the parallel processing demands of AI workloads, typically using hardware such as AI accelerators (e.g., GPUs, TPUs) and high-speed interconnects. The global push to construct these specialized facilities accelerated dramatically during the AI boom of the 2020s. Memory manufacturers prioritized production of High Bandwidth Memory (HBM) essential for AI servers, which led to a global memory supply shortage amid a broader competition for advanced chips, power, and infrastructure. Major tech companies are estimated to spend $650 billion on AI data centers in 2026. == Architecture == Data centers for building and running large machine learning models contain specialized computer chips, GPUs, that use 2 to 4 times as much energy as their regular CPU counterparts (250-500 watts). AI data centers use 60 or more kilowatts per server rack, whereas more standard data centers typically use 5 to 10 kilowatts per rack. == Operators == As of August 2025, The Information tracked 18 planned or existing AI data centers in the United States, operated by Amazon Web Services, CoreWeave, Crusoe, Meta, Microsoft/OpenAI, Oracle, Tesla, and xAI. Other AI data center operators include Digital Realty and Alibaba. Data centers are also being built in China, India, Europe, Saudi Arabia, and Canada. The New Yorker described CoreWeave as the most prominent AI data center operator in the United States. Two types of data center providers for machine learning have been noted: hyperscalers and neoclouds. The Verge listed large technology companies such as Google, Meta, Microsoft, Oracle and Amazon as hyperscalers. The New York Times described neoclouds as "a new generation of data center providers". CoreWeave, Nebius, Nscale, and Lambda have been described as examples of neoclouds. In January 2025, OpenAI, in partnership with Oracle and Softbank, announced the Stargate project, which as of September 2025 is composed of six built or proposed AI data centers in the United States. In response to the Stargate project, Amazon launched in October 2025 an AI data center on 1,200 acres of farmland in Indiana. This data center, known as Project Rainier, is one of the largest AI data centers in the world, with Amazon spending $11 billion on the project. Rainier is specifically intended for training and running machine learning models from Anthropic. As of that time, this facility contains seven data centers (out of an estimated 30 planned) and will use 2.2 gigawatts of electricity (equivalent to 1 million households) and millions of gallons of water per year. Computer chips from Annapurna Labs and Anthropic, Trainium 2, were designed for use in such facilities. Amazon pumped millions of gallons of water out of the ground to construct the data center, and as of June 2025, Indiana state officials are investigating whether this dewatering process led to dry wells for local residents. In November 2025, Anthropic announced a plan in partnership with Fluidstack to develop artificial intelligence infrastructure in the United States, including data centers in New York and Texas, worth $50 billion. Other AI data center projects include the Colossus supercomputer from xAI, a Louisiana-based project from Meta, Hyperion, expected to use 5 GW of power, and a second Ohio-based Meta project, Prometheus, with a capacity of 1 GW. A 3,200-acre AI data center, capable of 4.4-4.5 GW of power and located on the decommissioned Homer City Generating Station, is under construction as of 2025, and will use seven 30-acre gas generating stations supplied by EQT. As of December 2025, CRH is working on over 100 data centers in the United States. In 2025, ExxonMobil and NextEra announced plans to build a data center powered by natural gas and using carbon capture technology, with 1.2 GW of power capacity. They previously purchased 2,500 acres of land in the Southeastern United States and plan to market the data center to an artificial intelligence company. The increased interest in AI data centers has led to several executives from companies in that space becoming billionaires, including CoreWeave, QTS, Nebius, Astera Labs, Groq, Fermi (which is connected to former United States Secretary of Energy Rick Perry), Snowflake and Cipher Mining. Several companies involved in cryptocurrency mining, such as Bitdeer, CoreWeave, Cipher Mining, TeraWulf, IREN, Core Scientific, and CleanSpark have also been involved with AI data centers. == Finances == Between January and August 2024, Microsoft, Meta, Google and Amazon collectively spent $125 billion on AI data centers. Citigroup forecasted that $2.8 trillion would be spent on AI data centers by 2030, while McKinsey and Company estimated that almost $7 trillion would be spent globally by that time. According to S&P Global, $61 billion has been spent on the data center market as a whole in 2025, while debt issuance for data centers was $182 billion during the same year. Large technology companies have offloaded the financial risks of building AI data centers by setting up special purpose vehicles or by contracting with neoclouds. For example, Meta's Hyperion was mostly funded by Blue Owl Capital, which did so using a bond offering from PIMCO. Those bonds were sold to a number of clients, including BlackRock. Meta did not borrow money itself and instead established a special purpose vehicle from which it would rent the data center. This deal was structured by Morgan Stanley for $30 billion, the largest known private capital transaction as of 2025. Neoclouds such as CoreWeave have gone into debt to buy computer chips from Nvidia for their data centers, and the chips themselves have been used for loan collateral. As of December 2025, CoreWeave took out three GPU-backed loans, collectively worth $12.4 billion, from private credit firms (Blackstone, Coatue, BlackRock, PIMCO) and from banks (Goldman Sachs, JPMorgan Chase, Wells Fargo). Thus, these companies provide an indirect connection between private credit and established banks. Data centers have also established asset-backed securities, and debt for data centers has its own derivative financial products. The real estate industry, including asset managers, public companies and private investors, has also invested in data centers. == Energy sourcing == == Environmental footprint == Average AI data centers have an electricity footprint equivalent to 100,000 households, and use billions of gallons of water for cooling their hardware. In 2025, the International Energy Agency estimated that the larger AI data centers currently under construction could consume as much electricity as 2 million households. A 2024 report from the United States Department of Energy stated that data centers overall used 17 billion gallons of water per year in the United States, primarily due to "rapid proliferation of AI servers", and that this usage was forecasted to grow to nearly 80 billion gallons by 2028. Researchers estimated that AI data centers in the United States would emit 24-44 million metric tons of carbon dioxide and use 731–1,125 million cubic meters of water per year between 2024 and 2030. Peaking power plants, which have been proposed as a power source for AI data centers, emit sulfur dioxide and have historically been located disproportionately near communities of color in the United States. Reciprocating internal combustion engines, proposed as another power source for a data center, emit PM 2.5, nitrogen oxides, and volatile organic compounds. == AI data centers in the United States == In the United States, both the Biden administration and second Trump administration supported the construction of AI data centers. In January 2025, then-president Joe Biden signed an executive order for federal government agencies to support AI data centers on federal sites built by private companies, study their effect on energy prices, and encourage their use of renewable energy. In April 2025, the United States Department of Energy suggested 16 possible sites, including Los Alamos National Laboratory, Sandia National Laboratories and Oak Ridge National Laboratory. In its July 2025 AI Action Plan, the second Trump administration supported increased production of AI data centers. Several US states have incentivized local data center construction. For example, in 2024, lawmakers in Michigan approved tax breaks for data center equipment and construction material. Some data center companies have also invested or promised to invest in the infrastructure of local communities. In December 2025, Democratic senators Elizabeth Warren, Chris Van Hollen, and Richard Blumenthal wrote to seven technology companies (Google, Microsoft, Amazon, Meta, CoreWeave, Digital Realty, and Equinix) that they w

    Read more →
  • Information access

    Information access

    Information access is the freedom or ability to identify, obtain and make use of database or information effectively. There are various research efforts in information access for which the objective is to simplify and make it more effective for human users to access and further process large and unwieldy amounts of data and information. == Technology == Several technologies applicable to the general area are Information Retrieval, Text Mining, Machine Translation, and Text Categorisation. During discussions on free access to information as well as on information policy, information access is understood as concerning the insurance of free and closed access to information. Information access covers many issues including copyright, open source, privacy, and security. == Groups == Groups such as the American Library Association, the American Association of Law Libraries, Ralph Nader's Taxpayers Assets Project have advocated for free access to legal information. The vendor neutral citation movement in the legal field is working to ensure that courts will accept citations from cases on the web which do not have the traditional (copyrighted) page numbers from the West Publishing company. There is a worldwide Free Access to Law Movement which advocates free access to legal information. The Wired article "Who Owns The Law" is an introduction to the access to legal information issue. Postsecondary organizations such as K-12 work to share information. They feel it is a legal and moral obligation to provide access (including to people with disabilities or impairments) to information through the services and programs they offer. Some effects of charging for information access, such as literature searches for physicians, is studied in the article "Fee or Free: The Effect of Charging on Information Demand". In this study, a $5 charge resulted in a 77% decrease in searches.

    Read more →
  • Ubiquitous robot

    Ubiquitous robot

    Ubiquitous robot is a term used in an analogous way to ubiquitous computing. Software useful for "integrating robotic technologies with technologies from the fields of ubiquitous and pervasive computing, sensor networks, and ambient intelligence". The emergence of mobile phone, wearable computers and ubiquitous computing makes it likely that human beings will live in a ubiquitous world in which all devices are fully networked. The existence of ubiquitous space resulting from developments in computer and network technology will provide motivations to offer desired services by any IT device at any place and time through user interactions and seamless applications. This shift has hastened the ubiquitous revolution, which has further manifested itself in the new multidisciplinary research area, ubiquitous robotics. It initiates the third generation of robotics following the first generation of the industrial robot and the second generation of the personal robot. Ubiquitous robot (Ubibot) is a robot incorporating three components including virtual software robot or avatar, real-world mobile robot and embedded sensor system in surroundings. Software robot within a virtual world can control a real-world robot as a brain and interact with human beings. Researchers of KAIST, Korea describe these three components as a Sobot (Software robot), Mobot (Mobile robot), and Embot (Embedded robot).

    Read more →
  • Species distribution modelling

    Species distribution modelling

    Species distribution modelling (SDM), also known as environmental (or ecological) niche modelling (ENM), habitat suitability modelling, predictive habitat distribution modelling, and range mapping uses ecological models to predict the distribution of a species across geographic space and time using environmental data. The environmental data are most often climate data (e.g. temperature, precipitation), but can include other variables such as soil type, water depth, and land cover. SDMs are used in several research areas in conservation biology, ecology and evolution. These models can be used to understand how environmental conditions influence the occurrence or abundance of a species, and for predictive purposes (ecological forecasting). Predictions from an SDM may be of a species' future distribution under climate change, a species' past distribution in order to assess evolutionary relationships, or the potential future distribution of an invasive species. Predictions of current and/or future habitat suitability can be useful for management applications (e.g. reintroduction or translocation of vulnerable species, reserve placement in anticipation of climate change). There are two main types of SDMs. Correlative SDMs, also known as climate envelope models, bioclimatic models, or resource selection function models, model the observed distribution of a species as a function of environmental conditions. Mechanistic SDMs, also known as process-based models or biophysical models, use independently derived information about a species' physiology to develop a model of the environmental conditions under which the species can exist. The extent to which such modelled data reflect real-world species distributions will depend on a number of factors, including the nature, complexity, and accuracy of the models used and the quality of the available environmental data layers; the availability of sufficient and reliable species distribution data as model input; and the influence of various factors such as barriers to dispersal, geologic history, or biotic interactions, that increase the difference between the realized niche and the fundamental niche. Environmental niche modelling may be considered a part of the discipline of biodiversity informatics. == History == A. F. W. Schimper used geographical and environmental factors to explain plant distributions in his 1898 Pflanzengeographie auf physiologischer Grundlage (Plant Geography Upon a Physiological Basis) and his 1908 work of the same name. Andrew Murray used the environment to explain the distribution of mammals in his 1866 The Geographical Distribution of Mammals. Robert Whittaker's work with plants and Robert MacArthur's work with birds strongly established the role the environment plays in species distributions. Elgene O. Box constructed environmental envelope models to predict the range of tree species. His computer simulations were among the earliest uses of species distribution modelling. The adoption of more sophisticated generalised linear models (GLMs) made it possible to create more sophisticated and realistic species distribution models. The expansion of remote sensing and the development of GIS-based environmental modelling increase the amount of environmental information available for model-building and made it easier to use. == Correlative vs mechanistic models == === Correlative SDMs === SDMs originated as correlative models. Correlative SDMs model the observed distribution of a species as a function of geographically referenced climatic predictor variables using multiple regression approaches. Given a set of geographically referred observed presences of a species and a set of climate maps, a model defines the most likely environmental ranges within which a species lives. Correlative SDMs assume that species are at equilibrium with their environment and that the relevant environmental variables have been adequately sampled. The models allow for interpolation between a limited number of species occurrences. For these models to be effective, it is required to gather observations not only of species presences, but also of absences, that is, where the species does not live. Records of species absences are typically not as common as records of presences, thus often "random background" or "pseudo-absence" data are used to fit these models. If there are incomplete records of species occurrences, pseudo-absences can introduce bias. Since correlative SDMs are models of a species' observed distribution, they are models of the realized niche (the environments where a species is found), as opposed to the fundamental niche (the environments where a species can be found, or where the abiotic environment is appropriate for the survival). For a given species, the realized and fundamental niches might be the same, but if a species is geographically confined due to dispersal limitation or species interactions, the realized niche will be smaller than the fundamental niche. Correlative SDMs are easier and faster to implement than mechanistic SDMs, and can make ready use of available data. Since they are correlative however, they do not provide much information about causal mechanisms and are not good for extrapolation. They will also be inaccurate if the observed species range is not at equilibrium (e.g. if a species has been recently introduced and is actively expanding its range). In standard SDMs, the distribution of a single species is often modeled, with unique parameters describing how environmental (abiotic) factors influence its occurrence probability. This allows for differentiated responses to environmental drivers among species, but can be problematic for data-deficient species. In contrast, similarities in environmental responses can be accounted for in multi-species SDMs, which model several species jointly using shared or hierarchically related parameters. However, neither approach explicitly accounts for community-level biotic interactions, which can be important in explaining species diversity patterns. Joint species distribution models (joint SDMs or J-SDMs) address this by modeling species co-occurrence patterns directly. The occurrence probability of a given species is thus influenced not only by abiotic drivers but also by inferred biotic associations with other species. This can improve accuracy for rarer taxa and provide insights into community ecology. Both standard SDMs and J-SDMs can be used to generate community-level metrics, such as species richness, by aggregating outputs across multiple species. These can be important for decision-making such as conservation planning. === Mechanistic SDMs === Mechanistic SDMs are more recently developed. In contrast to correlative models, mechanistic SDMs use physiological information about a species (taken from controlled field or laboratory studies) to determine the range of environmental conditions within which the species can persist. These models aim to directly characterize the fundamental niche, and to project it onto the landscape. A simple model may simply identify threshold values outside of which a species can't survive. A more complex model may consist of several sub-models, e.g. micro-climate conditions given macro-climate conditions, body temperature given micro-climate conditions, fitness or other biological rates (e.g. survival, fecundity) given body temperature (thermal performance curves), resource or energy requirements, and population dynamics. Geographically referenced environmental data are used as model inputs. Because the species distribution predictions are independent of the species' known range, these models are especially useful for species whose range is actively shifting and not at equilibrium, such as invasive species. Mechanistic SDMs incorporate causal mechanisms and are better for extrapolation and non-equilibrium situations. However, they are more labor-intensive to create than correlational models and require the collection and validation of a lot of physiological data, which may not be readily available. The models require many assumptions and parameter estimates, and they can become very complicated. Dispersal, biotic interactions, and evolutionary processes present challenges, as they aren't usually incorporated into either correlative or mechanistic models. Correlational and mechanistic models can be used in combination to gain additional insights. For example, a mechanistic model could be used to identify areas that are clearly outside the species' fundamental niche, and these areas can be marked as absences or excluded from analysis. See for a comparison between mechanistic and correlative models. == Niche models (correlative) == There are a variety of mathematical methods that can be used for fitting, selecting, and evaluating correlative SDMs. Models include "profile" methods, which are simple statistical techniques that use e.g. environmental distance to known sites of occurrence such as

    Read more →
  • Amália (LLM)

    Amália (LLM)

    Amália is a Portuguese large language model (LLM) announced in November 2024 by the Portuguese Prime-Minister Luís Montenegro. Its final version is expected to be launched in 2026. It is being developed by Center for Responsible AI (Centro para a AI Responsável) and by the research centers of NOVA School of Science and Technology and Instituto Superior Técnico. == History == In 2024 it was announced that the Portuguese Agency for Administrative Modernization (Agência para a Modernização Administrativa) transpose this LLM to Portuguese Public Administration. According to Paulo Dimas (CEO of the Center for Responsible AI) the three fundamental points of this LLM project are the linguistic variant (European Portuguese), cultural representation and data protection. In April 2025 it was announced that Amália had entered beta phase with an improved version being expected to be launched in September 2025. The beta version released in September is available only to the Public Administration, but the website launched in October reiterates the final version will be an open model.

    Read more →
  • Basic Formal Ontology

    Basic Formal Ontology

    Basic Formal Ontology (BFO) is a top-level ontology developed by Barry Smith and colleagues to promote interoperability among domain ontologies. The BFO methodology accomplishes this through a process of downward population. BFO is a formal ontology. The structure of BFO is based on a division of entities into two disjoint categories of continuant and occurrent, the former consists of objects and spatial regions, the latter contains processes conceived as extended through (or spanning) time. BFO thereby seeks to consolidate both time and space within a single framework A guide to building BFO-conformant domain ontologies was published by MIT Press in 2015. In 2021, the standard ISO/IEC 21838-2:2021 Information Technology — Top-level Ontologies (TLO) — Part 2: Basic Formal Ontology (BFO) was published by the Joint Technical Committee of the International Standards Organization and the International Electrotechnical Commission. ISO/IEC 21838 is a multi-part standard. Part 1 of the standard specifies the requirements that must be met if an ontology is to be classified as a top-level ontology by the standard. == History == BFO arose against the background of research in ontologies in the domain of geospatial information science by David Mark, Pierre Grenon, Achille Varzi and others, with a special role for the study of vagueness and of the ways sharp boundaries in the geospatial and other domains are created by fiat. BFO has passed through four major releases. 2001: release of BFO 1 2007: release of BFO 1.1 2015: release of BFO 2.0 2020: release of BFO 2020 2021: release of BFO 2020 as an ISO/IEC Standard The current revision was released in 2020, and this forms the basis of the standard ISO/IEC 21838-2, which was released by the Joint Committee of the International Standards Organization and International Electrotechnical Commission in 2021. == Applications == BFO has been adopted as a foundational ontology by over 650 ontology projects, principally in the areas of biomedical ontology, security and defense (intelligence) ontology, and industry ontologies. Example applications of BFO can be seen in the Ontology for Biomedical Investigations (OBI). In January 2024, BFO and the Common Core Ontologies (CCO), a suite of BFO-extension ontologies, were adopted as the "baseline standards for formal DOD and IC ontology" development work in the DOD and Intelligence Community. A memorandum to this effect was signed by the chief data officers of the DOD, the Office of the Director of National Intelligence and the Chief Digital and Artificial Intelligence Office.

    Read more →
  • Behavior selection algorithm

    Behavior selection algorithm

    In artificial intelligence, a behavior selection algorithm, or action selection algorithm, is an algorithm that selects appropriate behaviors or actions for one or more intelligent agents. In game artificial intelligence, it selects behaviors or actions for one or more non-player characters. Common behavior selection algorithms include: Finite-state machines Hierarchical finite-state machines Decision trees Behavior trees Hierarchical task networks Hierarchical control systems Utility systems Dialogue tree (for selecting what to say) == Related concepts == In application programming, run-time selection of the behavior of a specific method is referred to as the strategy design pattern.

    Read more →
  • Artificial empathy

    Artificial empathy

    Artificial empathy or computational empathy is the development of AI systems—such as companion robots or virtual agents—that can detect emotions and respond to them in an empathic way. Although such technology can be perceived as scary or threatening, it could also have a significant advantage over humans for roles in which emotional expression can be important, such as in the health care sector. An October 2025 review and meta-analysis in the British Medical Bulletin found that AI chatbots were rated as showing more empathy than human healthcare professionals in 13 of 15 studies that compared them. Care-givers who perform emotional labor above and beyond the requirements of paid labor can experience chronic stress or burnout, and can become desensitized to patients. Artificial empathy could also help the socialization of care-givers, or serve as role model for emotional detachment. A broader definition of artificial empathy is "the ability of nonhuman models to predict a person's internal state (e.g., cognitive, affective, physical) given the signals (s)he emits (e.g., facial expression, voice, gesture) or to predict a person's reaction (including, but not limited to internal states) when he or she is exposed to a given set of stimuli (e.g., facial expression, voice, gesture, graphics, music, etc.)". A 2025 study reported that some multimodal large language models can recognize basic facial expressions with human-level accuracy on a commonly used research dataset of posed facial expressions. == Areas of research == There are a variety of philosophical, theoretical, and applicative questions related to artificial empathy. For example: Which conditions would have to be met for a robot to respond competently to a human emotion? What models of empathy can or should be applied to Social and Assistive Robotics? Must the interaction of humans with robots imitate affective interaction between humans? Can a robot help science learn about affective development of humans? Would robots create unforeseen categories of inauthentic relations? What relations with robots can be considered authentic? How can we assess artificial empathy in AI systems? == Examples of artificial empathy research and practice == People often communicate and make decisions based on inferences about each other's internal states (e.g., emotional, cognitive, and physical states) that are in turn based on signals emitted by the person such as facial expression, body gesture, voice, and words. Broadly speaking, artificial empathy focuses on developing non-human models that achieve similar objectives using similar data. === Streams of artificial empathy research === Artificial empathy has been applied in various research disciplines, including artificial intelligence and business. Two main streams of research in this domain are: the use of nonhuman models to predict a person's internal state (e.g., cognitive, affective, physical) given the signals he or she emits (e.g., facial expression, voice, gesture) the use of nonhuman models to predict a person's reaction when he or she is exposed to a given set of stimuli (e.g., facial expression, voice, gesture, graphics, music, etc.). Research on affective computing, such as emotional speech recognition and facial expression detection, falls within the first stream of artificial empathy. Contexts that have been studied include oral interviews, call centers, human-computer interaction, sales pitches, and financial reporting. The second stream of artificial empathy has been researched more in marketing contexts, such as advertising, branding, customer reviews, in-store recommendation systems, movies, and online dating. === Artificial empathy applications in practice === With the increasing volume of visual, audio, and text data in commerce, many business applications for artificial empathy have followed. For example, Affectiva analyses viewers' facial expressions from video recordings while they are watching video advertisements in order to optimize the content design of video ads. Software like HireVue, BarRaiser, a hiring intelligence firm, helps firms make recruitment decisions by analyzing audio and video information from candidates' video interviews. Lapetus Solutions develops a model to estimate an individual's longevity, health status, and disease susceptibility from a face photo. Their technology has been applied in the insurance industry. == Artificial empathy and human services == Although artificial intelligence cannot yet replace social workers themselves, the technology has been deployed in that field. Florida State University published a study about Artificial Intelligence being used in the human services field. The research used computer algorithms to analyze health records for combinations of risk factors that could predict a future suicide attempt. The article reports, "machine learning—a future frontier for artificial intelligence—can predict with 80% to 90% accuracy whether someone will attempt suicide as far off as two years into the future. The algorithms become even more accurate as a person's suicide attempt gets closer. For example, the accuracy climbs to 92% one week before a suicide attempt when artificial intelligence focuses on general hospital patients". Such algorithmic machines can help social workers. Social work operates on a cycle of engagement, assessment, intervention, and evaluation with clients. Earlier assessment for risk of suicide can lead to earlier interventions and prevention, therefore saving lives. The system would learn, analyze, and detect risk factors, alerting the clinician of a patient's suicide risk score (analogous to a patient's cardiovascular risk score). Then, social workers could step in for further assessment and preventive intervention.

    Read more →
  • Line integral convolution

    Line integral convolution

    In scientific visualization, line integral convolution (LIC) is a method to visualize a vector field (such as fluid motion) at high spatial resolutions. The LIC technique was first proposed by Brian Cabral and Leith Casey Leedom in 1993. In LIC, discrete numerical line integration is performed along the field lines (curves) of the vector field on a uniform grid. The integral operation is a convolution of a filter kernel and an input texture, often white noise. In signal processing, this process is known as a discrete convolution. == Overview == Traditional visualizations of vector fields use small arrows or lines to represent vector direction and magnitude. This method has a low spatial resolution, which limits the density of presentable data and risks obscuring characteristic features in the data. More sophisticated methods, such as streamlines and particle tracing techniques, can be more revealing but are highly dependent on proper seed points. Texture-based methods, like LIC, avoid these problems since they depict the entire vector field at point-like (pixel) resolution. Compared to other integration-based techniques that compute field lines of the input vector field, LIC has the advantage that all structural features of the vector field are displayed, without the need to adapt the start and end points of field lines to the specific vector field. In other words, it shows the topology of the vector field. In user testing, LIC was found to be particularly good for identifying critical points. == Algorithm == === Informal description === LIC causes output values to be strongly correlated along the field lines, but uncorrelated in orthogonal directions. As a result, the field lines contrast each other and stand out visually from the background. Intuitively, the process can be understood with the following example: the flow of a vector field can be visualized by overlaying a fixed, random pattern of dark and light paint. As the flow passes by the paint, the fluid picks up some of the paint's color, averaging it with the color it has already acquired. The result is a randomly striped, smeared texture where points along the same streamline tend to have a similar color. Other physical examples include: whorl patterns of paint, oil, or foam on a river visualisation of magnetic field lines using randomly distributed iron filings fine sand being blown by strong wind === Formal mathematical description === Although the input vector field and the result image are discretized, it pays to look at it from a continuous viewpoint. Let v {\displaystyle \mathbf {v} } be the vector field given in some domain Ω {\displaystyle \Omega } . Although the input vector field is typically discretized, we regard the field v {\displaystyle \mathbf {v} } as defined in every point of Ω {\displaystyle \Omega } , i.e. we assume an interpolation. Streamlines, or more generally field lines, are tangent to the vector field in each point. They end either at the boundary of Ω {\displaystyle \Omega } or at critical points where v = 0 {\displaystyle \mathbf {v} =\mathbf {0} } . For the sake of simplicity, critical points and boundaries are ignored in the following. A field line σ {\displaystyle {\boldsymbol {\sigma }}} , parametrized by arc length s {\displaystyle s} , is defined as d σ ( s ) d s = v ( σ ( s ) ) | v ( σ ( s ) ) | . {\displaystyle {\frac {d{\boldsymbol {\sigma }}(s)}{ds}}={\frac {\mathbf {v} ({\boldsymbol {\sigma }}(s))}{|\mathbf {v} ({\boldsymbol {\sigma }}(s))|}}.} Let σ r ( s ) {\displaystyle {\boldsymbol {\sigma }}_{\mathbf {r} }(s)} be the field line that passes through the point r {\displaystyle \mathbf {r} } for s = 0 {\displaystyle s=0} . Then the image gray value at r {\displaystyle \mathbf {r} } is set to D ( r ) = ∫ − L / 2 L / 2 k ( s ) N ( σ r ( s ) ) d s {\displaystyle D(\mathbf {r} )=\int _{-L/2}^{L/2}k(s)N({\boldsymbol {\sigma }}_{\mathbf {r} }(s))ds} where k ( s ) {\displaystyle k(s)} is the convolution kernel, N ( r ) {\displaystyle N(\mathbf {r} )} is the noise image, and L {\displaystyle L} is the length of field line segment that is followed. D ( r ) {\displaystyle D(\mathbf {r} )} has to be computed for each pixel in the LIC image. If carried out naively, this is quite expensive. First, the field lines have to be computed using a numerical method for solving ordinary differential equations, like a Runge–Kutta method, and then for each pixel the convolution along a field line segment has to be calculated. The final image will normally be colored in some way. Typically, some scalar field in Ω {\displaystyle \Omega } (like the vector length) is used to determine the hue, while the grayscale LIC output determines the brightness. Different choices of convolution kernels and random noise produce different textures; for example, pink noise produces a cloudy pattern where areas of higher flow stand out as smearing, suitable for weather visualization. Further refinements in the convolution can improve the quality of the image. === Programming description === Algorithmically, LIC takes a vector field and noise texture as input, and outputs a texture. The process starts by generating in the domain of the vector field a random gray level image at the desired output resolution. Then, for every pixel in this image, the forward and backward streamline of a fixed arc length is calculated. The value assigned to the current pixel is computed by a convolution of a suitable convolution kernel with the gray levels of all the noise pixels lying on a segment of this streamline. This creates a gray level LIC image. == Versions == === Basic === Basic LIC images are grayscale images, without color and animation. While such LIC images convey the direction of the field vectors, they do not indicate orientation; for stationary fields, this can be remedied by animation. Basic LIC images do not show the length of the vectors (or the strength of the field). === Color === The length of the vectors (or the strength of the field) is usually coded in color; alternatively, animation can be used. === Animation === LIC images can be animated by using a kernel that changes over time. Samples at a constant time from the streamline would still be used, but instead of averaging all pixels in a streamline with a static kernel, a ripple-like kernel constructed from a periodic function multiplied by a Hann function acting as a window (in order to prevent artifacts) is used. The periodic function is then shifted along the period to create an animation. === Fast LIC (FLIC) === The computation can be significantly accelerated by re-using parts of already computed field lines, specializing to a box function as convolution kernel k ( s ) {\displaystyle k(s)} and avoiding redundant computations during convolution. The resulting fast LIC method can be generalized to convolution kernels that are arbitrary polynomials. === Oriented Line Integral Convolution (OLIC) === Because LIC does not encode flow orientation, it cannot distinguish between streamlines of equal direction but opposite orientation. Oriented Line Integral Convolution (OLIC) solves this issue by using a ramp-like asymmetric kernel and a low-density noise texture. The kernel asymmetrically modulates the intensity along the streamline, producing a trace that encodes orientation; the low-density of the noise texture prevents smeared traces from overlapping, aiding readability. Fast Rendering of Oriented Line Integral Convolution (FROLIC) is a variation that approximates OLIC by rendering each trace in discrete steps instead of as a continuous smear. === Unsteady Flow LIC (UFLIC) === For time-dependent vector fields (unsteady flow), a variant called Unsteady Flow LIC has been designed that maintains the coherence of the flow animation. An interactive GPU-based implementation of UFLIC has been presented. === Parallel === Since the computation of an LIC image is expensive but inherently parallel, the process has been parallelized and, with availability of GPU-based implementations, interactive on PCs. === Multidimensional === Note that the domain Ω {\displaystyle \Omega } does not have to be a 2D domain: the method is applicable to higher dimensional domains using multidimensional noise fields. However, the visualization of the higher-dimensional LIC texture is problematic; one way is to use interactive exploration with 2D slices that are manually positioned and rotated. The domain Ω {\displaystyle \Omega } does not have to be flat either; the LIC texture can be computed also for arbitrarily shaped 2D surfaces in 3D space. == Applications == This technique has been applied to a wide range of problems since it first was published in 1993, both scientific and creative, including: Representing vector fields: visualization of steady (time-independent) flows (streamlines) visual exploration of 2D autonomous dynamical systems wind mapping water flow mapping Artistic effects for image generation and stylization: pencil drawing (auto

    Read more →
  • Online analytical processing

    Online analytical processing

    In computing, online analytical processing (OLAP) (), is an approach to quickly answer multi-dimensional analytical (MDA) queries. The term OLAP was created as a slight modification of the traditional database term online transaction processing (OLTP). OLAP is part of the broader category of business intelligence, which also encompasses relational databases, report writing and data mining. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas, with new applications emerging, such as agriculture. OLAP tools enable users to analyse multidimensional data interactively from multiple perspectives. OLAP consists of three basic analytical operations: consolidation (roll-up), drill-down, and slicing and dicing. Consolidation involves the aggregation of data that can be accumulated and computed in one or more dimensions. For example, all sales offices are rolled up to the sales department or sales division to anticipate sales trends. By contrast, the drill-down is a technique that allows users to navigate through the details. For instance, users can view the sales by individual products that make up a region's sales. Slicing and dicing is a feature whereby users can take out (slicing) a specific set of data of the OLAP cube and view (dicing) the slices from different viewpoints. These viewpoints are sometimes called dimensions (such as looking at the same sales by salesperson, or by date, or by customer, or by product, or by region, etc.). Databases configured for OLAP use a multidimensional data model, allowing for complex analytical and ad hoc queries with a rapid execution time. They borrow aspects of navigational databases, hierarchical databases and relational databases. OLAP is typically contrasted to OLTP (online transaction processing), which is generally characterized by much less complex queries, in a larger volume, to process transactions rather than for the purpose of business intelligence or reporting. Whereas OLAP systems are mostly optimized for read, OLTP has to process all kinds of queries (read, insert, update and delete). == Overview of OLAP systems == At the core of any OLAP system is an OLAP cube (also called a 'multidimensional cube' or a hypercube). It consists of numeric facts called measures that are categorized by dimensions. The measures are placed at the intersections of the hypercube, which is spanned by the dimensions as a vector space. The usual interface to manipulate an OLAP cube is a matrix interface, like Pivot tables in a spreadsheet program, which performs projection operations along the dimensions, such as aggregation or averaging. The cube metadata is typically created from a star schema or snowflake schema or fact constellation of tables in a relational database. Measures are derived from the records in the fact table and dimensions are derived from the dimension tables. Each measure can be thought of as having a set of labels, or meta-data associated with it. A dimension is what describes these labels; it provides information about the measure. A simple example would be a cube that contains a store's sales as a measure, and Date/Time as a dimension. Each Sale has a Date/Time label that describes more about that sale. For example: Sales Fact Table +-------------+----------+ | sale_amount | time_id | +-------------+----------+ Time Dimension | 930.10| 1234 |----+ +---------+-------------------+ +-------------+----------+ | | time_id | timestamp | | +---------+-------------------+ +---->| 1234 | 20080902 12:35:43 | +---------+-------------------+ === Multidimensional databases === Multidimensional structure is defined as "a variation of the relational model that uses multidimensional structures to organize data and express the relationships between data". The structure is broken into cubes and the cubes are able to store and access data within the confines of each cube. "Each cell within a multidimensional structure contains aggregated data related to elements along each of its dimensions". Even when data is manipulated it remains easy to access and continues to constitute a compact database format. The data still remains interrelated. Multidimensional structure is quite popular for analytical databases that use online analytical processing (OLAP) applications. Analytical databases use these databases because of their ability to deliver answers to complex business queries swiftly. Data can be viewed from different angles, which gives a broader perspective of a problem unlike other models. === Aggregations === It has been claimed that for complex queries OLAP cubes can produce an answer in around 0.1% of the time required for the same query on OLTP relational data. The most important mechanism in OLAP which allows it to achieve such performance is the use of aggregations. Aggregations are built from the fact table by changing the granularity on specific dimensions and aggregating up data along these dimensions, using an aggregate function (or aggregation function). The number of possible aggregations is determined by every possible combination of dimension granularities. The combination of all possible aggregations and the base data contains the answers to every query which can be answered from the data. Because usually there are many aggregations that can be calculated, often only a predetermined number are fully calculated; the remainder are solved on demand. The problem of deciding which aggregations (views) to calculate is known as the view selection problem. View selection can be constrained by the total size of the selected set of aggregations, the time to update them from changes in the base data, or both. The objective of view selection is typically to minimize the average time to answer OLAP queries, although some studies also minimize the update time. View selection is NP-complete. Many approaches to the problem have been explored, including greedy algorithms, randomized search, genetic algorithms and A search algorithm. Some aggregation functions can be computed for the entire OLAP cube by precomputing values for each cell, and then computing the aggregation for a roll-up of cells by aggregating these aggregates, applying a divide and conquer algorithm to the multidimensional problem to compute them efficiently. For example, the overall sum of a roll-up is just the sum of the sub-sums in each cell. Functions that can be decomposed in this way are called decomposable aggregation functions, and include COUNT, MAX, MIN, and SUM, which can be computed for each cell and then directly aggregated; these are known as self-decomposable aggregation functions. In other cases, the aggregate function can be computed by computing auxiliary numbers for cells, aggregating these auxiliary numbers, and finally computing the overall number at the end; examples include AVERAGE (tracking sum and count, dividing at the end) and RANGE (tracking max and min, subtracting at the end). In other cases, the aggregate function cannot be computed without analyzing the entire set at once, though in some cases approximations can be computed; examples include DISTINCT COUNT, MEDIAN, and MODE; for example, the median of a set is not the median of medians of subsets. These latter are difficult to implement efficiently in OLAP, as they require computing the aggregate function on the base data, either computing them online (slow) or precomputing them for possible rollouts (large space). == Types == OLAP systems have been traditionally categorized using the following taxonomy. === Multidimensional OLAP (MOLAP) === MOLAP (multi-dimensional online analytical processing) is the classic form of OLAP and is sometimes referred to as just OLAP. MOLAP stores this data in an optimized multi-dimensional array storage, rather than in a relational database. Some MOLAP tools require the pre-computation and storage of derived data, such as consolidations – the operation known as processing. Such MOLAP tools generally utilize a pre-calculated data set referred to as a data cube. The data cube contains all the possible answers to a given range of questions. As a result, they have a very fast response to queries. On the other hand, updating can take a long time depending on the degree of pre-computation. Pre-computation can also lead to what is known as data explosion. Other MOLAP tools, particularly those that implement the functional database model do not pre-compute derived data but make all calculations on demand other than those that were previously requested and stored in a cache. Advantages of MOLAP Fast query performance due to optimized storage, multidimensional indexing and caching. Smaller on-disk size of data compared to data stored in relational database due to compression techniques. Automated computation of higher-level aggregates of the data. It is very compact for low dimension data se

    Read more →
  • Lion algorithm

    Lion algorithm

    Lion algorithm (LA) is one among the bio-inspired (or) nature-inspired optimization algorithms (or) that are mainly based on meta-heuristic principles. It was first introduced by B. R. Rajakumar in 2012 in the name, Lion’s Algorithm. It was further extended in 2014 to solve the system identification problem. This version was referred as LA, which has been applied by many researchers for their optimization problems. == Inspiration from lion’s social behaviour == Lions form a social system called a "pride", which consists of 1–3 pair of lions. A pride of lions shares a common area known as territory in which a dominant lion is called as territorial lion. The territorial lion safeguards its territory from outside attackers, especially nomadic lions. This process is called territorial defense. It protects the cubs till they become sexually matured. The maturity period is about 2–4 years. The pride undergoes survival fights to protect its territory and the cubs from nomadic lions. Upon getting defeated by the nomadic lions, the dominating nomadic lion takes the role of territorial lion by killing or driving out the cubs of the pride. The lioness of the pride give birth to cubs though the new territorial lion. When the cubs of the pride mature and considered to be stronger than the territorial lion, they take over the pride. This process is called territorial take-over. If territorial take-over happens, either the old territorial lion, which is considered to be laggard, is driven out or it leaves the pride. The stronger lions and lioness form the new pride and give birth to their own cubs == Terminology == In the LA, the terms that are associated with lion’s social system are mapped to the terminology of optimization problems. Few of such notable terms are related here. Lion: A potential solution to be generated or determined as optimal (or) near-optimal solution of the problem. The lion can be a territorial lion and lioness, cubs and nomadic lions that represent the solution based on the processing steps of the LA. Territorial lion: The strongest solution of the pride that tends to meet the objective function. Nomadic lion: A random solution, sometimes termed as nomad, to facilitate the exploration principle Laggard lion: Poor solutions that are failed in the survival fight. Pride: A pool of potential solutions i.e. a lion, lioness and their cubs, that are potential solutions of the search problem. Fertility evaluation: A process of evaluating whether the territorial lion and lioness are able to provide potential solutions in the future generations i.e. It ensures that the lion or lioness converge at every generation. Survival fight: It is a greedy selection process, which is often carried out between the pride and nomadic lion. == Algorithm == The steps involved in LA are given below: Pride Generation: Generate X m a l e {\displaystyle X^{male}} , X f e m a l e {\displaystyle X^{female}} and X 1 n o m a d {\displaystyle X_{1}^{nomad}} Determine f ( X m a l e ) {\displaystyle f(X^{male})} , f ( X f e m a l e ) {\displaystyle f(X^{female})} , f ( X 1 n o m a d ) {\displaystyle f(X_{1}^{nomad})} Initialize f r e f {\displaystyle f^{ref}} as f ( X m a l e ) {\displaystyle f(X^{male})} and N g {\displaystyle N_{g}} as 0 Memorize X m a l e {\displaystyle X^{male}} and X f e m a l e {\displaystyle X^{female}} Apply Fertility evaluation Process Generation of cubpool by mating Gender clustering: Define X c u b m a l e {\displaystyle X_{cub}^{male}} and X c u b f e m a l e {\displaystyle X_{cub}^{female}} Initialize a g e c u b {\displaystyle age_{cub}} as zero Apply Cub growth function Territorial defense: If X m a l e {\displaystyle X^{male}} (or pride) fails in the survival fight i.e. X 1 n o m a d {\displaystyle X_{1}^{nomad}} defeats the pride, go to step 4, else continue Increase a g e c u b {\displaystyle age_{cub}} by 1 and check whether cub attains maturity i.e., if a g e c u b > a g e m a x {\displaystyle age_{cub}>age_{max}} , go to Step 9, else continue Territorial takeover: If X c u b m a l e {\displaystyle X_{cub}^{male}} and X c u b f e m a l e {\displaystyle X_{cub}^{female}} are found to be closer to optimal solution, update X m a l e {\displaystyle X^{male}} and X f e m a l e {\displaystyle X^{female}} Increment N g {\displaystyle N_{g}} by 1 Repeat from Step 5, if termination criterion is not violated, else return X m a l e {\displaystyle X^{male}} as the near-optimal solution == Variants == The LA has been further taken forward to adopt in different problem areas. According to the characteristics of the problem area, significant amendment has been done in the processes and the models used in the LA. Accordingly, diverse variants have been developed by the researchers. They can be broadly grouped as hybrid LAs and non-hybrid LAs. Hybrid LAs are the LAs that are amended by the principle of other meta-heuristics, whereas the Non-hybrid LAs take any scientific amendment inside its operation that are felt to be essential to attend the respective problem area. == Applications == LA is applied in diverse engineering applications that range from network security, text mining, image processing, electrical systems, data mining and many more. Few of the notable applications are discussed here. Networking applications: In WSN, LA is used to solve the cluster head selection problem by determining optimal cluster head. Route discovery problem in both the VANET and MANET are also addressed by the LA in the literature. It is also used to detect attacks in advanced networking scenarios such as Software-Defined Networks (SDN) Power Systems: LA has attended generation rescheduling problem in a deregulated environment, optimal localization and sizing of FACTS devices for power quality enhancement and load-frequency controlling problem Cloud computing: LA is used in optimal container-resource allocation problem in cloud environment and cloud security

    Read more →