AI Chat On Google

AI Chat On Google — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Voice search

    Voice search

    Voice search, also called voice-enabled search, allows the user to use a voice to search the Internet, a website, or an app. In a broader definition, voice search includes open-domain keyword query on any information on the Internet, for example in Google Voice Search, Cortana, Siri and Amazon Echo. Voice search is often interactive, involving several rounds of interaction that allows a system to ask for clarification. Voice search is a type of dialog system. Voice search is not a replacement for typed search. Rather the search terms, experience and use cases can differ heavily depending on the input type. == Supported language == Language is the most essential factor for a system to understand, and provide the most accurate results of what the user searches. This covers across languages, dialects, and accents, as users want a voice assistant that both understands them and speaks to them understandably. While spoken and written languages differ, voice search should support natural spoken language instead of only transforming voice into text and doing a regular text search with the help speech recognition. For example, in typed search an eCommerce user can easily copy and paste an alphanumeric product code to search field, but when speaking the search terms can be very different, such as "show me the new Bluetooth headphones by Samsung". == How it works == The difference between text and voice search is not only the input type. The mechanism must include an automatic speech recognition (ASR) for input, but it can also include natural language understanding for natural spoken search queries such as "What's the population for the United States" It can include text-to-speech (TTS) or a regular display for output modalities. Users might sometimes be required to activate the search by using a wake word. Then, the search system will detect the language spoken by the user. It will then detect the keywords and context of the sentence. Lastly, the device will return results depending on its output. A device with a screen might display the results, while a device without a screen will speak them back to the searcher.

    Read more →
  • Very large database

    Very large database

    A very large database, (originally written very large data base) or VLDB, is a database that contains a very large amount of data, so much that it can require specialized architectural, management, processing and maintenance methodologies. == Definition == The vague adjectives of very and large allow for a broad and subjective interpretation, but attempts at defining a metric and threshold have been made. Early metrics were the size of the database in a canonical form via database normalization or the time for a full database operation like a backup. Technology improvements have continually changed what is considered very large. One definition has suggested that a database has become a VLDB when it is "too large to be maintained within the window of opportunity… the time when the database is quiet". == Sizes of a VLDB database == There is no absolute amount of data that can be cited. For example, one cannot say that any database with more than 1 TB of data is considered a VLDB. This absolute amount of data has varied over time as computer processing, storage and backup methods have become better able to handle larger amounts of data. That said, VLDB issues may start to appear when 1 TB is approached, and are more than likely to have appeared as 30 TB or so is exceeded. == VLDB challenges == Key areas where a VLDB may present challenges include configuration, storage, performance, maintenance, administration, availability and server resources. === Configuration === Careful configuration of databases that lie in the VLDB realm is necessary to alleviate or reduce issues raised by VLDB databases. === Administration === The complexities of managing a VLDB can increase exponentially for the database administrator as database size increases. === Availability and maintenance === When dealing with VLDB operations relating to maintenance and recovery such as database reorganizations and file copies which were quite practical on a non-VLDB take very significant amounts of time and resources for a VLDB database. In particular it typically infeasible to meet a typical recovery time objective (RTO), the maximum expected time a database is expected to be unavailable due to interruption, by methods which involve copying files from disk or other storage archives. To overcome these issues techniques such as clustering, cloned/replicated/standby databases, file-snapshots, storage snapshots or a backup manager may help achieve the RTO and availability, although individual methods may have limitations, caveats, license, and infrastructure requirements while some may risk data loss and not meet the recovery point objective (RPO). For many systems only geographically remote solutions may be acceptable. ==== Backup and recovery ==== Best practice is for backup and recovery to be architectured in terms of the overall availability and business continuity solution. === Performance === Given the same infrastructure there may typically be a decrease in performance, that is increase in response time as database size increases. Some accesses will simply have more data to process (scan) which will take proportionally longer (linear time); while the indexes used to access data may grow slightly in height requiring perhaps an extra storage access to reach the data (sub-linear time). Other effects can be caching becoming less efficient because proportionally less data can be cached and while some indexes such as the B+ automatically sustain well with growth others such as a hash table may need to be rebuilt. Should an increase in database size cause the number of accessors of the database to increase then more server and network resources may be consumed, and the risk of contention will increase. Some solutions to regaining performance include partitioning, clustering, possibly with sharding, or use of a database machine. ==== Partitioning ==== Partitioning may be able assist the performance of bulk operations on a VLDB including backup and recovery., bulk movements due to information lifecycle management (ILM), reducing contention as well as allowing optimization of some query processing. === Storage === In order to satisfy needs of a VLDB the database storage needs to have low access latency and contention, high throughput, and high availability. === Server resources === The increasing size of a VLDB may put pressure on server and network resources and a bottleneck may appear that may require infrastructure investment to resolve. == Relationship to big data == VLDB is not the same as big data, but the storage aspect of big data may involve a VLDB database. That said some of the storage solutions supporting big data were designed from the start to support large volumes of data, so database administrators may not encounter VLDB issues that older versions of traditional RDBMS's might encounter.

    Read more →
  • Basic Formal Ontology

    Basic Formal Ontology

    Basic Formal Ontology (BFO) is a top-level ontology developed by Barry Smith and colleagues to promote interoperability among domain ontologies. The BFO methodology accomplishes this through a process of downward population. BFO is a formal ontology. The structure of BFO is based on a division of entities into two disjoint categories of continuant and occurrent, the former consists of objects and spatial regions, the latter contains processes conceived as extended through (or spanning) time. BFO thereby seeks to consolidate both time and space within a single framework A guide to building BFO-conformant domain ontologies was published by MIT Press in 2015. In 2021, the standard ISO/IEC 21838-2:2021 Information Technology — Top-level Ontologies (TLO) — Part 2: Basic Formal Ontology (BFO) was published by the Joint Technical Committee of the International Standards Organization and the International Electrotechnical Commission. ISO/IEC 21838 is a multi-part standard. Part 1 of the standard specifies the requirements that must be met if an ontology is to be classified as a top-level ontology by the standard. == History == BFO arose against the background of research in ontologies in the domain of geospatial information science by David Mark, Pierre Grenon, Achille Varzi and others, with a special role for the study of vagueness and of the ways sharp boundaries in the geospatial and other domains are created by fiat. BFO has passed through four major releases. 2001: release of BFO 1 2007: release of BFO 1.1 2015: release of BFO 2.0 2020: release of BFO 2020 2021: release of BFO 2020 as an ISO/IEC Standard The current revision was released in 2020, and this forms the basis of the standard ISO/IEC 21838-2, which was released by the Joint Committee of the International Standards Organization and International Electrotechnical Commission in 2021. == Applications == BFO has been adopted as a foundational ontology by over 650 ontology projects, principally in the areas of biomedical ontology, security and defense (intelligence) ontology, and industry ontologies. Example applications of BFO can be seen in the Ontology for Biomedical Investigations (OBI). In January 2024, BFO and the Common Core Ontologies (CCO), a suite of BFO-extension ontologies, were adopted as the "baseline standards for formal DOD and IC ontology" development work in the DOD and Intelligence Community. A memorandum to this effect was signed by the chief data officers of the DOD, the Office of the Director of National Intelligence and the Chief Digital and Artificial Intelligence Office.

    Read more →
  • Very large database

    Very large database

    A very large database, (originally written very large data base) or VLDB, is a database that contains a very large amount of data, so much that it can require specialized architectural, management, processing and maintenance methodologies. == Definition == The vague adjectives of very and large allow for a broad and subjective interpretation, but attempts at defining a metric and threshold have been made. Early metrics were the size of the database in a canonical form via database normalization or the time for a full database operation like a backup. Technology improvements have continually changed what is considered very large. One definition has suggested that a database has become a VLDB when it is "too large to be maintained within the window of opportunity… the time when the database is quiet". == Sizes of a VLDB database == There is no absolute amount of data that can be cited. For example, one cannot say that any database with more than 1 TB of data is considered a VLDB. This absolute amount of data has varied over time as computer processing, storage and backup methods have become better able to handle larger amounts of data. That said, VLDB issues may start to appear when 1 TB is approached, and are more than likely to have appeared as 30 TB or so is exceeded. == VLDB challenges == Key areas where a VLDB may present challenges include configuration, storage, performance, maintenance, administration, availability and server resources. === Configuration === Careful configuration of databases that lie in the VLDB realm is necessary to alleviate or reduce issues raised by VLDB databases. === Administration === The complexities of managing a VLDB can increase exponentially for the database administrator as database size increases. === Availability and maintenance === When dealing with VLDB operations relating to maintenance and recovery such as database reorganizations and file copies which were quite practical on a non-VLDB take very significant amounts of time and resources for a VLDB database. In particular it typically infeasible to meet a typical recovery time objective (RTO), the maximum expected time a database is expected to be unavailable due to interruption, by methods which involve copying files from disk or other storage archives. To overcome these issues techniques such as clustering, cloned/replicated/standby databases, file-snapshots, storage snapshots or a backup manager may help achieve the RTO and availability, although individual methods may have limitations, caveats, license, and infrastructure requirements while some may risk data loss and not meet the recovery point objective (RPO). For many systems only geographically remote solutions may be acceptable. ==== Backup and recovery ==== Best practice is for backup and recovery to be architectured in terms of the overall availability and business continuity solution. === Performance === Given the same infrastructure there may typically be a decrease in performance, that is increase in response time as database size increases. Some accesses will simply have more data to process (scan) which will take proportionally longer (linear time); while the indexes used to access data may grow slightly in height requiring perhaps an extra storage access to reach the data (sub-linear time). Other effects can be caching becoming less efficient because proportionally less data can be cached and while some indexes such as the B+ automatically sustain well with growth others such as a hash table may need to be rebuilt. Should an increase in database size cause the number of accessors of the database to increase then more server and network resources may be consumed, and the risk of contention will increase. Some solutions to regaining performance include partitioning, clustering, possibly with sharding, or use of a database machine. ==== Partitioning ==== Partitioning may be able assist the performance of bulk operations on a VLDB including backup and recovery., bulk movements due to information lifecycle management (ILM), reducing contention as well as allowing optimization of some query processing. === Storage === In order to satisfy needs of a VLDB the database storage needs to have low access latency and contention, high throughput, and high availability. === Server resources === The increasing size of a VLDB may put pressure on server and network resources and a bottleneck may appear that may require infrastructure investment to resolve. == Relationship to big data == VLDB is not the same as big data, but the storage aspect of big data may involve a VLDB database. That said some of the storage solutions supporting big data were designed from the start to support large volumes of data, so database administrators may not encounter VLDB issues that older versions of traditional RDBMS's might encounter.

    Read more →
  • Zesta

    Zesta

    Zesta is an online food ordering and delivery platform operating across the African region. Formerly known as Square Eats, the company rebranded to Zesta in 2025. Zesta connects customers with restaurants and stores, offering delivery services for food, groceries, parcel delivery and other essentials. == History == Zesta was originally founded as Square Eats in 2020 by twin brothers Henry Newman and Randall Newman when they were 21 years old. It was launched in Gaborone, Botswana, and quickly gained traction as a leading food delivery service in the country. The company halted operations and took a strategic decision to reinvent the business in 2022. In 2025, the company announced its rebranding to Zesta, highlighting its commitment to evolving beyond food delivery to become a super app. === COVID-19 initiative === During the COVID-19 pandemic, Zesta (then Square Eats) implemented measures to ensure safety and hygiene, including providing free gloves and hand sanitizer to drivers and introducing contactless delivery options. These efforts positioned the platform as a trusted service during the pandemic. == Service == Zesta facilitates delivery from a wide range of merchant partners via a smartphone app, available on iOS and Android platforms, or through its website. Customers can browse their favorite restaurants, place orders, and have meals delivered to their doorstep efficiently.

    Read more →
  • TurboQuant

    TurboQuant

    TurboQuant is an online vector quantization algorithm for compressing high-dimensional Euclidean vectors while preserving their geometric structure. It was proposed in 2025 by Amir Zandieh, Majid Daliri, Majid Hadian, and Vahab Mirrokni in the paper TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate. The paper lists Zandieh and Mirrokni as affiliated with Google Research, Daliri with New York University, and Hadian with Google DeepMind. The method was developed for applications including large language model (LLM) inference, key–value (KV) cache compression, vector databases, and nearest neighbor search. TurboQuant consists of two related algorithms: TurboQuantmse, which is optimized for mean squared error (MSE), and TurboQuantprod, which is optimized for unbiased inner product estimation. The algorithm uses a random rotation of input vectors, applies scalar quantizers to the rotated coordinates, and, for inner-product estimation, applies a one-bit Quantized Johnson–Lindenstrauss (QJL) transform to the residual error. == Background == Vector quantization is a compression method that maps high-dimensional vectors to a finite set of codewords. The problem has roots in Shannon's source coding theory and rate–distortion theory. In machine learning and information retrieval, vector quantization is used to reduce the memory required to store embeddings, activation vectors, and other numerical representations. In Transformer-based large language models, the KV cache stores key and value vectors from previous tokens during autoregressive decoding. The size of this cache grows with context length, the number of attention heads, and the number of concurrent requests, making it a major memory bottleneck in LLM serving. Similar compression problems appear in vector search, where large collections of embedding vectors must be stored and searched efficiently. Earlier approaches to vector quantization include product quantization, scalar quantization, and data-dependent k-means codebook construction. The TurboQuant paper argues that many existing methods either require offline preprocessing and calibration or suffer from suboptimal distortion guarantees in online settings. == Algorithm == === TurboQuantmse === TurboQuantmse is the version of the algorithm optimized for mean-squared error. For a unit vector x ∈ S d − 1 {\displaystyle x\in S^{d-1}} , the algorithm first applies a random rotation matrix Π ∈ R d × d {\displaystyle \Pi \in \mathbb {R} ^{d\times d}} and sets z = Π x {\displaystyle z=\Pi x} . Each coordinate of the rotated vector follows a shifted and scaled beta distribution, which converges to a normal distribution in high dimensions. In high dimensions, distinct coordinates also become nearly independent, allowing the algorithm to apply scalar quantizers independently to each coordinate. The scalar quantizer is constructed by solving a one-dimensional continuous k-means or Lloyd–Max quantization problem. If the centroids are c 1 , c 2 , … , c 2 b {\displaystyle c_{1},c_{2},\ldots ,c_{2^{b}}} , the quantization step stores, for each coordinate, i d x j = ⁡ a r g m i n k ∈ [ 2 b ] | z j − c k | . {\displaystyle \mathrm {idx} _{j}=\operatorname {} {arg\,min}_{k\in [2^{b}]}|z_{j}-c_{k}|.} During dequantization, the stored index for each coordinate is replaced by the corresponding centroid, giving a reconstructed rotated vector z ~ {\displaystyle {\tilde {z}}} . The algorithm then rotates back: x ~ = Π ⊤ z ~ . {\displaystyle {\tilde {x}}=\Pi ^{\top }{\tilde {z}}.} The paper gives the following bound for TurboQuantmse: D m s e ≤ 3 π 2 ⋅ 1 4 b . {\displaystyle D_{\mathrm {mse} }\leq {\frac {\sqrt {3\pi }}{2}}\cdot {\frac {1}{4^{b}}}.} It also reports finer-grained MSE values of approximately 0.36, 0.117, 0.03, and 0.009 for bit-widths b = 1 , 2 , 3 , 4 {\displaystyle b=1,2,3,4} , respectively. === TurboQuantprod === TurboQuantprod is optimized for unbiased inner-product estimation. The authors note that an MSE-optimized quantizer may introduce bias when used to estimate inner products. To address this, TurboQuantprod first applies TurboQuantmse with bit-width b − 1 {\displaystyle b-1} , then applies a one-bit Quantized Johnson–Lindenstrauss transform to the remaining residual vector. Let r = x − Q m s e − 1 ( Q m s e ( x ) ) {\displaystyle r=x-Q_{\mathrm {mse} }^{-1}(Q_{\mathrm {mse} }(x))} be the residual after MSE quantization, and let γ = ‖ r ‖ 2 {\displaystyle \gamma =\|r\|_{2}} . The QJL step stores a sign vector for the residual. For γ ≠ 0 {\displaystyle \gamma \neq 0} , this can be written using the normalized residual u = r / γ {\displaystyle u=r/\gamma } : q j l = sign ⁡ ( S u ) , {\displaystyle qjl=\operatorname {sign} (Su),} where S ∈ R d × d {\displaystyle S\in \mathbb {R} ^{d\times d}} is a random projection matrix. Since the sign function is invariant under positive rescaling, this is equivalent to sign ⁡ ( S r ) {\displaystyle \operatorname {sign} (Sr)} when r ≠ 0 {\displaystyle r\neq 0} . If γ = 0 {\displaystyle \gamma =0} , the residual correction is zero. TurboQuantprod stores the MSE quantization, the QJL sign vector, and the residual norm: Q p r o d ( x ) = [ Q m s e ( x ) , q j l , γ ] . {\displaystyle Q_{\mathrm {prod} }(x)=\left[Q_{\mathrm {mse} }(x),qjl,\gamma \right].} The dequantized vector is reconstructed as x ~ = x ~ m s e + π / 2 d γ S ⊤ q j l . {\displaystyle {\tilde {x}}={\tilde {x}}_{\mathrm {mse} }+{\frac {\sqrt {\pi /2}}{d}}\,\gamma S^{\top }qjl.} The paper proves that TurboQuantprod is unbiased for inner-product estimation: E x ~ [ ⟨ y , x ~ ⟩ ] = ⟨ y , x ⟩ . {\displaystyle \mathbb {E} _{\tilde {x}}\left[\langle y,{\tilde {x}}\rangle \right]=\langle y,x\rangle .} It also gives the distortion bound D p r o d ≤ 3 π 2 ⋅ ‖ y ‖ 2 2 d ⋅ 1 4 b . {\displaystyle D_{\mathrm {prod} }\leq {\frac {\sqrt {3\pi }}{2}}\cdot {\frac {\|y\|_{2}^{2}}{d}}\cdot {\frac {1}{4^{b}}}.} == Performance and applications == The TurboQuant paper reports that the algorithm achieves near-optimal distortion rates within a small constant factor of information-theoretic lower bounds. The authors report that, for KV cache quantization, TurboQuant achieved quality neutrality at 3.5 bits per channel and marginal degradation at 2.5 bits per channel. In long-context LLM experiments using Llama 3.1 8B Instruct, the paper evaluated the method on a "needle-in-a-haystack" retrieval task with document lengths from 4,000 to 104,000 tokens. It reported that TurboQuant matched the uncompressed full-precision baseline while using more than 4× compression, and compared the method against PolarQuant, SnapKV, PyramidKV, and KIVI. Google Research stated that TurboQuant was evaluated on long-context benchmarks including LongBench, Needle in a Haystack, ZeroSCROLLS, RULER, and L-Eval using open-source models including Gemma and Mistral. According to a report in Tom's Hardware, Google described the method as reducing KV-cache memory by at least six times and achieving up to an eightfold improvement in attention-logit computation on Nvidia H100 GPUs compared with unquantized 32-bit keys. TurboQuant has also been applied to nearest-neighbor vector search. The original paper reports experiments on DBpedia entity embeddings and GloVe embeddings, comparing TurboQuant with product quantization and other vector-search quantization baselines. == Relationship to other methods == TurboQuant is related to several methods for efficient large language model inference and high-dimensional search: Product quantization – a vector quantization technique widely used for approximate nearest-neighbor search Quantization (machine learning) – reducing the numerical precision of weights, activations, or cached tensors in machine learning models PagedAttention – a memory-management algorithm for LLM serving that reduces fragmentation in the KV cache Johnson–Lindenstrauss lemma – a result in high-dimensional geometry used in random projection methods Lloyd's algorithm – an algorithm for scalar and vector quantization, including k-means-style codebook construction Unlike PagedAttention, which focuses on memory allocation and cache layout, TurboQuant reduces the numerical storage cost of the vectors themselves. Unlike many product-quantization methods, TurboQuant is designed to be data-oblivious and online, avoiding dataset-specific codebook training. == Limitations == The strongest performance claims for TurboQuant come from the original paper and Google Research's own publication. Coverage in technology media has noted that the broader impact of the method will depend on real-world implementation details, workloads, and hardware architectures.

    Read more →
  • Voiceverse NFT plagiarism scandal

    Voiceverse NFT plagiarism scandal

    In January 2022, 15—the pseudonymous Massachusetts Institute of Technology (MIT) artificial intelligence researcher and creator of the non-commercial generative artificial intelligence voice synthesis research project 15.ai—discovered that the blockchain-based technology company Voiceverse had plagiarized from their platform. Voiceverse marketed itself as a service that offered AI voice cloning technology that could be purchased and traded as non-fungible tokens (NFTs). Amid heightened controversy over NFTs in the gaming industry, voice actor Troy Baker (who has been described as one of the most famous voice actors in video games) announced his partnership with Voiceverse on January 14, 2022, triggering immediate backlash over concerns about the environmental impact of NFTs, potential for fraud, predatory monetization in video games, and the potential of AI displacing jobs for human voice actors. Later that same day, 15 revealed through server logs that Voiceverse had generated voice lines using 15's free text-to-speech platform, pitch-shifted the audio to make them unrecognizable, and falsely marketed the samples as their own technology before selling them as NFTs. Within an hour of being confronted with evidence, Voiceverse confessed and stated that their marketing team had used 15.ai without proper attribution while rushing to create a technology demo to coincide with Baker's partnership announcement, further exacerbating the already negative reception to the original announcement. In response, 15 replied "Go fuck yourself"; the interaction went viral and garnered a large amount of support for the developer. News publications universally characterized this incident as Voiceverse having "stolen" from 15.ai. The next day, Baker appeared on a podcast and stated that his motivation had been to help independent creators who were unable to afford professional voice actors. Following continued backlash and the plagiarism revelation, Baker ended his partnership with Voiceverse on January 31, 2022. Subsequently, the incident was documented in multiple AI ethics databases, criticisms of predatory monetization in video games, and retrospectives as one of the earliest instances of plagiarism and theft stemming from artificial intelligence during the AI boom. == Background == === Troy Baker === Troy Baker is a prominent voice actor in the video game industry best known for his performances as Joel Miller in The Last of Us franchise. Baker has been described as "ubiquitous" by Polygon, "one of the most high-profile and prolific voice actors in video games" by Eurogamer, and "arguably the most famous voice actor in the gaming industry" by GameGuru. His other prominent roles include voicing Agent John "Jonesy" Jones in Fortnite, Booker DeWitt in BioShock Infinite, and both Batman and Joker in multiple Batman video games. As of October 2025, Baker holds the record for the most acting nominations at the BAFTA Games Awards, with five between 2013 and 2021. === Voiceverse === Voiceverse is a blockchain-based startup founded by the Bored Ape Yacht Club that marketed itself as offering AI voice cloning technology in the form of NFTs. Prior to the announcement of their partnership with Baker, Voiceverse had partnered with LOVO, Inc., an AI voice platform that, according to LOVO, could generate human-like voices. Voiceverse stated that any user who purchases a voice NFT would have unlimited and perpetual access to the voice model, which could be used to create content such as audiobooks, YouTube videos, podcasts, e-learning materials, in-game voice chat, and Zoom calls. Voiceverse promised that buyers would "OWN [sic] all of the IP" of content they created using these voices. Voiceverse's roadmap included plans to release 8,888 initial voice NFTs, a feature to add emotions to existing voices, and the ability for users to mint their own voices as NFTs. Prior to Baker's partnership, Voiceverse had also partnered with voice actors Charlet Chung, who voices D.Va in Overwatch, and Andy Milonakis of The Andy Milonakis Show. === 15.ai === 15.ai is a free web application launched in 2020 that uses artificial intelligence to generate text-to-speech voices of fictional characters from popular media. Created by a pseudonymous artificial intelligence researcher known as 15, who began developing the technology as a freshman during their undergraduate research at MIT, it was an early example of an application of generative artificial intelligence during the initial stages of the AI boom. The platform showed that deep neural networks could generate emotionally expressive speech with only 15 seconds of speech; the name "15.ai" references the creator's statement that a voice can be convincingly cloned with just 15 seconds of audio, as opposed to the tens of hours of data previously required. 15.ai became an Internet phenomenon in early 2021 when content utilizing it went viral on social media and quickly gained widespread use among various Internet fandoms. 15 has emphasized that it remain free and non-commercial; it only requires users to give proper credit when using the service for content creation. === NFTs in the video game industry === By early 2022, NFTs had become highly controversial within the gaming industry. Critics raised concerns about their environmental impact due to the significant energy consumption of blockchain technology. In addition, the prevalence of scams, fraud, and potential money laundering associated with NFT sales, as well as fears that NFTs were a new form of predatory monetization following the increasing frequency of loot boxes, caused vocal pushback from the gaming community. Several major gaming companies had begun exploring NFT integration into their products, though fan backlash had already forced some projects to be cancelled. On December 16, 2021, the developers of S.T.A.L.K.E.R. 2: Heart of Chernobyl announced that they would be including NFTs in the game, but cancelled within an hour of the announcement due to immediate universal backlash. Simultaneously, the rise of AI voice technology raised concerns among voice actors about potential job displacement and the devaluation of their work amidst the voice acting industry's ongoing struggles for better compensation and working conditions. == Partnership announcement and backlash == On January 14, 2022, 1:02 a.m. EST, Baker announced on Twitter that he was partnering with Voiceverse "to explore ways where together we might bring new tools to new creators to make new things, and allow everyone a chance to own & invest in the IP's they create." The announcement concluded with the statement "You can hate. Or you can create." Baker's specific role with Voiceverse remained unclear at the time of the announcement. Along with Baker's announcement, Voiceverse promoted their supposed voice AI technology on Twitter by posting animated videos that featured a cat character created by NFT firm Chubbiverse. The videos concluded with text that read "The Voice Powered By Voiceverse"; Voiceverse stated on Twitter that the voices in the animations had been generated using their own AI voice synthesis technology and presented the videos as a technology demonstration of their voice NFT capabilities. The announcement provoked immediate and widespread backlash from the gaming community. Baker's tweet received thousands of replies and quote retweets (the vast majority of which were negative), far more than the number of likes; Michael McWhertor of Polygon described it as a "textbook example of being ratioed" and commented that reactions had been amplified by the final part of Baker's announcement. Michael Beckwith of Metro called Baker's approach "bizarrely aggressive". Later that day, Baker responded to the backlash by apologizing for his choice of words. He said he appreciated people's thoughts and acknowledged that the "hate/create part might have been a bit antagonistic," calling it a "bad attempt to bring levity". Despite the apology, Baker and his fellow voice actors did not distance themselves from Voiceverse at this point. At the same time, Voiceverse attempted to address the criticisms, stating that they were working to move to more environmentally friendly blockchain technology and that voice actors would receive royalties from NFT sales, with actors benefiting from any increase in NFT value. == Plagiarism revelation == On December 13, 2021, amidst the increasingly negative reactions toward NFTs among the general public, the creator of 15.ai (known pseudonymously as 15) announced that they had "no interest in incorporating NFTs into any aspect of [their] work." On January 14, 2022, 11:17 a.m. EST (10 hours after Baker's initial announcement), 15 commented on the Voiceverse venture, stating that it "sounds like a scam". Two hours later, at 1:20 p.m., 15 explicitly accused Voiceverse of "actively attempting to appropriate [15's] work for [Voiceverse's] own benefit." 15 provided evidence through

    Read more →
  • Virtual directory

    Virtual directory

    In computing, the term virtual directory has a couple of meanings. It may simply designate (for example in IIS) a folder which appears in a path but which is not actually a subfolder of the preceding folder in the path. However, this article will discuss the term in the context of directory services and identity management. A virtual directory or virtual directory server (VDS) in this context is a software layer that delivers a single access point for identity management applications and service platforms. A virtual directory operates as a high-performance, lightweight abstraction layer that resides between client applications and disparate types of identity-data repositories, such as proprietary and standard directories, databases, web services, and applications. A virtual directory receives queries and directs them to the appropriate data sources by abstracting and virtualizing data. The virtual directory integrates identity data from multiple heterogeneous data stores and presents it as though it were coming from one source. This ability to reach into disparate repositories makes virtual directory technology ideal for consolidating data stored in a distributed environment. As of 2011, virtual directory servers most commonly use the LDAP protocol, but more sophisticated virtual directories can also support SQL as well as DSML and SPML. Industry experts have heralded the importance of the virtual directory in modernizing the identity infrastructure. According to Dave Kearns of Network World, "Virtualization is hot and a virtual directory is the building block, or foundation, you should be looking at for your next identity management project." In addition, Gartner analyst, Bob Blakley said that virtual directories are playing an increasingly vital role. In his report, “The Emerging Architecture of Identity Management,” Blakley wrote: “In the first phase, production of identities will be separated from consumption of identities through the introduction of a virtual directory interface.” == Capabilities == Virtual directories can have some or all of the following capabilities: Aggregate identity data across sources to create a single point of access. Create high-availability for authoritative data stores. Act as identity firewall by preventing denial-of-service attacks on the primary data stores through an additional virtual layer. Support a common searchable namespace for centralized authentication. Present a unified virtual view of user information stored across multiple systems. Delegate authentication to backend sources through source-specific security means. Virtualize data sources to support migration from legacy data stores without modifying the applications that rely on them. Enrich identities with attributes pulled from multiple data stores, based on a link between user entries. Some advanced identity virtualization platforms can also: Enable application-specific, customized views of identity data without violating internal or external regulations governing identity data. Reveal contextual relationships between objects through hierarchical directory structures. Develop advanced correlation across diverse sources using correlation rules. Build a global user identity by correlating unique user accounts across various data stores, and enrich identities with attributes pulled from multiple data stores, based on a link between user entries. Enable constant data refresh for real-time updates through a persistent cache. == Advantages == Virtual directories: Enable faster deployment because users do not need to add and sync additional application-specific data sources Leverage existing identity infrastructure and security investments to deploy new services Deliver high availability of data sources Provide application-specific views of identity data which can help avoid the need to develop a master enterprise schema Allow a single view of identity data without violating internal or external regulations governing identity data Act as identity firewalls by preventing denial-of-service attacks on the primary data-stores and providing further security on access to sensitive data Can reflect changes made to authoritative sources in real-time Leverages existing update processes of authoritative sources, so no separate (sometimes manual) process to update a central directory is needed Present a unified virtual view of user information from multiple systems so that it appears to reside in a single system Can secure all backend storage locations with a single security policy == Disadvantages == An original disadvantage is public perception of "push & pull technologies" which is the general classification of "virtual directories" depending on the nature of their deployment. Virtual directories were initially designed and later deployed with "push technologies" in mind, which also contravened with privacy laws of the United States. This is no longer the case. There are, however, other disadvantages in the current technologies. The classical virtual directory based on proxy cannot modify underlying data structures or create new views based on the relationships of data from across multiple systems. So if an application requires a different structure, such as a flattened list of identities, or a deeper hierarchy for delegated administration, a virtual directory is limited. Many virtual directories cannot correlate same-users across multiple diverse sources in the case of duplicate users Virtual directories without advanced caching technologies cannot scale to heterogeneous, high-volume environments. == Sample terminology == Unify metadata: Extract schemas from the local data source, map them to a common format, and link the same identities from different data silos based on a unique identifier. Namespace joining: Create a single large directory by bringing multiple directories together at the namespace level. For instance, if one directory has the namespace "ou=internal,dc=domain,dc=com" and a second directory has the namespace "ou=external,dc=domain,dc=com," then creating a virtual directory with both namespaces is an example of namespace joining. Identity joining: Enrich identities with attributes pulled from multiple data stores, based on a link between user entries. For instance if the user joeuser exists in a directory as "cn=joeuser,ou=users" and in a database with a username of "joeuser" then the "joeuser" identity can be constructed from both the directory and the database. Data remapping: The translation of data inside of the virtual directory. For instance, mapping “uid” to “samaccountname,” so a client application that only supports a standard LDAP-compliant data source is able to search an Active Directory namespace, as well. Query routing: Route requests based on certain criteria, such as “write operations going to a master, while read operations are forwarded to replicas.” Identity routing: Virtual directories may support the routing of requests based on certain criteria (such as write operations going to a master while read operations being forwarded to replicas). Authoritative source: A "virtualized" data repository, such as a directory or database, that the virtual directory can trust for user data. Server groups: Group one or more servers containing the same data and functionality. A typical implementation is the multi-master, multi-replica environment in which replicas process "read" requests and are in one server group, while masters process "write" requests and are in another, so that servers are grouped by their response to external stimuli, even though all share the same data. == Use cases == The following are sample use cases of virtual directories: Integrating multiple directory namespaces to create a central enterprise directory. Supporting infrastructure integrations after mergers and acquisitions. Centralizing identity storage across the infrastructure, making identity information available to applications through various protocols (including LDAP, JDBC, and web services). Creating a single access point for web access management (WAM) tools. Enabling web single sign-on (SSO) across varied sources or domains. Supporting role-based, fine-grained authorization policies Enabling authentication across different security domains using each domain’s specific credential checking method. Improving secure access to information both inside and outside of the firewall.

    Read more →
  • Celia (virtual assistant)

    Celia (virtual assistant)

    Celia is an artificially intelligent virtual assistant developed by Huawei for their latest HarmonyOS and Android-based EMUI smartphones that lack Google Services and a Google Assistant. The assistant can perform day-to-day tasks, which include making a phone call, setting a reminder and checking the weather. It was unveiled on 7 April 2020 and got publicly released on 27 April 2020 via an OTA update solely to selected devices that can update their software to EMUI 10.1. Huawei had initially referred to the new assistant in late 2019 by having announced that there would be an English version of their already 2018 Chinese speaker assistant—Xiaoyi—to be released into the European markets. Due to the on-going China–United States trade war, the company's newly released smartphones were left without any Google services, including the loss of Google Assistant. This subsequently led to the development and release of Celia. AI technology is integrated into the software of Celia, which allows it to translate text using a phones camera and to identify everyday objects — similar to that of Google Lens. == Features == Celia has many features that are similar to that of its rivals: the Google Assistant and Siri. It can be triggered by the words, 'Hey Celia' or be summoned by pressing and holding down on the power button. The default search engine for Celia is Bing, but this can be changed in settings. Celia can make calls, check the agenda, send a message, show the weather, set alarms and control home appliances. The assistant also has the ability to integrate itself with the stock apps of the EMUI software and toggle with the device's settings, such as by turning on the flashlight and playing multimedia content, but with the users command. With the AI that is installed in Celia, it can identify food, everyday objects and translate text using the phones camera. In China, Chinese Xiaoyi packs with an LLM model called PanGu-Σ 3.0 AI on HarmonyOS 4.0 major upgrade improvements from Celia, making the assistant smarter and more advanced compared to when it was launched in 2020 on EMUI handsets in China and internationally, surpassing Apple and Google by the being the first in the AI industry, with a dedicated AI system framework of APIs on the latest operating system that evolves to a complete large dedicated AI software stack called Harmony Intelligence of Pangu Embedded variant model and MindSpore AI framework with Neural Network Runtime on OpenHarmony-based HarmonyOS NEXT base system to replace the dual framework system with a single frame HarmonyOS 5.0 version by Q4 2024, first introduced on June 21, 2024, in Developer Beta 1 preview release at HDC 2024. == Availability by country and language == Currently, Celia is available only in German, English, French and Spanish, and has been released in Germany, the UK, France, Spain, Chile, Mexico and Colombia. Huawei has said, that there will be more regions and languages to come. == Compatible devices == Celia only became available with the EMUI 10.1 update that was released in April, which means that a limited number of devices are compatible with it. More devices will be added to the list throughout the coming months as Celia's availability increases. The current list is shown below: === Huawei P series === Huawei P50 (Pro) Huawei P40 (Lite, Pro & Pro+) Huawei P30 (Pro) === Huawei Mate series === Huawei Mate 40 Huawei Mate 30 (Lite, Pro & RS Porche Design) Huawei MatePad Pro Huawei Mate 20 (Pro, 20X 4G, 20X 5G and RS Porche Design) Huawei Mate X & Xs === Huawei Nova series === Huawei Nova 6 (Nova 6 5G & Nova 6 SE) Huawei Nova 5 (Nova 5 Pro, Nova 5i Pro & Nova 5Z) Huawei Nova Y60 === Huawei Enjoy series === Huawei Enjoy 10S == Issues == Technology news website Engadget has noted that when saying, 'Hey Celia', out aloud in the presence of an iPhone, Siri will respond along with Celia; this is apparently because 'Celia' sounds similar to 'Siri'.

    Read more →
  • Australian Geoscience Data Cube

    Australian Geoscience Data Cube

    The Australian Geoscience Data Cube (AGDC) is an approach to storing, processing and analyzing large collections of Earth observation data. The technology is designed to meet challenges of national interest by being agile and flexible with vast amounts of layered grid data. The AGDC reduces processing time of traditional image analysis by calibrating, pre-computing known extents, pixel alignment and storing metadata in a cell lattice structure. The temporal-pixel aligned data can often be analysed faster across space and time dimensions than previous scene based techniques. This allows the AGDC to be flexible in tackling future challenges and improve analysis times on every-increasing data repositories of earth observation. The AGDC has also been used internationally to allow countries to maintain ecologically sustainable programs and reduce the difficulty curve of utilizing Remote Sensing data. == Background == The AGDC was originally conceived by Geoscience Australia but is now maintained in a partnership between Geoscience Australia, Commonwealth Scientific and Industrial Research Organisation (CSIRO) and National Computational Infrastructure National Facility (Australia) (NCI). This is made possible by the funding from the partnership and a number of organisations such as National Collaborative Research Infrastructure Strategy (NCRIS). == Analysis ready data, ingestion and indexing == The data processed in the cube is made analysis ready before being ingested and indexed into the AGDC. Analysis ready data is pre-processed data that has applied corrections for instrument calibration (gains and offsets), geolocation (spatial alignment) and radiometry (solar illumination, incidence angle, topography, atmospheric interference). The ingestion process manages the translation of datasets into the storage units while maintaining a database index. The data within the storage and index can be accessed via API calls often compiled within code such as Python (programming language). Example: s2a_l1c = dc.load(product='s2a_level1c_granule',x=(147.36, 147.41), y=(-35.1, -35.15), measurements=['04','03','02'], output_crs='EPSG:4326', resolution=(-0.00025,0.00025)) === Datasets currently stored === Geoscience Australia Landsat Surface Reflectance (1987 to present) Landsat Pixel Quality Landsat Fractional Cover Landsat NDVI === Datasets that have been piloted === USGS Landsat Surface Reflectance SRTM DEM Himawari 8 MODIS Sentinel-2 L1C / S2A Australian Gridded Climate Data == Open source == The AGDC code base is situated in GitHub as an open repository. The core code base moved to the Open Data Cube in early 2017 as part of an international collaboration. Whilst the code base is the Open Data Cube, individual cubes exist as their own right such as the AGDC on the National Computational Infrastructure National Facility (Australia) (NCI) using the High-Performance Computing Cluster HPCC. The core code can be installed on personal computers or public computers (using git) and has many unit tests. Documentation for the code base exists on Read the Docs. == Challenges of the AGDC == The AGDC is designed to meet nationally significant challenges such as the following. Sustainability Environment Water resource management Disaster assist Policy development Community planning Forest preservation Carbon measurement == International awards == The AGDC won the 2016 Content Platform of the Year award from Geospatial World Forum.

    Read more →
  • Bartels–Stewart algorithm

    Bartels–Stewart algorithm

    In numerical linear algebra, the Bartels–Stewart algorithm is used to numerically solve the Sylvester matrix equation A X − X B = C {\displaystyle AX-XB=C} . Developed by R.H. Bartels and G.W. Stewart in 1971, it was the first numerically stable method that could be systematically applied to solve such equations. The algorithm works by using the real Schur decompositions of A {\displaystyle A} and B {\displaystyle B} to transform A X − X B = C {\displaystyle AX-XB=C} into a triangular system that can then be solved using forward or backward substitution. In 1979, G. Golub, C. Van Loan and S. Nash introduced an improved version of the algorithm, known as the Hessenberg–Schur algorithm. It remains a standard approach for solving Sylvester equations when X {\displaystyle X} is of small to moderate size. == The algorithm == Let X , C ∈ R m × n {\displaystyle X,C\in \mathbb {R} ^{m\times n}} , and assume that the eigenvalues of A {\displaystyle A} are distinct from the eigenvalues of B {\displaystyle B} . Then, the matrix equation A X − X B = C {\displaystyle AX-XB=C} has a unique solution. The Bartels–Stewart algorithm computes X {\displaystyle X} by applying the following steps: 1.Compute the real Schur decompositions R = U T A U , {\displaystyle R=U^{T}AU,} S = V T B T V . {\displaystyle S=V^{T}B^{T}V.} The matrices R {\displaystyle R} and S {\displaystyle S} are block-upper triangular matrices, with diagonal blocks of size 1 × 1 {\displaystyle 1\times 1} or 2 × 2 {\displaystyle 2\times 2} . 2. Set F = U T C V . {\displaystyle F=U^{T}CV.} 3. Solve the simplified system R Y − Y S T = F {\displaystyle RY-YS^{T}=F} , where Y = U T X V {\displaystyle Y=U^{T}XV} . This can be done using forward substitution on the blocks. Specifically, if s k − 1 , k = 0 {\displaystyle s_{k-1,k}=0} , then ( R − s k k I ) y k = f k + ∑ j = k + 1 n s k j y j , {\displaystyle (R-s_{kk}I)y_{k}=f_{k}+\sum _{j=k+1}^{n}s_{kj}y_{j},} where y k {\displaystyle y_{k}} is the k {\displaystyle k} th column of Y {\displaystyle Y} . When s k − 1 , k ≠ 0 {\displaystyle s_{k-1,k}\neq 0} , columns [ y k − 1 ∣ y k ] {\displaystyle [y_{k-1}\mid y_{k}]} should be concatenated and solved for simultaneously. 4. Set X = U Y V T . {\displaystyle X=UYV^{T}.} === Computational cost === Using the QR algorithm, the real Schur decompositions in step 1 require approximately 10 ( m 3 + n 3 ) {\displaystyle 10(m^{3}+n^{3})} flops, so that the overall computational cost is 10 ( m 3 + n 3 ) + 2.5 ( m n 2 + n m 2 ) {\displaystyle 10(m^{3}+n^{3})+2.5(mn^{2}+nm^{2})} . === Simplifications and special cases === In the special case where B = − A T {\displaystyle B=-A^{T}} and C {\displaystyle C} is symmetric, the solution X {\displaystyle X} will also be symmetric. This symmetry can be exploited so that Y {\displaystyle Y} is found more efficiently in step 3 of the algorithm. == The Hessenberg–Schur algorithm == The Hessenberg–Schur algorithm replaces the decomposition R = U T A U {\displaystyle R=U^{T}AU} in step 1 with the decomposition H = Q T A Q {\displaystyle H=Q^{T}AQ} , where H {\displaystyle H} is an upper-Hessenberg matrix. This leads to a system of the form H Y − Y S T = F {\displaystyle HY-YS^{T}=F} that can be solved using forward substitution. The advantage of this approach is that H = Q T A Q {\displaystyle H=Q^{T}AQ} can be found using Householder reflections at a cost of ( 5 / 3 ) m 3 {\displaystyle (5/3)m^{3}} flops, compared to the 10 m 3 {\displaystyle 10m^{3}} flops required to compute the real Schur decomposition of A {\displaystyle A} . == Software and implementation == The subroutines required for the Hessenberg-Schur variant of the Bartels–Stewart algorithm are implemented in the SLICOT library. These are used in the MATLAB control system toolbox. == Alternative approaches == For large systems, the O ( m 3 + n 3 ) {\displaystyle {\mathcal {O}}(m^{3}+n^{3})} cost of the Bartels–Stewart algorithm can be prohibitive. When A {\displaystyle A} and B {\displaystyle B} are sparse or structured, so that linear solves and matrix vector multiplies involving them are efficient, iterative algorithms can potentially perform better. These include projection-based methods, which use Krylov subspace iterations, methods based on the alternating direction implicit (ADI) iteration, and hybridizations that involve both projection and ADI. Iterative methods can also be used to directly construct low rank approximations to X {\displaystyle X} when solving A X − X B = C {\displaystyle AX-XB=C} .

    Read more →
  • Overcategorization

    Overcategorization

    Overcategorization or category clutter is a phenomenon during classification where too many categories or classes are assigned to a document, record, or item. Overcategorization is related to the library and information science (LIS) concepts of document classification and subject indexing. It is also related to online shopping where excessive product categories can overwhelm users with too many choices or make it more difficult for customers to find the products they need. Although these categories are intended to improve organization and ease of navigation when shipping online, too many categories can lower customer satisfaction, increase difficulty navigating the online store, and reduce future shopping intentions. In LIS, the ideal number of terms that should be assigned to classify an item are measured by the variables precision and recall. Assigning few category labels that are most closely related to the content of the item being classified will result in searches that have high precision, I.e., where a high proportion of the results are closely related to the query. Assigning more category labels to each item will reduce the precision of each search, but increase the recall, retrieving more relevant results. Related LIS concepts include exhaustivity of indexing and information overload. == Basic principles == If too many categories are assigned to a given document, the implications for users depend on how informative the links are. If the user is able to distinguish between useful and not useful links, the damage is limited: The user only wastes time selecting links. In many cases, however, the user cannot judge whether or not a given link will turn out to be fruitful. In that case he or she has to follow the link and to read or skim another document. The worst case scenario is, of course, that even after reading the new document the user is unable to decide whether or not it might be useful if its subject matter is not thoroughly investigated. Overcategorization also has another unpleasant implication: It makes the system (for example in Wikipedia) difficult to maintain in a consistent way. If the system is inconsistent, it means that when the user considers the links in a given category, he or she will not find all documents relevant to that category. Basically, the problem of overcategorization should be understood from the perspective of relevance and the traditional measures of recall and precision. If too few relevant categories are assigned to a document, recall may decrease. If too many non-relevant categories are assigned, precision becomes lower. The hard job is to say which categories are fruitful or relevant for future use of the document.

    Read more →
  • Central Equipment Identity Register

    Central Equipment Identity Register

    A Central Equipment Identity Register (CEIR) is a database of mobile equipment identifiers (IMEI – for networks of GSM standard, MEID – for networks of CDMA standard). Such an identifier is assigned to each SIM slot of the mobile device. Different kinds of IMEIs could be, White, for devices that are allowed to register in the cellular network; Black, for devices that are prohibited to register in the cellular network; and Grey, for devices in intermediate status (when it is not yet defined in which of the lists - black or white - the device should be placed). Depending on the rules of mobile equipment registration in a country the CEIR database may contain other lists or fields beside IMEI. For example, the subscriber number (MSISDN), which is bound to the IMEI, the ID of the individual (passport data, National ID, etc.) who registered IMEI in the database, details of the importer who brought the device into the country, etc. == History == Originally abbreviation CEIR stood for IMEI Database, created and provided by GSM Association. It was proposed to blacklist the IMEIs of stolen or lost phones. It was assumed that any MNO would be able to receive this list to block the registration of such devices on their network. Thus, it turns out that a stolen phone, once blacklisted by the GSMA CEIR, cannot be used on a large number of cellular networks, which means that the theft of mobile devices will become meaningless. However, it soon became clear that the MNOs on their initiative were not going to do this because if many phones stopped working in their networks, but works in another, it puts them at a disadvantage and can lead to an outflow of subscribers. It became clear that the blocking of stolen devices should be introduced simultaneously in all mobile networks of the country by legislative measures at the initiative of the communications regulator. In this case, as a rule, a national IMEI database is created, which contains general lists of blocked IMEIs. Since the registration in the cellular operator's network is directly blocked by a network node called EIR (Equipment Identity Register), the system that contains the national IMEI base became known as Central EIR (CEIR). To avoid confusion the database of GSM Association was renamed to IMEI Database - IMEI DB (it was in 2003-2008, see “Document History” at IMEI Database File Format Specification). Also sometimes a common IMEI database for several EIRs is called SEIR (Shared EIR). In each country, the CEIR can interact with IMEI DB differently. National CEIR may not communicate with IMEI DB at all. Firstly, it is separately decided whether CEIR will send information about its blacklist to IMEI DB (which IMEIs are placed in it or removed from there). Secondly, upon receipt of the blacklist from IMEI DB, the regulator decides from which countries it will receive it (IMEI DB stores the information exactly who blacklisted the IMEI). For example, you can get a list from neighboring countries, from countries in your region, from around the world. In addition to the blacklist, the GSMA is developing a list of IMEIs allocated to manufacturers for use in their devices. The manufacturer for each new device model gets at least one TAC (Type Allocation Code) allocated by GSMA, consisting of 8 digits, to which he can add a 6-digit serial number to obtain the IMEI. Thus, with one TAC, a manufacturer can release up to 1 million devices with a unique IMEI. Usually, CEIR receives a list of allocated TACs from the GSMA, since if the first 8 digits of the IMEI of a device are not in this list, this is a sign that it is counterfeit. If the central database of identifiers does not work with GSM networks, but with CDMA, then for the same purposes it is necessary to interact with another worldwide database that contains MEIDs – MEID Database. A system that directly blocks the registration of a mobile device on a cellular network – EIR. Each MNO must have at least one EIR, to which IMEI check requests (CheckIMEI) are sent when registering a device on the network. A typical EIR and CERI interaction scheme: The CEIR accumulates black, white, and grey lists using various data sources and verification methods. These lists are periodically transmitted to all EIRs. EIR uses them when processing every CheckIMEI request to determine whether to allow the device on the network or not. EIR can transmit some data to the CEIR database too. Usually, changes in a grey list – new IMEIs on the network that are not in any list – are transmitted from EIR to CEIR. In addition to synchronizing lists across multiple networks, the main function of CEIR is to implement the scenarios of changes at these lists. This usually requires interaction with various IT systems (databases) of other organizations and/or with subscribers. Еxamples of such scenarios: Whitelisting the IMEI of devices imported by the legal entity Whitelisting the IMEI of devices manufactured domestically Whitelisting the IMEI of devices imported by individual Blacklisting the IMEI of stolen/lost devices Binding IMEI to the subscriber's number and, vice versa, unbinding IMEI from the subscriber == System implementation results == The goals and results of CEIR implementation in a country are usually: Reducing mobile phone theft Reducing the import of devices stolen in other countries Reducing the presence of counterfeit devices on the market (null IMEI, incorrect IMEI, changed IMEI) Reducing illegal imports of mobile devices (increase in the collection of customs duties) Additionally, CEIR most often contributes to the solution of such problems: Combating various mobile fraud schemes Obtaining more accurate statistics on the state of the mobile communications market for the regulator Fight against terrorism (the ability to block the device at once in all mobile networks of the country). Known results achieved in some countries: Great Britain – reducing mobile phone theft. Turkey – reducing mobile phone theft, decreasing the current account deficit of Turkey and maximizing tax revenues. Uzbekistan – preventing black import of mobile devices by 98%, increase in revenues from the import of mobile devices by 700%. Kenya – disposing the market of counterfeit mobile equipment. Azerbaijan – disposing the market of counterfeit mobile equipment. Ukraine – increasing of legally imported mobile devices by 95%, increase in revenues from the import of mobile devices. == CEIR and EIR manufacturers == Some countries have used local developers to implement CEIR for their country (Great Britain, Turkey, India, and Azerbaijan). EIR is a system that is standardized in a 2G-5G networks. Such system may be established at mobile network even it doesn’t use black list and there are no CEIR in a country. Some developers of MNO’s signal core include EIR in a complex solution. However, its standard capabilities are usually lacking for specific requirements when implementing CEIR.

    Read more →
  • Storage area network

    Storage area network

    A storage area network (SAN) or storage network is a computer network which provides access to consolidated, block-level data storage. SANs are primarily used to access data storage devices, such as disk arrays and tape libraries from servers so that the devices appear to the operating system as direct-attached storage. A SAN typically is a dedicated network of storage devices not accessible through the local area network (LAN). Although a SAN provides only block-level access, file systems built on top of SANs do provide file-level access and are known as shared-disk file systems. Newer SAN configurations enable hybrid SAN and allow traditional block storage that appears as local storage but also object storage for web services through APIs. == Storage architectures == Storage area networks (SANs) are sometimes referred to as network behind the servers and historically developed out of a centralized data storage model, but with its own data network. A SAN is, at its simplest, a dedicated network for data storage. In addition to storing data, SANs allow for the automatic backup of data, and the monitoring of the storage as well as the backup process. A SAN is a combination of hardware and software. It grew out of data-centric mainframe architectures, where clients in a network can connect to several servers that store different types of data. To scale storage capacities as the volumes of data grew, direct-attached storage (DAS) was developed, where disk arrays or just a bunch of disks (JBODs) were attached to servers. In this architecture, storage devices can be added to increase storage capacity. However, the server through which the storage devices are accessed is a single point of failure, and a large part of the LAN network bandwidth is used for accessing, storing and backing up data. To solve the single point of failure issue, a direct-attached shared storage architecture was implemented, where several servers could access the same storage device. DAS was the first network storage system and is still widely used where data storage requirements are not very high. Out of it developed the network-attached storage (NAS) architecture, where one or more dedicated file server or storage devices are made available in a LAN. Therefore, the transfer of data, particularly for backup, still takes place over the existing LAN. If more than a terabyte of data was stored at any one time, LAN bandwidth became a bottleneck. Therefore, SANs were developed, where a dedicated storage network was attached to the LAN, and terabytes of data are transferred over a dedicated high speed and bandwidth network. Within the SAN, storage devices are interconnected. Transfer of data between storage devices, such as for backup, happens behind the servers and is meant to be transparent. In a NAS architecture data is transferred using the TCP and IP protocols over Ethernet. Distinct protocols were developed for SANs, such as Fibre Channel, iSCSI, Infiniband. Therefore, SANs often have their own network and storage devices, which have to be bought, installed, and configured. This makes SANs inherently more expensive than NAS architectures. == Components == SANs have their own networking devices, such as SAN switches. To access the SAN, so-called SAN servers are used, which in turn connect to SAN host adapters. Within the SAN, a range of data storage devices may be interconnected, such as SAN-capable disk arrays, JBODs and tape libraries. === Host layer === Servers that allow access to the SAN and its storage devices are said to form the host layer of the SAN. Such servers have host adapters, which are cards that attach to slots on the server motherboard (usually PCI slots) and run with a corresponding firmware and device driver. Through the host adapters the operating system of the server can communicate with the storage devices in the SAN. In Fibre channel deployments, a cable connects to the host adapter through the gigabit interface converter (GBIC). GBICs are also used on switches and storage devices within the SAN, and they convert digital bits into light impulses that can then be transmitted over the Fibre Channel cables. Conversely, the GBIC converts incoming light impulses back into digital bits. The predecessor of the GBIC was called gigabit link module (GLM). === Fabric layer === The fabric layer consists of SAN networking devices that include SAN switches, routers, protocol bridges, gateway devices, and cables. SAN network devices move data within the SAN, or between an initiator, such as an HBA port of a server, and a target, such as the port of a storage device. When SANs were first built, hubs were the only devices that were Fibre Channel capable, but Fibre Channel switches were developed and hubs are now rarely found in SANs. Switches have the advantage over hubs that they allow all attached devices to communicate simultaneously, as a switch provides a dedicated link to connect all its ports with one another. When SANs were first built, Fibre Channel had to be implemented over copper cables, these days multimode optical fibre cables are used in SANs. SANs are usually built with redundancy, so SAN switches are connected with redundant links. SAN switches connect the servers with the storage devices and are typically non-blocking allowing transmission of data across all attached wires at the same time. SAN switches are for redundancy purposes set up in a meshed topology. A single SAN switch can have as few as 8 ports and up to 32 ports with modular extensions. So-called director-class switches can have as many as 128 ports. In switched SANs, the Fibre Channel switched fabric protocol FC-SW-6 is used under which every device in the SAN has a hardcoded World Wide Name (WWN) address in the host bus adapter (HBA). If a device is connected to the SAN its WWN is registered in the SAN switch name server. In place of a WWN, or worldwide port name (WWPN), SAN Fibre Channel storage device vendors may also hardcode a worldwide node name (WWNN). The ports of storage devices often have a WWN starting with 5, while the bus adapters of servers start with 10 or 21. === Storage layer === The serialized Small Computer Systems Interface (SCSI) protocol is often used on top of the Fibre Channel switched fabric protocol in servers and SAN storage devices. The Internet Small Computer Systems Interface (iSCSI) over Ethernet and the Infiniband protocols may also be found implemented in SANs, but are often bridged into the Fibre Channel SAN. However, Infiniband and iSCSI storage devices, in particular, disk arrays, are available. The various storage devices in a SAN are said to form the storage layer. It can include a variety of hard disk and magnetic tape devices that store data. In SANs, disk arrays are joined through a RAID which makes a lot of hard disks look and perform like one big storage device. Every storage device, or even partition on that storage device, has a logical unit number (LUN) assigned to it. This is a unique number within the SAN. Every node in the SAN, be it a server or another storage device, can access the storage by referencing the LUN. The LUNs allow for the storage capacity of a SAN to be segmented and for the implementation of access controls. A particular server, or a group of servers, may, for example, be only given access to a particular part of the SAN storage layer, in the form of LUNs. When a storage device receives a request to read or write data, it will check its access list to establish whether the node, identified by its LUN, is allowed to access the storage area, also identified by a LUN. LUN masking is a technique whereby the host bus adapter and the SAN software of a server restrict the LUNs for which commands are accepted. In doing so LUNs that should never be accessed by the server are masked. Another method to restrict server access to particular SAN storage devices is fabric-based access control, or zoning, which is enforced by the SAN networking devices and servers. Under zoning, server access is restricted to storage devices that are in a particular SAN zone. == Network protocols == A mapping layer to other protocols is used to form a network: ATA over Ethernet (AoE), mapping of AT Attachment (ATA) over Ethernet Fibre Channel Protocol (FCP), a mapping of SCSI over Fibre Channel Fibre Channel over Ethernet (FCoE) ESCON over Fibre Channel (FICON), used by mainframe computers HyperSCSI, mapping of SCSI over Ethernet iFCP or SANoIP mapping of FCP over IP iSCSI, mapping of SCSI over TCP/IP iSCSI Extensions for RDMA (iSER), mapping of iSCSI over InfiniBand Network block device, mapping device node requests on UNIX-like systems over stream sockets like TCP/IP SCSI RDMA Protocol (SRP), another SCSI implementation for remote direct memory access (RDMA) transports Storage networks may also be built using Serial Attached SCSI (SAS) and Serial ATA (SATA) technologies. SAS evolved from SCSI direct-attached storage. SATA evolved from Para

    Read more →
  • Virtual data room

    Virtual data room

    A virtual data room (sometimes called a VDR or Deal Room) is an online repository of information that is used for the storing and distribution of documents. In many cases, a virtual data room is used to facilitate the due diligence process during an M&A transaction, loan syndication, or private equity and venture capital transactions. This due diligence process has traditionally used a physical data room to accomplish the disclosure of documents. For reasons of cost, efficiency and security, virtual data rooms have widely replaced the more traditional physical data room. A virtual data room is an extranet to which the bidders and their advisers are given access via the internet. An extranet is essentially a website with limited controlled access, using a secure log-on supplied by the vendor, which can be disabled at any time, by the vendor, if a bidder withdraws. Much of the information released is confidential and restrictions are applied to the viewer's ability to release this to third parties (by means of forwarding, copying or printing). This can be effectively applied to protect the data using digital rights management. The virtual data room provides access to secure documents for authorized users through a dedicated web site, or through secure agent applications. In the process of mergers and acquisitions the data room is set up as part of the central repository of data relating to companies or divisions being acquired or sold. The data room enables the interested parties to view information relating to the business in a controlled environment where confidentiality can be preserved. Conventionally this was achieved by establishing a supervised, physical data room in secure premises with controlled access. In most cases, with a physical data room, only one bidder team can access the room at a time. A virtual data room is designed to have the same advantages as a conventional data room (controlling access, viewing, copying and printing, etc.) with fewer disadvantages. Due to their increased efficiency, many businesses and industries have moved to using virtual data rooms instead of physical data rooms. In 2006, a spokesperson for a company which sets up virtual deal rooms was reported claiming that the process reduced the bidding process by about thirty days compared to physical data rooms. In the process of startup fundraising, a virtual data room is set up to be a central location for key data, documents, and financials. These are shared with venture capital and angel investors and allows them to streamline due diligence. == Application == Any business dealing with private data can apply VDRs when secure transaction processing is required. This includes financial institutions that need to negotiate confidential customer information without involving third parties. VDRs have traditionally been used for IPOs and real estate asset management. Technology companies may use them to exchange and review code or confidential data needed for operations. The same is true for clients, who entrust their valuable code only to the most qualified people in the organisation. The code is not something that can be printed out and brought in a folder. It resides on a computer and must be used together. VDR can find application in any business that manages data in the form of documents, especially law firms, financial advisers or the B2B sector. The latter work with documents that must always be handled and controlled confidentially, and it is difficult to store them securely when they are on a server that other people can access. In addition, in B2B, it is important to close the deal as quickly as possible: the average sales cycle is one to three months. VDR can be compared to a locked filing cabinet where all those folders and documents are kept. It automates the mathematics of pricing to prevent revenue leakage, and initially integrates CRM to ensure accurate synchronisation of all account data, which is important for B2B in particular and sales in general. While virtual data rooms offer many advantages, they are not suitable for every industry. For example, some governments may decide to continue using physical data rooms for highly confidential information sharing. The damage from potential cyberattacks and data breaches exceeds the benefits offered by virtual data rooms. In such cases, the use of VDRs is not considered. Data breaches have particularly affected the US healthcare system from March 2021 to March 2022 - according to IBM Security the cost of the breach was a record high of $10.1 million.

    Read more →