AI Code Ui

AI Code Ui — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • AI content watermarking

    AI content watermarking

    AI content watermarking is the process of embedding imperceptible yet detectable signals into content generated by artificial intelligence systems, such as text, images, audio, or video. The technique allows the content to be traced and identified as machine-generated without compromising its quality for the end user. AI watermarking has emerged as a key approach to address growing concerns about misinformation, deepfakes, copyright infringement, and the traceability of synthetic content in the context of the rapid development of generative artificial intelligence. Unlike traditional visible watermarks used in photography, AI content watermarks are typically invisible to humans and can only be detected and deciphered algorithmically. The concept is distinct from the watermarking of AI models themselves (to prevent model theft) and from the watermarking of training data (to combat unauthorized data use). Modern AI watermarking schemes are typically formalized as a pair of algorithms, an embedding (or generation) algorithm and a detection algorithm, sharing a secret key, whose performance is evaluated along three competing axes: quality (the watermark must not noticeably degrade outputs), detectability (the watermark must be statistically distinguishable from unwatermarked content), and robustness (the watermark must persist under adversarial or incidental modifications). == Background == Digital watermarking has been used for decades to protect physical and digital media, from paper currency to photographs. Classical schemes typically embedded a fixed bit-string into a fixed cover signal, with robustness criteria defined against a small fixed set of distortions such as JPEG compression or additive Gaussian noise. The rapid advancement of generative AI in the early 2020s, however, created a new and qualitatively different demand: rather than protecting a single artifact, watermarks for AI content must be embedded automatically across an open-ended distribution of generated outputs while remaining robust to a much wider class of adversarial transformations, including paraphrasing, image regeneration via diffusion models, and re-recording. Large image generation models such as DALL-E, Stable Diffusion, and Midjourney, along with large language models like ChatGPT, made it possible to produce highly realistic synthetic text, images, audio, and video at scale, raising significant ethical and security concerns. In July 2023, the Biden administration secured voluntary commitments from leading AI companies, including OpenAI, Alphabet, Meta, and Amazon, to develop watermarking and other provenance technologies to help users identify AI-generated content. == Formal definitions and design goals == Most modern AI watermarking schemes can be formalized as a pair of algorithms ( W m , D e t e c t ) {\displaystyle ({\mathsf {Wm}},{\mathsf {Detect}})} parameterized by a secret key k {\displaystyle k} . The embedding algorithm W m {\displaystyle {\mathsf {Wm}}} takes a generative model M {\displaystyle M} (and optionally a prompt) and returns a watermarked output x {\displaystyle x} ; the detection algorithm D e t e c t ( x , k ) {\displaystyle {\mathsf {Detect}}(x,k)} outputs a real-valued score (typically a p-value or log-likelihood ratio) used to decide whether x {\displaystyle x} was produced by the watermarked generator. The literature evaluates such schemes along several largely conflicting criteria: Criteria for evaluation include imperceptibility or quality preservation, measured for text via perplexity and human preference judgments, and for images and audio via metrics such as PSNR, SSIM, LPIPS, or PESQ. Detectability is typically expressed as the true positive rate at a fixed false positive rate (e.g. 1% or 10^-6), or as the number of tokens or pixels needed to reach a given confidence level. Robustness refers to the requirement that the watermark should survive expected modifications like JPEG or MP3 compression, cropping, noise, paraphrasing, or machine translation. Distortion-freeness is a stronger property requiring that the marginal distribution of any single watermarked output be statistically identical to the unwatermarked model's distribution. Schemes due to Aaronson, Christ et al., and Kuditipudi et al. are distortion-free in this sense, while the original Kirchenbauer et al. scheme is not. Forgery resistance or unforgeability means an adversary without the secret key should be unable to produce content that passes detection. == Techniques == AI watermarking techniques vary significantly depending on the type of content being watermarked. At its core, the process involves two main stages: embedding (or encoding) the watermark, and detection. There are two primary methods for embedding: watermarking during content generation, which requires access to the AI model itself but is generally more robust, and post-generation watermarking, which can be applied to content from any source, including closed-source models. Watermarks can be broadly classified as visible, including overt marks such as logos or text overlays, or imperceptible, which are detectable only by algorithms. They can also be classified by durability: robust watermarks are designed to withstand common transformations such as compression, cropping, and re-encoding, while fragile watermarks are easily destroyed by any alteration, making them useful for tamper detection. A further axis distinguishes zero-bit watermarks, which only signal "this content was generated by model M," from multi-bit watermarks, which embed an arbitrary payload (such as a user identifier) that can be recovered at detection time. === Text === Text watermarking is considered one of the most challenging modalities because natural language offers relatively limited redundancy compared to images or audio. Modern approaches for large language models alter the autoregressive sampling process so that some statistical signature is left in the choice of tokens, while leaving the surface form of the text unchanged. The literature distinguishes three main families of generation-time text watermarks. Logit-biasing schemes (e.g. KGW) add a fixed bias δ {\displaystyle \delta } to a pseudorandomly selected subset of vocabulary logits before softmax sampling. Reweighting or sampling-based schemes (e.g. SynthID-Text) compose multiple pseudorandom tournaments over the model's full distribution. Distortion-free schemes based on the Gumbel-max trick or inverse transform sampling (Aaronson 2022; Kuditipudi et al. 2023; Christ et al. 2024) preserve the marginal output distribution of the model. ==== KGW: token-probability shifting ==== The pioneering "green list / red list" scheme of Kirchenbauer et al. (KGW), introduced at ICML 2023, is the foundation for most subsequent text watermarks. At each decoding step t {\displaystyle t} , a pseudorandom function (PRF) keyed by a secret k {\displaystyle k} is applied to a context window of h {\displaystyle h} previous tokens to deterministically partition the vocabulary V {\displaystyle V} of size N {\displaystyle N} into a "green list" G ⊂ V {\displaystyle G\subset V} of size γ N {\displaystyle \gamma N} and its complement, the "red list" R = V ∖ G {\displaystyle R=V\setminus G} , where γ ∈ ( 0 , 1 ) {\displaystyle \gamma \in (0,1)} (typically γ = 1 / 2 {\displaystyle \gamma =1/2} ) is the green fraction. A logits processor then increments every green-list logit by a fixed bias δ > 0 {\displaystyle \delta >0} before softmax: ℓ v ′ = ℓ v + δ ⋅ 1 [ v ∈ G ] {\displaystyle \ell '_{v}=\ell _{v}+\delta \cdot \mathbf {1} [v\in G]} so that, after sampling, green tokens are over-represented but generation is not constrained to green tokens alone; high-entropy positions tolerate the bias gracefully, while low-entropy positions (where one token dominates the logits) override the watermark and preserve correctness on factual content. Detection requires only the secret key and the candidate text, not the language model itself. The detector recomputes the partition g ( ⋅ ) {\displaystyle g(\cdot )} for each token, counts the number of green hits | G | hits {\displaystyle |G|_{\text{hits}}} in a sequence of length T {\displaystyle T} , and computes a one-proportion z-test statistic: z = | G | hits − γ T T γ ( 1 − γ ) {\displaystyle z={\frac {|G|_{\text{hits}}-\gamma T}{\sqrt {T\gamma (1-\gamma )}}}} Under the null hypothesis that the text was written by an unwatermarked source (human or another model), the green-hit count is approximately binomially distributed with mean γ T {\displaystyle \gamma T} ; a large positive z {\displaystyle z} rejects the null hypothesis. The original paper reports that fewer than 25 watermarked tokens are sufficient to detect a watermark with a false positive rate below 10^-5 on the OPT-1.3B model. A follow-up study by the same group documented robustness under temperature sampling, top-p (nucleus) sampling, and human paraphrasing, and proposed sliding-window

    Read more →
  • Google Cloud Dataflow

    Google Cloud Dataflow

    Google Cloud Dataflow is a fully managed service for executing Apache Beam pipelines within the Google Cloud Platform ecosystem. Dataflow provides a fully managed service for executing Apache Beam pipelines, offering features like autoscaling, dynamic work rebalancing, and a managed execution environment. Dataflow is suitable for large-scale, continuous data processing jobs, and is one of the major components of Google's big data architecture on the Google Cloud Platform. At its core, Dataflow's architecture is designed to abstract away infrastructure management, allowing developers to focus purely on the logic of their data processing tasks. When a pipeline written using the Apache Beam SDK is submitted, Dataflow translates this high-level definition into an optimized job graph. The service then provisions and manages a fleet of Google Compute Engine workers to execute this graph in a highly parallelized and fault-tolerant manner. This serverless approach, combined with intelligent autoscaling of both the number of workers (horizontal) and the resources per worker (vertical), ensures that jobs have the precise amount of computational power needed at any given time, optimizing both performance and cost. The service's deep integration with the Google Cloud ecosystem makes it a powerful tool for a variety of use cases beyond simple data movement. For real-time analytics, Dataflow can ingest unbounded streams of data from Cloud Pub/Sub, perform complex transformations, and load results into BigQuery for immediate querying. In machine learning workflows, it is commonly used to preprocess and transform massive datasets stored in Cloud Storage, preparing them for training models in Vertex AI. This versatility makes it the central processing engine for modern ETL (Extract, Transform, Load) operations, streaming analytics, and large-scale data preparation within the cloud. == History == Google Cloud Dataflow was announced in June, 2014 and released to the general public as an open beta in April, 2015. In January, 2016 Google donated the underlying SDK, the implementation of a local runner, and a set of IOs (data connectors) to access Google Cloud Platform data services to the Apache Software Foundation. The donated code formed the original basis for Apache Beam. In August 2022, there was an incident where user timers were broken for certain Dataflow streaming pipelines in multiple regions, which was later resolved. Throughout 2023 and 2024, there have been various other updates and incidents affecting Google Cloud Dataflow, as documented in the release notes and service health history. The donation of the Dataflow SDK to the Apache Software Foundation was a pivotal moment, establishing Apache Beam as a unified, open-source programming model for defining both batch and streaming data pipelines. This strategic move decoupled the pipeline definition from the execution engine. As a result, developers could write portable data processing logic that was not locked into Google's ecosystem. A Beam pipeline can be executed on various runners, including Apache Flink, Apache Spark, and, of course, the highly optimized Google Cloud Dataflow service, providing flexibility and future-proofing data processing investments. == Features == Google Cloud Dataflow supports both batch and streaming data processing pipelines. It automatically handles resource provisioning, data sharding, and scaling according to workload, reducing manual configuration needed for large-scale data operations. == Use cases == Dataflow is used for ETL (Extract, Transform, Load) data pipelines, real-time analytics, and event stream processing for companies in industries such as finance, advertising, and IoT.

    Read more →
  • IQTELL

    IQTELL

    IQTELL was a productivity app that allowed users to manage email, tasks, projects, calendars, contacts, Evernotes and more in a single app. IQTELL was available as a web app, as well as an iOS and Android app. All user information was automatically synced between all devices. iOS and Android apps supported offline access. The app could be used to implement concepts and techniques described in the book Getting Things Done by David Allen. == History == IQTELL was created by Ran Flam and released in 2013. In 2014, mobile apps for iOS and Android were released. In 2015, Premium and Platinum subscription plans were introduced (while maintaining the free user version). In April 2017, a new web app was launched. On July 31, 2017, all IQTell services have been closed. == Productivity methods == IQTell was designed to fit in with the Getting Things Done (GTD) productivity methods. Users may have had utilized GTD lists, such as Inbox, Actions, Projects, Someday, Ticklers, and Reference information to process their Inbox items into relevant GTD lists. Using the web app, iOS and/or Android apps, users could deploy macros/shortcuts to quickly process their email. Email was turned into tasks (actions), projects, etc. The original email was removed from the email inbox. The email became a part of the items created (e.g. actions, project, etc.) and could also be viewed in the All Mail folder (if Gmail), or the Archive folder (if non-Gmail). Users had flexibility to use the out-of-the-box macros/shortcuts as well as edit/create additional macros. IQTELL features included email, calendars, contacts, list management, sharing and collaboration with team members. All of the features were compatible with commonly used organization software such as Evernote and iCloud.

    Read more →
  • Bring your own encryption

    Bring your own encryption

    Bring your own encryption (BYOE), also known as bring your own key (BYOK), is a cloud computing security model that allows cloud service customers to use their own encryption software and manage their own encryption keys. == Overview == BYOE enables cloud service customers to utilize a virtual instance of their encryption software alongside their cloud-hosted business applications to encrypt their data. In this model, hosted business applications are configured to process all data through the encryption software. This software then writes the ciphertext version of the data to the cloud service provider's physical data store and decrypts ciphertext data upon retrieval requests. This approach provides enterprises with control over their keys and the ability to generate their own master key using internal hardware security modules (HSM), which are then transmitted to the cloud provider's HSM. When the data is no longer needed, such as when users discontinue the cloud service, the keys can be deleted, rendering the encrypted data permanently inaccessible. This practice is known as crypto-shredding. == Potential Advantages == Organizations can store data with unique encryption that only they can access. Multiple organizations can share the same hardware infrastructure via cloud services like Amazon Web Services (AWS) or Google Cloud while maintaining encryption to comply with regulations such as HIPAA. == Potential Challenges == Resource utilization may be higher compared to traditional encryption practices when multiple users share the same hardware and use their own encryption. Efforts to minimize resource utilization issues may potentially impact security benefits.

    Read more →
  • Feature hashing

    Feature hashing

    In machine learning, feature hashing, also known as the hashing trick (by analogy to the kernel trick), is a fast and space-efficient way of vectorizing features, i.e. turning arbitrary features into indices in a vector or matrix. It works by applying a hash function to the features and using their hash values as indices directly (after a modulo operation), rather than looking the indices up in an associative array. In addition to its use for encoding non-numeric values, feature hashing can also be used for dimensionality reduction. This trick is often attributed to Weinberger et al. (2009), but there exists a much earlier description of this method published by John Moody in 1989. == Motivation == === Motivating example === In a typical document classification task, the input to the machine learning algorithm (both during learning and classification) is free text. From this, a bag of words (BOW) representation is constructed: the individual tokens are extracted and counted, and each distinct token in the training set defines a feature (independent variable) of each of the documents in both the training and test sets. Machine learning algorithms, however, are typically defined in terms of numerical vectors. Therefore, the bags of words for a set of documents is regarded as a term-document matrix where each row is a single document, and each column is a single feature/word; the entry i, j in such a matrix captures the frequency (or weight) of the j'th term of the vocabulary in document i. (An alternative convention swaps the rows and columns of the matrix, but this difference is immaterial.) Typically, these vectors are extremely sparse—according to Zipf's law. The common approach is to construct, at learning time or prior to that, a dictionary representation of the vocabulary of the training set, and use that to map words to indices. Hash tables and tries are common candidates for dictionary implementation. E.g., the three documents John likes to watch movies. Mary likes movies too. John also likes football. can be converted, using the dictionary to the term-document matrix ( John likes to watch movies Mary too also football 1 1 1 1 1 0 0 0 0 0 1 0 0 1 1 1 0 0 1 1 0 0 0 0 0 1 1 ) {\displaystyle {\begin{pmatrix}{\textrm {John}}&{\textrm {likes}}&{\textrm {to}}&{\textrm {watch}}&{\textrm {movies}}&{\textrm {Mary}}&{\textrm {too}}&{\textrm {also}}&{\textrm {football}}\\1&1&1&1&1&0&0&0&0\\0&1&0&0&1&1&1&0&0\\1&1&0&0&0&0&0&1&1\end{pmatrix}}} (Punctuation was removed, as is usual in document classification and clustering.) The problem with this process is that such dictionaries take up a large amount of storage space and grow in size as the training set grows. On the contrary, if the vocabulary is kept fixed and not increased with a growing training set, an adversary may try to invent new words or misspellings that are not in the stored vocabulary so as to circumvent a machine learned filter. To address this challenge, Yahoo! Research attempted to use feature hashing for their spam filters. Note that the hashing trick isn't limited to text classification and similar tasks at the document level, but can be applied to any problem that involves large (perhaps unbounded) numbers of features. === Mathematical motivation === Mathematically, a token is an element t {\displaystyle t} in a finite (or countably infinite) set T {\displaystyle T} . Suppose we only need to process a finite corpus, then we can put all tokens appearing in the corpus into T {\displaystyle T} , meaning that T {\displaystyle T} is finite. However, suppose we want to process all possible words made of the English letters, then T {\displaystyle T} is countably infinite. Most neural networks can only operate on real vector inputs, so we must construct a "dictionary" function ϕ : T → R n {\displaystyle \phi :T\to \mathbb {R} ^{n}} . When T {\displaystyle T} is finite, of size | T | = m ≤ n {\displaystyle |T|=m\leq n} , then we can use one-hot encoding to map it into R n {\displaystyle \mathbb {R} ^{n}} . First, arbitrarily enumerate T = { t 1 , t 2 , . . , t m } {\displaystyle T=\{t_{1},t_{2},..,t_{m}\}} , then define ϕ ( t i ) = e i {\displaystyle \phi (t_{i})=e_{i}} . In other words, we assign a unique index i {\displaystyle i} to each token, then map the token with index i {\displaystyle i} to the unit basis vector e i {\displaystyle e_{i}} . One-hot encoding is easy to interpret, but it requires one to maintain the arbitrary enumeration of T {\displaystyle T} . Given a token t ∈ T {\displaystyle t\in T} , to compute ϕ ( t ) {\displaystyle \phi (t)} , we must find out the index i {\displaystyle i} of the token t {\displaystyle t} . Thus, to implement ϕ {\displaystyle \phi } efficiently, we need a fast-to-compute bijection h : T → { 1 , . . . , m } {\displaystyle h:T\to \{1,...,m\}} , then we have ϕ ( t ) = e h ( t ) {\displaystyle \phi (t)=e_{h(t)}} . In fact, we can relax the requirement slightly: It suffices to have a fast-to-compute injection h : T → { 1 , . . . , n } {\displaystyle h:T\to \{1,...,n\}} , then use ϕ ( t ) = e h ( t ) {\displaystyle \phi (t)=e_{h(t)}} . In practice, there is no simple way to construct an efficient injection h : T → { 1 , . . . , n } {\displaystyle h:T\to \{1,...,n\}} . However, we do not need a strict injection, but only an approximate injection. That is, when t ≠ t ′ {\displaystyle t\neq t'} , we should probably have h ( t ) ≠ h ( t ′ ) {\displaystyle h(t)\neq h(t')} , so that probably ϕ ( t ) ≠ ϕ ( t ′ ) {\displaystyle \phi (t)\neq \phi (t')} . At this point, we have just specified that h {\displaystyle h} should be a hashing function. Thus we reach the idea of feature hashing. == Algorithms == === Feature hashing (Weinberger et al. 2009) === The basic feature hashing algorithm presented in (Weinberger et al. 2009) is defined as follows. First, one specifies two hash functions: the kernel hash h : T → { 1 , 2 , . . . , n } {\displaystyle h:T\to \{1,2,...,n\}} , and the sign hash ζ : T → { − 1 , + 1 } {\displaystyle \zeta :T\to \{-1,+1\}} . Next, one defines the feature hashing function: ϕ : T → R n , ϕ ( t ) = ζ ( t ) e h ( t ) {\displaystyle \phi :T\to \mathbb {R} ^{n},\quad \phi (t)=\zeta (t)e_{h(t)}} Finally, extend this feature hashing function to strings of tokens by ϕ : T ∗ → R n , ϕ ( t 1 , . . . , t k ) = ∑ j = 1 k ϕ ( t j ) {\displaystyle \phi :T^{}\to \mathbb {R} ^{n},\quad \phi (t_{1},...,t_{k})=\sum _{j=1}^{k}\phi (t_{j})} where T ∗ {\displaystyle T^{}} is the set of all finite strings consisting of tokens in T {\displaystyle T} . Equivalently, ϕ ( t 1 , . . . , t k ) = ∑ j = 1 k ζ ( t j ) e h ( t j ) = ∑ i = 1 n ( ∑ j : h ( t j ) = i ζ ( t j ) ) e i {\displaystyle \phi (t_{1},...,t_{k})=\sum _{j=1}^{k}\zeta (t_{j})e_{h(t_{j})}=\sum _{i=1}^{n}\left(\sum _{j:h(t_{j})=i}\zeta (t_{j})\right)e_{i}} ==== Geometric properties ==== We want to say something about the geometric property of ϕ {\displaystyle \phi } , but T {\displaystyle T} , by itself, is just a set of tokens, we cannot impose a geometric structure on it except the discrete topology, which is generated by the discrete metric. To make it nicer, we lift it to T → R T {\displaystyle T\to \mathbb {R} ^{T}} , and lift ϕ {\displaystyle \phi } from ϕ : T → R n {\displaystyle \phi :T\to \mathbb {R} ^{n}} to ϕ : R T → R n {\displaystyle \phi :\mathbb {R} ^{T}\to \mathbb {R} ^{n}} by linear extension: ϕ ( ( x t ) t ∈ T ) = ∑ t ∈ T x t ζ ( t ) e h ( t ) = ∑ i = 1 n ( ∑ t : h ( t ) = i x t ζ ( t ) ) e i {\displaystyle \phi ((x_{t})_{t\in T})=\sum _{t\in T}x_{t}\zeta (t)e_{h(t)}=\sum _{i=1}^{n}\left(\sum _{t:h(t)=i}x_{t}\zeta (t)\right)e_{i}} There is an infinite sum there, which must be handled at once. There are essentially only two ways to handle infinities. One may impose a metric, then take its completion, to allow well-behaved infinite sums, or one may demand that nothing is actually infinite, only potentially so. Here, we go for the potential-infinity way, by restricting R T {\displaystyle \mathbb {R} ^{T}} to contain only vectors with finite support: ∀ ( x t ) t ∈ T ∈ R T {\displaystyle \forall (x_{t})_{t\in T}\in \mathbb {R} ^{T}} , only finitely many entries of ( x t ) t ∈ T {\displaystyle (x_{t})_{t\in T}} are nonzero. Define an inner product on R T {\displaystyle \mathbb {R} ^{T}} in the obvious way: ⟨ e t , e t ′ ⟩ = { 1 , if t = t ′ , 0 , else. ⟨ x , x ′ ⟩ = ∑ t , t ′ ∈ T x t x t ′ ⟨ e t , e t ′ ⟩ {\displaystyle \langle e_{t},e_{t'}\rangle ={\begin{cases}1,{\text{ if }}t=t',\\0,{\text{ else.}}\end{cases}}\quad \langle x,x'\rangle =\sum _{t,t'\in T}x_{t}x_{t'}\langle e_{t},e_{t'}\rangle } As a side note, if T {\displaystyle T} is infinite, then the inner product space R T {\displaystyle \mathbb {R} ^{T}} is not complete. Taking its completion would get us to a Hilbert space, which allows well-behaved infinite sums. Now we have an inner product space, with enough structure to describe the geometry of the feature hashing function ϕ : R T → R n {\displaystyle \phi :\ma

    Read more →
  • OpenFog Consortium

    OpenFog Consortium

    The OpenFog Consortium (sometimes stylized as Open Fog Consortium) was a consortium of high tech industry companies and academic institutions across the world aimed at the standardization and promotion of fog computing in various capacities and fields. The consortium was founded by Cisco Systems, Intel, Microsoft, Princeton University, Dell, and ARM Holdings in 2015 and now has 57 members across the North America, Asia, and Europe, including Forbes 500 companies and noteworthy academic institutions. The OpenFog consortium merged with the Industrial Internet Consortium, now the Industry IoT Consortium, on January 31, 2019. == History == OpenFog was created on November 19, 2015, by ARM Holdings, Cisco Systems, Dell, Intel, Microsoft, and Princeton University. The idea for a consortium centered on the advancement and dissemination of fog computing was thought up by Helder Antunes, a Cisco executive with a history in IoT, Mung Chiang, then a Princeton University professor and now President of Purdue University, and Dr. Tao Zhang, a Cisco Distinguished Engineer and CIO for the IEEE Communications Society then and now a manager at the National Institute of Standards and Technologies (NIST). The project was executed from concept to launch by Armando Pereira at PVentures Consulting, a Silicon Valley–based high-tech consulting firm. OpenFog released its reference architecture for fog computing on February 13, 2017. The Fog World Congress 2017, with Dr. Tao Zhang as its General Chair, was hosted in October 2017 by OpenFog, in conjunction with the IEEE Communications Society, as the first congress devoted to fog computing. == Administration == The OpenFog Consortium was governed by its board of directors, which is chaired by Cisco Senior Director Helder Antunes. The board of directors is made up of 11 seats, each representing one of the following companies and institutions: ARM, AT&T, Cisco, Dell, Intel, Microsoft, Princeton University, IEEE, GE, ZTE and Shanghai Tech University. The consortium's general membership comprised 13 academic members: Aalto University, Arizona State University, California Institute of Technology, Georgia State University, National Chiao Tung University, National Taiwan University, Shanghai Research Centre for Wireless Communication, Chinese University of Hong Kong, University of Colorado Boulder, University of Southern California, University of Pisa, Vanderbilt University, Wayne State University, and 20 additional members: Hitachi, Internet Initiative Japan, Itochu, Kii, Nebbiolo, PrismTech, NEC, NGD Systems, NTT Communications, OSIsoft, Real-time Innovations, relayr, Sakura Internet, Stichting imec Nederland, Toshiba, TTT Tech, Fujitsu, FogHorn Systems, TTTech and MARSEC. == Published work == The OpenFog Consortium published the white paper, "OpenFog Reference Architecture". This document outlines the eight pillars of an OpenFog architecture:Security; Scalability; Open; Autonomy; Programmability; RAS (reliability, availability and serviceability); Agility; and Hierarchy. It also incorporates a glossary for fog computing terms. In July 2018, the IEEE Standards Association announced it had adopted the OpenFog Reference Architecture as the first standard for fog computing.

    Read more →
  • Apache CarbonData

    Apache CarbonData

    Apache CarbonData is a free and open-source column-oriented data storage format of the Apache Hadoop ecosystem. It is similar to the other columnar-storage file formats available in Hadoop namely RCFile and ORC. It is compatible with most of the data processing frameworks in the Hadoop environment. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk. == History == CarbonData was developed at Huawei in 2013. The project was donated to the Apache Community in 2015 submitted to the Apache Incubator in June 2016. The project won top honors in the BlackDuck 2016 Open Source Rookies of the Year's Big Data category. Apache CarbonData has been a top-level Apache Software Foundation (ASF)-sponsored project since May 1, 2017.

    Read more →
  • Orleans (software framework)

    Orleans (software framework)

    Orleans is a cross-platform software framework for building scalable and robust distributed interactive applications based on the .NET Framework or on the more recent .NET. == Overview == Orleans was originally created by the eXtreme Computing Group at Microsoft Research and introduced the virtual actor model as a new approach to building distributed systems for the cloud. Orleans scales from a single on-premises server to highly-available and globally distributed applications in the cloud. The virtual actor model is based on the actor model but has several differences: A virtual actor always exists, it cannot be explicitly created or destroyed. Virtual actors are automatically instantiated. If a server hosting an actor crashes, the next message sent to the actor causes it to be reinstantiated automatically. The server that an actor is on is transparent to the application code. Orleans can automatically create multiple instances of the same stateless actor. Starting with cloud services for the Halo franchise, the framework has been used by a number of cloud services at Microsoft and other companies since 2011. The core Orleans technology was transferred to 343 Industries and is available as open source since January 2015. The source code is licensed under MIT License and hosted on GitHub. Orleans runs on Microsoft Windows, Linux, and macOS and is compatible with .NET Standard 2.0 and above. == Features == Some Orleans features include: Persistence Distributed ACID transactions Streams Timers & Reminders Fault tolerance == Related implementations == The Electronic Arts BioWare division created Project Orbit. It is a Java implementation of virtual actors that was heavily inspired by the Orleans project.

    Read more →
  • Fling (social network)

    Fling (social network)

    Fling was a social media app available for IOS and Android. It was founded in 2014 by Marco Nardone and was taken offline in August 2016. == Overview == In 2012, Marco Nardone founded the startup Unii and launched Unii.com, a social network intended for students in the UK. While working on this service, Nardone had the idea for a messaging service where pictures could be sent to strangers in January 2014. The app Fling was then developed and released between March and July 2014. After a month, it already had 375,000 downloads and 180,000 active users on iOS. Users were able to take pictures inside the app and send them to 50 random people all over the world. The recipient could then choose to answer via chat or reply by sending a picture themselves. The app was used by many users as a medium to exchange sexually explicit pictures and for sexting with strangers. This led to the app being removed from the App Store in June 2015. In the 19 days that followed, flings developers rewrote the App almost completely from scratch, working around the clock. The feature to message random strangers was removed, and the app was readmitted into the App Store as a messenger App resembling Snapchat. But the redesigned Application did not have the success of its predecessor. The funding ran out and the parent company Unii went bankrupt. The company was not able to pay their content moderation team anymore, leading to a new surge of pornographic content on the App. Shortly after that, the Social Network was taken offline in August 2016. It has been inactive since. During the 2 years Fling was online, $21 million was raised from investors while generating no revenue at all. Of this $21 million (£16.5m), £5 million came from Nardone's father. == Allegations against CEO == Former employees made multiple allegations against Marco Nardone, the Founder and CEO of Unii and Fling. According to these claims, he behaved erratic and abusive, throwing "things across the office". He hired his girlfriend as the head of human resources to handle issues between him and his staff. Employees who left the company often had "some part of their pay held back". According to the reports, he also spent the money raised from investors irresponsibly, having no clear concept of a budget. Some of that money was used on expensive restaurants in London, a luxurious office for CEO Nardone and advertisements for Fling on Twitter and Facebook. Nardone also spent time partying in Ibiza with two employees, while the developer team in London frantically tried to get Fling back online after it being removed from the App Store. In December 2017 he pleaded guilty to assaulting his girlfriend at a domestic violence court.

    Read more →
  • SAP Cloud Infrastructure

    SAP Cloud Infrastructure

    SAP Cloud Infrastructure is an SAP-operated IaaS cloud platform, used to run SAP’s cloud business and customer-facing deployments for SAP and non-SAP workloads. It is developed and operated with open-source technologies within SAP’s data center network, based on OpenStack and Kubernetes and supporting SAP S/4HANA and general-purpose applications. It offers compute, storage, and platform services that are accessible to SAP customers. == History == In 2012, SAP promoted aspects of cloud computing. In October 2012, SAP announced a platform as a service called the SAP Cloud Platform. In May 2013, a managed private cloud called the S/4HANA Enterprise Cloud service was announced. SAP Converged Cloud was announced in January 2015. SAP Converged Cloud was originally developed as SAP's internal standardized Infrastructure as a Service (IaaS) offering to support SAP’s cloud solutions. Originating from SAP Converged Cloud, SAP Cloud Infrastructure was developed and announced as SAP’s cloud computing offering that is provided for both SAP and customer workloads. In 2025, it had a global footprint of 15 regions and 29 data centers, encompassing more than 200,000 active VMs and over 6,000 hypervisors. In September 2025, SAP announced an expansion of its European “SAP Sovereign Cloud” portfolio, explicitly naming SAP Cloud Infrastructure (alongside SAP Sovereign Cloud On-Site) as part of the stack positioned for public sector and regulated environments. == Services and Features == SAP Cloud Infrastructure (SCI) is an infrastructure-as-a-service (IaaS) offering by SAP that provides virtual compute, storage, and networking services, together with identity, key management, and operational services. SCI follows a self-service model and is managed via APIs and a web-based user interface. === Compute === SCI provides virtual machine instances that can be provisioned from operating system images and selected in predefined sizes (“flavors”). It supports lifecycle operations such as create/modify/resize/delete, power control, and snapshots; instances can be organized into server groups to influence placement policies. === Storage === SCI provides persistent storage services including: Block storage (virtual volumes) with attach/detach to instances, online expansion, cloning, snapshots, and provisioning volumes from images or snapshots. Object storage (containers and objects) managed via API/CLI with access control lists (ACLs) and configurable redundancy options. File storage (shared file systems) with access controls, online resize, snapshots/restore, and replication across availability zones. === Networking === SCI provides software-defined networking (SDN) for tenant networks (networks, subnets, routers) and connectivity features such as floating IPs for public reachability. Network security controls include security groups and firewall policies; connectivity options include BGP-based VPN networking. === Load balancing and DNS === SCI includes managed load balancing for distributing traffic across backend instances and an authoritative DNS service (DNSaaS) with API-based management of DNS zones and records, including options for zone sharing/transfer across projects/tenants and service integrations for automated record creation. === Identity, access, and key management === SCI includes identity and access management for authentication/authorization in projects/tenants (for example token handling, role assignment, and credential management) and key/secrets management for storing and controlling access to secret material such as keys and certificates, including support for different backends (depending on configuration). === Cloud-native services === SCI includes a container image registry (image push/pull, access policies, and lifecycle controls) and an auto-scaling capability for file shares based on configurable rules. === Observability and audit === SCI includes metrics and audit logging capabilities for operational monitoring and for listing/filtering audit-relevant events across services. === Availability and service levels === SCI documentation describes availability-related features such as load balancing, storage redundancy options, and replication for file shares across availability zones. SAP cloud services are governed by contractual service-level agreements (SLA); SAP Cloud Infrastructure references an SLA supplement defining infrastructure-specific terms when referenced in order forms. === SAP cloud services === SAP cloud services can run on different underlying infrastructures, including SAP Cloud Infrastructure in addition to SAP NS2 or hyperscalers. SAP cloud solutions available on SAP Cloud Infrastructure include SAP Cloud ERP, SAP HCM, SAP Solutions for Spend Management, Supply Chain Management, Business Transformation Management, and SAP Business Technology Platform (including related analytics and business data solutions). For example, SAP HANA Cloud documentation lists SAP Cloud Infrastructure as one of the supported infrastructures alongside hyperscalers. === Sustainability === SAP describes sustainability initiatives for its data centers, including energy-efficient infrastructure (for example, advanced cooling systems and power management), renewable electricity usage where feasible, and operational practices such as recycling electronic waste and minimizing water usage. SAP also references environmental management and energy management standards such as ISO 14001 and ISO 50001 for its data center operations. SAP-owned data centers run with 100% renewable electricity and that renewable electricity has been used since 2014 to power SAP facilities including owned data centers and co-locations. == SAP Cloud Infrastructure for SAP Sovereign Cloud == SAP Sovereign Cloud is a portfolio of SAP solutions designed to help organizations adopt SAP cloud solutions such as the SAP Cloud ERP while maintaining control over data, infrastructure, and compliance in line with local laws and regulations. The portfolio offers multiple deployment options, including SAP Cloud Infrastructure and SAP Sovereign Cloud On-Site, alongside sovereign hyperscaler-based options such as via SAP NS2, and targets customers such as public-sector bodies and other highly regulated organizations. In Europe, SAP Cloud Infrastructure is an Infrastructure-as-a-Service (IaaS) deployment option within SAP Sovereign Cloud for SAP and customer / third party workloads, operated on SAP’s data center network and developed using open-source technologies, with customer data stored within the European Union. Sovereignty-related characteristics for the SAP Cloud Infrastructure include: EU footprint and ownership model: SAP-operated data centers in Germany include sites in St. Leon-Rot and Walldorf, and co-location sites in Frankfurt. EU AI Cloud: EU AI Cloud is a sovereign AI offering for Europe that provides secure, compliant environments for building and running AI, including governed access to auditable large language models from SAP and partners. It offers AI models on the SAP Cloud Infrastructure and SAP Business Technology Platform (SAP BTP), enabling deployment of AI applications and models on high-performance European infrastructure (including accelerator/GPU-based compute for AI workloads). Availability zones and secure interconnect: Three availability zones in three independent data centers in Germany, connected via SAP-owned fiber on SAP-owned property. Facility and security standards: ISO/IEC 27001 governance of delivery and operations of SAP cloud services and SAP-owned data centers. Additional facility and availability standards: EN 50600 availability class 3 (European data centre standard) and/or ISO/IEC 22237 availability class 3 (international equivalent). Technology foundation: Based on open-source cloud infrastructure framework (OpenStack) and Kubernetes, without dependencies on hyperscaler technologies. Sovereignty controls: Data sovereignty (data residency), operational sovereignty (administration and maintenance restricted to approved, security-cleared personnel), technical sovereignty (locally hosted control planes with separation via encryption or dedicated infrastructure), and legal sovereignty (use of locally based legal entities or those in approved countries). Classified information processing: Roadmap to meet high and very high requirements for handling classified or sensitive information under European regulatory and security regimes. Public-sector readiness and EU sovereignty assurance levels: Implemented to meet SEAL-3 (Digital Resilience) and SEAL-4 (Full Digital Sovereignty) of the European Commission’s Cloud Sovereignty Framework. Staffing constraints: Operations model selectable to restrict sensitive operations to vetted personnel from EU or NATO countries.

    Read more →
  • MetroHero

    MetroHero

    MetroHero is a semi-defunct real-time transit tracking and performance analysis application for the Washington Metro rapid transit system. Originally available on iOS, Android, and the web, it allows users to view live maps of all trains on a specific line, summary statistics relating to real-time system performance, and user feedback on current Metro conditions. The app launched in 2015, followed by ARIES for Transit, a related project from the same developers, and continued functioning until its original developers shut it down in 2023. Afterwards, forks of the application went live to allow for its continued public use, and the Washington Metropolitan Area Transit Authority (WMATA), Metro's operator, announced that it would launch a similar app. The app has been described by local news media as popular and well-liked among Washington, D.C.-area residents. == History and main development == MetroHero was initially developed by James and Jennifer Pizzurro, who both attended George Washington University and studied computer science. They said that they were inspired to create the app after experiencing train delays and searching for an app to track a train after boarding; such an app did not exist for the Washington Metro. The development of the app was not endorsed by WMATA, but it did use publicly available data from the agency. MetroHero launched as an Android application in September 2015, followed by the release of an iOS-compatible web app in December of that year. A standalone iOS app launched in April 2018, but the web app remained supported. By April 2018, MetroHero had approximately 13,000 monthly active users. James Pizzurro has stated that the app's intended audience was regular Metro commuters who wanted to communicate with each other about active problems, as opposed to tourists and riders who only wanted train time data. Throughout the application's development, the Pizzurros had been advocates for Metro's transparency with riders and the community by providing more high-quality data and taking on the feedback of developers. In particular, they criticized Metro's reluctance to uniquely identify individual train trips and its decision to obscure data under certain circumstances, which have posed problems for MetroHero's data collection. In addition to their work on MetroHero, the app's developers led or participated in other initiatives related to transit in the Greater Washington area. In 2019, MetroHero partnered with a local transit group to analyze Metrobus data and publish a "Metrobus Report Card", along with proposed goals and recommendations based on the report's findings. Based on this experience, MetroHero's developers began a sister project, the Adherence + Reliability + Integrity Evaluation System for Transit (ARIES for Transit), which displays data and issues grades for Washington- and Baltimore-area transit systems. Separately, James Pizzurro used MetroHero data to inform Rail Transit OPS, an independent Metro oversight group, and assist in its documentation of Metro system incidents. == Application == The MetroHero application uses several interfaces, including an overall dashboard and a live map, to display data to its users. On the dashboard, system-wide train summary data, such as the number of operating trains and headway adherence, is visible. The map offers a visual representation of all trains' positions throughout the system, filtered by line. Individual stations and trains can be selected to see ratings and comments provided by other users, including both positive and negative notes like cleanliness and crowdedness. Additionally, a list of train wait times is given, along with aggregate data like average wait time. Any train delays or service incidents are visible in the app. MetroHero uses several data sources for the various components of its application. Train positions and other operational data are provided by WMATA as part of its initiative to release open data for third-party developers. However, MetroHero's developers noted that the Metro-provided information is sometimes inaccurate and incomplete, thereby limiting the accuracy of MetroHero. The app also collects crowdsourced data from its users, who can report conditions in train cars and stations and add to reports sent by other people. Additionally, MetroHero parses data from Twitter feeds to learn about system incidents, including delays and fires. In addition to the web app, Android app, and iOS app, MetroHero's initial developers maintained automated social media accounts that alerted customers about Metro service; these accounts were discontinued upon the original app's eventual shutdown. MetroHero also hosts archived performance data for later review, a feature that is sometimes used after major incidents. == Shutdown and future == In February 2023, James Pizzurro announced that MetroHero would be shut down on July 1, 2023, citing "positive changes ... in the app landscape and in WMATA's data management and communication" and the costs and time associated with maintaining the app. Shortly before the application's end date, the Pizzurros shared MetroHero's source code on GitHub, which prompted others to fork the code and begin maintaining new instances of MetroHero to succeed the original app. The original website went offline on July 1, as planned. Historically, WMATA has not offered its own real-time map or similar service, citing other apps from third parties which accomplished the same task. However, on June 30, 2023, Randy Clarke, WMATA's general manager, announced that Metro would begin offering a similar service as MetroHero did. The app, initially named MetroMeter, was planned to begin operating in early July and would provide real-time information on trains, headways, and service schedules. Metro also noted its intentions to extend this service to Metrobus and MetroAccess. On July 20, Metro announced that the app had been renamed to MetroPulse and launched it in beta. MetroHero's other project, ARIES for Transit, was not affected by the shutdown. == Reception == MetroHero was generally well-received and has been recognized for its usage among Washington-area commuters. DCist called it one of the "most praised" Metro tracking apps, and WMATA publicly acknowledged its popularity when announcing its decision to establish MetroPulse. Chris Barnes, a member of the Metro Riders' Advisory Council, said that the app is considered important among riders because it fulfills a need for riders to have reliable and transparent transit information, albeit somewhat hindered by flaws in WMATA's data.

    Read more →
  • Google Cloud Dataflow

    Google Cloud Dataflow

    Google Cloud Dataflow is a fully managed service for executing Apache Beam pipelines within the Google Cloud Platform ecosystem. Dataflow provides a fully managed service for executing Apache Beam pipelines, offering features like autoscaling, dynamic work rebalancing, and a managed execution environment. Dataflow is suitable for large-scale, continuous data processing jobs, and is one of the major components of Google's big data architecture on the Google Cloud Platform. At its core, Dataflow's architecture is designed to abstract away infrastructure management, allowing developers to focus purely on the logic of their data processing tasks. When a pipeline written using the Apache Beam SDK is submitted, Dataflow translates this high-level definition into an optimized job graph. The service then provisions and manages a fleet of Google Compute Engine workers to execute this graph in a highly parallelized and fault-tolerant manner. This serverless approach, combined with intelligent autoscaling of both the number of workers (horizontal) and the resources per worker (vertical), ensures that jobs have the precise amount of computational power needed at any given time, optimizing both performance and cost. The service's deep integration with the Google Cloud ecosystem makes it a powerful tool for a variety of use cases beyond simple data movement. For real-time analytics, Dataflow can ingest unbounded streams of data from Cloud Pub/Sub, perform complex transformations, and load results into BigQuery for immediate querying. In machine learning workflows, it is commonly used to preprocess and transform massive datasets stored in Cloud Storage, preparing them for training models in Vertex AI. This versatility makes it the central processing engine for modern ETL (Extract, Transform, Load) operations, streaming analytics, and large-scale data preparation within the cloud. == History == Google Cloud Dataflow was announced in June, 2014 and released to the general public as an open beta in April, 2015. In January, 2016 Google donated the underlying SDK, the implementation of a local runner, and a set of IOs (data connectors) to access Google Cloud Platform data services to the Apache Software Foundation. The donated code formed the original basis for Apache Beam. In August 2022, there was an incident where user timers were broken for certain Dataflow streaming pipelines in multiple regions, which was later resolved. Throughout 2023 and 2024, there have been various other updates and incidents affecting Google Cloud Dataflow, as documented in the release notes and service health history. The donation of the Dataflow SDK to the Apache Software Foundation was a pivotal moment, establishing Apache Beam as a unified, open-source programming model for defining both batch and streaming data pipelines. This strategic move decoupled the pipeline definition from the execution engine. As a result, developers could write portable data processing logic that was not locked into Google's ecosystem. A Beam pipeline can be executed on various runners, including Apache Flink, Apache Spark, and, of course, the highly optimized Google Cloud Dataflow service, providing flexibility and future-proofing data processing investments. == Features == Google Cloud Dataflow supports both batch and streaming data processing pipelines. It automatically handles resource provisioning, data sharding, and scaling according to workload, reducing manual configuration needed for large-scale data operations. == Use cases == Dataflow is used for ETL (Extract, Transform, Load) data pipelines, real-time analytics, and event stream processing for companies in industries such as finance, advertising, and IoT.

    Read more →
  • Spike (application)

    Spike (application)

    Spike is a cross-platform email client and AI-powered communication app, available on Windows, MacOS, iOS, Android and the web. It has a chat-like, conversational view for emails with AI-powered inbox management and integrated collaboration features. Depending on the selected plan, it can be used solely as an email application or as a full suite of business communication tools. == History == Founded in 2013 by Erez Pilosof and Dvir Ben-Aroya, Spike is a software application that puts existing e-mails into a multimedia messaging, chat-like interface enhanced with video and voice calls. The application was initially named Hop. In 2019, the developers completed a $5 million funding round including investment from Wix.com and NFX Capital. In 2020, Spike raised $8m in a Series A funding round led by Insight Partners with the participation from previous rounds' investors. In 2021 Spike announced a collaboration with Meta to launch on the Oculus Store and would become one of the first productivity apps to launch in Meta's new virtual world, known as the Metaverse. In June 2023, the company introduced its corporate offering — Teamspace, a corporate communication platform for teams with features such as company-wide channels for broad conversations, private groups for specific topics or projects, direct one-on-one conversations, video meetings, file collaboration, AI-powered email messaging, and custom email domain. It supports file management, search capabilities, and project management. Built on open-protocol technology, Spike Teamspace enables users to send and receive messages from all email providers. Regardless of whether the other party is using Spike. == Company operations == Spike is developed and operated by SpikeNow LTD. Dvir Ben Aroya serves as Spike’s CEO and Erez Pilosof is the CTO. The company is headquartered in Tel Aviv, Israel. == Mode of use == The app enables users to organize email into three types of "conversations,"a traditional inbox/sent format, by subject, or by people. Spike users can also make audio and video calls to each other, and other features include a calendar, contact list, and Groups. Spike is available for Microsoft Windows, MacOS, iOS and Android, and as a web version, and works with Gmail, Outlook, Exchange, iCloud, Yahoo! Mail and IMAP email providers. == Features == Since 2023, the platform features an AI-driven assistant, Magic AI, for customized email creation, document summarization, research, content generation, advanced note-taking, project management, and real-time translation. Since 2023, Spike offers custom email domain management. It supports team collaboration through Channels, uniting members globally with access to historical messages, and combines email with real-time messaging via Conversational Email. The Shared Inbox allows team collaboration on emails, while Groups support private conversations and invitations. It also features integrated video meetings, real-time collaboration on documents and notes, and email hosting with custom domains. Super Search enables retrieval of various content, and the Priority Inbox organizes emails by priority. Collaborative Tasks offer real-time updates and tracking. The platform allows voice message sending from mobile devices and integrates multiple calendar platforms into a unified schedule. File Management optimizes attachment handling, and the Unified Inbox consolidates emails from multiple accounts. Spike ensures data security with AES-256 encryption and private keys. The platform features AI-powered inbox management and communication tools. In May 2025, Spike launched its AI Feed feature, which automatically summarizes unread messages in a unified stream and enables bulk email actions. Additional AI capabilities include email composition assistance, document summarization, content generation, note-taking enhancement, and real-time translation.

    Read more →
  • Data cube

    Data cube

    In computer programming, a data cube (or datacube) is a multi-dimensional array of values. Typically, the term "data cube" is applied in contexts where these arrays are massively larger than the hosting computer's main memory; examples include multi-terabyte/petabyte data warehouses and time series of image data. Even though it is called a cube, a data cube generally is a multi-dimensional concept which can be 1-dimensional, 2-dimensional, 3-dimensional, or higher-dimensional. The data cube is used to represent data (sometimes called facts) along some dimensions of interest. In satellite image timeseries, dimensions would be latitude and longitude coordinates and time; a fact (sometimes called measure) would be a pixel at a given space and time as taken by the satellite. For example, in online analytical processing, an OLAP cube about a company would have dimensions that could be the company subsidiaries, the company products, and time; in this setup, a fact would be a sales event where a particular product has been sold in a particular subsidiary at a particular time. In any case, every dimension divides data into groups of cells whereas each cell in the cube represents a single measure of interest. Sometimes cubes hold only a few values with the rest being empty, i.e. undefined, while sometimes most or all cube coordinates hold a cell value. In the first case such data are called sparse, and in the second case they are called dense, although there is no hard delineation between the two. Data cubes may be stored in database management systems (DBMS) as part of array DBMS. Spatio-temporal databases and geospatial databases may also be represented as coverage data. == History == Multi-dimensional arrays have long been familiar in programming languages. Fortran offers arbitrarily-indexed 1-D arrays and arrays of arrays, which allows the construction of higher-dimensional arrays, up to 15 dimensions. APL supports n-D arrays with a rich set of operations. All these have in common that arrays must fit into the main memory and are available only while the particular program maintaining them (such as image processing software) is running. A series of data exchange formats support storage and transmission of data cube-like data, often tailored towards particular application domains. Examples include MDX for statistical (in particular, business) data, Zarr and Hierarchical Data Format for general scientific data, and TIFF for imagery. In 1992, Peter Baumann introduced management of massive data cubes with high-level user functionality combined with an efficient software architecture. Datacube operations include subset extraction, processing, fusion, and in general queries in the spirit of data manipulation languages like SQL. Some years after, the data cube concept was applied to describe time-varying business data as data cubes by Jim Gray, et al., and by Venky Harinarayan, Anand Rajaraman and Jeff Ullman. Around that time, a working group on Multi-Dimensional Databases ("Arbeitskreis Multi-Dimensionale Datenbanken") was established at German Gesellschaft für Informatik. Datacube Inc. was an image processing company selling hardware and software applications for the PC market in 1996, however without addressing data cubes as such. The EarthServer initiative has established geo data cube service requirements. == Standardization == In 2018, the ISO SQL database language was extended with data cube functionality as "SQL – Part 15: Multi-dimensional arrays (SQL/MDA)". Web Coverage Processing Service is a geo data cube analytics language issued by the Open Geospatial Consortium in 2008. In addition to the common data cube operations, the language knows about the semantics of space and time and supports both regular and irregular grid data cubes, based on the concept of coverage data. An industry standard for querying business data cubes, originally developed by Microsoft, is MultiDimensional eXpressions. == Implementation == Many high-level computer languages treat data cubes and other large arrays as single entities distinct from their contents. These languages, of which Fortran, APL, IDL, NumPy, PDL, and S-Lang are examples, allow the programmer to manipulate complete film clips and other data en masse with simple expressions derived from linear algebra and vector mathematics. Some languages (such as PDL) distinguish between a list of images and a data cube, while many (such as IDL) do not. Array DBMSs (Database Management Systems) offer a data model which generically supports definition, management, retrieval, and manipulation of n-dimensional data cubes. This database category has been pioneered by the rasdaman system since 1994. == Applications == Multi-dimensional arrays can meaningfully represent spatio-temporal sensor, image, and simulation data, but also statistics data where the semantics of dimensions is not necessarily of spatial or temporal nature. Generally, any kind of axis can be combined with any other into a data cube. === Mathematics === In mathematics, a one-dimensional array corresponds to a vector, a two-dimensional array resembles a matrix; more generally, a tensor may be represented as an n-dimensional data cube. === Science and engineering === For a time sequence of color images, the array is generally four-dimensional, with the dimensions representing image X and Y coordinates, time, and RGB (or other color space) color plane. For example, the EarthServer initiative unites data centers from different continents offering 3-D x/y/t satellite image timeseries and 4-D x/y/z/t weather data for retrieval and server-side processing through the Open Geospatial Consortium WCPS geo data cube query language standard. A data cube is also used in the field of imaging spectroscopy, since a spectrally-resolved image is represented as a three-dimensional volume. Earth observation data cubes combine satellite imagery such as Landsat 8 and Sentinel-2 with Geographic information system analytics. === Business intelligence === In online analytical processing (OLAP), data cubes are a common arrangement of business data suitable for analysis from different perspectives through operations like slicing, dicing, pivoting, and aggregation.

    Read more →
  • Azure Maps

    Azure Maps

    Azure Maps is a suite of cloud-based, location-based services provided by Microsoft as part of the company's Azure platform. The platform provides geospatial and location-based services via REST APIs and software development kits (SDKs). The service is typically used to integrate maps or geospatial data into applications. Azure Maps differs from Microsoft's other enterprise mapping service, Bing Maps, in its pricing model, focus on privacy, and its level of integration into the broader Azure cloud ecosystem. == History == Azure Maps was first introduced in public preview mode under the name "Azure Location Based Services" in 2017, primarily as an enterprise solution. The services was intended to add mapping and location-based functionality onto the existing Azure cloud services suite, seen as a critical part of Microsoft's broader Internet-of-Things (IoT) strategy. The preview version included APIs which could be used to develop location aware apps for use cases such as logistics and mobility. In 2018, the software was renamed "Azure Maps," and became generally available to the public, and a number of new functions were added, including route calculation, travel time calculation, and incorporation of real-time traffic data and incident information. Azure Maps was integrated with Azure IoT Central in 2018, which added tracking, monitoring, and geofencing capabilities. A set of mobility APIs on were added in 2019, with applications such as use in public transport apps and shared bicycle fleet management. “Azure Maps Creator,” which converts private facility floor plans into indoor map data, was also introduced in 2019. Some commentators linked these services to Microsoft's broader development of augmented reality products. In 2020, Azure Maps Visual for Power BI was released, integrating location-based features and mapping capabilities into Microsoft's business intelligence software. An elevation API (which was later retired), geolocation services, and an iOS and Android software development kit were introduced in 2021. In 2022, support for historical weather, air quality, and tropical storm data was made generally available and custom styling for indoor maps was also introduced. In 2023, Azure Maps was certified as HIPAA compliant in a move to target healthcare and health insurance companies. == Functionality == === Geocoding === Geocoding is one of the core functionalities of Azure Maps, converting addresses or place names into geographic coordinates. Batch geocoding is used to process large amounts of address data, a function used for route optimization and spatial analysis. === Reverse geocoding === Reverse geocoding derives human-readable information from geographic coordinates like longitude and latitude, used in navigation and by geographic information systems. === Routing === Azure Maps uses map data and routing algorithms to calculate the shortest or fastest routes between locations based on factors like vehicle size and type, traffic conditions, and distance. Routing also supports multi-modal routing, which include multiple modes of transport in a single trip, including cycling, walking, and ferries. This functionality is used for location-based searches and route optimization in applications like fleet management, proximity marketing, and emergency services as well as logistics and delivery, urban planning, ride sharing apps, and outdoor activities. === Map visualization === The platform supports map visualizations that can be modified to reflect real-time data (including from IoT sensors) as well as historical data patterns. Visualizations include heat maps, street maps, satellite imagery and other custom data layers. Maps are rendered using raster or vector tiles which reduce the load of displaying large data sets or complex maps. This can be used in various applications in areas like transportation, smart cities, retail and marketing, public health, and environmental monitoring. For example, it can be used for tracking the spread of diseases or measuring the impact of changing climatic patterns. === Geofencing and spatial analytics === Azure Maps supports polygonal geofencing, which enables the definition of custom geographic boundaries. Geofenced areas can be monitored in real-time for events of interest. For example, an application could send an alert when equipment or persons enter or leave a defined area. Tools for analyzing historical geofencing data are also available via the APIs for optimization purposes. == Industry usage == Azure Maps' geofencing function has seen usage in the construction industry, designating hazardous areas for safety purposes and sending alerts if anyone enters the area. Private facility maps are used by construction companies for monitoring large construction sites to increase productivity and prevent accidents or damage. In emergency management, New Zealand based company Beca has used Azure Maps to provide analysis on the impact of earthquakes to users, including information on the severity and location of an earthquake and the impact on affected properties. Alaska's Department of Transportation uses Azure Maps as part of an information system providing weather-related warnings and analytics to road crews. Airmap, an airspace management platform for drones, uses Azure Maps. Azure Maps has also been used in conjunction with Azure Monitor for risk monitoring by an insurance company. Other companies that use or have used Azure Maps include BMW, Banco Santander, Jvion, MV Transportation, C.H. Robertson, Wise Skulls, Tata Consultancy Services, Providence Health and Services, Gas Brasiliano Distribuidora S.A., Shell plc, Persistent Systems, Phase 2 Dining and Entertainment, Symbio, HID, Globant, and Insight Enterprises. == Partnerships == Azure Maps and TomTom have been partners since 2016, and TomTom provides location data to Azure Maps and can process data from Azure Maps for mapping purposes. In 2021, Azure Maps partnered with AccuWeather to make climatic data available via its APIs, making weather data along all parts of calculated routes available for mobility and logistics purposes. Microsoft has partnered with Esri, the developer of ArcGIS, and there is cross-compatibility between Azure and ArcGIS so that data from Azure Maps can be integrated into ArcGIS and vice versa. Azure Maps partnered with Moovit in 2019, a startup providing software that interfaces with public transport data. Moovit's database on global public transit networks, including information on which stations and facilities are wheelchair accessible, was linked to Azure Maps. This service was noted for its use increasing accessibility to public transport for the visually impaired by means of voice activated route planning assistance. NORAD has used some Azure Maps functions for their NORAD Tracks Santa website during Christmas holidays. == Components == === REST APIs === Various APIs cover the major functionalities across Azure Maps: Data registry API Geolocation API Render API Route API Search API Spatial API Time zone API Traffic API Weather API === SDKs === Azure Maps SDKs uses MapLibre-style specifications and open source MapLibre GL-based libraries as a rendering engine. The Web SDK is used for developing web apps with maps and location-based data and functionality. It includes a map control module as well as modules with drawing tools. It also supports Azure Maps Creator and various spatial data formats. The platform also includes a set of REST SDKs for developers integrating Azure Maps REST APIs into Python, C#, Java or JavaScript applications. Azure Maps also includes Android and iOS SDKs used for developing applications for Android and Apple devices. === Azure Maps Creator === Azure Maps Creator is a tool for generating custom maps for locations like large office complexes, construction sites, or university campuses. These maps can then be integrated into applications and used with other Azure Maps functions for purposes such as wayfinding and maintenance and security in building automation contexts. === Azure Maps Visual for Power BI === Azure Maps is integrated with Microsoft Power BI, a graphical tool for producing data visualizations. Since July 2020, Power BI can be used in conjunction with Azure Maps for developing map-based data visualizations. This functionality entered general availability in May 2023.

    Read more →