Data lineage

Data lineage refers to the process of tracking how data is generated, transformed, transmitted and used across systems over time. It documents data's origins, transformations and movements, providing detailed visibility into its life cycle. This simplifies the identification of errors in data analytics workflows by enabling users to trace issues back to their root causes. Data lineage also makes it possible to replay specific segments or inputs of the dataflow, which can be used in debugging or in regenerating lost outputs. In database systems, the concept is closely related to data provenance, which involves maintaining records of the inputs, entities, systems and processes that influence data. Data provenance provides a historical record of data origins and transformations and supports forensic activities such as data-dependency analysis, error or compromise detection, recovery, auditing and compliance analysis. Lineage has been characterized as "a simple type of why provenance." Data governance plays a critical role in managing metadata by establishing guidelines, strategies and policies. Enhancing data lineage with data quality measures and master data management adds business value. Although data lineage is typically represented through a graphical user interface (GUI), the methods for gathering and exposing metadata to this interface can vary. Based on the metadata collection approach, data lineage can be categorized into three types: those involving software packages for structured data, those involving programming languages, and those involving Big data systems. Data lineage information includes technical metadata about data transformations. Enriched data lineage may include additional elements such as data quality test results, reference data, data models, business terminology, data stewardship information, program management details and enterprise systems associated with data points and transformations.
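Tracing an output back to the inputs that produced it, and then replaying only those inputs, can be sketched by tagging every record with the set of input IDs it was derived from. This is a minimal, hypothetical illustration in plain Python: the names `Record`, `tmap`, `treduce_by_key` and `run_pipeline` and the toy "key,amount" pipeline are invented here, and real lineage systems capture this information inside the dataflow engine rather than in user code.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Record:
    value: Any
    lineage: frozenset  # indices of the raw input records this value came from


def tmap(fn: Callable[[Any], Any], records: list) -> list:
    # A one-to-one transformation keeps the lineage of its input record.
    return [Record(fn(r.value), r.lineage) for r in records]


def treduce_by_key(key_fn, combine, records: list) -> list:
    # An aggregation unions the lineage of every record it combines.
    groups: dict = {}
    for r in records:
        k = key_fn(r.value)
        if k in groups:
            prev = groups[k]
            groups[k] = Record(combine(prev.value, r.value), prev.lineage | r.lineage)
        else:
            groups[k] = r
    return list(groups.values())


def run_pipeline(raw: list, ids: list) -> list:
    """Parse "key,amount" lines and sum the amounts per key."""
    records = [Record(line, frozenset({i})) for i, line in zip(ids, raw)]
    parsed = tmap(lambda s: (s.split(",")[0], int(s.split(",")[1])), records)
    return treduce_by_key(lambda kv: kv[0], lambda a, b: (a[0], a[1] + b[1]), parsed)


raw = ["a,3", "b,4", "a,-100", "c,1", "b,2"]
totals = run_pipeline(raw, list(range(len(raw))))

# Backward lineage: which raw inputs produced the suspicious total for "a"?
suspect = next(r for r in totals if r.value[0] == "a")
culprits = sorted(suspect.lineage)
print([raw[i] for i in culprits])  # ['a,3', 'a,-100']

# Replay only that segment of the input instead of rerunning everything.
replayed = run_pipeline([raw[i] for i in culprits], culprits)
print(replayed[0].value)  # ('a', -97)
```

Because aggregation unions lineage sets, the metadata grows with fan-in; systems that operate at scale usually store lineage separately from the data and at a coarser granularity.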
Data lineage visualization tools often include masking features that allow users to focus on information relevant to specific use cases. To unify representations across disparate systems, metadata normalization or standardization may be required.
== Representation of data lineage ==
Representation broadly depends on the scope of the metadata management and on the reference point of interest. Backward data lineage shows the sources of the data and the intermediate data-flow hops leading to the reference point, while forward data lineage shows the final destinations' data points and the intermediate data flows leading to them. The two views can be combined into end-to-end lineage for a reference point, which provides a complete audit trail of that data point from its sources to its final destinations. As the number of data points or hops increases, such a representation can become too complex to comprehend. A valuable feature of a data lineage view is therefore the ability to simplify it by temporarily masking unwanted peripheral data points. Masking lets the view scale and makes analysis easier for both technical and business users. Data lineage also enables companies to trace the sources of specific business data in order to track errors, implement process changes and carry out system migrations, saving significant amounts of time and resources. Data lineage can improve efficiency in business intelligence (BI) processes. Data lineage can be represented visually to show how data flows and moves from its source to its destination through the various changes and hops it undergoes in the enterprise environment. This includes how the data is transformed along the way, how its representation and parameters change, and how it splits or converges after each hop.
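Backward, forward and end-to-end lineage, together with masking, can be sketched as graph traversals in which nodes are data containers and directed edges are transformations. This is a minimal illustration under assumed names (`LineageGraph`, `add_flow` and the table names are hypothetical), not the data model of any particular lineage tool. Masking is modelled here as hiding nodes from the result while still traversing through them, so that hiding an intermediate hop does not hide what lies beyond it.

```python
from collections import defaultdict


class LineageGraph:
    """Nodes are data containers; an edge src -> dst is a transformation between them."""

    def __init__(self):
        self.downstream = defaultdict(set)  # node -> nodes it feeds
        self.upstream = defaultdict(set)    # node -> nodes that feed it

    def add_flow(self, src, dst):
        self.downstream[src].add(dst)
        self.upstream[dst].add(src)

    @staticmethod
    def _reachable(start, edges):
        # Iterative depth-first search over one edge direction.
        seen, stack = set(), [start]
        while stack:
            for nxt in edges.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def backward(self, node, masked=()):
        """Sources and intermediate hops feeding `node`, minus masked nodes."""
        return self._reachable(node, self.upstream) - set(masked)

    def forward(self, node, masked=()):
        """Intermediate hops and final destinations fed by `node`, minus masked nodes."""
        return self._reachable(node, self.downstream) - set(masked)

    def end_to_end(self, node, masked=()):
        """Complete audit trail from sources to destinations through `node`."""
        return self.backward(node, masked) | {node} | self.forward(node, masked)


flows = [
    ("crm.customers", "staging.customers"),
    ("staging.customers", "warehouse.dim_customer"),
    ("warehouse.dim_customer", "report.revenue"),
    ("warehouse.dim_customer", "report.churn"),
    ("erp.orders", "staging.orders"),
    ("staging.orders", "warehouse.fact_orders"),
    ("staging.orders", "audit.log"),
    ("warehouse.fact_orders", "report.revenue"),
]
g = LineageGraph()
for src, dst in flows:
    g.add_flow(src, dst)

print(sorted(g.backward("report.revenue")))
print(sorted(g.forward("staging.orders", masked={"audit.log"})))
```

The first call lists all six upstream containers of `report.revenue`; the second hides the peripheral `audit.log` node and leaves `['report.revenue', 'warehouse.fact_orders']`.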
A simple representation of data lineage uses dots and lines: dots represent data containers for data points, and the lines connecting them represent the transformations the data undergoes between containers. Data lineage can be visualized at various levels depending on the granularity of the view. At a very high level, data lineage is visualized as the systems the data interacts with before it reaches its destination. At its most granular, visualization at the data point level can provide details of the data point and its historical behavior, attribute properties, trends and the quality of the data passed through that specific data point in the data lineage. The scope of the data lineage determines the volume of metadata required to represent it. Usually, an organization's data governance and data management functions determine this scope based on regulations, the enterprise data management strategy, data impact, reporting attributes and the organization's critical data elements.
== Rationale ==
Distributed systems such as Google MapReduce, Microsoft Dryad, Apache Hadoop (an open-source project) and Google Pregel provide platforms for large-scale data processing to businesses and users. However, even with these systems, Big Data analytics can take several hours, days or weeks to run, simply because of the data volumes involved. For example, a ratings prediction algorithm for the Netflix Prize challenge took nearly 20 hours to execute on 50 cores, and a large-scale image processing task to estimate geographic information took 3 days to complete using 400 cores. "The Large Synoptic Survey Telescope is expected to generate terabytes of data every night and eventually store more than 50 petabytes, while in the bioinformatics sector, the 12 largest genome sequencing houses in the world now store petabytes of data apiece." It is very difficult for a data scientist to trace an unknown or unanticipated result in analytics of this scale.
=== Big data debugging ===
Big data analytics is the process of examining large data sets to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information. Machine learning, among other algorithms, is used to transform and analyze the data. Because of the data's size, it may contain features that are not known in advance. The massive scale and unstructured nature of the data, the complexity of these analytics pipelines and their long runtimes pose significant manageability and debugging challenges. Even a single error in such analytics can be extremely difficult to identify and remove. Although one may debug them by re-running the entire analytics through a debugger for stepwise debugging, this can be expensive in time and resources. Auditing and data validation are other major problems, owing to the growing ease of access to relevant data sources for use in experiments, the sharing of data between scientific communities and the use of third-party data in business enterprises. As such, more cost-efficient ways of analyzing data-intensive scalable computing (DISC) are crucial to its continued effective use.
=== Challenges in Big Data debugging ===
==== Massive scale ====
According to an EMC/IDC study, 2.8 ZB of data were created and replicated in 2012. The same study states that the digital universe would double every two years between 2012 and 2020, and that there would be approximately 5.2 TB of data for every person in 2020. Based on current technology, storing this much data will mean greater energy usage by data centers.
==== Unstructured data ====
Unstructured data usually refers to information that does not reside in a traditional row-column database. Unstructured data files often include text and multimedia content, such as e-mail messages, word processing documents, videos, photos, audio files, presentations, web pages and many other kinds of business documents.
While these types of files may have an internal structure, they are still considered "unstructured" because the data they contain does not fit neatly into a database. The amount of unstructured data in enterprises is growing many times faster than structured databases. Big data can include both structured and unstructured data, but IDC estimates that 90 percent of Big Data is unstructured. The fundamental challenge of unstructured data sources is that they are difficult for non-technical business users and data analysts alike to unbox, understand and prepare for analytic use. Beyond issues of structure, the sheer volume of this type of data contributes to the difficulty. Because of this, current data mining techniques often leave out valuable information and make analyzing unstructured data laborious and expensive. In today's competitive business environment, companies have to find and analyze the relevant data they need quickly. The challenge is to go through the volumes of data and reach the level of detail needed, all at high speed, and it only grows as the degree of granularity increases. One possible solution is hardware: some vendors use increased memory and parallel processing to crunch large volumes of data quickly. Another method is putting data in-memory but using a grid

FMLLR

In signal processing, Feature space Maximum Likelihood Linear Regression (fMLLR) is a global feature transform that is typically applied in a speaker-adaptive way: fMLLR transforms acoustic features into speaker-adapted features by multiplying them with a transformation matrix. In some literature, fMLLR is also known as Constrained Maximum Likelihood Linear Regression (cMLLR).
== Overview ==
fMLLR transformations are trained in a maximum likelihood sense on adaptation data. Although such transformations could be estimated in many ways, only maximum likelihood (ML) estimation is considered in fMLLR: the transformation is trained on a particular set of adaptation data so that it maximizes the likelihood of that adaptation data given the current model set. This technique is a widely used approach for speaker adaptation in HMM-based speech recognition. Later research has also shown that fMLLR features are effective acoustic features for hybrid DNN/HMM speech recognition models. The advantages of fMLLR include the following: the adaptation process can be performed in a pre-processing phase and is independent of the ASR training and decoding process; the adapted features can be fed to deep neural networks (DNN) in place of the traditionally used mel-spectrogram in end-to-end speech recognition models; the speaker adaptation process gives ASR models a significant performance boost, so that fMLLR outperforms other transforms or features such as MFCCs (Mel-Frequency Cepstral Coefficients) and FBANK (filter bank) coefficients; and fMLLR features can be computed efficiently with speech toolkits such as Kaldi. The main disadvantage of fMLLR is that when the amount of adaptation data is limited, the transformation matrices tend to overfit the given data.
== Computing fMLLR transform ==
The fMLLR feature transform can be computed with the open-source speech toolkit Kaldi. The Kaldi script uses the standard estimation scheme described in Appendix B of the original paper, in particular Appendix B.1, "Direct method over rows". In the Kaldi formulation, fMLLR is an affine feature transform of the form $x \to Ax + b$, which can be written as $x \to W\hat{x}$, where $\hat{x} = \begin{bmatrix} x \\ 1 \end{bmatrix}$ is the acoustic feature $x$ with a 1 appended. Note that this differs from some of the literature, where the 1 comes first, as in $\hat{x} = \begin{bmatrix} 1 \\ x \end{bmatrix}$. The sufficient statistics stored are
$$K = \sum_{t,j,m} \gamma_{j,m}(t)\, \Sigma_{jm}^{-1} \mu_{jm}\, x(t)^{+T},$$
where $\gamma_{j,m}(t)$ is the posterior probability of Gaussian $m$ of state $j$ at frame $t$, $x(t)^{+}$ is the extended feature vector $\hat{x}$ at frame $t$, and $\Sigma_{jm}^{-1}$ is the inverse covariance matrix; and, for $0 \le i < D$ (one per row of $W$), where $D$ is the feature dimension,
$$G^{(i)} = \sum_{t,j,m} \gamma_{j,m}(t) \left(\frac{1}{\sigma_{j,m}^{2}(i)}\right) x(t)^{+} x(t)^{+T},$$
where $\sigma_{j,m}^{2}(i)$ is the $i$-th diagonal element of $\Sigma_{jm}$. For a thorough review that explains fMLLR and the commonly used estimation techniques, see the original paper, "Maximum likelihood linear transformations for HMM-based speech recognition". Note that the Kaldi script that computes the fMLLR transform differs from the paper by using a column of the inverse in place of the cofactor row.
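The row-by-row update can be sketched in NumPy under simplifying assumptions. This is an illustrative sketch, not Kaldi's code: the function name `estimate_fmllr`, the array layout (`K` as a D×(D+1) array, the $G^{(i)}$ stacked into a D×(D+1)×(D+1) array), the identity initialization, the fixed iteration count and the rule of choosing between the two roots of the per-row quadratic by evaluating the per-row objective are assumptions, and diagonal covariances are assumed throughout. Following the Kaldi variant, it uses column $i$ of the inverse of the current square part $A$ of $W$ in place of the cofactor row, with 0 appended for the bias position. Setting the gradient of the per-row auxiliary function $\beta \log|p\,w^T| - \tfrac{1}{2} w G^{(i)} w^T + w k_i^T$ to zero gives $w = (\alpha p + k_i)(G^{(i)})^{-1}$ with $\alpha^2\, p (G^{(i)})^{-1} p^T + \alpha\, p (G^{(i)})^{-1} k_i^T - \beta = 0$, where $\beta$ is the total occupancy count and $k_i$ is row $i$ of $K$.

```python
import numpy as np


def estimate_fmllr(K, G, beta, num_iters=10):
    """Row-by-row ML estimate of W (D x (D+1)) for the transform x -> W @ [x; 1].

    K:    (D, D+1) statistics; row i is the sum of gamma * mu(i) / var(i) * x+ (diagonal covariances)
    G:    (D, D+1, D+1) stacked per-row statistics G^(i)
    beta: total occupancy count, the sum of gamma over t, j, m
    """
    D = K.shape[0]
    W = np.hstack([np.eye(D), np.zeros((D, 1))])  # start from the identity transform
    for _ in range(num_iters):
        for i in range(D):
            # Column i of inv(A) equals cofactor row i of A divided by det(A);
            # append 0 for the bias position to form the extended row p.
            p = np.append(np.linalg.inv(W[:, :D])[:, i], 0.0)
            G_inv = np.linalg.inv(G[i])
            k = K[i]
            e1 = p @ G_inv @ p
            e2 = p @ G_inv @ k
            # Zero gradient gives w = (alpha p + k) G^-1 with alpha^2 e1 + alpha e2 - beta = 0.
            disc = np.sqrt(e2 ** 2 + 4.0 * e1 * beta)
            best_obj, best_w = -np.inf, None
            for alpha in ((-e2 + disc) / (2.0 * e1), (-e2 - disc) / (2.0 * e1)):
                w = (alpha * p + k) @ G_inv
                # Per-row auxiliary function (log|det A| up to an additive constant).
                obj = beta * np.log(abs(p @ w)) - 0.5 * w @ G[i] @ w + w @ k
                if obj > best_obj:
                    best_obj, best_w = obj, w
            W[i] = best_w
    return W


# Hypothetical check: one diagonal Gaussian, occupancy gamma = 1 for every frame.
rng = np.random.default_rng(0)
D, T = 3, 5000
mu, var = np.array([1.0, -2.0, 0.5]), np.array([1.0, 2.0, 0.5])
M = np.array([[1.5, 0.1, 0.0], [0.0, 0.8, 0.2], [0.1, 0.0, 1.2]])  # synthetic "speaker" distortion
y = mu + rng.standard_normal((T, D)) * np.sqrt(var)
x = y @ M.T + np.array([0.5, -1.0, 2.0])
x_plus = np.hstack([x, np.ones((T, 1))])  # 1 appended, as in the Kaldi convention
K = (mu / var)[:, None] * x_plus.sum(axis=0)[None, :]
G = np.stack([x_plus.T @ x_plus / var[i] for i in range(D)])
W = estimate_fmllr(K, G, beta=float(T))
z = x_plus @ W.T  # adapted features
print(z.mean(axis=0).round(3))
print(np.cov(z, rowvar=False, bias=True).round(3))
```

Because the log-determinant term is only evaluated up to a constant, scaling `p` by $1/\det(A)$ leaves each row update unchanged. With a single diagonal Gaussian the ML transform is determined only up to an orthogonal transformation of the normalized features, so the usage above checks that the adapted features reproduce the model's mean and variances rather than that it exactly undoes the synthetic distortion `M`.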
Because a column of the inverse is the cofactor row divided by the determinant, the factor of the determinant is effectively ignored; this does not affect the transform result and avoids potential numerical underflow or overflow.
== Comparing with other features or transforms ==
Experimental results show that using fMLLR features in speech recognition yields consistent improvements over other acoustic features on various commonly used benchmark datasets (TIMIT, LibriSpeech, etc.). In particular, fMLLR features outperform MFCCs and FBANK coefficients, mainly because of the speaker adaptation that fMLLR performs. In one study, the phoneme error rate (PER, %) is reported for the TIMIT test set with various neural architectures. As expected, fMLLR features outperform MFCCs and FBANK coefficients regardless of the model architecture. MLP (multi-layer perceptron) serves as a simple baseline, while RNN, LSTM and GRU are all well-known recurrent models. The Li-GRU architecture is based on a single gate and thus saves 33% of the computations of a standard GRU model; Li-GRU also effectively addresses the vanishing gradient problem of recurrent models. The best performance is obtained with the Li-GRU model on fMLLR features.
== Extract fMLLR features with Kaldi ==
fMLLR features can be extracted as in the s5 recipe of Kaldi. Kaldi scripts can extract fMLLR features on different datasets; below are the basic steps to extract fMLLR features from the open-source speech corpus LibriSpeech. Note that the instructions below are for the subsets train-clean-100, train-clean-360, dev-clean and test-clean, but they can easily be extended to support the other sets dev-other, test-other and train-other-500. These instructions are based on the code provided in this GitHub repository, which contains Kaldi recipes for the LibriSpeech corpus that execute the fMLLR feature extraction process; replace the files under $KALDI_ROOT/egs/librispeech/s5/ with the files in the repository.
1. Install Kaldi.
2. Install Kaldiio.
3. If running on a single machine, change the following lines in $KALDI_ROOT/egs/librispeech/s5/cmd.sh to replace queue.pl with run.pl:
4. Change the data path in run.sh to your LibriSpeech data path; the directory LibriSpeech/ should be under that path. For example:
5. Install flac with: sudo apt-get install flac
6. Run the Kaldi recipe run.sh for LibriSpeech at least until Stage 13 (included); for simplicity you can use the modified run.sh.
7. Copy exp/tri4b/trans. files into exp/tri4b/decode_tgsmall_train_clean_/ with the following command:
8. Compute the fMLLR features by running the following script; the script can also be downloaded here:
9. Compute alignments using:
10. Apply CMVN and dump the fMLLR features to new .ark files; the script can also be downloaded here:
11. Use the Python script to convert the Kaldi-generated .ark features to .npy for your own dataloader; an example Python script is provided:
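As a separate illustration of the final conversion step (it is not the example script referred to in the last step and does not use Kaldiio), the sketch below reads a Kaldi binary archive directly and writes one `.npy` file per utterance. The function names and utterance IDs are hypothetical, and it assumes the features were dumped as an uncompressed binary archive of little-endian float32 matrices (Kaldi's `FM` token); compressed or double-precision matrices are rejected, and Kaldiio's readers are the safer choice for real archives.

```python
import struct
import tempfile
from pathlib import Path

import numpy as np


def read_ark_float_matrices(path):
    """Yield (key, matrix) from an uncompressed, binary, little-endian float32 Kaldi archive."""
    data = Path(path).read_bytes()
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)  # each entry starts with "<key> "
        key = data[pos:space].decode()
        pos = space + 1
        if data[pos:pos + 5] != b"\0BFM ":
            raise ValueError(f"{key}: only binary float32 matrices ('FM') are handled")
        pos += 5
        # Each int32 dimension is preceded by a one-byte size marker (4).
        rows = struct.unpack("<i", data[pos + 1:pos + 5])[0]
        cols = struct.unpack("<i", data[pos + 6:pos + 10])[0]
        pos += 10
        mat = np.frombuffer(data, dtype="<f4", count=rows * cols, offset=pos)
        pos += 4 * rows * cols
        yield key, mat.reshape(rows, cols).copy()


def write_ark_float_matrices(path, items):
    """Write (key, matrix) pairs in the same layout; used here only to make test input."""
    with open(path, "wb") as f:
        for key, mat in items:
            mat = np.asarray(mat, dtype="<f4")
            f.write(key.encode() + b" \0BFM ")
            f.write(b"\x04" + struct.pack("<i", mat.shape[0]))
            f.write(b"\x04" + struct.pack("<i", mat.shape[1]))
            f.write(mat.tobytes())


def ark_to_npy(ark_path, out_dir):
    """Save each utterance's feature matrix as <out_dir>/<key>.npy and return the keys."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    keys = []
    for key, mat in read_ark_float_matrices(ark_path):
        np.save(out_dir / f"{key}.npy", mat)
        keys.append(key)
    return keys


# Round trip on a tiny synthetic archive (hypothetical utterance IDs).
with tempfile.TemporaryDirectory() as tmp:
    feats = {"utt1": np.arange(6, dtype=np.float32).reshape(2, 3),
             "utt2": np.ones((4, 3), dtype=np.float32)}
    write_ark_float_matrices(Path(tmp) / "feats.ark", feats.items())
    keys = ark_to_npy(Path(tmp) / "feats.ark", Path(tmp) / "npy")
    loaded = {k: np.load(Path(tmp) / "npy" / f"{k}.npy") for k in keys}
```

Writing one `.npy` file per utterance keeps every feature matrix independently loadable by a dataloader, at the cost of many small files; packing all utterances into one file with an index is the usual alternative.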

Overwatch

Overwatch (abbreviated as OW) is a multimedia franchise centered on a series of multiplayer first-person shooter (FPS) video games developed by Blizzard Entertainment. Overwatch was released in 2016. Overwatch 2 was released in 2022, and the original game was taken offline upon its release, though Blizzard renamed Overwatch 2 back to Overwatch in 2026. Overwatch features hero-based combat between two teams of players fighting over various objectives, along with other traditional gameplay modes. At its 2016 release, Overwatch lacked a traditional story mode. Instead, Blizzard employed a transmedia storytelling strategy to disseminate lore about the game's characters, releasing comics and other literary media, as well as animated media that includes short films. The game enjoyed both critical and commercial success and garnered a devoted following. The fan community around the franchise has produced a large amount of content, including art, cosplay, fan fiction, anime-influenced music videos, Internet memes and pornography. Blizzard helped launch and promote an esports scene surrounding the game, including an annual Overwatch World Cup, the Overwatch League, a minor league, and the Overwatch Champions Series, which borrowed elements found in traditional American sports leagues.
== Gameplay ==
Both games in the Overwatch series are team-based hero shooters. Players select a hero character from a large roster (52 as of Season 2), divided among three class types: Tanks, who have higher health and are generally meant to protect their teammates from damage, but are larger and easier to hit; Damage, who act as the team's offensive leads; and Support, who heal, provide buffs for teammates or debuff the opposing team. Each role also features sub-roles with extra passive abilities: 'Initiator', 'Stalwart' and 'Bruiser' for Tank; 'Specialist', 'Flanker', 'Recon' and 'Sharpshooter' for Damage; and 'Medic', 'Tactician' and 'Survivor' for Support.
Players are generally free to change heroes while inside their spawn room during a match in response to the tactics employed by other players. Since Overwatch 2, a standard game features one tank player, two damage players and two support players, a change from the two players of each class in its predecessor. Players choose their class before the match and can only pick characters within that class for the duration of the game; some game modes, however, allow players to choose characters from any class throughout the game. Each hero has a skill kit that includes a primary attack, active skills that require a cooldown period before they can be used again, passive skills that remain active at all times, and an Ultimate skill that can only be used once the player fills the Ultimate meter, either by damaging opponents, mitigating damage, healing teammates or passively over time. An update in 2025 gave each hero four unique abilities known as perks: two minor and two major. Minor perks make smaller changes to a hero's kit, while major perks are intended to affect the match more significantly. At the beginning of each match, all heroes start at level 1 for each player. As the match progresses, players can level up their heroes individually; minor perks are unlocked at level 2 and major perks at the maximum level, 3. When perks become available, players may select only one perk of each type, and a selected perk remains irreversibly attached to the current hero for the rest of the match. If a player switches to another hero mid-match, the previously selected hero retains their level and perk progress. Game types in Overwatch are split between standard matches, competitive play, custom games and arcade modes. Standard matches use matchmaking based loosely on the player's skill level as measured by the game.
Competitive mode uses stricter matchmaking based on a player's current rank on the competitive ladder, with their rank increasing or decreasing when they win or lose a game, respectively. Arcade modes do not use matchmaking and are generally more experimental than standard and competitive modes. Custom games are created via the Workshop and can be used to make game modes that are very different from the base game. The Workshop is the tool in Overwatch that creates games using either presets and settings or rules and conditions written in code. These game modes can be published directly to Overwatch's custom browse tab or shared off-platform using a five-character alphanumeric code. Standard and competitive game modes are randomly selected at the start of each match and are objective-based, requiring teams to control a fixed objective point for a duration of time or to escort a payload to a target zone before match time expires. These modes include:
Assault (introduced in Overwatch): Also known as 2 Capture Points (2CP), Assault tasks the attacking team with capturing two target points in sequence on the map, while the defending team must stop them. Assault maps were removed from the main gameplay rotation after Overwatch 2 was released but remained available in the game's arcade and custom game modes. Since Season 2, Assault maps have been available in Arcade Mode daily routines.
Escort (introduced in Overwatch): Also known as "Payload" by the community, Escort tasks the attacking team with escorting a payload to a delivery point before time runs out, while the defending team must stop them. The payload vehicle moves along a fixed track when any player on the attacking team is close to it, speeding up when more attackers are present (up to three), but stops if a defending player is nearby; if no attacker is near the vehicle, it starts to move backwards along the track.
The payload also heals nearby attacking players by 10 health per second. Passing specific checkpoints extends the match time and prevents the payload from moving backwards past that point.
Hybrid (Assault/Escort) (introduced in Overwatch): The attacking team has to capture the payload (as if it were a target point from Assault) and escort it to its destination, while the defending team tries to hold them back.
Control (introduced in Overwatch): Each team tries to capture and hold a common control point until its capture percentage reaches 100%. This game mode is played in a best-of-three format. Control maps are laid out symmetrically so that neither team has an intrinsic positional advantage.
Push (introduced with Overwatch 2's launch): Each team attempts to secure control of a large robot that pushes one of two barriers toward the opposing team's side of the map while being escorted by at least one team member; the robot stops when enemy players are nearby, similar to the payload movement system in Escort. The team that pushes its barrier fully to the other side, or furthest into enemy territory before time runs out, wins the match.
Flashpoint (introduced in Overwatch 2 in 2023): Similar to Control, each team attempts to capture and hold a common control point until its capture percentage reaches 100%. This game mode takes place on significantly larger maps with five separate control points, each of which takes less time to capture than a standard Control point. A central control point is always activated first; after it is secured by one team, the remaining four are activated in a random order. The first team to secure three control points wins.
Clash (introduced in Overwatch 2 in 2024): Clash is played on symmetrical maps with five control points. Teams initially vie for control of the central point, and the winning team progresses to the next control point, toward the opponent's base.
Opponents can push back by winning control points, shifting the next point away from their base. If a team captures the point closest to the opponent's base, it wins; otherwise, the match plays out until one team has won control five times. Arcade modes may include variations of the above modes with experimental rules, as well as modes like Deathmatch and Capture the Flag. Other common arcade modes include:
Elimination (introduced in Overwatch in 2016): Two teams face off in a series of rounds, each attempting to wipe out the other team; once a player is killed, they remain out of the game until the next round, though they can be revived by Mercy's 'Resurrect' ability. If no team has won a round by a certain time, the winner is decided by which team first takes a neutral control point. Players cannot change heroes until the next round. Some of these modes can be played in "lockout" mode, in which the heroes selected by the winning team for a round are "locked" and cannot be selected in future rounds.
Total Mayhem (i

Coronavirus breathalyzer

A coronavirus breathalyzer is a diagnostic medical device that tests for the presence of severe acute respiratory syndrome coronavirus 2 in exhaled breath with 90% or greater accuracy. In the first half of 2020, practical coronavirus breathalyzers were being developed concurrently by unrelated research groups in Australia, Canada, Finland, Germany, Indonesia, Israel, the Netherlands, Poland, Singapore, the United Kingdom and the USA.
== Australia ==
In Australia, GreyScan CEO Samantha Ollerton and Prof. Michael Breadmore of the University of Tasmania are basing a coronavirus breathalyzer on existing technology that is used around the world to detect explosives. In February 2022, ABC News reported on another device, the "Queensland Breath test", produced by Colin Hickey and Examin Holdings, which is claimed to have 98% efficiency and is equipped with a replaceable plastic nozzle for reusability. According to Bruce Thompson, a professor at Swinburne University of Technology, although the product is reliable, it is inaccessible due to insufficient funding.
== Canada ==
Canary Health Technologies, headquartered in Toronto with offices in Cleveland, Ohio, is developing a breathalyzer with disposable nanosensors that uses AI-powered, cloud-based analysis. According to a press release, clinical trials began in India in November 2020. The stated goal is to develop an accurate, reasonably priced screening tool that can be used anywhere and deliver a result in less than a minute. The company postulates that analyzing volatile organic compounds in human breath could detect diseases before the onset of symptoms, earlier than currently available methods. Moreover, the cloud-based technology is designed to be used as a disease surveillance tool.
== Finland ==
By the end of June 2020, Forum Virium Helsinki, in collaboration with the Finnish software firm Deep Sensing Algorithms and with funding from the Helsinki-Uusimaa Regional Council, announced that testing of their device had begun with a control group in Kazakhstan, with plans to expand to the Netherlands, the United States, South Africa, Brazil and Finland over the summer. The efficacy of the Forum Virium Helsinki / Deep Sensing Algorithms device hinges on its AI component. "We are engaged in innovative cooperation with corporations to solve the coronavirus crisis, and we will help firms to use the city as a development platform. We are utilizing artificial intelligence and digitalization," said Forum Virium Helsinki CEO Mika Malin.
== Germany ==
In March 2020, the Singaporean company RAM Global conducted research in Germany in hopes of developing a one-minute breathalyzer test for SARS-CoV-2 based on terahertz time-domain spectroscopy. The company attempted to develop a disposable test kit for direct detection of COVID-19 virion particles in breath, saliva and swab samples. On 31 March, RAM Global completed an initial clinical study on live patients at University Hospital Saarland. In April, the company pursued a small study in which hospital doctors provided unlabeled samples in order to test its accuracy in differentiating positive from negative samples.
== Indonesia ==
Since April 2020, a team of researchers from Gadjah Mada University (UGM) has been developing an electronic nose called GeNose C19, which can be used as a rapid, non-invasive screening tool in less than two minutes. A profiling test was carried out at the Bhayangkara Hospital and the Covid Bambanglipuro Special Field Hospital in Yogyakarta. GeNose C19 consists of gas sensors and an artificial intelligence-based pattern recognition system. The diagnostic test was carried out in a multi-center study with the cooperation of nine hospitals.
At the end of December 2020, GeNose C19 received a distribution permit from Indonesia's Health Ministry. Initially, 100 units were to be released, each able to perform 120 tests per day. The test is estimated to cost 15,000–25,000 Indonesian rupiah ($1–$1.8) and to take three minutes, plus another two minutes to yield a result. The researchers hope to manufacture up to 1,000 GeNose C19 units, increasing the country's testing capacity by 120,000 subjects per day, and aim to manufacture 10,000 units by February 2021.
== Israel ==
In Israel, as of midsummer 2020, research was underway at the photonics lab of Gabby Sarusi, a professor at Ben-Gurion University of the Negev. Separately from Sarusi's project, in July 2020, it was reported that the Israeli start-up Nanoscent, in cooperation with Sheba Medical Center, had devised a breathalyzer that Magen David Adom (MDA) is seeking to incorporate into its existing drive-thru testing stations throughout the country. The intellectual property of Gabby Sarusi regarding this project has been questioned and is under discussion in an Israeli court.
== The Netherlands ==
A breath test with the SpiroNose device, made by the Dutch company Breathomix, has been developed and tested in collaboration with the Leiden University Medical Center (LUMC), Franciscus Gasthuis & Vlietland and the GGD Amsterdam. The breath test has been validated as a pre-screening test for people who have no or mild symptoms of COVID-19. From April 2021, the device was operational at COVID-19 test drive-ins, conferences and events, e.g. the Eurovision Song Contest 2021. Subjects must abstain from alcohol for eight hours before taking the breath test. The SpiroNose contains four sets of seven different sensors that measure the mixture of volatile organic compounds (VOCs, which act as biomarkers) in exhaled air. These VOCs provide a picture of a person's metabolism. This 'breath profile' is forwarded to an online analysis platform.
There, the breath profile is compared with the breath profiles of people with and without a COVID-19 diagnosis and analysed by algorithms. The data analysis involves advanced signal processing and statistics based on independent t-tests, followed by linear discriminant and ROC analysis. The test result is known within minutes. The breath test has a sensitivity/specificity for SARS-CoV-2 infection of 100/78%, >99/84% and 98/82% in the validation, replication and asymptomatic cohorts, respectively. The breath test reliably identifies people who are not infected, and such subjects receive a test result immediately. Other subjects must promptly take a follow-up test, for example a PCR or LAMP test. The test results can be viewed by the client and are not automatically interfaced to other databases, e.g. for public health surveillance, source and contact tracing, or vaccination programs. In July 2021, the ministry stopped the tests with the SpiroNose because, according to the GGD, the device gave unusable results in some cases. Breathomix indicated that this was the result of the way in which the SpiroNose was deployed and that the SpiroNose remains a reliable instrument for lung diseases. The analysis platform was developed in conformity with the requirements of the ISO 27001 (Information Security) and NEN 7510 (Information Security in Health Care) standards. A CE marking has been requested; in the meantime, the Dutch minister granted a CE marking exemption on 25 January 2021. The device may also be used to detect other diseases, e.g. asthma, COPD, lung cancer and interstitial lung diseases (ILD).
== Poland ==
In February 2021, the President of Poland, Andrzej Duda, announced that ML System S.A., headquartered in Zaczernie, Poland, had successfully developed a means of analyzing a patient's breath to test for the presence of the coronavirus. According to an anonymous press release, test subjects exhale into a device that determines whether the coronavirus is present.
The procedure, similar to that of a police breathalyzer, is said to take less than ten seconds. Independent clinical trials began in April 2021. In the first half of May 2021, ML System published a brief statement on partial results, reporting that the independent clinical trials had been successful, with a specificity of 97.15% and an accuracy/sensitivity of 86.86% at an assumed cycle threshold (Ct) of 25, which, it stated, is in line with the guidelines set out by the World Health Organization. ML System, in partnership with Rzeszów–Jasionka Airport, also published a statement announcing their intention to test the device at the airport, and the manufacturer had similar plans with Warsaw Chopin Airport. Two large Polish laboratory networks, "Diagnostyka" and "ALAB Laboratoria", signed a letter of intent with ML System. Under the agreement with ALAB, the parties declared that they would cooperate in introducing the product, named "COVID DETECTOR", to the Polish, German and Ukrainian markets. In addition, the companies declared joint activities aimed at extending diagnosis with the "COVID Detector" to include mutations of the SARS-CoV-2 virus, differentiate the stage of the disease and ot

Google AI Studio

Google AI Studio is a web-based integrated development environment developed by Google for prototyping applications with generative AI models. Released in December 2023 alongside the Gemini API, the platform provides access to Google's Gemini family of models and related tools for image, video, and audio generation. The service is aimed at both developers and non-technical users, who can use it to test prompts and generate code for the Gemini API. == History == Google launched AI Studio on December 13, 2023, as the successor to Google MakerSuite. MakerSuite, introduced at Google I/O in May 2023, had provided similar functionality for Google's PaLM language models. AI Studio was launched alongside the public release of the Gemini API. == Features == AI Studio's interface consists of a central prompt area and a settings panel for model selection and parameter adjustment. The platform supports chat prompts for multi-turn conversations and includes system instructions for defining model behavior, tone, or specific rules. Users can employ zero-shot and few-shot prompting techniques to guide the model's output format. The platform accepts various media types as input, including video, audio, and documents, and can generate images through Imagen models, videos through Veo models, and audio through text-to-speech functionality. Additional tools include real-time streaming for screen sharing and live analysis, code execution in a sandboxed Python environment, grounding with Google Search for current information, URL context for analyzing specific web pages, and a thinking mode for complex reasoning tasks. == Available models == The platform provides access to several Google AI models, including the Gemini language models, Imagen for image generation, Veo for video generation, LearnLM for educational applications, and Gemma, Google's open-source model family. == Privacy and data usage == Google AI Studio's data handling differs between free and paid users.
For free-tier users, Google uses submitted prompts, uploaded files, and generated responses to improve its products and services, and human reviewers may read and annotate this data after it has been disconnected from the user's account. Google advises against submitting sensitive information on the free tier. Users who enable Google Cloud Billing are considered paid service users, and their data is not used for product improvement. Data is processed according to Google's Data Processing Addendum and retained temporarily for abuse monitoring. == Availability == The platform is available at no cost, with API usage subject to a free tier with daily and per-minute rate limits. Access is restricted to users aged 18 and older in specific countries and territories. The service was initially unavailable in the United Kingdom and the European Economic Area due to regulatory concerns, which drew complaints from users. == Reception == Reviews have noted the platform's accessibility and integration with Gemini models, citing features such as real-time screen sharing and large context windows as notable capabilities. However, reviewers have raised concerns about the privacy implications for free-tier users, whose data is used for model training. Some users have reported inconsistent performance with features such as screen streaming, as well as problems uploading folders for large datasets. The initial geographic restrictions were a point of criticism among developers in affected regions.

Piranesi (software)

Piranesi is an interactive paint system that enables users to create artistic images from 3D scenes built with conventional modeling applications. == Image format == Piranesi uses the proprietary EPix file format, which stores additional information for every pixel, such as its distance from the viewer and its material settings. EPix files can be rendered from 3D scenes, using a fixed viewpoint, by Piranesi's companion software, Vedute.
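Why per-pixel depth and material data matter to a paint program can be illustrated with a short Python sketch. It assumes a hypothetical record holding colour, depth and a material index (the actual EPix layout is proprietary and is not described here) and shows two operations such data makes possible: blending pixels toward a fog colour according to their distance from the viewer, and building a mask of pixels that share a material. The names `EPixel`, `fade_with_distance` and `select_material` are illustrative only and are not part of Piranesi or Vedute.

```python
from dataclasses import dataclass


# Hypothetical per-pixel record; field names are illustrative and do not
# describe the real (proprietary) EPix layout.
@dataclass
class EPixel:
    rgb: tuple[float, float, float]  # colour, each channel in [0, 1]
    depth: float                     # distance from the viewer
    material_id: int                 # index of the material at this pixel


def fade_with_distance(pixels, fog_rgb, near, far):
    """Blend each pixel toward fog_rgb in proportion to its depth.

    Pixels at or closer than `near` are unchanged, pixels at or beyond
    `far` take the fog colour entirely, and depths in between are
    interpolated linearly.
    """
    out = []
    for p in pixels:
        t = (p.depth - near) / (far - near)
        t = min(max(t, 0.0), 1.0)  # clamp the blend factor to [0, 1]
        blended = tuple((1 - t) * c + t * f for c, f in zip(p.rgb, fog_rgb))
        out.append(EPixel(blended, p.depth, p.material_id))
    return out


def select_material(pixels, material_id):
    """Return a mask that is True wherever the pixel shows the given material."""
    return [p.material_id == material_id for p in pixels]
```

For instance, with `near=10` and `far=20`, a pixel at depth 15 is blended halfway toward the fog colour, while pixels nearer than 10 are left unchanged. An ordinary RGB image could not support either operation, because the depth and material information would be lost after rendering.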

Pulsar (social listening platform)

Pulsar is a software platform for social media monitoring, audience intelligence and social listening that allows organizations to monitor and analyze online conversations across social media, news, and other digital sources. The platform combines social media listening, media monitoring, trend analysis, and audience segmentation to help users understand public discussions and audience behavior in real time. It aggregates data from networks such as X, Facebook, and Instagram, as well as from forums, and applies artificial intelligence to text and sentiment analysis. Pulsar is offered as a cloud-based Software as a Service (SaaS) tool and as an insights consultancy. Since 2019, it has been part of Pulsar Group (formerly Access Intelligence), a publicly listed group of communications software products. In addition to its commercial uses, the platform has been used in peer-reviewed academic research analysing online discourse. The platform is listed on the UK government's G-Cloud 14 Digital Marketplace for the provision of social listening and audience intelligence services. == History == Pulsar originated in the early 2010s as a project within Face, a London-based innovation and market research consultancy. The platform's first product, Pulsar TRAC, launched in 2013 as a social media analytics tool designed to measure the reach of conversations, map brand audiences, and track how content spreads through networks. Its development was led by Dr Francesco D'Orazio, who created the Pulsar brand while serving as VP of Product and Innovation at Face. Face itself had been acquired in 2012 by Cello Group Plc, a UK-based advisory firm, and Pulsar became part of Cello's portfolio of research and data tools.
In January 2017, Cello Group made a significant investment to scale Pulsar and announced the merger of Face's qualitative research business into Pulsar, unifying both under the Pulsar brand for global expansion. In 2018, Pulsar opened an office in Los Angeles to better serve its growing U.S. client base in the media, healthcare, and entertainment sectors, and Francesco D'Orazio was appointed CEO. The company focused on developing new products amid a wave of consolidation in the social listening industry. In October 2019, Pulsar was acquired by Access Intelligence Plc (now Pulsar Group), an AIM-listed communications software company. The group, which also owns the PR and media tools Isentia, Vuelio and ResponseSource, integrated Pulsar into its end-to-end marketing and communications insights offering. As part of its global expansion, Pulsar opened an office in Sydney, Australia, in 2022, adding to its existing offices in London and Los Angeles. In 2023, the Financial Times recognised Pulsar Group (then Access Intelligence) as one of Europe's fastest-growing companies. In May 2024, Access Intelligence PLC changed its name to Pulsar Group PLC. The company has since continued to develop its platform. In March 2025, it introduced Narratives AI, a tool described as a "search engine for public opinion" and as the first of its kind for analysing public narratives and how they evolve across social media and the news. In October 2025, Pulsar launched Insight Agents, a set of AI agents embedded in the platform that it advertised as able to "proactively anticipate user needs or issues, carry out routine tasks, uncover anomalies in your datasets, and prompt responses at scale, 24/7." == Products == Pulsar's architecture integrates four main products into a single interface.
The core product suite is often divided into three main components: Pulsar TRAC (for social listening and audience analysis), Pulsar TRENDS (for trend discovery and analysis), and Pulsar CORE (for owned-channel and web analytics). Pulsar's fourth product is Narratives AI. === Pulsar TRAC === Pulsar TRAC is a social listening and audience intelligence platform that allows users to configure searches that track public conversations and measure audience behaviour. It focuses on conversation insights and audience segmentation; the platform reportedly collects and analyses data from a wide range of sources, including major social networks, forums, news and review sites, and e-commerce platforms, using real-time visualisations and AI-supported analytics to find patterns and communities of interest. Pulsar TRAC can be incorporated into workflows with other audience tools, for example through an integration with Audiense that connects TRAC's conversation insights to external audience-segmentation datasets. === Pulsar CORE === Pulsar CORE centres on the analysis of owned-channel data, such as brand social media profiles, website interactions and other in-house digital assets, to generate audience and content insights. CORE can monitor published content, evaluate competitors, and extract demographic and behavioural segmentation from owned channels. === Narratives AI === Narratives AI is a tool within the Pulsar audience intelligence platform that uses artificial intelligence to detect, cluster and analyse narratives forming across social and news media. It was launched in March 2025 as a standalone search interface that processes real-time and historical data to identify cultural trends, behaviours and beliefs. It uses clustering algorithms and visualisation to show how conversations form and spread online and how important they are within wider discourse.
== Notable features == === Insight Agents === Pulsar's Insight Agents are AI-powered agents within the Pulsar platform designed to automate and augment common tasks in media, social, audience and narrative intelligence. Branded as TeamMates, these agents are grouped into four functional types: Sentinels, for real-time monitoring, anomaly detection and alerting; Oracles, for forecasting and scenario planning; Custodians, for governance, compliance and policy enforcement; and Analysts, for research, reporting and recommendations. Each agent is trained on Pulsar's multi-source data and domain-specific workflows. In February 2026, Pulsar introduced 'Crisis Oracle,' an AI-driven system designed to quantify narrative momentum and predict reputational risk. == Academic research == Pulsar has been used as a data collection and analysis tool in peer-reviewed academic research across public health, infodemiology, veterinary science, and policy research. Published uses include a World Health Organization report on infodemic management, a Journal of Medical Internet Research study on headache and migraine discourse across Japan, Germany, and France, a Frontiers in Big Data study of Long COVID narratives, and Frontiers in Veterinary Science studies on canine chronic kidney disease and oral medication administration in dogs.