AI Assistant Gemini

AI Assistant Gemini — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Machine unlearning

    Machine unlearning

    Machine unlearning is a branch of machine learning focused on removing specific undesired elements, such as private data, wrong or manipulated training data, outdated information, copyrighted material, harmful content, dangerous abilities, or misinformation, without needing to rebuild models from the ground up. Large language models, like the ones powering ChatGPT, may be asked not just to remove specific elements but also to unlearn a "concept," "fact," or "knowledge," which aren't easily linked to specific examples. New terms such as "model editing," "concept editing," and "knowledge unlearning" have emerged to describe this process. == History == Early research efforts were largely motivated by Article 17 of the GDPR, the European Union's privacy regulation commonly known as the "right to be forgotten" (RTBF), introduced in 2014. The GDPR did not anticipate that the development of large language models would make data erasure a complex task. This issue has since led to research on "machine unlearning," with a growing focus on removing copyrighted material, harmful content, dangerous capabilities, and misinformation. Just as early experiences in humans shape later ones, some concepts are more fundamental and harder to unlearn. A piece of knowledge may be so deeply embedded in the model's knowledge graph that unlearning it could cause internal contradictions, requiring adjustments to other parts of the graph to resolve them. Researchers have now also started studying unlearning in the context of removing incorrect or adversarially manipulated training data such as systematically biased labels or poisoning attacks. == Motivations == At present, machine unlearning is motivated by a growing range of concerns that extend well beyond the field's original focus on data privacy. A widely used taxonomy in the literature distinguishes two high-level categories of motivation. 
    Access revocation covers cases where a data subject or rights holder requests the removal of data they own or control. This is most commonly associated with the RTBF established by the European Union's General Data Protection Regulation (GDPR) and analogous legislation such as the California Consumer Privacy Act (CCPA). These regulations grant individuals the legal right to request erasure of their personal data from any system that has processed it, including models that were trained on it. Access revocation also encompasses the removal of copyrighted or pay-walled content that was incorporated into training corpora without the necessary licenses, a concern that has become prominent with the widespread use of largely web-scraped pre-training datasets. Model correction covers cases where the model exhibits undesirable behavior arising from the training data, regardless of any individual's request. This includes the removal of toxic, biased, or unsafe outputs introduced by harmful content in the training set; the correction of stale or factually incorrect associations, such as outdated knowledge encoded in a deployed model; the removal of dangerous capabilities, such as detailed knowledge of the synthesis of chemical or biological agents; and the correction of the influence of data poisoning or adversarial attacks that have corrupted model behavior. This second category has been formalized as corrective machine unlearning, which frames unlearning as a post-training mechanism for repairing the effects of bad or harmful training data. It is closely related to the AI safety literature, where data filtering alone has been found insufficient to prevent hazardous knowledge from being encoded in model weights, motivating unlearning as a complementary risk mitigation strategy. 
    A further distinction has been drawn in the literature between removal (eliminating the influence of specific training data on model parameters) and suppression (preventing the model from generating specific outputs regardless of how that knowledge is encoded). These two goals are not equivalent: removing training data does not guarantee meaningful output suppression, and suppressing outputs does not constitute removal of the underlying training data's influence. == SISA Training == SISA is a training strategy consisting of four mechanisms designed to make machine unlearning more efficient by structuring how models are trained and updated. Its goal is to allow a system to remove the influence of specific data points without retraining an entire model from scratch. By reorganizing training data and workflows, SISA reduces the computational burden of unlearning requests. Sharding divides the training dataset into multiple disjoint subsets, or shards. Each shard is used to train a separate model instance. This ensures that a single data point affects only one shard, so unlearning it requires updating only the corresponding shard rather than the full model. Isolation refers to training each shard independently, with nothing shared across shards during the training process. This separation prevents cross-contamination between shards, ensuring that forgetting data in one shard does not require adjustments to any others. Slicing breaks the data within each shard into sequential slices and stores model states after each slice is trained on. When an unlearning request targets a piece of data, the system can roll back to the checkpoint before the point was seen and retrain only from that slice forward. This reduces retraining time even within a shard. Aggregation occurs at inference, when the model is queried. It combines the outputs of each shard to determine the output of the overall model. This is often through majority voting or averaging. 
This allows SISA-trained systems to behave like a single model despite being composed of multiple shard-level models. Together, these mechanisms enable machine learning systems to forget specific data points with far lower computational cost than full retraining. The trade-off is that sharding and slicing can lead to reduced model accuracy, worse generalization, and increased storage requirements for the intermediate checkpoints. This can be tolerable based on the needs of the individual or organization to comply with "right to be forgotten" or efficiently recover from backdoor attacks. == Algorithms == Machine unlearning algorithms are broadly categorized into exact and approximate methods, reflecting a fundamental trade-off between formal guarantees and computational tractability. === Exact Unlearning === Exact unlearning methods produce a model that is statistically indistinguishable from one retrained from scratch on the dataset with the forget data removed. The canonical framework for exact unlearning is SISA Training (Sharded, Isolated, Sliced, and Aggregated), introduced by Bourtoule et al. (2021). SISA partitions the training dataset into disjoint shards and trains a separate sub-model on each. At inference time, predictions are aggregated across sub-models. When an unlearning request is received, only the sub-model corresponding to the shard containing the target data requires retraining, reducing computational overhead proportionally to the number of shards. Exact methods provide the strongest guarantees but become prohibitively expensive for large pre-trained neural networks and are generally limited to settings where training can be structured in advance. === Approximate Unlearning === Approximate unlearning methods seek to produce a model whose behavior is sufficiently close to an exactly unlearned model without the cost of full retraining. These methods dominate practical applications. 
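The sharding, isolation, slicing, and aggregation steps described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the implementation from Bourtoule et al.: the per-shard "model" is a hypothetical nearest-class-mean classifier over a single numeric feature, points are assigned to shards round-robin, and the names `SisaEnsemble`, `train_increment`, and `predict_one` are invented for the example.

```python
from collections import Counter


def train_increment(state, slice_points):
    """Fold one slice into a copy of the running per-class [sum, count] state."""
    new_state = {label: list(acc) for label, acc in state.items()}
    for x, label in slice_points:
        acc = new_state.setdefault(label, [0.0, 0])
        acc[0] += x
        acc[1] += 1
    return new_state


def predict_one(state, x):
    """Toy shard model (an assumption): label whose class mean is nearest to x."""
    return min(state, key=lambda label: abs(x - state[label][0] / state[label][1]))


class SisaEnsemble:
    def __init__(self, data, num_shards=3, num_slices=2):
        self.shards = []       # shards[s] is a list of slices of (x, label) points
        self.checkpoints = []  # checkpoints[s][k] is shard s's state before slice k
        for s in range(num_shards):
            points = data[s::num_shards]  # sharding: each point lands in one shard
            size = max(1, -(-len(points) // num_slices))  # ceiling division
            slices = [points[i:i + size] for i in range(0, len(points), size)]
            self.shards.append(slices)
            self.checkpoints.append(self._train(slices, [{}], start=0))

    @staticmethod
    def _train(slices, checkpoints, start):
        """Train one shard in isolation from slice `start`, reusing earlier checkpoints."""
        checkpoints = checkpoints[:start + 1]
        for slice_points in slices[start:]:
            checkpoints.append(train_increment(checkpoints[-1], slice_points))
        return checkpoints

    def unlearn(self, point):
        """Drop `point`, roll back to the checkpoint before its slice, retrain forward."""
        for shard_id, slices in enumerate(self.shards):
            for slice_id, slice_points in enumerate(slices):
                if point in slice_points:
                    slice_points.remove(point)
                    self.checkpoints[shard_id] = self._train(
                        slices, self.checkpoints[shard_id], start=slice_id)
                    return shard_id, slice_id
        raise ValueError("point not found in any shard")

    def predict(self, x):
        """Aggregation: majority vote over the final state of every non-empty shard."""
        votes = Counter(predict_one(cp[-1], x) for cp in self.checkpoints if cp[-1])
        return votes.most_common(1)[0][0]


data = [(0.1, "a"), (0.2, "a"), (0.3, "a"), (0.0, "a"), (0.15, "a"), (0.25, "a"),
        (1.0, "b"), (1.1, "b"), (0.9, "b"), (1.2, "b"), (0.95, "b"), (1.05, "b")]
ensemble = SisaEnsemble(data)
touched = ensemble.unlearn((1.1, "b"))  # only one shard is retrained, from one slice on
```

Because each shard's state depends only on its own points, the retrained shard is identical to one trained from scratch without the removed point, which is the sense in which the scheme is "exact". The stored checkpoints are also where the extra storage cost mentioned above comes from.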
Common approaches include: Gradient Ascent: The model is fine-tuned by maximizing the loss on the forget set, directly degrading its performance on targeted data. This is the most direct approach but risks destabilizing performance on retained data. Random Labelling: The model is fine-tuned on the forget set using randomly shuffled labels, confusing its associations with the targeted data while producing a less aggressive weight shift than pure gradient ascent. Gradient Difference: Combines gradient ascent on the forget set with simultaneous gradient descent on the retain set, using the retain objective as a regularizer to preserve general model utility. KL Divergence Regularization: Minimizes the KL divergence between the outputs of the unlearned model and the original model on the retain set, anchoring behavior on data the model should remember. Weight Pruning and Fine-tuning: Parameters with the smallest L1-norm are pruned — targeting weights most weakly associated with general knowledge and potentially most associated with the forget set — followed by fine-tuning on the retain set to restore utility. Layer Reset and Fine-tuning: The first or last k layers are re-initialized to random weights and the model is subsequently fine-tuned on the retain set. This is a coarse but computationally simple approach. Selective Synaptic Dampening: Uses influence functions to estimate the effect of individual trainin
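Of the approaches listed above, gradient difference is compact enough to sketch end to end. The example below is a minimal illustration under stated assumptions, not a reference implementation: the model is a one-feature logistic regression written in plain Python, the forget set is a pair of hypothetical mislabeled (poisoned) points, and the function names, learning rates, and step counts are all invented for the example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mean_loss_and_grad(params, data):
    """Mean binary cross-entropy and its gradient for the model p = sigmoid(w*x + b)."""
    w, b = params
    loss = grad_w = grad_b = 0.0
    for x, y in data:
        p = min(max(sigmoid(w * x + b), 1e-12), 1.0 - 1e-12)  # keep log() finite
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
        grad_w += (p - y) * x
        grad_b += p - y
    n = len(data)
    return loss / n, (grad_w / n, grad_b / n)


def fit(data, steps=500, lr=0.5):
    """Ordinary gradient descent, standing in for the original training run."""
    params = (0.0, 0.0)
    for _ in range(steps):
        _, (gw, gb) = mean_loss_and_grad(params, data)
        params = (params[0] - lr * gw, params[1] - lr * gb)
    return params


def gradient_difference(params, forget, retain, steps=20, lr=0.1):
    """Ascend the forget-set loss while descending the retain-set loss."""
    for _ in range(steps):
        _, (fw, fb) = mean_loss_and_grad(params, forget)
        _, (rw, rb) = mean_loss_and_grad(params, retain)
        params = (params[0] + lr * (fw - rw), params[1] + lr * (fb - rb))
    return params


# Hypothetical data: clean points to keep, plus two mislabeled points to forget.
retain = [(-2.0, 0), (-1.0, 0), (-0.5, 0), (0.5, 1), (1.0, 1), (2.0, 1)]
forget = [(3.0, 0), (3.5, 0)]

original = fit(retain + forget)
unlearned = gradient_difference(original, forget, retain)

forget_before, _ = mean_loss_and_grad(original, forget)
forget_after, _ = mean_loss_and_grad(unlearned, forget)
retain_before, _ = mean_loss_and_grad(original, retain)
retain_after, _ = mean_loss_and_grad(unlearned, retain)
```

The ascent term is unbounded, so the number of steps acts as a tuning knob. Dropping the retain term from the update turns this into plain gradient ascent, which is where the risk of destabilizing performance on retained data comes from.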

  • Viral marketing

    Viral marketing

    Viral marketing is a business strategy that uses existing social networks to promote a product or service on social media platforms. Its name refers to how consumers spread information about a product with other people, much in the same way that a virus spreads from one person to another. It can be delivered by word of mouth, or enhanced by the network effects of the Internet and mobile networks. The concept is often misused or misunderstood, as people apply it to any successful enough story without taking into account the word "viral". Viral advertising is personal and, while coming from an identified sponsor, it does not mean businesses pay for its distribution. Most of the well-known viral ads circulating online are ads paid by a sponsor company, launched either on their own platform (company web page or social media profile) or on social media websites such as YouTube. Consumers receive the page link from a social media network or copy the entire ad from a website and pass it along through e-mail or posting it on a blog, web page or social media profile. Viral marketing may take the form of video clips, advergames, ebooks, brandable software, images, text messages, email messages, or web pages. The most commonly utilized transmission vehicles for viral messages include pass-along based, incentive based, trendy based, and undercover based. However, the creative nature of viral marketing enables an "endless amount of potential forms and vehicles the messages can utilize for transmission", including mobile devices. The ultimate goal of marketers interested in creating successful viral marketing programs is to create viral messages that appeal to individuals with high social networking potential (SNP) and that have a high probability of being presented and spread by these individuals and their competitors in their communications with others in a short period. 
The term "viral marketing" has also been used pejoratively to refer to stealth marketing campaigns—marketing strategies that advertise a product to people without them knowing they are being marketed to. == History == The emergence of "viral marketing", as an approach to advertisement, has been tied to the popularization of the notion that ideas spread like viruses. The field that developed around this notion, memetics, peaked in popularity in the 1990s. As this then began to influence marketing gurus, it took on a life of its own in that new context. The brief career of Australian pop singer Marcus Montana is largely remembered as an early example of viral marketing. In early 1989, thousands of posters declaring "Marcus is Coming" were placed around Sydney, generating discussion and interest within the media and the community about the meaning of the mysterious advertisements. The campaign successfully made Montana's musical debut a talking point, but his subsequent music career was a failure. The term is found in PC User magazine in 1989 with a somewhat differing meaning. It was later used by Jeffrey Rayport in the 1996 Fast Company article "The Virus of Marketing", and Tim Draper and Steve Jurvetson of the venture capital firm Draper Fisher Jurvetson in 1997 to describe Hotmail's practice of appending advertising to outgoing mail from their users. Doug Rushkoff, a media critic, wrote about viral marketing on the Internet in 1996. Bob Gerstley wrote about algorithms designed to identify people with high "social networking potential." Gerstley employed SNP algorithms in quantitative marketing research. In 2004, the concept of the alpha user was coined to indicate that it had now become possible to identify the focal members of any viral campaign, the "hubs" who were most influential. Alpha users could be targeted for advertising purposes most accurately in mobile phone networks, due to their personal nature. 
    In early 2013, the first ever Viral Summit was held in Las Vegas. == Factors == Marketer Jonah Berger defines six key factors that drive virality, organized in an acronym called STEPPS: Social currency (the better something makes people look, the more likely they will be to share it); Triggers (things that are "top of mind" are more likely to be "tip of the tongue"); Emotion (when people care, they share); Public (the easier something is to see, the more likely people are to imitate it); Practical value (people share useful information to help others); and Stories (like a Trojan Horse, stories carry messages and ideas along for the ride). Another important factor that drives virality is the propagativity of the content, referring to the ease with which consumers can redistribute it. == Psychology == To form deeper connections with viewers and increase the chances of virality, many marketers use psychological principles. They argue that this approach is scientific and can foster an environment where the odds of gaining traction are much higher. People find psychological safety and can develop a sense of trust when more people interact with online content. For this reason, marketers work to develop media that resonates with viewers on a deeper, emotional level as this approach frequently results in higher engagement. This level of interaction serves as a sign of approval, reducing the personal risk that is subconsciously linked to associating oneself with a company or brand’s content. Professor Jonah Berger at the University of Pennsylvania's Wharton School of Business affirms that marketing campaigns that trigger psychological responses linked to strong emotions tend to perform better. In particular, Berger found that positive emotions like happiness, joy, and excitement have more successful share rates than their negative counterparts. 
This outcome results from the human instinct to respond more positively to content with activating emotions, increasing the desire to share content, which contributes to its virality. Viral marketing utilizes the primitive feeling of frisson to increase their view and share counts. This feeling of excitement is considered powerful because of its ability to cause a physical response. From increased heart rates to full body chills, Professor Brent Coker at the University of Melbourne describes that this approach to marketing triggers a primitive response that immerses the viewer in the content on a deeper level. Researchers Juliana Fernandes from the University of Florida and Sigal Segev from the Florida International University also found that people are more inclined to share emotional campaigns over those that are heavily informational. They claim that consumers do not often care to learn about a product’s actual features and benefits. Instead, people prefer to be immersed in experience-based content that creates an emotional impact. Companies and brands can benefit from treating their content in this manner and go viral more frequently than those who do not. Social proof is another psychological phenomenon that impacts viral content. Experts in this field argue that it is a natural instinct to want to behave similarly to others because it results in positive validation. This phenomenon explains the human need to conform, so marketers focus on creating engaging content that encourages interactions and causes a snowball effect. This subconsciously influences people to like, comment, and share if they already see others doing the same. Social proof goes further by providing people with a form of social currency. When individuals interact with and share content, they become associated with the topics at hand. People naturally tend to perceive one another, and this pattern carries over to the digital world. 
As a result, many people tend to be vigilant about the viral marketing they engage with, since they want to be perceived positively. Companies and brands have the opportunity to develop social currency themselves by aligning with their target audiences and creating marketing campaigns that fit their interests or match their values. == Methods and metrics == According to marketing professors Andreas Kaplan and Michael Haenlein, to make viral marketing work, three basic criteria must be met, i.e., giving the right message to the right messengers in the right environment: Messenger: Three specific types of messengers are required to ensure the transformation of an ordinary message into a viral one: market mavens, social hubs, and salespeople. Market mavens are individuals who are continuously 'on the pulse' of things (information specialists); they are usually among the first to get exposed to the message and who transmit it to their immediate social network. Social hubs are people with an exceptionally large number of social connections; they often know hundreds of different people and have the ability to serve as connectors or bridges between different subcultures. Salespeople might be needed who receive the message from the market maven, amplify it by making it more relevant and persuasive, and then transmit it to the social hub for further distr

  • Cambridge Analytica

    Cambridge Analytica

    Cambridge Analytica Ltd. (CA), previously known as SCL USA, was a British political consulting firm that came to prominence through the Facebook–Cambridge Analytica data scandal. It was founded in 2013, as a subsidiary of the private intelligence company and self-described "global election management agency" SCL Group by long-time SCL executives Nigel Oakes, Alexander Nix and Alexander Oakes, with Nix as CEO. Cambridge Analytica was hired by a variety of political actors, including the Trinidadian government in 2010 and the 2016 presidential campaigns of Ted Cruz and Donald Trump. The firm maintained offices in London, New York City, and Washington, D.C. The company closed operations in 2018 due to backlash from the scandal, although firms related to both Cambridge Analytica and its parent firm SCL still exist. == History == Cambridge Analytica was founded in 2013 as a subsidiary of the private intelligence company SCL Group, which describes itself as providing "data, analytics and strategy to governments and military organisations worldwide". The company was part of "an international web of companies" headed by the London-based SCL Group. Cambridge Analytica (SCL USA) was incorporated in January 2013 with its registered office being in Westferry Circus, London and consisting of just one staff member, director and CEO Alexander Nix (also appointed in January 2015). Nix was also the director of nine similar companies sharing the same registered offices in London, including Firecrest technologies, Emerdata and six SCL Group companies including "SCL elections limited". Nigel Oakes, known as the former boyfriend of Lady Helen Windsor, had founded the predecessor SCL Group in the 1990s, and in 2005 Oakes established SCL Group together with his brother Alexander Oakes and Alexander Nix; SCL Group was the parent company of Cambridge Analytica. 
Former Conservative minister and MP Sir Geoffrey Pattie was the founding chairman of SCL; Lord Ivar Mountbatten also joined Oakes as a director of the company. As a result of the Facebook–Cambridge Analytica data scandal, Nix was removed as CEO and replaced by Julian Wheatland before the company closed. Several of the company's executives were Old Etonians. The company's owners included several of the Conservative Party's largest donors such as billionaire Vincent Tchenguiz, former British Conservative minister Jonathan Marland, Baron Marland and the family of American hedge fund manager Robert Mercer. The company combined misappropriation of digital assets, data mining, data brokerage, and data analysis with strategic communication during electoral processes. While its parent SCL had focused on influencing elections in developing countries since the 1990s, Cambridge Analytica focused more on the western world, including the United Kingdom and the United States; CEO Alexander Nix has said CA was involved in 44 U.S. political races in 2014. In 2015, CA performed data analysis services for Ted Cruz's presidential campaign. In 2016, CA worked for Donald Trump's presidential campaign as well as for Leave.EU (one of the organisations campaigning in the United Kingdom's referendum on European Union membership). CA's role in those campaigns has been controversial and is the subject of ongoing inquiries in both countries. Political scientists question CA's claims about the effectiveness of its methods of targeting voters. == Data scandal == In March 2018, media outlets broke news of Cambridge Analytica's business practices. The New York Times and The Observer reported that the company had acquired and used personal data about Facebook users from an external researcher who had told Facebook he was collecting it for academic purposes. 
Shortly afterwards, Channel 4 News aired undercover investigative videos showing Nix boasting about using prostitutes, bribery sting operations, and honey traps to discredit politicians on whom it had conducted opposition research, and saying that the company "ran all of (Donald Trump's) digital campaign". In response to the media reports, the Information Commissioner's Office (ICO) of the UK pursued a warrant to search the company's servers. Facebook banned Cambridge Analytica from advertising on its platform, saying that it had been deceived. On 23 March 2018, the British High Court granted the ICO a warrant to search Cambridge Analytica's London offices. As a result, Nix was suspended as CEO, and replaced by Julian Wheatland. The personal data of up to 87 million Facebook users were acquired via the 270,000 Facebook users who used a Facebook app created by Aleksandr Kogan called "This Is Your Digital Life". This was a personality profiling app and asked simple personality questions similar to other Facebook quizzes. Kogan was a scientist and psychologist, also being an employed lecturer for the University of Cambridge from 2012 to 2018. Alexander Nix claimed they had close to five thousand data points on each person who participated. They also gathered information through other data brokers ending with them acquiring millions of data points from American citizens. Kogan's app exploited a feature of Facebook's Graph API (version 1.0), which permitted any third-party app to access not only the app user's data, but also the full profile data of all of that user's Facebook friends, without those friends' knowledge or consent. This platform-wide design was available to all developers and was used by tens of thousands of apps; Facebook CEO Mark Zuckerberg later told the House Energy and Commerce Committee that the company was auditing "tens of thousands" of apps that had had access to large amounts of user data. 
Because the average Facebook user at the time had approximately 300 friends, the 270,000 users who installed Kogan's app yielded data on up to 87 million people. Facebook deprecated the friends-data API in April 2014 and shut it down entirely in April 2015, but data already collected by apps remained in developers' possession. Kogan passed this data to Cambridge Analytica, breaching Facebook's terms of service. On 1 May 2018, Cambridge Analytica and its parent company SCL filed for insolvency proceedings and closed operations. Alexander Tayler, a former director for Cambridge Analytica, was appointed director of Emerdata on 28 March 2018. Rebekah Mercer, Jennifer Mercer, Alexander Nix and Johnson Chun Shun Ko, who has links to American businessman Erik Prince, are in leadership positions at Emerdata. The Russo brothers are producing an upcoming film on Cambridge Analytica. In 2019 the Federal Trade Commission filed an administrative complaint against Cambridge Analytica for misuse of data. In 2020, the British Information Commissioner's Office closed a three-year inquiry into the company, concluded that Cambridge Analytica was "not involved" in the 2016 Brexit referendum and found no additional evidence for Russia's alleged interference during the campaign. US sensitive polling and election data, however, were passed to Russian Intelligence via a Cambridge Analytica contractor Sam Patten, Trump campaign manager Paul Manafort, and Russian agent Konstantin Kilimnik, who was indicted during the affair. Publicly, parent company SCL Group called itself a "global election management agency", Politico reported it was known for involvement "in military disinformation campaigns to social media branding and voter targeting". SCL gained work on a large number of campaigns for the US and UK governments' war on terror advancing their model of behavioral conflict during the 2000s. 
    SCL's involvement in the political world has been primarily in the developing world where it has been used by the military and politicians to study and manipulate public opinion and political will. Slate writer Sharon Weinberger compared one of SCL's hypothetical test scenarios to fomenting a coup. Among the investors in Cambridge Analytica were some of the Conservative Party's largest donors such as billionaire Vincent Tchenguiz, former Conservative minister Jonathan Marland, Baron Marland, Roger Gabb, the family of American hedge fund manager Robert Mercer, and Steve Bannon. A minimum of 15 million dollars has been invested into the company by Mercer, according to The New York Times. Bannon's stake in the company was estimated at 1 to 5 million dollars, but he divested his holdings in April 2017 as required by his role as White House Chief Strategist. In March 2018, Jennifer Mercer and Rebekah Mercer became directors of Emerdata Limited. In March 2018, Christopher Wylie made public that Cambridge Analytica's first activities were founded on a data set that its parent company SCL bought in 2014 from Global Science Research, a company founded by Aleksandr Kogan, who worked as a psychologist at Cambridge, and his team, which was spread across the world. During Boris Johnson's tenure as foreign secretary, the Foreign Office sought advice from Cambridge Analytica and Boris Johnson had a meeting with Alexander N

  • Media engagement framework

    Media engagement framework

    The media engagement framework is a planning framework used by marketing professionals to understand the behavior of social media marketing-based audiences. The construct was introduced in the book, ROI of Social Media. Powell’s background in marketing ROI and Groves' experience and understanding of the applications of social media in business led to a collaboration. Dimos joined as a brand strategist for Litmus Group, a global management consulting firm. The media engagement framework consists of the definitions of personas (Individuals, Consumers and Influencers), referenced by the competitive set or constraint that applies to that persona and the measurement framework that might be applied to those personas. It is referenced at the center of the marketing process diagram, surrounded by the marketing functions of strategy, tactics, metrics and ROI. The marketing process diagram describes how the media engagement framework can apply to any strategic marketing activity but was developed to establish a completely integrated framework describing how both traditional and social media marketing activities can be planned, executed, measured and improved. == Application == The media engagement framework provides a strategic planning construct in which measurements and metrics play a crucial role. Applying the media engagement framework aids in the development and management of an effective online marketing presence leveraging social media to engage a market or audience. By first personifying the audience, the marketer is able to identify the limiting aspect of the engagements possible with that audience segment and then, understand the type of engagement metrics to apply. Each persona makes decisions differently about how he/she acts in the social media universe. 
    A framework metric can be applied for each of these personas: an endorsement funnel for influencers, a community engagement funnel for individuals, and a purchase funnel for consumers. Individuals, influencers and consumers make decisions based on alternatives available to them and constraints put on them. To engage with an individual, brands must realize they are competing against the time an individual spends online. If individuals find something else more engaging, they will engage with that activity. Brands compete against other brands for the purchases of consumers acting in the category. Lastly, influencers have only so many endorsements they can make and therefore brands compete with other endorsers for the endorsement of an influencer. Content created with the target audience in mind, which that audience finds funny, interesting, and relatable, encourages the audience to share it on social networks, which in turn makes more people aware of the brand. Listening tools (Google Alerts, Twitter Search, SocialMention.com, Veooz.com, Alterian SM2, Radian6, Sysomos, Buzzient etc.) can be employed within the model to help identify the members of the audience segment and to support the formation of other social engagement planning and management tools.

  • MetaMask

    MetaMask

    MetaMask is a software cryptocurrency wallet developed by ConsenSys for interacting with the Ethereum blockchain and other EVM-compatible networks. It enables users to manage Ethereum accounts and connect to decentralized applications (dApps) via a browser extension or mobile app. As of early 2026, MetaMask reports over 100 million users worldwide. == Overview == MetaMask allows users to store and manage private keys, send and receive Ethereum-based cryptocurrencies and tokens (including ERC-20 and ERC-721 standards), broadcast transactions, and interact with dApps. dApps connect to the wallet via JavaScript interfaces, prompting users to approve signatures or transactions. The wallet features MetaMask Swaps, an in-app token swap aggregator sourcing liquidity from multiple decentralized exchanges (DEXs), with a service fee of 0.875%. In 2025, MetaMask introduced the MetaMask Rewards program (initially mobile-only), where users earn points for activities such as swaps, bridging, and referrals. Season 1 (October 2025 – January 2026) distributed over $30 million in Linea tokens and other perks to participants. == History == MetaMask launched in 2016 as open-source software under the MIT license. It initially supported browser extensions for Chrome and Firefox. Mobile versions were in closed beta from 2019 and publicly released for iOS and Android in September 2020. In August 2020, the license changed to a custom proprietary one. MetaMask Swaps launched on desktop in October 2020 and on mobile in March 2021. The Rewards program launched in late 2025 with Linea integration. == Criticism == MetaMask has faced criticism over privacy, including default analytics settings that share some user data (which can be disabled). Its reliance on Infura (acquired by ConsenSys in 2019) has raised concerns about centralization in Ethereum infrastructure. The wallet regularly issues warnings about phishing scams and fake airdrops impersonating MetaMask.
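To make the 0.875% MetaMask Swaps service fee from the Overview concrete, the arithmetic can be sketched as below. This is only an illustration: it assumes the fee is taken as a flat percentage of the amount being swapped, which the text above does not specify, and the `swap_fee` helper is a hypothetical name, not part of any MetaMask API.

```python
from decimal import Decimal

# 0.875% expressed as a fraction; the rate is the one quoted in the text above.
SWAP_FEE_RATE = Decimal("0.00875")


def swap_fee(amount):
    """Return (fee, remainder) for a swap amount, assuming a flat percentage fee."""
    amount = Decimal(str(amount))  # go through str to avoid binary float artifacts
    fee = amount * SWAP_FEE_RATE
    return fee, amount - fee


fee, remainder = swap_fee("1000")
```

Under that assumption, a swap of 1000 units carries a fee of 8.75 units and leaves 991.25 units to be exchanged. `Decimal` is used rather than `float` so that the result stays exact for money-like values.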

    Read more →
  • Brain Imaging Data Structure

    Brain Imaging Data Structure

    The Brain Imaging Data Structure (BIDS) is a standard for organizing, annotating, and describing data collected during neuroimaging experiments. It is based on a formalized file and directory structure and metadata files (based on JSON and TSV) with controlled vocabulary. This standard has been adopted by a multitude of labs around the world as well as databases such as OpenNeuro, SchizConnect, Developing Human Connectome Project, and FCP-INDI, and is seeing uptake in an increasing number of studies. While originally specified for MRI data, BIDS has been extended to several other imaging modalities such as MEG, EEG, and intracranial EEG (see also BIDS Extension Proposals). == History == The project is a community-driven effort. BIDS, originally OBIDS (Open Brain Imaging Data Structure), was initiated during an INCF-sponsored data sharing working group meeting (January 2015) at Stanford University. It was subsequently spearheaded and maintained by Chris Gorgolewski. Since October 2019, the project has been headed by a Steering Group and maintained by a separate team of maintainers, the Maintainers Group, according to a governance document that the BIDS community approved in a vote. BIDS has advanced under the direction and effort of contributors, the community of researchers who appreciate the value of standardizing neuroimaging data to facilitate sharing and analysis. == BIDS Extension Proposals == BIDS can be extended in a backwards-compatible way and is evolving over time. This is accomplished through BIDS Extension Proposals (BEPs), which are community-driven processes following agreed-upon guidelines. A full list of finalized BEPs and BEPs in progress can be found on the BIDS website.
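The "formalized file and directory structure" mentioned above can be illustrated with a small sketch that assembles a BIDS-style path for one image and its JSON sidecar. It is a simplified illustration, not a validator: it covers only the subject, optional session, datatype and suffix parts of a name, and the helper `bids_path` is hypothetical rather than part of any BIDS tool.

```python
from pathlib import PurePosixPath
from typing import Optional


def bids_path(subject: str, datatype: str, suffix: str,
              session: Optional[str] = None,
              extension: str = ".nii.gz") -> PurePosixPath:
    """Build a relative BIDS-style path such as sub-01/anat/sub-01_T1w.nii.gz."""
    entities = [f"sub-{subject}"]
    folders = [f"sub-{subject}"]
    if session is not None:
        # Sessions add both a directory level and a filename entity.
        entities.append(f"ses-{session}")
        folders.append(f"ses-{session}")
    folders.append(datatype)
    filename = "_".join(entities + [suffix]) + extension
    return PurePosixPath(*folders) / filename


if __name__ == "__main__":
    image = bids_path("01", "anat", "T1w")
    sidecar = bids_path("01", "anat", "T1w", extension=".json")
    print(image)    # image file
    print(sidecar)  # JSON metadata file with the same stem
```

The JSON sidecar shares its stem with the image file, which is how the metadata files are associated with the data they describe.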

    Read more →
  • Vans challenge

    Vans challenge

    The Vans challenge is a viral internet challenge that began in March 2019 in which people show their Vans shoes landing right-side up after tossing them in the air. The viral sensation reportedly started after a Twitter user shared a video of the occurrence, which was captioned: “Did you know it doesn’t matter how you throw your Vans they will land facing up.” Since then, many people on social media have posted similar videos of themselves throwing their Vans in the air and the shoes landing right-side up, as well as videos featuring Crocs, UGG boots, and other popular shoes. The claim proved false, as the shoes do not always land facing up when tossed.

    Read more →
  • Social knowledge management

    Social knowledge management

    Social knowledge management is a business approach that aims to leverage the collective intelligence and social interactions of an organization’s members and stakeholders. It is a branch of knowledge management, which is a multidisciplinary field that deals with the creation, sharing, and use of knowledge in various domains, such as business, economics, psychology, and information management. Knowledge management seeks to enhance organizational performance, innovation, and competitiveness by managing the intangible assets of an organization, such as human capital, know-how, technology, customers, and networks. Social media plays a crucial role in social knowledge management by enhancing communication, collaboration, and learning among individuals and groups, both internally and externally. It offers valuable insights and feedback from customers, partners, and stakeholders, and aids in generating and disseminating new knowledge. In a business context, social media is used for various purposes, including sentiment analysis, social learning, social collaboration, and social knowledge management. From a social media strategy point of view, businesses may aim to learn, listen, engage in conversation, measure and refine, develop capabilities, define activities, and prioritize objectives. Social media are not only transforming private communication and interaction; they will also transform how people work. With social media, knowledge work in organizations can be improved considerably, for example through better distribution and sharing of, and access to, knowledge. This will become increasingly important as speed and complexity increase dramatically in today's business world while work environments change constantly. 
== Examples of Social KM platforms == Elium is a European software application that combines social tagging, bookmarking and networking paradigms for internal information management. Sciomino was a startup enterprise social network for social knowledge management.

    Read more →
  • Texture compression

    Texture compression

    Texture compression is a specialized form of image compression designed for storing texture maps in 3D computer graphics rendering systems. Unlike conventional image compression algorithms, texture compression algorithms are optimized for random access. Texture compression can be applied to reduce memory usage at runtime. Texture data is often the largest source of memory usage in a mobile application. == Tradeoffs == In their seminal paper on texture compression, Beers, Agrawala and Chaddha list four features that tend to differentiate texture compression from other image compression techniques. These features are: Decoding Speed It is highly desirable to be able to render directly from the compressed texture data and so, in order not to impact rendering performance, decompression must be fast. Random Access Since predicting the order that a renderer accesses texels would be difficult, any texture compression scheme must allow fast random access to decompressed texture data. This tends to rule out many better-known image compression schemes such as JPEG or run-length encoding. Compression Rate and Visual Quality In a rendering system, lossy compression can be more tolerable than for other use cases. Some texture compression libraries, such as crunch, allow the developer to flexibly trade off compression rate vs. visual quality, using methods such as rate–distortion optimization (RDO). Encoding Speed Texture compression is more tolerant of asymmetric encoding/decoding rates as the encoding process is often done only once during the application authoring process. Given the above, most texture compression algorithms involve some form of fixed-rate lossy vector quantization of small fixed-size blocks of pixels into small fixed-size blocks of coding bits, sometimes with additional extra pre-processing and post-processing steps. Block Truncation Coding is a very simple example of this family of algorithms. 
Because the data access patterns of these algorithms are well-defined, texture decompression may be executed on the fly during rendering as part of the overall graphics pipeline, reducing overall bandwidth and storage needs throughout the graphics system. As well as texture maps, texture compression may also be used to encode other kinds of rendering map, including bump maps and surface normal maps. Texture compression may also be used together with other forms of map processing such as mipmaps and anisotropic filtering. == Availability == Some examples of practical texture compression systems are S3 Texture Compression (S3TC), PVRTC, Ericsson Texture Compression (ETC) and Adaptive Scalable Texture Compression (ASTC); these may be supported by special function units in modern graphics processing units (GPUs). OpenGL and OpenGL ES, as implemented on many video accelerator cards and mobile GPUs, can support multiple common kinds of texture compression, generally through the use of vendor extensions. == Supercompression == A compressed texture can be further compressed in what is called "supercompression". Fixed-rate texture compression formats are optimized for random access and compress much less efficiently than image formats such as PNG. By adding a further layer of compression, a programmer can reduce this efficiency gap. The extra layer can be decompressed by the CPU so that the GPU receives a normal compressed texture, or, in newer methods, decompressed by the GPU itself. Supercompression saves the same amount of VRAM as regular texture compression, but saves more disk space and download size. == Neural Texture Compression == Random-Access Neural Compression of Material Textures (Neural Texture Compression) is an Nvidia technology that enables two additional levels of detail (16× more texels, so four times higher resolution) while maintaining storage requirements similar to those of traditional texture compression methods. 
The key idea is to compress multiple material textures and their mipmap chains together and to use a small neural network, optimized for each material, to decompress them.
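Block Truncation Coding, named above as a very simple member of the fixed-rate block family, can be sketched in a few lines. The version below is a simplified variant that stores the mean of each of the two pixel groups (often described as absolute-moment BTC) rather than the classic mean and standard deviation formulation. The 4×4 grayscale block size and the function names are assumptions made for illustration.

```python
from typing import List, Tuple

Block = List[List[int]]            # 4x4 grayscale block, values 0..255
Encoded = Tuple[int, int, int]     # (low level, high level, 16-bit mask)


def btc_encode(block: Block) -> Encoded:
    """Quantize a block to two levels plus one bit per pixel."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    mask = 0
    low_group, high_group = [], []
    for index, pixel in enumerate(pixels):
        if pixel >= mean:
            mask |= 1 << index     # bit set: pixel uses the high level
            high_group.append(pixel)
        else:
            low_group.append(pixel)
    # At least one pixel is >= the mean, so high_group is never empty.
    high = round(sum(high_group) / len(high_group))
    low = round(sum(low_group) / len(low_group)) if low_group else high
    return low, high, mask


def btc_decode_texel(encoded: Encoded, x: int, y: int, width: int = 4) -> int:
    """Fetch one texel directly, without decoding the rest of the block."""
    low, high, mask = encoded
    return high if (mask >> (y * width + x)) & 1 else low


if __name__ == "__main__":
    block = [
        [10, 10, 200, 200],
        [10, 10, 200, 200],
        [12, 12, 198, 198],
        [12, 12, 198, 198],
    ]
    encoded = btc_encode(block)
    print(encoded, btc_decode_texel(encoded, 2, 0))
```

Every block encodes to the same size (two 8-bit levels plus 16 mask bits, so 2 bits per pixel instead of 8), and any texel can be read with one bit test, which is the fixed-rate, random-access property the text describes.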

    Read more →
  • Data preservation

    Data preservation

    Data preservation is the act of conserving and maintaining both the safety and integrity of data. Preservation is done through formal activities that are governed by policies, regulations and strategies directed towards protecting and prolonging the existence and authenticity of data and its metadata. Data can be described as the elements or units in which knowledge and information is created, and metadata are the summarizing subsets of the elements of data; or the data about the data. The main goal of data preservation is to protect data from being lost or destroyed and to contribute to the reuse and progression of the data. == History == Most historical data collected over time has been lost or destroyed. War and natural disasters combined with the lack of materials and necessary practices to preserve and protect data has caused this. Usually, only the most important data sets were saved, such as government records and statistics, legal contracts and economic transactions. Scientific research and doctoral theses data have mostly been destroyed from improper storage and lack of data preservation awareness and execution. Over time, data preservation has evolved and has generated importance and awareness. We now have many different ways to preserve data and many different important organizations involved in doing so. The first digital data preservation storage solutions appeared in the 1950s, which were usually flat or hierarchically structured. While there were still issues with these solutions, it made storing data much cheaper, and more easily accessible. In the 1970s relational databases as well as spreadsheets appeared. Relational data bases structure data into tables using structured query languages which made them more efficient than the preceding storage solutions, and spreadsheets hold high volumes of numeric data which can be applied to these relational databases to produce derivative data. 
More recently, non-relational (non-structured query language) databases have appeared as complements to relational databases which hold high volumes of unstructured or semi-structured data. == Importance == The scope of data preservation is vast. Everything from governmental to business records to art essentially can be represented as data, and is amenable to be lost. This then leads to loss of human history, for perpetuity. Data can be lost on a small or independent scale whether it's personal data loss, or data loss within businesses and organizations, as well as on a larger or national or global scale which can negatively and potentially permanently affect things such as environmental protection, medical research, homeland security, public health and safety, economic development and culture. The mechanisms of data loss are also as many as they are varied, spanning from disaster, wars, data breaches, negligence, all the way through simple forgetting to natural decay. Ways in which data collections can be used when preserved and stored properly can be seen through the U.S. Geological Survey, which stores data collections on natural hazards, natural resources, and landscapes. The data collected by the Survey is used by federal and state land management agencies towards land use planning and management, and continually needs access to historical reference data. == Related Concepts == In contrast, data holdings are collections of gathered data that are informally kept, and not necessarily prepared for long-term preservation. For example, a collection or back-up of personal files. Data holdings are generally the storage methods used in the past when data has been lost due to environmental and other historical disasters. Furthermore, data retention differs from data preservation in the sense that by definition, to retain an object (data) is to hold or keep possession or use of the object. To preserve an object is to protect, maintain and keep up for future use. 
Retention policies often also address when data should be deliberately deleted or withheld from public access, while preservation prioritizes permanence and more widely shared access. Thus, data preservation goes beyond merely having or possessing data or backup copies of data. Data preservation ensures reliable access to data by including backup and recovery mechanisms that are in place before a disaster or technological change occurs. == Methods == === Digital === Digital preservation is similar to data preservation, but is mainly concerned with technological threats and solely with digital data. Essentially, digital preservation is a set of formal activities to enable ongoing or persistent use of and access to digital data beyond the occurrence of technological malfunction or change. Digital preservation anticipates the inevitable change in technology and protocols, and prepares for data that will need to be accessible across new types of technologies and platforms while the integrity of the data and metadata is conserved. Technology, while providing great progress in conserving data that may not have been possible in the past, is also changing at such a quick rate that digital data may no longer be accessible because its format is incompatible with new software. Without the use of data preservation, much of our existing digital data is at risk. The majority of methods used for data preservation today are digital methods, which are so far the most effective methods that exist. === Archives === Archives are collections of historical documents and records. Archives contribute to and work towards the preservation of data by collecting data that is well organized, while providing the appropriate metadata to confirm it. An example of an important data archive is The LONI Image Data Archive, which collects data regarding clinical trials and clinical research studies. 
=== Catalogues, directories and portals === Catalogues, directories and portals are consolidated resources which are kept by individual institutions, and are associated with data archives and holdings. In other words, the data itself is not presented on the site; instead, these resources might act as sources of metadata and as aggregators, and may administer thorough inventories. === Repositories === Repositories are places where data archives and holdings can be accessed and stored. The goal of repositories is to make sure that all requirements and protocols of archives and holdings are being met, and that data is being certified to ensure data integrity and user trust. Single-site repositories: A repository that holds all data sets on a single site. An example of a major single-site repository is the Data Archiving and Networking Services, a repository which provides ongoing access to digital research resources for the Netherlands. Multi-site repositories: A repository that hosts data sets on multiple institutional sites. An example of a well-known multi-site repository is OpenAIRE, which hosts research data and publications in collaboration with all of the EU countries and more. OpenAIRE promotes open scholarship and seeks to improve the discoverability and reusability of data. Trusted digital repository: A repository that seeks to provide reliable, trusted access over a long period of time. The repository can be single-site or multi-site but must comply with the Reference Model for an Open Archival Information System, as well as adhere to a set of rules or attributes that contribute to its trust, such as having persistent financial responsibility, organizational buoyancy, administrative responsibility, security and safety. An example of a trusted digital repository is The Digital Repository of Ireland (DRI), a multi-site repository that hosts Ireland's humanities and social science data sets. 
=== Cyber Infrastructures === Cyber infrastructures consist of archive collections that are made available through a system of hardware, technologies, software, policies, services and tools. Cyber infrastructures are geared towards the sharing of data, supporting peer-to-peer collaboration and a cultural community. An example of a major cyber infrastructure is The Canadian Geo-spatial Data Infrastructure, which provides access to spatial data in Canada.

    Read more →
  • Data independence

    Data independence

    Data independence is the type of data transparency that matters for a centralized DBMS. It refers to the immunity of user applications to changes made in the definition and organization of data. Application programs should not, ideally, be exposed to details of data representation and storage. The DBMS provides an abstract view of the data that hides such details. There are two types of data independence: physical and logical data independence. Data independence and operation independence together provide data abstraction. == Logical data independence == The logical structure of the data is known as the 'schema definition'. In general, if a user application operates on a subset of the attributes of a relation, it should not be affected later when new attributes are added to the same relation. Logical data independence indicates that the conceptual schema can be changed without affecting the existing external schemas. == Physical data independence == The physical structure of the data is referred to as "physical data description". Physical data independence deals with hiding the details of the storage structure from user applications. The application should not be involved with these issues since, conceptually, there is no difference in the operations carried out against the data. Some presentations distinguish three kinds of data independence: Logical data independence: The ability to change the logical (conceptual) schema without changing the external schema (user view) is called logical data independence. For example, entities, attributes, or relationships can be added to or removed from the conceptual schema without changing existing external schemas or having to rewrite existing application programs. Physical data independence: The ability to change the physical schema without changing the logical schema is called physical data independence. 
For example, a change to the internal schema, such as using a different file organization, storage structures, storage devices, or indexing strategy, should be possible without having to change the conceptual or external schemas. View-level data independence: the view level is always independent, because no other level exists above it that a change could affect. == Data independence == Data independence can be explained as follows: Each higher level of the data architecture is immune to changes of the next lower level of the architecture. The logical schema stays unchanged even though the storage space or type of some data is changed for reasons of optimization or reorganization. In this case, the external schema does not change, although changes to the internal schema may be required because the physical schema has been reorganized. Physical data independence is present in most database and file environments, in which details such as hardware storage encoding, the exact location of data on disk, and the merging of records are hidden from the user. == Data independence types == The ability to modify a schema definition at one level without affecting the schema definition at the next higher level is called data independence. There are two levels of data independence: physical data independence and logical data independence. Physical data independence is the ability to modify the physical schema without causing application programs to be rewritten. Modifications at the physical level are occasionally necessary to improve performance. It means the physical storage level can be changed without affecting the conceptual or external view of the data. The new changes are absorbed by mapping techniques. Logical data independence is the ability to modify the logical schema without causing application programs to be rewritten. Modifications at the logical level are necessary whenever the logical structure of the database is altered (for example, when money-market accounts are added to a banking system). 
Logical data independence means that if columns are added to or removed from a table, the user view and programs should not change. For example, consider two users, A and B, who both select the fields "EmployeeNumber" and "EmployeeName". If user B adds a new column (e.g. salary) to his table, it will not affect the external view for user A, though the internal schema of the database has been changed for both users A and B. Logical data independence is more difficult to achieve than physical data independence, since application programs are heavily dependent on the logical structure of the data that they access.
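The "EmployeeNumber" / "EmployeeName" example above can be tried out with an in-memory SQLite database. In this sketch a view plays the role of user A's external schema; the table name `employee`, the view name `employee_directory` and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (EmployeeNumber INTEGER PRIMARY KEY, EmployeeName TEXT)"
)
conn.executemany(
    "INSERT INTO employee (EmployeeNumber, EmployeeName) VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)
# External schema for user A: only the two fields the application needs.
conn.execute(
    "CREATE VIEW employee_directory AS "
    "SELECT EmployeeNumber, EmployeeName FROM employee"
)


def read_directory(connection: sqlite3.Connection) -> list:
    """The application's query; it is never rewritten in this example."""
    return connection.execute(
        "SELECT EmployeeNumber, EmployeeName FROM employee_directory "
        "ORDER BY EmployeeNumber"
    ).fetchall()


before = read_directory(conn)
# User B changes the logical schema by adding a Salary column.
conn.execute("ALTER TABLE employee ADD COLUMN Salary REAL")
after = read_directory(conn)

print(before == after)
```

The application query runs unchanged before and after the `ALTER TABLE`, which is the behavior logical data independence asks for. A query written as `SELECT *` against the base table would not be insulated in the same way.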

    Read more →
  • Key (cryptography)

    Key (cryptography)

    A key in cryptography is a piece of information, usually a string of numbers or letters stored in a file, which, when processed through a cryptographic algorithm, can encode or decode cryptographic data. Depending on the method used, the key can be of different sizes and varieties, but in all cases, the strength of the encryption relies on the security of the key being maintained. A key's security strength depends on its algorithm, the size of the key, the generation of the key, and the process of key exchange. == Scope == The key is what is used to encrypt data from plaintext to ciphertext. There are different methods for utilizing keys and encryption. === Symmetric cryptography === Symmetric cryptography refers to the practice of the same key being used for both encryption and decryption. === Asymmetric cryptography === Asymmetric cryptography has separate keys for encrypting and decrypting. These keys are known as the public and private keys, respectively. == Purpose == Since the key protects the confidentiality and integrity of the system, it is important that it be kept secret from unauthorized parties. With public key cryptography, only the private key must be kept secret, but with symmetric cryptography, it is important to maintain the confidentiality of the key. Kerckhoffs's principle states that the entire security of the cryptographic system relies on the secrecy of the key. == Key sizes == Key size is the number of bits in the key defined by the algorithm. This size defines the upper bound of the cryptographic algorithm's security. The larger the key size, the longer it will take before the key is compromised by a brute force attack. Since perfect secrecy is not feasible for key algorithms, research is now more focused on computational security. In the past, keys were required to be a minimum of 40 bits in length; however, as technology advanced, these keys were being broken more and more quickly. 
As a response, restrictions on symmetric keys were enhanced to be greater in size. Currently, 2048 bit RSA is commonly used, which is sufficient for current systems. However, current RSA key sizes would all be cracked quickly with a powerful quantum computer. "The keys used in public key cryptography have some mathematical structure. For example, public keys used in the RSA system are the product of two prime numbers. Thus public key systems require longer key lengths than symmetric systems for an equivalent level of security. 3072 bits is the suggested key length for systems based on factoring and integer discrete logarithms which aim to have security equivalent to a 128 bit symmetric cipher." == Key generation == To prevent a key from being guessed, keys need to be generated randomly and contain sufficient entropy. The problem of how to safely generate random keys is difficult and has been addressed in many ways by various cryptographic systems. A key can directly be generated by using the output of a Random Bit Generator (RBG), a system that generates a sequence of unpredictable and unbiased bits. A RBG can be used to directly produce either a symmetric key or the random output for an asymmetric key pair generation. Alternatively, a key can also be indirectly created during a key-agreement transaction, from another key or from a password. Some operating systems include tools for "collecting" entropy from the timing of unpredictable operations such as disk drive head movements. For the production of small amounts of keying material, ordinary dice provide a good source of high-quality randomness. == Establishment scheme == The security of a key is dependent on how a key is exchanged between parties. Establishing a secured communication channel is necessary so that outsiders cannot obtain the key. A key establishment scheme (or key exchange) is used to transfer an encryption key among entities. 
Key agreement and key transport are the two types of key exchange scheme used to exchange keys remotely between entities. In a key agreement scheme, a secret key, which is used between the sender and the receiver to encrypt and decrypt information, is established indirectly. All parties exchange information (the shared secret) that permits each party to derive the secret key material. In a key transport scheme, encrypted keying material that is chosen by the sender is transported to the receiver. Either symmetric key or asymmetric key techniques can be used in both schemes. The Diffie–Hellman key exchange and Rivest-Shamir-Adleman (RSA) are the two most widely used key exchange algorithms. In 1976, Whitfield Diffie and Martin Hellman constructed the Diffie–Hellman algorithm, which was the first public key algorithm. The Diffie–Hellman key exchange protocol allows key exchange over an insecure channel by electronically generating a shared key between two parties. On the other hand, RSA is a form of the asymmetric key system which consists of three steps: key generation, encryption, and decryption. Key confirmation delivers an assurance between the key confirmation recipient and provider that the shared keying materials are correct and established. The National Institute of Standards and Technology recommends that key confirmation be integrated into a key establishment scheme to validate its implementations. == Management == Key management concerns the generation, establishment, storage, usage and replacement of cryptographic keys. A key management system (KMS) typically includes three steps: establishing, storing and using keys. The security of the generation, storage, distribution, use and destruction of keys depends on successful key management protocols. == Key vs password == A password is a memorized series of characters including letters, digits, and other special symbols that are used to verify identity. 
It is often produced by a human user or a password management software to protect personal and sensitive information or generate cryptographic keys. Passwords are often created to be memorized by users and may contain non-random information such as dictionary words. On the other hand, a key can help strengthen password protection by implementing a cryptographic algorithm which is difficult to guess or replace the password altogether. A key is generated based on random or pseudo-random data and can often be unreadable to humans. A password is less safe than a cryptographic key due to its low entropy, randomness, and human-readable properties. However, the password may be the only secret data that is accessible to the cryptographic algorithm for information security in some applications such as securing information in storage devices. Thus, a deterministic algorithm called a key derivation function (KDF) uses a password to generate the secure cryptographic keying material to compensate for the password's weakness. Various methods such as adding a salt or key stretching may be used in the generation.
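The two ways of obtaining a key described above, direct random generation and derivation from a password with a salted, stretched KDF, can be sketched with the Python standard library. PBKDF2-HMAC-SHA256 is used here only as one widely available KDF; the iteration count, salt length and function names are illustrative assumptions, not recommendations drawn from the text.

```python
import hashlib
import secrets

# Illustrative work factor (assumption); real deployments should follow
# current guidance for the chosen KDF.
ITERATIONS = 200_000


def generate_key(num_bytes: int = 32) -> bytes:
    """Generate a random key directly from the operating system's CSPRNG."""
    return secrets.token_bytes(num_bytes)


def derive_key(password: str, salt: bytes,
               iterations: int = ITERATIONS, length: int = 32) -> bytes:
    """Derive keying material from a password (salted and key-stretched)."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=length
    )


if __name__ == "__main__":
    salt = secrets.token_bytes(16)   # stored alongside the derived output
    key_1 = derive_key("correct horse battery staple", salt)
    key_2 = derive_key("correct horse battery staple", salt)
    key_3 = derive_key("correct horse battery staple", secrets.token_bytes(16))
    print(key_1 == key_2)  # same password and salt give the same key
    print(key_1 == key_3)  # a different salt gives an unrelated key
```

The salt ensures that identical passwords do not produce identical keys, and the iteration count makes each password guess more expensive, which partly compensates for the low entropy of passwords.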

    Read more →
  • Google Cloud Dataflow

    Google Cloud Dataflow

    Google Cloud Dataflow is a fully managed service for executing Apache Beam pipelines within the Google Cloud Platform ecosystem. It offers features such as autoscaling, dynamic work rebalancing, and a managed execution environment. Dataflow is suitable for large-scale, continuous data processing jobs, and is one of the major components of Google's big data architecture on the Google Cloud Platform. At its core, Dataflow's architecture is designed to abstract away infrastructure management, allowing developers to focus on the logic of their data processing tasks. When a pipeline written using the Apache Beam SDK is submitted, Dataflow translates this high-level definition into an optimized job graph. The service then provisions and manages a fleet of Google Compute Engine workers to execute this graph in a highly parallelized and fault-tolerant manner. This serverless approach, combined with autoscaling of both the number of workers (horizontal) and the resources per worker (vertical), is intended to match the computational power of a job to its needs at any given time, balancing performance and cost. The service's integration with the Google Cloud ecosystem supports a variety of use cases beyond simple data movement. For real-time analytics, Dataflow can ingest unbounded streams of data from Cloud Pub/Sub, perform complex transformations, and load results into BigQuery for immediate querying. In machine learning workflows, it is commonly used to preprocess and transform massive datasets stored in Cloud Storage, preparing them for training models in Vertex AI. This versatility makes it a common processing engine for ETL (Extract, Transform, Load) operations, streaming analytics, and large-scale data preparation within the cloud. 
== History == Google Cloud Dataflow was announced in June 2014 and released to the general public as an open beta in April 2015. In January 2016, Google donated the underlying SDK, the implementation of a local runner, and a set of IOs (data connectors) for accessing Google Cloud Platform data services to the Apache Software Foundation. The donated code formed the original basis for Apache Beam. In August 2022, an incident broke user timers for certain Dataflow streaming pipelines in multiple regions; it was later resolved. Throughout 2023 and 2024, there were various other updates and incidents affecting Google Cloud Dataflow, as documented in the release notes and service health history. The donation of the Dataflow SDK to the Apache Software Foundation established Apache Beam as a unified, open-source programming model for defining both batch and streaming data pipelines. This move decoupled the pipeline definition from the execution engine. As a result, developers could write portable data processing logic that was not locked into Google's ecosystem. A Beam pipeline can be executed on various runners, including Apache Flink, Apache Spark, and the Google Cloud Dataflow service. == Features == Google Cloud Dataflow supports both batch and streaming data processing pipelines. It automatically handles resource provisioning, data sharding, and scaling according to workload, reducing the manual configuration needed for large-scale data operations. == Use cases == Dataflow is used for ETL (Extract, Transform, Load) data pipelines, real-time analytics, and event stream processing for companies in industries such as finance, advertising, and IoT.

    Read more →
  • Software token

    Software token

    A software token (a.k.a. soft token) is a piece of a two-factor authentication security device that may be used to authorize the use of computer services. Software tokens are stored on a general-purpose electronic device such as a desktop computer, laptop, PDA, or mobile phone and can be duplicated. (Contrast hardware tokens, where the credentials are stored on a dedicated hardware device and therefore cannot be duplicated, absent physical invasion of the device.) Because software tokens are something one does not physically possess, they are exposed to unique threats based on duplication of the underlying cryptographic material, for example through computer viruses and software attacks. Both hardware and software tokens are vulnerable to bot-based man-in-the-middle attacks, or to simple phishing attacks in which the one-time password provided by the token is solicited and then supplied to the genuine website in a timely manner. Software tokens do have benefits: there is no physical token to carry, they do not contain batteries that will run out, and they are cheaper than hardware tokens. == Security architecture == There are two primary architectures for software tokens: shared secret and public-key cryptography. For a shared secret, an administrator will typically generate a configuration file for each end-user. The file will contain a username, a personal identification number, and the secret. This configuration file is given to the user. The shared secret architecture is potentially vulnerable in a number of areas. The configuration file can be compromised if it is stolen and the token is copied. With time-based software tokens, it is possible to borrow an individual's PDA or laptop, set the clock forward, and generate codes that will be valid in the future. Any software token that uses shared secrets and stores the PIN alongside the shared secret in a software client can be stolen and subjected to offline attacks.
Shared secret tokens can be difficult to distribute, since each token is essentially a different piece of software. Each user must receive a copy of the secret, which can create time constraints. Some newer software tokens rely on public-key cryptography, or asymmetric cryptography. This architecture eliminates some of the traditional weaknesses of software tokens, but does not affect their primary weakness (ability to duplicate). A PIN can be stored on a remote authentication server instead of with the token client, making a stolen software token no good unless the PIN is known as well. However, in the case of a virus infection, the cryptographic material can be duplicated and then the PIN can be captured (via keylogging or similar) the next time the user authenticates. If there are attempts made to guess the PIN, it can be detected and logged on the authentication server, which can disable the token. Using asymmetric cryptography also simplifies implementation, since the token client can generate its own key pair and exchange public keys with the server.
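
The shared-secret weaknesses described above can be made concrete with a minimal Python sketch of a time-based token in the style of RFC 6238 (TOTP). This is an illustration under stated assumptions, not the scheme of any particular product: the function name `totp`, the 30-second step, and the sample secret are hypothetical choices. Because a code depends only on the secret and the clock, any copy of the secret yields the same codes, and a clock set forward yields codes that become valid later.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password in the style of RFC 6238 (HMAC-SHA-1)."""
    counter = unix_time // step                 # number of elapsed time steps
    msg = struct.pack(">Q", counter)            # counter as 8 big-endian bytes
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                  # dynamic truncation (RFC 4226)
    binary = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(binary % 10 ** digits).zfill(digits)


# Hypothetical shared secret (this value is the RFC 6238 SHA-1 test key).
SECRET = b"12345678901234567890"
now = 59  # fixed example timestamp, in seconds since the Unix epoch

# A duplicated secret behaves exactly like the original token.
original_device = totp(SECRET, now)
copied_device = totp(SECRET, now)

# Setting the clock one hour forward produces a code that is valid later.
future_code = totp(SECRET, now + 3600)

print(original_device == copied_device)
print(future_code)
```

Nothing in the computation binds a code to a particular device, which is why the article treats duplication as the primary weakness of software tokens.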

    Read more →
  • Cleo Communications

    Cleo Communications

    Cleo Communications LLC, simply referred to as Cleo, is a privately held software company founded in 1976. The company is best known for its ecosystem integration platform, Cleo Integration Cloud with RADAR. == History == Cleo began as a division of Phone 1 Inc., a voice data gathering systems manufacturer, and built data concentrators and terminal emulators: multi-bus computers, modems, and terminals to interface with IBM mainframes via bisynchronous communications. The company then began developing mainframe middleware in the 1980s, and with the rise of the PC, moved into B2B data communications and secure file transfer software. Cleo Communications was acquired in 2012 by Global Equity Partners along with other investment companies. Since the acquisition, the company’s offerings have evolved into Cleo Integration Cloud, a platform for enterprise business integration. == Business == Based in Rockford, Illinois (USA), with offices in Chicago, Pennsylvania, London, and Bangalore, Cleo has about 400 employees and more than 4,100 direct customers. The company's flagship offering, Cleo Integration Cloud, provides both on-premise and cloud-based integration technologies and comprises solutions for B2B/EDI, application integration, data movement and data transformation. Previous products now incorporated into the Cleo Integration Cloud platform include Cleo Harmony, Cleo Clarify, and Cleo Jetsonic. Cleo solutions span a variety of industries, including manufacturing, logistics and supply chain, retail, third-party logistics, warehouse management and transportation management, healthcare, financial services and government. The U.S. Department of Veterans Affairs adopted Cleo's fax technology, Cleo Streem, in 2013 when it needed FIPS 140-2-compliant technology to protect information, and the City of Atlanta has used Cleo Streem for network and desktop faxing since 2006. Cleo also serves U.S.
transportation logistics company MercuryGate International and SaaS-based food logistics organization ArrowStream. It powers the architecture for several major supply chain companies, such as Blue Yonder and SAP. Cleo integrates the pharmaceutical supply chain for such companies as Octapharma. Key partners include FourKites and ClientsFirst, among many others. In May 2023, Cleo announced it entered a global partnership with consulting and multinational information technology services company, Cognizant (NASDAQ: CTSH). Together, the companies announced CCIB, powered by Cleo, which is a B2B iPaaS solution that provides B2B managed services with built-in, scalable infrastructure on the cloud. The solution comprises elements from Cleo’s flagship offering, Cleo Integration Cloud. == Expansion == In June 2014, Cleo opened an office in Chicago for members of its support and Ashok and teams. In 2014, the company hired Jorge Rodriguez as Senior Vice President of Product Development and John Thielens as Vice President of Technology. Cleo hired Dave Brunswick as Vice President of Solutions for North America in 2015, and Cleo hired Ken Lyons to lead global sales in 2016. Lyons now serves as the company's Chief Revenue Officer. More recent additions to the company's leadership team include Vipin Mittal, Vice President, Customer Experience, and Tushar Patel, CMO. Cleo opened its product development facility in Bengaluru, India, in 2015 and expanded its hybrid cloud integration teams into a new office there in 2017. The company also opened a London office in 2016 and expanded its network of channel partners in EMEA. In 2016, Cleo acquired EXTOL International, a Pottsville, Pa.-based business and EDI integration and data transformation company for an undisclosed amount. In 2017, the company moved its headquarters from Loves Park, Illinois, to Rockford. In 2021 the company received a significant growth investment from H.I.G. Capital. 
In July 2022, Cleo opened a new, 5,000-square-foot office located in Chicago's Loop. In November 2022, Cleo launched an accelerator for Microsoft Dynamics 365 SCM-to-X12 and a connector for Microsoft Dynamics 365 Business Central. These pre-built solutions allow businesses and users to quickly build integration flows that integrate their digital ecosystems. In March 2023, Cleo released CIC PAVE (Procurement Automation and Vendor Enablement). PAVE provides customers with enhanced supply chain visibility via a supplier portal that allows the customer to keep vendor interaction in a single location, even if the vendors cannot use EDI or do not have API-ready applications. In December 2023, Cleo acquired ECS International, an integration technology company based in the Netherlands. == Certification == Cleo regularly submits its products to Drummond Group's interoperability software testing for AS2, AS3 and ebMS 2.0. In January 2020, Cleo announced that its new application connector for Acumatica ERP had been recognized as an Acumatica-Certified Application (ACA). The company also holds SOC 2, Type 2 certification. == Awards == Cleo received a Xerox partner of the year award for five years, from 2009 to 2014. The Cleo Streem solution integrates with Xerox multi-function products, providing customers with solutions for network fax and interactive messaging needs. Cleo was named to Food Logistics’ FL100+ Top Software and Technology Providers Lists in 2016, 2017, 2019 and 2020. Cleo CEO Mahesh Rajasekharan was named an Ernst & Young Entrepreneur Of The Year 2022 Midwest Award winner. Rajasekharan is serving as a judge for the 2023 Ernst & Young Entrepreneur Of The Year Awards. As of April 2022, Cleo has been named a Leader in EDI on the G2 Grid, a peer-to-peer review site, for 20 straight quarters. In Spring 2023, Cleo won 23 G2 awards, including EDI Leader Enterprise, MFT Leader Enterprise, On-Premise Data Integration Best Support Enterprise, and iPaaS High Performer Asia.

    Read more →