AI Data Center

AI Data Center — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • DoorDash

    DoorDash

    DoorDash, Inc. is an American company operating online food ordering and food delivery. It trades under the symbol DASH. With a 56% market share, DoorDash is the largest food delivery platform in the United States. It also has a 60% market share in the convenience delivery category. As of December 31, 2020, the platform was used by 450,000 merchants, 20 million consumers, and had over one million delivery couriers. Founded by Tony Xu, Andy Fang, Stanley Tang and Evan Moore, DoorDash made its debut on the Fortune 500 list in 2024, ranking No. 443. DoorDash has been sued for or held legally liable for withholding tips, reducing tip transparency, antitrust price manipulation, listing restaurants without permission, misclassifying workers, withholding sick time, and illegally selling personal data. As of April 2026, DoorDash operates in the United States (including Puerto Rico), Canada, Australia, and New Zealand. Through its subsidiaries Deliveroo and Wolt, the company also operates across Europe, as well as in Azerbaijan, Georgia, Israel, Kazakhstan, Kuwait, and the United Arab Emirates. == History == In January 2013, Stanford University students Tony Xu, Stanley Tang, Andy Fang and Evan Moore launched PaloAltoDelivery.com in Palo Alto, California. In the summer of 2013, it received US$120,000 in seed money from Y Combinator in exchange for a 7% stake. It incorporated as DoorDash in June 2013. DoorDash's first partnership with a fast food burger restaurant chain was in April 2016, when it partnered with CKE Restaurants, parent company of Carl's Jr. and Hardee's, for food delivery. In December 2017, DoorDash announced its partnership with Wendy's for delivery from its restaurants. In December 2018, DoorDash overtook Uber Eats to hold the second position in total US food delivery sales, behind GrubHub. By March 2019, it had exceeded GrubHub in total sales, at 27.6% of the on-demand delivery market. By early 2019, DoorDash was the largest food delivery provider in the U.S., as measured by consumer spending. In October 2019, DoorDash opened its first ghost kitchen, DoorDash Kitchen, in Redwood City, California, with four restaurants operating at the location. By June 2020, DoorDash had raised more than $2.5 billion over several financing rounds from investors including Y Combinator, Charles River Ventures, SV Angel, Khosla Ventures, Sequoia Capital, SoftBank Group, GIC, and Kleiner Perkins. DoorDash announced a partnership with KFC in September 2020, followed by Taco Bell in October 2020. In November 2020, DoorDash announced the opening of its first physical restaurant location, partnering up with Bay Area restaurant Burma Bites to offer delivery and pick-up orders. In December 2020, it became a public company via an initial public offering, raising $3.37 billion. In November 2021, DoorDash acquired Finland's Wolt for €7bn. In August 2022, DoorDash announced it would end its partnership with Walmart in September, ending the companies' cooperation agreement from 2018. In November 2022, DoorDash announced plans to lay off 1,250 corporate employees, or about six percent of its workforce, to rein in expenses. In June 2023, DoorDash announced it would give its drivers the option of earning an hourly minimum wage instead of being paid per delivery. However, drivers are only paid hourly when on an active delivery. In September 2023, the company transferred its stock listing from the New York Stock Exchange to the Nasdaq. On December 18, 2023, DoorDash was added to the Nasdaq-100 index. In March 2025, DoorDash announced a partnership with Klarna, a Buy Now, Pay Later (BNPL) service, letting customers schedule small payments over a set period of time. DoorDash received widespread criticism from this decision, including internet mockery, given concerns about the increase of household debt in America. In 2025, DoorDash acquired the UK-based delivery service Deliveroo for $3.88 billion. The combined company operates in 40 countries and serves 50 million users monthly. In September 2025, DoorDash and Ace Hardware (the largest hardware cooperative) announced their partnership to offer delivery for home use products from over 4,000 Ace locations. == Lawsuits against DoorDash == === 2017 class-action lawsuit for misclassifying workers === In 2017, a class-action lawsuit was filed against DoorDash for allegedly misclassifying delivery drivers in California and Massachusetts as independent contractors. In 2022, a tentative settlement was reached in which DoorDash would pay $100 million total, with $61 million going to over 900,000 drivers, paying out just over $130 per driver, and $28 million for the lawyers. Gizmodo criticized the settlement, noting that the $413 million that DoorDash CEO Tony Xu received the previous year was one of the largest CEO compensation packages of all time. === 2019 data breach lawsuit === On May 4, 2019, DoorDash confirmed 4.9 million customers, delivery workers and merchants had sensitive information stolen via a data breach. Those who joined the platform after April 5, 2018, were unaffected by the breach. A class-action lawsuit for the breach was filed against DoorDash in October 2019. === Withholding of tips and subsequent class-action lawsuits === In July 2019, the company's tipping policy was criticized by The New York Times, and later The Verge and Vox and Gothamist. Drivers receive a guaranteed minimum per order that is paid by DoorDash by default. When a customer added a tip, instead of going directly to the driver, it first went to the company to cover the guaranteed minimum. Drivers then only directly received the part of the tip that exceeded the guaranteed minimum per order. In January 2020, it was reported that DoorDash had lied about skimming tips from its drivers, causing them to earn an average of $1.45 an hour after expenses, and that after the company had allegedly overhauled its tipping system, DoorDash was still manipulating per-delivery payouts at the expense of drivers. A DoorDash customer filed a class action lawsuit against the company for its "materially false and misleading" tipping policy. The case was referred to arbitration in August 2020. Under pressure, the company revised its policy. The company settled a lawsuit with District of Columbia Attorney General Karl Racine for $2.5 million, with funds going to deliverers, the government, and to charity. ==== 2021 driver strike for tip transparency ==== In July 2021, DoorDash drivers went on strike to protest lack of tip transparency and to ask for higher pay. At the time of the strike, and, as of June 2022, DoorDash did not allow drivers to see the full tip amounts prior to accepting a delivery in the app. If customers tip over a set amount for the order total, Doordash hides a portion of the tip until the delivery is complete. The strike occurred after DoorDash rewrote its code to cut off access to Para, a third-party app that drivers had been using to see the full tip amounts. ==== 2025 class-action lawsuit settlement ==== In 2025, DoorDash agreed to pay around $17 million for "misleading both consumers and delivery workers" with tips being docked from drivers' pay instead of directly going to drivers. === 2020 antitrust litigation === In April 2020, in the case of Davitashvili v. GrubHub Inc. DoorDash, Grubhub, Postmates, and Uber Eats were accused of monopolistic power by only listing restaurants on its apps if the restaurant owners signed contracts which include clauses that require prices be the same for dine-in customers as for customers receiving delivery. The plaintiffs stated that this arrangement increases the cost for dine-in customers, as they are required to subsidize the cost of delivery; and that the apps charge "exorbitant" fees, which range from 13% to 40% of revenue, while the average restaurant's profit ranges from 3% to 9% of revenue. The lawsuit seeks treble damages, including for overcharges, since April 14, 2016, for dine-in and delivery customers in the United States at restaurants using the defendants’ delivery apps. Although several preliminary documents in the case have now been filed, a trial date has not yet been set. === Litigation for illegal unauthorized restaurant listing === In May 2021, DoorDash was criticized for unauthorized listings of restaurants who had not given permission to appear on the app. The company was sued by Lona's Lil Eats in St. Louis, with the lawsuit claiming that DoorDash had listed them without permission, then prevented any orders to the restaurant from going through and redirecting customers to other restaurants instead, because Lona's was "too far away," when in reality it had not paid DoorDash a fee for listing. This aspect of DoorDash's business practice is illegal in California. === 2021 lawsuit by the city of Chicago === In August 2021, the city of Chicago sued DoorDash and GrubHub. According to Chicago mayor Lori Lightfoot, the companies broke the law by using "unfair and deceptive t

    Read more →
  • Algorithmic management

    Algorithmic management

    Algorithmic management is a term used to describe certain labor management practices in the contemporary digital economy. In scholarly uses, the term was initially coined in 2015 by Min Kyung Lee, Daniel Kusbit, Evan Metsky, and Laura Dabbish to describe the managerial role played by algorithms on the Uber and Lyft platforms, but has since been taken up by other scholars to describe more generally the managerial and organisational characteristics of platform economies. However, digital direction of labor was present in manufacturing already since the 1970s and algorithmic management is becoming increasingly widespread across a wide range of industries. The concept of algorithmic management can be broadly defined as the delegation of managerial functions to algorithmic and automated systems. Algorithmic management has been enabled by "recent advances in digital technologies" which allow for the real-time and "large-scale collection of data" which is then used to "improve learning algorithms that carry out learning and control functions traditionally performed by managers". The term does not refer to a specific underlying technology, and encompasses the design choices, organisational policies, and governance that surround the managerial use of algorithms in workplaces. In the contemporary workplace, firms employ an ecology of accounting devices, such as "rankings, lists, classifications, stars and other symbols' in order to effectively manage their operations and create value without the need for traditional forms of hierarchical control." Many of these devices fall under the label of what is called algorithmic management, and were first developed by companies operating in the sharing economy or gig economy, functioning as effective labor and cost cutting measures. The Data&Society explainer of the term, for example, describes algorithmic management as 'a diverse set of technological tools and techniques that structure the conditions of work and remotely manage workforces. Data&Society also provides a list of five typical features of algorithmic management: Prolific data collection and surveillance of workers through technology; Real-time responsiveness to data that informs management decisions; Automated or semi-automated decision-making; Transfer of performance evaluations to rating systems or other metrics; and The use of "nudges" and penalties to indirectly incentivize worker behaviors. Proponents of algorithmic management claim that it "creates new employment opportunities, better and cheaper consumer services, transparency and fairness in parts of the labour market that are characterised by inefficiency, opacity and capricious human bosses." On the other hand, critics of algorithmic management claim that the practice leads to several issues, especially as it impacts the employment status of workers managed by its new array of tools and techniques. == History of the term == "Algorithmic management" was first described by Lee, Kusbit, Metsky, and Dabbish in 2015 in their study of the Uber and Lyft platforms. In their study, Lee et al. termed "software algorithms that assume managerial functions and surrounding institutional devices that support algorithms in practice" algorithmic management. Software algorithms, it was said, are increasingly used to "allocate, optimize, and evaluate work" by platforms in managing their vast workforces. In Lee et al.'s paper on Uber and Lyft this included the use of algorithms to assign work to drivers, as mechanisms to optimise pricing for services, and as systems for evaluating driver performance. In 2016, Alex Rosenblat and Luke Stark sought to extend on this understanding of algorithmic management "to elucidate on the automated implementation of company policies on the behaviours and practices of Uber drivers." Rosenblat and Stark found in their study that algorithmic management practices contributed to a system beset by power asymmetries, where drivers had little control over "critical aspects of their work", whereas Uber had far greater control over the labor of its drivers. Since this time, studies of algorithmic management have extended the use of the term to describe the management practices of various firms, where, for example, algorithms "are taking over scheduling work in fast food restaurants and grocery stores, using various forms of performance metrics ad even mood... to assign the fastest employees to work in peak times." Algorithmic management is seen to be especially prevalent in gig work on platforms, such as on Upwork and Deliveroo, and in the sharing economy, such as in the case of Airbnb. Furthermore, recent research has defined sub-constructs that fall under the umbrella term of algorithmic management, for example, "algorithmic nudging". A Harvard Business Review article published in 2021 explains: "Companies are increasingly using algorithms to manage and control individuals not by force, but rather by nudging them into desirable behavior — in other words, learning from their personalized data and altering their choices in some subtle way." While the concept builds on nudging theory popularized by University of Chicago economist Richard Thaler and Harvard Law School professor Cass Sunstein, "due to recent advances in AI and machine learning, algorithmic nudging is much more powerful than its non-algorithmic counterpart. With so much data about workers' behavioral patterns at their fingertips, companies can now develop personalized strategies for changing individuals' decisions and behaviors at large scale. These algorithms can be adjusted in real-time, making the approach even more effective." == Relationships with other labor management practices == Algorithmic management has been compared and contrasted with other forms of management, such as Scientific management approaches, as pioneered by Frederick Taylor in the early 1900s. Henri Schildt has called algorithmic management "Scientific management 2.0", where management "is no longer a human practice, but a process embedded in technology." Similarly, Kathleen Griesbach, Adam Reich, Luke Elliott-Negri, and Ruth Milkman suggest that, while "algorithmic control over labor may be relatively new, it replicates many features of older mechanisms of labor control." On the other hand, some commentators have argued that algorithmic management is not simply a new form of Scientific management or digital Taylorism, but represents a distinct approach to labor control in platform economies. David Stark and Ivana Pais, for example, state that, "In contrast to Scientific Management at the turn of the twentieth century, in the algorithmic management of the twenty-first century there are rules but these are not bureaucratic, there are rankings but not ranks, and there is monitoring but it is not disciplinary. Algorithmic management does not automate bureaucratic structures and practices to create some new form of algorithmic bureaucracy. Whereas the devices and practices of Taylorism were part of a system of hierarchical supervision, the devices and practices of algorithmic management take place within a different economy of attention and a new regime of visibility. Triangular rather than vertical, and not as a panopticon, the lines of vision in algorithmic management are not lines of supervision." Similarly, Data&Society's explainer for algorithmic management claims that the practice represents a marked departure from earlier management structures that more strongly rely on human supervisors to direct workers. In analyzing the difference and the similarities to previous management styles, David Stark and Pieter Vanden Broeck expand the applicability of algorithmic management beyond the workplace. They develop a theory of algorithmic management in terms of broader changes in the shape and structure of organization in the 21st century, attentive to the erosion of organization's boundaries whereby heterogeneous actors, assets, and activities, are coopted regardless of their place in organizational space. Stark and Vanden Broeck propose the following means of differentiating algorithmic management from other historical managerial paradigms: == Issues == Algorithmic management can provide an effective and efficient means of workforce control and value creation in the contemporary digital economy. However, commentators have highlighted several issues that algorithmic management poses, especially for the workers it manages. Criticisms of the practice often highlight several key issues pertaining to algorithmic management practices, such as the imperfection and scope of its surveillance and control measures, which also threaten to lock workers out of key decision-making processes; its lack of transparency for users and information asymmetries; its potential for bias and discrimination; its dehumanizing tendencies; and its potential to create conditions which sidestep traditional employer-employee accountability. This last point has been especi

    Read more →
  • Ontology alignment

    Ontology alignment

    Ontology alignment, or ontology matching, is the process of determining correspondences between concepts in ontologies. A set of correspondences is also called an alignment. The phrase takes on a slightly different meaning, in computer science, cognitive science or philosophy. == Computer science == For computer scientists, concepts are expressed as labels for data. Historically, the need for ontology alignment arose out of the need to integrate heterogeneous databases, ones developed independently and thus each having their own data vocabulary. In the Semantic Web context involving many actors providing their own ontologies, ontology matching has taken a critical place for helping heterogeneous resources to interoperate. Ontology alignment tools find classes of data that are semantically equivalent, for example, "truck" and "lorry". The classes are not necessarily logically identical. According to Euzenat and Shvaiko (2007), there are three major dimensions for similarity: syntactic, external, and semantic. Coincidentally, they roughly correspond to the dimensions identified by Cognitive Scientists below. A number of tools and frameworks have been developed for aligning ontologies, some with inspiration from Cognitive Science and some independently. Ontology alignment tools have generally been developed to operate on database schemas, XML schemas, taxonomies, formal languages, entity-relationship models, dictionaries, and other label frameworks. They are usually converted to a graph representation before being matched. Since the emergence of the Semantic Web, such graphs can be represented in the Resource Description Framework line of languages by triples of the form , as illustrated in the Notation 3 syntax. In this context, aligning ontologies is sometimes referred to as "ontology matching". The problem of Ontology Alignment has been tackled recently by trying to compute matching first and mapping (based on the matching) in an automatic fashion. Systems like DSSim, X-SOM or COMA++ obtained at the moment very high precision and recall. The Ontology Alignment Evaluation Initiative aims to evaluate, compare and improve the different approaches. === Formal definition === Given two ontologies i = ⟨ C i , R i , I i , T i , V i ⟩ {\displaystyle i=\langle C_{i},R_{i},I_{i},T_{i},V_{i}\rangle } and j = ⟨ C j , R j , I j , T j , V j ⟩ {\displaystyle j=\langle C_{j},R_{j},I_{j},T_{j},V_{j}\rangle } where C {\displaystyle C} is the set of classes, R {\displaystyle R} is the set of relations, I {\displaystyle I} is the set of individuals, T {\displaystyle T} is the set of data types, and V {\displaystyle V} is the set of values, we can define different types of (inter-ontology) relationships. Such relationships will be called, all together, alignments and can be categorized among different dimensions: similarity vs logic: this is the difference between matchings (predicating about the similarity of ontology terms), and mappings (logical axioms, typically expressing logical equivalence or inclusion among ontology terms) atomic vs complex: whether the alignments we considered are one-to-one, or can involve more terms in a query-like formulation (e.g., LAV/GAV mapping) homogeneous vs heterogeneous: do the alignments predicate on terms of the same type (e.g., classes are related only to classes, individuals to individuals, etc.) or we allow heterogeneity in the relationship? type of alignment: the semantics associated to an alignment. It can be subsumption, equivalence, disjointness, part-of or any user-specified relationship. Subsumption, atomic, homogeneous alignments are the building blocks to obtain richer alignments, and have a well defined semantics in every Description Logic. Let's now introduce more formally ontology matching and mapping. An atomic homogeneous matching is an alignment that carries a similarity degree s ∈ [ 0 , 1 ] {\displaystyle s\in [0,1]} , describing the similarity of two terms of the input ontologies i {\displaystyle i} and j {\displaystyle j} . Matching can be either computed, by means of heuristic algorithms, or inferred from other matchings. Formally we can say that, a matching is a quadruple m = ⟨ i d , t i , t j , s ⟩ {\displaystyle m=\langle id,t_{i},t_{j},s\rangle } , where t i {\displaystyle t_{i}} and t j {\displaystyle t_{j}} are homogeneous ontology terms, s {\displaystyle s} is the similarity degree of m {\displaystyle m} . A (subsumption, homogeneous, atomic) mapping is defined as a pair μ = ⟨ t i , t j ⟩ {\displaystyle \mu =\langle t_{i},t_{j}\rangle } , where t i {\displaystyle t_{i}} and t j {\displaystyle t_{j}} are homogeneous ontology terms. == Cognitive science == For cognitive scientists interested in ontology alignment, the "concepts" are nodes in a semantic network that reside in brains as "conceptual systems." The focal question is: if everyone has unique experiences and thus different semantic networks, then how can we ever understand each other? This question has been addressed by a model called ABSURDIST (Aligning Between Systems Using Relations Derived Inside Systems for Translation). Three major dimensions have been identified for similarity as equations for "internal similarity, external similarity, and mutual inhibition." == Ontology alignment methods == Two sub research fields have emerged in ontology mapping, namely monolingual ontology mapping and cross-lingual ontology mapping. The former refers to the mapping of ontologies in the same natural language, whereas the latter refers to "the process of establishing relationships among ontological resources from two or more independent ontologies where each ontology is labelled in a different natural language". Existing matching methods in monolingual ontology mapping are discussed in Euzenat and Shvaiko (2007). Approaches to cross-lingual ontology mapping are presented in Fu et al. (2011).

    Read more →
  • KiSAO

    KiSAO

    The Kinetic Simulation Algorithm Ontology (KiSAO) supplies information about existing algorithms available for the simulation of systems biology models, their characterization and interrelationships. KiSAO is part of the BioModels.net project and of the COMBINE initiative. == Structure == KiSAO consists of three main branches: simulation algorithm simulation algorithm characteristic simulation algorithm parameter The elements of each algorithm branch are linked to characteristic and parameter branches using has characteristic and has parameter relationships accordingly. The algorithm branch itself is hierarchically structured using relationships which denote that the descendant algorithms were derived from, or specify, more general ancestors.

    Read more →
  • Resilience week

    Resilience week

    Resilience week is an annual symposium established to enable cross-disciplinary and role based discussions to advance strategies and research that engenders resilience in critical infrastructure systems and communities. Damaging storms, cyber attack and the interconnection of critical infrastructure systems can lead to cascading events that not only affect local but also across regions. However, many of these interdependencies are not easily recognized and obscure and complicate the mitigation of risk. The purpose of the symposia series is hence to facilitate best practice in managing critical infrastructure risks, by bringing together businesses, government and researchers. == Background == Originally organized in 2008 as a focus on the new research area of resilient control systems, including the disciplinary areas of control system, cyber-security, cognitive psychology and any number of critical infrastructure domains. Resilience has long been recognized as an area that requires not only the contributions of multiple disciplines or multidisciplinary participation, but interdisciplinary interaction where there is a common language and familiarity of the contributors to what other disciplines (and roles) contribute. The resulting interactions developed by Resilience Week and associated activities are intended to culture this sharing environment as a safe zone for inclusion; more importantly, an environment that lends to developing the new science and practice. As the attributes of resilience are complex, the contributions and topics for the event have included both the disciplinary and the project considerations, in keynotes, panels and research presentations. Keynotes have included senior leadership in the Department of Energy, Department of Defense, Department of Homeland Security, the National Science Foundation, and other agencies in addition to National Academy and professional organization fellows and senior industry leaders. Project panels and research presentations include emergent topics in resilience to climate change, cyber attack, damaging storms and the energy assurance. Topics Areas of focus have included: Control Systems Cyber Systems Cognitive Systems Communications Systems Communities and Infrastructure Project Focus Areas have included: Dependencies and Interdependencies Cyber Resilience for Operating Technology Commercializing Research and Development Building Critical Infrastructure Resilience through Distributed Energy Resources Energy Equity and Community Resilience Proceedings are developed for each year of the event, documenting the diversity of the research and engagements within these topical areas. == Impacts for the future == Since its inception, the Resilience Week community has evolved from one that primarily included only university researchers to one that includes many government laboratories, universities and private industries in the US and internationally. This type of collaboration forms a feedback loop that informs the research with the current needs and hones best practices. The future of the event is to further advance discussions that advance investment, recognize priorities and expedite technologies and tools to proactively address our energy future, in light of the natural and manmade challenges, and rationalizing the complex relationships that exist in critical infrastructure.

    Read more →
  • Ubiquitous robot

    Ubiquitous robot

    Ubiquitous robot is a term used in an analogous way to ubiquitous computing. Software useful for "integrating robotic technologies with technologies from the fields of ubiquitous and pervasive computing, sensor networks, and ambient intelligence". The emergence of mobile phone, wearable computers and ubiquitous computing makes it likely that human beings will live in a ubiquitous world in which all devices are fully networked. The existence of ubiquitous space resulting from developments in computer and network technology will provide motivations to offer desired services by any IT device at any place and time through user interactions and seamless applications. This shift has hastened the ubiquitous revolution, which has further manifested itself in the new multidisciplinary research area, ubiquitous robotics. It initiates the third generation of robotics following the first generation of the industrial robot and the second generation of the personal robot. Ubiquitous robot (Ubibot) is a robot incorporating three components including virtual software robot or avatar, real-world mobile robot and embedded sensor system in surroundings. Software robot within a virtual world can control a real-world robot as a brain and interact with human beings. Researchers of KAIST, Korea describe these three components as a Sobot (Software robot), Mobot (Mobile robot), and Embot (Embedded robot).

    Read more →
  • Chandy–Misra–Haas algorithm resource model

    Chandy–Misra–Haas algorithm resource model

    The Chandy–Misra–Haas algorithm resource model checks for deadlock in a distributed system. It was developed by K. Mani Chandy, Jayadev Misra and Laura M. Haas. == Locally dependent == Consider the n processes P1, P2, P3, P4, P5,, ... ,Pn which are performed in a single system (controller). P1 is locally dependent on Pn, if P1 depends on P2, P2 on P3, so on and Pn−1 on Pn. That is, if P 1 → P 2 → P 3 → … → P n {\displaystyle P_{1}\rightarrow P_{2}\rightarrow P_{3}\rightarrow \ldots \rightarrow P_{n}} , then P 1 {\displaystyle P_{1}} is locally dependent on P n {\displaystyle P_{n}} . If P1 is said to be locally dependent to itself if it is locally dependent on Pn and Pn depends on P1: i.e. if P 1 → P 2 → P 3 → … → P n → P 1 {\displaystyle P_{1}\rightarrow P_{2}\rightarrow P_{3}\rightarrow \ldots \rightarrow P_{n}\rightarrow P_{1}} , then P 1 {\displaystyle P_{1}} is locally dependent on itself. == Description == The algorithm uses a message called probe(i,j,k) to transfer a message from controller of process Pj to controller of process Pk. It specifies a message started by process Pi to find whether a deadlock has occurred or not. Every process Pj maintains a boolean array dependent which contains the information about the processes that depend on it. Initially the values of each array are all "false". === Controller sending a probe === Before sending, the probe checks whether Pj is locally dependent on itself. If so, a deadlock occurs. Otherwise it checks whether Pj, and Pk are in different controllers, are locally dependent and Pj is waiting for the resource that is locked by Pk. Once all the conditions are satisfied it sends the probe. === Controller receiving a probe === On the receiving side, the controller checks whether Pk is performing a task. If so, it neglects the probe. Otherwise, it checks the responses given Pk to Pj and dependentk(i) is false. Once it is verified, it assigns true to dependentk(i). Then it checks whether k is equal to i. If both are equal, a deadlock occurs, otherwise it sends the probe to next dependent process. == Algorithm == In pseudocode, the algorithm works as follows: === Controller sending a probe === if Pj is locally dependent on itself then declare deadlock else for all Pj,Pk such that (i) Pi is locally dependent on Pj, (ii) Pj is waiting for 'Pk and (iii) Pj, Pk are on different controllers. send probe(i, j, k). to home site of Pk === Controller receiving a probe === if (i)Pk is idle / blocked (ii) dependentk(i) = false, and (iii) Pk has not replied to all requests of to Pj then begin "dependents""k"(i) = true; if k == i then declare that Pi is deadlocked else for all Pa,Pb such that (i) Pk is locally dependent on Pa, (ii) Pa is waiting for 'Pb and (iii) Pa, Pb are on different controllers. send probe(i, a, b). to home site of Pb end == Example == P1 initiates deadlock detection. C1 sends the probe saying P2 depends on P3. Once the message is received by C2, it checks whether P3 is idle. P3 is idle because it is locally dependent on P4 and updates dependent3(2) to True. As above, C2 sends probe to C3 and C3 sends probe to C1. At C1, P1 is idle so it update dependent1(1) to True. Therefore, deadlock can be declared. == Complexity == Suppose there are n {\displaystyle n} controllers and m {\displaystyle m} processes, at most m ( n − 1 ) / 2 {\displaystyle m(n-1)/2} messages need to be exchanged to detect a deadlock, with a delay of O ( n ) {\displaystyle O(n)} messages.

    Read more →
  • Virtual data room

    Virtual data room

    A virtual data room (sometimes called a VDR or Deal Room) is an online repository of information that is used for the storing and distribution of documents. In many cases, a virtual data room is used to facilitate the due diligence process during an M&A transaction, loan syndication, or private equity and venture capital transactions. This due diligence process has traditionally used a physical data room to accomplish the disclosure of documents. For reasons of cost, efficiency and security, virtual data rooms have widely replaced the more traditional physical data room. A virtual data room is an extranet to which the bidders and their advisers are given access via the internet. An extranet is essentially a website with limited controlled access, using a secure log-on supplied by the vendor, which can be disabled at any time, by the vendor, if a bidder withdraws. Much of the information released is confidential and restrictions are applied to the viewer's ability to release this to third parties (by means of forwarding, copying or printing). This can be effectively applied to protect the data using digital rights management. The virtual data room provides access to secure documents for authorized users through a dedicated web site, or through secure agent applications. In the process of mergers and acquisitions the data room is set up as part of the central repository of data relating to companies or divisions being acquired or sold. The data room enables the interested parties to view information relating to the business in a controlled environment where confidentiality can be preserved. Conventionally this was achieved by establishing a supervised, physical data room in secure premises with controlled access. In most cases, with a physical data room, only one bidder team can access the room at a time. A virtual data room is designed to have the same advantages as a conventional data room (controlling access, viewing, copying and printing, etc.) with fewer disadvantages. Due to their increased efficiency, many businesses and industries have moved to using virtual data rooms instead of physical data rooms. In 2006, a spokesperson for a company which sets up virtual deal rooms was reported claiming that the process reduced the bidding process by about thirty days compared to physical data rooms. In the process of startup fundraising, a virtual data room is set up to be a central location for key data, documents, and financials. These are shared with venture capital and angel investors and allows them to streamline due diligence. == Application == Any business dealing with private data can apply VDRs when secure transaction processing is required. This includes financial institutions that need to negotiate confidential customer information without involving third parties. VDRs have traditionally been used for IPOs and real estate asset management. Technology companies may use them to exchange and review code or confidential data needed for operations. The same is true for clients, who entrust their valuable code only to the most qualified people in the organisation. The code is not something that can be printed out and brought in a folder. It resides on a computer and must be used together. VDR can find application in any business that manages data in the form of documents, especially law firms, financial advisers or the B2B sector. The latter work with documents that must always be handled and controlled confidentially, and it is difficult to store them securely when they are on a server that other people can access. In addition, in B2B, it is important to close the deal as quickly as possible: the average sales cycle is one to three months. VDR can be compared to a locked filing cabinet where all those folders and documents are kept. It automates the mathematics of pricing to prevent revenue leakage, and initially integrates CRM to ensure accurate synchronisation of all account data, which is important for B2B in particular and sales in general. While virtual data rooms offer many advantages, they are not suitable for every industry. For example, some governments may decide to continue using physical data rooms for highly confidential information sharing. The damage from potential cyberattacks and data breaches exceeds the benefits offered by virtual data rooms. In such cases, the use of VDRs is not considered. Data breaches have particularly affected the US healthcare system from March 2021 to March 2022 - according to IBM Security the cost of the breach was a record high of $10.1 million.

    Read more →
  • WHATWG

    WHATWG

    The Web Hypertext Application Technology Working Group (WHATWG) was founded by representatives from Apple Inc., the Mozilla Foundation and Opera Software, leading web browser vendors in 2004. WHATWG is responsible for maintaining multiple web-related technical standards, including the specifications for the HyperText Markup Language (HTML) and the Document Object Model (DOM). The central organizational membership and control of WHATWG – its "Steering Group" – consists of Apple, Mozilla, Google, and Microsoft. WHATWG editors of the specifications ensure correct implementation, in consultation with participants, but ultimately in accordance with Steering Group member objectives. == History == The WHATWG was formed in response to the slow development of World Wide Web Consortium (W3C) Web standards and W3C's decision to abandon HTML in favor of XML-based technologies. The WHATWG mailing list was announced on 4 June 2004, two days after the initiatives of a joint Opera–Mozilla position paper had been voted down by the W3C members at the W3C Workshop on Web Applications and Compound Documents. On 10 April 2007, the Mozilla Foundation, Apple, and Opera Software proposed that the new HTML working group of the W3C adopt the WHATWG's HTML5 as the starting point of its work and name its future deliverable as "HTML5" (though the WHATWG specification was later renamed HTML Living Standard). On 9 May 2007, the new HTML working group of the W3C resolved to do that. An Internet Explorer platform architect from Microsoft was invited but did not join, citing the lack of a patent policy to ensure all specifications can be implemented on a royalty-free basis. Since then, the W3C and the WHATWG had been developing HTML independently, at times causing specifications to diverge. In 2017, the WHATWG established an intellectual property rights agreement that includes a patent policy. This spurred a renewed attempt to allow the W3C and the WHATWG to work together on specifications. In 2019, the W3C and WHATWG agreed to a memorandum of understanding where development of HTML and DOM specifications would be done principally in the WHATWG. The editor has significant control over the specification, but the community can influence the decisions of the editor. In one case, editor Ian Hickson proposed replacing the

    Read more →
  • Dependency network (graphical model)

    Dependency network (graphical model)

    Dependency networks (DNs) are graphical models, similar to Markov networks, wherein each vertex (node) corresponds to a random variable and each edge captures dependencies among variables. Unlike Bayesian networks, DNs may contain cycles. Each node is associated to a conditional probability table, which determines the realization of the random variable given its parents. == Markov blanket == In a Bayesian network, the Markov blanket of a node is the set of parents and children of that node, together with the children's parents. The values of the parents and children of a node evidently give information about that node. However, its children's parents also have to be included in the Markov blanket, because they can be used to explain away the node in question. In a Markov random field, the Markov blanket for a node is simply its adjacent (or neighboring) nodes. In a dependency network, the Markov blanket for a node is simply the set of its parents. == Dependency network versus Bayesian networks == Dependency networks have advantages and disadvantages with respect to Bayesian networks. In particular, they are easier to parameterize from data, as there are efficient algorithms for learning both the structure and probabilities of a dependency network from data. Such algorithms are not available for Bayesian networks, for which the problem of determining the optimal structure is NP-hard. Nonetheless, a dependency network may be more difficult to construct using a knowledge-based approach driven by expert-knowledge. == Dependency networks versus Markov networks == Consistent dependency networks and Markov networks have the same representational power. Nonetheless, it is possible to construct non-consistent dependency networks, i.e., dependency networks for which there is no compatible valid joint probability distribution. Markov networks, in contrast, are always consistent. == Definition == A consistent dependency network for a set of random variables X = ( X 1 , … , X n ) {\textstyle \mathbf {X} =(X_{1},\ldots ,X_{n})} with joint distribution p ( x ) {\displaystyle p(\mathbf {x} )} is a pair ( G , P ) {\displaystyle (G,P)} where G {\displaystyle G} is a cyclic directed graph, where each of its nodes corresponds to a variable in X {\displaystyle \mathbf {X} } , and P {\displaystyle P} is a set of conditional probability distributions. The parents of node X i {\displaystyle X_{i}} , denoted P a i {\displaystyle \mathbf {Pa_{i}} } , correspond to those variables P a i ⊆ ( X 1 , … , X i − 1 , X i + 1 , … , X n ) {\displaystyle \mathbf {Pa_{i}} \subseteq (X_{1},\ldots ,X_{i-1},X_{i+1},\ldots ,X_{n})} that satisfy the following independence relationships p ( x i ∣ p a i ) = p ( x i ∣ x 1 , … , x i − 1 , x i + 1 , … , x n ) = p ( x i ∣ x − x i ) . {\displaystyle p(x_{i}\mid \mathbf {pa_{i}} )=p(x_{i}\mid x_{1},\ldots ,x_{i-1},x_{i+1},\ldots ,x_{n})=p(x_{i}\mid \mathbf {x} -{x_{i}}).} The dependency network is consistent in the sense that each local distribution can be obtained from the joint distribution p ( x ) {\displaystyle p(\mathbf {x} )} . Dependency networks learned using large data sets with large sample sizes will almost always be consistent. A non-consistent network is a network for which there is no joint probability distribution compatible with the pair ( G , P ) {\displaystyle (G,P)} . In that case, there is no joint probability distribution that satisfies the independence relationships subsumed by that pair. == Structure and parameters learning == Two important tasks in a dependency network are to learn its structure and probabilities from data. Essentially, the learning algorithm consists of independently performing a probabilistic regression or classification for each variable in the domain. It comes from observation that the local distribution for variable X i {\displaystyle X_{i}} in a dependency network is the conditional distribution p ( x i | x − x i ) {\displaystyle p(x_{i}|\mathbf {x} -{x_{i}})} , which can be estimated by any number of classification or regression techniques, such as methods using a probabilistic decision tree, a neural network or a probabilistic support-vector machine. Hence, for each variable X i {\displaystyle X_{i}} in domain X {\displaystyle X} , we independently estimate its local distribution from data using a classification algorithm, even though it is a distinct method for each variable. Here, we will briefly show how probabilistic decision trees are used to estimate the local distributions. For each variable X i {\displaystyle X_{i}} in X {\displaystyle \mathbf {X} } , a probabilistic decision tree is learned where X i {\displaystyle X_{i}} is the target variable and X − X i {\displaystyle \mathbf {X} -X_{i}} are the input variables. To learn a decision tree structure for X i {\displaystyle X_{i}} , the search algorithm begins with a singleton root node without children. Then, each leaf node in the tree is replaced with a binary split on some variable X j {\displaystyle X_{j}} in X − X i {\displaystyle \mathbf {X} -X_{i}} , until no more replacements increase the score of the tree. == Probabilistic Inference == A probabilistic inference is the task in which we wish to answer probabilistic queries of the form p ( y ∣ z ) {\displaystyle p(\mathbf {y\mid z} )} , given a graphical model for X {\displaystyle \mathbf {X} } , where Y {\displaystyle \mathbf {Y} } (the 'target' variables) Z {\displaystyle \mathbf {Z} } (the 'input' variables) are disjoint subsets of X {\displaystyle \mathbf {X} } . One of the alternatives for performing probabilistic inference is using Gibbs sampling. A naive approach for this uses an ordered Gibbs sampler, an important difficulty of which is that if either p ( y ∣ z ) {\displaystyle p(\mathbf {y\mid z} )} or p ( z ) {\displaystyle p(\mathbf {z} )} is small, then many iterations are required for an accurate probability estimate. Another approach for estimating p ( y ∣ z ) {\displaystyle p(\mathbf {y\mid z} )} when p ( z ) {\displaystyle p(\mathbf {z} )} is small is to use modified ordered Gibbs sampler, where Z = z {\displaystyle \mathbf {Z=z} } is fixed during Gibbs sampling. It may also happen that y {\displaystyle \mathbf {y} } is rare, e.g. when Y {\displaystyle \mathbf {Y} } has many variables. So, the law of total probability along with the independencies encoded in a dependency network can be used to decompose the inference task into a set of inference tasks on single variables. This approach comes with the advantage that some terms may be obtained by direct lookup, thereby avoiding some Gibbs sampling. You can see below an algorithm that can be used for obtain p ( y | z ) {\displaystyle p(\mathbf {y|z} )} for a particular instance of y ∈ Y {\displaystyle \mathbf {y} \in \mathbf {Y} } and z ∈ Z {\displaystyle \mathbf {z} \in \mathbf {Z} } , where Y {\displaystyle \mathbf {Y} } and Z {\displaystyle \mathbf {Z} } are disjoint subsets. Algorithm 1: U := Y {\displaystyle \mathbf {U:=Y} } ( the unprocessed variables ) P := Z {\displaystyle \mathbf {P:=Z} } ( the processed and conditioning variables ) p := z {\displaystyle \mathbf {p:=z} } ( the values for P {\displaystyle \mathbf {P} } ) While U ≠ ∅ {\displaystyle \mathbf {U} \neq \emptyset } : Choose X i ∈ U {\displaystyle X_{i}\in \mathbf {U} } such that X i {\displaystyle X_{i}} has no more parents in U {\displaystyle U} than any variable in U {\displaystyle U} If all the parents of X {\displaystyle X} are in P {\displaystyle \mathbf {P} } p ( x i | p ) := p ( x i | p a i ) {\displaystyle p(x_{i}|\mathbf {p} ):=p(x_{i}|\mathbf {pa_{i}} )} Else Use a modified ordered Gibbs sampler to determine p ( x i | p ) {\displaystyle p(x_{i}|\mathbf {p} )} U := U − X i {\displaystyle \mathbf {U:=U} -X_{i}} P := P + X i {\displaystyle \mathbf {P:=P} +X_{i}} p := p + x i {\displaystyle \mathbf {p:=p} +x_{i}} Returns the product of the conditionals p ( x i | p ) {\displaystyle p(x_{i}|\mathbf {p} )} == Applications == In addition to the applications to probabilistic inference, the following applications are in the category of Collaborative Filtering (CF), which is the task of predicting preferences. Dependency networks are a natural model class on which to base CF predictions, once an algorithm for this task only needs estimation of p ( x i = 1 | x − x i = 0 ) {\displaystyle p(x_{i}=1|\mathbf {x} -{x_{i}}=0)} to produce recommendations. In particular, these estimates may be obtained by a direct lookup in a dependency network. Predicting what movies a person will like based on his or her ratings of movies seen; Predicting what web pages a person will access based on his or her history on the site; Predicting what news stories a person is interested in based on other stories he or she read; Predicting what product a person will buy based on products he or she has already purchased and/or dropped into his or her shopping basket. Another class of useful applications for dependency networks is related to data visualization, that is

    Read more →
  • Social information architecture

    Social information architecture

    Social information architecture, also known as social iA, is a sub-domain of information architecture which deals with the social aspects of conceptualizing, modeling and organizing information. It has become more relevant because of the rise of social media and Web 2.0 in recent times. == Approach == There are different approaches to the explanation of social information architecture. === Architecture model (internal space) === Architects designing a physical community space, have to consider how the architecture will shape social interactions. A long hallway of offices creates an utterly different dynamic than desks with arranged in an open space. One might foster individuality, privacy, propriety; the other: collaboration, distraction, communalism. Still, physical spaces can be flexibly repurposed and worked around if the inhabitants desire a social dynamic not instantly afforded by the space. Office doors can be left open to invite easier interaction. Partitions can be raised between adjacent desks to limit distraction and increase privacy. That's physical architecture. The information architectures of online communities are far more deterministic and far less flexible. They literally define the social architecture by pre-specifying in immutable computer code what information you have access to, who you can talk to, where you can go. In the online world, information architecture = social architecture. === Social dialogue and information model (external space) === All major brands use information architecture to market their products online, it is then commonly wrapped under the umbrella phrase 'digital strategy'. Information architecture used for strategic purposes encompasses brand SEO, strategic placement of virals, social media presence etc. Charities, news outlets and social dialogue forums can make a much more specific use of the same tools for positive and important social purposes. Social Information Architecture is perceived as the socially conscious wing of commercial information architecture and function to exchange information and ideas between people and groups. Social iA can pick up on conflicting issues that are treated with misunderstanding between cultures and leaves individuals and societies vulnerable to exploitation and manipulation. Since the net has such a far reach it is obvious to use it for meaningful and coordinated social dialogue. Example of such issues are faith, environment, politics, climate change, war, injustice and other social challenges. Information architecture can help create frameworks in which sharing information brings people together, inspires and encourages them to participate in a forward thinking and unfragmented way. One of its core activities is to spread messages that bring people from opposite sites of social and cultural spectrums together and to confront uncomfortable subject head on. == How does social information architecture work? == Social iA utilizes a variety of Web2.0 applications to filter relevant or valuable information and weave them in appropriate information repository or provide feedback to interesting channels. Social iA makes strategic use of Search Engines, Social Media, Google Algorithms, as well as websites, video & news channels. It ‘reads’ or 'listens' to social conversations and search engine queries and engages with the net actively to gather clues about the world's pulse on the internet. It assesses data, social & political trends, and respond with targeted campaigns to give people ideas, as well as help people with making sense of information. == Principals == Dan Brown in his paper 8 Principals of Social Information Architecture enlists the following principals: 1. The principle of objects: Treat content as a living, breathing thing, with a lifecycle, behaviors and attributes. 2. The principle of choices: Create pages that offer meaningful choices to users, keeping the range of choices available focused on a particular task. 3. The principle of disclosure: Show only enough information to help people understand what kinds of information they'll find as they dig deeper. 4. The principle of exemplars: Describe the contents of categories by showing examples of the contents. 5. The principle of front doors: Assume at least half of the website's visitors will come through some page other than the home page. 6. The principle of multiple classification: Offer users several different classification schemes to browse the site's content. 7. The principle of focused navigation: Don't mix apples and oranges in your navigation scheme. 8. The principle of growth: Assume the content you have today is a small fraction of the content you will have tomorrow. == What can social information architecture achieve? == Social information architecture has many potentials in terms of fostering social connections and how information is shared in social spaces on the web.

    Read more →
  • Documentalist

    Documentalist

    A documentalist is a professional, trained in documentation science and specializing in assisting researchers in their search for scientific and technical documentation. With the development of bibliographical databases such as MEDLINE, documentalists were professionals who searched such databases on the behalf of users. When the field of documentation changed its name to information science, the terms information specialist or information professional often replaced the term documentalist.

    Read more →
  • Environmental impact of AI

    Environmental impact of AI

    The environmental impact of the design, training, deployment and use of artificial intelligence includes the greenhouse gas emissions from generating electricity for data centres and computing hardware, operational and upstream water use, and material impacts from hardware manufacturing, mining and electronic waste. Estimating AI's environmental effects can be difficult because results depend on how impacts are measured, including whether accounting includes only model computation or also data-centre overhead, idle capacity, hardware manufacture, and local electricity supply. As these issues have received greater attention, governments and regulators have increasingly considered data-centre reporting requirements, energy-efficiency standards, and broader transparency measures for AI-related resource use. == Carbon footprint and energy use == AI-related energy use arises at multiple stages, including model training, fine-tuning, inference, storage, networking, and supporting infrastructure such as cooling and power conversion. === Individual level === Published estimates of energy use per AI request vary widely across models, tasks and measurement methods. A benchmark study presented at the 2024 ACM Conference on Fairness, Accountability, and Transparency found substantial differences between task types, with lower energy use for some text tasks and much higher energy use for image generation in the study's test conditions. In that benchmark, simple classification tasks consumed about 0.002–0.007 Wh per prompt on average (about 9% of a smartphone charge for 1,000 prompts), while text generation and text summarisation each used about 0.05 Wh per prompt; image generation averaged 2.91 Wh per prompt, and the least efficient image model in the study used 11.49 Wh per image (roughly equivalent to half a smartphone charge). First-party measurements in production environments have also been published. A 2025 Google study on Gemini assistant serving reported median per-prompt energy, emissions, and water-use estimates under the authors' accounting framework, while noting that different system boundaries can produce substantially different results. The study reported a median text-prompt estimate of about 0.24 Wh, which is roughly as much energy as watching nine seconds of television. The study also stated that software and infrastructure improvements reduced energy use by a factor of 33 and carbon emissions by a factor of 44 for a typical prompt over one year within the authors' framework. Researchers at the University of Michigan measured the energy consumption of various Meta Llama 3.1 models released in 2024 and found that smaller language models (8 billion parameters) use about 114 joules (0.03167 Wh) per response, while larger models (405 billion parameters) require up to 6,700 joules (1.861 Wh) per response. This corresponds to the energy needed to run a microwave oven for roughly one-tenth of a second and eight seconds, respectively. Comparisons between AI systems and human labour for specific tasks have produced mixed results and remain sensitive to assumptions about output quality, workload and system boundaries. A 2024 study in Scientific Reports reported 130 to 2900 times lower estimated carbon emissions for selected AI systems than for human writers and illustrators under its assumptions. A later Scientific Reports paper reported a counterexample for programming tasks under its assumptions, finding 5 to 19 times higher estimated emissions for the evaluated AI system than for human programmers on the benchmark used in that study. === System level === ==== Energy use and efficiency ==== AI electricity intensity depends not only on model architecture but also on hardware and facility efficiency. Data-centre operators commonly report Power usage effectiveness (PUE), which measures the ratio of total facility energy to IT equipment energy; a lower PUE indicates less overhead energy for cooling and other supporting infrastructure. Operators may also publish metrics and case studies on hardware efficiency, cooling systems and power sourcing. In its 2024 environmental report, Google stated that its 2023 total greenhouse gas emissions increased 13% year over year, primarily because of increased data-centre energy consumption and supply-chain emissions, while also reporting lower PUE than industry averages for its own facilities. The International Energy Agency has also reported that data centres remain a relatively small share of global electricity use overall, but that their local effects can be much more pronounced because demand is geographically concentrated. ==== Carbon footprint ==== At system level, AI contributes to rising electricity demand in data centres and related infrastructure. The International Energy Agency estimated that data centres used about 415 TWh of electricity in 2024, or around 1.5% of global electricity consumption, and projected that data-centre electricity use could rise to about 945 TWh by 2030, with AI identified as the main driver of that growth alongside other digital services. The carbon footprint of AI systems depends strongly on electricity sources, hardware efficiency, utilisation rates, and what stages are included in the accounting. Training large models can require substantial electricity, while total lifecycle impacts also depend on deployment scale and the amount of inference performed after training. Early analyses of frontier-model development reported rapid historical growth in training compute for selected systems, although later trends have depended on changes in model design, hardware and efficiency gains. Accounting methods that include upstream or embodied impacts, such as hardware manufacture and facilities construction, can materially affect estimates of AI-related emissions. === Decisions and strategies by individual companies === Large technology companies have reported that the expansion of AI and cloud infrastructure affects their sustainability targets, electricity demand, and resource use. Google, for example, attributed part of its emissions growth in 2023 to increased data-centre energy consumption and supply-chain emissions in its 2024 environmental report. Cloud and AI companies have also announced measures intended to reduce environmental impacts, including investment in more efficient hardware, low-carbon electricity procurement, alternative cooling systems, and water stewardship programmes. The extent, comparability, and third-party verification of such disclosures vary between firms and jurisdictions. == Water usage == Data centres can use water directly for cooling and indirectly through the water used in electricity generation, depending on the local energy mix. Public reporting on data-centre water use has often been inconsistent, making comparisons between operators and regions difficult. To standardise operational reporting, The Green Grid proposed the metric water usage effectiveness (WUE), defined as annual site water use divided by IT equipment energy use. WUE does not by itself measure local water stress, source sustainability, or all upstream water impacts. Studies of AI water use also distinguish between water withdrawal and water consumption. Research on AI-specific water use has argued that the water footprint of AI systems can be difficult to observe and may vary substantially by location, cooling design, and electricity source. A 2025 Communications of the ACM article summarised methods for estimating AI water footprints and emphasised the distinction between water withdrawal and water consumption. Li and colleagues estimated that global AI water withdrawal could reach 4.2–6.6 billion cubic metres in 2027 under the scenarios examined in their article. Using GPT-3, released by OpenAI in 2020, as an example, they estimated that training the model in Microsoft's U.S. data centres could consume about 700,000 litres of onsite water and about 5.4 million litres in total when offsite electricity-related water use was included; they also estimated that 10–50 medium-length GPT-3 responses could consume about 500 mL of water, depending on when and where the model was deployed. Published prompt-level estimates have also varied by system and accounting framework: the 2025 Google study on Gemini assistant serving reported a median text-prompt estimate of about 0.26 mL under its framework. Location can materially affect the significance of data-centre water use. Research on U.S. data centres found that one-fifth of servers' direct water footprint came from moderately to highly water-stressed watersheds, while nearly half of servers were fully or partially powered by plants located in water-stressed regions. A 2025 Reuters report, citing data from Verisk Maplecroft and NatureFinance, said that an average mid-sized data centre uses about 1.4 million litres of water per day for cooling and that Phoenix would experience a 32% increase in annual water stress if currently pl

    Read more →
  • Master data management

    Master data management

    Master data management (MDM) is a discipline in which business and information technology collaborate to ensure the uniformity, accuracy, stewardship, semantic consistency, and accountability of the enterprise's official shared master data assets. == Reasons for master data management == Data consistency and accuracy: MDM ensures that the organization's critical data is consistent and accurate across all systems, reducing discrepancies and errors caused by multiple, siloed copies of the same data. Improved decision-making: By providing a single version of the truth (SVOT), MDM enables organizations to deliver the right data to decision makers, allowing them to clearly understand business performance and make informed, data-driven decisions. Operational efficiency: With the consistent and accurate data provided by an MDM, operational processes such as reporting and inventory management can be automated to improve efficiency. Employee learning, onboarding, and customer service also become more efficient, as MDM data facilitates rapid, accurate, and thorough information retrieval, permitting more employee time to be spent on work. Regulatory compliance: MDM tries to help organizations comply with industry standards and regulations by ensuring that master data is accurately recorded, maintained, and audited. However, issues with data quality, classification, and reconciliation may require data transformation. As with other Extract, Transform, Load-based data movements, these processes are expensive and inefficient, reducing return on investment for a project. == Business unit and product line segmentation == As a result of business unit and product line segmentation, the same entity (whether a customer, supplier, or product) will be included in different product lines. This leads to data redundancy and even confusion. For example, a customer takes out a mortgage at a bank. If the marketing and customer service departments have separate databases, advertisements might still be sent to the customer, even though they've already signed up. The two parts of the bank are unaware, and the customer is sent irrelevant communications. Record linkage can associate different records corresponding to the same entity, mitigating this issue. == Mergers and acquisitions == One of the most common problems for master data management is company growth through mergers or acquisitions. Reconciling these separate master data systems can present difficulties, as existing applications have dependencies on the master databases. Ideally, database administrators resolve this problem through deduplication of the master data as part of the merger. Over time, as further mergers and acquisitions occur, the problem can multiply. Data reconciliation processes can become extremely complex or even unreliable. Some organizations end up with 10, 15, or even 100 separate and poorly integrated master databases. This can cause serious problems in customer satisfaction, operational efficiency, decision support, and regulatory compliance. Another problem involves determining the proper degrees of detail and normalization to include in the master data schema. For example, in a federated Human Resources environment, the enterprise software may focus on storing people's data as current status, adding a few fields to identify the date of hire, date of last promotion, etc. However, this simplification can introduce business-impacting errors into dependent systems for planning and forecasting. The stakeholders of such systems may be forced to build a parallel network of new interfaces to track the onboarding of new hires, planned retirements, and divestment, which works against one of the aims of master data management. == People, processes and technology == Master data management is enabled by technology, but is more than the technologies that enable it. An organization's master data management capability will also include people and processes in its definition. === People === Several roles should be staffed within MDM. Most prominently, the Data Owner and the Data Steward. Several people would likely be allocated to each role and each person responsible for a subset of Master Data (e.g. one data owner for employee master data, another for customer master data). The Data Owner is responsible for the requirements for data definition, data quality, data security, etc. as well as for compliance with data governance and data management procedures. The Data Owner should also be funding improvement projects in case of deviations from the requirements. The Data Steward is running the master data management on behalf of the data owner and probably also being an advisor to the Data Owner. === Processes === Master data management can be viewed as a "discipline for specialized quality improvement" defined by the policies and procedures put in place by a data governance organization. It has the objective of providing processes for collecting, aggregating, matching, consolidating, quality-assuring, persisting and distributing master data throughout an organization to ensure a common understanding, consistency, accuracy and control, in the ongoing maintenance and application use of that data. Processes commonly seen in master data management include source identification, data collection, data transformation, normalization, rule administration, error detection and correction, data consolidation, data storage, data distribution, data classification, taxonomy services, item master creation, schema mapping, product codification, data enrichment, hierarchy management, business semantics management and data governance. === Technology === A master data management tool can be used to support master data management by removing duplicates, standardizing data (mass maintaining), and incorporating rules to eliminate incorrect data from entering the system to create an authoritative source of master data. Master data are the products, accounts, and parties for which the business transactions are completed. Where the technology approach produces a "golden record" or relies on a "source of record" or "system of record", it is common to talk of where the data is "mastered". This is accepted terminology in the information technology industry, but care should be taken, both with specialists and with the wider stakeholder community, to avoid confusing the concept of "master data" with that of "mastering data". ==== Implementation models ==== There are several models for implementing a technology solution for master data management. These depend on an organization's core business, its corporate structure, and its goals. These include: Source of record Registry Consolidation Coexistence Transaction/centralized ===== Source of record ===== This model identifies a single application, database, or simpler source (e.g. a spreadsheet) as being the "source of record" (or "system of record" where solely application databases are relied on). The benefit of this model is its conceptual simplicity, but it may not fit with the realities of complex master data distribution in large organizations. The source of record can be federated, for example by groups of attributes (so that different attributes of a master data entity may have different sources of record) or geographically (so that different parts of an organization may have different master sources). Federation is only applicable in certain use cases, where there is a clear delineation of which subsets of records will be found in which sources. The source of record model can be applied more widely than simply to master data, for example to reference data. ==== Transmission of master data ==== There are several ways in which master data may be collated and distributed to other systems. This includes: Data consolidation – The process of capturing master data from multiple sources and integrating it into a single hub (operational data store) for replication to other destination systems. Data federation – The process of providing a single virtual view of master data from one or more sources to one or more destination systems. Data propagation – The process of copying master data from one system to another, typically through point-to-point interfaces in legacy systems. == Change management in implementation == Challenges in adopting master data management within large organizations often arise when stakeholders disagree on a "single version of the truth" concept is not affirmed by stakeholders, who believe that their local definition of the master data is necessary. For example, the product hierarchy used to manage inventory may be entirely different from the product hierarchies used to support marketing efforts or pay sales representatives. It is above all necessary to identify if different master data is genuinely required. If it is required, then the solution implemented (technology and process) must be able to allow multiple versions of the truth to exist but will prov

    Read more →
  • Irish logarithm

    Irish logarithm

    The Irish logarithm was a system of number manipulation invented by Percy Ludgate for machine multiplication. The system used a combination of mechanical cams as lookup tables and mechanical addition to sum pseudo-logarithmic indices to produce partial products, which were then added to produce results. The technique is similar to Zech logarithms (also known as Jacobi logarithms), but uses a system of indices original to Ludgate. == Concept == Ludgate's algorithm compresses the multiplication of two single decimal numbers into two table lookups (to convert the digits into indices), the addition of the two indices to create a new index which is input to a second lookup table that generates the output product. Because both lookup tables are one-dimensional, and the addition of linear movements is simple to implement mechanically, this allows a less complex mechanism than would be needed to implement a two-dimensional 10×10 multiplication lookup table. Ludgate stated that he deliberately chose the values in his tables to be as small as he could make them; given this, Ludgate's tables can be simply constructed from first principles, either via pen-and-paper methods, or a systematic search using only a few tens of lines of program code. They do not correspond to either Zech logarithms, Remak indexes or Korn indexes. == Pseudocode == The following is an implementation of Ludgate's Irish logarithm algorithm in the Python programming language: Table 1 is taken from Ludgate's original paper; given the first table, the contents of Table 2 can be trivially derived from Table 1 and the definition of the algorithm. Note since that the last third of the second table is entirely zeros, this could be exploited to further simplify a mechanical implementation of the algorithm.

    Read more →