Sayre's paradox

Sayre's paradox

Sayre's paradox is a dilemma encountered in the design of automated handwriting recognition systems. A standard statement of the paradox is that a cursively written word cannot be recognized without being segmented and cannot be segmented without being recognized. The paradox was first articulated in a 1973 publication by Kenneth M. Sayre, after whom it was named. == Nature of the problem == It is relatively easy to design automated systems capable of recognizing words inscribed in a printed format. Such words are segmented into letters by the very act of writing them on the page. Given templates matching typical letter shapes in a given language, individual letters can be identified with a high degree of probability. In cases of ambiguity, probable letter sequences can be compared with a selection of properly spelled words in that language (called a lexicon). If necessary, syntactic features of the language can be applied to render a generally accurate identification of the words in question. Printed-character recognition systems of this sort are commonly used in processing standardized government forms, in sorting mail by zip code, and so forth. In cursive writing, however, letters comprising a given word typically flow sequentially without gaps between them. Unlike a sequence of printed letters, cursively connected letters are not segmented in advance. Here is where Sayre's Paradox comes into play. Unless the word is already segmented into letters, template-matching techniques like those described above cannot be applied. That is, segmentation is a prerequisite for word recognition. But there are no reliable techniques for segmenting a word into letters unless the word itself has been identified. Word recognition requires letter segmentation, and letter segmentation requires word recognition. There is no way a cursive writing recognition system employing standard template-matching techniques can do both simultaneously. Advantages to be gained by use of automated cursive writing recognition systems include routing mail with handwritten addresses, reading handwritten bank checks, and automated digitalization of hand-written documents. These are practical incentives for finding ways of circumventing Sayre's Paradox. == Avoiding the paradox == One way of ameliorating the adverse effects of the paradox is to normalize the word inscriptions to be recognized. Normalization amounts to eliminating idiosyncrasies in the penmanship of the writer, such as unusual slope of the letters and unusual slant of the cursive line. This procedure can increase the probability of a correct match with a letter template, resulting in an incremental improvement in the success rate of the system. Since improvement of this sort still depends on accurate segmentation, however, it remains subject to the limitations of Sayre's Paradox. Researchers have come to realize that the only way to circumvent the paradox is by use of procedures that do not rely on accurate segmentation. == Directions of current research == Segmentation is accurate to the extent that it matches distinctions among letters in the actual inscriptions presented to the system for recognition (the input data). This is sometimes referred to as “explicit segmentation”. “Implicit segmentation,” by contrast, is division of the cursive line into more parts than the number of actual letters in the cursive line itself. Processing these “implicit parts” to achieve eventual word identification requires specific statistical procedures involving hidden Markov models (HMM). A Markov model is a statistical representation of a random process, which is to say a process in which future states are independent of states occurring before the present. In such a process, a given state is dependent only on the conditional probability of its following the state immediately before it. An example is a series of outcomes from successive casts of a die. An HMM is a Markov model, individual states of which are not fully known. Conditional probabilities between states are still determinate, but the identities of individual states are not fully disclosed. Recognition proceeds by matching HMMs of words to be recognized with previously prepared HMMs of words in the lexicon. The best match in a given case is taken to indicate the identity of the handwritten word in question. As with systems based on explicit segmentation, automated recognition systems based on implicit segmentation are judged more or less successful according to the percentage of correct identifications they accomplish. Instead of explicit segmentation techniques, most automated handwriting recognition systems today employ implicit segmentation in conjunction with HMM-based matching procedures. The constraints epitomized by Sayre's Paradox are largely responsible for this shift in approach.

Synthesia (company)

Synthesia Limited is a British multinational artificial intelligence company based in London, United Kingdom. It is a synthetic media-generation software developer and creator of AI-generated video content, including audio-visual agents and cloned avatars. Britain's largest generative-AI firm, it is used by 70% of FTSE 100 and over 90% of Fortune 100 companies. == Overview == Synthesia is most often used by corporations for localized communication, orientation, employee training videos, advertising campaigns, reporting, product demonstrations, customer service, and to create chatbots. Its software algorithm mimics speech and facial movements based on video recordings of an individual’s speech and facial expressions. From this, a text-to-speech video is created to look and sound like the individual. Swiss bank UBS incorporated Synthesia AI-powered avatars of their human financial experts, for instance, in 2025. Users create content via the platform's pre-generated AI presenters or by creating digital representations of themselves, or personal avatars, using the platform's AI video editing tool. These avatars can be used to narrate videos generated from text. As of August 2021, Synthesia's voice database included multiple gender options in over 60 languages. Its free voice library doubled by 2025, to 140 languages and accents, and its Express-Voice technology can clone a user's own voice, or generate a synthetic one. === Deepfakes === The platform prohibits use of its software to create non-consensual clones, including of celebrities or political figures for satirical purposes. Explicit consent must be provided in addition to a strict pre-screening regimen for use of an individual's likeness to avoid “deepfaking”. While the company prohibits use of its technology for misinformation or "news-like content", an October 2023 Freedom House report stated that Synthesia tools had been used by governments in Venezuela, China, Burkina Faso, and Russia to create videos of fake TV news outlets with AI-generated avatars in order to spread propaganda. Actor Dan Dewhirst signed a contract with the company in 2021, becoming one of the first actors whose likeness would be made into an AI avatar, finding his likeness used in the Venezuelan generated-videos. The company stated, in February 2024, that it had improved its misuse detection systems, and, in April 2024, that new users of its technology are screened by the company, and content employing it is further vetted by Synthesia moderators. == History == Synthesia's software utilizes deep learning architecture developed by Lourdes Agapito and Matthias Niessner. The company was co-founded in 2017 by Agapito, Niessner, Victor Riparbelli, and Steffen Tjerrild. In 2018, the company first demonstrated the software's capabilities on the BBC programme Click when it presented a digitization of Matthew Amroliwala speaking Spanish, Mandarin, and Hindi. Through Synthesia's first two years of existence, it employed 10 people and struggled to make sales, leading to an expansion of the company's focus. It moved on from just targeting entertainment studios to a variety of businesses. In 2020, Synthesia users were reported to include Amazon, Tiffany & Co. and IHG Hotels & Resorts. In January 2024, the company introduced its AI video assistant, which turns text-to-video. That April, with a reported 55,000 customers, including half of the Fortune 100, Synthesia launched "expressive avatars". That September, an enhanced dubbing feature was launched, to translate video in 30 languages with naturalized lip-syncing. Peter Hill joined Synthesia as CTO in January 2025, following 25 years at Amazon, and two years as CEO and CPO of Wildfire Studios. That March, a million dollar base of shares was formed to furnish human actors, employed to generate digital avatars, with company stock, which all of its employees hold. By June of that year, 150,000 individuals from among Synthesia's 65,000 customers had created AI-generated avatars of themselves. In July 2025, the company's new global headquarters at Regent’s Place was opened by London mayor Sadiq Khan, who described Britain's largest generative-AI company, then valued at over $2 billion, as a "London success story". By that October, its technology was employed by 90% of the Fortune 100, and Synthesia 3.0 was launched, with hyper-realistic digital avatars equipped with AI-powered dubbing and translation, and a built-in video assistant. In January 2026, it reached a $4 billion valuation, with 70% of FTSE 100 companies noted among its customers. === Funding === The company raised $3.1 million in seed funding in 2019. In April 2021, the company raised $12.5 million in Series A funding. In December 2021, it raised $50 million in a Series B funding round led by Kleiner Perkins and GV (then Google Ventures). Synthesia gained a total valuation of $1 billion, and achieved unicorn status, when it raised $90 million from Accel and Nvidia partnership NVentures, in June 2023, during its Series C funding round. Counting 60,000 customers by January 2025, including over 60% of Fortune 100 companies; the company raised $180 million in a Series D round led by NEA, with new investors World Innovation Lab (WiL), Atlassian Ventures and PSP Growth, as well as existing investors GV, MMC Ventures and FirstMark, doubling Synthesia's valuation to $2.1 billion. Capital raised by 2025 had reached $330 million, with investments slated to further product innovation, talent growth, and company expansion in North America, Europe, Japan and Australia. In April 2025, Adobe Inc. invested £10 million in the company for a strategic partnership. Synthesia subsequently rejected a $3 billion acquisition offer from Adobe, choosing to remain independent. With a revenue stream then exceeding $100 million annually; GV led a Series E funding round in October 2025, resulting in Synthesia's $4 billion valuation, raising $200 million from GV, Nvidia and Accel to develop, in 2026, interactive audio-visual avatar "agents" that converse on topic, for automated sales training and corporate communications, such as recruiting. == Recognition == In 2021, Synthesia partnered with Lay's to create the Messi Messages campaign featuring Argentine footballer Lionel Messi. Users created personalized messages with Synthesia's software and sent custom artificial reality video messages from Messi based on their text input. The campaign received a Cannes Lion Award under the Bronze category. In February 2025, UK Science and Technology Minister Peter Kyle commended Synthesia's "pioneering generative AI innovations."

Hybrid argument (cryptography)

In cryptography, the hybrid argument is a proof technique used to show that two distributions are computationally indistinguishable. == History == Hybrid arguments had their origin in a papers by Andrew Yao in 1982 and Shafi Goldwasser and Silvio Micali in 1983. == Formal description == Formally, to show two distributions D1 and D2 are computationally indistinguishable, we can define a sequence of hybrid distributions D1 := H0, H1, ..., Ht =: D2 where t is polynomial in the security parameter n. Define the advantage of any probabilistic efficient (polynomial-bounded time) algorithm A as A d v H i , H i + 1 d i s t ( A ) := | Pr [ x ← $ H i : A ( x ) = 1 ] − Pr [ x ← $ H i + 1 : A ( x ) = 1 ] | , {\displaystyle {\mathsf {Adv}}_{H_{i},H_{i+1}}^{\mathsf {dist}}(\mathbf {A} ):=\left|\Pr[x{\stackrel {\$}{\gets }}H_{i}:\mathbf {A} (x)=1]-\Pr[x{\stackrel {\$}{\gets }}H_{i+1}:\mathbf {A} (x)=1]\right|,} where the dollar symbol ($) denotes that we sample an element from the distribution at random. By triangle inequality, it is clear that for any probabilistic polynomial time algorithm A, A d v D 1 , D 2 d i s t ( A ) ≤ ∑ i = 0 t − 1 A d v H i , H i + 1 d i s t ( A ) . {\displaystyle {\mathsf {Adv}}_{D_{1},D_{2}}^{\mathsf {dist}}(\mathbf {A} )\leq \sum _{i=0}^{t-1}{\mathsf {Adv}}_{H_{i},H_{i+1}}^{\mathsf {dist}}(\mathbf {A} ).} Thus there must exist some k s.t. 0 ≤ k < t(n) and A d v H k , H k + 1 d i s t ( A ) ≥ A d v D 1 , D 2 d i s t ( A ) / t ( n ) . {\displaystyle {\mathsf {Adv}}_{H_{k},H_{k+1}}^{\mathsf {dist}}(\mathbf {A} )\geq {\mathsf {Adv}}_{D_{1},D_{2}}^{\mathsf {dist}}(\mathbf {A} )/t(n).} Since t is polynomial-bounded, for any such algorithm A, if we can show that it has a fixed negligible advantage function ε(n) between distributions Hi and Hi+1 for every i, so in particular, ϵ ( n ) ≥ A d v H k , H k + 1 d i s t ( A ) ≥ A d v D 1 , D 2 d i s t ( A ) / t ( n ) , {\displaystyle \epsilon (n)\geq {\mathsf {Adv}}_{H_{k},H_{k+1}}^{\mathsf {dist}}(\mathbf {A} )\geq {\mathsf {Adv}}_{D_{1},D_{2}}^{\mathsf {dist}}(\mathbf {A} )/t(n),} then it immediately follows that its advantage to distinguish the distributions D1 = H0 and D2 = Ht must also be negligible. == Applications == The hybrid argument is extensively used in cryptography. Some simple proofs using hybrid arguments are: If one cannot efficiently predict the next bit of the output of some number generator, then this generator is a pseudorandom number generator (PRG). We can securely expand a PRG with 1-bit output into a PRG with n-bit output.

Reverse proxy

In computer networks, a reverse proxy or surrogate server is a proxy server that appears to any client to be an ordinary web server, but in reality merely acts as an intermediary that forwards the client's requests to one or more ordinary web servers. Reverse proxies help increase scalability, performance, resilience, and security, but they also carry a number of risks. Companies that run web servers often set up reverse proxies to facilitate the communication between an Internet user's browser and the web servers. An important advantage of doing so is that the web servers can be hidden behind a firewall on a company-internal network, and only the reverse proxy needs to be directly exposed to the Internet. Reverse proxy servers are implemented in popular open-source web servers. Dedicated reverse proxy servers are used by some of the biggest websites on the Internet. A reverse proxy is capable of tracking IP addresses of requests that are relayed through it as well as reading and/or modifying any non-encrypted traffic. However, this implies that anyone who has compromised the server could do so as well. Reverse proxies differ from forward proxies, which are used when the client is restricted to a private, internal network and asks a forward proxy to retrieve resources from the public Internet. == Uses == Large websites and content delivery networks use reverse proxies, together with other techniques, to balance the load between internal servers. Reverse proxies can keep a cache of static content, which further reduces the load on these internal servers and the internal network. It is also common for reverse proxies to add features such as compression or TLS encryption to the communication channel between the client and the reverse proxy. Reverse proxies can inspect HTTP headers, which, for example, allows them to present a single IP address to the Internet while relaying requests to different internal servers based on the URL of the HTTP request. Reverse proxies can hide the existence and characteristics of origin servers. This can make it more difficult to determine the actual location of the origin server / website and, for instance, more challenging to initiate legal action such as takedowns or block access to the website, as the IP address of the website may not be immediately apparent. Additionally, the reverse proxy may be located in a different jurisdiction with different legal requirements, further complicating the takedown process. Application firewall features can protect against common web-based attacks, like a denial-of-service attack (DoS) or distributed denial-of-service attacks (DDoS). Without a reverse proxy, removing malware or initiating takedowns (while simultaneously dealing with the attack) on one's own site, for example, can be difficult. In the case of secure websites, a web server may not perform TLS encryption itself, but instead offload the task to a reverse proxy that may be equipped with TLS acceleration hardware. (See TLS termination proxy.) A reverse proxy can distribute the load from incoming requests to several servers, with each server supporting its own application area. In the case of reverse proxying web servers, the reverse proxy may have to rewrite the URL in each incoming request in order to match the relevant internal location of the requested resource. A reverse proxy can reduce load on its origin servers by caching static content and dynamic content, known as web acceleration. Proxy caches of this sort can often satisfy a considerable number of website requests, greatly reducing the load on the origin server(s). A reverse proxy can optimize content by compressing it in order to speed up loading times. In a technique named "spoon-feeding", a dynamically generated page can be produced in its entirety and served to the reverse proxy, which can feed the page to the client as the connection allows. The program that generates the page need not remain open, thus releasing server resources during the possibly extended time the client requires to complete the transfer. Reverse proxies can operate wherever multiple web-servers must be accessible via a single public IP address. The web servers listen on different ports in the same machine, with the same local IP address or, possibly, on different machines with different local IP addresses. The reverse proxy analyzes each incoming request and delivers it to the right server within the local area network. Reverse proxies can perform A/B testing and multivariate testing without requiring application code to handle the logic of which version is served to a client. A reverse proxy can add access authentication to a web server that does not have any authentication. == Risks == When the transit traffic is encrypted and the reverse proxy needs to filter/cache/compress or otherwise modify or improve the traffic, the proxy first must decrypt and re-encrypt communications. This requires the proxy to possess the TLS certificate and its corresponding private key, extending the number of systems that can have access to non-encrypted data and making it a more valuable target for attackers. The vast majority of external data breaches happen either when hackers succeed in abusing an existing reverse proxy that was intentionally deployed by an organization, or when hackers succeed in converting an existing Internet-facing server into a reverse proxy server. Compromised or converted systems allow external attackers to specify where they want their attacks proxied to, enabling their access to internal networks and systems. Applications that were developed for the internal use of a company are not typically hardened to public standards and are not necessarily designed to withstand all hacking attempts. When an organization allows external access to such internal applications via a reverse proxy, they might unintentionally increase their own attack surface and invite hackers. If a reverse proxy is not configured to filter attacks or it does not receive daily updates to keep its attack signature database up to date, a zero-day vulnerability can pass through unfiltered, enabling attackers to gain control of the system(s) that are behind the reverse proxy server. Giving the reverse proxy of a third party access to private keys (for caching or optimizing content) places the entire triad of confidentiality, integrity and availability in the hands of the third party who operates the proxy. A reverse proxy is a single point of failure for the back-end services it fronts: an outage caused by misconfiguration, a denial-of-service attack, or a software fault can make every fronted service unreachable to outside clients, even when the back-end services themselves remain healthy. For example, a 2020 outage at Cloudflare briefly took down major sites and services that relied on its reverse-proxy edge, including Discord.

Hyper-encryption

Hyper-encryption is a form of encryption invented by Michael O. Rabin which uses a high-bandwidth source of public random bits, together with a secret key that is shared by only the sender and recipient(s) of the message. It uses the assumptions of Ueli Maurer's bounded-storage model as the basis of its secrecy. Although everyone can see the data, decryption by adversaries without the secret key is still not feasible, because of the space limitations of storing enough data to mount an attack against the system. Unlike almost all other cryptosystems except the one-time pad, hyper-encryption can be proved to be information-theoretically secure, provided the storage bound cannot be surpassed. Moreover, if the necessary public information cannot be stored at the time of transmission, the plaintext can be shown to be impossible to recover, regardless of the computational capacity available to an adversary in the future, even if they have access to the secret key at that future time. A highly energy-efficient implementation of a hyper-encryption chip was demonstrated by Krishna Palem et al. using the Probabilistic CMOS or PCMOS technology and was shown to be ~205 times more efficient in terms of Energy-Performance-Product.

Digital Michelangelo Project

The Digital Michelangelo Project was a pioneering initiative undertaken during the 1998–1999 academic year to digitize the sculptures and architecture of Michelangelo using advanced laser scanning technology. The project was led by a team of 30 faculty, staff, and students from Stanford University and the University of Washington, with the aim of creating high-resolution 3D models of Michelangelo's works for scholarly, educational, and preservation purposes. == Objectives == The primary goals of the Digital Michelangelo Project were: To apply recent advancements in laser rangefinder technology for digitizing large cultural artifacts. To create detailed digital archives of Michelangelo's sculptures and architectural spaces for future study and analysis. To explore potential educational and curatorial applications for 3D scanned data. === Artworks digitized === The project involved scanning several iconic works by Michelangelo, including: David The Unfinished Slaves (Atlas, Awakening, Bearded, and Youthful) St. Matthew The allegorical statues from the Medici tombs (Night, Day, Dawn, and Dusk) The architectural interiors of the Tribuna del David at the Galleria dell'Accademia and the New Sacristy in the Medici Chapels. == Technology and methodology == === 3D scanning === The project's primary scanner was a laser triangulation rangefinder mounted on a motorized gantry, custom-built by Cyberware Inc. The scanner used a laser sheet to project onto an object, capturing its shape through triangulation. Multiple scans were taken from various angles and combined into a single, detailed 3D mesh. The resolution achieved was fine enough to capture even Michelangelo's chisel marks, with triangles approximately 0.25 mm on each side. In addition to shape data, color data was captured using a spotlight and a secondary camera, enabling the creation of textured 3D models. === Data processing === The project developed a software suite for processing the scanned data. This included: Aligning and merging multiple scans into a seamless 3D model. Filling holes in the geometry caused by inaccessible areas. Correcting color data for lighting inconsistencies and shadowing. Non-photorealistic rendering techniques were also applied, highlighting surface features such as Michelangelo’s chisel marks for enhanced visualization. == Logistical challenges == The scale and complexity of the project presented several challenges: Data size: The dataset for David alone comprised 2 billion polygons and 7,000 color images, occupying 60 GB of storage. Artifact safety: Ensuring the safety of the statues during scanning required extensive crew training, foam-encased equipment, and collision-prevention mechanisms. == Applications and impact == The digitized models have numerous potential applications: Art history: Allowing precise measurements and geometric analysis, such as determining chisel types or evaluating structural balance. Education: Providing new ways to study art, including interactive viewing from unconventional angles and with custom lighting. Museum curation: Enhancing visitor experiences through interactive kiosks and virtual models. The project demonstrated the potential for 3D technology to preserve and disseminate cultural heritage. == Data distribution == The project's models are available through Stanford University for scholarly purposes, under strict licensing due to Italian intellectual property laws. === ScanView === To provide public access to the 3D models while respecting usage restrictions, the project developed ScanView, a client/server rendering system. ScanView allows users to view and interact with high-resolution 3D models without downloading the data. The client component consists of a freely available viewer program and simplified 3D models. Users can navigate these models locally, adjusting position, orientation, lighting, and surface appearance. When a user finalizes a view, the client queries a remote server for a high-resolution rendering of the model, which is sent back to overwrite the simplified version on the user’s screen. A typical query-response cycle takes 1–2 seconds, depending on network conditions. To protect the models from unauthorized reconstruction, the system employs several security measures, including: Encrypting queries Perturbing viewpoint and lighting parameters Adding noise and warping rendered images Compressing images before transmission ScanView operates on Windows-based PCs and provides access to selected models, including David and St. Matthew, as well as other artifacts such as fragments of the Forma Urbis Romae and items from the Stanford 3D Scanning Repository. == Sponsors == The Digital Michelangelo Project was supported by Stanford University, Interval Research Corporation, and the Paul G. Allen Foundation for the Arts.

European Grid Infrastructure

EGI (originally an initialism for European Grid Infrastructure) is a federation of computing and storage resource providers that deliver advanced computing and data analytics services for research and innovation. The Federation is governed by its participants represented in the EGI Council and coordinated by the EGI Foundation. As of 2024, the EGI Federation supports 160 scientific communities worldwide and over 95,000 users in their intensive data analysis. The most significant scientific communities supported by EGI in 2022 were Medical and Health Sciences, High Energy Physics, and Engineering and Technology. The EGI Federation provideds services through over 150 data centres, of which 25 are cloud sites, in 43 countries and 64 Research Infrastructures (4 of which are members of the Federation). == Name == Originally, EGI stood for European Grid Infrastructure. This reflected its focus on providing access to high-throughput computing resources across Europe using Grid computing techniques. However, as EGI's service offerings expanded beyond traditional grid computing, particularly with the incorporation of federated cloud services, the original meaning of the acronym became less accurate. To emphasise the broader scope of EGI's services and avoid any confusion associated with the outdated term "grid," it is recommended to refer to EGI simply as EGI. == Structure == === EGI Federation === The EGI Federation delivers a scalable digital research infrastructure (e-infrastructure), empowering tens of thousands of researchers across diverse scientific disciplines. Through the EGI Federation, researchers gain access to advanced computing and data analytics capabilities, including large-scale data analysis, while benefiting from the collaborative efforts of hundreds of service providers from both public and private sectors, consolidating resources from Europe and beyond. Overall, the EGI Federation offers a range of services, encompassing distributed high-throughput computing and cloud computing, storage and data management capabilities, co-development of new solutions, expert support, and comprehensive training opportunities. This ecosystem propels collaboration, scientific progress and innovation. === EGI Foundation === The EGI Foundation is the coordinating body of the EGI Federation. It was established in 2010 with headquarters in Amsterdam, Netherlands. The Foundation coordinates the research and innovation efforts of its members, spanning technical areas critical to data-intensive science, including large-scale data processing and analysis, distributed Artificial Intelligence/Machine Learning, federated Identity and access management and the application of digital twins for research. The day-to-day running of the EGI Foundation is supervised by the Executive Board. The board’s members work closely with the EGI Director on operational, technical and financial issues. The Executive Board’s members are appointed by the EGI Council for a two-year term. === EGI Council === The EGI Council is responsible for defining the strategic direction of the EGI Federation. The Council acts as the senior decision-making and supervisory authority of the EGI Foundation, with a mandate to define the strategic direction of the entire EGI ecosystem. === EGI Services === EGI offers a suite of services to support data-intensive research. These services include compute resources, orchestration tools, storage and data management solutions, training programmes, security and identity services, and applications. Compute resources encompass cloud compute, cloud container compute, high-throughput compute, and software distribution. Orchestration tools include the Workload Manager and infrastructure manager. Storage and data management solutions include online storage, data transfer, and DataHub. Training programmes cover FitSM, ISO 27001, and general training infrastructure. EGI Check-in and Secrets Store are key security and identity services, while applications such as Notebooks and Replay enhance research productivity. In addition to services for Research, EGI also provides services for Federation and Business. Services for Federation are designed to help resource providers and user communities collaborate and share resources. EGI also offers a range of services to support businesses in their digital transformation. Through the EGI Digital Innovation Hub (EGI DIH), companies can access advanced computing resources, networking, funding and training opportunities, collaborate with research institutions, and test solutions before investing. == History == In 2002, the first large-scale experimental facility was successfully demonstrated by the DataGrid project under the lead of CERN with tens of technical architects from the major High Energy Physics institutes in the world. For the first time, distributed computing was applied to data-intensive processing. It aimed at developing a large-scale computational grid to facilitate distributed data-intensive scientific computing across High Energy Physics, Earth Observation, and Biology science applications. On 28 February 2003, the first software release of LCG-MW was published. gLite, the Lightweight Middleware for Grid Computing and LCG, Large Hadron Collider Computing Grid, are the cornerstone of the Worldwide LHC Computing Grid, which expanded over time towards the EGI Federation. 2004 marks the year of the first pilot infrastructure, seeing the participation of CERN and data centres in the United Kingdom, Spain, Germany, the Netherlands, France, Canada, Russia, Bulgaria, the Asia-Pacific region and Switzerland. Over the years, the infrastructure has grown into a federation of 128 data centres and 25 cloud providers serving more than 95,000 users worldwide. In 2004, the first data processing tasks started being formally recorded in a central accounting system. The EGI Accounting Portal provides the accounting data for Compute, Storage and Data services gathered from the data centres of the EGI Federation. A few years later, in 2010, EGI was established as the coordinating body of the EGI Federation to build an integrated pan-European infrastructure to support European research communities primarily. In the same year, EGI launched the flagship project EGI Inspire. That project brought together European organisations to establish a sustainable European Grid Infrastructure for large-scale data analysis. The success of the project was due to the adoption of a distributed computing model to solve big data problems. Moreover, EGI-Inspire harmonised operational policies across its federation of affiliated data centres and cloud service providers worldwide, integrating e-infrastructures from 57 countries. The EGI Federation was the first to apply federation to cloud provisioning, opening a new avenue in large-scale interactive data analysis. In 2015, within EGI Engage, opening a new avenue in large-scale interactive data analysis. The EGI Federated Cloud is an IaaS-type cloud, incorporating academic and private clouds and virtualised resources built using open standards. Its development is driven by the needs of the scientific community, resulting in a novel research e-infrastructure that relies on well-established federated operational services, making EGI a dependable resource for scientific endeavours. In 2015, EGI, EUDAT, GÉANT, LIBER and OpenAIRE published a position paper on a 'European Open Science Cloud for Research'. With the EOSC-hub project in 2016, EGI started contributing in practice to shaping the services for the EOSC. The work continued with a series of projects, like EOSC Enhance, EOSC Life and EOSC Synergy. With EGI-ACE and its contribution to EOSC Future, EGI has continued developing the EOSC Core. In early 2024, EGI started providing services to the EOSC EU Node, and with EOSC Beyond it will provide new EOSC Core capabilities and pilot additional national and thematic nodes. In October 2024, EUDAT, GÉANT, OpenAIRE, PRACE and EGI signed a Memorandum of Understanding establishing the European e-Infrastructures Assembly. This collaboration will bolster the position and promote the services of e-Infrastructures, empowering researchers across Europe to drive innovation and advance scientific discovery.