AI Art That Looks Real


  • CloudPassage


    CloudPassage is a company that provides an automation platform, delivered via software as a service, that improves security for private, public, and hybrid cloud computing environments. CloudPassage is headquartered in San Francisco. == History == CloudPassage was founded by Carson Sweet, Talli Somekh, and Vitaliy Geraymovych in 2010. The company used cloud computing and big data analytics to implement security monitoring and control in a platform called Halo. CloudPassage spent a year in stealth developing the Halo technology, coming out of stealth mode to a closed beta in January 2011. In June 2012, the company launched the commercial product that included configuration security monitoring, network microsegmentation, and two-factor authentication for privileged access management. By 2013, CloudPassage expanded Halo to support large enterprises with advanced security and compliance requirements with a product called Halo Enterprise. The first round of venture funding for the company raised $6.5 million. In April 2012, CloudPassage raised $14 million. The financing round was led by Tenaya Capital. In February 2014, CloudPassage announced that it had raised $25.5 million in funding led by Shasta Ventures. In total, the company has invested over $30 million in its technology and raised approximately $88 million in capital. == Product == The CloudPassage platform provides cloud workload security and compliance for systems hosted in public or private cloud infrastructure environments, including hybrid cloud and multi-cloud workload hosting models. The flagship product the company offers is called Halo. Halo secures virtual servers in public, private, and hybrid cloud infrastructures and provides file integrity monitoring (FIM) while also administering firewall automation, vulnerability monitoring, network access control, security event alerting, and assessment. 
The Halo platform also provides security applications such as privileged access management, software vulnerability scanning, multifactor authentication, and log-based IDS. In December 2013, CloudPassage set up six servers with Microsoft Windows and Linux operating systems and combinations of popular programs and invited hackers to attempt to hack into the servers. The top prize was $5,000 and the winning hacker was a novice that completed the task in four hours. CloudPassage programmed the servers to use basic default security settings to show how vulnerable cloud computing programs can be to security threats. == Awards and recognition == In May 2011, Gigaom named CloudPassage in its list of the Top 50 Cloud Innovators. That same month, eWeek recognized CloudPassage as one of 16 Hot Startup Companies Flying Under the Radar. SC Magazine named CloudPassage an Industry Innovator in the Virtualization and Cloud Security category in 2012. Also in 2012, The Wall Street Journal named CloudPassage a runner-up in the Information Security category of its Technology Innovation Awards. The CloudPassage large-scale security program, Halo, won Best Security Solution in 2014 at the SIIA Codie awards.

  • Verifiable secret sharing


    In cryptography, a secret sharing scheme is verifiable if auxiliary information is included that allows players to verify their shares as consistent. More formally, verifiable secret sharing ensures that even if the dealer is malicious there is a well-defined secret that the players can later reconstruct. (In standard secret sharing, the dealer is assumed to be honest.) The concept of verifiable secret sharing (VSS) was first introduced in 1985 by Benny Chor, Shafi Goldwasser, Silvio Micali and Baruch Awerbuch. In a VSS protocol a distinguished player who wants to share the secret is referred to as the dealer. The protocol consists of two phases: a sharing phase and a reconstruction phase. Sharing: Initially the dealer holds the secret as input and each player holds an independent random input. The sharing phase may consist of several rounds. At each round each player can privately send messages to other players and can also broadcast a message. Each message sent or broadcast by a player is determined by its input, its random input and messages received from other players in previous rounds. Reconstruction: In this phase each player provides its entire view from the sharing phase, and a reconstruction function is applied to these views; its result is taken as the protocol's output. An alternative definition given by Oded Goldreich defines VSS as a secure multi-party protocol for computing the randomized functionality corresponding to some (non-verifiable) secret sharing scheme. This definition is stronger than the other definitions and is very convenient to use in the context of general secure multi-party computation. Verifiable secret sharing is important for secure multiparty computation. Multiparty computation is typically accomplished by making secret shares of the inputs, and manipulating the shares to compute some function. 
To handle "active" adversaries (that is, adversaries that corrupt nodes and then make them deviate from the protocol), the secret sharing scheme needs to be verifiable to prevent the deviating nodes from throwing off the protocol. == Feldman's scheme == A commonly used example of a simple VSS scheme is the protocol by Paul Feldman, which is based on Shamir's secret sharing scheme combined with any encryption scheme which satisfies a specific homomorphic property (that is not necessarily satisfied by all homomorphic encryption schemes). The following description gives the general idea, but is not secure as written. (Note, in particular, that the published value $g^s$ leaks information about the dealer's secret $s$.) First, a cyclic group $G$ of prime order $q$, along with a generator $g$ of $G$, is chosen publicly as a system parameter. The group $G$ must be chosen such that computing discrete logarithms is hard in this group. (Typically, one takes an order-$q$ subgroup of $(\mathbb{Z}/p\mathbb{Z})^{\times}$, where $q$ is a prime dividing $p - 1$.) The dealer then computes (and keeps secret) a random polynomial $P$ of degree $t$ with coefficients in $\mathbb{Z}_q$, such that $P(0) = s$, where $s$ is the secret. Each of the $n$ share holders will receive a value $P(1), \dots, P(n)$ modulo $q$. Any $t + 1$ share holders can recover the secret $s$ by using polynomial interpolation modulo $q$, but any set of at most $t$ share holders cannot. (In fact, at this point any set of at most $t$ share holders has no information about $s$.) So far, this is exactly Shamir's scheme. To make these shares verifiable, the dealer distributes commitments to the coefficients of $P$ modulo $q$. If $P(x) = s + a_1 x + \dots + a_t x^t$, then the commitments that must be given are: $c_0 = g^{s}$, $c_1 = g^{a_1}$, ..., $c_t = g^{a_t}$. Once these are given, any party can verify their share. 
For instance, to verify that $v = P(i)$ modulo $q$, party $i$ can check that $g^{v} = c_{0} c_{1}^{i} c_{2}^{i^{2}} \cdots c_{t}^{i^{t}} = \prod_{j=0}^{t} c_{j}^{i^{j}} = \prod_{j=0}^{t} g^{a_{j} i^{j}} = g^{\sum_{j=0}^{t} a_{j} i^{j}} = g^{P(i)}$, where $a_0 = s$. This scheme is, at best, secure against computationally bounded adversaries, since its security rests on the intractability of computing discrete logarithms. Pedersen later proposed a scheme where no information about the secret is revealed even with a dealer with unlimited computing power. == Baghery's hash-based scheme == A recent line of research has proposed a unified framework for building practical VSS schemes that do not necessarily require homomorphic commitments, a key requirement in traditional constructions such as Feldman's and Pedersen's schemes. The framework allows instantiations with different commitment schemes, including post-quantum secure options such as hash-based commitments. This offers a flexible and efficient approach to building VSS schemes, in which the verifiability of shares is decoupled from the need for homomorphic commitments, which are often tied to assumptions like the Discrete Logarithm (DL) problem, known to be insecure against quantum adversaries. One instantiation of the new framework uses hash-based commitments and a random oracle to construct a hash-based VSS scheme based on Shamir's secret sharing. 
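Returning to Feldman's scheme described above, the dealing and share-checking steps can be illustrated with a minimal Python sketch. The parameters `p`, `q` and `g` and the function names `deal`, `eval_poly` and `verify_share` are assumptions made for this illustration; the group is a toy one and far too small to offer any security, and, as the text notes, the scheme as written leaks $g^s$.

```python
import secrets

# Toy parameters, assumed for illustration only (insecure sizes).
p = 23   # prime modulus
q = 11   # prime order of the subgroup; q divides p - 1
g = 4    # generator of the order-q subgroup of (Z/pZ)^x


def eval_poly(coeffs, x):
    """Evaluate P(x) = coeffs[0] + coeffs[1]*x + ... modulo q (Horner's rule)."""
    acc = 0
    for a in reversed(coeffs):
        acc = (acc * x + a) % q
    return acc


def deal(secret, t, n):
    """Return the shares P(1..n) and the commitments c_j = g^{a_j} mod p."""
    coeffs = [secret % q] + [secrets.randbelow(q) for _ in range(t)]
    shares = {i: eval_poly(coeffs, i) for i in range(1, n + 1)}
    commitments = [pow(g, a, p) for a in coeffs]
    return shares, commitments


def verify_share(i, v, commitments):
    """Check g^v == prod_j c_j^(i^j) mod p for party i holding share v."""
    rhs = 1
    for j, c in enumerate(commitments):
        # Exponents can be reduced modulo q because every c_j lies in the order-q subgroup.
        rhs = (rhs * pow(c, pow(i, j, q), p)) % p
    return pow(g, v, p) == rhs
```

Calling `deal(secret=7, t=2, n=5)` yields five shares and three commitments; each party `i` then runs `verify_share(i, shares[i], commitments)`, and a share altered by the dealer fails the check.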
=== Protocol Overview === Sharing Phase: Given a secure hash-based commitment scheme $\mathcal{C}$ and a hash function $\mathcal{H}$ (modeled as a random oracle), to share a secret value $s$ among $n$ parties with threshold $t$, the dealer acts as follows: Following Shamir sharing, the dealer samples a random degree-$t$ polynomial $P(X)$ over a field or ring, with $P(0) = s$. Each of the $n$ parties will receive a value $v_i = P(i)$ modulo $q$ as a share. To prove the validity of the shares, the dealer acts as follows: (1) Samples another random degree-$t$ polynomial $R(X)$ and $n$ random values $\gamma_1, \dots, \gamma_n$ from the same field or ring. (2) Computes a set of commitments $c_i = \mathcal{C}(P(i), R(i), \gamma_i)$ for $i = 1, 2, \dots, n$. Note that the additional randomness $\gamma_i$ is used when the secret $s$ does not have sufficient entropy, but it can be omitted when sharing a uniformly random secret. Each of the $n$ parties will also receive a value $\gamma_i$ modulo $q$ as a share. (3) Calculates a challenge value $d$ via a hash function, $d = \mathcal{H}(c_1, \dots, c_n)$, and then computes a polynomial $Z(X) = R(X) + d \cdot P(X)$. (4) Broadcasts the commitments $c_1, \dots, c_n$ along with $Z(X)$ as the proof and privately sends $(v_i, \gamma_i)$ as the individual share to party $i$. 
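The sharing steps above, together with the matching share check $c_i = \mathcal{C}(v_i, Z(i) - d v_i, \gamma_i)$, can be sketched in Python as follows. This is only an illustration under stated assumptions: the prime modulus `Q`, the use of SHA-256 both as the commitment $\mathcal{C}$ and as a stand-in for the random oracle $\mathcal{H}$, the byte encoding, and all function names are choices made here, not details taken from the scheme's specification.

```python
import hashlib
import secrets

Q = 2**127 - 1  # assumed prime field modulus (a Mersenne prime)


def eval_poly(coeffs, x):
    """Evaluate a polynomial with coefficients in ascending order, modulo Q."""
    acc = 0
    for a in reversed(coeffs):
        acc = (acc * x + a) % Q
    return acc


def commit(value, mask, gamma):
    """Assumed hash-based commitment C(value, mask, gamma) built from SHA-256."""
    data = b"".join(n.to_bytes(16, "big") for n in (value, mask, gamma))
    return hashlib.sha256(data).digest()


def challenge(commitments):
    """d = H(c_1, ..., c_n), with SHA-256 standing in for the random oracle."""
    digest = hashlib.sha256(b"".join(commitments)).digest()
    return int.from_bytes(digest, "big") % Q


def share(secret, t, n):
    """Dealer side: return private shares, public commitments and Z(X)."""
    P = [secret % Q] + [secrets.randbelow(Q) for _ in range(t)]
    R = [secrets.randbelow(Q) for _ in range(t + 1)]
    gammas = [secrets.randbelow(Q) for _ in range(n)]
    commitments = [
        commit(eval_poly(P, i), eval_poly(R, i), gammas[i - 1])
        for i in range(1, n + 1)
    ]
    d = challenge(commitments)
    Z = [(r + d * a) % Q for r, a in zip(R, P)]  # Z(X) = R(X) + d * P(X)
    shares = {i: (eval_poly(P, i), gammas[i - 1]) for i in range(1, n + 1)}
    return shares, commitments, Z


def verify(i, v, gamma, commitments, Z, t):
    """Party i's check: Z has degree <= t and c_i == C(v, Z(i) - d*v, gamma)."""
    if len(Z) > t + 1:
        return False
    d = challenge(commitments)
    mask = (eval_poly(Z, i) - d * v) % Q  # equals R(i) for an honest dealer
    return commitments[i - 1] == commit(v, mask, gamma)
```

The check works because $Z(i) - d \cdot P(i) = R(i)$ for an honest dealer, so party $i$ recomputes exactly the commitment input the dealer used without ever learning $R(X)$ itself.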
Verification Phase: Given an individual share $(v_i, \gamma_i)$ and a proof $(c_1, \dots, c_n, Z(X))$, party $i$ verifies its correctness as follows: (1) Checks that $Z(X)$ is a valid polynomial of degree at most $t$. (2) Recomputes the challenge value $d = \mathcal{H}(c_1, \dots, c_n)$ and verifies the commitment equation $c_i = \mathcal{C}(v_i, Z(i) - d v_i, \gamma_i)$. If the verification fails, similar to Feldman's and Pedersen's schemes, the party raises a complaint. If too many complaints (more than $t$) are raised, the dealer is disqualified. In case of a complaint, the dealer can publicly reveal the disputed share to allow global verification. Honest parties can then collectively agree to either continue or disqualify the dealer. This scheme supports the sharing of both low-entropy and high-entropy secrets. Moreover, since it relies solely on secure hash functions for commitments and on a (quantum) random oracle, it plausibly achieves security even against quantum adversaries. Additionally, by using only lightweight cryptographic primitives, the scheme is considerably more efficient in practice compared to traditional VSS constructions based on number-theoretic assumptions. == Benaloh's scheme == Once $n$ shares are distributed to their holders, each holder should be able to verify that all shares are collectively $t$-consistent (i.e., any subset of $t$ of the $n$ shares will yield the same, correct, polynomial without exposing the secret). In Shamir's secret sharing scheme the shares $s_1, s_2, \dots, s_n$ are $t$-consistent if and only if the interpolation of the points $(1, s_1)$, $(2, s_2)$, ..., (

  • Death of Molly Russell


    In November 2017, Molly Russell, a fourteen-year-old British schoolgirl from Harrow, London, was found dead in her bedroom by her parents. In an inquest, the coroner stated that she had died from an act of self-harm following depression and the results of social media consumption, including material on Instagram and Pinterest. She also had a Twitter account in which she documented her growing depression. == Life == Russell had been a pupil at Hatch End High School. At the inquest, the school's head teacher expressed shock that she was able to access distressing online content. Her parents stated that she had never shown any previous signs of struggle and was doing very well in school. It was revealed at the inquest that in the six months prior to her death, 2,100 of 16,300 pieces of content she had interacted with on Instagram were on topics such as self-harm, depression, and suicide. It was also noted that throughout her experience on social media, there were never any warning signs about the information she viewed on these platforms. == Subsequent events == Dr. Navin Venugopal, the child psychiatrist assigned to the case investigating her death, called the material she viewed "disturbing and distressing" and said he was unable to sleep well for weeks after viewing it. The coroner Andrew Walker concluded that Molly's death was "an act of self harm suffering from depression and the negative effects of online content". He issued a prevention of future deaths report regarding her death, in which he made a number of recommendations for operators of online platforms, including: separating platforms for adults and children; age verification; changes in policy on filtering of age-specific content; adding features for parental supervision and control; and data retention of material viewed by children. He suggested that this could be accomplished by either legislation or self-regulation. 
The lawyer representing her family at the inquest stated that the findings "captured all of the elements of why this material is so harmful." The case has been cited as a motivator for the passage of the Online Safety Act. A charity, the Molly Rose Foundation, was set up in her memory, with the goal of suicide prevention for young people. Meta and Pinterest are believed to have made substantial donations to the charity.

  • Virtual influencer


    A virtual influencer, sometimes described as a virtual persona or virtual model, is a computer-generated fictional character that can be used for a variety of marketing-related purposes, but most frequently for social media marketing, in lieu of online human "influencers". Most virtual influencers are designed using computer graphics and motion capture technology to resemble real people in realistic situations. Common derivatives of virtual influencers include VTubers, which broadly refer to online entertainers and YouTubers who represent themselves using virtual avatars instead of their physical selves. == History == Virtual influencers are fundamentally synonymous with virtual idols, which originate from Japan's anime and Japanese idol culture that dates back to the 1980s. The first virtual idol created was Lynn Minmay, a fictional singer and main character of the anime television series Super Dimension Fortress Macross (1982) and the animated film adaptation Macross: Do You Remember Love? (1984). Minmay's success led to the production of more Japanese virtual idols, such as EVE from the Japanese cyberpunk anime Megazone 23 (1985), and Sharon Apple in Macross Plus (1994). Virtual idols were not always well received – in 1995, Japanese talent agency Horipro created Kyoko Date, which was inspired by the Macross franchise and dating sim games such as Tokimeki Memorial (1994). Date failed to gain commercial success despite drawing headlines for her debut as a CGI idol, largely due to technical limitations leading to issues such as unnatural movements, an issue also known as the uncanny valley. Since their inception, many virtual idols created have achieved continual success, with notable names including the Vocaloid singer Hatsune Miku, and the VTuber Kizuna AI. Technological advancements have also enabled production teams to use artificial intelligence and advanced techniques to customize the personalities and behavior of virtual idols. 
Due to modern-day advancements in technology, many virtual idols have held real-life tours and events. Notable ones include Hatsune Miku's tour Miku Expo and Hololive's concerts with many of their idols from their English, Japanese and Indonesian branches. Notable events involving virtual singers and influencers include Hatsune Miku opening for Lady Gaga in 2014, Hoshimachi Suisei's concerts at the famous Budokan venue in Japan, and her addition to the Forbes Japan list of '30 Under 30' individuals who are changing the world in their respective fields. == Benefits and criticism == From a branding perspective, virtual influencers are perceived to be much less likely to be mired in scandals. In China, celebrities caught in bad publicity such as singer Wang Leehom and entertainer Kris Wu have heightened the appeal of virtual influencers, since their existence relies entirely on computer-generated imagery and they are therefore unlikely to cause any damage to a brand's image by association. Some studies have also suggested that Generation Z consumers have a unique appetite for virtual idols and influencers, since they grew up in the age of the internet. Studies also show that virtual influencers with a human-like appearance have higher message credibility than anime-like virtual influencers. Scholars and commentators have also questioned the ethics and cultural impact of virtual influencers, arguing that computer-generated personas can entrench unrealistic beauty standards while diffusing accountability for labor, identity, and consent. Business and marketing analysts have also warned that disclosure and governance remain inconsistent, recommending clearer guardrails and transparency when brands deploy synthetic spokespeople. In 2025, reporting highlighted concerns that AI-driven "virtual humans" could displace human creators and sales workers, intensifying debates over the future of creative labor and authenticity online. 
== Notable examples ==
=== Virtual bands ===
Eternity - A South Korean virtual idol group formed by Pulse9.
Gorillaz - A virtual band formed in 1998.
K/DA - A virtual K-pop girl group created as part of the League of Legends video game franchise.
MAVE: - A South Korean virtual girl group formed in 2023 by Metaverse Entertainment.
Pentakill - A virtual heavy metal band created as part of the League of Legends video game franchise.
Plave - A South Korean virtual boy band formed by VLast.
Squid Sisters and Off the Hook - Two virtual pop idol duos as part of the Splatoon series.
Studio Killers - A Finnish-Danish-British virtual band formed in 2011.
=== Vocaloids ===
Hatsune Miku (modeled after Saki Fujita)
Kagamine Rin/Len (modeled after Asami Shimoda)
Megurine Luka (modeled after Yū Asakawa)
Meiko (modeled after Meiko Haigō)
Kaito (modeled after Naoto Fūga)
=== VTubers ===
Kano
Kizuna AI
Neuro-sama
VShojo
Ironmouse
Projekt Melody
Nijisanji
Hololive
Akai Haato
Gawr Gura
Hoshimachi Suisei
Natsuiro Matsuri
=== Other examples ===
Ami Yamato
Crazy Frog
FN Meka
IA
Kuki AI
Kyoko Date
Kyra
Miquela
Naevis
Shudu Gram

  • Automotive security


    Automotive security refers to the branch of computer security focused on the cyber risks related to the automotive context. The increasingly high number of ECUs in vehicles and, alongside, the implementation of multiple different means of communication from and towards the vehicle in a remote and wireless manner led to the necessity of a branch of cybersecurity dedicated to the threats associated with vehicles. Not to be confused with automotive safety. == Causes == The implementation of multiple ECUs (Electronic Control Units) inside vehicles began in the early '70s thanks to the development of integrated circuits and microprocessors that made it economically feasible to produce the ECUs on a large scale. Since then the number of ECUs has increased to up to 100 per vehicle. These units nowadays control almost everything in the vehicle, from simple tasks such as activating the wipers to more safety-related ones like brake-by-wire or ABS (Anti-lock Braking System). Autonomous driving is also strongly reliant on the implementation of new, complex ECUs such as the ADAS, alongside sensors (lidars and radars) and their control units. Inside the vehicle, the ECUs are connected with each other through cabled or wireless communication networks, such as CAN bus (controller area network), MOST bus (Media Oriented System Transport), FlexRay (Automotive Network Communications Protocol) or RF (radio frequency) as in many implementations of TPMSs (tire-pressure monitoring systems). Many of these ECUs require data received through these networks that arrive from various sensors to operate and use such data to modify the behavior of the vehicle (e.g., the cruise control modifies the vehicle's speed depending on signals arriving from a button usually located on the steering wheel). 
Since the development of cheap wireless communication technologies such as Bluetooth, LTE, Wi-Fi, RFID and similar, automotive producers and OEMs have designed ECUs that implement such technologies with the goal of improving the experience of the driver and passengers. Examples include safety-related systems such as OnStar from General Motors, telematic units, communication between smartphones and the vehicle's speakers through Bluetooth, Android Auto and Apple CarPlay. == Threat model == Threat models of the automotive world are based on both real-world and theoretically possible attacks. Most real-world attacks aim at the safety of the people in and around the car, by modifying the cyber-physical capabilities of the vehicle (e.g., steering, braking, accelerating without requiring actions from the driver), while theoretical attacks are assumed to also pursue privacy-related goals, such as obtaining GPS data on the vehicle, or capturing microphone signals and similar. The attack surfaces of the vehicle are usually divided into long-range, short-range, and local attack surfaces: LTE and DSRC can be considered long-range ones, while Bluetooth and Wi-Fi are usually considered short-range although still wireless. Finally, USB, OBD-II and all the attack surfaces that require physical access to the car are defined as local. An attacker who is able to implement the attack through a long-range surface is considered stronger and more dangerous than one who requires physical access to the vehicle. In 2015, Miller and Valasek proved that attacks on vehicles already on the market are possible: they managed to disrupt the driving of a Jeep Cherokee while connected to it through remote wireless communication. === Controller area network attacks === The most common network used in vehicles and the one that is mainly used for safety-related communication is CAN, due to its real-time properties, simplicity, and low cost. 
For this reason the majority of real-world attacks have been implemented against ECUs connected through this type of network. The majority of attacks demonstrated either against actual vehicles or in testbeds fall in one or more of the following categories: ==== Sniffing ==== Sniffing in the computer security field generally refers to the possibility of intercepting and logging packets or more generally data from a network. In the case of CAN, since it is a bus network, every node listens to all communication on the network. It is useful for the attacker to read data to learn the behavior of the other nodes of the network before implementing the actual attack. Usually, the final goal of the attacker is not to simply sniff the data on CAN, since the packets passing on this type of network are not usually valuable just to read. ==== Denial of service ==== Denial of service (DoS) in information security is usually described as an attack that has the objective of making a machine or a network unavailable. DoS attacks against ECUs connected to CAN buses can be done both against the network, by abusing the arbitration protocol used by CAN to always win the arbitration, and targeting the single ECU, by abusing the error handling protocol of CAN. In this second case the attacker flags the messages of the victim as faulty to convince the victim of being broken and therefore shut itself off the network. ==== Spoofing ==== Spoofing attacks comprise all cases in which an attacker, by falsifying data, sends messages pretending to be another node of the network. In automotive security usually spoofing attacks are divided into masquerade and replay attacks. Replay attacks are defined as all those where the attacker pretends to be the victim and sends sniffed data that the victim sent in a previous iteration of authentication. Masquerade attacks are, on the contrary, spoofing attacks where the data payload has been created by the attacker. 
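The network-level denial of service described above relies on CAN arbitration, in which the pending frame with the numerically lowest identifier wins access to the bus. The following toy Python simulation illustrates the effect; it models only the "lowest identifier wins" rule, not real CAN framing or hardware, and the function names, the slot model and the identifiers used are hypothetical.

```python
def arbitrate(pending_ids):
    """Return the identifier that wins arbitration: the lowest one."""
    return min(pending_ids)


def simulate(legit_queue, attacker_id=None, slots=5):
    """Simulate bus slots and return the identifiers transmitted in order.

    legit_queue: identifiers of frames that legitimate ECUs want to send once.
    attacker_id: identifier the attacker re-sends in every slot (None = no attack).
    """
    queue = list(legit_queue)
    delivered = []
    for _ in range(slots):
        contenders = list(queue)
        if attacker_id is not None:
            contenders.append(attacker_id)  # attacker always has a frame ready
        if not contenders:
            break  # bus idle: nothing left to send
        winner = arbitrate(contenders)
        delivered.append(winner)
        if winner != attacker_id:
            queue.remove(winner)  # a legitimate frame got through
    return delivered
```

Without an attacker, `simulate([0x244, 0x120])` delivers both legitimate frames in priority order. With `attacker_id=0x000`, the attacker's frame wins every slot and the legitimate frames are never transmitted, which is the starvation effect the text describes.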
== Real life automotive threat example == Security researchers Charlie Miller and Chris Valasek have successfully demonstrated remote access to a wide variety of vehicle controls using a Jeep Cherokee as the target. They were able to control the radio, environmental controls, windshield wipers, and certain engine and brake functions. The method used to hack the system was the insertion of a pre-programmed chip into the controller area network (CAN) bus. With this chip on the CAN bus, the researchers were able to send arbitrary messages to it. Miller has also pointed out a danger inherent in the CAN bus: because it broadcasts every message, hackers anywhere on the network can capture it. The control of the vehicle was all done remotely, manipulating the system without any physical interaction. Miller states that he could control any of some 1.4 million vehicles in the United States regardless of the location or distance; the only thing needed to gain access is for someone to turn on the vehicle. The work by Miller and Valasek replicated earlier work completed and published by academics in 2010 and 2011 on a different vehicle. The earlier work demonstrated the ability to compromise a vehicle remotely, over multiple wireless channels (including cellular), and the ability to remotely control critical components on the vehicle post-compromise, including the telematics unit and the car's brakes. While the earlier academic work was publicly visible, both in peer-reviewed scholarly publications and in the press, the Miller and Valasek work received even greater public visibility. == Security measures == The increasing complexity of devices and networks in the automotive context requires the application of security measures to limit the capabilities of a potential attacker. Since the early 2000s many different countermeasures have been proposed and, in some cases, applied. 
The most common security measures are the following:
Sub-networks: to limit the attacker's capabilities even if they manage to access the vehicle remotely through a remotely connected ECU, the networks of the vehicle are divided into multiple sub-networks, and the most critical ECUs are not placed in the same sub-networks as the ECUs that can be accessed remotely.
Gateways: the sub-networks are divided by secure gateways or firewalls that block messages from crossing from one sub-network to another if they were not intended to.
Intrusion Detection Systems (IDS): on each critical sub-network, one of the nodes (ECUs) connected to it has the goal of reading all data passing on the sub-network and detecting messages that, given some rules, are considered malicious (made by an attacker). Using an IDS, arbitrary messages can be caught and the owner notified of the unexpected message.
Authentication protocols: in order to implement authentication on networks where it is not already implemented (such as CAN), it is possible to design an authentication protocol that works on the higher layers of the ISO OSI model, by using part of the data payload of a message to authenticate the message itself.
Hardware Security Modules: since many ECUs are not powerful enough to keep real-time delays whi

  • European Grid Infrastructure


    EGI (originally an initialism for European Grid Infrastructure) is a federation of computing and storage resource providers that deliver advanced computing and data analytics services for research and innovation. The Federation is governed by its participants represented in the EGI Council and coordinated by the EGI Foundation. As of 2024, the EGI Federation supports 160 scientific communities worldwide and over 95,000 users in their intensive data analysis. The most significant scientific communities supported by EGI in 2022 were Medical and Health Sciences, High Energy Physics, and Engineering and Technology. The EGI Federation provides services through over 150 data centres, of which 25 are cloud sites, in 43 countries and 64 Research Infrastructures (4 of which are members of the Federation). == Name == Originally, EGI stood for European Grid Infrastructure. This reflected its focus on providing access to high-throughput computing resources across Europe using Grid computing techniques. However, as EGI's service offerings expanded beyond traditional grid computing, particularly with the incorporation of federated cloud services, the original meaning of the acronym became less accurate. To emphasise the broader scope of EGI's services and avoid any confusion associated with the outdated term "grid," it is recommended to refer to EGI simply as EGI. == Structure == === EGI Federation === The EGI Federation delivers a scalable digital research infrastructure (e-infrastructure), empowering tens of thousands of researchers across diverse scientific disciplines. Through the EGI Federation, researchers gain access to advanced computing and data analytics capabilities, including large-scale data analysis, while benefiting from the collaborative efforts of hundreds of service providers from both public and private sectors, consolidating resources from Europe and beyond. 
Overall, the EGI Federation offers a range of services, encompassing distributed high-throughput computing and cloud computing, storage and data management capabilities, co-development of new solutions, expert support, and comprehensive training opportunities. This ecosystem propels collaboration, scientific progress and innovation. === EGI Foundation === The EGI Foundation is the coordinating body of the EGI Federation. It was established in 2010 with headquarters in Amsterdam, Netherlands. The Foundation coordinates the research and innovation efforts of its members, spanning technical areas critical to data-intensive science, including large-scale data processing and analysis, distributed Artificial Intelligence/Machine Learning, federated Identity and access management and the application of digital twins for research. The day-to-day running of the EGI Foundation is supervised by the Executive Board. The board’s members work closely with the EGI Director on operational, technical and financial issues. The Executive Board’s members are appointed by the EGI Council for a two-year term. === EGI Council === The EGI Council is responsible for defining the strategic direction of the EGI Federation. The Council acts as the senior decision-making and supervisory authority of the EGI Foundation, with a mandate to define the strategic direction of the entire EGI ecosystem. === EGI Services === EGI offers a suite of services to support data-intensive research. These services include compute resources, orchestration tools, storage and data management solutions, training programmes, security and identity services, and applications. Compute resources encompass cloud compute, cloud container compute, high-throughput compute, and software distribution. Orchestration tools include the Workload Manager and infrastructure manager. Storage and data management solutions include online storage, data transfer, and DataHub. 
Training programmes cover FitSM, ISO 27001, and general training infrastructure. EGI Check-in and Secrets Store are key security and identity services, while applications such as Notebooks and Replay enhance research productivity. In addition to services for Research, EGI also provides services for Federation and Business. Services for Federation are designed to help resource providers and user communities collaborate and share resources. EGI also offers a range of services to support businesses in their digital transformation. Through the EGI Digital Innovation Hub (EGI DIH), companies can access advanced computing resources, networking, funding and training opportunities, collaborate with research institutions, and test solutions before investing. == History == In 2002, the first large-scale experimental facility was successfully demonstrated by the DataGrid project under the lead of CERN with tens of technical architects from the major High Energy Physics institutes in the world. For the first time, distributed computing was applied to data-intensive processing. It aimed at developing a large-scale computational grid to facilitate distributed data-intensive scientific computing across High Energy Physics, Earth Observation, and Biology science applications. On 28 February 2003, the first software release of LCG-MW was published. gLite, the Lightweight Middleware for Grid Computing and LCG, Large Hadron Collider Computing Grid, are the cornerstone of the Worldwide LHC Computing Grid, which expanded over time towards the EGI Federation. 2004 marks the year of the first pilot infrastructure, seeing the participation of CERN and data centres in the United Kingdom, Spain, Germany, the Netherlands, France, Canada, Russia, Bulgaria, the Asia-Pacific region and Switzerland. Over the years, the infrastructure has grown into a federation of 128 data centres and 25 cloud providers serving more than 95,000 users worldwide. 
In 2004, the first data processing tasks started being formally recorded in a central accounting system. The EGI Accounting Portal provides the accounting data for Compute, Storage and Data services gathered from the data centres of the EGI Federation. A few years later, in 2010, EGI was established as the coordinating body of the EGI Federation to build an integrated pan-European infrastructure, primarily to support European research communities. In the same year, EGI launched the flagship project EGI-Inspire. That project brought together European organisations to establish a sustainable European Grid Infrastructure for large-scale data analysis. The success of the project was due to the adoption of a distributed computing model to solve big data problems. Moreover, EGI-Inspire harmonised operational policies across its federation of affiliated data centres and cloud service providers worldwide, integrating e-infrastructures from 57 countries. The EGI Federation was the first to apply federation to cloud provisioning; in 2015, within EGI Engage, this opened a new avenue in large-scale interactive data analysis. The EGI Federated Cloud is an IaaS-type cloud, incorporating academic and private clouds and virtualised resources built using open standards. Its development is driven by the needs of the scientific community, resulting in a novel research e-infrastructure that relies on well-established federated operational services, making EGI a dependable resource for scientific endeavours. In 2015, EGI, EUDAT, GÉANT, LIBER and OpenAIRE published a position paper on a 'European Open Science Cloud for Research'. With the EOSC-hub project in 2016, EGI started contributing in practice to shaping the services for the EOSC. The work continued with a series of projects, like EOSC Enhance, EOSC Life and EOSC Synergy. With EGI-ACE and its contribution to EOSC Future, EGI has continued developing the EOSC Core. 
In early 2024, EGI started providing services to the EOSC EU Node, and with EOSC Beyond it will provide new EOSC Core capabilities and pilot additional national and thematic nodes. In October 2024, EUDAT, GÉANT, OpenAIRE, PRACE and EGI signed a Memorandum of Understanding establishing the European e-Infrastructures Assembly. This collaboration will bolster the position and promote the services of e-Infrastructures, empowering researchers across Europe to drive innovation and advance scientific discovery.

  • Sentiment analysis

    Sentiment analysis

    Sentiment analysis (also known as opinion mining) is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to voice of the customer materials such as reviews and survey responses, online and social media, and healthcare materials for applications that range from marketing to customer service to clinical medicine. With the rise of deep language models, such as RoBERTa, more difficult data domains can be analyzed, e.g., news texts where authors typically express their opinion/sentiment less explicitly. == Types == A basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level, that is, deciding whether the expressed opinion in a document, a sentence or an entity feature/aspect is positive, negative, or neutral. Advanced, "beyond polarity" sentiment classification looks, for instance, at emotional states such as enjoyment, anger, disgust, sadness, fear, and surprise. Precursors to sentiment analysis include the General Inquirer, which provided hints toward quantifying patterns in text and, separately, psychological research that examined a person's psychological state based on analysis of their verbal behavior. Subsequently, the method described in a patent by Volcani and Fogel looked specifically at sentiment and identified individual words and phrases in text with respect to different emotional scales. A current system based on their work, called EffectCheck, presents synonyms that can be used to increase or decrease the level of evoked emotion in each scale. Many other subsequent efforts were less sophisticated, using a mere polar view of sentiment, from positive to negative, such as work by Turney and by Pang, who applied different methods for detecting the polarity of product reviews and movie reviews respectively. 
This work is at the document level. One can also classify a document's polarity on a multi-way scale, which was attempted by Pang and Snyder among others: Pang and Lee expanded the basic task of classifying a movie review as either positive or negative to predict star ratings on either a 3- or a 4-star scale, while Snyder performed an in-depth analysis of restaurant reviews, predicting ratings for various aspects of the given restaurant, such as the food and atmosphere (on a five-star scale). First steps to bringing together various approaches (learning, lexical, knowledge-based, etc.) were taken in the 2004 AAAI Spring Symposium where linguists, computer scientists, and other interested researchers first aligned interests and proposed shared tasks and benchmark data sets for the systematic computational research on affect, appeal, subjectivity, and sentiment in text. Even though in most statistical classification methods, the neutral class is ignored under the assumption that neutral texts lie near the boundary of the binary classifier, several researchers suggest that, as in every polarity problem, three categories must be identified. Moreover, it can be proven that specific classifiers such as the Max Entropy and SVMs can benefit from the introduction of a neutral class and improve the overall accuracy of the classification. There are in principle two ways for operating with a neutral class. Either the algorithm proceeds by first identifying the neutral language, filtering it out and then assessing the rest in terms of positive and negative sentiments, or it builds a three-way classification in one step. This second approach often involves estimating a probability distribution over all categories (e.g. naive Bayes classifiers as implemented by the NLTK). 
Whether and how to use a neutral class depends on the nature of the data: if the data is clearly clustered into neutral, negative and positive language, it makes sense to filter the neutral language out and focus on the polarity between positive and negative sentiments. If, in contrast, the data are mostly neutral with small deviations towards positive and negative affect, this strategy would make it harder to clearly distinguish between the two poles. A different method for determining sentiment is the use of a scaling system whereby words commonly associated with having a negative, neutral, or positive sentiment are given an associated number on a −10 to +10 scale (most negative up to most positive) or simply from 0 to a positive upper limit such as +4. This makes it possible to adjust the sentiment of a given term relative to its environment (usually on the level of the sentence). When a piece of unstructured text is analyzed using natural language processing, each concept in the specified environment is given a score based on the way sentiment words relate to the concept and its associated score. This allows movement to a more sophisticated understanding of sentiment, because it is now possible to adjust the sentiment value of a concept relative to modifications that may surround it. Words, for example, that intensify, relax or negate the sentiment expressed by the concept can affect its score. Alternatively, texts can be given a positive and negative sentiment strength score if the goal is to determine the sentiment in a text rather than the overall polarity and strength of the text. There are various other types of sentiment analysis, such as aspect-based sentiment analysis, grading sentiment analysis (positive, negative, neutral), multilingual sentiment analysis and detection of emotions. 
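The scaling approach described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tiny lexicon, its scores on the −10 to +10 scale, the modifier weights, and the function name `score_sentence` are hypothetical and do not come from any published resource. Intensifiers, relaxers and negators are applied to the next sentiment-bearing word, and each adjusted score is clamped to the scale.

```python
# Hypothetical mini-lexicon on a -10..+10 scale (illustrative values only).
LEXICON = {"good": 5, "great": 8, "bad": -5, "terrible": -9, "okay": 0}
INTENSIFIERS = {"very": 1.5, "extremely": 2.0}   # strengthen the next sentiment word
RELAXERS = {"slightly": 0.5, "somewhat": 0.7}    # weaken the next sentiment word
NEGATORS = {"not", "never", "no"}                # flip the sign of the next sentiment word


def clamp(value, low=-10.0, high=10.0):
    """Keep an adjusted score inside the lexicon's scale."""
    return max(low, min(high, value))


def score_sentence(sentence):
    """Sum the modifier-adjusted scores of all sentiment words in one sentence."""
    tokens = sentence.lower().replace(".", " ").replace(",", " ").split()
    total = 0.0
    scale = 1.0      # pending intensifier/relaxer factor
    negate = False   # pending negation
    for token in tokens:
        if token in NEGATORS:
            negate = True
        elif token in INTENSIFIERS:
            scale *= INTENSIFIERS[token]
        elif token in RELAXERS:
            scale *= RELAXERS[token]
        elif token in LEXICON:
            value = LEXICON[token] * scale
            if negate:
                value = -value
            total += clamp(value)
            scale, negate = 1.0, False  # modifiers are consumed by this word
        # Other words leave pending modifiers in place ("not a good film").
    return total


if __name__ == "__main__":
    for text in ["The film was good.", "The film was not good.",
                 "The film was very good.", "The plot was slightly bad."]:
        print(text, score_sentence(text))
```

A real system would resolve the scope of negation and modifiers with syntactic analysis rather than this "next sentiment word" heuristic, which is used here only to keep the sketch short.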
=== Subjectivity/objectivity identification === This task is commonly defined as classifying a given text (usually a sentence) into one of two classes: objective or subjective. This problem can sometimes be more difficult than polarity classification. The subjectivity of words and phrases may depend on their context and an objective document may contain subjective sentences (e.g., a news article quoting people's opinions). Moreover, as mentioned by Su, results are largely dependent on the definition of subjectivity used when annotating texts. However, Pang showed that removing objective sentences from a document before classifying its polarity helped improve performance. Subjective and objective identification are emerging subtasks of sentiment analysis that use syntactic and semantic features, together with machine learning, to identify whether a sentence or document contains facts or opinions. The idea of distinguishing facts from opinions is not recent; it was possibly first presented by Carbonell at Yale University in 1979. The term objective describes text that carries factual information. Example of an objective sentence: 'To be elected president of the United States, a candidate must be at least thirty-five years of age.' The term subjective describes text that contains non-factual information in various forms, such as personal opinions, judgments, and predictions, also known as 'private states'. In the example below, the phrase 'We Americans' reflects a private state. Moreover, the target entity that the opinions comment on can take several forms, from a tangible product to an intangible topic, as stated in Liu (2010). Furthermore, Liu (2010) observed three types of attitudes: 1) positive opinions, 2) neutral opinions, and 3) negative opinions. Example of a subjective sentence: 'We Americans need to elect a president who is mature and who is able to make wise decisions.' This analysis is a classification problem. 
For each class, collections of indicator words or phrases are defined in order to locate desirable patterns in unannotated text. For subjective expression, a different word list has been created. Lists of subjective indicator words and phrases have been developed by multiple researchers in linguistics and natural language processing, as stated in Riloff et al. (2003). A dictionary of extraction rules has to be created for measuring given expressions. Over the years, feature extraction in subjectivity detection has progressed from curating features by hand to automated feature learning. At the moment, automated learning methods can be further separated into supervised and unsupervised machine learning. Pattern extraction with machine learning over annotated and unannotated text has been explored extensively by academic researchers. However, researchers have recognized several challenges in developing fixed sets of rules for expressions. Many of the challenges in rule development stem from the nature of textual information. Six challenges have been recognized by several researchers: 1) metaphorical expressions, 2) discrepancies in writings, 3) context sensitivity, 4) words with few usages, 5) time sensitivity, and 6) ever-growing volume. Metaphorical expressions. Text that contains metaphoric expressions may affect the performance of the extraction. Besides, metaphors take different forms, which may have been contribu

  • What I eat in a day video

    What I eat in a day video

    "What I eat in a day" videos are a trend on several social media platforms where a person describes all the meals and snacks that they eat during a given day, often as part of a given diet. The videos, shared on platforms including Twitter, TikTok and YouTube, became increasingly popular in 2020, with some of them accumulating millions of views, and they are considered a profitable industry for the people making them. Some have raised concerns that the videos may promote an unrealistic standard for healthy eating and contribute to the development of eating disorders. == Format == These videos often feature a montage of the food that the creator eats over the course of the day, sometimes with the associated calorie count of the foods that they describe. Unlike related mukbang videos, in which participants eat large amounts of food, the diets described are often restrictive. However, other videos are labeled as "unhealthy" and depict large portion sizes and higher amounts of processed food. == Popularity == "What I eat in a day" videos have existed for a long time, especially on YouTube, but they have become much more widespread in recent years. This phenomenon is self-reinforcing because when social media users watch or like these videos they are likely to see more of them in the future. Indeed, some of the most successful videos have tens of millions of views each. == Criticism and controversy == Several dieticians and mental health professionals have raised concerns over the impacts that these videos can have, as they can advocate a restrictive style of eating and not "promote body diversity." They have also raised concerns that this trend could contribute to a rise in disordered eating, especially since use of social media is known to increase feelings of negative body image. This trend is particularly prevalent among young adults, who are also the group with the highest vulnerability to eating disorders. 
More recently, a portion of these videos have begun to challenge diets and depict more realistic ways of eating in order to reduce the potential consequences of the trend.

  • Tweak programming environment

    Tweak programming environment

    Tweak is a graphical user interface (GUI) layer written by Andreas Raab for the Squeak development environment, which in turn is an integrated development environment based on the Smalltalk-80 computer programming language. Tweak is an alternative to an earlier graphic user interface layer called Morphic. Development began in 2001. Applications that use the Tweak software include Sophie (version 1), a multimedia and e-book authoring system, and a family of virtual world systems: Open Cobalt, Teleplace, OpenQwaq, 3d ICC's Immersive Terf and the Croquet Project. == Influences == An experimental version of Etoys, a programming environment for children, used Tweak instead of Morphic. Etoys was a major influence on a similar Squeak-based programming environment known as Scratch.

  • Social television

    Social television

    Social television is the union of television and social media. Millions of people now share their TV experience with other viewers on social media such as Twitter and Facebook using smartphones and tablets. TV networks and rights holders are increasingly sharing video clips on social platforms to monetise engagement and drive tune-in. The social TV market covers the technologies that support communication and social interaction around TV as well as companies that study television-related social behavior and measure social media activities tied to specific TV broadcasts – many of which have attracted significant investment from established media and technology companies. The market is also seeing numerous tie-ups between broadcasters and social networking players such as Twitter and Facebook. The market is expected to be worth $256bn by 2017. Social TV was named one of the 10 most important emerging technologies by the MIT Technology Review in 2010. In 2011, David Rowan, the editor of Wired magazine, ranked social TV third of six in his preview of the tech trends expected to gain traction that year. Ynon Kreiz, CEO of the Endemol Group, told the audience at the Digital Life Design (DLD) conference in January 2011: "Everyone says that social television will be big. I think it's not going to be big—it's going to be huge". Much of the investment in the earlier years of social TV went into standalone social TV apps. The industry believed these apps would provide an appealing and complementary consumer experience which could then be monetized with ads. These apps featured TV listings, check-ins, stickers and synchronised second-screen content but struggled to attract users away from Twitter and Facebook. 
Most of these companies have since gone out of business or been acquired amid a wave of consolidation and the market has instead focused on the activities of the social media channels themselves – such as Twitter Amplify, Facebook Suggested Videos and Snapchat Discover – and the technologies that support them. == Twitter == Twitter and Facebook are both helping users connect around media, which can provoke strong debate and engagement. Both social platforms want to be the 'digital watercooler' and host conversation around TV because the engagement and data about what media people consume can then be used to generate advertising revenue. As an open platform, conversation on Twitter is closely aligned with real-time events. In May 2013, it launched Twitter Amplify – an advertising product for media and consumer brands. With Amplify, Twitter runs video highlights from major live broadcasts, with advertisers' names and messages playing before the clip. By February 2014, all four major U.S. TV networks had signed up to the Amplify program, bringing a variety of premium TV content onto the social platform in the form of in-tweet real-time video clips. In June 2014, Twitter acquired its Twitter Amplify partner in the U.S., SnappyTV, a company that was helping broadcasters and rights holders to share video content both organically across social and via Twitter's Amplify program. Twitter continues to rely on Grabyo, which has also struck numerous deals with some of the largest broadcasters and rights holders in Europe and North America to share video content across Facebook and Twitter. == Facebook == Facebook made significant changes to its platform in 2014 including updates to its algorithm to enhance how it serves video in users' feeds. It also launched video autoplay to get users to watch the videos in their feeds. 
It rapidly surpassed Twitter and by the end of 2014 it was enjoying three billion video views a day on its platform and had announced a partnership with the NFL, one of Twitter's most active Twitter Amplify partners. In April 2015, at its F8 Developer Conference, it revealed it was working with Grabyo among other technology partners to bring video onto its platform. Then in July it announced it would be launching Facebook Suggested Videos, bringing related videos and ads to anyone who clicks on a video – a move that not only competed with Twitter's commercial video offering but also put it in direct competition with YouTube. == TV Time == TV Time is a television-dedicated social network that allows users to keep track of the television series they watch, as well as films. It also allows them to express their reaction to the media they have seen with episode-specific voting for favorite characters and emotional reactions to episodes, as well as commenting on episode-specific pages. This way users are able to avoid spoilers while also finding a precise audience and community for each of their interactions, as opposed to bigger, non-television-dedicated social media platforms such as Facebook and Twitter where the likelihood of unintentionally reading spoilers is much higher. TV Time offers an analytics service called "TVLytics" where the votes and reactions collected from users can be studied for research and television production purposes. == Advertising == According to Businessinsider.com, there are a variety of applications for social TV, including support for TV ad sales, optimizing TV ad buys, making ad buys more efficient, as a complement to audience measurement, and eventually, audience forecasting and real-time optimization. Social TV data can ease access to focus groups and may create a positive feedback loop for generating ultra-sticky TV programming and multi-screen ad campaigns. 
== In numbers == Viewers share their TV experience on social media in real time as events unfold: between 88m and 100m Facebook users log in to the platform during the primetime hours of 8pm – 11pm in the US. The volume of social media engagement in TV is also rising – according to Nielsen SocialGuide, there was a 38% increase in tweets about TV in 2013 to 263m. For the 2014 Super Bowl, Twitter reported that a record 24.9 million tweets about the game were sent during the telecast, peaking at 381,605 tweets per minute. Facebook reported that 50 million people discussed the Super Bowl, generating 185 million interactions. The 2014 Oscars generated 5m tweets, viewed by an audience of 37m unique Twitter users and delivering 3.3bn impressions globally as conversation and key moments were shared virally across the platform. In 2014 the All England Lawn Tennis Club (AELTC), hosts of Wimbledon, used Grabyo to share video content across social. The videos were viewed 3.5 million times across Facebook and Twitter. It partnered with Grabyo again in 2015 and the videos generated over 48 million views across Facebook and Twitter. == Television shows with social integration == Here are some examples of how TV executives are integrating social elements with TV shows: C-SPAN streamed tweets from US Senators and Representatives during the quorum call. The Voice had the judges of the program tweet during the show, and the posts scrolled along the bottom of the screen; the use of Twitter also led to an increase in viewers. For Glee, Entertainment Weekly created a second screen viewing platform for the season 3 premiere. == Related publications == Erika Jonietz. "Making TV Social, Virtually" MIT Technology Review. (January 11, 2010) AmigoTV (Alcatel-Lucent; Coppens et al.) – 2004 www.ist-ipmedianet.org/Alcatel_EuroiTV2004_AmigoTV_short_paper_S4-2.pdf Nextream (MIT Media Lab, Martin et al.) – 2010 Social Interactive Television: Immersive Shared Experiences and Perspectives (P. Cesar, D. 
Geerts, and K. Chorianopoulos (eds.)) – 2009 Social TV and the Emergence of Interactive TV – Multimedia Research Group – November 2010 Interactive Social TV on Service Oriented Environments: Challenges and Enablers (May 2011) == Systems == Boxee – acquired by Samsung GetGlue – acquired by i.TV Grabyo KIT digital Miso TV Tank Top TV WiO Xbox Live

  • BitFunnel

    BitFunnel

    BitFunnel is the search engine indexing algorithm and a set of components used in the Bing search engine, which were made open source in 2016. BitFunnel uses bit-sliced signatures instead of an inverted index in an attempt to reduce operations cost. == History == Progress on the implementation of BitFunnel was made public in early 2016, with the expectation that there would be a usable implementation later that year. In September 2016, the source code was made available via GitHub. A paper discussing the BitFunnel algorithm and implementation was released through the Special Interest Group on Information Retrieval of the Association for Computing Machinery in 2017 and won the Best Paper Award. == Components == BitFunnel consists of three major components: BitFunnel – the text search/retrieval system itself; WorkBench – a tool for preparing text for use in BitFunnel; NativeJIT – a software component that takes expressions that use C data structures and transforms them into highly optimized assembly code. == Algorithm == === Initial problem and solution overview === The BitFunnel paper describes the "matching problem", which occurs when an algorithm must identify documents through the usage of keywords. The goal of the problem is to identify a set of matches given a corpus to search and a query of keyword terms to match against. This problem is commonly solved through inverted indexes, where each searchable item is maintained with a map of keywords. In contrast, BitFunnel represents each searchable item through a signature. A signature is a sequence of bits which describes a Bloom filter of the searchable terms in a given searchable item. The Bloom filter is constructed by hashing each term to several bit positions. 
=== Theoretical implementation of bit-string signatures === The signature of a document (D) can be described as the logical-or of its term signatures:

$$\overrightarrow{S_D} = \bigcup_{t \in D} \overrightarrow{S_t}$$

Similarly, the signature of a query (Q) can be defined as a union:

$$\overrightarrow{S_Q} = \bigcup_{t \in Q} \overrightarrow{S_t}$$

Additionally, a document D is a member of the set M' when the following condition is satisfied:

$$\overrightarrow{S_Q} \cap \overrightarrow{S_D} = \overrightarrow{S_Q}$$

This knowledge is then combined to produce a formula where M' is identified by documents in the corpus C which match the query signature:

$$M' = \left\{ D \in C \mid \overrightarrow{S_Q} \cap \overrightarrow{S_D} = \overrightarrow{S_Q} \right\}$$

These steps and their proofs are discussed in the 2017 paper. === Pseudocode for bit-string signatures === This algorithm is described in the 2017 paper.

$$\begin{array}{l}
M' = \emptyset \\
\texttt{foreach}\ D \in C\ \texttt{do} \\
\qquad \texttt{if}\ \overrightarrow{S_D} \cap \overrightarrow{S_Q} = \overrightarrow{S_Q}\ \texttt{then} \\
\qquad \qquad M' = M' \cup \{D\} \\
\qquad \texttt{endif} \\
\texttt{endfor}
\end{array}$$
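The signature test above can be sketched in a few lines of Python. This is only an illustration of the matching condition, not BitFunnel's implementation: the signature width, the number of hash positions per term, the use of SHA-256, and all function names are assumptions chosen for the example, and the bit-sliced storage that BitFunnel uses is not modelled. Python integers stand in for bit strings, so union is `|` and intersection is `&`.

```python
import hashlib

SIGNATURE_BITS = 64    # assumed signature width, chosen for illustration
HASHES_PER_TERM = 3    # assumed number of bit positions set per term


def term_signature(term):
    """Bloom-filter style signature of one term: a few hashed bit positions set."""
    signature = 0
    for i in range(HASHES_PER_TERM):
        digest = hashlib.sha256(f"{i}:{term}".encode("utf-8")).digest()
        position = int.from_bytes(digest[:8], "big") % SIGNATURE_BITS
        signature |= 1 << position
    return signature


def signature(terms):
    """S_D or S_Q: the logical-or (union) of the term signatures."""
    result = 0
    for term in terms:
        result |= term_signature(term)
    return result


def candidate_matches(corpus, query_terms):
    """M': every document whose signature contains all bits of the query signature."""
    query_sig = signature(query_terms)
    return [doc_id for doc_id, terms in corpus.items()
            if (signature(terms) & query_sig) == query_sig]


if __name__ == "__main__":
    corpus = {
        "d1": ["search", "engine", "index"],
        "d2": ["letter", "frequency"],
        "d3": ["search", "signature", "bloom"],
    }
    print(candidate_matches(corpus, ["search"]))
```

Because several terms can hash to the same bit positions, M' may contain false positives (documents that pass the bit test without containing every query term), but it never misses a document that does contain all of them. A verification step against the actual terms would be needed to remove the false positives.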

  • Letter frequency

    Letter frequency

    Letter frequency is the number of times letters of the alphabet appear on average in written language. Letter frequency analysis dates back to the Arab mathematician Al-Kindi (c. AD 801–873), who formally developed the method to break ciphers. Letter frequency analysis gained importance in Europe with the development of movable type in AD 1450, wherein one must estimate the amount of type required for each letterform. Linguists use letter frequency analysis as a rudimentary technique for language identification, where it is particularly effective as an indication of whether an unknown writing system is alphabetic, syllabic, or logographic. The use of letter frequencies and frequency analysis plays a fundamental role in cryptograms and several word puzzle games, including hangman, Scrabble, Wordle and the television game show Wheel of Fortune. One of the earliest descriptions in classical literature of applying the knowledge of English letter frequency to solving a cryptogram is found in Edgar Allan Poe's famous story "The Gold-Bug", where the method is successfully applied to decipher a message giving the location of a treasure hidden by Captain Kidd. Herbert S. Zim, in his classic introductory cryptography text Codes and Secret Writing, gives the English letter frequency sequence as "ETAON RISHD LFCMU GYPWB VKJXZQ", the most common letter pairs as "TH HE AN RE ER IN ON AT ND ST ES EN OF TE ED OR TI HI AS TO", and the most common doubled letters as "LL EE SS OO TT FF RR NN PP CC". Different ways of counting can produce somewhat different orders. Letter frequencies also have a strong effect on the design of some keyboard layouts. The most frequent letters are placed on the home row of the Blickensderfer typewriter, the Dvorak keyboard layout, Colemak and other optimized layouts, while the commonly used QWERTY layout places common letters apart from each other to prevent typewriter jamming. 
== Background == The frequency of letters in text has been studied for use in cryptanalysis, and frequency analysis in particular, dating back to the Arab mathematician al-Kindi (c. AD 801–873 ), who formally developed the method (the ciphers breakable by this technique go back at least to the Caesar cipher used by Julius Caesar, so this method could have been explored in classical times). Letter frequency analysis gained additional importance in Europe with the development of movable type in AD 1450, wherein one must estimate the amount of type required for each letterform, as evidenced by the variations in letter compartment size in typographer's type cases. No exact letter frequency distribution underlies a given language, since all writers write slightly differently. However, most languages have a characteristic distribution which is strongly apparent in longer texts. Even language changes as extreme as from Old English to modern English (regarded as mutually unintelligible) show strong trends in related letter frequencies: over a small sample of Biblical passages, from most frequent to least frequent, enaid sorhm tgþlwu æcfy ðbpxz of Old English compares to eotha sinrd luymw fgcbp kvjqxz of modern English, with the most extreme differences concerning letterforms not shared. Linotype machines for the English language assumed the letter order, from most to least common, to be etaoin shrdlu cmfwyp vbgkqj xz based on the experience and custom of manual compositors. The equivalent for the French language was elaoin sdrétu cmfhyp vbgwqj xz. Arranging the alphabet in Morse into groups of letters that require equal amounts of time to transmit, and then sorting these groups in increasing order, yields e it san hurdm wgvlfbk opxcz jyq. Letter frequency was used by other telegraph systems, such as the Murray Code. Similar ideas are used in modern data-compression techniques such as Huffman coding. 
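The link between letter frequency and Huffman coding can be made concrete with a short sketch: frequent letters receive short codes and rare letters long ones, much as Morse gives ⟨e⟩ the shortest signal. The function name `huffman_code_lengths` and the sample string are illustrative assumptions; the sample is deliberately artificial and is not real frequency data.

```python
import heapq
from collections import Counter


def huffman_code_lengths(text):
    """Return {letter: code length in bits} for a Huffman code built from text."""
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    if not counts:
        return {}
    # Heap entries: (weight, unique tiebreaker, {letter: depth in the tree so far}).
    heap = [(weight, i, {letter: 0})
            for i, (letter, weight) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single distinct letter still needs one bit.
        return {letter: 1 for letter in heap[0][2]}
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; every letter in them moves one level deeper.
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {letter: depth + 1 for letter, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


if __name__ == "__main__":
    sample = "eeeeeeee tttt aa q"   # artificial counts: e=8, t=4, a=2, q=1
    print(huffman_code_lengths(sample))
```

With the artificial sample, ⟨e⟩ gets a 1-bit code, ⟨t⟩ a 2-bit code, and ⟨a⟩ and ⟨q⟩ 3-bit codes. Running the same function over a large representative text would give lengths that track that text's letter frequencies.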
Letter frequencies, like word frequencies, tend to vary, both by writer and by subject. For instance, ⟨d⟩ occurs with greater frequency in fiction, as most fiction is written in past tense and thus most verbs will end in the inflectional suffix -ed / -d. One cannot write an essay about x-rays without using ⟨x⟩ frequently, and the essay will have an idiosyncratic letter frequency if the essay is about, say, Queen Zelda of Zanzibar requesting X-rays from Qatar to examine hypoxia in zebras. Different authors have habits which can be reflected in their use of letters. Hemingway's writing style, for example, is visibly different from Faulkner's. Letter, bigram, trigram, word frequencies, word length, and sentence length can be calculated for specific authors and used to prove or disprove authorship of texts, even for authors whose styles are not so divergent. Accurate average letter frequencies can only be gleaned by analyzing a large amount of representative text. With the availability of modern computing and collections of large text corpora, such calculations are easily made. Examples can be drawn from a variety of sources (press reporting, religious texts, scientific texts and general fiction) and there are differences especially for general fiction with the position of ⟨h⟩ and ⟨i⟩, with ⟨h⟩ becoming more common. Different dialects of a language will also affect a letter's frequency. For example, an author in the United States would produce something in which ⟨z⟩ is more common than an author in the United Kingdom writing on the same topic: words like "analyze", "apologize", and "recognize" contain the letter in American English, whereas the same words are spelled "analyse", "apologise", and "recognise" in British English. This would highly affect the frequency of the letter ⟨z⟩, as it is rarely used by British writers in the English language. The "top twelve" letters constitute about 80% of the total usage. 
The "top eight" letters constitute about 65% of the total usage. Letter frequency as a function of rank can be fitted well by several rank functions, with the two-parameter Cocho/Beta rank function being the best. Another rank function with no adjustable free parameter also fits the letter frequency distribution reasonably well (the same function has been used to fit the amino acid frequency in protein sequences.) A spy using the VIC cipher or some other cipher based on a straddling checkerboard typically uses a mnemonic such as "a sin to err" (dropping the second "r") or "at one sir" to remember the top eight characters. == Relative frequencies of letters in the English language == There are three ways to count letter frequency that result in very different charts for common letters. The first method, used in the chart below, is to count letter frequency in lemmas of a dictionary. The lemma is the word in its canonical form. The second method is to include all word variants when counting, such as "abstracts", "abstracted" and "abstracting" and not just the lemma of "abstract". This second method results in letters like ⟨s⟩ appearing much more frequently, such as when counting letters from lists of the most used English words on the Internet. ⟨s⟩ is especially common in inflected words (non-lemma forms) because it is added to form plurals and third person singular present tense verbs. A final method is to count letters based on their frequency of use in actual texts, resulting in certain letter combinations like ⟨th⟩ becoming more common due to the frequent use of common words like "the", "then", "both", "this", etc. Absolute usage frequency measures like this are used when creating keyboard layouts or letter frequencies in old fashioned printing presses. An analysis of entries in the Concise Oxford dictionary, ignoring frequency of word use, gives an order of "EARIOTNSLCUDPMHGBFYWKVXZJQ". 
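The difference between the three counting methods can be sketched on a toy sentence. The sentence and the tiny `LEMMAS` mapping below are hypothetical stand-ins for a real corpus and a real lemmatizer, and the counts only illustrate the direction of the effect, not actual English statistics.

```python
from collections import Counter


def count_letters(words):
    """Count ASCII letters a-z across an iterable of words."""
    counts = Counter()
    for word in words:
        counts.update(ch for ch in word.lower() if "a" <= ch <= "z")
    return counts


# Hypothetical running text and a toy lemma table (assumptions for illustration).
text = "the cat sees the cats and the dog sees the dogs"
LEMMAS = {"cats": "cat", "dogs": "dog", "sees": "see"}

tokens = text.split()                                   # method 3: every occurrence in running text
types = sorted(set(tokens))                             # method 2: each distinct word form once
lemmas = sorted({LEMMAS.get(w, w) for w in tokens})     # method 1: each lemma once

by_lemma = count_letters(lemmas)
by_type = count_letters(types)
by_token = count_letters(tokens)

for name, counts in (("lemmas", by_lemma), ("word forms", by_type), ("running text", by_token)):
    print(f"{name:>12}: s={counts['s']} t={counts['t']} h={counts['h']}")
```

In this toy input, ⟨s⟩ rises once inflected forms are counted, and ⟨t⟩ and ⟨h⟩ rise further in running text because "the" is repeated, matching the qualitative behaviour described above.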
The letter-frequency table above is taken from Pavel Mička's website, which cites Robert Lewand's Cryptological Mathematics. According to Lewand, arranged from most to least common in appearance, the letters are: etaoinshrdlcumwfgypbvkjxqz. Lewand's ordering differs slightly from others, such as Cornell University Math Explorer's Project, which produced a table after measuring 40,000 words. In English, the space character occurs almost twice as frequently as the top letter (⟨e⟩) and the non-alphabetic characters (digits, punctuation, etc.) collectively occupy the fourth position (having already included the space) between ⟨t⟩ and ⟨a⟩. == Relative frequencies of the first letters of a word in the English language == The frequency of the first letters of words or names is helpful in pre-assigning space in physical files and indexes. Given 26 filing cabinet drawers, rather than a 1:1 assignment of one drawer to one letter of the alphabet, it is often useful to use a more equal-frequency-letter code by assigning several low-frequency letters to the same drawer (often one drawer is labeled VWXYZ), and to split up the most-frequent initial letters (⟨s, a, c⟩) into several drawers (often 6 drawers Aa-An, Ao-Az, Ca-Cj, Ck-Cz, Sa-Si, Sj-Sz). The same system is used in some mult

  • Personal cloud

    Personal cloud

    A personal cloud is a collection of digital content and services that are accessible from any device through the Internet. It is not a tangible entity, but a place that gives users the ability to store, synchronize, stream and share content on a relative core, moving from one platform, screen and location to another. Created on connected services and applications, it reflects and sets consumer expectations for how next-generation computing services will work. The four primary types of personal cloud in use today are: Online cloud, NAS device cloud, server device cloud, and home-made clouds. == Online cloud == The online cloud is sometimes referred to as the public cloud. It is the cloud computing model where online resources like software and data storage are made available over the Internet. Typically, an individual or organization has little control over the ecosystem in which the online cloud is hosted, and the core infrastructure is shared between many individuals and organizations. The data and applications provided by the service provider are logically segregated so that only those authorized are allowed access. == NAS device cloud == A network-attached storage (NAS) device is a computer connected to a network that provides only file-based data storage services to other devices on the network. Although it may technically be possible to run other software on a NAS device, it is not designed to be a general purpose server. Cloud NAS is remote storage that is accessed over the Internet as if it were local. A cloud NAS is often used for backups and archiving. One of the benefits of NAS Cloud is that data in the cloud can be accessed at any time from anywhere. The main drawback, however, is that the speed of the transfer rate is only as fast as the network connection the data is accessed over and can therefore be fairly slow. 
== Server device cloud == In many ways cloud servers work in the same way as physical servers, but the functions they perform can be very different. Typically, the cloud server is an on-premises device that is connected to the Internet and gives users the functions available on the online cloud, with the added benefit and security of keeping the files under their control on their own premises. Historically, server clouds have been enterprise-based, deployed by businesses needing an in-house cloud. However, there are also in-house options available for individual users. == Home-made clouds == For the more technologically proficient user, a common way to set up a personal cloud is to create a home-made cloud system by connecting an external USB hard drive to a Wi-Fi router. This enables both wired and wireless computers to access the USB hard drive and use it for storage or for retrieving files a user needs to share on the network, thereby acting like a cloud. Setting up a personal cloud requires a user to have particular skills in technology and network setup. One of the risks of improper setup is poor security, which can leave the files accessible to anyone with technical knowledge. Not every router supports this type of access and modification.

  • Critical data studies

    Critical data studies

    Critical data studies is the exploration of and engagement with social, cultural, and ethical challenges that arise when working with big data. It is through various unique perspectives and taking a critical approach that this form of study can be practiced. As its name implies, critical data studies draws heavily on the influence of critical theory, which has a strong focus on addressing the organization of power structures. This idea is then applied to the study of data. Interest in this unique field of critical data studies began in 2011 with scholars danah boyd and Kate Crawford posing various questions for the critical study of big data and recognizing its potential threatening impacts on society and culture. It was not until 2014, and more exploration and conversations, that critical data studies was officially coined by scholars Craig Dalton and Jim Thatcher. They put a large emphasis on understanding the context of big data in order to approach it more critically. Researchers such as David Ribes, Robert Soden, Seyram Avle, Sarah E. Fox, and Phoebe Sengers focus on understanding data as a historical artifact and taking an interdisciplinary approach towards critical data studies. Other key scholars in this discipline include Rob Kitchin and Tracey P. Lauriault who focus on reevaluating data through different spheres. Various critical frameworks that can be applied to analyze big data include Feminist, Anti-Racist, Queer, Indigenous, Decolonial, Anti-Ableist, as well as Symbolic and Synthetic data science. These frameworks help to make sense of the data by addressing power, biases, privacy, consent, and underrepresentation or misrepresentation concerns that exist in data as well as how to approach and analyze this data with a more equitable mindset. 
== Motivation == In the article in which they coin the term 'critical data studies,' Dalton and Thatcher also provide several justifications as to why data studies is a discipline worthy of a critical approach. First, 'big data' is an important aspect of twenty-first century society, and the analysis of 'big data' allows for a deeper understanding of what is happening and for what reasons. Big data is important to critical data studies because it is the type of data used within this field. Big data does not necessarily refer to a large data set: it can be a data set with millions of rows, but it can also be a smaller data set with a wide variety and expansive scope of data. It can also cover whole populations rather than just samples. Furthermore, big data as a technological tool and the information that it yields are not neutral, according to Dalton and Thatcher, making it worthy of critical analysis in order to identify and address its biases. Building off this idea, another justification for a critical approach is that the relationship between big data and society is an important one, and therefore worthy of study. Ribes et al. argue there is a need for an interdisciplinary understanding of data as a historical artifact as a motivating aspect of critical data studies. The overarching consensus in the Computer-Supported Cooperative Work (CSCW) field is that people should speak for the data, and not let the data speak for itself. The sources of big data and its relationship to varied metadata can be complicated, which leads to data disorder and a need for ethical analysis. Additionally, Iliadis and Russo (2016) have called for studying data assemblages. This is to say, data has innate technological, political, social, and economic histories that should be taken into consideration. 
Kitchin argues data is almost never raw, and it is almost always cooked, meaning that it is always spoken for by the data scientists utilizing it. Thus, big data should be open to a variety of perspectives, especially those of a cultural and philosophical nature. Further, data contains hidden histories, ideologies, and philosophies. Big data technology can cause significant changes in society's structure and in the everyday lives of people, and, being a product of society, big data technology is worthy of sociological investigation. Moreover, data sets are almost never free of influence. Rather, data are shaped by the vision or goals of those gathering the data, and during the data collection process, certain things are quantified, stored, sorted and even discarded by the research team. A critical approach is thus necessary in order to understand and reveal the intent behind the information being presented. One of these critical approaches is feminist data studies. This method applies feminist principles to critical studies and to data collection and analysis. The goal is to address the power imbalance in data science and society. According to Catherine D’Ignazio and Lauren F. Klein, a power analysis can be performed by examining power, challenging power, evaluating emotion and embodiment, rethinking binaries and hierarchies, embracing pluralism, considering context, and making labor visible. Feminist data studies is part of the movement towards making data benefit everyone rather than increase existing inequalities. Moreover, data alone cannot speak for themselves; in order to possess any concrete meaning, data must be accompanied by theoretical insight or alternative quantitative or qualitative research measures. Other strands of critical data studies, such as anti-racist data studies, focus on particular social issues concerning data. 
Specifically, anti-racist data studies use a classification approach to gain representation for members of the communities concerned. Desmond Upton Patton and others used their own classification system in communities in Chicago to help target and reduce violence involving young teens on Twitter. They had students in those communities help them decipher the terminology and emojis these teens used, in order to identify language in tweets that was followed by violence offline. This is just one real-world example of critical data studies and its application. Dalton and Thatcher argue that if one were to only think of data in terms of its exploitative power, there is no possibility of using data for revolutionary, liberatory purposes. Finally, Dalton and Thatcher propose that a critical approach in studying data allows for 'big data' to be combined with older, 'small data,' and thus create more thorough research, opening up more opportunities, questions and topics to be explored. == Issues and concerns for critical data scholars == Data plays a pivotal role in the emerging knowledge economy, driving productivity, competitiveness, efficiency, sustainability, and capital accumulation. The ethical, political, and economic dimensions of data dynamically evolve across space and time, influenced by changing regimes, technologies, and priorities. Technically, the focus lies on handling, storing, and analyzing vast data sets, utilizing machine learning-based data mining and analytics. This technological advancement raises concerns about data quality, encompassing validity, reliability, authenticity, usability, and lineage. The use of data in modern society brings about new ways of understanding and measuring the world, but also brings with it certain concerns or issues. Data scholars attempt to bring some of these issues to light in their quest to be critical of data. 
Technical and organizational issues could include the scope of the data set: too little or too much data to work with can lead to inaccurate results. It becomes crucial for critical data scholars to carefully consider the adequacy of data volume for their analyses. The quality of the data itself is another facet of concern. The data could be of poor quality, such as an incomplete or messy data set with missing or inaccurate values. Addressing these issues often requires scholars to make edits and assumptions about the data to ensure its reliability and relevance. Data scientists could lack proper access to the actual data set, limiting their ability to analyze it. Linnet Taylor explains how gaps in data can arise when people of varying levels of power have certain rights to their data sources. These people in power can control what data is collected, how it is displayed and how it is analyzed. The capabilities of the research team also play a crucial role in the quality of data analytics. The research team may have inadequate skills or organizational capabilities, which can bias the analytics performed on the data set. This can also lead to ecological fallacies, meaning an assumption is made about an individual based on data or results from a larger group of people. These technical and organizational challenges highlight the complexity of working with data and

  • White-box cryptography

    White-box cryptography

In cryptography, the white-box model refers to an extreme attack scenario, in which an adversary has full unrestricted access to a cryptographic implementation, most commonly of a block cipher such as the Advanced Encryption Standard (AES). A variety of security goals may be posed (see the section below), the most fundamental being "unbreakability", requiring that any (bounded) attacker should not be able to extract the secret key hardcoded in the implementation, while at the same time the implementation must be fully functional. In contrast, the black-box model only provides oracle access to the analyzed cryptographic primitive (in the form of encryption and/or decryption queries). There is also a model in-between, the so-called gray-box model, which corresponds to additional information leakage from the implementation, more commonly referred to as side-channel leakage. White-box cryptography is the practice and study of techniques for designing and attacking white-box implementations. It has many applications, including digital rights management (DRM), pay television, protection of cryptographic keys in the presence of malware, mobile payments and cryptocurrency wallets. Examples of DRM systems employing white-box implementations include CSS and Widevine. White-box cryptography is closely related to the more general notions of obfuscation, in particular, to black-box obfuscation, proven to be impossible, and to indistinguishability obfuscation, constructed recently under well-founded assumptions but so far infeasible to implement in practice. As of January 2023, there are no publicly known unbroken white-box designs of standard symmetric encryption schemes. On the other hand, there exist many unbroken white-box implementations of dedicated block ciphers designed specifically to achieve incompressibility (see § Security goals). == Security goals == Depending on the application, different security goals may be required from a white-box implementation. 
Specifically, for symmetric-key algorithms the following are distinguished: Unbreakability is the most fundamental goal, requiring that a bounded attacker should not be able to recover the secret key embedded in the white-box implementation. Without this requirement, all other security goals are unreachable, since a successful attacker can simply use a reference implementation of the encryption scheme together with the extracted key. One-wayness requires that a white-box implementation of an encryption scheme cannot be used by a bounded attacker to decrypt ciphertexts. This requirement essentially turns a symmetric encryption scheme into a public-key encryption scheme, where the white-box implementation plays the role of the public key associated with the embedded secret key. This idea was already proposed in the famous 1976 work of Diffie and Hellman as a potential public-key encryption candidate. Code lifting security is an informal requirement on the context in which the white-box program is being executed. It demands that an attacker cannot extract a functional copy of the program. This goal is particularly relevant in the DRM setting. Code obfuscation techniques are often used to achieve this goal. A commonly used technique is to compose the white-box implementation with so-called external encodings. These are lightweight secret encodings that modify the function computed by the white-box part of an application. It is required that their effect is canceled in other parts of the application in an obscure way, using code obfuscation techniques. Alternatively, the canceling counterparts can be applied on a remote server. Incompressibility requires that an attacker cannot significantly compress a given white-box implementation. 
This can be seen as a way to achieve code lifting security (see above), since exfiltrating a large program from a constrained device (for example, an embedded or a mobile device) can be time-consuming and may be easy to detect by a firewall. Examples of incompressible designs include SPACE cipher, SPNbox, WhiteKey and WhiteBlock. These ciphers use large lookup tables that can be pseudorandomly generated from a secret master key. Although this makes the recovery of the master key hard, the lookup tables themselves play the role of an equivalent secret key. Thus, unbreakability is achieved only partially. Traceability (traitor tracing) requires that each distributed white-box implementation contains a digital watermark allowing identification of the guilty user in case the white-box program is leaked and distributed publicly. == History == The white-box model, together with initial attempts at white-box DES and AES implementations, was first proposed by Chow, Eisen, Johnson and van Oorschot in 2003. The designs were based on representing the cipher as a network of lookup tables and obfuscating the tables by composing them with small (4- or 8-bit) random encodings. Such protection satisfied the property that each obfuscated table individually does not contain any information about the secret key. Therefore, a potential attacker has to combine several tables in their analysis. The first two schemes were broken in 2004 by Billet, Gilbert, and Ech-Chatbi using structural cryptanalysis. The attack was subsequently called "the BGE attack". The numerous subsequent design attempts (2005-2022) were quickly broken by practical dedicated attacks. In 2016, Bos, Hubain, Michiels and Teuwen showed that an adaptation of standard side-channel power analysis attacks can be used to efficiently and fully automatically break most existing white-box designs. 
This result created a new research direction about generic attacks (correlation-based, algebraic, fault injection) and protections against them. == Competitions == Four editions of the WhibOx contest were held in 2017, 2019, 2021 and 2024 respectively. These competitions invited white-box designers both from academia and industry to submit their implementation in the form of (possibly obfuscated) C code. At the same time, everyone could attempt to attack these programs and recover the embedded secret key. Each of these competitions lasted for about 4-5 months. WhibOx 2017 / CHES 2017 Capture the Flag Challenge targeted the standard AES block cipher. Among 94 submitted implementations, all were broken during the competition, with the strongest one staying unbroken for 28 days. WhibOx 2019 / CHES 2019 Capture the Flag Challenge again targeted the AES block cipher. Among 27 submitted implementations, 3 programs stayed unbroken throughout the competition, but were broken 51 days after publication. WhibOx 2021 / CHES 2021 Capture the Flag Challenge changed the target to ECDSA, a digital signature scheme based on elliptic curves. Among 97 submitted implementations, all were broken within at most 2 days. WhibOx 2024 / CHES 2024 Capture the Flag Challenge again targeted ECDSA. Among 47 submitted implementations, all were broken during the competition, with the strongest one staying unbroken for almost 5 days.
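The table-network idea mentioned in the History section, in which lookup tables are composed with random encodings that cancel between consecutive tables, can be illustrated with a toy sketch. This is not the construction of Chow et al. and offers no security: the S-box is a randomly generated stand-in (not the AES S-box), the key bytes `k1` and `k2` are hypothetical, and only a single 8-bit internal encoding `f` is used. It shows just the functional property that the encoded tables compute the same overall function as the unencoded ones.

```python
import random


def random_bijection(rng):
    """Return a random permutation of 0..255 as a lookup table."""
    perm = list(range(256))
    rng.shuffle(perm)
    return perm


def invert(perm):
    """Return the inverse lookup table of a permutation."""
    inv = [0] * 256
    for x, y in enumerate(perm):
        inv[y] = x
    return inv


def compose(outer, inner):
    """Return the table computing outer(inner(x)) for every byte x."""
    return [outer[inner[x]] for x in range(256)]


rng = random.Random(2003)          # fixed seed, for reproducibility only

SBOX = random_bijection(rng)       # stand-in public S-box (assumption, NOT the AES S-box)
k1, k2 = 0x3A, 0xC5                # hypothetical secret key bytes

# Key-dependent tables: T_i(x) = SBOX[x XOR k_i]
t1 = [SBOX[x ^ k1] for x in range(256)]
t2 = [SBOX[x ^ k2] for x in range(256)]

# Secret internal encoding f, applied to the output of T1 and undone at the input of T2.
f = random_bijection(rng)
f_inv = invert(f)

t1_enc = compose(f, t1)            # f after T1
t2_enc = compose(t2, f_inv)        # T2 after the inverse of f

reference = compose(t2, t1)        # unprotected two-step computation
whitebox = compose(t2_enc, t1_enc) # same function, built only from encoded tables
```

Only `t1_enc` and `t2_enc` would be shipped in such a scheme. As the History section notes, real designs built along these lines were broken by structural and side-channel-style attacks, so the sketch should be read as an explanation of the mechanism rather than as a protection.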
